```Date: Sun, 4 Aug 1996 15:41:51 EDT Reply-To: whitloi1@WESTATPO.WESTAT.COM Sender: "SAS(r) Discussion" From: Ian Whitlock Subject: Re: sql help Comments: To: "A.A. McCulloch" Subject: sql help Summary: Find the minimum relevant year not spent in an educational institution. Respondent: Ian Whitlock A.A. McCulloch asks an interesting "sets" type of SQL question. We used the UNION operator and the EXCEPT operator. The example is simple enough to follow yet rich enough to be informative. >I have three data files which contain the dates when people started >and left education of three different types (school, college, >university). I would like to calculate the first year (>16th birth) >when each person was not in any sort of education. I've been trying >to use SQL but don't know where I should start. Below is data giving >16th birthday and start and end dates of the three classes of >education. The first thing to realize is that SQL must have available appropriate years. (When you ask for something that is not an orange you must have in mind a universe of possibilities from which to choose.) I took the simplest route and said the possible years range from 16th birthday plus 1 to 95 since all the dates seemed to end in 1995 or before. (At the risk of complexity one might choose to narrow the list of years.) Here is the code. The code to generate Andrews data is given at the bottom. /* generate list of relevant years */ data years ( keep = id yr ) ; set b16 ; by id ; do yr = birth16 + 1 to 95 ; output ; end ; run ; Now we are prepared to think about SQL. First we want to eliminate years in an institution and then pick the minimum remaining year. How do we get the years in an institution? Match the possible years against records for a type of institution and grab any year within any records limit. Note that the same year may be listed more than once for an institution or even for more than one institution. We don't care. The we take the union of the three types of years. This is the view V created in the first step. Then we eliminate these years from consideration and take the minimum of the remaining to provide an answer when there is one. This is view W created in the second step. Finally we need to do a left join with the IDs from B16 to pick up anyone who had all their years eliminated. (One could of course eliminate this last step by adding enough relevant years to the YEARS data set.) Here is the SQL code. proc sql ; /* list years in any educational institution */ create view v as select id , yr from ( select * from ( select distinct m.id , m.yr from years as m , school where m.id = school.id and m.yr between strt and end ) ) union corresponding ( select * from ( select distinct m.id , m.yr from years as m , college where m.id = college.id and m.yr between strt and end ) ) union corresponding ( select * from ( select distinct m.id , m.yr from years as m , univ where m.id = univ.id and m.yr between strt and end ) ) ; /* get the minimum relevant year not in an educational */ /* institution */ create view w as select id , min ( yr ) as yr from ( select id , yr from years except corresponding select id , yr from v ) group by id order by id ; title "First year after 16th birthday out of educational system" ; select b.id , w.yr from b16 as b left join w on b.id = w.id ; quit ; Originally I tried to intersect the years out of each type of institution. This approach did not work since there may be multiple records for one ID in a type of institution. The code to generate the data sets referred to above is: /* test data */ data b16 ; input id birth16 ; cards ; 11 86 12 84 13 47 14 65 15 90 16 64 17 69 18 72 19 81 20 82 ; data school ; input ID STRT END ; cards ; 11 84 86 11 88 90 12 82 83 13 45 55 14 85 87 15 88 89 16 62 72 16 77 87 17 67 68 18 83 86 ; data college ; input ID STRT END ; cards ; 11 90 91 12 84 85 13 93 93 14 64 65 14 88 90 15 94 95 16 93 94 17 83 85 18 88 90 19 89 92 ; data univ ; input ID STRT END ; cards ; 11 87 93 12 91 94 13 46 47 14 65 69 15 90 93 16 92 94 17 79 83 18 71 74 19 80 83 20 81 84 ; I guess there wasn't a place for this type of question on SAS-L in the "old days" because the then proper response "RTFM" would have been too dull to hold interest. Ian Whitlock ```

Back to: Top of message | Previous page | Main SAS-L page