LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (April 2005, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 14 Apr 2005 14:18:59 GMT
Reply-To:     alanm <nospam@NOSPAM.NOSPAM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         alanm <nospam@NOSPAM.NOSPAM>
Organization: BigPond Internet Services
Subject:      Re: merging data sets problem (potential proc sql?)
Comments: To: sas-l@uga.edu

"Cornel Lencar" <clencar@INTERCHANGE.UBC.CA> wrote in message news:200504140030.j3E0U0cc003262@listserv.cc.uga.edu... > Hi, > > I am facing a problem that I suspect can be solved with proc sql, > on which I am not proficient. I cannot work the problem in a data > step /or several data step due to the large files that I have to work with. > > Basically I have three files: > 1. a distance file that contains all distances between a set of ambient stations > and a much larger set of postal codes > > PCODE STATION_ID DISTANCE > A0A0A0 E3245 4567.7 > A0A0A0 E4567 34567.9 > A0A0A0 M456 34.7 > B0B0B0 E3245 56987 > B0B0B0 .. .. > > 2. a file with subjects' data > > SUBJ_ID PCODE DOB EOFU > 1 A0A0A0 11/29/1999 12/31/2002 > 2 A0A0A0 10/09/2000 12/31/2002 > 3 B0B0B0 09/27/2000 11/18/2000 > 4 .. .. .. > > 3. A file with daily ambient exposure information, for each ambient station, > between > 1/01/1998 and 12/31/2002. It is important to note that for some days (few) > measurements are missing (not collected or insuficient hourly measurements -no > need to go in detail here). Also, not all ambient stations cover the whole > window interval mentioned. Some started monitoring earlier and closed in 1999 > for instance, or others started in 2000 on. > > STATION_ID DATE CO_LEVEL > E3245 1/01/1998 125.98 > E3245 1/02/1998 132.90 > E3245 1/03/1998 125.98 > E3245 1/04/1998 . > E3245 1/04/1998 117.34 > > So for each station I will have 365*5 observations > > I need to put together these three file something like this: > > SUBJ_ID PCODE DOB EOFU STATION_ID > 1 A0A0A0 11/29/1999 12/31/2002 M456 > > > DISTANCE DATE CO_LEVEL > 34.7 11/29/1999 123.45 > > in such a way that for each individual I would have the corresponding > "valid"=non-missing measurement of CO_LEVEL from the closest > ambient station (if the value for the closest ambient station is > missing, the value for the next closest station should be taken), in the > window interval given by DOB and EOFU. > > Any help or suggestion with this problem would be greately appreciated. > > Sincerely > > Cornel Lencar

In Proc Sql try this:

create index i01_reading on reading (subj_id, date); create index station_id on station (station_id);

create table subj_reading_minco as select su.subj_id, su.pcode, su.dob, su.eofu, st.station_id, st.distance, r.date, r.co_level from subject su, station st, reading r where st.pcode = su.pcode and r.subj_id = su.subj_id and st.station_id = r.station_id and r.date between su.dob and su.eofu and r.co_level is not null and st.distance = (select min(st1.distance) from station st1, reading r1 where r1.subj_id = r.subj_id and r1.date = r.date and r1.co_level is not null and st1.station_id = r1.station_id);

This will only work if the station distances are different for each postcode. If 2 or more are the same, the subselect will return more than one row and will fail.


Back to: Top of message | Previous page | Main SAS-L page