Date: Thu, 14 Apr 2005 14:18:59 GMT
Reply-To: alanm <nospam@NOSPAM.NOSPAM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: alanm <nospam@NOSPAM.NOSPAM>
Organization: BigPond Internet Services
Subject: Re: merging data sets problem (potential proc sql?)
"Cornel Lencar" <clencar@INTERCHANGE.UBC.CA> wrote in message
news:200504140030.j3E0U0cc003262@listserv.cc.uga.edu...
> Hi,
>
> I am facing a problem that I suspect can be solved with proc sql,
> on which I am not proficient. I cannot work the problem in a data
> step /or several data step due to the large files that I have to work
with.
>
> Basically I have three files:
> 1. a distance file that contains all distances between a set of ambient
stations
> and a much larger set of postal codes
>
> PCODE STATION_ID DISTANCE
> A0A0A0 E3245 4567.7
> A0A0A0 E4567 34567.9
> A0A0A0 M456 34.7
> B0B0B0 E3245 56987
> B0B0B0 .. ..
>
> 2. a file with subjects' data
>
> SUBJ_ID PCODE DOB EOFU
> 1 A0A0A0 11/29/1999 12/31/2002
> 2 A0A0A0 10/09/2000 12/31/2002
> 3 B0B0B0 09/27/2000 11/18/2000
> 4 .. .. ..
>
> 3. A file with daily ambient exposure information, for each ambient
station,
> between
> 1/01/1998 and 12/31/2002. It is important to note that for some days (few)
> measurements are missing (not collected or insuficient hourly
measurements -no
> need to go in detail here). Also, not all ambient stations cover the whole
> window interval mentioned. Some started monitoring earlier and closed in
1999
> for instance, or others started in 2000 on.
>
> STATION_ID DATE CO_LEVEL
> E3245 1/01/1998 125.98
> E3245 1/02/1998 132.90
> E3245 1/03/1998 125.98
> E3245 1/04/1998 .
> E3245 1/04/1998 117.34
>
> So for each station I will have 365*5 observations
>
> I need to put together these three file something like this:
>
> SUBJ_ID PCODE DOB EOFU STATION_ID
> 1 A0A0A0 11/29/1999 12/31/2002 M456
>
>
> DISTANCE DATE CO_LEVEL
> 34.7 11/29/1999 123.45
>
> in such a way that for each individual I would have the corresponding
> "valid"=non-missing measurement of CO_LEVEL from the closest
> ambient station (if the value for the closest ambient station is
> missing, the value for the next closest station should be taken), in the
> window interval given by DOB and EOFU.
>
> Any help or suggestion with this problem would be greately appreciated.
>
> Sincerely
>
> Cornel Lencar
In Proc Sql try this:
create index i01_reading on reading (subj_id, date);
create index station_id on station (station_id);
create table subj_reading_minco as
select su.subj_id, su.pcode, su.dob, su.eofu, st.station_id, st.distance,
r.date, r.co_level
from subject su, station st, reading r
where st.pcode = su.pcode
and r.subj_id = su.subj_id
and st.station_id = r.station_id
and r.date between su.dob and su.eofu
and r.co_level is not null
and st.distance =
(select min(st1.distance)
from station st1, reading r1
where r1.subj_id = r.subj_id
and r1.date = r.date
and r1.co_level is not null
and st1.station_id = r1.station_id);
This will only work if the station distances are different for each
postcode. If 2 or more are the same, the subselect will return more than one
row and will fail.
|