Date: Thu, 14 Apr 2005 21:57:35 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: merging data sets problem (potential proc sql?)
Please indicate the scale of the problem, in particular the number of
stations and the number of distinct postal codes which appear in the
subjects' file.
On Wed, 13 Apr 2005 20:30:00 -0400, Cornel Lencar
<clencar@INTERCHANGE.UBC.CA> wrote:
>Hi,
>
>I am facing a problem that I suspect can be solved with proc sql,
>on which I am not proficient. I cannot work the problem in a data
>step /or several data step due to the large files that I have to work
with.
>
>Basically I have three files:
>1. a distance file that contains all distances between a set of ambient
stations
>and a much larger set of postal codes
>
>PCODE STATION_ID DISTANCE
>A0A0A0 E3245 4567.7
>A0A0A0 E4567 34567.9
>A0A0A0 M456 34.7
>B0B0B0 E3245 56987
>B0B0B0 .. ..
>
>2. a file with subjects' data
>
>SUBJ_ID PCODE DOB EOFU
>1 A0A0A0 11/29/1999 12/31/2002
>2 A0A0A0 10/09/2000 12/31/2002
>3 B0B0B0 09/27/2000 11/18/2000
>4 .. .. ..
>
>3. A file with daily ambient exposure information, for each ambient
station,
>between
>1/01/1998 and 12/31/2002. It is important to note that for some days (few)
>measurements are missing (not collected or insuficient hourly
measurements -no
>need to go in detail here). Also, not all ambient stations cover the whole
>window interval mentioned. Some started monitoring earlier and closed in
1999
>for instance, or others started in 2000 on.
>
>STATION_ID DATE CO_LEVEL
>E3245 1/01/1998 125.98
>E3245 1/02/1998 132.90
>E3245 1/03/1998 125.98
>E3245 1/04/1998 .
>E3245 1/04/1998 117.34
>
>So for each station I will have 365*5 observations
>
>I need to put together these three file something like this:
>
>SUBJ_ID PCODE DOB EOFU STATION_ID
>1 A0A0A0 11/29/1999 12/31/2002 M456
>
>
>DISTANCE DATE CO_LEVEL
>34.7 11/29/1999 123.45
>
>in such a way that for each individual I would have the corresponding
>"valid"=non-missing measurement of CO_LEVEL from the closest
>ambient station (if the value for the closest ambient station is
>missing, the value for the next closest station should be taken), in the
>window interval given by DOB and EOFU.
>
>Any help or suggestion with this problem would be greately appreciated.
>
>Sincerely
>
>Cornel Lencar
|