Please indicate the scale of the problem, in particular the number of
stations and the number of distinct postal codes which appear in the
On Wed, 13 Apr 2005 20:30:00 -0400, Cornel Lencar
>I am facing a problem that I suspect can be solved with proc sql,
>on which I am not proficient. I cannot work the problem in a data
>step /or several data step due to the large files that I have to work
>Basically I have three files:
>1. a distance file that contains all distances between a set of ambient
>and a much larger set of postal codes
>PCODE STATION_ID DISTANCE
>A0A0A0 E3245 4567.7
>A0A0A0 E4567 34567.9
>A0A0A0 M456 34.7
>B0B0B0 E3245 56987
>B0B0B0 .. ..
>2. a file with subjects' data
>SUBJ_ID PCODE DOB EOFU
>1 A0A0A0 11/29/1999 12/31/2002
>2 A0A0A0 10/09/2000 12/31/2002
>3 B0B0B0 09/27/2000 11/18/2000
>4 .. .. ..
>3. A file with daily ambient exposure information, for each ambient
>1/01/1998 and 12/31/2002. It is important to note that for some days (few)
>measurements are missing (not collected or insuficient hourly
>need to go in detail here). Also, not all ambient stations cover the whole
>window interval mentioned. Some started monitoring earlier and closed in
>for instance, or others started in 2000 on.
>STATION_ID DATE CO_LEVEL
>E3245 1/01/1998 125.98
>E3245 1/02/1998 132.90
>E3245 1/03/1998 125.98
>E3245 1/04/1998 .
>E3245 1/04/1998 117.34
>So for each station I will have 365*5 observations
>I need to put together these three file something like this:
>SUBJ_ID PCODE DOB EOFU STATION_ID
>1 A0A0A0 11/29/1999 12/31/2002 M456
>DISTANCE DATE CO_LEVEL
>34.7 11/29/1999 123.45
>in such a way that for each individual I would have the corresponding
>"valid"=non-missing measurement of CO_LEVEL from the closest
>ambient station (if the value for the closest ambient station is
>missing, the value for the next closest station should be taken), in the
>window interval given by DOB and EOFU.
>Any help or suggestion with this problem would be greately appreciated.