Date: Tue, 6 Dec 2005 17:30:25 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Eliminating rows, based on date values
Content-Type: text/plain; charset="us-ascii"
Tomas:
Good idea to include an example of data and of what you want a program
operating on those data to yield ... Your question also includes a
statement of what you are trying to achieve. Now all you need to do is
to covert that statement of a constraint to a program and check results
of executing the program.
This type of problem has appeared in many forms on SAS-L. A quick check
of SAS-L archives may give a variety of solutions, and, more important,
ideas for extending solutions to this type of problem to related
problems.
I'll suggest a SQL solution because we can declare a solution in a
relatively simple form. A procedural program may eliminate rows 3 to 6
yet not solve your problem except in the specific case of your example
of data. A logical solution will work for any group of ID's and any
cardinal sequence of numbers, such as SAS date values.
Given any one SASValue in a row of your example, how do you know that
you want to exclude it? As you have stated the condition for excluding a
row, it must have a SASValue less than 90 days prior to the SASValue in
the given row. So for a possible exclusion, 'row2', that we can compare
to a different 'row1', we have three conditions to check:
ID[row1]=ID[row2], SASValue[row1]>SASValue[row2], and SASValue[row2]
between SASValue[row1] -89 and SASValue[row1]. (The second Boolean in
the condition guarantees that the values of SASValue come from different
rows.) This program implements that logic:
data test;
input
obs: var_a: Date: $char9. Sasvalue: ;
cards;
1 1 2/apr/02 15432
2 1 12/jun/03 15868
3 1 13/jun/03 15869
4 1 14/jun/03 15870
5 1 15/jun/03 15871
6 1 16/jun/03 15872
7 2 26/mar/02 15425
8 2 11/sep/03 15959
9 3 1/jan/03 15706
10 4 1/jan/03 15706
11 4 1/apr/03 15796
12 4 30/jun/03 15886
;
run;
/*
With var_a being the id for a person. Based on the variable "Date" or
the Sasvalue of that variable, I want to eliminate observations 3 to 6.
I do not want to end up with only one observation per person, I want to
keep the observations between which there are at least 90 days! So, my
end-file has to be the following:
obs var_a Date Sasvalue(date)
1 1 2/apr/02 15432
2 1 12/jun/03 15868
7 2 26/mar/02 15425
8 2 11/sep/03 15959
9 3 1/jan/03 15706
*/
%let dif=89;
proc sql;
create table results as
select *
from test as t1
where not exists(select t2.SASValue
from test as t2
where t1.var_a=t2.var_a and t1.SASValue>t2.SASValue
and t2.SASValue between (t1.SASValue-&dif.) and
t1.SASValue
)
;
quit;
This solution works in general for different required intervals, and
will work with any series of cardinal numbers. The 'correlated subquery'
essentially joins two copies of a dataset on an indexable key. More
critical for database programming, the query has a logical foundation
that lines up with the conditions you stated.
Sig
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Tomas
Sent: Tuesday, December 06, 2005 10:29 AM
To: sas-l@uga.edu
Subject: Eliminating rows, based on date values
Dear,
I'm trying desperately to eliminate rows from the following type of
file:
obs var_a Date Sasvalue(date)
1 1 2/apr/02 15432
2 1 12/jun/03 15868
3 1 13/jun/03 15869
4 1 14/jun/03 15870
5 1 15/jun/03 15871
6 1 16/jun/03 15872
7 2 26/mar/02 15425
8 2 11/sep/03 15959
9 3 1/jan/03 15706
With var_a being the id for a person. Based on the variable "Date" or
the Sasvalue of that variable, I want to eliminate observations 3 to 6.
I do not want to end up with only one observation per person, I want to
keep the observations between which there are at least 90 days! So, my
end-file has to be the following:
obs var_a Date Sasvalue(date)
1 1 2/apr/02 15432
2 1 12/jun/03 15868
7 2 26/mar/02 15425
8 2 11/sep/03 15959
9 3 1/jan/03 15706
I've tried "retain" statements working on the Sasvalue of the
datevariable, but I haven't been able to compute the correct formula
which eliminates only observations 3 to 6. Or I eliminate to much, or I
eliminate to little. Can someone help me with this problem?
Many thanks,
Tomas Corveleyn.