Date: Wed, 19 Oct 2011 11:08:46 -0400
Reply-To: Jack Clark <jclark@HILLTOP.UMBC.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jack Clark <jclark@HILLTOP.UMBC.EDU>
Subject: Re: PROC SQL Question
In-Reply-To: A<201110191344.p9JAoF92019182@willow.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
Ross,
The approach below can work - although you have to consider the size of
the NEW data set, as it could get large depending on your data and data
spans.
You basically write out an observation for each date in a span -
containing only the Person ID information and that date. When spans
overlap, you will have multiple records for the same person with the
same date. Then use SQL to select each person and the number of unique
date values they have.
* create sample data ;
data old;
input person $ date1 mmddyy10. date2 mmddyy10.;
format date1 date2 mmddyy10.;
datalines;
Person1 1/1/2009 1/1/2010
Person1 1/1/2009 6/30/2009
Person2 1/1/2009 1/31/2009
Person2 3/1/2009 3/31/2009
;
run;
* write out 1 obs for each date in span ;
data new (keep=person employed);
set old;
do employed = date1 to date2;
output;
end;
run;
* count distinct values of dates for each person ;
proc sql;
create table blank as
select person, count(distinct employed) as total_time
from new
group by person
;
quit;
proc print data = blank;
run;
Jack
Jack Clark
Senior Programmer
phone: 410-455-6256
fax: 410-455-6850
jclark@hilltop.umbc.edu
University of Maryland, Baltimore County
Sondheim Hall, 3rd Floor
1000 Hilltop Circle
Baltimore, MD 21250
Please consider the environment before printing this e-mail and/or any attachments.
Confidentiality Notice: This e-mail may contain information that is legally privileged and that is intended only for the use of the addressee(s) named above. If you are not the intended recipient, you are hereby notified that any disclosure, copying of this e-mail, distribution, or action taken in reliance on the contents of this e-mail and/or documents attributed to this e-mail is strictly prohibited. If you have received this information in error, please notify the sender immediately by phone and delete this entire e-mail. Thank you.-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Ross Clark
Sent: Wednesday, October 19, 2011 9:44 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: PROC SQL Question
Hello:
I'm attempting to summarize transactional data over several people and
their
previous employment dates, date1 represents Starting Date and Date2
Ending
date at given employer:
person1 Date1 Date2
person1 Date1 Date2
.
.
.
.
I select the following:
proc sql;
create table as blank as select
sum(date2-date1) as total_time
from table
group by person1;
quit;
I essentially want to find the total working time spent for each person
.
However, many of the dates overlap.
IE: Person1 employed in two jobs at the same time:
Person1 1/1/2009 1/1/2010
Person1 1/1/2009 6/30/2009
If I use this aggregate function it will give me 1.5 years roughly as
the
time the person worked. When really I care about just the time spent
employed.
Is there an elegant solution out there to create this field?