| Date: | Tue, 20 Feb 2001 09:04:05 -0500 |
| Reply-To: | Nigel Tufnel <dousk8@HOTMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Nigel Tufnel <dousk8@HOTMAIL.COM> |
| Subject: | Wanted: One pass summary solution |
| Content-Type: | text/plain; format=flowed |
|---|
I have a data set that I want to summarize by day. I use two primary
columns "startdate" and "completedate". I want the count the number of
starts and completes by day. To do this I summarize the data set twice
(once by startdate and again by completedate) and merge the two.
This data set has millions of records and a few hundred other variables. Is
there a way to get the desired results in one pass?
The data set looks like (among other columns):
startdate completedate
1/1/01 1/1/01
1/1/01 1/1/01
1/1/01 1/3/01
1/2/01 1/2/01
1/2/01 1/3/01
1/2/01 1/4/01
1/2/01 1/4/01
1/3/01 1/3/01
1/3/01 1/3/01
1/3/01 1/4/01
1/3/01 1/4/01
1/3/01 1/5/01
The desired result is:
date starts completes
1/1/01 3 2
1/2/01 4 1
1/3/01 5 4
1/4/01 0 4
1/5/01 0 1
Like I mentioned, the two pass solution is:
proc means data = source nway noprint missing;
class startdate;
var _whatever_;
output out=starts(rename=(_freq_ = starts
startdate = date))
sum=;
run;
proc means data = source nway noprint missing;
class completedate;
var _whatever_;
output out=completes(rename=(_freq_ = completes
completedate = date))
sum=;
run;
data summ;
merge startdate
completedate
;
by date;
run;
A one pass solution here could save a few hours. Any Ideas?
Thanks,
Nigel
_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com
|