Date: Thu, 3 Jul 2008 10:03:39 -0500
Reply-To: "Raynaud, Scott" <Scott.Raynaud@TARGETBASE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Raynaud, Scott" <Scott.Raynaud@TARGETBASE.COM>
Subject: Re: sort
In-Reply-To: <482249F865060740AE33815802042D2F8E83F9@LTA3VS012.ees.hhs.gov>
Content-Type: text/plain; charset="us-ascii"
I looked at some stuff on indexes. My data look like this:
Name account_number day transaction_number other data(sales,
gross margin, ext)
First I rollup on transaction number getting sums of the other data.
Then, rollup by name, account_number and day (the combination of name
and account_number identifies a person and persons might have purchases
on different days), again getting sums along the way. This strategy
requires separate sorts for each rollup. The final file would look
something like this:
Name account_number sum_of_sales sum_of_gross_margin.....
Indexing was proposed as a method along the path to a solution to this
problem. One way to index would be to use proc datasets to set up a
composite index using transaction number, name, account number and day,
correct? The index statement would be:
index create x=(transaction_number name account_number day)
To do the first rollup, my standard code looks like this:
Data
By transaction_number notsorted;
If first.transaction_number then do;
SAS statements
End;
SAS statements
If last.transaction_number;
SAS statements;
End;
Run;
All this being said, I'm not sure where indexing gets me. Doesn't the
file still have to be sorted to use the by statement in the rollup? I
need to avoid sorting. Can someone enlighten me?
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Fehd, Ronald J. (CDC/CCHIS/NCPHI)
Sent: Monday, June 30, 2008 4:41 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: sort
> From: Scott Raynaud
>
> >Looked at some prior posts on this subject but I'm still a little
> >unclear. I run v 9.1.2 on a Unix box (don't have SAS
> Connect). I need
> to
> >rollup a transactional file of several hundred million records. I'm
> >afraid the sort prior to the rollup will fail due to space and time
> >expense. Any ideas on the best alternative? I'm leaning toward
> splitting
> >the file into several smaller ones like this one I saw on
> the listserv:
Based on my recent experience of combining several years of data
I suggest that you index the data.
The following posts contain code and times for:
algorithm: sort, and set
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0806D&L=sas-l&P=R31423
algorithm: append and index
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0806D&L=sas-l&P=R31529
Howard suggested that I do a view of the four years of data I have
and sort the view.
I have that on my todo list and will report the times when I get it
done.
untested:
data Work.Four_Years /view;
do until(EndoFile);
set Library.Year2002
Library.Year2003
Library.Year2004
Library.Year2005
end = EndoFile;
output;
end;
stop;
Proc Sort data = Work.Four_Years
out = Library.The_Four_Years;
by <the ID list>;
Ron Fehd the macro maven CDC Atlanta GA USA RJF2 at cdc dot gov
[ALERT] -- Access Manager: This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. Dissemination, distribution or copying of this e-mail or the information herein by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is prohibited. If you have received this e-mail in error, please immediately notify us by calling our North American Help Desk at (972)506-3939.
Targetbase Messaging Services provided by DMSP