LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2008, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 3 Jul 2008 10:03:39 -0500
Reply-To:     "Raynaud, Scott" <Scott.Raynaud@TARGETBASE.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Raynaud, Scott" <Scott.Raynaud@TARGETBASE.COM>
Subject:      Re: sort
Comments: To: "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
In-Reply-To:  <482249F865060740AE33815802042D2F8E83F9@LTA3VS012.ees.hhs.gov>
Content-Type: text/plain; charset="us-ascii"

I looked at some stuff on indexes. My data look like this:

Name account_number day transaction_number other data(sales, gross margin, ext)

First I rollup on transaction number getting sums of the other data. Then, rollup by name, account_number and day (the combination of name and account_number identifies a person and persons might have purchases on different days), again getting sums along the way. This strategy requires separate sorts for each rollup. The final file would look something like this:

Name account_number sum_of_sales sum_of_gross_margin.....

Indexing was proposed as a method along the path to a solution to this problem. One way to index would be to use proc datasets to set up a composite index using transaction number, name, account number and day, correct? The index statement would be:

index create x=(transaction_number name account_number day)

To do the first rollup, my standard code looks like this:

Data By transaction_number notsorted; If first.transaction_number then do; SAS statements End; SAS statements If last.transaction_number; SAS statements; End; Run;

All this being said, I'm not sure where indexing gets me. Doesn't the file still have to be sorted to use the by statement in the rollup? I need to avoid sorting. Can someone enlighten me?

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Fehd, Ronald J. (CDC/CCHIS/NCPHI) Sent: Monday, June 30, 2008 4:41 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: sort

> From: Scott Raynaud > > >Looked at some prior posts on this subject but I'm still a little > >unclear. I run v 9.1.2 on a Unix box (don't have SAS > Connect). I need > to > >rollup a transactional file of several hundred million records. I'm > >afraid the sort prior to the rollup will fail due to space and time > >expense. Any ideas on the best alternative? I'm leaning toward > splitting > >the file into several smaller ones like this one I saw on > the listserv:

Based on my recent experience of combining several years of data I suggest that you index the data.

The following posts contain code and times for:

algorithm: sort, and set

http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0806D&L=sas-l&P=R31423

algorithm: append and index

http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0806D&L=sas-l&P=R31529

Howard suggested that I do a view of the four years of data I have and sort the view.

I have that on my todo list and will report the times when I get it done.

untested:

data Work.Four_Years /view; do until(EndoFile); set Library.Year2002 Library.Year2003 Library.Year2004 Library.Year2005 end = EndoFile; output; end; stop;

Proc Sort data = Work.Four_Years out = Library.The_Four_Years; by <the ID list>;

Ron Fehd the macro maven CDC Atlanta GA USA RJF2 at cdc dot gov [ALERT] -- Access Manager: This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. Dissemination, distribution or copying of this e-mail or the information herein by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is prohibited. If you have received this e-mail in error, please immediately notify us by calling our North American Help Desk at (972)506-3939. Targetbase Messaging Services provided by DMSP


Back to: Top of message | Previous page | Main SAS-L page