Date: Mon, 7 Jul 2008 18:03:33 -0400
Reply-To: Bucher Scott <SBucher@SCHOOLS.NYC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Bucher Scott <SBucher@SCHOOLS.NYC.GOV>
Subject: Re: Sample 26140: Creating a new data set for each BY-Group in a
data set
In-Reply-To: A<482249F865060740AE33815802042D2F8E8480@LTA3VS012.ees.hhs.gov>
Content-Type: text/plain; charset="US-ASCII"
May I ask what is the cause of the '(soon to be formerly)'? Or perhaps I
am reading too much into it?
Regards,
Scott Bucher
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Fehd, Ronald J. (CDC/CCHIS/NCPHI)
Sent: Monday, July 07, 2008 5:49 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Sample 26140: Creating a new data set for each BY-Group in
a data set
This is from my R&D on sorting.
One of the things I realized was that for some applications,
generating a report from each subset with a (where = (Var = "&Value."))
clause requires one pass thru the data per report: N reports, N passes.
We always call into question those who ask how to break a data set into
subsets, but this seems to me to be a legitimate task where the data set
is large.
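To make the pass-counting concrete, here is a minimal sketch, not the
routine linked below. It assumes sashelp.class as stand-in data with Sex
as the subsetting variable, and the macro name Report is hypothetical.

```sas
/* N passes: every where= subset reads the whole input data set again */
%macro Report(Value);
   proc print data = sashelp.class (where = (Sex = "&Value."));
      title "Sex = &Value.";
   run;
%mend Report;
%Report(F)
%Report(M)
title;

/* one pass: split once, then report from the smaller subsets */
data _F _M;
   set sashelp.class;
   if      Sex = 'F' then output _F;
   else if Sex = 'M' then output _M;
run;
```

With two values the difference is trivial; the split pays off when the
data set is large and the number of subsets is high.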
This routine requires you to
* allocate fileref in autoexec
* prepare a list processing data set
* prepare a program which calls the routine
http://tinyurl.com/5abz5h
http://www.sascommunity.org/wiki/Split_Data_into_Subsets
parameters allow you to
* use default data naming convention: _1 _2 ... _N
* provide either character or numeric variable
whose values are used to name subsets:
* char: _A _B ... _Z
* num: _38 _19 _6
for instance, using an output data set from proc freq order = freq
if you use Count as the RowId
then the generated chain of if statements is optimized: the most
frequent values are tested first, so most rows exit the chain early
i.e.:
if <most frequently occurring value> then output _<MaxCount>;
else if <next most frequently occurring value> then output _<Max-minus-1>;
etc.
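A hand-expanded sketch of what such a frequency-ordered chain could look
like. The data set Have, its variable Group, and the output name Freq
are made up for illustration; the linked routine generates this kind of
code from the list processing data set rather than having you type it.

```sas
/* toy data: C occurs 6 times, B 3 times, A once */
data Have;
   input Group $ @@;
datalines;
A B B C C C B C C C
;

/* control data set: one row per value, most frequent first */
proc freq data = Have order = freq;
   tables Group / noprint out = Freq;
run;

/* subsets named by Count; the most frequent value is tested first */
data _6 _3 _1;
   set Have;
   if      Group = 'C' then output _6;
   else if Group = 'B' then output _3;
   else if Group = 'A' then output _1;
run;
```

Note that Count only works as a RowId when the counts are unique: two
values with the same frequency would map to the same data set name.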
Ron Fehd  the list processing w/call execute
          or parameterized includes
          or (soon to be formerly) macro maven
CDC Atlanta GA USA RJF2 at cdc dot gov