Date: Mon, 27 Jun 2005 23:59:44 +0000
Reply-To: toby dunn <tobydunn@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: toby dunn <tobydunn@HOTMAIL.COM>
Subject: Re: Optimization question
In-Reply-To: <200506272247.j5RMl1xg024859@listserv.cc.uga.edu>
Content-Type: text/plain; format=flowed
Mike,
Proc format ;
value Subset
1 = '1'
2 = '2'
3 = '3'
4 = '4'
5 = '5'
Other = 'Other'
;
run ;
data huge ;
do permno = 1 to 100 ;
output ;
end ;
run ;
Data Subset ;
set huge ;
where (put(permno,Subset.) ne 'Other' ) ;
run ;
proc print ;
run ;
Toby Dunn
From: Michael Murff <mjm33@MSM1.BYU.EDU>
Reply-To: Michael Murff <mjm33@MSM1.BYU.EDU>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Optimization question
Date: Mon, 27 Jun 2005 16:46:47 -0600
Several persons of great wisdom have suggested that a format might be
profitably used for table lookup in huge. Would someone be so kind as to
demonstrate this technique for this situation? I think I could create a
custom format on my list of 2k ids, but it's very unclear to me how this can
be used when the format in Huge is different. Wouldn't the format have to be
applied after the preprocessing where statement in the dstep?
Thanks all,
MM
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Dennis
Diskin
Sent: Monday, June 27, 2005 12:52 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Optimization question
Roy,
I would personnaly prefer to put a long list of selections into a format.
Formats are pretty well optimized and can be used in where statements.
The BY statement and associated first.s will cetainly add some processing
time and at least in the code you've shown, accomplishes nothing since
var1_i and var2_i are not kept in the dataset.
HTH,
Dennis Diskin
On 6/27/05, Pardee, Roy <pardee.r@ghc.org> wrote:
>
> I eagerly await the responses to this question--I've got a very similar
> set of sql queries I'd like to optimize.
>
> In the meantime--is your dset indexed & if not, is creating one
> feasible? If so, what do you have indexed?
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Michael Murff
> Sent: Monday, June 27, 2005 11:25 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Optimization question
>
>
> Hi SAS-L,
>
>
>
> I'm accessing a very large dataset (6 gigs) with the following code:
>
>
>
> data subset;
>
> set huge(keep=date id var1-var5);
>
> where "01Jan1970"d <= date <= "31DEC2003"d;
>
> by id date;
>
> year=year(date);
>
> if permno in(&list);
>
> if first.date then
>
> do;
>
> var1_l = lag(var1);
>
> var2_l = lag(var2);
>
> end;
>
> if first.id <http://first.id> then
>
> do;
>
> var1_l = .;
>
> var2_l = .;
>
> end;
>
> run;
>
>
>
> &list contains a list of 2000 ids (sorted) that I care about. Each id
> will have a daily entry between the given dates. Huge dataset is already
> sorted by ID and DATE. I need a more efficient way to run this datastep
> as it takes several hours on our server. I have access to 8.2 and 9.1.3
> SAS versions in Unix environments.
>
>
>
> I tried putting &list in a compound where statement but I reach the 8.2
> where byte limit discussed recently on the -l (haven't tried this on
> 9.1.3 yet). Does the by statement slow this down? And what about the
> subsetting if statement. The final dataset "subset" should a few hundred
> MBs. I can write a gig with our SCSI drives in about 15 minutes? so it
> seems like this little dstep could be written to go faster.
>
>
>
> Thanks,
>
>
>
> Michael Murff
>
> Provo, UT
>
|