LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2008, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 22 Dec 2008 14:29:00 -0600
Reply-To:     Gregg Snell <sas-l@DATASAVANTCONSULTING.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Gregg Snell <sas-l@DATASAVANTCONSULTING.COM>
Subject:      Re: More efficient way to sort/subset a large dataset
Comments: To: Ya Huang <ya.huang@AMYLIN.COM>
Content-Type: text/plain; charset=iso-8859-1

Ya,

Here is a hashing solution that took just over 4 seconds to create work.yy from your example. You may run into memory issues when the product of (xx vars)*(xx var lengths)*(unique subset id's) exceeds your system limits, but your example of 53*8*200 was certainly within limits.

data xx0/view=xx0; retain id_count 0; set xx(obs=0); run; data _null_; length id id_count 8; retain id_count 1; declare hash h_sortsub(); rc=h_sortsub.DefineKey('id'); rc=h_sortsub.DefineData('id_count'); rc=h_sortsub.DefineDone(); do while (not eof1); set subset(keep=id) end=eof1; rc=h_sortsub.add(); end; declare hash h_yy(dataset:'xx0'); rc=h_yy.DefineKey('id','id_count'); rc=h_yy.DefineDone(); do while (not eof2); set xx end=eof2; if h_sortsub.find()=0 then do; h_yy.add(); id_count+1; h_sortsub.replace(); end; end; h_yy.output(dataset:'yy'); stop; run;

Merry Christmas!

Gregg Snell

> Hi there, > > Assuming I have a non-sorted large dataset, I want to sort it > by an id var. In the meantime subset it to certain ids. What's the > better way than using where statement in the dataset option for > the subsetting? > > I can't use view. I don't think sort the whole data than merge > with a subset data would do any better. > > Another issue is that the id list (a macro var, see below sample) could > be vary long. When id is string, the macro var may cause overflow > due to memory limit. > > Thanks > > Ya > > My current code is like below: > > libname c "c:\temp"; > > /* create testing data > data c.xx; > array vv(50) v1-v50; > do id=1 to 1000; > do i=1 to int(ranuni(1)*100); > do j=1 to 50; > vv(j)=ranuni(2); > output; > end; > end; > end; > run; > */ > > * create id list; > data subset; > do i=1 to 200; > id=int(ranuni(2)*1000); > output; > end; > run; > > proc sql noprint; > select distinct id into :idlst separated by ',' > from subset > ; > > %put &idlst; > > * sort and subsetting; > proc sort data=c.xx (where=(id in (&idlst))) > out=yy; > by id; > run; > >

-- NeoMail - http://neomail.sourceforge.net


Back to: Top of message | Previous page | Main SAS-L page