LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 2006, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 22 May 2006 15:53:31 -0400
Reply-To:     Ya Huang <ya.huang@AMYLIN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ya Huang <ya.huang@AMYLIN.COM>
Subject:      Re: Finding observation # associated with a value
Comments: To: David Cassell -PNWSUG- <davidlcassell@MSN.COM>
Content-Type: text/plain; charset=ISO-8859-1

I got the impression that the original poster already know how to use where options to subset the data. He just wanted to know if there is a way to get the firstobs without reading all the data. His assumption is that firstobs= option is more efficient than where= option. To confirm that firstobs method is faster, I did the following simple test:

1 data a; 2 do i=1 to 1000000; 3 output; 4 end; 5 run;

NOTE: The data set WORK.A has 1000000 observations and 1 variables. NOTE: DATA statement used: real time 0.17 seconds cpu time 0.17 seconds

6 7 data c; 8 set a (firstobs=500000); 9 run;

NOTE: There were 500001 observations read from the data set WORK.A. NOTE: The data set WORK.C has 500001 observations and 1 variables. NOTE: DATA statement used: real time 0.20 seconds cpu time 0.18 seconds

10 11 data d; 12 set a (where=(i >=500000)); 13 run;

NOTE: There were 500001 observations read from the data set WORK.A. WHERE i>=500000; NOTE: The data set WORK.D has 500001 observations and 1 variables. NOTE: DATA statement used: real time 0.29 seconds cpu time 0.29 seconds

As you can see, the firstobs= options took 0.18 seconds cpu time, while the where= options took 0.29 seconds.

So it seems that if we can somehow get the firstobs without reading all the data, it can really save some time. I myself don't think it is possible. Anyone not think so?

Ya

On Mon, 22 May 2006 12:10:39 -0700, David L Cassell <davidlcassell@MSN.COM> wrote:

>paulvonhippel@YAHOO.COM wrote: >>I have a large data set that looks something like this: >> >>X Y >>garbage 2 >>garbage 1 >>garbage 3 >>... ... >>usable 4 >>usable 16 >>usable 31 >>... ... >> >>I'd like to know which observation is the first one with X="usable" so >>that I don't have to waste time reading the observations with >>X="garbage". For example, if the first 10,000 lines are garbage, I can >>save time by using the subset mydata (firstobs=10001). >> >>Any suggestions most appreciated. > >One simple approach you might try is the use of the WHERE clause. > > >data new; > set old(where=(X="usable")); > <more code here> > . > . > . > > >This will really help if you have garbage lines interspersed with >usable lines. And it will permit you to cut the data down on >a wide variaety of criteria, so SAS can skip over the dreck. > >HTH, >David >-- >David L. Cassell >mathematical statistician >Design Pathways >3115 NW Norwood Pl. >Corvallis OR 97330 > >_________________________________________________________________ >Don’t just search. Find. Check out the new MSN Search! >http://search.msn.click-url.com/go/onm00200636ave/direct/01/


Back to: Top of message | Previous page | Main SAS-L page