Date: Thu, 25 May 2006 22:35:53 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: replicated sampling
In-Reply-To: <200605251910.k4PFlewU007115@mailgw.cc.uga.edu>
datamatter@GMAIL.COM wrote:
>Does anyone have experience using replicated sampling under PROC
>SURVEYSELECT? I'd like to know its efficiency if I'm drawing say 500
>samples from a gigantic data set (say tens of millions of records).
>How many times would it loop through the data set?
>
>Thanks
>DM
How many loops through the data set? Umm, maybe 500.
If your data are too large to fit in RAM, then you can't use the SASFILE
statement (which holds the data set in memory between passes) to speed
things up.
So let me ask you a question. What are you doing here? A
bootstrap? A simulation?
The alternative is probably nearly as bad: a large process which
maintains a record of the sampling state for each of your 500
replicates, so that you test each record 500 times and write out a
copy for every replicate which 'hits' that record. That way you make
only one pass through the data. Afterward, you sort by replicate.
This is manageable, even if you have to write the code by hand, as
long as you stick with simple random sampling, with or without
replacement.
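
To make the one-pass idea concrete, here is a rough sketch in Python rather than SAS, just to show the bookkeeping. It assumes simple random sampling without replacement and that you know the total record count up front. The function name one_pass_replicates and its parameters are made up for illustration; this is not what PROC SURVEYSELECT does internally, and a DATA step version would need the same per-replicate counters in an array.

```python
import random
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def one_pass_replicates(
    records: Iterable[T],
    total: int,
    sample_size: int,
    reps: int,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, T]]:
    """Yield (replicate, record) pairs while reading `records` only once.

    Each replicate is a simple random sample without replacement of
    `sample_size` records. `total` must be the true number of records.
    """
    if not 0 <= sample_size <= total:
        raise ValueError("sample_size must be between 0 and total")
    rng = random.Random(seed)
    # One counter per replicate: how many records it still has to pick.
    still_needed = [sample_size] * reps
    for seen, record in enumerate(records):
        remaining = total - seen  # records not yet examined, including this one
        for rep in range(reps):
            need = still_needed[rep]
            # Selection sampling: keep this record with probability
            # need / remaining, decided independently for each replicate.
            if need > 0 and rng.random() * remaining < need:
                still_needed[rep] = need - 1
                yield rep, record


# Hypothetical usage: 5 replicates of 10 records each from 1000 records,
# sorted by replicate afterward, as described above.
pairs = sorted(
    one_pass_replicates(range(1000), total=1000, sample_size=10, reps=5, seed=1),
    key=lambda pair: pair[0],
)
```

Note that this still draws one random number per record per replicate, so the CPU cost scales with records times replicates. What you save is the I/O: the big data set is read once instead of 500 times.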
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330