Date: Fri, 13 Jan 2006 14:53:08 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Creating a random sample from a flat file
In-Reply-To: <200601132212.k0DKjK37013783@mailgw.cc.uga.edu>
Content-Type: text/plain; format=flowed
k_monal_99@YAHOO.COM wrote:
>Does someone know how to create a random sample from a flat file,
>like reading the file and simultaneously outputting a random sample .
>I have never done it ,but I have to do it now because
>the files from which I want to create the random
>samples are in millions and I don't want to load them
>to a dataset and then do the process again.

Well, what do you need to do with the random sample? Does it even need to
be random? Are you looking for test data? If you really need a random sample
that meets some particular probability requirements, then it might not be easy
to do it in the data step. (Or maybe I'm just trying to find a way to sneak in
PROC SURVEYSELECT when you're not looking. :-)
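
If an exact sample size does matter, something along these lines might be a
starting point. This is only a sketch: the file name, the record layout, the
view name ALLRECS, and the sample size of 1000 are all made up for
illustration, and it assumes PROC SURVEYSELECT can read from a DATA step view,
so that the flat file is never stored as a full SAS dataset.

```sas
/* Hypothetical file name and column layout; adjust to your data. */
/* A DATA step view reads the flat file only when a procedure     */
/* asks for it, so no full copy of the file is written to disk.   */
data allrecs / view=allrecs;
    infile 'bigfile.txt' truncover;
    input id 1-8 amount 9-16;
run;

/* Simple random sample of exactly 1000 records (assumed size). */
proc surveyselect data=allrecs method=srs n=1000
                  seed=37474 out=sample;
run;
```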

If all you need is a bunch of records and the exact count doesn't matter, then
just add a line like this toward the bottom of your data step:
if ranuni(37474) < 0.02 then output;
and about 2% of your records will get spit out. But this won't give you an
exact count, just approximately the proportion you put in the inequality.
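
For instance, a minimal sketch of the whole step might look like the
following. The file name bigfile.txt and the two fields (an id in columns 1-8
and an amount in columns 9-16) are hypothetical; substitute your own INFILE
and INPUT statements.

```sas
/* Hypothetical file name and record layout, for illustration only. */
data sample;
    infile 'bigfile.txt' truncover;
    input id 1-8 amount 9-16;
    /* ranuni() returns a uniform value between 0 and 1, so each  */
    /* record is kept with probability 0.02, independently of the */
    /* others.                                                    */
    if ranuni(37474) < 0.02 then output;
run;
```

The file is read once and only the selected records are written to SAMPLE.
Using a fixed positive seed such as 37474 gives the same selection every time
the step is rerun against the same file.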

If you are concerned about getting test data to ensure that your code handles
all boundary cases, then you may need some very different approaches to
handling this. As I said, it depends on what you need the sample for.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330