Date: Fri, 13 Jan 2006 14:53:08 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Creating a random sample from a flat file
Content-Type: text/plain; format=flowed
>Does someone know how to create a random sample from a flat file,
>like reading the file and simultaneously outputting a random sample?
>I have never done it, but I have to do it now because
>the files from which I want to create the random
>samples are in millions and I don't want to load them
>to a dataset and then do the process again.
Well, what do you need to do with the random sample? Does it even need to
be random? Are you looking for test data? If you really need a random sample
that meets some particular probability requirements, then it might not be best
to do it in the data step. (Or maybe I'm just trying to find a way to sneak in
PROC SURVEYSELECT when you're not looking. :-)
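For what it's worth, a minimal sketch of the PROC SURVEYSELECT route might
look like the following. It assumes the records have already been read into
a data set, given the hypothetical name WORK.BIG here; the sample size of
1000 and the seed are placeholders too.

```sas
/* Sketch only: WORK.BIG, WORK.SAMPLE, n=1000 and the seed are assumptions. */
proc surveyselect data=work.big   /* hypothetical input data set       */
                  out=work.sample /* hypothetical output data set      */
                  method=srs      /* simple random sampling, no repl.  */
                  n=1000          /* exact number of records to select */
                  seed=37474;     /* fixed seed for a repeatable draw  */
run;
```

Unlike a per-record coin flip, METHOD=SRS with N= returns exactly the
requested number of records, each with the same selection probability.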
If all you need is a bunch of records and the exact count doesn't matter,
just add a line like this toward the bottom of your data step:
if ranuni(37474) < 0.02 then output;
and about 2% of your records will get spit out. But this won't give you an
exact count. Just 'about' the proportion you list in the inequality.
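In context, the whole step might look something like this sketch. The file
name, the TRUNCOVER option, and the INPUT layout are made up for
illustration; only the IF line comes from the suggestion above.

```sas
/* Sketch only: the path and the column layout below are assumptions. */
data work.sample;
   infile 'bigfile.txt' truncover;   /* hypothetical flat file          */
   input id $ 1-8 amount 9-16;       /* hypothetical record layout      */
   /* Keep each record independently with probability 0.02. */
   if ranuni(37474) < 0.02 then output;
run;
```

Because each record is kept or dropped on its own, a file of 1,000,000
records would yield roughly 20,000 in WORK.SAMPLE, give or take. Only the
selected records are written, so the full file never lands in a data set.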
If you are concerned about getting test data to ensure that your code handles
all boundary cases, then you may need some very different approaches to
this. As I said, it depends on what you need the sample for.
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330