Date: Fri, 16 Apr 1999 15:53:42 -0600
Reply-To: Aleece Hoffmann <AleeceHoffmann@FIRSTHEALTH.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Aleece Hoffmann <AleeceHoffmann@FIRSTHEALTH.COM>
Subject: Re: Random sample from a large dataset.
Content-Type: text/plain; charset=US-ASCII
Chapter 12 of "SAS Language and Procedures, Usage 2" covers various methods of random sampling (aproximate or exact size, with or without replacement, simple or stratified, etc.) in depth. If you can live with an approximate-sized sample, try using the RANUNI() function in a WHERE statement in your PROC like this:
%LET library = WORK;
%LET dataset = mydata;
%LET sample_n = 50000;
%LET randseed = 12345;
/** Use any positive integer for a repeatable sample, or use zero or negative
integer for a different sample each run (i.e. system time used as seed) **/
/** Use your favorite method to put the total number of obs into a macro variable (called total_n here).**/
/** It could be as simple as a %LET if you already know the value, but I like to use an autocall utility macro (see below). **/
%obscount (method=sysfunc, dsname=&library..&dataset, result=total_n)
/** run your favorite PROC using random sampling **/
PROC MEANS DATA=&library..&dataset;
WHERE RANUNI(&seed) <= &sample_n/&total_n;
TITLE1 "Summary Statistics for Random Sample from &library..&dataset";
RUN;
/** <cut here> **/
/** Put this macro definition in an autocall library or %INCLUDE it **before** calling it **/
%MACRO obscount (method=sysfunc, dsname=_LAST_, result=_nobs_);
%** This is the stripped-down guts of an autocall utility macro I usually use to put the number **;
%** of obs in a dataset into a macro variable using my choice of several methods. **;
%** This piece of the code is based on an example on p. 242 of "SAS Macro Language Reference." **;
%GLOBAL &result;
%IF %UPCASE(&method) = %UPCASE(sysfunc) %THEN %DO;
%LOCAL dsid rc;
%LET dsid = %SYSFUNC(OPEN(&dsname));
%IF &dsid %THEN %DO;
%LET &result = %SYSFUNC(ATTRN(&dsid, NOBS), 32.0);
%LET rc = %SYSFUNC(CLOSE(&dsid));
%END;
%ELSE %PUT Open for dataset &dsname failed - %SYSFUNC(SYSMSG());
%END;
%LET &result = &&&result;
%IF &&&result ~= %THEN %PUT NOTE: Dataset %UPCASE(&dsname) has &&&result observations.;
%MEND obscount;
/** <cut here> **/
Aleece Hoffmann
Programmer Analyst
Information Systems
First Health
750 Riverpoint Drive
West Sacramento, CA 95605
916-374-4808
916-374-4929 fax
AleeceHoffmann@FirstHealth.com
>>> "Towns, Brian" <BTowns@FLEETCC.COM> 04/16/99 12:33:38 >>>
Is there any way to select a random sample of 50,000 obs from a
large table as input into proc means for example.
Thanks in advance,
Brian