LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (April 1999, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 16 Apr 1999 15:53:42 -0600
Reply-To:     Aleece Hoffmann <AleeceHoffmann@FIRSTHEALTH.COM>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Aleece Hoffmann <AleeceHoffmann@FIRSTHEALTH.COM>
Subject:      Re: Random sample from a  large dataset.
Comments: To: SAS-L@listserv.uga.edu
Content-Type: text/plain; charset=US-ASCII

Chapter 12 of "SAS Language and Procedures, Usage 2" covers various methods of random sampling (aproximate or exact size, with or without replacement, simple or stratified, etc.) in depth. If you can live with an approximate-sized sample, try using the RANUNI() function in a WHERE statement in your PROC like this:

%LET library = WORK; %LET dataset = mydata; %LET sample_n = 50000; %LET randseed = 12345; /** Use any positive integer for a repeatable sample, or use zero or negative integer for a different sample each run (i.e. system time used as seed) **/

/** Use your favorite method to put the total number of obs into a macro variable (called total_n here).**/ /** It could be as simple as a %LET if you already know the value, but I like to use an autocall utility macro (see below). **/ %obscount (method=sysfunc, dsname=&library..&dataset, result=total_n)

/** run your favorite PROC using random sampling **/ PROC MEANS DATA=&library..&dataset; WHERE RANUNI(&seed) <= &sample_n/&total_n; TITLE1 "Summary Statistics for Random Sample from &library..&dataset"; RUN;

/** <cut here> **/ /** Put this macro definition in an autocall library or %INCLUDE it **before** calling it **/ %MACRO obscount (method=sysfunc, dsname=_LAST_, result=_nobs_);

%** This is the stripped-down guts of an autocall utility macro I usually use to put the number **; %** of obs in a dataset into a macro variable using my choice of several methods. **; %** This piece of the code is based on an example on p. 242 of "SAS Macro Language Reference." **;

%GLOBAL &result;

%IF %UPCASE(&method) = %UPCASE(sysfunc) %THEN %DO; %LOCAL dsid rc; %LET dsid = %SYSFUNC(OPEN(&dsname)); %IF &dsid %THEN %DO; %LET &result = %SYSFUNC(ATTRN(&dsid, NOBS), 32.0); %LET rc = %SYSFUNC(CLOSE(&dsid)); %END; %ELSE %PUT Open for dataset &dsname failed - %SYSFUNC(SYSMSG()); %END;

%LET &result = &&&result; %IF &&&result ~= %THEN %PUT NOTE: Dataset %UPCASE(&dsname) has &&&result observations.;

%MEND obscount; /** <cut here> **/

Aleece Hoffmann Programmer Analyst Information Systems First Health 750 Riverpoint Drive West Sacramento, CA 95605 916-374-4808 916-374-4929 fax AleeceHoffmann@FirstHealth.com

>>> "Towns, Brian" <BTowns@FLEETCC.COM> 04/16/99 12:33:38 >>> Is there any way to select a random sample of 50,000 obs from a large table as input into proc means for example.

Thanks in advance,

Brian


Back to: Top of message | Previous page | Main SAS-L page