LISTSERV at the University of Georgia
Date:         Fri, 10 Dec 1999 08:27:33 -0800
Reply-To:     "Lund, Pete" <Peter.Lund@CFC.WA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Lund, Pete" <Peter.Lund@CFC.WA.GOV>
Subject:      Re: a random sample. I published 2 macro program ...

Ian brings up a good point that's been mentioned a few times on SAS-L over the years: the pitfalls of creating variables in macro code that may collide with variables in the incoming data set(s) [Ian's point #3]. In this case, X is not the only variable that causes problems: V, J, I and NLOBS also cannot be on the incoming data set if you expect predictable results. The convention of giving such work variables names with a leading underscore (e.g., _X) and dropping them in the macro code can solve many of these problems [Ian's point #2].
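A minimal sketch of the collision, assuming a hypothetical macro that flags roughly half the rows of its input at random. The macro names %FLAG_BAD and %FLAG_OK and the variable PICK are made up for illustration; they are not taken from either posted program.

```sas
/* Sample input that happens to contain a variable named X. */
data have;
  do x = 1 to 5;
    output;
  end;
run;

/* Hypothetical collision-prone macro: X is used as a        */
/* scratch variable, so the incoming X is overwritten.       */
%macro flag_bad (data = , out = );
  data &out;
    set &data;
    x = ranuni(0);
    pick = (x < 0.5);
  run;
%mend flag_bad;

/* Hypothetical safer version: the scratch variable has a    */
/* leading underscore and is dropped, so the incoming X      */
/* survives and no work variable leaks into &out.            */
%macro flag_ok (data = , out = );
  data &out (drop = _x);
    set &data;
    _x = ranuni(0);
    pick = (_x < 0.5);
  run;
%mend flag_ok;

%flag_ok (data = have, out = flagged)
```

The convention lowers the odds of a collision rather than eliminating them: an input variable that is itself named _X would still be clobbered by %FLAG_OK.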

Also, this is a good example of the effects of passing a non-zero seed to RANUNI() [Ian's point #1].
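A small sketch of why the seed matters (the data set names and the seed 12345 are arbitrary choices for illustration). With a positive seed, RANUNI() restarts the same stream every time a step runs, so a method that consumes its draws in order picks the same first observations on every call; that is one way nested samples like those in Ian's point #1 can arise. A seed of 0 seeds the stream from the system clock instead.

```sas
/* Positive seed: two separate steps produce identical values. */
data a;
  do i = 1 to 3;
    u = ranuni(12345);
    output;
  end;
run;

data b;
  do i = 1 to 3;
    u = ranuni(12345);
    output;
  end;
run;

/* Should report that all compared values are equal. */
proc compare data = a compare = b;
run;

/* Seed 0: the stream is seeded from the clock, so these     */
/* values will normally differ from one run to the next.     */
data c;
  do i = 1 to 3;
    u = ranuni(0);
    output;
  end;
run;
```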

Thought this was timely as we'd just had some discussion of coding standards.

Pete Lund
WA State Caseload Forecast Council
(360) 902-0086 voice
(360) 902-0084 fax
peter.lund@cfc.wa.gov

-----Original Message-----
From: WHITLOI1 [mailto:WHITLOI1@WESTAT.COM]
Sent: Friday, December 10, 1999 6:21 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: a random sample. I published 2 macro program ...

Subject: Re: a random sample. I published 2 macro program ...
Summary: Problems with the code.
Respondent: Ian Whitlock <whitloi1@westat.com>

Renaud Harduin <r.harduin@ABS-TECHNOLOGIES.COM> offered two programs on a popular subject: drawing random samples. He wrote:

> Go to the www.SAShelp.com web site, I published 2 macro program :
>
> %ECH_SPLE : simple random sample (optimized in I/O, MEM and CPU)
> with distinct observation ==> Efficency
> %ECH_ALEA : Make a stratified random sample but requires more I/O
> and CPU

I looked at the first program and found the following problems:

1) For any two "random" samples drawn from a given data set by this program, the larger sample contains the smaller one. For example, the code

data w ; do s = 1 to 100 ; output ; end ; run ;

%ech_sple ( data = w , out = s10 , size = 10 )
%ech_sple ( data = w , out = s23 , size = 23 )

proc compare data = s10 compare = s23 ( obs = 10 ) ; run ;

produced a report with no differences found.

2) The variables I, J, and DSID are on the output sample.

3) The variable X cannot be on the input data set.

4) The last record can never be in the sample.

5) The probability of choosing the 0th obs (there isn't any) is 1/sample_size.

6) The number of logical obs is referenced, but the program can produce an incorrect result for every logically missing (i.e., marked for deletion) observation.

7) Duplicate choices must be eliminated in a subsequent step.

8) On efficiency: a nonworking linear search was used.

I didn't look at the second macro.
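For contrast, a minimal sketch of a one-pass simple random sample without replacement using selection sampling (Knuth's Algorithm S), which sidesteps most of the points above: every record, including the last, can be chosen; no duplicates arise, so no cleanup step is needed; there is no search; the work variables carry leading underscores and are dropped; and the default seed of 0 gives a different sample on each run. The macro name %SRS and its parameters are hypothetical, chosen to mirror the call in point 1; this is not the posted %ECH_SPLE. It assumes the input has no observations marked for deletion, since the NOBS= count may include them.

```sas
/* Hypothetical macro: simple random sample of &size records   */
/* drawn without replacement from &data in a single pass.      */
/* Each record is kept with probability (still needed) /       */
/* (still unread), which yields exactly &size records when     */
/* &data has at least that many.                               */
%macro srs (data = , out = , size = , seed = 0);
  data &out (drop = _need _left);
    retain _need &size _left;
    /* NOBS= is filled in at compile time, so _nobs is already */
    /* known before the first SET executes.                    */
    if _n_ = 1 then _left = _nobs;
    set &data nobs = _nobs;
    if ranuni(&seed) * _left < _need then do;
      output;
      _need = _need - 1;
    end;
    _left = _left - 1;
    /* Stop reading once the sample is complete. */
    if _need = 0 then stop;
  run;
%mend srs;

/* Same test data and call pattern as in point 1. */
data w;
  do s = 1 to 100;
    output;
  end;
run;

%srs (data = w, out = s10, size = 10)
%srs (data = w, out = s23, size = 23)
```

Because records are read in order and each is either kept or skipped exactly once, the sample comes out in the input order; sort it afterwards if a random order is also wanted.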

The site itself is impressive, although I did get a glimmer of why the SAS Institute objects to sites using the SAS name. It is unfortunate that the quality of the programs is not monitored. This does not mean the other 93 tips/programs have the same quality; I didn't look at them.

I can go along with the SAS-L rationale that discussion must be free and open, hence code posted need not work. In this context the reader has a clear warning. But I find it frightening to see a professional-looking web site without any monitoring of the quality of posted programs.

Ian Whitlock

