```
Date: Fri, 16 Feb 1996 19:43:30 +0000
Reply-To: John Whittington
Sender: "SAS(r) Discussion"
From: John Whittington
Subject: Re: Multible random function calls
Comments: To: "WEILER, R. BLAKE"

On Fri, 16 Feb 1996, "WEILER, R. BLAKE" rightly drew our attention to
some of the subtleties of the SAS random number functions, some of
which are rather confusing. I thought I would take a few minutes to
expand on these points a little and to indicate the
confusions/frustrations I share with Blake.

In terms of Blake's main point, IMHO the important thing to be aware of
is that, since all random number functions derive from the uniform
distribution (i.e. RANUNI), if there are multiple random number
functions (e.g. RANUNI, RANNOR, RANPOI) within the same DATA step that
utilise the same seed, then the random variables created will be highly
dependent upon one another. However, this problem only occurs if the
same seed is used for the different functions, as in Blake's example:

>data ONE;
>  do i=1 to 10000;
>    Y = ranpoi(-1, 10);
>    x = ranuni(-1);
>    if x<.5 then group = 'A';
>    else group = 'B';
>    output;
>  end;

If the seed of either of the two random functions is changed to
*anything* different from the other, then the two functions generate
random variables that are essentially independent of one another - such
that, in Blake's example, the means for group A and group B become
essentially the same.

Blake goes on to say:

> The way around this is to use the random number call routines. But
>even here one has to be careful not to use different seed variable for
>the different routines (unless you specify an initial and different seed
>for each rather then using the time of day) or else the same results
>occurs. One must use a single seed variable for all random number call
>routines used within a data step.

As Blake says, the random number CALL routines introduce new confusions.
For a start, I nearly always forget that the 'seed' argument to these
calls has to be a *variable*, not a numeric constant (why on earth is
this, I wonder!). The behaviour that Blake reports is, to my mind,
totally bizarre, almost the opposite of what intuition would suggest.
If one uses the same seed variable, with an explicit value, for two
different random number CALLs, for example:

   data ONE;
     seed1=12345678;
     do i=1 to 10000;
       call ranpoi(seed1, 10, y);
       call ranuni(seed1, x);
       if x<.5 then group = 'A';
       else group = 'B';
       output;
     end;
   run;

.. then (probably contrary to my expectations) one gets the desired
independence of the two random variables. On the other hand, if one
attempts to do the same using two different seed variable names (but
with the same value), for example:

   data ONE;
     seed1=12345678;
     seed2=12345678;
     do i=1 to 10000;
       call ranpoi(seed1, 10, y);
       call ranuni(seed2, x);
       if x<.5 then group = 'A';
       else group = 'B';
       output;
     end;
   run;

... then we are back to the situation of having the highly
NON-independent results from the two CALL routines that I might have
expected. Finally, if one has different seed variables with different
values, then one has INDEPENDENT random variables again. I am sure that
someone will be able to explain the 'logic' in this behaviour, but it
escapes me at present :-)

A related issue arises if one uses the SAME random number function two
or more times in a single DATA step. Then, only the first seed
encountered is of any relevance, and the sequence of random numbers
returned, at each invocation of the function, is the same sequence that
would have been generated had there just been one occurrence of the
function, with the same seed. Hence, if the code:

   data one;
     do i=1 to 6;
       x=ranuni(12345678);
       output;
     end;
   run;

resulted in values of x of (for convenience!)
1001, 1002, 1003, 1004, 1005 & 1006, then the code:

   data one;
     do i=1 to 2;
       x=ranuni(12345678);
       y=ranuni(anything);
       z=ranuni(anythingelse);
       output;
     end;
   run;

would result in assignments:

      x      y      z
   1001   1002   1003
   1004   1005   1006

regardless of the values of 'anything' and 'anythingelse'.

Use of CALL RANUNI() etc., with different seeds for each occurrence,
allows independent series of random variables to be created in the same
DATA step.

   data one;
     seed1=12345678;
     seed2=34567890;
     seed3=67890123;
     do i=1 to 2;
       call ranuni(seed1, x);
       call ranuni(seed2, y);
       call ranuni(seed3, z);
       output;
     end;
   run;

.. in this situation, use of three different seed variables, with
different values, seems to behave as one wants. If two or more of the
seed values are identical, then the series of random values generated
will be identical for the variables concerned - again, fairly intuitive.

As a final point, often discussed on SAS-L, I would strongly discourage
the use of time of day seeds or anything like them in situations like
this. The obvious problem is that one does not know what seed(s) have
been used, such that the program can never be re-run to produce the
same results - and, as has been pointed out before, one is often
surprised when a need to 're-run' such a program arises, even when one
thought that such an eventuality would never arise. It is probably best
to always hard-code seeds as numerical constants (with plenty of
digits). If one really must use 'time of day' or whatever, one should
at least capture the resulting seed value in a variable (which is then
used as the seed), and record its value, so that an identical run of
the program could be undertaken subsequently.

.. my two pence/cents worth, anyway!

John

-----------------------------------------------------------
Dr John Whittington,       Voice:      +44 1296 730225
Mediscience Services       Fax:        +44 1296 738893
Twyford Manor, Twyford,    E-mail:     johnw@mag-net.co.uk
Buckingham  MK18 4EL, UK   CompuServe: 100517,3677
-----------------------------------------------------------
```
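
The message judges dependence by comparing the mean of y between group A and group B. A minimal sketch of that check follows, assuming the data set ONE has been built by one of the DATA steps in the message and contains the variables x, y and group. The choice of PROC MEANS and PROC CORR is an assumption here, not something the original post specifies.

```sas
/* Sketch only: assumes data set ONE from one of the examples in the   */
/* message, with y (Poisson draw), x (uniform draw) and group (A/B).   */

/* Compare the mean of y between the two groups defined by x.          */
/* Clearly different group means point to dependence between x and y.  */
proc means data=ONE n mean std;
  class group;
  var y;
run;

/* Rank correlation between the two draws; a value near zero is        */
/* consistent with (though does not prove) independence.               */
proc corr data=ONE spearman;
  var x y;
run;
```

Running this once for each seeding variant (one shared seed variable, two seed variables with the same value, two with different values) puts the three cases side by side.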

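The closing advice in the message, to capture and record any clock-derived seed so that a run can be repeated, might be sketched as follows. The macro variable name `seed`, the use of `datetime()` as the clock source, and the folding into the range 1 to 2**31-2 are all illustrative assumptions rather than anything given in the post. `CALL SYMPUTX` needs SAS 9 or later.

```sas
/* Sketch only: names and the clock-folding rule are hypothetical.     */
data _null_;
  /* Derive an integer seed from the clock, folded into 1 .. 2**31-2,  */
  /* the range assumed here to be valid for the RANxxx routines.       */
  seed = mod(int(datetime()), 2147483646) + 1;
  /* Store it in a macro variable so later steps can reuse it.         */
  call symputx('seed', seed);
run;

/* Record the seed in the log so the run can be reproduced later.      */
%put NOTE: seed used for this run = &seed;

data one;
  seed1 = &seed;           /* the CALL routines need a seed variable   */
  do i = 1 to 10000;
    call ranuni(seed1, x); /* seed1 is updated on every call           */
    output;
  end;
  drop seed1;
run;
```

To repeat a run, replace the first step with `%let seed = <value from the log>;` and leave the rest unchanged.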