Date: Fri, 16 Feb 1996 19:43:30 +0000
Reply-To: John Whittington <johnw@MAG-NET.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: John Whittington <johnw@MAG-NET.CO.UK>
Subject: Re: Multible random function calls
On Fri, 16 Feb 1996, "WEILER, R. BLAKE" <rbw.wilson@MHS.UNC.EDU> rightly
drew our attention to some of the subtleties of the SAS random number
functions, some of which are rather confusing. I thought I would take a few
minutes to expand on these points a little and to indicate the
confusions/frustrations I share with Blake.
In terms of Blake's main point, IMHO the important thing to be aware of is
that, since all random number functions derive from the uniform distribution
(i.e. RANUNI), if there are multiple random number functions (e.g. RANUNI,
RANNOR, RANPOI) within the same DATA step that utilise the same seed, then
the random variables created will be highly dependent upon one another.
However, this problem only occurs if the same seed is used for the different
functions, as in Blake's example:
>data ONE;
> do i=1 to 10000;
> Y = ranpoi(-1, 10);
> x = ranuni(-1);
> if x<.5 then group = 'A';
> else group = 'B';
> output;
> end;
If the seed of either of the two random functions is changed to *anything*
different from the other, the two functions generate random
variables that are essentially independent of one another - such that, in
Blake's example, the means for group A and group B become essentially the same.
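A minimal sketch of that check, reusing Blake's step but with two different hard-coded seeds. The seed values 12345 and 67890 are arbitrary choices for illustration, not taken from Blake's post, and the PROC MEANS step is simply one convenient way to compare the group means of Y:

```sas
data ONE;
   do i = 1 to 10000;
      Y = ranpoi(12345, 10);   /* illustrative seed, chosen arbitrarily */
      x = ranuni(67890);       /* a different illustrative seed */
      if x < .5 then group = 'A';
      else group = 'B';
      output;
   end;
run;

/* compare the mean of Y between the two groups */
proc means data=ONE mean;
   class group;
   var Y;
run;
```

Running the same two steps with an identical seed in both function calls should make the difference between the two cases visible.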
Blake goes on to say:
> The way around this is to use the random number call routines. But
>even here one has to be careful not to use different seed variable for
>the different routines (unless you specify an initial and different seed
>for each rather then using the time of day) or else the same results
>occurs. One must use a single seed variable for all random number call
>routines used within a data step.
As Blake says, the random number CALL routines introduce new confusions. For
a start, I nearly always forget that the 'seed' argument to these calls has
to be a *variable*, not a numeric constant (why on earth is this, I
wonder!). The behaviour that Blake reports is, to my mind, totally bizarre,
almost the opposite of what intuition would suggest. If one uses the same
seed variable, with an explicit value, for two different random number CALLs,
for example:
data ONE;
seed1=12345678;
do i=1 to 10000;
call ranpoi(seed1, 10, y);
call ranuni(seed1, x);
if x<.5 then group = 'A';
else group = 'B';
output;
end;
run;
.. then (probably contrary to my expectations) one gets the desired
independence of the two random variables. On the other hand, if one
attempts to do the same using two different seed variable names (but with
the same value), for example:
data ONE;
seed1=12345678; seed2=12345678;
do i=1 to 10000;
call ranpoi(seed1, 10, y);
call ranuni(seed2, x);
if x<.5 then group = 'A';
else group = 'B';
output;
end;
run;
... then we are back to the situation of having the highly NON-independent
results from the two CALL routines that I might have expected. Finally, if
one has different seed variables with different values, then one has
INDEPENDENT random variables again. I am sure that someone will be able to
explain the 'logic' in this behaviour, but it escapes me at present :-)
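One possible reading, offered as a sketch rather than a definitive account: the SAS documentation describes the CALL routines as writing an updated seed back into the seed variable on every call. The step below (using DATA _NULL_ and PUT purely for illustration) prints the seed variable after each call so that this can be checked in the log:

```sas
data _null_;
   seed1 = 12345678;
   do i = 1 to 3;
      call ranuni(seed1, x);
      /* seed1 is expected to hold a new value after each call */
      put i= seed1= x=;
   end;
run;
```

If that reading is right, a single shared seed variable means the second CALL starts from wherever the first one left the seed, so the two streams never coincide. Two separate variables holding the same value would each start from the same point, which would account for the dependence. It would also explain why the seed argument has to be a variable rather than a constant.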
A related issue arises if one uses the SAME random number function two or
more times in a single DATA step. Then, only the first seed encountered is
of any relevance and the sequence of random numbers returned, at each
invocation of the function, is the same sequence that would have been
generated had there just been one occurrence of the function, with the same
seed. Hence, if the code:
data one;
do i=1 to 6;
x=ranuni(12345678);
output;
end;
run;
resulted in values of x of (for convenience!) 1001, 1002, 1003, 1004, 1005 &
1006, then the code:
data one;
do i=1 to 2;
x=ranuni(12345678);
y=ranuni(anything);
z=ranuni(anythingelse);
output;
end;
run;
would result in assignments:
x y z
1001 1002 1003
1004 1005 1006
regardless of the values of 'anything' and 'anythingelse'. Use of CALL
RANUNI() etc., with different seeds for each occurrence, allows independent
series of random variables to be created in the same DATA step.
data one;
seed1=12345678; seed2=34567890; seed3=67890123;
do i=1 to 2;
call ranuni(seed1, x);
call ranuni(seed2, y);
call ranuni(seed3, z);
output;
end;
run;
.. in this situation, use of three different seed variables, with different
values, seems to behave as one wants. If two or more of the seed values are
identical, then the series of random values generated will be identical for
the variables concerned - again, fairly intuitive.
As a final point, often discussed on SAS-L, I would strongly discourage the
use of time of day seeds or anything like them in situations like this. The
obvious problem is that one does not know what seed(s) have been used, such
that the program can never be run to produce the same results - and, as has
been pointed out before, one is often surprised when a need to 're-run' such
a program arises, even when one thought that such an eventuality would never
arise. It is probably best to always hard-code seeds as numerical constants
(with plenty of digits). If one really must use 'time of day' or whatever,
one should at least capture the resulting seed value in a variable (which is
then used as the seed), and record its value, so that an identical run of
the program could be undertaken subsequently.
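A minimal sketch of that last suggestion, under stated assumptions: the variable names seed and seed0 and the formula that derives a seed from the clock are illustrative choices only. The MOD arithmetic is there to keep the value a positive integer below 2**31-1, which is the range the seed arguments accept:

```sas
data one;
   /* hypothetical clock-derived seed, forced into 1..2147483646 */
   seed  = mod(int(datetime()), 2147483646) + 1;
   seed0 = seed;                          /* keep the starting value */
   put 'Seed used for this run: ' seed0;  /* record it in the log */
   do i = 1 to 10;
      call ranuni(seed, x);               /* seed itself changes on each call */
      output;
   end;
run;
```

To repeat the run later, the logged value would be hard-coded in place of the clock expression.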
.. my two pence/cents worth, anyway!
John
-----------------------------------------------------------
Dr John Whittington, Voice: +44 1296 730225
Mediscience Services Fax: +44 1296 738893
Twyford Manor, Twyford, E-mail: johnw@mag-net.co.uk
Buckingham MK18 4EL, UK CompuServe: 100517,3677
-----------------------------------------------------------