Date: Sun, 6 Sep 2009 03:43:33 -0500
Reply-To: OR Stats <stats112@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: OR Stats <stats112@GMAIL.COM>
Subject: Re: Fastest Steps for Simulating: Anderson-Darling Goodness of
Fit test for Non-typical distn
In-Reply-To: <6eca73440909050418g10431f71g3541716ddd32a4bf@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
More literally,
NOTE: Argument 4 to function CDF at line 3 column 5 is invalid.
refers to the function call
log(cdf('normal',_x&N{i},mu,sd))
The output tables, which show -1 * sample size in the AD column, i.e. the
result of
AD = -&N - S;
are incorrect. S is probably left at zero because 'argument 4' of CDF
(that is, sd) is invalid.
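
A minimal, untested sketch of how the guess above could be checked. In SAS a sum statement (S + expression;) skips missing values, so if every log(cdf(...)) term comes back missing, S stays at 0 and AD = -&N, which matches the symptom. The sketch below is an assumption-laden stand-in, not the posted macro: it hard-codes N=50, uses rand('normal') in place of the unspecified G, and introduces two names that are not in the thread (_sum as a separate accumulator, bad_sd as a flag). It also keeps the accumulator apart from S, since the posted macro reuses S both as the DO-loop parameter and as the running sum.

```sas
/* Sketch only: AD statistic against a fitted normal, with a guard on sd */
/* (argument 4 of CDF). _sum, bad_sd and rand('normal') are assumptions. */
data ad_check;
  array _x50 {50} x1-x50;
  call streaminit(6923479);
  do rep = 1 to 5;
    do i = 1 to 50;
      _x50{i} = rand('normal');          /* stand-in for G(ranuni, s) */
    end;
    call sortn(of _x50{*});
    mu = mean(of x1-x50);
    sd = std(of x1-x50);
    bad_sd = (missing(sd) or sd <= 0);   /* CDF would reject such an sd */
    _sum = 0;
    /* Plain assignment (not a sum statement) so missing terms propagate */
    if not bad_sd then do i = 1 to 50;
      _sum = _sum + ((2*i - 1)/50) *
             (log(cdf('normal', _x50{i}, mu, sd)) +
              log(1 - cdf('normal', _x50{51-i}, mu, sd)));
    end;
    if bad_sd then AD = .;
    else AD = -50 - _sum;
    output;
  end;
  keep rep mu sd bad_sd AD;
run;
```

If bad_sd is ever 1, or sd prints as missing, the x values going into mean() and std() are the place to look, for example a G that returns missing or constant values.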
On Sat, Sep 5, 2009 at 6:18 AM, OR Stats <stats112@gmail.com> wrote:
> It now creates the datasets. But the S column is just all zeros and AD
> column is all -samplesize (i.e., -50, -100, -200 etc.)
>
> the error log now is
>
> NOTE: Argument 4 to function CDF at line 3 column 5 is invalid.
>
> NOTE: Argument 4 to function CDF at line 3 column 52 is invalid.
>
> On Sat, Sep 5, 2009 at 12:39 AM, Dale McLerran <stringplayer_2@yahoo.com> wrote:
>
>> My mistake. There was a legacy reference to array X
>> from when you had first asked how to compute the
>> A-D test for a distribution which you wish to specify.
>> We now have four different arrays of various lengths.
>> The macro should reference the array of the length
>> currently being simulated. In order to reference the
>> correct array, replace the code
>>
>> S + ((2*i - 1)/&N) * (log(cdf('normal',x{i},mu,sd)) +
>> log(1 - cdf('normal',x{&N+1-i},mu,sd)));
>>
>> with
>>
>> S + ((2*i - 1)/&N) * (log(cdf('normal',_x&N{i},mu,sd)) +
>> log(1 - cdf('normal',_x&N{&N+1-i},mu,sd)));
>>
>> Dale
>>
>> ---------------------------------------
>> Dale McLerran
>> Fred Hutchinson Cancer Research Center
>> mailto: dmclerra@NO_SPAMfhcrc.org
>> Ph: (206) 667-2926
>> Fax: (206) 667-5977
>> ---------------------------------------
>>
>>
>> --- On Fri, 9/4/09, OR Stats <stats112@GMAIL.COM> wrote:
>>
>> > From: OR Stats <stats112@GMAIL.COM>
>> > Subject: Re: Fastest Steps for Simulating: Anderson-Darling Goodness of Fit test for Non-typical distn
>> > To: SAS-L@LISTSERV.UGA.EDU
>> > Date: Friday, September 4, 2009, 7:34 PM
>> > cool, good. The undeclared array is still giving problems
>> >
>> > ERROR: Undeclared array referenced: x.
>> >
>> > ERROR: Variable x has not been declared as an array.
>> >
>> > ERROR: Undeclared array referenced: x.
>> >
>> > ERROR: Variable x has not been declared as an array.
>> >
>> > 1218 %AD(N=100)
>> >
>> >
>> > On Fri, Sep 4, 2009 at 9:28 PM, Data _null_; <iebupdte@gmail.com> wrote:
>> >
>> > > That is incorrect syntax for an iterative DO. You need:
>> > >
>> > > do s=5,5.2,5.4;
>> > >
>> > > On 9/4/09, OR Stats <stats112@gmail.com> wrote:
>> > > > Hmm... still same error
>> > > > 1124 do S=[5 5.2 5.4]; /* This line needs correct specification */
>> > > > -
>> > > > 386
>> > > > -
>> > > > 200
>> > > >
>> > > > ERROR 386-185: Expecting an arithmetic expression.
>> > > >
>> > > > ERROR 200-322: The symbol is not recognized and will be ignored.
>> > > >
>> > > > On Fri, Sep 4, 2009 at 9:15 PM, OR Stats <stats112@gmail.com> wrote:
>> > > >
>> > > > > Ok. Too much coding on a Friday! Thx!!
>> > > > >
>> > > > > On Fri, Sep 4, 2009 at 9:13 PM, Data _null_; <iebupdte@gmail.com> wrote:
>> > > > >
>> > > > > > From your original post....
>> > > > > >
>> > > > > > > 1 Million times using 50, 100, 200, and 300 rows of data at each
>> > > > > > > iteration for three different values of s (s1, s2, s3)?
>> > > > > >
>> > > > > > On 9/4/09, OR Stats <stats112@gmail.com> wrote:
>> > > > > > > Not sure what S1 S2 and S3 are referring to?
>> > > > > > >
>> > > > > > > On Fri, Sep 4, 2009 at 8:56 PM, Data _null_; <iebupdte@gmail.com> wrote:
>> > > > > > > > Did you notice this comment...
>> > > > > > > >
>> > > > > > > > /* This line needs correct specification */
>> > > > > > > >
>> > > > > > > > On 9/4/09, OR Stats <stats112@gmail.com> wrote:
>> > > > > > > > > I am getting the following error messages
>> > > > > > > > >
>> > > > > > > > > do S={S1 S2 S3}; /* This line needs correct specification */
>> > > > > > > > > -
>> > > > > > > > > 386
>> > > > > > > > > 76
>> > > > > > > > > --
>> > > > > > > > > 202
>> > > > > > > > >
>> > > > > > > > > ERROR 386-185: Expecting an arithmetic expression.
>> > > > > > > > >
>> > > > > > > > > ERROR 76-322: Syntax error, statement will be ignored.
>> > > > > > > > >
>> > > > > > > > > ERROR 202-322: The option or parameter is not recognized and will be ignored.
>> > > > > > > > > ERROR: Undeclared array referenced: x.
>> > > > > > > > >
>> > > > > > > > > ERROR: Variable x has not been declared as an array.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > And what is S for, as the 2nd argument of ranuni(p,S)?
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Sun, Aug 30, 2009 at 11:19 PM, Dale McLerran <stringplayer_2@yahoo.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > One million times? Why? I really think that is overkill.
>> > > > > > > > > > I would try to cover more parameter combinations if it were
>> > > > > > > > > > me.
>> > > > > > > > > >
>> > > > > > > > > > But you should be able to use a single data step to generate
>> > > > > > > > > > A-D statistics for all of your parameter combinations. The
>> > > > > > > > > > code below should be pretty efficient.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > %macro AD(N=);
>> > > > > > > > > >   do i=1 to &N;
>> > > > > > > > > >     /* The next line needs completion with the appropriate G */
>> > > > > > > > > >     _x&N{i} = G(ranuni(6923479,S));
>> > > > > > > > > >   end;
>> > > > > > > > > >
>> > > > > > > > > >   call sortn(of _X&N(*));
>> > > > > > > > > >   mu = mean(of x1-x&N);
>> > > > > > > > > >   var = var(of x1-x&N);
>> > > > > > > > > >   sd = sqrt(var);
>> > > > > > > > > >   S=0;
>> > > > > > > > > >   do i=1 to &N;
>> > > > > > > > > >     S + ((2*i - 1)/&N) * (log(cdf('normal',x{i},mu,sd)) +
>> > > > > > > > > >                           log(1 - cdf('normal',x{&N+1-i},mu,sd)));
>> > > > > > > > > >   end;
>> > > > > > > > > >   AD = -&N - S;
>> > > > > > > > > >   output AD_&N;
>> > > > > > > > > > %mend;
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > /* Generate 10000 samples of same size (N=9 in this case) following */
>> > > > > > > > > > /* a normal distribution and compute AD statistic for each sample. */
>> > > > > > > > > > data AD_50
>> > > > > > > > > >      AD_100
>> > > > > > > > > >      AD_200
>> > > > > > > > > >      AD_300;
>> > > > > > > > > >   array _x50 {50} x1-x50;
>> > > > > > > > > >   array _x100 {100} x1-x100;
>> > > > > > > > > >   array _x200 {200} x1-x200;
>> > > > > > > > > >   array _X300 {300} x1-x300;
>> > > > > > > > > >   do S={S1 S2 S3}; /* This line needs correct specification */
>> > > > > > > > > >     do rep=1 to 10000;
>> > > > > > > > > >       %AD(N=50)
>> > > > > > > > > >       %AD(N=100)
>> > > > > > > > > >       %AD(N=200)
>> > > > > > > > > >       %AD(N=300)
>> > > > > > > > > >     end;
>> > > > > > > > > >   end;
>> > > > > > > > > >   keep S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > /* Determine probability of observed data */
>> > > > > > > > > > /* using simulated data AD distribution. */
>> > > > > > > > > > proc sort data=AD_50;
>> > > > > > > > > >   by S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > > proc sort data=AD_100;
>> > > > > > > > > >   by S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > > proc sort data=AD_200;
>> > > > > > > > > >   by S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > > proc sort data=AD_300;
>> > > > > > > > > >   by S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > The above is untested code and should be tested with a
>> > > > > > > > > > small number of replicates before using it for a final
>> > > > > > > > > > simulation. Also, there will obviously need to be some
>> > > > > > > > > > final step where you determine the quantiles of the AD
>> > > > > > > > > > statistics.
>> > > > > > > > > >
>> > > > > > > > > > Dale
>> > > > > > > > > >
>> > > > > > > > > > ---------------------------------------
>> > > > > > > > > > Dale McLerran
>> > > > > > > > > > Fred Hutchinson Cancer Research Center
>> > > > > > > > > > mailto: dmclerra@NO_SPAMfhcrc.org
>> > > > > > > > > > Ph: (206) 667-2926
>> > > > > > > > > > Fax: (206) 667-5977
>> > > > > > > > > > ---------------------------------------
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > --- On Sun, 8/30/09, OR Stats <stats112@GMAIL.COM> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > From: OR Stats <stats112@GMAIL.COM>
>> > > > > > > > > > > Subject: Fastest Steps for Simulating: Anderson-Darling Goodness of Fit test for Non-typical distn
>> > > > > > > > > > > To: SAS-L@LISTSERV.UGA.EDU
>> > > > > > > > > > > Date: Sunday, August 30, 2009, 8:14 PM
>> > > > > > > > > > > This is good. I am ready now to run a large scale simulation. What
>> > > > > > > > > > > that means is that I want to compute the goodness of fit statistic
>> > > > > > > > > > > for (M x S) groups and n times each group.
>> > > > > > > > > > >
>> > > > > > > > > > > Group defined by (m,s); S = s1 s2 s3 and M = 50 100 200 300.
>> > > > > > > > > > > Basically, M is my different sample sizes for which I am testing
>> > > > > > > > > > > their fit to function G(random#,s) (i.e., inverse distribution). I
>> > > > > > > > > > > would like to run each group 1 million times. For each s group, by
>> > > > > > > > > > > generating random numbers just by 300 x 1 million times, I'll have
>> > > > > > > > > > > enough simulated data y(s) to use for the largest and smaller
>> > > > > > > > > > > sample sizes.
>> > > > > > > > > > >
>> > > > > > > > > > > My final column space would look like
>> > > > > > > > > > > i ranuni y_s1=G(ranuni,s1) y_s2=G(ranuni,s2) y_s3=G(ranuni,s3)
>> > > > > > > > > > > 1
>> > > > > > > > > > > .
>> > > > > > > > > > > .
>> > > > > > > > > > > .
>> > > > > > > > > > > m
>> > > > > > > > > > > All rows in the above table would be used to calculate function
>> > > > > > > > > > > f_s1, f_s2, f_s3 (i.e., AD). This last step is repeated 1 Million
>> > > > > > > > > > > times.
>> > > > > > > > > > >
>> > > > > > > > > > > Can we do this in one to two DATA STEPS? Which syntax would be
>> > > > > > > > > > > fastest since we have to generate 300 Million random numbers, from
>> > > > > > > > > > > which we would split the sample by 1 Million disjoint sets that we
>> > > > > > > > > > > would then compute a statistic 1 Million times using 50, 100, 200,
>> > > > > > > > > > > and 300 rows of data at each iteration for three different values
>> > > > > > > > > > > of s (s1, s2, s3)?
>> > > > > > > > > > >
>> > > > > > > > > > > Thank Q!