Date: Tue, 24 May 2005 20:21:43 -0700
Reply-To: shiling99@YAHOO.COM
Sender: "SAS(r) Discussion" <SASL@LISTSERV.UGA.EDU>
From: shiling99@YAHOO.COM
Organization: http://groups.google.com
Subject: Re: proc mixed model with ar(1) processes inter and intra blocks
In-Reply-To: <OF2618244F.73E2D889ON8825700B.008372B18825700C.00014136@epamail.epa.gov>
Content-Type: text/plain; charset="iso-8859-1"
David & Dale,
Thanks for all comments and suggestions.
Here is the link for the AR(1) process:
http://www1.elsevier.com/hes/books/02/04/050/0204050.htm
and the SAS online help for the SAS code:
Example 20.16: AR(1) Process, in SAS/ETS.
In terms of small samples, I said in my last post: "I agree that there is
a difference when the sample size is small; the initial value becomes
non-trivial."

Using proc arima, I just want to understand what is the same and what is
different compared with proc mixed. That is how I cross-check and improve
my understanding. David's comments about high-order ARIMA and complex
time series models seem irrelevant here, but are always welcome.
"David L. Cassell" wrote:
> Dale McLerran <stringplayer_2@yahoo.com> replied:
> > I still must disagree. If you were to allow a burn-in of, say,
> > 50 iterations before you start outputting u or e values, then
> > your approach would generate data which follow an AR(1) process.
> > Note that your code produces
> >
> > u0 ~ N(0,1),
> >
> > /* First iteration */
> > u1=rho*u0 + err
> > err~N(0,1)
> > u1~N(0,1+rho**2)
> > output u1
> > assign u0=u1
> >
> > /* Second iteration */
> > u1=rho*u0 + err
> > err~N(0,1)
> > u1~N(0,1+rho**2+rho**4)
> > output u1
> > assign u0=u1
> >
> > /* Third iteration */
> > u1=rho*u0 + err
> > err~N(0,1)
> > u1~N(0,1+rho**2+rho**4+rho**6)
> > output u1
> > assign u0=u1
> >
> > ...
> >
> > /* Kth iteration */
> > u1=rho*u0 + err
> > err~N(0,1)
> > u1~N(0,1+rho**2+rho**4+rho**6+...+rho**(2*K))
> > output u1
> > assign u0=u1
> >
> >
> > The difference between variance of u1 at iteration m and variance
> > of u1 at iteration m+1 is
> >
> > V(u1{m+1}) - V(u1{m}) = rho**(2*(m+1))
> >
> >
> > For m sufficiently large, the difference in variances does become
> > negligible. However, for the first two iterations (m=1), the
> > difference in variance is rho**4. That is not small unless
> > rho is near zero. The following code shows the variance function
> > by iteration for rho=0.7.
> >
> > data var;
> >   rho=0.7;
> >   V=1;
> >   do iter=1 to 50;
> >     V = 1 + (rho**2)*V;
> >     output;
> >   end;
> > run;
> >
> > proc print data=var;
> >   by rho;
> >   id rho;
> >   var iter V;
> >   format V 10.8;
> > run;
> >
> >
> > The variance approaches a stable value by about the 5th
> > iteration. However, for iterations 1 through 3, the variance
> > is quite far from the stable value.
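The quoted recursion is easy to cross-check outside SAS. Here is a small
Python sketch of the same computation (Python used only as a handy
cross-check of the quoted DATA step; variable names are illustrative). The
fixed point of V <- 1 + rho**2 * V is the stationary AR(1) variance
1/(1 - rho**2), about 1.9608 for rho = 0.7:

```python
# Variance recursion from Dale's DATA step, V <- 1 + rho**2 * V,
# starting from V = 1 (the variance of u0).
rho = 0.7
V = 1.0
history = []
for _ in range(50):
    V = 1.0 + rho ** 2 * V
    history.append(V)

# history[0] is 1.49, history[1] is 1.7301, and the sequence climbs
# toward the stationary variance 1/(1 - rho**2).
stationary = 1.0 / (1.0 - rho ** 2)
print(history[0], history[1], history[-1], stationary)
```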
> >
> > Now, to demonstrate the validity of your AR(1) generation process
> > you use the code
> >
> > data test;
> >   seed=1234579;
> >   u0 = rannor(seed);
> >   do iter=1 to 100000;
> >     /* u{t} = rho*u{t-1} + err */
> >     u1 = 0.7*u0 + rannor(seed);
> >     output;
> >     u0=u1;
> >   end;
> > run;
> > title "Covariance matrix: errors constructed as u{t}=rho*u{t-1} + err";
> >
> > proc corr data=test cov;
> >   var u:;
> > run;
> >
> >
> > This is a very long process, and certainly the variance is stable
> > for most of the 100000 iterations. If you were using a long
> > sequence like this, then I would have little concern with assuming
> > that ALL the data follow the same AR(1) process. However, you have
> > relatively short processes (20 or 40 iterations). The first few
> > terms of a short process could have strong effect on your parameter
> > estimates.
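That point about the early iterations can also be checked by simulation.
A Python sketch (illustrative, mirroring the unscaled SAS generator with
rho = 0.7): the variances at iterations 1, 2, 3 should land near 1.49,
1.7301, and 1.8477, well below the stationary 1.9608.

```python
# Monte Carlo check of the early-iteration variances of the unscaled
# generator u1 = rho*u0 + err with u0 ~ N(0,1) and err ~ N(0,1).
import random

random.seed(20050524)
rho, reps, steps = 0.7, 100_000, 3
draws = [[] for _ in range(steps)]
for _ in range(reps):
    u = random.gauss(0.0, 1.0)          # u0 ~ N(0,1)
    for t in range(steps):
        u = rho * u + random.gauss(0.0, 1.0)
        draws[t].append(u)

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

v1, v2, v3 = (sample_var(d) for d in draws)
print(round(v1, 3), round(v2, 3), round(v3, 3))
```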
> >
> > Note that the simulation which I previously posted shows the
> > variance structure generated by your data generation process
> > through five iterations. That is, the generation model
> >
> > u0 = rannor(seed);
> > u1 = 0.7*u0 + rannor(seed);
> > u2 = 0.7*u1 + rannor(seed);
> > u3 = 0.7*u2 + rannor(seed);
> > u4 = 0.7*u3 + rannor(seed);
> > u5 = 0.7*u4 + rannor(seed);
> >
> > is exactly the same as
> >
> > u0~N(0,1)
> >
> > /* First iteration */
> > u1=rho*u0 + err
> > err~N(0,1)
> > u1~N(0,1+rho**2)
> > output u1
> > assign u0=u1
> >
> > /* Second iteration */
> > u1=rho*u0 + err
> > err~N(0,1)
> > u1~N(0,1+rho**2+rho**4)
> > output u1
> > assign u0=u1
> >
> > ...
> >
> > /* Fifth iteration */
> > u1=rho*u0 + err
> > err~N(0,1)
> > u1~N(0,1+rho**2+rho**4+rho**6+rho**8+rho**10)
> > output u1
> > assign u0=u1
> >
> >
> > It most certainly does demonstrate that for the first few
> > iterations, your process is NOT AR(1). But the alternative
> > process in which we have u{i}=rho*u{i-1} + sqrt(1-rho*rho)*err
> > DOES FOLLOW THE AR(1) PROCESS FROM ITERATION 1.
> >
> > IF YOU WANT YOUR DATA GENERATION MODEL TO BE AR(1) FOR ALL
> > OBSERVATIONS, THEN YOU NEED TO MODIFY YOUR CODE. As I stated
> > previously, one way to modify your code so that all the data
> > follow an AR(1) process is to allow a burn-in period of some M
> > iterations before you start accepting data as following the
> > AR(1) process. Better yet is to use
> >
> > u0 = err{0};
> > do iter=1 to n;
> >   u1 = rho*u0 + sqrt(1-rho*rho)*err{iter};
> >   output;
> >   u0=u1;
> > end;
> >
> > where err{j}~N(0,V). This follows the AR(1) process from the
> > very first iteration.
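A quick Python check of that scaled recursion (names illustrative;
err ~ N(0,1), so the stationary variance is 1): with the sqrt(1-rho*rho)
factor on the innovation, the variance should sit at 1 from iteration 1.

```python
# Check that u1 = rho*u0 + sqrt(1 - rho**2)*err, u0 ~ N(0,1),
# keeps Var(u) = 1 from the very first iteration.
import math
import random

random.seed(20050524)
rho, reps, steps = 0.7, 100_000, 3
scale = math.sqrt(1.0 - rho * rho)
draws = [[] for _ in range(steps)]
for _ in range(reps):
    u = random.gauss(0.0, 1.0)          # u0 ~ N(0,1)
    for t in range(steps):
        u = rho * u + scale * random.gauss(0.0, 1.0)
        draws[t].append(u)

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

variances = [sample_var(d) for d in draws]
print([round(v, 3) for v in variances])
```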
>
> I'm going to side with Dale on this one. Mostly. :)
>
> I agree with Dale that if you want the generated stream to be correct
> from the first iteration, you need to do this:
>
>
> u0 = rannor(seed);
> do i=1 to k;
>   u1 = rho*u0 + sqrt(1-rho*rho)*rannor(seed);
>   output;
>   u0 = u1;
> end;
>
>
> This is the formulation that I learned in intro time series courses,
> back when dinosaurs still ruled the earth. Shiling is writing the
> time series in a traditional manner, using the duality of AR processes
> as infinite-order MA processes. But that's not particularly efficient
> for process generation. It *is* really useful once you start using
> the backshift operator and you want to do a little theory. I think
> that Dale's formulation will make the generation process a lot easier
> once Shiling moves on up to more complex time series models, with
> higher-order ARIMA(p,d,q) models.
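That AR(1)/MA(infinity) duality can be seen numerically. A small Python
sketch (illustrative, assuming innovation variance 1-rho**2 so that the
stationary variance is 1): the variance of the truncated MA representation
u{t} = sum over j of rho**j * e{t-j} climbs to 1 as the truncation point grows.

```python
# Variance of the truncated MA representation of a stationary AR(1):
# Var = sigma2_e * sum_{j=0}^{J} rho**(2j) = 1 - rho**(2*(J+1)).
rho = 0.7
sigma2_e = 1.0 - rho ** 2              # innovation variance
trunc_var = [sigma2_e * sum(rho ** (2 * j) for j in range(J + 1))
             for J in (0, 5, 20, 50)]
print(trunc_var)                       # climbs toward 1
```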
>
> But the idea of 'burn-in' is not new. It's a fundamental part of
> some statistical tools. The Gibbs sampler comes to mind, for one.
> So Shiling could make this work. I would recommend a burn-in period
> of at least M=10 for this case, although the computer cost is so
> small that there's really no point in not using a burn-in period
> of, say, M=1000. If Shiling builds a much more complex time series
> model, he may want to use an M of 100 or so, just to address any
> long-term hysteresis of his models.
>
> So here's my version of Shiling's code:
>
>
> u0 = rannor(seed);
> M = 1000; /* just pick something big enough */
> do i=1 to k+M;
>   u1 = rho*u0 + rannor(seed);
>   if i > M then output;
>   u0 = u1;
> end;
>
>
> With this sort of minor mod, Shiling's code will do just what he
> wants.
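The burn-in version is easy to sanity-check as well. A Python sketch of the
same logic (M and k here are illustrative): after discarding the first M
values, the retained stretch should show the stationary variance
1/(1 - rho**2), about 1.9608 for rho = 0.7, and lag-1 autocorrelation near rho.

```python
# Burn-in approach: run the unscaled recursion u1 = rho*u0 + err for
# M extra steps, discard those, and keep the next k values.
import random

random.seed(20050524)
rho, M, k = 0.7, 1000, 200_000
u = random.gauss(0.0, 1.0)
kept = []
for i in range(M + k):
    u = rho * u + random.gauss(0.0, 1.0)
    if i >= M:
        kept.append(u)

mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / (len(kept) - 1)
lag1 = (sum((kept[t] - mean) * (kept[t + 1] - mean)
            for t in range(len(kept) - 1))
        / ((len(kept) - 1) * var))
print(round(var, 3), round(lag1, 3))
```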
>
> Just sticking my nose in,
> David
> --
> David Cassell, CSC
> Cassell.David@epa.gov
> Senior computing specialist
> mathematical statistician
