Date: Wed, 5 Jul 2006 06:57:06 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Array issue
On Thu, 29 Jun 2006 21:54:03 -0700, sum <sas.sumanta@GMAIL.COM> wrote:
>Hi all,
>
>I'm an user of "R", and recenly shifted to SAS, and i only have
>SAS-BASE. But, i'm not aware whether i can do rowwise operations in
>this. Can DO loop be used for this. Suppose, i have a matrix stock data
>of 41*40 dimension, and i want to calculate returns of this data which
>will be 31*40 dimension. How can i do this? And if i want to work
>blockwise, like 4*40 (10 blocks ), how can i do that? Simply i want to
>know whether these operations can be done without SAS/IML or not?
>Please advice.
>
>Thanks,
>Sum.
The DATA step supports multidimensional arrays and nested loops, so you
should be able to do anything in the vein you are describing. However, you
will find yourself fussing over the "housekeeping": establishing the arrays
and loading and unloading the data.
It gets even messier if you have the kind of cross calculations which entail
matrix multiplication. Worse yet would be simultaneity, requiring matrix
inversion; as far as I know you would have to dig up an algorithm and
implement it. Maybe somebody's done a macro, but it's still going to be a
lot more trouble than using IML.
On Fri, 30 Jun 2006 02:50:20 -0700, SUM <sumanta24@GMAIL.COM> wrote:
>Hi Jim,
>
>Thanks for help. I'm going to state what i want to do. I have a matrix
>of stock returns data of (540*502) dimension. That means the first
>column being Date, second one being S&P index and other 500 columns are
>S&P companies return data. Now i want to calculate rolling beta of a
>stock taking window size of 52. It needs Market index and that stock
>returns. I have 500 companies and 10 years weekly data. How can i do
>this rolling job? Please suggest or if you can suggest any good
>weblink?
This is more concrete than your earlier post, making it easier to respond
constructively.
For this kind of work, you would probably benefit in terms of productivity
if you had IML and also SAS/ETS (the time series package). Still, Base SAS
can handle this task if it must.
In fact, I can see four ways to proceed.
1. Use a two-dimensional array and loops. This is what you have been
contemplating.
2. Transpose your data, so that you have 500 rows (for the individual
stocks) and 540 columns. In other words, each observation would contain one
time series. You would still use arrays, but they would be one-dimensional.
The DATA step's automatic looping would do more of the work.
3. Consider that you are using a matrix frame of reference just to express
parallel but separate operations on a series of vectors. The DATA step has a
different paradigm for that: BY group processing. It rests on a different,
more normalized, data structure.
To illustrate, let's shrink things down to 3 stocks, 6 time periods, and
windows of length 4. Generate some data:
data returns;
do obs = 1 to 6; drop obs;
Month = mdy(obs,1,2006);
array stock(*) IBM MS SAS;
do i = 1 to dim(stock); drop i;
stock(i) = rannor(21) / (15 - 2*i);
end;
Index = mean(of stock(*));
output;
end;
format _numeric_ percent7.1 month monyy7.;
run;
Results:
Month IBM MS SAS Index
JAN2006 ( 2.6%) (11.7%) ( 9.8%) ( 8.0%)
FEB2006 5.3% (18.1%) 19.8% 2.3%
MAR2006 ( 2.6%) 1.1% ( 8.9%) ( 3.5%)
APR2006 0.4% ( 5.6%) 8.6% 1.1%
MAY2006 ( 0.0%) (10.9%) 16.4% 1.8%
JUN2006 ( 7.7%) ( 9.8%) 0.8% ( 5.6%)
Next, normalize:
proc transpose data=returns(drop = index)
out=long(rename=(col1=Return) );
by month;
run;
This produces:
Month _NAME_ Return
JAN2006 IBM ( 2.6%)
JAN2006 MS (11.7%)
JAN2006 SAS ( 9.8%)
FEB2006 IBM 5.3%
FEB2006 MS (18.1%)
FEB2006 SAS 19.8%
MAR2006 IBM ( 2.6%)
MAR2006 MS 1.1%
MAR2006 SAS ( 8.9%)
APR2006 IBM 0.4%
APR2006 MS ( 5.6%)
APR2006 SAS 8.6%
MAY2006 IBM ( 0.0%)
MAY2006 MS (10.9%)
MAY2006 SAS 16.4%
JUN2006 IBM ( 7.7%)
JUN2006 MS ( 9.8%)
JUN2006 SAS 0.8%
This is the wrong sort order for doing the calculations, but it's right for
re-attaching the index series as an additional column:
data series;
merge long returns(keep = month index);
by month;
label _name_= 'Stock';
run;
Now sort:
proc sort data=series;
by _name_ month;
run;
Finally, calculate:
data betas(drop = roll : );
set series;
by _name_;
roll_x + sum(index , -lag4(index ) );
roll_y + sum(return , -lag4(return ) );
roll_xy + sum(return*index , -lag4(return*index ) );
roll_xsq + sum(index**2 , -lag4(index**2 ) );
if _name_ = lag3(_name_) then
Beta = (4*roll_xy - roll_x*roll_y) /
(4*roll_xsq - roll_x**2 )
;
format beta 6.2;
run;
This DATA step is pretty terse. Developing and understanding the code
requires knowing some subtle stuff about LAG functions and about propagation
of missing values. Many DATA steps are simpler, but this one I think
reflects the power of the tool.
proc print label; run;;
End product:
Month Stock Return Index Beta
JAN2006 IBM ( 2.6%) ( 8.0%) .
FEB2006 IBM 5.3% 2.3% .
MAR2006 IBM ( 2.6%) ( 3.5%) .
APR2006 IBM 0.4% 1.1% 0.65
MAY2006 IBM ( 0.0%) 1.8% 0.96
JUN2006 IBM ( 7.7%) ( 5.6%) 0.97
JAN2006 MS (11.7%) ( 8.0%) .
FEB2006 MS (18.1%) 2.3% .
MAR2006 MS 1.1% ( 3.5%) .
APR2006 MS ( 5.6%) 1.1% -0.41
MAY2006 MS (10.9%) 1.8% -2.68
JUN2006 MS ( 9.8%) ( 5.6%) -0.36
JAN2006 SAS ( 9.8%) ( 8.0%) .
FEB2006 SAS 19.8% 2.3% .
MAR2006 SAS ( 8.9%) ( 3.5%) .
APR2006 SAS 8.6% 1.1% 2.75
MAY2006 SAS 16.4% 1.8% 4.72
JUN2006 SAS 0.8% ( 5.6%) 2.39
4. An alternative, also predicated on a normalized data structure, is PROC SQL.
I suspect that at this moment #2 is the best choice for you. However, if you
are going to rely on Base SAS you should learn its strengths; that would
lead you to #3 and #4.