```Date: Wed, 5 Jul 2006 06:57:06 -0400 Reply-To: "Howard Schreier " Sender: "SAS(r) Discussion" From: "Howard Schreier " Subject: Re: Array issue On Thu, 29 Jun 2006 21:54:03 -0700, sum wrote: >Hi all, > >I'm an user of "R", and recenly shifted to SAS, and i only have >SAS-BASE. But, i'm not aware whether i can do rowwise operations in >this. Can DO loop be used for this. Suppose, i have a matrix stock data >of 41*40 dimension, and i want to calculate returns of this data which >will be 31*40 dimension. How can i do this? And if i want to work >blockwise, like 4*40 (10 blocks ), how can i do that? Simply i want to >know whether these operations can be done without SAS/IML or not? >Please advice. > >Thanks, >Sum. The DATA step supports multidimensional arrays and nested loops, so you should be able to do anything in the vein you are describing. However, you will find yourself fussing over the "housekeeping": establishing the arrays and loading and unloading the data. It gets even messier if you have the kind of cross calculations which entail matrix multiplication. Worse yet would be simultaneity, requiring matrix inversion; as far as I know you would have to dig up an algorithm and implement it. Maybe somebody's done a macro, but it's still going to be a lot more trouble than using IML. On Fri, 30 Jun 2006 02:50:20 -0700, SUM wrote: >Hi Jim, > >Thanks for help. I'm going to state what i want to do. I have a matrix >of stock returns data of (540*502) dimension. That means the first >column being Date, second one being S&P index and other 500 columns are >S&P companies return data. Now i want to calculate rolling beta of a >stock taking window size of 52. It needs Market index and that stock >returns. I have 500 companies and 10 years weekly data. How can i do >this rolling job? Please suggest or if you can suggest any good >weblink? This is more concrete than your earlier post, making it easier to respond constructively. For this kind of work, you would probably benefit in terms of productivity if you had IML and also SAS/ETS (the time series package). Still, Base SAS can handle this task if it must. In fact, I can see four ways to proceed. 1. Use a two-dimensional array and loops. This is what you have been contemplating. 2. Transpose your data, so that you have 500 rows (for the individual stocks) and 540 columns. In other words, each observation would contain one time series. You would still use arrays, but they would be one-dimensional. The DATA step's automatic looping would do more of the work. 3. Consider that you are using a matrix frame of reference just to express parallel but separate operations on a series of vectors. The DATA step has a different paradigm for that: BY group processing. It rests on a different, more normalized, data structure. To illustrate, let's shrink things down to 3 stocks, 6 time periods, and windows of length 4. Generate some data: data returns; do obs = 1 to 6; drop obs; Month = mdy(obs,1,2006); array stock(*) IBM MS SAS; do i = 1 to dim(stock); drop i; stock(i) = rannor(21) / (15 - 2*i); end; Index = mean(of stock(*)); output; end; format _numeric_ percent7.1 month monyy7.; run; Results: Month IBM MS SAS Index JAN2006 ( 2.6%) (11.7%) ( 9.8%) ( 8.0%) FEB2006 5.3% (18.1%) 19.8% 2.3% MAR2006 ( 2.6%) 1.1% ( 8.9%) ( 3.5%) APR2006 0.4% ( 5.6%) 8.6% 1.1% MAY2006 ( 0.0%) (10.9%) 16.4% 1.8% JUN2006 ( 7.7%) ( 9.8%) 0.8% ( 5.6%) Next, normalize: proc transpose data=returns(drop = index) out=long(rename=(col1=Return) ); by month; run; This produces: Month _NAME_ Return JAN2006 IBM ( 2.6%) JAN2006 MS (11.7%) JAN2006 SAS ( 9.8%) FEB2006 IBM 5.3% FEB2006 MS (18.1%) FEB2006 SAS 19.8% MAR2006 IBM ( 2.6%) MAR2006 MS 1.1% MAR2006 SAS ( 8.9%) APR2006 IBM 0.4% APR2006 MS ( 5.6%) APR2006 SAS 8.6% MAY2006 IBM ( 0.0%) MAY2006 MS (10.9%) MAY2006 SAS 16.4% JUN2006 IBM ( 7.7%) JUN2006 MS ( 9.8%) JUN2006 SAS 0.8% This is the wrong sort order for doing the calculations, but it's right for re-attaching the index series as an additional column: data series; merge long returns(keep = month index); by month; label _name_= 'Stock'; run; Now sort: proc sort data=series; by _name_ month; run; Finally, calculate: data betas(drop = roll : ); set series; by _name_; roll_x + sum(index , -lag4(index ) ); roll_y + sum(return , -lag4(return ) ); roll_xy + sum(return*index , -lag4(return*index ) ); roll_xsq + sum(index**2 , -lag4(index**2 ) ); if _name_ = lag3(_name_) then Beta = (4*roll_xy - roll_x*roll_y) / (4*roll_xsq - roll_x**2 ) ; format beta 6.2; run; This DATA step is pretty terse. Developing and understanding the code requires knowing some subtle stuff about LAG functions and about propagation of missing values. Many DATA steps are simpler, but this one I think reflects the power of the tool. proc print label; run;; End product: Month Stock Return Index Beta JAN2006 IBM ( 2.6%) ( 8.0%) . FEB2006 IBM 5.3% 2.3% . MAR2006 IBM ( 2.6%) ( 3.5%) . APR2006 IBM 0.4% 1.1% 0.65 MAY2006 IBM ( 0.0%) 1.8% 0.96 JUN2006 IBM ( 7.7%) ( 5.6%) 0.97 JAN2006 MS (11.7%) ( 8.0%) . FEB2006 MS (18.1%) 2.3% . MAR2006 MS 1.1% ( 3.5%) . APR2006 MS ( 5.6%) 1.1% -0.41 MAY2006 MS (10.9%) 1.8% -2.68 JUN2006 MS ( 9.8%) ( 5.6%) -0.36 JAN2006 SAS ( 9.8%) ( 8.0%) . FEB2006 SAS 19.8% 2.3% . MAR2006 SAS ( 8.9%) ( 3.5%) . APR2006 SAS 8.6% 1.1% 2.75 MAY2006 SAS 16.4% 1.8% 4.72 JUN2006 SAS 0.8% ( 5.6%) 2.39 4. An alternative, also predicated on a normalized data structure, is PROC SQL. I suspect that at this moment #2 is the best choice for you. However, if you are going to rely on Base SAS you should learn its strengths; that would lead you to #3 and #4. ```

Back to: Top of message | Previous page | Main SAS-L page