| Date: | Wed, 8 Jul 2009 02:18:47 GMT |
| Reply-To: | theorbo <reply@TO-GROUP.NFO> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | theorbo <reply@TO-GROUP.NFO> |
| Subject: | Re: Macro Looping - Efficiencies Needed (code included) |
|
| In-Reply-To: | <200907062345.n66Hv5Ev003520@malibu.cc.uga.edu> |
| Content-Type: | text/plain; format=flowed; charset="iso-8859-1";
reply-type=original |
Thank you very much - that worked beautifully (and much faster also). I
will keep your tip in mind about transposing first for future work.
David
"Ya Huang" <ya.huang@AMYLIN.COM> wrote in message
news:200907062345.n66Hv5Ev003520@malibu.cc.uga.edu...
> Many times, an easy solution can be found by transposing the original
> data first:
>
> proc transpose data=shares out=x0;
> by shopper;
> run;
>
> proc sql;
> create table x1 as
> select distinct a.shopper as shopper1,b.shopper as shopper2,sum((a.col1-
> b.col1)**2) as ss
> from a1 a, a1 b
> where a.shopper ^=b.shopper and a._name_=b._name_
> group by a.shopper,b.shopper
> order by a.shopper,b.shopper
> ;
>
> proc transpose data=x1 out=x2 (drop=_name_) prefix=s_ss;
> by shopper1;
> var ss;
> id shopper2;
> run;
>
> proc print;
> run;
>
> shopper1 s_ss2 s_ss3 s_ss4 s_ss5 s_ss6 s_ss7
> s_ss8 s_ss1
>
> 1 0.0100 0.0961 0.8082 0.1117 0.9040 0.2886
> 0.1315 .
> 2 . 0.1661 0.8762 0.1637 0.9860 0.3706
> 0.2115 0.0100
> 3 0.1661 . 0.6251 0.0506 0.6785 0.1271
> 0.0044 0.0961
> 4 0.8762 0.6251 . 0.3561 0.0074 0.6884
> 0.5813 0.8082
> 5 0.1637 0.0506 0.3561 . 0.4069 0.1971
> 0.0494 0.1117
> 6 0.9860 0.6785 0.0074 0.4069 . 0.7574
> 0.6251 0.9040
> 7 0.3706 0.1271 0.6884 0.1971 0.7574 .
> 0.1279 0.2886
> 8 0.2115 0.0044 0.5813 0.0494 0.6251
> 0.1279 . 0.1315
>
> I believe the result about match yours. As far as the mean and median
> you can easily added just like in your code the last step.
>
> Your code took about 6 seconds, mine took less then 1 second real time.
>
>
> On Mon, 6 Jul 2009 21:34:14 GMT, theorbo <reply@TO-GROUP.NFO> wrote:
>
>>Hi, I'm using 9.1.3 via batch processing on OpenVMS.
>>I have purchasing data for 300 shoppers (obs) and
>>some 3000 products (vars). I know the product share %
>>for each shopper (rows sum to 100%).I am creating a matrix that compares
> one shopper's
>>shares to all the others' shares. My process takes
>>an inordinate amount of time (wall-clock and not CPU)
>>and appears to be due to the large amount of
>>buffered IO time.
>>
>>I'm wondering if I'm making the mistake of using
>>macros to do something that a proc could much
>>more efficiently do.
>>
>>I know that the resulting matrix is symmetric
>>so I guess I could do something
>>to halve the processing but I need to do other
>>(similar) things with this data where I compare
>>one observation to all the other
>>observations and my process
>>or looping through each individual comparing
>>it to the others
>>
>>I looked at SASFILE - with the thought that since I
>>have all these one value datasets (cell_&i) that I am
>>appending together into one column, it might help
>>if all that was resident in memory rather than writing
>>them to and from disk. I couldn't figure out
>>how to use SASFILE in this instance or how to keep
>>that in memory (rather than in a dataset in the work
>>directory on disk).
>>
>>Can you offer any suggestions to make this
>>more efficient?
>>
>>Thanks for any help.
>>David
>>
>>
>>
>>Here is some example data with 8 individuals and
>>5 products.
>>
>>*********************************************;
>>*********************************************;
>>DATA shares;
>>INPUT shopper A B C D E;
>>CARDS;
>>1 0.00 0.00 0.00 0.00 0.36
>>2 0.00 0.00 0.00 0.00 0.46
>>3 0.04 0.00 0.03 0.06 0.06
>>4 0.04 0.00 0.00 0.85 0.07
>>5 0.00 0.00 0.00 0.26 0.15
>>6 0.00 0.00 0.00 0.88 0.00
>>7 0.37 0.11 0.00 0.10 0.00
>>8 0.03 0.00 0.00 0.09 0.01
>>;
>>RUN;
>>
>>%LET max_num = 8;
>>
>>%MACRO by_shopper;
>>%DO i = 1 %to &max_num;
>> %LOCAL j;
>> %DO j = 1 %to &max_num;
>>
>> ** LIMIT DATA TO TWO SHOPPERS;
>> DATA for_transposing_&i; SET shares(WHERE=(shopper IN(&i, &j)));
>> RUN;
>>
>> PROC TRANSPOSE DATA=for_transposing_&i(DROP=shopper)
>> OUT=transposed_&i(RENAME=(_NAME_=product_name)) PREFIX=Shopper_;
>> RUN;
>>
>> ** CREATE PSEUDO-DISTANCE MEASURE;
>> DATA transposed_ss_&i(DROP=shopper_1 shopper_2);
>> SET transposed_&i;
>> SS = (shopper_1 - shopper_2)**2;
>> RUN;
>>
>> PROC MEANS DATA=transposed_ss_&i NWAY NOPRINT;
>> VAR SS;
>> OUTPUT OUT=sum_squares_&i
>> SUM=sum_squares;
>> RUN;
>>
>> DATA cell_&i (KEEP=shopper shopper_&i._uss);
>> SET sum_squares_&i;
>> shopper = &j;
>> shopper_&i._uss = sum_squares;
>> RUN;
>>
>> PROC APPEND DATA=cell_&i(KEEP=shopper_&i._uss shopper)
>> BASE=column_&i;
>> RUN;
>>
>> %END;
>>
>> PROC SORT DATA=column_&i; BY shopper; RUN;
>>
>> %IF &i = 1 %THEN %DO;
>> DATA matrix;
>> SET column_&i;
>> RUN;
>> %END;
>>
>> %ELSE %DO;
>> PROC SORT DATA=matrix OUT=matrix_sorted_&i; BY shopper; RUN;
>>
>> DATA matrix;
>> MERGE matrix_sorted_&i
>> column_&i;
>> BY shopper;
>> RUN;
>> %END;
>>
>> PROC DATASETS LIBRARY=WORK; *** DELETE SO THAT YOU DO NOT HIT LIMIT
>> ON
>>FILE VERSION #;
>> DELETE matrix_sorted_&i transposed_&i transposed_ss_&i sum_squares_&i
>>cell_&i column_&i;
>> RUN;
>>
>>%END;
>>%MEND by_shopper;
>>
>>%by_shopper
>>
>>DATA matrix;
>> SET matrix;
>> Mean_SS = MEAN(of Shopper_:);
>> Median_SS = MEDIAN(of Shopper_:);
>> RUN;
>>
>>PROC PRINT DATA=matrix;
>> RUN;
>>
>>
>>
>>
>>
>>_______OUTPUT_______
>>
>>
> shopper_shopper_shopper_shopper_shopper_shopper_shopper_shopper_
> Median_
>>Obs shopper 1_uss 2_uss 3_uss 4_uss 5_uss 6_uss 7_uss
> 8_uss Mean_SS SS
>>
>>1 1 . 0.01 0.31 0.98 0.11 0.90 0.29
> 0.13 0.39 0.28
>>2 2 0.01 . 0.38 1.05 0.16 0.99 0.37
> 0.21 0.45 0.37
>>3 3 0.31 0.38 . 0.63 0.26 0.89 0.03
> 0.19 0.38 0.30
>>4 4 0.98 1.05 0.63 . 0.53 0.18 0.58
> 0.73 0.67 0.62
>>5 5 0.11 0.16 0.26 0.53 . 0.41 0.20
> 0.05 0.25 0.19
>>6 6 0.90 0.99 0.89 0.18 0.41 . 0.76
> 0.63 0.68 0.75
>>7 7 0.29 0.37 0.03 0.58 0.20 0.76 .
> 0.13 0.34 0.28
>>8 8 0.13 0.21 0.19 0.73 0.05 0.63
> 0.13 . 0.30 0.18
|