Date: Fri, 20 Jun 2008 11:12:10 -0500
Reply-To: Mary <mlhoward@avalon.net>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mary <mlhoward@AVALON.NET>
Subject: Re: Comparing across respondents
Content-Type: text/plain; charset="iso-8859-1"
Prasad,
It would be interesting to me if you'd set up some of the various approaches to run this weekend (assuming that you are not working!) and report back on how long each one took.
I chose to use PROC Compare because, with small modifications, I can save the actual differences rather than just counts of them (use the ODS CompareDifferences output and append that to a running dataset of results). I can see that being more useful in a database application where there are multiple duplicates of the same key, and where one would want to find out not just the count of the duplicates, but where the records differ. That's one of the reasons I attempted your problem: I have hundreds of lab results for the same blood sample taken at different times, and I want to know whether the data differ between those times, and where. But in these cases I would have few rows (fewer than 5) and many variables (hundreds), so the time factor would not be too great.
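A minimal sketch of that idea, under stated assumptions: the dataset names work.draw1, work.draw2, work.diff_once and work.diff_all and the key variable sample_id are hypothetical, and both inputs are assumed to be sorted by the key. It is not necessarily the exact code used for the earlier reply.

```sas
/* Hypothetical inputs: work.draw1 and work.draw2 hold the same keys   */
/* measured at two different times, both sorted by sample_id.          */

/* Route the differences report to a dataset instead of only printing it. */
ods output CompareDifferences=work.diff_once;

proc compare base=work.draw1 compare=work.draw2;
  id sample_id;          /* match rows on the key rather than by position */
run;

/* Add this comparison's differences to a running dataset of results.  */
/* FORCE lets the append go through if variable lengths differ.        */
proc append base=work.diff_all data=work.diff_once force;
run;
```

Two caveats worth checking before relying on this. When the two datasets show no differences, PROC COMPARE does not produce the CompareDifferences table, so work.diff_once may be missing or left over from an earlier comparison. Also, the ODS dataset holds the report as lines of text rather than one variable per compared column, so it usually needs some parsing afterwards.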
But yes, I thought about using Proc IML, as it would clearly be the fastest just to produce counts of differences. As I just said a little bit ago, Proc IML is a great tool to use as an additional programming language to SAS.
If you could report back on the relative times, that would be great! Welcome to SAS-L, and thanks for posing such an interesting problem, the solution of which actually solves a problem for me as well!
-Mary
----- Original Message -----
From: Prasad Samala
To: SAS-L@LISTSERV.UGA.EDU
Sent: Friday, June 20, 2008 10:48 AM
Subject: Re: Comparing across respondents
Thank you very much. I am very new to the listserv and the response to my
first post is overwhelming. I bet I am going to spend a good amount of time here.
Coming to the problem at hand, now I have 6 ways of arriving at the
solution. I have my own 3 ways and 1 each from Mary, Datanull & Barry.
Thank you, guys.
I am using proc sql in one version (very similar to what Datanull was
suggesting), proc freq in the second one and IML in the third one. I find
all the solutions you provided to be very efficient ways of doing things
(like the one where Mary suggested checking for duplicates). But the
common problem with all of these programs except one is the time it takes to
arrive at the solution, as I am dealing with 5000 respondents here. I came
up with some code in IML, which looks naive compared to your styles, but
gives me a solution within 3 min real time. As I don't have much exposure
to IML and the pitfalls of using it, I would really appreciate it if you could
provide me some feedback on my code (pasted below).
data in;
infile cards;
input id $3. v1 v2 v3;
cards;
101 1 3 3
102 2 2 4
103 2 1 2
104 1 2 1
105 2 3 3
106 1 2 4
;
run;
proc transpose data = in out = in_t;run;
%let resp = 6;
%let var = 3;
%let data = in_t;
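proc iml;
show space;
use &data;
read all into Y;   /* Y: one row per variable, one column per respondent */
iter = 0;
/* fc: one block of &var rows per respondent i; 1 marks a match with j */
fc = shape(0,%eval(&resp*&var),&resp);
/* summer[i,j]: number of variables on which respondents i and j agree */
summer = shape(0,&resp,&resp);
do i = 1 to &resp;
   do j = 1 to &resp;
      do k = 1 to &var;
         if Y[k,i] = Y[k,j] then fc[k+iter,j] = 1;
      end;
   end;
   iter = iter+&var;   /* move to the next respondent's block of rows */
end;
do i = 1 to &resp;
   /* column sums over respondent i's block give the match counts */
   summer[i,] = fc[((i-1)*&var)+1:i*&var,][+,];
end;
create out from summer;
append from summer;
quit;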
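For comparison, here is a hedged sketch of the same count done one variable at a time with whole-matrix comparisons instead of the three nested loops. The names agree, A and out_fast are made up for illustration. It assumes the transposed dataset in_t from above and that two N x N matrices fit in memory, which may be a real constraint with 5000 respondents. It has not been timed against the original.

```sas
proc iml;
use in_t;
read all var _NUM_ into Y;   /* rows = variables, columns = respondents */
close in_t;

nvar  = nrow(Y);
nresp = ncol(Y);
agree = j(nresp, nresp, 0);  /* agree[i,j] = count of matching variables */

do k = 1 to nvar;
   /* A[i,j] holds respondent i's value on variable k, repeated across columns */
   A = repeat(t(Y[k,]), 1, nresp);
   /* (A = t(A)) is 1 where respondents i and j agree on variable k, else 0 */
   agree = agree + (A = t(A));
end;

create out_fast from agree;
append from agree;
close out_fast;
quit;
```

With the six sample respondents above, the diagonal of agree is 3 (each respondent agrees with itself on all three variables), and the entry for respondents 101 and 105 is 2, since they match on v2 and v3 but not on v1. The matrix sizes come from the data here, so the &resp and &var macro variables are not needed.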
Also, a very silly question... the code above works fine, but in the
program editor the words 'do' and 'if' appear in red. Does
anyone have any idea why? It's not hurting the result, but it looks strange!!
Thanks guys.