LISTSERV at the University of Georgia
Date:         Tue, 4 Mar 2008 17:22:34 -0500
Reply-To:     Jerry <greenmt@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Jerry <greenmt@GMAIL.COM>
Subject:      [revisited] Re: a better alternative to collapse data?
Comments: To: Paul Dorfman <sashole@BELLSOUTH.NET>

Paul,

If I want to wrap a "DO UNTIL (EOF);" loop (shown below) around your "do _i_ = 1 by 1 until (last.idindex) ;" loop (see your code below), how can I EASILY reset the values in the PDV to missing at the top of each iteration, especially when the data set has A LOT of variables?

The DO loop I'm tempted to use (because of the potential efficiency improvement) is this:

/****************/
DO UNTIL (EOF);
   SET DSIN END=EOF;
   .......
   OUTPUT;
END;
/****************/

So your code with my revision (shown in capital letters) will look like this:

/****************/
data dsout (keep = idindex cd:) ;
   DO UNTIL (EOF);
      do _i_ = 1 by 1 until (last.idindex) ;
         /**sample data set "dsin" is given at bottom*/
         set dsin END=EOF;
         by idindex ;
         array cd $ 3 cd1 - cd4 ;
         cd = code ;
      end;
      OUTPUT;
   END;
run ;
/****************/

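Apparently, the code above is WRONG: it doesn't reset the values in the PDV to missing at the top of each iteration of the outer loop, so values carry over from one idindex group to the next (with the sample data, idindex 2 would keep cd3 = 985 from idindex 1), and the intended data set is not produced.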

I'm aware that without using the "DO UNTIL (EOF);" loop together with the explicit OUTPUT statement, there will be no problem, as the implicit DATA step loop will automatically reset the values to be missing at the top of each iteration.

So could you or anyone else on the list please offer me some clues as to how to EASILY reset the values in the PDV to missing at the top of each iteration, especially when the data set has A LOT of variables, assuming that the DO loop has to be used?

/*****sample data set*****/
data dsin;
   length idindex 3 code $3;
   input idindex code $;
   datalines;
1 256
1 287
1 985
2 660
2 966
3 811
3 809
3 178
3 271
4 352
4 210
;
run;
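A minimal sketch of one possible way to do the reset, not a definitive answer. It assumes CALL MISSING is available (SAS 9.1 or later) and that only the derived variables cd1-cd4 need resetting, since idindex and code are overwritten by every SET read. The ARRAY statement is moved ahead of the loop so that cd1-cd4 are already defined when CALL MISSING references them.

```sas
data dsout (keep = idindex cd:) ;
   /* declare the array first so cd1-cd4 exist at compile time */
   array cd $ 3 cd1 - cd4 ;
   do until (eof) ;
      /* reset the collapsed variables at the top of each BY group */
      call missing (of cd1 - cd4) ;
      do _i_ = 1 by 1 until (last.idindex) ;
         set dsin end = eof ;
         by idindex ;
         cd = code ;   /* implicit array: assigns to the element indexed by _i_ */
      end ;
      output ;
   end ;
run ;
```

With many derived variables, the same idea would apply to a longer variable list in the CALL MISSING call; variables read directly by SET would not need to be listed.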

Thanks!

Jerry

On Mon, 18 Feb 2008 21:25:45 +0000, Paul Dorfman <sashole@BELLSOUTH.NET> wrote:

>Jerry,
>
>Your code does not have glaring inefficiencies, and its logical structure is certainly the one fitting the task like a glove. Still, it does not hurt to eliminate some unnecessary instructions (namely, the initialization of the counter to 0 and the first.idindex comparison):
>
>data dsout (keep = idindex cd:) ;
>   do _i_ = 1 by 1 until (last.idindex) ;
>      set dsin ;
>      by idindex ;
>      array cd $ 3 cd1 - cd4 ;
>      cd = code ;
>   end;
>run ;
>
>However, the trimming done above is pretty minor, so do not bet your house on an awesome run-time reduction, especially if your file is not way too large (if this is the case and you have a real problem with performance, it may very well lie in the hardware, say, a single smallish/slow disk drive). TRANSPOSE, suggested by Jack, is more concise; however, it tends to be more resource-intensive than the DATA step (for a good reason).
>
>Kind regards
>------------
>Paul Dorfman
>Jax, FL
>------------
>
>-------------- Original message ----------------------
>From: Jerry <greenmt@GMAIL.COM>
>>
>> Hi,
>>
>> I have a data set "dsin" that looks like the one below:
>>
>> /**********/
>> data dsin;
>>    length idindex 3 code $3;
>>    input idindex code $;
>>    datalines;
>> 1 256
>> 1 287
>> 1 985
>> 2 660
>> 2 966
>> 3 811
>> 3 809
>> 3 178
>> 3 271
>> 4 352
>> 4 210
>> ;
>> run;
>>
>> I want to collapse the data above by "idindex" to the desired layout below:
>>
>> idindex  cd_1  cd_2  cd_3  cd_4
>> 1        256   287   985
>> 2        660   966
>> 3        811   809   178   271
>> 4        352   210
>>
>> I have the code below to do so:
>>
>> /****/
>> proc sort data=dsin;
>>    by idindex;
>> run;
>>
>> data dsout (drop=cdcnt code);
>>
>>    /*because for each distinct value of variable "idindex", there are at
>>      most four codes.*/
>>    array cd {*} $3 cd1-cd4;
>>
>>    do until(last.idindex);
>>       set dsin; by idindex;
>>       if first.idindex then cdcnt=0;
>>       cdcnt+1;
>>       cd{cdcnt}=code;
>>    end;
>> run;
>> /****/
>>
>> I wonder if anyone can help me find a better alternative to my approach in
>> terms of efficiency (i.e. minimizing running time).
>>
>> Any input is greatly appreciated.
>>
>> Jerry

