LISTSERV at the University of Georgia
Date:         Fri, 18 Dec 1998 11:54:27 -0500
Reply-To:     pdorfma@FL6612MAILEX4.UCS.ATT.COM
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         pdorfma@FL6612MAILEX4.UCS.ATT.COM
Subject:      Re: Sort variables within an observation
Comments: To: "peter.flom@NDRI.ORG" <peter.flom@NDRI.ORG>
Content-Type: text/plain; charset="iso-8859-1"

Peter Flom <peter.flom@NDRI.ORG>, in part, wrote:

>I have a data set with several hundred observations. Each observation
>contains (among a lot of other stuff) 5 variable corresponding to the age
>at which the subject first did something. Each of these could be missing,
>or range from 1 to 25.
>The "somethings" are various drugs: Marijuana, cocaine, etc.
>What I would like is, for each person, to get a data set containing which
>drug the person did first, second, third, fourth, or fifth. There are a
>couple complications. Each person could have done any, some, or all of
>the drugs. They could do them in any order. And they could have started
>doing two (or more) at the same age, thus yielding ties.
>I could code this with "brute force", but that would take a couple hundred
>lines of code.
>Does anyone have a simple or elegant solution?

Peter,

A very similar problem was once discussed in the thread "Ordering word tokens" originated by a question posted by Robert Lokhamp on October 10, 1998. It was shown that, even though the task can be converted to making use of PROC SORT, the most efficient solution boils down to exactly what you stated in the subject line, that is, to using an explicitly coded sorting routine to order the variables in every observation. With only 5 variables to sort, there is no need for a sophisticated algorithm; simple sorting schemes, for instance, straight insertion sort, will run just as fast.

First, you have to organize an array, say, D(*), incorporating the variables you need to sort; second, use insertion sort to order them. However, your problem has an additional twist in that you need to output 5 different variables holding the NAMES of the variables whose values have been put in order as a result of sorting. Therefore, in addition to the first array, we shall create two extra arrays.

One extra array, let us call it Z(5) _TEMPORARY_, will contain the enumeration (1, 2, ..., 5) of the variables in the array D(*). In the process of sorting, we shall move the items in the array D (being actually sorted) and the elements in Z providing the enumeration around synchronously.

The second extra array, S(5), will house the 5 new variables to be populated with the variable names from the array D(*) according to the order of first drug usage. The enumerating numbers rearranged along with the elements of D(*) will act as pointers telling us exactly which elements of D(*) the names should come from.

Assume, for simplicity, that we have only 10 observations with some drug data in the range you indicated, and some extra variables standing for "a lot of other stuff". The situation could be simulated using the following DATA step:

   DATA DRUGS (DROP=I J);
      ARRAY D (*) MARI COCA HERO LSD OPIUM;
      DO I=1 TO 10;
         *** Random ages, with non-positive values set to missing;
         DO J=LBOUND(D) TO HBOUND(D);
            D(J) = INT(RANUNI(1)*25) - 1;
            IF D(J) LE 0 THEN D(J) = .;
         END;
         OTHER = CEIL(RANUNI(2)*10);
         STUFF = CEIL(RANUNI(3)*10);
         OUTPUT;
      END;
   RUN;

Printed, the dataset looks like this:

   OBS   MARI   COCA   HERO    LSD   OPIUM   OTHER   STUFF

     1      3     23      8      5      22      10       6
     2     12      .      .     19      12       9       1
     3     22      6      5     16      23       3       7
     4      9     12      6     10      20       7       6
     5     13      8     17     11      22      10       6
     6      6      8     10     15       3       2       9
     7      6     22     21     13       .       2       6
     8      9      3     15      9       2       5       2
     9     13     17      9      .      12       4       1
    10     16     22     10     22      16       2       2

Now, we can translate the plan outlined above into the SAS Language:

   DATA USAGE (KEEP=FIRST SECOND THIRD FOURTH FIFTH);
      ARRAY D(*) MARI COCA HERO LSD OPIUM;
      ARRAY Z(5) _TEMPORARY_;
      ARRAY S(*) $8 FIRST SECOND THIRD FOURTH FIFTH;
      SET DRUGS;
      *** Enumerate variables in D(*);
      DO I=1 TO DIM(Z);
         Z(I) = I;
      END;
      *** Insertion-sort D(*) and move Z-items along;
      DO J=LBOUND(D)+1 TO HBOUND(D);
         TD = D(J);
         TN = Z(J);
         DO I=J-1 TO 1 BY -1;
            IF TD >= D(I) THEN LEAVE;
            D(I+1) = D(I);
            Z(I+1) = Z(I);
         END;
         D(I+1) = TD;
         Z(I+1) = TN;
      END;
      *** Use Z-items as pointers to assign names;
      DO I=1 TO 5;
         N = Z(I);
         CALL VNAME(D(N),S(I));
      END;
   RUN;

Which yields:

   OBS   FIRST   SECOND   THIRD   FOURTH   FIFTH

     1   MARI    LSD      HERO    OPIUM    COCA
     2   COCA    HERO     MARI    OPIUM    LSD
     3   HERO    COCA     LSD     MARI     OPIUM
     4   HERO    MARI     LSD     COCA     OPIUM
     5   COCA    LSD      MARI    HERO     OPIUM
     6   OPIUM   MARI     COCA    HERO     LSD
     7   OPIUM   MARI     LSD     HERO     COCA
     8   OPIUM   COCA     MARI    LSD      HERO
     9   LSD     HERO     OPIUM   MARI     COCA
    10   HERO    MARI     OPIUM   COCA     LSD
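Note that SAS orders a missing value below any number, so in the listing above a drug with a missing age (never used) lands in the leading slots, as in observation 2, where COCA and HERO are missing. If that is not what is wanted, one possible variation is sketched below. It is only a sketch under my own assumptions: the data set name USAGE2 and the counter K are hypothetical names introduced here, and the choice to leave trailing slots blank for unused drugs is one option among several. The sort itself is unchanged.

```sas
DATA USAGE2 (KEEP=FIRST SECOND THIRD FOURTH FIFTH);
   ARRAY D(*) MARI COCA HERO LSD OPIUM;
   ARRAY Z(5) _TEMPORARY_;
   ARRAY S(*) $8 FIRST SECOND THIRD FOURTH FIFTH;
   SET DRUGS;
   /* Enumerate variables in D(*) */
   DO I=1 TO DIM(Z);
      Z(I) = I;
   END;
   /* Same insertion sort as before, moving Z-items along */
   DO J=LBOUND(D)+1 TO HBOUND(D);
      TD = D(J);
      TN = Z(J);
      DO I=J-1 TO 1 BY -1;
         IF TD >= D(I) THEN LEAVE;
         D(I+1) = D(I);
         Z(I+1) = Z(I);
      END;
      D(I+1) = TD;
      Z(I+1) = TN;
   END;
   /* Assumption: a missing age means the drug was never used, */
   /* so skip it and fill the name slots from the left.        */
   /* K (hypothetical) counts the slots filled so far.         */
   K = 0;
   DO I=1 TO DIM(D);
      IF D(I) NE . THEN DO;
         K = K + 1;
         N = Z(I);
         CALL VNAME(D(N),S(K));
      END;
   END;
RUN;
```

With the sample data, observation 2 should then read MARI, OPIUM, LSD followed by two blank slots. Ties (MARI and OPIUM both at 12 there) still come out in the order the variables are listed in D(*), because the insertion sort only moves an item past strictly larger values.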

One friendly warning: DO NOT try to shorten the code by using a nested array reference D(Z(I)) inside the VNAME routine unless you really want to spend a day figuring out why you get "Array subscript out of range at line..." whilst the subscript is absolutely, positively within the range.

Have a happy holiday season!

Kind regards,

Paul

++++++++++++++++++++++++
Paul M. Dorfman
Citibank UCS
Decision Support Systems
Jacksonville, FL
++++++++++++++++++++++++

