Date: Sat, 22 Nov 2003 19:48:59 -0500
Reply-To: John Fleming <johntwrl@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: John Fleming <johntwrl@HOTMAIL.COM>
Subject: Re: Function to select one record in each of multiple subsets of
On Sat, 22 Nov 2003 15:53:16 -0500, Theodore <tsmonk@EUDORAMAIL.COM> wrote:
>Thanks for the promtp help, John. You set up the problem that I have;
>however, there is a wrinkle that I do not see covered in your solution.
>Each record in the 'event' (jobs) file has a large number of variables'
>scores (over 150 variables in my file) and I want to write to the new
>file everything that is on the record with the longest job for employee X.
>It seems that your solution is writitng to the new file only the value
>for the job duratyion variable. If I am right, do you have any
>suggestions on how to deal with the wrinkle in your procedure?
The code as I wrote it will grab every variable from data set A. To keep
only the two variables EMPLOYEE and DURATION, I would have needed to to
use a keep statement, i.e., SET A (KEEP = EMPLOYEE DURATION);
I did introduce a different glitch, one that comes from our records being
ordered by an ID number and a case number. To make my code work more
correctly for the case described in my post, eliminate the KEY variable,
do the sort by EMPLOYEE, and in the last data step use BY EMPLOYEE and
LAST.EMPLOYEE instead of BY KEY and LAST.KEY.
The BY KEY line is simply telling SAS how to group the records. LAST.KEY
(or more correctly LAST.EMPLOYEE) is an instruction telling SAS to take
the last record in each group.
If there were 15 records for employee 00001 and 12 records for employee
00002, LAST.EMPLOYEE would output the fifteenth record for employeed 00001
and the twelth record for employee 00002.
>Thanks in advance.
>John Fleming wrote:
>> On Sat, 22 Nov 2003 10:46:06 -0500, Theodore <tsmonk@EUDORAMAIL.COM>
>>> Does anyone know of a way to compare valuss in a specified field
>>>a subset of records and write to a new file only one record that
>>>satisfies a given criterion (e.g. the longest job duration, among
>>>multiple jobs held by one respondent), and then repeat the process over
>>>a very large number of subsets (e.g. respondents in my example above --
>>>the has records for jobs held by respondents)? I can't find a SAS
>>>function that can do this, so I am seeking confirmation that it cannot
>>>be done in SAS. Thanks.
>> If I understand what you want to do, you have a data set, 'A' with
>> of the form -
>> Employee Duration
>> 00001 365 days
>> 00001 746 days
>> 00001 655 days
>> 00002 498 days
>> 00002 1103 days
>> 00002 965 days
>> For the example, you want just the largest values in a separate data
>> that is, the 746 day record for employee 00001 and the 1103 day record
>> employee 00002.
>> We do stuff like this a lot in our shop.
>> There is probably a better way, but here is how we do it;
>> DATA B;
>> SET A;
>> KEY = EMPLOYEE||DURATION;
>> PROC SORT;
>> BY KEY;
>> DATA C;
>> SET B;
>> BY KEY;
>> IF LAST.KEY;