Date: Thu, 19 Apr 2001 11:26:44 -0700
Reply-To: "Terjeson, Mark" <TerjeMW@DSHS.WA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Terjeson, Mark" <TerjeMW@DSHS.WA.GOV>
Subject: Re: duplicate records
Content-Type: text/plain; charset=us-ascii
Hi Karen,
Here's are with and without sorted methods:
data sample;
time_sec=25; infcode='Inon';output;
time_sec=25; infcode='Ineu';output;
time_sec=30; infcode='Inon';output;
time_sec=41; infcode='Ipos';output;
time_sec=44; infcode='Inon';output;
time_sec=44; infcode='Ineu';output;
time_sec=45; infcode='Inon';output;
run;
data sample2;
time_sec=41; infcode='Ipos';output;
time_sec=25; infcode='Inon';output;
time_sec=25; infcode='Ineu';output;
time_sec=30; infcode='Inon';output;
time_sec=45; infcode='Inon';output;
time_sec=44; infcode='Inon';output;
time_sec=44; infcode='Ineu';output;
run;
* if the file is already sorted ;
* then the first. last. works ;
data myfile1;
set sample;
by time_sec;
if last.time_sec;
run;
* if not sorted, but the time_sec ;
* values are grouped together, ;
* and no missing time values. ;
data TrickToKeepLastValue;
run;
data myfile2(keep=holdtime holdcode
rename=(holdtime=time_sec holdcode=infcode));
retain holdtime .
holdcode ' ';
set sample2 TrickToKeepLastValue;
if time_sec ne holdtime
and holdtime ne . then output;
holdtime = time_sec;
holdcode = infcode;
run;
Hope this is helpful,
Mark Terjeson
Washington State Department of Social and Health Services
Division of Research and Data Analysis (RDA)
mailto:terjemw@dshs.wa.gov
-----Original Message-----
From: Karen Olson, Ph.D. [mailto:karen.olson@TCH.HARVARD.EDU]
Sent: Thursday, April 19, 2001 10:46 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: duplicate records
I have some data files from a video coding system that creates records
with two fields of interest: a code and a time. The time value is
hours:minutes:seconds:frames and frames are 30 per second. In the end,
I want a file with the time in seconds. I have most of the conversion
done but am stuck with one problem. Rounding the original time value to
the nearest second sometimes gives me >1 record with the same time. I
want to keep only the LAST of these duplicates. All the tricks I've
tried (e.g. proc sort nodupkeys; by subjid time_sec;) have given me the
FIRST of the duplicates and eliminated the rest. I tried 'descending
time_sec' and various things in data steps (e.g. the lag function) but
do not end up with what I want.
Here's some sample data: time_sec infcode
25 Inon
25 Ineu
30 Inon
41 Ipos
44 Inon
44 Ineu
45 Inon
I get: 25 Inon
30 Inon
41 Ipos
44 Inon
45 Inon
But I want: 25 Ineu
30 Inon
41 Ipos
44 Ineu
45 Inon
Any suggestions? Thanks so much in advance.
Karen Olson
Children's Hospital, Boston
karen.olson@tch.harvard.edu