Date: Wed, 21 Feb 2001 21:13:00 -0800
Reply-To: Chung-Jung Chung <cjc0121@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Chung-Jung Chung <cjc0121@YAHOO.COM>
Subject: delete a duplicate record
Content-Type: text/plain; charset=us-ascii
I created a HUGE dataset with 604 million records and
22 fields. The size is 55 GBytes. This dataset is
sorted by ID and DATE and indexed by ID.
Unfortunately, there are two identical records in this
dataset.
      Obs    ID            DATE
601008847    A486358342    02/01/2001
601008848    A486358342    02/01/2001
I tried to delete one of the two records with PROC SQL, but
I got the following error messages.
proc sql noprint;
delete from lib.ts
where (id='A486358342') and (mod(_n_,2)=0);
quit;
ERROR: Function MOD requires a numeric expression as
argument 1.
ERROR: The following columns were not found in the
contributing tables: _n_.
Alternatively, I can save a copy of this record and then
delete both records with PROC SQL.
proc sql noprint;
delete from lib.ts
where (id='A486358342');
quit;
Then I can use PROC APPEND to add the saved record back to
the dataset. In that case, however, the physical order
changes, and re-sorting the dataset will take at least
10 hours.
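A minimal sketch of that save, delete, and append workaround is shown here for concreteness. It assumes the two records are identical in all 22 fields, so that SELECT DISTINCT keeps exactly one copy, and WORK.KEEP_ONE is a hypothetical scratch table name. It has not been run against the real dataset.

```sas
/* Sketch only: WORK.KEEP_ONE is a hypothetical scratch table. */
proc sql noprint;
   /* DISTINCT collapses fully identical rows, so one copy of the
      duplicated record survives in the scratch table. */
   create table work.keep_one as
      select distinct *
      from lib.ts
      where id = 'A486358342';

   /* Remove every record for this ID from the big table. */
   delete from lib.ts
      where id = 'A486358342';
quit;

/* Put the de-duplicated record(s) back. They land at the physical
   end of LIB.TS, which is why the sort order is lost. */
proc append base=lib.ts data=work.keep_one;
run;
```

Because the WHERE clause selects on ID only, any other records for the same ID are also copied, deleted, and appended back along with the duplicate.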
Is there an easy and quick way to do this?
Thanks in advance.
Chung