Date: Thu, 22 Feb 2001 07:31:09 -0700
Reply-To: svpetrus@NEWWESTENERGY.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Petruso Steven V <svpetrus@NEWWESTENERGY.COM>
Subject: Re: delete a duplicate record
Content-Type: text/plain; charset="iso-8859-1"
Or....just delete both rows with an SQL delete, then re-add the row with an
SQL insert. Since the ds is indexed, it shouldn't matter that the physical
order is changed.
Be careful with appending new rows to an indexed dataset, If I remember, you
have to delete the indices first before you can append. And wow, rebuilding
the indices on a 604 million row table, hit enter and go home.
-----Original Message-----
From: Chung-Jung Chung [mailto:cjc0121@YAHOO.COM]
Sent: Wednesday, February 21, 2001 10:13 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: delete a duplicate record
I created a HUGE dataset with 604 million records and
22 fields. The size is 55 GBytes. This dataset is
sorted by ID and DATE and indexed by ID.
Unfortunately, there are two identical records in this
dataset.
Obs ID DATE
601008847 A486358342 02/01/2001
601008848 A486358342 02/01/2001
I tried to delete one record by using proc sql. But I
got the error message.
proc sql noprint;
delete from lib.ts
where (id='A486358342') and (mod(_n_,2)=0);
quit;
ERROR: Function MOD requires a numeric expression as
argument 1.
ERROR: The following columns were not found in the
contributing tables: _n_.
Actually, I can save this record and delete two
records
by proc sql.
proc sql noprint;
delete from lib.ts
where (id='A486358342');
quit;
Then, I can use proc append to append this record to
the dataset. In this case, the physical order is
changed and it will need at least 10 hours to have
this dataset sorted.
Is there any easy and quick way to do this ?
Thanks in advance.
Chung
__________________________________________________
Do You Yahoo!?
Yahoo! Auctions - Buy the things you want at great prices!
http://auctions.yahoo.com/