**Date:** Fri, 11 Jun 2004 14:28:26 -0400
**Reply-To:** sashole@bellsouth.net
**Sender:** "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
**From:** "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
**Organization:** Sashole of Florida
**Subject:** Re: Truncating Data series
**In-Reply-To:** <751633cc.0406110827.6e455065@posting.google.com>
**Content-Type:** text/plain; charset="us-ascii"
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Microstructure
> Sent: Friday, June 11, 2004 12:28 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Truncating Data series
>
> Dear All:
> I have a data series of about 3 million observations. I
> want to truncate the series by eliminating the outliers (top
> 1% and bottom 1%). Can you suggest a way to do this?

MS,

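You can kill the unwanted observations in place by using MODIFY with POINT= and the REMOVE statement, provided the data set is sorted by the variable whose extremes you want to trim (as the test data below is):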

    data a ;
      do x = 1 to 3e6 ;
        output ;
      end ;
    run ;

NOTE: The data set WORK.A has 3000000 observations and 1 variables.

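    data a ;
      /* observation numbers of the first 1% and the last 1% */
      do p = 1 to n * .01, n - n * .01 + 1 to n ;
        modify a nobs = n point = p ;  /* NOBS= is set at compile time */
        remove ;
      end ;
      stop ;  /* POINT= never reaches end of file, so stop explicitly */
    run ;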

NOTE: The data set WORK.A has been updated. There were 0 observations rewritten, 0 observations added and 60000 observations deleted.
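As a quick check on the DO-loop bounds, take the test data above, where N = 3,000,000, so that 0.01N = 30,000 is a whole number (other values of N may need rounding, which this sketch does not cover). The two ranges of observation numbers passed to POINT= are:

$$
\{1,\dots,0.01N\} \cup \{N-0.01N+1,\dots,N\} = \{1,\dots,30000\} \cup \{2970001,\dots,3000000\}, \qquad 30000 + 30000 = 60000
$$

which agrees with the 60000 deletions reported in the NOTE.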

Or, if you decide to adopt Toby's approach, it can all be done in a single DATA step:

    data a ;
      set a nobs = n ;
      if n * .01 < _n_ < n - n * .01 + 1 ;
    run ;

NOTE: There were 3000000 observations read from the data set WORK.A.
NOTE: The data set WORK.A has 2940000 observations and 1 variables.
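The subsetting IF keeps exactly the observation numbers that the MODIFY step would delete nothing from. With the same N = 3,000,000 and writing i for _N_:

$$
30000 < i < 2970001 \iff i \in \{30001,\dots,2970000\}, \qquad 2970000 - 30001 + 1 = 2940000
$$

which matches the observation count in the NOTE.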

However, this will run noticeably slower, because reading 3 million records and writing 2,940,000 of them costs more than marking 60,000 records as deleted in place.

Kind regards,
----------------
Paul M. Dorfman
Jacksonville, FL
----------------