Date:         Fri, 11 Jun 2004 14:28:26 -0400
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
Organization: Sashole of Florida
Subject:      Re: Truncating Data series
In-Reply-To:  <751633cc.0406110827.6e455065@posting.google.com>
Content-Type: text/plain; charset="us-ascii"
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Microstructure
> Sent: Friday, June 11, 2004 12:28 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Truncating Data series
>
> Dear All:
> I have a data series of about 3 million observations. I
> want to truncate the series by eliminating the outliers (top
> 1% and bottom 1%). Can you suggest a way to do this?
MS,
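You can directly kill the unwanted observations in place, using MODIFY with POINT= and the REMOVE statement. This trims by observation number, so it assumes the file is already sorted by the variable in question, as the test data below are: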
data a ;
  do x = 1 to 3e6 ;
    output ;
  end ;
run ;
NOTE: The data set WORK.A has 3000000 observations and 1 variables.
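data a ;
  /* n (the NOBS= value) is set at compile time, so it can drive the loop */
  do p = 1 to n * .01, n - n * .01 + 1 to n ; /* first 1% and last 1% */
    modify a nobs = n point = p ; /* fetch observation number p in place */
    remove ;                      /* mark it as deleted */
  end ;
  stop ; /* POINT= never reaches end of file, so stop explicitly */
run ;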
NOTE: The data set WORK.A has been updated. There were 0 observations rewritten, 0 observations added and 60000 observations deleted.
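The step above relies on two things that hold for the test data: the file is in order of the variable being trimmed, and n * .01 is a whole number (30000 for 3e6 records). A sketch for unsorted data of arbitrary size, where the data set B and its variable X are hypothetical stand-ins for the real ones: sort first, then round the tail size down so every POINT= value is an integer.

```sas
/* Hypothetical unsorted test data: 3 million uniform values of X */
data b ;
  call streaminit(2004) ;
  do i = 1 to 3e6 ;
    x = rand('uniform') ;
    output ;
  end ;
  drop i ;
run ;

/* Position-based trimming only matches value-based trimming on sorted data */
proc sort data = b ;
  by x ;
run ;

data b ;
  /* n comes from NOBS= at compile time; FLOOR keeps the POINT= values whole */
  k = floor(n * .01) ;
  do p = 1 to k, n - k + 1 to n ;
    modify b nobs = n point = p ;
    remove ;
  end ;
  stop ;
run ;
```

Sorting 3 million records costs far more than the REMOVE step itself, so the in-place speed advantage matters most when the file is already in order.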
Or, if you decide to adopt Toby's approach, it can all be done in the same DATA step:
data a ;
  set a nobs = n ;
  /* keep only observations after the first 1% and before the last 1% */
  if n * .01 < _n_ < n - n * .01 + 1 ;
run ;
NOTE: There were 3000000 observations read from the data set WORK.A.
NOTE: The data set WORK.A has 2940000 observations and 1 variables.
However, it will run noticeably slower, because reading 3 million records and writing 2,940,000 of them costs more than marking 60,000 records as deleted.
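Both steps above trim by observation number. If sorting is not an option, a different technique is to compute the 1st and 99th percentiles and subset by value. A minimal sketch, assuming PROC UNIVARIATE's default percentile definition; B, X, CUT, TRIMMED, LO and HI are all made-up names:

```sas
/* Hypothetical unsorted test data: 3 million uniform values of X */
data b ;
  call streaminit(2004) ;
  do i = 1 to 3e6 ;
    x = rand('uniform') ;
    output ;
  end ;
  drop i ;
run ;

/* One-row data set holding the 1st (LO) and 99th (HI) percentiles of X */
proc univariate data = b noprint ;
  var x ;
  output out = cut p1 = lo p99 = hi ;
run ;

/* Keep only values strictly between the two cut points */
data trimmed ;
  if _n_ = 1 then set cut ; /* SET variables are retained across iterations */
  set b ;
  if lo < x < hi ;
  drop lo hi ;
run ;
```

Unlike the positional steps, this cuts on values, so ties at a cut point can make the number of records removed differ slightly from exactly 2%.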
Kind regards,
----------------
Paul M. Dorfman
Jacksonville, FL
----------------