LISTSERV at the University of Georgia
Date:         Fri, 11 Jun 2004 14:28:26 -0400
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
Organization: Sashole of Florida
Subject:      Re: Truncating Data series
Comments: To: Microstructure <randistan69@HOTMAIL.COM>
In-Reply-To:  <751633cc.0406110827.6e455065@posting.google.com>
Content-Type: text/plain; charset="us-ascii"

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Microstructure
> Sent: Friday, June 11, 2004 12:28 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Truncating Data series
>
> Dear All:
> I have a data series of about 3 million observations. I
> want to truncate the series by eliminating the outliers (top
> 1% and bottom 1%). Can you suggest a way to do this.

MS,

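You can delete the unwanted observations in place with MODIFY and the REMOVE statement. This works by position, so it assumes A is already sorted by the variable that defines the outliers (as it is below, where X runs from 1 to 3e6):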

data a ;
  do x = 1 to 3e6 ;
    output ;
  end ;
run ;

NOTE: The data set WORK.A has 3000000 observations and 1 variables.

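data a ;
  /* NOBS= is set at compile time, so N is known before the loop runs; */
  /* the two DO ranges point at the first and last 1% of positions.    */
  do p = 1 to n * .01, n - n * .01 + 1 to n ;
    modify a nobs = n point = p ;
    remove ;
  end ;
  /* STOP is required: with POINT= there is no end-of-file to end the step. */
  stop ;
run ;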

NOTE: The data set WORK.A has been updated. There were 0 observations rewritten, 0 observations added and 60000 observations deleted.
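If the data are not already in order, a sort has to come first. A minimal sketch, assuming the outliers are defined by the values of a hypothetical variable X in an unsorted data set (the test data below are made up). FLOOR is added here, not in the original step, so the cut count K stays an integer when N is not a multiple of 100. Note that the sort itself reads and rewrites the whole file, so the in-place advantage of REMOVE applies mainly when the data are already sorted.

```sas
/* Hypothetical test data: 3 million unsorted random values of X. */
data a ;
  call streaminit(2004) ;
  do i = 1 to 3e6 ;
    x = rand('normal') ;
    output ;
  end ;
  drop i ;
run ;

/* Put the records in order of X so that position reflects rank. */
proc sort data = a ;
  by x ;
run ;

/* Mark the first and last 1% of records as deleted, in place. */
data a ;
  /* N comes from NOBS=, which is filled in at compile time. */
  /* FLOOR keeps the cut count K an integer for any N.        */
  k = floor(n * .01) ;
  do p = 1 to k, n - k + 1 to n ;
    modify a nobs = n point = p ;
    remove ;
  end ;
  stop ;
run ;
```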

Or, if you decide to adopt Toby's approach, it can all be done in a single DATA step:

data a ;
  set a nobs = n ;
  /* keep only the records strictly inside the two 1% tails, by position */
  if n * .01 < _n_ < n - n * .01 + 1 ;
run ;

NOTE: There were 3000000 observations read from the data set WORK.A.
NOTE: The data set WORK.A has 2940000 observations and 1 variables.

However, it will run noticeably slower, because reading 3,000,000 records and writing 2,940,000 costs more than marking 60,000 records as deleted in place.
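A different technique, trimming by value rather than by position, avoids the sorting requirement altogether: compute the 1st and 99th percentiles and keep the records that fall strictly between them. A minimal sketch, assuming a hypothetical variable X and made-up test data; it uses the PCTLPTS= and PCTLPRE= options of the PROC UNIVARIATE OUTPUT statement with the default percentile definition. It reads the data twice (once for the percentiles, once for the subset), so it is not faster, and with heavily tied values the count removed can differ from exactly 2%.

```sas
/* Hypothetical test data: 3 million random values of X, in no order. */
data a ;
  call streaminit(2004) ;
  do i = 1 to 3e6 ;
    x = rand('normal') ;
    output ;
  end ;
  drop i ;
run ;

/* Compute the 1st and 99th percentiles of X into variables P1 and P99. */
proc univariate data = a noprint ;
  var x ;
  output out = pct pctlpts = 1 99 pctlpre = p ;
run ;

/* Attach the two cutoffs to every record and keep the middle 98%. */
data trimmed ;
  if _n_ = 1 then set pct ;
  set a ;
  if p1 < x < p99 ;
  drop p1 p99 ;
run ;
```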

Kind regards,
----------------
Paul M. Dorfman
Jacksonville, FL
----------------

