LISTSERV at the University of Georgia
Date:         Thu, 6 Nov 2008 16:46:12 -0500
Reply-To:     T J <tj_noreply@YAHOO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         T J <tj_noreply@YAHOO.COM>
Subject:      Re: Complex do-looping and/or conditional processing

As mentioned before, you don't have to reinvent the wheel. If you have PROC REG, then you should also have PROC ROBUSTREG. Both are in SAS/STAT.

Here is some code:

proc robustreg data=one method=LTS;
   model ind_var = dep_var / leverage cutoff=3;
   output out=xout outlier=out1 leverage=lev1 sr=sr1;
run;
quit;

which produces output dataset XOUT:

Obs    ind_var    dep_var         sr1    out1    lev1

  1      2.0        7.0       -0.5175      0       0
  2      2.5        9.2       -0.8274      0       0
  3      2.7        8.8        0.0268      0       0
  4      3.1       10.1        0.1305      0       0
  5      3.3       11.6       -0.4673      0       0
  6      3.2        9.9        0.5576      0       0
  7      1.0        4.0       -0.9676      0       0
  8      1.1        6.2       -2.3746      0       0
  9      2.0        5.5        0.6288      0       0
 10      3.0        8.0        1.4610      0       0
 11      3.3        9.9        0.8318      0       0
 12      0.8       18.0      -12.2150      1       1
 13      1.2        4.0       -0.4190      0       0
 14      1.0        2.0        0.5608      0       0
 15      1.8        5.0        0.4624      0       0
 16      3.3        4.8        4.7293      1       0

As you can see, obs #12 and #16 are flagged as outliers for you (out1 = 1). If you read a little on Rousseeuw and LTS (least trimmed squares) for outlier detection, you will see that some of what you want has already been worked out and packaged in the proc.
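If you then want an ordinary fit on the points that were not flagged, something along these lines might do. It is only a sketch: the dataset name CLEAN is made up for illustration, the flags are the ones from the ROBUSTREG call exactly as written above, and the MODEL statement is the one from your PROC REG step.

```sas
/* Keep only the observations ROBUSTREG did not flag (out1 = 0). */
/* CLEAN is a hypothetical dataset name, not from the original.  */
data clean;
   set xout;
   where out1 = 0;
run;

/* Refit ordinary least squares on the retained points, */
/* using the model from the original question.          */
proc reg data=clean;
   model dep_var = ind_var;
run;
quit;
```

With the output shown above, this would drop obs #12 and #16 and keep the other 14 points.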

-TJ

On Thu, 6 Nov 2008 16:11:13 -0500, Ryan Utz <rutz@AL.UMCES.EDU> wrote:

>Wow, thanks for the response. I hope I can make it clear what I'm looking for...
>
>Consider the data set below (a simplified version of what I'm working with).
>There are two variables-one dependent and one independent:
>
>data one; input ind_var dep_var; cards;
>2 7
>2.5 9.2
>2.7 8.8
>3.1 10.1
>3.3 11.6
>3.2 9.9
>1 4
>1.1 6.2
>2 5.5
>3 8
>3.3 9.9
>0.8 18
>1.2 4
>1 2
>1.8 5
>3.3 4.8
>;
>
>When I plot this out using gplot:
>
>proc gplot data=one; plot dep_var*ind_var; run;
>
>one can easily see that there is a quite obvious outlier (probably
>representing an erroneous measurement or incorrectly entered data point) for
>the point (18, 0.8) while a trend is evident in the rest of the data. There
>may be another outlier point (4.8, 3.3), but who knows if I should consider
>excluding it or not? At least the first one is obvious. Using PROC REG I can
>get several useful regression diagnostics. I'm still working out which ones
>to use and the criteria for point elimination, but the code is as follows:
>
>proc reg data=one noprint;
>model dep_var=ind_var; output out=two (keep=dep_var ind_var r cd)
>rstudent=r cookd=cd; run;
>
>'r' and 'cd' are regression diagnostics. For both, the greater the absolute
>value, the more suspect the data point. Say, for instance, that I want to
>exclude any point where 'r' exceeds 2 (a real criteria used by some). I can
>easily do this manually. But once I do so, the regression dynamics shift and
>other points that may be above '2' in the original regression may not do so
>after removing this outlier point. So you see, I'm trying to come up with an
>iterative process until each 'r' value for each point is <2, but I need to
>do it one at a time. I have a number of variable sets to look at and most
>of them have well over 200 points-that's why I'm trying to automate the process.
>
>Any thoughts?

