Date: Mon, 18 Dec 2006 22:16:24 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Macro Error
In-Reply-To: <1166357018.006145.280190@16g2000cwy.googlegroups.com>
Content-Type: text/plain; format=flowed
Excimer@163.COM wrote back:
>David L Cassell wrote:
> > Oh dear.
> >
> > Is this really your company's process for evaluating outliers?
> >
> > Can you *really* have 50% or more of the data equal to
> > exactly one value? That doesn't sound like anything that would
> > relate to your choice of 4.9303 times the (Q2-Q1), which doesn't
> > look that good anyway, since the standard rules relate to the
> > interquartile range or the H-spread, not the value of Q2-Q1 or
> > Q3-Q2.
> >
> > So I think that you may not need the complexity you have
> > in this code; and you may want to go back and make sure that
> > the values used for the checks are correct.
> >
> > Even if they are correct, I would not use them alone. Dropping
> > high and low values without regard to *other* variables is a
> > poor decision, since you end up losing important records which
> > may merely show strong relationships with other variables.
> >
> > HTH,
> > David
> > --
> > David L. Cassell
> > mathematical statistician
> > Design Pathways
> > 3115 NW Norwood Pl.
> > Corvallis OR 97330
>
>Hi David,
>
>Thanks for your comments.
>Actually, using Q2-Q1 and Q3-Q2 instead of the interquartile range is
>meant to account for skewness.
I don't see that this is warranted. What are your citations for doing
this when a 'normality check' is intended?
And where do the cutoff values come from? This is crucial. The classic
cutoffs based on hinges are built using rules of thumb from work by
Tukey.
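
As a rough sketch of that classic rule, here is the usual 1.5 x IQR
version in Python, standard library only. The function name
tukey_fences and the sample data are made up for illustration, and the
quartiles come from statistics.quantiles, which is close to but not
identical to Tukey's hinges.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences at k times the interquartile range."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up data with one suspicious high value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
low, high = tukey_fences(data)
flagged = [x for x in data if x < low or x > high]
print(low, high, flagged)
```

Note that the fences are built from the whole spread Q3-Q1, so both
tails get a cutoff even when one half of the box is very narrow.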
>The outliers are excluded before normality check.
Okay, that is a bad idea. If the data are normal, you do not need
this 'outlier hacking'. If the data are not normal, then hacking off
tails may completely distort the data and their usefulness. If the
data are normal except for data contamination, I do not see how
hard-wired cutoff points are the solution.
Plus, the whole idea of a 'normality check' puts the skewness
issues in a different light. What is the point? If the data are skewed,
there's no point in doing a normality check. If the data are not
skewed, there's no reason to do this Q3-Q2 vs. Q2-Q1 process.
*AND* if you really have to worry about this kind of skewness,
comparing Q3 to Q1 is just wrong. Q3-Q2 or Q2-Q1 may be 0 when
the other difference is not. Which invalidates the next steps of
your code.
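
A small made-up example of that zero-difference case, in Python. The
data are invented, and the cutoff form Q2 - 4.9303*(Q2-Q1) is only my
assumption about how the multiplier is applied in your code.

```python
from statistics import quantiles

# Invented data: 60% of the records sit at exactly one value.
data = [5] * 6 + [6, 7, 9, 12]

q1, q2, q3 = quantiles(data, n=4, method="inclusive")
lower_spread = q2 - q1   # Q2-Q1
upper_spread = q3 - q2   # Q3-Q2

# Assumed cutoff form; the real code may apply the multiplier differently.
lower_cutoff = q2 - 4.9303 * lower_spread

print("Q2-Q1 =", lower_spread)
print("Q3-Q2 =", upper_spread)
print("lower cutoff =", lower_cutoff)
```

Here Q2-Q1 is 0 while Q3-Q2 is not, so the lower cutoff collapses onto
the median and any value below it would be flagged, no matter what
multiplier is used.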
>The purpose of this program is just classification.
>If the data are normally distributed, the other windows programs will
>control the data based on normal procedures;
>otherwise, the other windows programs will take special care of them.
Okay, now you have made a bad statistical error. You have done an
_a_priori_ screen on the data, thrown out data points, and then done
a *conditional* statistical analysis without adjusting your hypothesis
evaluation for this conditionality. So your p-values and CIs and such
are now messed up.
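
One visible symptom, sketched as a small simulation in Python. This
does not reproduce your cutoffs; it uses the 1.5 x IQR rule as a
stand-in screen, and the seed and sample size are arbitrary. Even when
the data really are normal, cutting the tails first shrinks the
estimated spread that later CIs would be built from.

```python
import random
from statistics import quantiles, stdev

random.seed(20061218)  # arbitrary seed so the run is repeatable

# Simulated data that really are normal: mean 0, standard deviation 1.
sample = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Screen with IQR-based fences, then keep only the points inside them.
q1, _q2, q3 = quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1
kept = [x for x in sample if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]

full_sd = stdev(sample)
kept_sd = stdev(kept)
print(len(sample) - len(kept), "points dropped")
print("sd before screening:", round(full_sd, 3))
print("sd after screening: ", round(kept_sd, 3))
```

The screened standard deviation comes out smaller than the unscreened
one, so anything computed downstream as if no screening had happened
will look more precise than it is.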
Let me re-iterate. This looks like your company is doing The Wrong
Thing. You need to talk to your boss and get him to hire a
statistical consultant to fix this stuff and get better analytical
procedures in place.
>However, the current program is really slow since the Oracle database
>is too large.
The problem may be elsewhere: are you absolutely positive that the
bottleneck is not in the data reads or the data transport?
But yes, the process is long and clunky, and needs to be fixed. I
think it needs to be fixed from the ground up, beginning with a
re-examination of the fundamental rules for your process.
>I would really appreciate it if anyone could suggest some directions
>for improvement.
>
I just wrote some, but you're probably not going to like them.
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330