Date:         Wed, 2 May 2007 20:04:04 -0700
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: Sample data set with about five or more predictor and one
              Response variable
In-Reply-To:  <>
Content-Type: text/plain; format=flowed

sagely replied:
>David L Cassell <davidlcassell@MSN.COM> wrote
>>
>>Wouldn't it be a lot better to build the data sets yourself?
>>
>>SAS is an excellent data generation tool, you know.  And building the
>>data sets yourself ensures that all the features you want will be found in
>>the data, while none of the nightmares you want to avoid will be present...
>
>David is, as usual, entirely correct.
>
>But in some circumstances, it is better to NOT know what features are there.
>
>If one wants to practice some actual data analysis, then one might be
>better served analyzing a data set where one does NOT know all about the
>features.
>
>Is that point an outlier?  hmmmm
>Is the distribution normal?  Well, it's CLOSE to it... close enough?
>
>Why doesn't the model fit?
>
>etc.
>
>Not that I am disagreeing with David.  It just depends on what the original
>poster wanted to do
>
>Peter

I take the point of view that a 'weird' value is only an outlier after the subject-matter experts get through examining it and the QC people have checked it out thoroughly. After all, in lots of data sets it is the outliers which carry the most interesting information, and throwing them out would therefore be bad. Ask any astrophysicist. :-)

David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330

