|Date: ||Fri, 6 Aug 1999 09:43:52 -0700|
|Reply-To: ||"Karsten M. Self" <kmself@IX.NETCOM.COM>|
|Sender: ||"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>|
|From: ||"Karsten M. Self" <kmself@IX.NETCOM.COM>|
|Organization: ||Self Analysis|
|Subject: ||Re: Real stats on real big data?|
|Content-Type: ||text/plain; charset=us-ascii|
John Whittington wrote:
> At 11:51 05/08/99 -0700, Karsten M. Self wrote:
> >As I'm given to understand many SAS/STAT products, they are essentially
> >based on IML and matrix manipulations. I'm again largely ignorant of
> >the area, but if a procedure requires loading an entire matrix in
> >memory, or exploding it out (my mind goes to the analog of a cartesian
> >join under SQL, I'm under the impression that matrix inversion is
> >similar in scope), then even moderate quantities of data may introduce
> >very high resource demands.
> What I don't know is exactly how SAS deals with rank-based activities such
> as 'non-parametric tests' (e.g. NPAR1WAY), or even the rank-based
> statistics provided by UNIVARIATE. On the face of it, these activities
> appear to require sorts of the entire dataset (or derivations thereof,
> with the same number of obs), and storage of the resultant sorted dataset.
I don't even know what NPAR1WAY is or does (I'm a programmer,
dammit...), much less how it works.
UNIVARIATE, if I remember my basic stats well enough, can generate most
of its statistics by computing statistical moments -- single point
measures -- for each variable. Certainly this works for mean,
n, standard deviation, and measures derived from these. A short array
will do for accumulating extreme values (the n largest and smallest
values). Mode, median, and percentiles are a different animal AFAIK,
though there are probably some algorithmic shortcuts. My earlier
investigation of BASS (Jeff Bass's early PC-based SAS run-alike) showed
that most of the algorithms in use in SAS, at least at the time BASS was
created, were based on published work. According to the Procedure
Manual, p. 625, the reference is:
Fisher, R.A. (1973), /Statistical Methods for Research Workers/,
14th Edition, New York: Hafner Publishing Company.
Amazon.com lists this as out of print.
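
To illustrate the point above about mean, n, standard deviation, and the
short array of extreme values: all of these can be accumulated in one
pass over the data with a fixed amount of memory, regardless of the
number of observations. What follows is a minimal sketch in Python, not
SAS, and it makes no claim about how PROC UNIVARIATE is actually
implemented. The class name RunningSummary and the parameter k are
hypothetical, and the update rule used is Welford's method, chosen here
only because it is a well-known numerically stable way to do this.

```python
import heapq
import math


class RunningSummary:
    """One-pass accumulator for n, mean, standard deviation, and k extremes.

    Hypothetical illustration only; not SAS's algorithm.
    """

    def __init__(self, k=5):
        self.k = k
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0        # running sum of squared deviations from the mean
        self._largest = []    # min-heap holding the k largest values seen
        self._smallest = []   # min-heap of negated values: the k smallest seen

    def add(self, x):
        # Welford's update: revise the mean and squared deviations in place.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

        # Keep only the k largest values; the heap root is the smallest of them.
        if len(self._largest) < self.k:
            heapq.heappush(self._largest, x)
        elif x > self._largest[0]:
            heapq.heapreplace(self._largest, x)

        # Keep only the k smallest values, stored negated so that the heap
        # root corresponds to the largest of the values being kept.
        if len(self._smallest) < self.k:
            heapq.heappush(self._smallest, -x)
        elif -x > self._smallest[0]:
            heapq.heapreplace(self._smallest, -x)

    @property
    def std(self):
        # Sample standard deviation (divisor n - 1); undefined for n < 2.
        if self.n < 2:
            return float("nan")
        return math.sqrt(self._m2 / (self.n - 1))

    def largest(self):
        return sorted(self._largest, reverse=True)

    def smallest(self):
        return sorted(-v for v in self._smallest)


summary = RunningSummary(k=2)
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    summary.add(value)
print(summary.n, summary.mean, summary.std)
print(summary.largest(), summary.smallest())
```

Memory use here depends only on k, not on the number of observations.
Median, mode, and percentiles do not fit this pattern: an exact answer
needs the sorted (or at least counted) values, which is why they are a
different animal.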
Karsten M. Self (firstname.lastname@example.org)
What part of "Gestalt" don't you understand?
SAS for Linux: http://www.netcom.com/~kmself/SAS/SAS4Linux.html
Mailing List: body "subscribe sas-linux"