LISTSERV at the University of Georgia
Date:   Fri, 6 Aug 1999 09:43:52 -0700
Reply-To:   "Karsten M. Self" <kmself@IX.NETCOM.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Karsten M. Self" <kmself@IX.NETCOM.COM>
Organization:   Self Analysis
Subject:   Re: Real stats on real big data?
Comments:   To: John Whittington <medisci@powernet.com>
Content-Type:   text/plain; charset=us-ascii

John Whittington wrote:
>
> At 11:51 05/08/99 -0700, Karsten M. Self wrote:

<snip>

> >As I'm given to understand many SAS/STAT products, they are essentially
> >based on IML and matrix manipulations. I'm again largely ignorant of
> >the area, but if a procedure requires loading an entire matrix in
> >memory, or exploding it out (my mind goes to the analog of a cartesian
> >join under SQL, I'm under the impression that matrix inversion is
> >similar in scope), then even moderate quantities of data may introduce
> >very high resource demands.

> What I don't know is exactly how SAS deals with rank-based activities such
> as 'non-parametric tests' (e.g. NPAR1WAY), or even the rank-based
> statistics provided by UNIVARIATE. On the face of it, these activities
> appear to require sorts of the entire dataset (or derivations thereof,
> with the same number of obs), and storage of the resultant stored dataset.

I don't even know what NPAR1WAY is or does (I'm a programmer, dammit...), much less how it works.

Univariate, if I remember my basic stats well enough, can generate most of its statistics by computing statistical moments -- single point measures -- for each variable. Certainly this works for mean, n, standard deviation, and measures derived from these. A short array will do for accumulating extreme values (the n largest and smallest values). Mode, median, and percentiles are a different animal AFAIK, though there are probably some algorithmic shortcuts.

My earlier investigation of BASS (Jeff Bass's early PC-based SAS run-alike) showed that most of the algorithms in use in SAS, at least at the time BASS was created, were based on published work. According to the Procedure Manual, p 625, the reference is:

Fisher, R.A. (1973), /Statistical Methods for Research Workers/, 14th Edition, New York: Hafner Publishing Company.

Amazon.com lists this as out-of-print (http://www.amazon.com/exec/obidos/ASIN/0050021702).
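
To make the single-pass idea above concrete, here is a minimal sketch in Python. It is only an illustration, not SAS's actual implementation, which I have not seen. The function name `one_pass_summary`, the parameter `n_extremes`, the running-update formula (Welford's method), and the n-1 divisor for the standard deviation are all my own assumptions. It reads each value once, keeps a handful of running totals, and holds the extremes in two short heaps, so memory use does not grow with the number of observations.

```python
import heapq
import math


def one_pass_summary(values, n_extremes=5):
    """Accumulate n, mean, std and the extreme values in a single pass."""
    n = 0
    mean = 0.0
    m2 = 0.0          # running sum of squared deviations from the mean
    largest = []      # min-heap holding the n_extremes largest values seen
    smallest = []     # min-heap of negated values: the n_extremes smallest

    for x in values:
        # Welford's update: no second pass over the data is needed.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)

        # Keep only a short, fixed-size buffer of extreme values.
        if len(largest) < n_extremes:
            heapq.heappush(largest, x)
        elif x > largest[0]:
            heapq.heapreplace(largest, x)

        if len(smallest) < n_extremes:
            heapq.heappush(smallest, -x)
        elif -x > smallest[0]:
            heapq.heapreplace(smallest, -x)

    # Sample standard deviation (n - 1 divisor, an assumption here).
    std = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return {
        "n": n,
        "mean": mean,
        "std": std,
        "largest": sorted(largest, reverse=True),
        "smallest": sorted(-v for v in smallest),
    }
```

Median and percentiles do not fit this pattern: an exact answer generally needs the values in rank order, which is where the sort (or some approximation) comes in.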

--
Karsten M. Self (kmself@ix.netcom.com)
    What part of "Gestalt" don't you understand?

SAS for Linux: http://www.netcom.com/~kmself/SAS/SAS4Linux.html
Mailing List: body "subscribe sas-linux" mailto:majordomo@Cranfield.ac.uk

