Date:         Mon, 9 Aug 1999 17:32:54 +0100
Reply-To:     John Whittington <medisci@POWERNET.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         John Whittington <medisci@POWERNET.COM>
Subject:      Re: Real stats on real big data?
Comments: To: "Berryhill, Tim" <TWB2@PGE.COM>
In-Reply-To:  <>
Content-Type: text/plain; charset="us-ascii"

At 18:00 06/08/99 -0700, Berryhill, Tim wrote:

>In the mapping example below, it might have been sufficient to sample one
>point in each bin. Drawing a 1% sample from California might give you only
>people in Los Angeles. You could easily miss entire counties.

Tim, stratified samples are obviously fine if one is interested in looking at some sort of 'characteristics' of what is in each 'bin', but the approach clearly can't be used if the purpose of the exercise is to estimate the *number* of items in each 'bin' - which, as far as I can make out, was what was wanted in this example.

I think the point you make above illustrates why any sort of sampling/data-reduction method is probably inappropriate for the mapping exercise - since, unless one seeks only very 'coarse' information (i.e. very large bins), one will invariably 'chop off the bottom of the data' - and, as you say, could miss whole towns/counties.
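
To put rough numbers on both points, here is a minimal Python sketch. The bin names and sizes are purely hypothetical (they are not taken from the data under discussion), and it assumes the 1% sample is drawn by selecting each item independently with probability 0.01, which is only one way such a sample might be taken.

```python
# Hypothetical bin sizes, chosen only for illustration.
bin_sizes = {"large_bin": 10_000, "medium_bin": 500, "small_bin": 50}

SAMPLE_FRACTION = 0.01  # the 1% sample; independent selection is assumed


def prob_bin_missed(n_items, fraction):
    """Chance that independent sampling at `fraction` picks nothing
    from a bin holding n_items items."""
    return (1.0 - fraction) ** n_items


def one_per_bin_counts(sizes):
    """Sampling exactly one point per bin gives a count of 1 for every
    bin, whatever the bin's true size."""
    return {name: 1 for name in sizes}


miss_probs = {name: prob_bin_missed(n, SAMPLE_FRACTION)
              for name, n in bin_sizes.items()}
stratified = one_per_bin_counts(bin_sizes)

for name, n in bin_sizes.items():
    print(f"{name:11s} true size={n:>6,d}  "
          f"P(missed by 1% sample)={miss_probs[name]:.3g}  "
          f"one-per-bin count={stratified[name]}")
```

Under these assumptions the 50-item bin is more likely than not to vanish from a 1% sample, while the one-per-bin sample reports the same count for every bin and so says nothing about how many items each bin holds.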

Kind Regards

----------------------------------------------------------------
Dr John Whittington,       Voice:    +44 (0) 1296 730225
Mediscience Services       Fax:      +44 (0) 1296 738893
Twyford Manor, Twyford,    E-mail:
Buckingham  MK18 4EL, UK
----------------------------------------------------------------

