Date: Sat, 1 Sep 2001 19:58:33 GMT
Reply-To: Xlr82sas <xlr82sas@AOL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Xlr82sas <xlr82sas@AOL.COM>
Organization: AOL http://www.aol.com
Subject: Re: Cooperative work on 2000 Census data
Thanks. I will order the SSU proceedings.
You might want to visit my site, members.aol.com/xlr82sas/utl.html. I posted
some code that defines column names and labels for the 39 segments used in the
detailed data. The code builds SAS tables automatically from the metadata.
I use the data for customer relationship modelling (CRM). We help companies
understand their customers. We also build predictive models for customer
I have added ZIP+4 and 1990 Census geocodes to the ethnic redistricting Census
data. I have used these counts in several customer behavior models.
Comparing the two models, Census 2000 data provided better predictors than the
1990 Census data, significantly reducing the sum-of-squares error.
I am combining all states into one SAS table per segment; the QC is very time
consuming. (This will reduce the 2080 (40 x 52 states) Census 2000 zip files to
40 SAS tables, about 50Gb total with compress=binary??)
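A rough sketch of the bookkeeping behind that 2080-to-40 reduction, written in
Python rather than SAS. The file naming pattern and the state codes below are
invented for illustration only; the real Census file names are not given here.

```python
from collections import defaultdict

# Hypothetical naming: <state code><5-digit segment number>.zip
# The 52 codes are placeholders, not real state abbreviations.
STATES = [f"s{i:02d}" for i in range(52)]
SEGMENTS = range(1, 41)  # 40 files per state, as counted above


def group_by_segment(filenames):
    """Map segment number -> sorted list of state files for that segment."""
    groups = defaultdict(list)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]
        segment = int(stem[-5:])  # last five digits hold the segment number
        groups[segment].append(name)
    return {seg: sorted(names) for seg, names in groups.items()}


files = [f"{st}{seg:05d}.zip" for st in STATES for seg in SEGMENTS]
groups = group_by_segment(files)
print(len(files), "files ->", len(groups), "tables,",
      len(groups[1]), "state files each")
```

Each entry in `groups` lists the state files that would be stacked into one
table for that segment.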
I hope to end up with a simple star schema of about 6 tables. I expect to
drop about 2/3 of the columns and half the rows?? For instance, three-race
counts such as Asian, African American and Alaskan Native could be dropped
since the coverage is very low.
This may be of help to all who are compiling the detailed data.
Any independent confirmation is welcome.
Segment One (52 states, including Puerto Rico and DC) of the SF1 data has
9,541,315 records. This count should match the Geographic Headers and about 8
other segments. All other segments appear to contain 724,015 records.
For segment one:
I have record counts by state, in a SAS table.
I have column maxima, minima and sums by state and overall, in a SAS table.
The data appear quite clean, so far.
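For anyone who wants to cross-check, the kind of QC summary described above
might be sketched like this. It is Python, not the SAS code I actually ran; the
column names (state, p001, p002) and the rows are toy values, not real SF1
data, and the function names are made up for the example.

```python
from collections import defaultdict


def summarize(rows, columns):
    """Return per-state record counts and per-state min/max/sum per column."""
    counts = defaultdict(int)
    stats = defaultdict(dict)
    for row in rows:
        state = row["state"]
        counts[state] += 1
        for col in columns:
            value = row[col]
            s = stats[state].get(col)
            if s is None:
                stats[state][col] = {"min": value, "max": value, "sum": value}
            else:
                s["min"] = min(s["min"], value)
                s["max"] = max(s["max"], value)
                s["sum"] += value
    return dict(counts), dict(stats)


def mismatches(segment_counts, header_counts):
    """List states whose segment record count differs from the header count."""
    states = set(segment_counts) | set(header_counts)
    return sorted(st for st in states
                  if segment_counts.get(st) != header_counts.get(st))


# Toy rows only; not real Census values.
rows = [
    {"state": "AA", "p001": 10, "p002": 4},
    {"state": "AA", "p001": 30, "p002": 6},
    {"state": "BB", "p001": 5, "p002": 5},
]
counts, stats = summarize(rows, ["p001", "p002"])
bad = mismatches(counts, {"AA": 2, "BB": 2})  # hypothetical header counts
print(counts, bad)
```

A state showing up in `bad` would mean its segment file and its Geographic
Header file disagree on the number of records.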
Roger J DeAngelis
XLR82SAS@aol.com ( Accelerate to SAS )