Date: Wed, 23 Apr 1997 09:40:13 CDT
Reply-To: John Blodgett <C1921@UMSLVMA.UMSL.EDU>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: John Blodgett <C1921@UMSLVMA.UMSL.EDU>
Subject: ZIP to County, etc.
This is in response to Steve Des Jardin's posting re how to relate
ZIPs to counties in SAS. The short answer is to use a SAS format code
available via anonymous ftp at
oseda.missouri.edu
in directory pub/sasfmats
as file Szipcnty.sas .
A more detailed summary of the general problem and some tools that we
have developed for helping to deal with it is attached. It was posted
to this list last September, but not much has changed so its worth
posting again.
John Blodgett
Urban Information Center / 211 Lucas Hall
University of Missouri - St. Louis 8001 Natural Bridge Rd.
St. Louis, Mo 63121-4499 Phone: (314) 516-6014/6000 FAX: 516-6274
<===============note originally posted 9-96=================
This posting was prompted by the recent thread regarding the SAS ballot
item about a function to return the city name for ZIP codes. It has
been pointed out that such a function would not be reliable and could
not be kept up to date (ZIPs span cities and are always changing.)
And a followup reply brought up the FAQ about relating ZIPs to counties.
I agree that a SAS function is probably not the best way to handle the
problem. But it certainly would be a useful feature to have a way of
relating the current ZIPs to cities and counties. I think a SAS format
would be the better tool to accomplish this. You can modify and update
a SAS format a lot easier than waiting for SAS I. to update a built-in
function. And you'd have a lot more control over how it worked.
There are a series of FTP'able SAS format codes available that you
can use at least as "starter" code to build your custom table-lookups.
These were created based on the USPS city-state file circa 6/94. We
shall probably update them some time next year. FTP to our site at
oseda.missouri.edu (login as anonymous) and use
cd pub/sasfmats
get Szipcnty.sas <--note upcase "S" to emulate "$". This code
takes 5-digit ZIP arg & retruns FIPS county
code + county name. This is the "primary"
county associated with the ZIP. See article.
get Szipnmus.sas <--A place name to associate with the ZIP. Huge
format. Not rigorous but very practical.
get Szipmetr.sas <--From the county you can get the metro area.
get Szipstab.sas <--Get the state abbreviation.
Now if you have a mailing list with ZIP codes you can drop city and
state (unless you need them for something else) and when you generate
the mailing address the last line becomes:
put zip $zipnmus. +1 zip $zipstab. +1 zip ;
There will always be ZIPs that do not show up in these format tables
but since you have the source code you can look these up and add your
own entries. To quickly see if you have any zips on your dataset that
are not in the format table (this assumes you have your zip code stored
as a $5 variable named "zip") use:
proc sql;
select unique zip from mysas.customer
where put(zip,$zipnmus.) eq zip;
(If the zip code is not found the format returns the argument.)
To select all your customers living in ZIPs that are "mostly" in a
list of counties you can use:
proc sql;
select * from mysas.customer
where put(zip,$zipcnty5.) in ('29019','29133','29211');
or to select customers in ZIPs that are in the Kansas City MSA use:
proc sql;
select * from mysas.customer
where put(zip,$zipmetr4.) eq '3760';
or to add state-abbreviation and primary county codes to a dataset
with just zips:
proc sql;
create view withcods as
select *, put(zip,$zipcnty5.) format=$fipcnty16. as county,
put(zip,$zipstab2.) as state from mysas.customer;
*--the source code for $fipcnty is also available as file
Sfipcnty.sas from the same sasfmats directory mentioned above;
*--and now do a freq report on zips with the county + state ids;
select zip, county, state, count(*) as n
from withcods group by zip;
These are good practical "close enough for business" applications.
If you need something more rigorous that relates ZIP codes to detailed
cities, counties, urbanized areas, census tracts, etc. then you should
check out the 1990 zip equivalency files stored in the census archive at
CIESIN - described in the attached article.
------------------------------------------------------------
ZIP Codes and Counties
======================
Ever had an address file with ZIP codes that you wanted to analyze
based on state and county geography? The question is: given a 5-digit
ZIP code, can I impute the county and, if so, how?
The answer to the question of whether it can be done is a qualified
"yes". Based on a ZIP equivalency file that we built from a census
block to ZIP database provided by the U.S. Census Bureau, we generated
the frequency report in Table 1 below. The cross-tabulation looks at
the frequency of the variable "PCTCO1", defined as the percentage of a
ZIP code's population that is also within its "primary" county. The
primary county is defined as that with which the ZIP shares the largest
portion of its population. The report shows that 87.5% of all ZIPs are
entirely within a single county, and that another 6.5% have between 90
and 99.99% of their populations within a single county. So if you build
a table with ZIP code as the argument and the primary county (including
state) as the result, you will be assigning the correct code every
time for 87.5% of the ZIPs and you'll be correct better than 90% of the
time for another 6.5%.
The columns of the table correspond to a variable we assigned based
on the portion of the each ZIP's population that resided in a metro
area. "Mixed areas" were ZIPs that had at least some population both
inside and outside a metro area. Note that these ZIP codes have the
greatest degree of county-straddling, while those that are entirely
within metro areas are least likely to cross a county boundary.
So what's the answer to the question of can we assign the county
based on ZIP? The table is interesting but a little confusing. What
do I say to my user when they ask me how reliable my "county lookup"
technique is? The answer is 98.2%. We arrived at this by simply
going through our 29,472 ZIP codes and taking the average of PCTCO1
using the 1990 total population of the ZIP as the weighting variable.
This way a ZIP code like 63136 with almost 55,000 people will count
more than ten times more than 63140 with fewer than 500 people. What
it means is that if you use the primary-county table to assign the
county to the ZIP code you should be right about 98.2% of the time.
(Its not really quite that simple, of course; these ZIPs are from 1991
and do not include special non-residential ZIPs -- and ZIP code
boundaries constantly change at the whim of the post office.)
You can access the ZIP-to-primary-county table stored in the form
of a SAS format code via anonymous FTP at ftp.ciesin.org using:
cd /pub/census/usa
cd zipeq/0code
get zipcnty.sas
This format code can be used in a SAS program with the PUT function to
do a table lookup. For example:
attrib county length=$5 label='FIPS county code';
county=put(zip,$zipcnty5.); *<--truncate returned value;
The format also returns the name of the county but it can easily be
truncated to get just the code. Related modules in the same directory
are zipname.sas (returns a geographic name associated with the ZIP),
zipmetro.sas (returns the metro area code of the ZIP) and
zipread.me (background notes on source and usage of these tools.)
TABLE 1
Pct of ZIP Code in Primary County by Metro/Non-metro Status 1
Source: 1990 STF3B ZIP Equivalency Files
TABLE OF PCTCO1 BY MNM
PCTCO1(% ZIP in primary co.) MNM(Metro/non-metro)
Frequency|
Row Pct |In mixed|In metro|Nonmetro|
Col Pct | area | area | area | Total
---------+--------+--------+--------+
0 to 50 | 29 | 23 | 65 | 117
| 24.79 | 19.66 | 55.56 |
| 3.92 | 0.19 | 0.40 |
---------+--------+--------+--------+
50 to 67 | 169 | 168 | 251 | 588
| 28.74 | 28.57 | 42.69 |
| 22.87 | 1.35 | 1.54 |
---------+--------+--------+--------+
67-90 | 371 | 316 | 617 | 1304
| 28.45 | 24.23 | 47.32 |
| 50.20 | 2.54 | 3.78 |
---------+--------+--------+--------+
90-99.9+ | 138 | 643 | 882 | 1663
| 8.30 | 38.67 | 53.04 |
| 18.67 | 5.17 | 5.41 |
---------+--------+--------+--------+
100% | 32 | 11277 | 14491 | 25800
| 0.12 | 43.71 | 56.17 |
| 4.33 | 90.75 | 88.87 |
---------+--------+--------+--------+
Total 739 12427 16306 29472
2.51 42.17 55.33 100.00