LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (April 1997, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 23 Apr 1997 09:40:13 CDT
Reply-To:     John Blodgett <C1921@UMSLVMA.UMSL.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         John Blodgett <C1921@UMSLVMA.UMSL.EDU>
Subject:      ZIP to County, etc.
Comments: To: sas-l@uga.UMSL.EDU

This is in response to Steve Des Jardin's posting re how to relate ZIPs to counties in SAS. The short answer is to use a SAS format code available via anonymous ftp at in directory pub/sasfmats as file . A more detailed summary of the general problem and some tools that we have developed for helping to deal with it is attached. It was posted to this list last September, but not much has changed so its worth posting again.

John Blodgett Urban Information Center / 211 Lucas Hall University of Missouri - St. Louis 8001 Natural Bridge Rd. St. Louis, Mo 63121-4499 Phone: (314) 516-6014/6000 FAX: 516-6274

<===============note originally posted 9-96================= This posting was prompted by the recent thread regarding the SAS ballot item about a function to return the city name for ZIP codes. It has been pointed out that such a function would not be reliable and could not be kept up to date (ZIPs span cities and are always changing.) And a followup reply brought up the FAQ about relating ZIPs to counties.

I agree that a SAS function is probably not the best way to handle the problem. But it certainly would be a useful feature to have a way of relating the current ZIPs to cities and counties. I think a SAS format would be the better tool to accomplish this. You can modify and update a SAS format a lot easier than waiting for SAS I. to update a built-in function. And you'd have a lot more control over how it worked.

There are a series of FTP'able SAS format codes available that you can use at least as "starter" code to build your custom table-lookups. These were created based on the USPS city-state file circa 6/94. We shall probably update them some time next year. FTP to our site at (login as anonymous) and use cd pub/sasfmats get <--note upcase "S" to emulate "$". This code takes 5-digit ZIP arg & retruns FIPS county code + county name. This is the "primary" county associated with the ZIP. See article. get <--A place name to associate with the ZIP. Huge format. Not rigorous but very practical. get <--From the county you can get the metro area. get <--Get the state abbreviation. Now if you have a mailing list with ZIP codes you can drop city and state (unless you need them for something else) and when you generate the mailing address the last line becomes: put zip $zipnmus. +1 zip $zipstab. +1 zip ;

There will always be ZIPs that do not show up in these format tables but since you have the source code you can look these up and add your own entries. To quickly see if you have any zips on your dataset that are not in the format table (this assumes you have your zip code stored as a $5 variable named "zip") use: proc sql; select unique zip from mysas.customer where put(zip,$zipnmus.) eq zip; (If the zip code is not found the format returns the argument.) To select all your customers living in ZIPs that are "mostly" in a list of counties you can use: proc sql; select * from mysas.customer where put(zip,$zipcnty5.) in ('29019','29133','29211');

or to select customers in ZIPs that are in the Kansas City MSA use: proc sql; select * from mysas.customer where put(zip,$zipmetr4.) eq '3760';

or to add state-abbreviation and primary county codes to a dataset with just zips: proc sql; create view withcods as select *, put(zip,$zipcnty5.) format=$fipcnty16. as county, put(zip,$zipstab2.) as state from mysas.customer; *--the source code for $fipcnty is also available as file from the same sasfmats directory mentioned above; *--and now do a freq report on zips with the county + state ids; select zip, county, state, count(*) as n from withcods group by zip;

These are good practical "close enough for business" applications. If you need something more rigorous that relates ZIP codes to detailed cities, counties, urbanized areas, census tracts, etc. then you should check out the 1990 zip equivalency files stored in the census archive at CIESIN - described in the attached article.

------------------------------------------------------------ ZIP Codes and Counties ======================

Ever had an address file with ZIP codes that you wanted to analyze based on state and county geography? The question is: given a 5-digit ZIP code, can I impute the county and, if so, how? The answer to the question of whether it can be done is a qualified "yes". Based on a ZIP equivalency file that we built from a census block to ZIP database provided by the U.S. Census Bureau, we generated the frequency report in Table 1 below. The cross-tabulation looks at the frequency of the variable "PCTCO1", defined as the percentage of a ZIP code's population that is also within its "primary" county. The primary county is defined as that with which the ZIP shares the largest portion of its population. The report shows that 87.5% of all ZIPs are entirely within a single county, and that another 6.5% have between 90 and 99.99% of their populations within a single county. So if you build a table with ZIP code as the argument and the primary county (including state) as the result, you will be assigning the correct code every time for 87.5% of the ZIPs and you'll be correct better than 90% of the time for another 6.5%. The columns of the table correspond to a variable we assigned based on the portion of the each ZIP's population that resided in a metro area. "Mixed areas" were ZIPs that had at least some population both inside and outside a metro area. Note that these ZIP codes have the greatest degree of county-straddling, while those that are entirely within metro areas are least likely to cross a county boundary.

So what's the answer to the question of can we assign the county based on ZIP? The table is interesting but a little confusing. What do I say to my user when they ask me how reliable my "county lookup" technique is? The answer is 98.2%. We arrived at this by simply going through our 29,472 ZIP codes and taking the average of PCTCO1 using the 1990 total population of the ZIP as the weighting variable. This way a ZIP code like 63136 with almost 55,000 people will count more than ten times more than 63140 with fewer than 500 people. What it means is that if you use the primary-county table to assign the county to the ZIP code you should be right about 98.2% of the time. (Its not really quite that simple, of course; these ZIPs are from 1991 and do not include special non-residential ZIPs -- and ZIP code boundaries constantly change at the whim of the post office.)

You can access the ZIP-to-primary-county table stored in the form of a SAS format code via anonymous FTP at using: cd /pub/census/usa cd zipeq/0code get This format code can be used in a SAS program with the PUT function to do a table lookup. For example: attrib county length=$5 label='FIPS county code'; county=put(zip,$zipcnty5.); *<--truncate returned value; The format also returns the name of the county but it can easily be truncated to get just the code. Related modules in the same directory are (returns a geographic name associated with the ZIP), (returns the metro area code of the ZIP) and (background notes on source and usage of these tools.)

TABLE 1 Pct of ZIP Code in Primary County by Metro/Non-metro Status 1 Source: 1990 STF3B ZIP Equivalency Files


PCTCO1(% ZIP in primary co.) MNM(Metro/non-metro)

Frequency| Row Pct |In mixed|In metro|Nonmetro| Col Pct | area | area | area | Total ---------+--------+--------+--------+ 0 to 50 | 29 | 23 | 65 | 117 | 24.79 | 19.66 | 55.56 | | 3.92 | 0.19 | 0.40 | ---------+--------+--------+--------+ 50 to 67 | 169 | 168 | 251 | 588 | 28.74 | 28.57 | 42.69 | | 22.87 | 1.35 | 1.54 | ---------+--------+--------+--------+ 67-90 | 371 | 316 | 617 | 1304 | 28.45 | 24.23 | 47.32 | | 50.20 | 2.54 | 3.78 | ---------+--------+--------+--------+ 90-99.9+ | 138 | 643 | 882 | 1663 | 8.30 | 38.67 | 53.04 | | 18.67 | 5.17 | 5.41 | ---------+--------+--------+--------+ 100% | 32 | 11277 | 14491 | 25800 | 0.12 | 43.71 | 56.17 | | 4.33 | 90.75 | 88.87 | ---------+--------+--------+--------+ Total 739 12427 16306 29472 2.51 42.17 55.33 100.00

Back to: Top of message | Previous page | Main SAS-L page