Date: Fri, 8 Mar 1996 23:31:56 -0800
Reply-To: Karsten Self <karsten@NEWAGE1.STANFORD.EDU>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Karsten Self <karsten@NEWAGE1.STANFORD.EDU>
Subject: solution & thanks - SSN distribution & interpretation
A few days behind the times: I've wrapped up at Stanford and am getting ready
for SUGI. BTW -- weather update for Chicago:
Saturday Hi: 28 Lo: 19 Sunny
Sunday Hi: 41 Lo: 26 Sunny
Monday Hi: 48 Lo: 35 Sunny
Tuesday Hi: 46 Lo: 37 Sunny
Wednesday Hi: 51 Lo: 39 Rain
First, thanks to 'Doc' Muhlbaier, Ron (rbcrosic@cbdcom.apgea.army.mil),
Chris Scott, Tom Soeder, Al Stone, Richard Epstein, Andrew Cary, Joel
Achtenberg (not the 'why things are' dude), Matthew Zack, and others, for
your responses.
'Computer-Related Risks' (Peter G. Neumann, moderator of comp.risks
on Usenet; Addison-Wesley, 1995) offers several interesting
observations and anecdotes about SSNs. The book is currently on
unauthorized loan to me by my ex-roommate in Menlo Park. I guess I'm
just a guy who's known to take RISKS.... I also heartily plug the
book, and comp.risks, to everyone out there.
My question was how to generate roughly equal-sized subsets of SSNs
for subsequent processing. Collating order was important, so I had to
select data by the left side rather than the right side of the SSN.
(Several people noted that for random sampling you want to take, say,
all SSNs ending in '30', '45', '66', and '87'. This generates a
sample that is distributed in time and space, whereas sampling from
the beginning of the SSN chooses individuals who were all (when they
received their SSN) in the same national region.)
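For anyone who wants the trailing-digit approach, here is a minimal Python sketch. It assumes SSNs arrive as strings, with or without hyphens. The function names are hypothetical, and the four endings are just the example above.

The last digits come from the serially assigned part of the number. Keeping 4 of the 100 possible endings should therefore give roughly a 4% sample spread across areas. That assumes the final two digits are close to uniformly distributed in your data.

```python
# Hypothetical choice of endings, taken from the example in the text.
SAMPLE_ENDINGS = frozenset({"30", "45", "66", "87"})


def in_sample(ssn: str, endings: frozenset = SAMPLE_ENDINGS) -> bool:
    """True if the SSN's last two digits (part of the serial number) are in `endings`."""
    digits = ssn.replace("-", "")
    return len(digits) == 9 and digits.isdigit() and digits[-2:] in endings


def sample_by_ending(ssns, endings: frozenset = SAMPLE_ENDINGS):
    """Keep only the SSNs selected by `in_sample`, preserving input order."""
    return [s for s in ssns if in_sample(s, endings)]


ids = ["123-45-6730", "123-45-6731", "567-89-0187", "234-56-7866"]
print(sample_by_ending(ids))  # ['123-45-6730', '567-89-0187', '234-56-7866']
```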
Matthew gave me the specific data I was looking for: tertiles and quartiles
of the SSN distribution. Matthew is at CDC and is probably working with
the same types of data I am (was): Medicare.
> In a 5% systematic sample of 1991 Medicare enrollees 65 years old or
> older, the breakdown of the first three digits of the SSN/Railroad
> Retirement number was the following:
>
> Three-digit range    Percent range
>
> <= 173               0.0 - 24.9%
> 174-326              25.0 - 50.0%
> 327-461              50.1 - 75.0%
> >= 462               75.1 - 100.0%
>
> <= 232               0.0 - 33.3%
> 233-425              33.4 - 66.5%
> >= 426               66.6 - 100.0%
>
> Matthew Zack
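Matthew's cutpoints can be turned into subset assignments with a binary search on the area number. This is a hedged Python sketch, not SAS. The names `QUARTILE_UPPER`, `TERTILE_UPPER` and `subset_index` are hypothetical.

The cutpoints describe 1991 Medicare enrollees aged 65 or older. A younger or differently located population would likely need its own breakpoints.

Because each subset is a contiguous range of leading digits, data already sorted by SSN stays in collating order both within and across subsets.

```python
from bisect import bisect_left

# Inclusive upper bounds of each subset, from Matthew Zack's 1991 Medicare sample.
QUARTILE_UPPER = [173, 326, 461]  # <=173, 174-326, 327-461, >=462
TERTILE_UPPER = [232, 425]        # <=232, 233-425, >=426


def area_number(ssn: str) -> int:
    """First three digits of the SSN (the area number) as an integer."""
    return int(ssn.replace("-", "")[:3])


def subset_index(ssn: str, upper_bounds) -> int:
    """0-based subset for this SSN; subsets follow SSN collating order."""
    # bisect_left puts a value equal to a bound into the lower subset,
    # matching the inclusive "<= 173" style of the table.
    return bisect_left(upper_bounds, area_number(ssn))


for ssn in ["173-00-0001", "174-00-0001", "461-99-9999", "462-01-0001"]:
    print(ssn, subset_index(ssn, QUARTILE_UPPER))
```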
Chris and Al both forwarded me a detailed description of how SSNs are
generated.
Structure of Social Security Numbers
by Chris Hibbert
A Social Security Number (SSN) consists of nine digits, commonly written as
three fields separated by hyphens: AAA-GG-SSSS. The first three-digit field
is called the "area number". The central, two-digit field is called the
"group number". The final, four-digit field is called the "serial
number".
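The AAA-GG-SSSS layout maps directly onto a small parser. The Python sketch below is illustrative only; the `SSN` tuple and `parse_ssn` are hypothetical names.

It accepts either the hyphenated form or nine bare digits. The fields are kept as strings so that leading zeros, such as area "001", survive.

```python
import re
from typing import NamedTuple


class SSN(NamedTuple):
    area: str    # AAA, the area number
    group: str   # GG, the group number
    serial: str  # SSSS, the serial number


def parse_ssn(text: str) -> SSN:
    """Split an SSN written as AAA-GG-SSSS (or nine bare digits) into its fields."""
    s = text.strip()
    if re.fullmatch(r"\d{3}-\d{2}-\d{4}", s):
        s = s.replace("-", "")  # hyphens only accepted in the standard positions
    if not re.fullmatch(r"\d{9}", s):
        raise ValueError(f"not in AAA-GG-SSSS form: {text!r}")
    return SSN(area=s[:3], group=s[3:5], serial=s[5:])


print(parse_ssn("123-45-6789"))  # SSN(area='123', group='45', serial='6789')
```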
The process of assigning numbers has been changed at least twice.
Until 1965, only half the group numbers were used. Before 1972,
numbers were assigned by field offices; since 1972, they have all been
assigned by the central office. The order in which numbers were
assigned was changed in the 1972 transition. There may have been
other changes, but it's difficult to get information on how things
used to be done.
Area Numbers
The area numbers are assigned to geographical locations. They were
originally assigned the same way that ZIP codes were later assigned
(in particular, area numbers increase from east to west across the
continental US, as ZIP codes do). Most area numbers were assigned
according to state (or territorial) boundaries, although the series
700-728 was assigned to railroad workers regardless of location (this
series of area numbers was discontinued in 1964 and is no longer used
for new SSNs). Area numbers assigned prior to 1972 are an indication
of the SSA office which originally issued the SSN. Since 1972 the
area number in SSNs corresponds to the residence address given by the
applicant on the application for the SSN.
In many regions the original range of area number assignments was
eventually exhausted as population grew. The original area number
assignments have been augmented as required. All of the original
assignments were less than 585 (except for the 700-728 railroad worker
series mentioned above). Area numbers of "000" have never been
issued.
001-003 NH      400-407 KY      530     NV
004-007 ME      408-415 TN      531-539 WA
008-009 VT      416-424 AL      540-544 OR
010-034 MA      425-428 MS      545-573 CA
035-039 RI      429-432 AR      574     AK
040-049 CT      433-439 LA      575-576 HI
050-134 NY      440-448 OK      577-579 DC
135-158 NJ      449-467 TX      580     VI Virgin Islands
159-211 PA      468-477 MN      581-584 PR Puerto Rico
212-220 MD      478-485 IA      585     NM
221-222 DE      486-500 MO      586     PI Pacific Islands*
223-231 VA      501-502 ND      587-588 MS
232-236 WV      503-504 SD      589-595 FL
237-246 NC      505-508 NE      596-599 PR Puerto Rico
247-251 SC      509-515 KS      600-601 AZ
252-260 GA      516-517 MT      602-626 CA
261-267 FL      518-519 ID      627-645 TX
268-302 OH      520     WY      646-647 UT
303-317 IN      521-524 CO      648-649 NM
318-361 IL      525     NM
362-386 MI      526-527 AZ
387-399 WI      528-529 UT

* Guam, American Samoa, Philippine Islands, Northern Mariana Islands

650-699  unassigned, for future use
700-728  Railroad workers through 1963, then discontinued
729-799  unassigned, for future use
800-999  not valid SSNs. Some sources have claimed that numbers
         above 900 were used when some state programs were converted
         to federal control, but current SSA documents claim no
         numbers above 799 have ever been used.
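The area table can be used as a range lookup. The hedged Python sketch below transcribes only the ranges listed in the table, so any later assignments are missing.

"RR" is a made-up label for the railroad series. `area_location` is a hypothetical name.

As the text notes, an area number before 1972 reflects the issuing office. From 1972 on, it reflects the applicant's address.

```python
from bisect import bisect_right
from typing import Optional

# (first area, last area, code) rows transcribed from the area-number table.
# "RR" is a made-up label for the railroad series; it is not a postal code.
AREA_RANGES = [
    (1, 3, "NH"), (4, 7, "ME"), (8, 9, "VT"), (10, 34, "MA"), (35, 39, "RI"),
    (40, 49, "CT"), (50, 134, "NY"), (135, 158, "NJ"), (159, 211, "PA"),
    (212, 220, "MD"), (221, 222, "DE"), (223, 231, "VA"), (232, 236, "WV"),
    (237, 246, "NC"), (247, 251, "SC"), (252, 260, "GA"), (261, 267, "FL"),
    (268, 302, "OH"), (303, 317, "IN"), (318, 361, "IL"), (362, 386, "MI"),
    (387, 399, "WI"), (400, 407, "KY"), (408, 415, "TN"), (416, 424, "AL"),
    (425, 428, "MS"), (429, 432, "AR"), (433, 439, "LA"), (440, 448, "OK"),
    (449, 467, "TX"), (468, 477, "MN"), (478, 485, "IA"), (486, 500, "MO"),
    (501, 502, "ND"), (503, 504, "SD"), (505, 508, "NE"), (509, 515, "KS"),
    (516, 517, "MT"), (518, 519, "ID"), (520, 520, "WY"), (521, 524, "CO"),
    (525, 525, "NM"), (526, 527, "AZ"), (528, 529, "UT"), (530, 530, "NV"),
    (531, 539, "WA"), (540, 544, "OR"), (545, 573, "CA"), (574, 574, "AK"),
    (575, 576, "HI"), (577, 579, "DC"), (580, 580, "VI"), (581, 584, "PR"),
    (585, 585, "NM"), (586, 586, "PI"), (587, 588, "MS"), (589, 595, "FL"),
    (596, 599, "PR"), (600, 601, "AZ"), (602, 626, "CA"), (627, 645, "TX"),
    (646, 647, "UT"), (648, 649, "NM"), (700, 728, "RR"),
]
_STARTS = [first for first, _, _ in AREA_RANGES]  # ranges are in ascending order


def area_location(area: int) -> Optional[str]:
    """Code for the state or territory of an area number, or None if unassigned."""
    i = bisect_right(_STARTS, area) - 1  # last range starting at or below `area`
    if i >= 0:
        first, last, code = AREA_RANGES[i]
        if first <= area <= last:
            return code
    return None
```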
Group Numbers
The group number is not related to geography but rather to the order
in which SSNs are issued for a particular area. Before 1965, only
half the group numbers were used: odd numbers were used below 10 and
even numbers were used above 9. In 1965 the system was changed so
assignments continued with the low even numbers and the high odd
numbers. So, group numbers for each area number are assigned in the
following order:
1. Odd numbers, 01 to 09
2. Even numbers, 10 to 98
3. Even numbers, 02 to 08
4. Odd numbers, 11 to 99
Group code "00" is never assigned.
In each region, all possible area numbers are assigned with each group
number before using the next group number. This means the group
numbers can be used to find a chronological ordering of SSNs within a
region. When new group numbers are assigned to a state, the old
numbers are usually used up first.
SSA publishes a list every month of the highest group assigned for
each SSN area. For example, if the highest group assigned for a
hypothetical area 999 were 72, then we would know that the number
999-04-1234 is invalid, because the even groups below 10 (02-08)
have not yet been assigned.
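The highest-group test amounts to comparing positions in the issuance order. In this hedged Python sketch, `highest_group` is a plain parameter. In practice it would come from SSA's monthly list, which this code does not fetch. `group_issued` is a hypothetical name.

```python
def group_issue_order():
    """Group numbers in issuance order: odd 01-09, even 10-98, even 02-08, odd 11-99."""
    return (list(range(1, 10, 2)) + list(range(10, 99, 2))
            + list(range(2, 9, 2)) + list(range(11, 100, 2)))


RANK = {group: rank for rank, group in enumerate(group_issue_order())}


def group_issued(group: int, highest_group: int) -> bool:
    """True if `group` is issued no later than `highest_group` for the same area."""
    if highest_group not in RANK:
        raise ValueError(f"not a valid group number: {highest_group}")
    # Group 00 (or anything outside 01-99) is never issued.
    return group in RANK and RANK[group] <= RANK[highest_group]


print(group_issued(4, 72))   # False: even groups 02-08 come after 98
print(group_issued(9, 72))   # True
```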
Serial Numbers
Serial numbers are assigned in chronological order within each area
and group number as the applications are processed. Serial number
"0000" is never used. Before 1972, when number assignment was
transferred from field offices to the central office, serial numbers
may have been assigned in a strange order. (Some sources claim that
2000 and 7000 series numbers were assigned out of order. That no
longer seems to be the case.) Currently, the serial numbers are
assigned in strictly increasing order within each area and group
combination.
Invalid SSNs
Any SSN conforming to one of the following criteria is an invalid number:
1. Any field all zeroes (no field of zeroes is ever assigned).
2. First three digits above 740
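The two criteria translate into a short screening function. This Python sketch only rules numbers out. An SSN that passes may still never have been issued; the area table and the monthly highest-group list are further checks.

`is_invalid_ssn` and its `max_area` parameter are assumptions of this sketch. The parameter defaults to the 740 cutoff in the list above. Note that the area table marks 650-699 and 729-799 as unassigned.

```python
def is_invalid_ssn(ssn: str, max_area: int = 740) -> bool:
    """Apply the two invalid-SSN criteria: an all-zero field, or area above max_area."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        return True  # not a nine-digit number at all
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area == "000" or group == "00" or serial == "0000":
        return True  # criterion 1: a field of all zeroes
    return int(area) > max_area  # criterion 2: area number above the cutoff
```

For example, `is_invalid_ssn("123-00-4567")` is true because of the zero group, while `is_invalid_ssn("123-45-6789")` is false.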
A pamphlet entitled "The Social Security Number" (Pub. No.
05-10633) provides an explanation of the SSN's structure and
the method of assigning and validating Social Security numbers.
This description of the structure of the Social Security Number is
based on messages written by Jerry Crow and Barbara Bennett. The
information has been verified by its correspondence to the SSA's
Program Operations Manual System (POMS) Part 01, Chapter 001,
subchapter 01, which can be found at Federal Depository Libraries.
(SSA Pub. No. 68-0100201.)
---------------------------------------------
Karsten M. Self -- Analytic Programmer
PM Squared, Inc
250 Montgomery St., Suite 810
San Francisco, California 94104
New: KMSelf@ix.netcom.com
Old: Karsten@newage1.Stanford.EDU
What part of gestalt don't you understand?