Date: Thu, 22 Oct 1998 10:14:37 -0400
Reply-To: "Fehd, Ronald J." <rjf2@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Fehd, Ronald J." <rjf2@CDC.GOV>
Subject: info: freq of digits in social security numbers
Content-Type: text/plain
Dear SAS-folk:
This is a frequency table that I put together a decade ago, when I had
access to 40,000 obs of public school teachers in a state. I was using the
social security number as part of an ID and needed to know which columns had
a more or less uniform distribution. As you can see, the digits of the first
triplet and of the middle pair are not uniform, but each digit of the final
quadruplet is fairly so, at least for constructing the ID that I needed.
factoid: The ID I used was the last four digits of the SSN, one digit
representing the first letter of the last name, and a check digit. The digit
representing the letter I assigned by guess and by golly, after reviewing the
Soundex algorithm and some head-scratching over the distribution of last
names in the Atlanta phone book.
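For readers who want to see the shape of such an ID, here is a minimal sketch in Python. It is not the scheme described above: the post gives neither the letter-to-digit mapping nor the check-digit rule, so the letter buckets in LETTER_GROUPS are invented for illustration and the check digit uses the Luhn algorithm as a stand-in. The function names are hypothetical as well.

```python
# Hypothetical letter buckets, one per digit 0-9. This is NOT the mapping
# from the post (which is not given); it only illustrates the idea.
LETTER_GROUPS = ["AB", "CD", "EFG", "HIJ", "KL", "M", "NOPQ", "RS", "TUV", "WXYZ"]


def letter_digit(last_name: str) -> int:
    """Map the first letter of a last name to a single digit (assumed buckets)."""
    first = last_name.strip().upper()[:1]
    if not first.isalpha():
        raise ValueError("last name must start with a letter")
    for digit, group in enumerate(LETTER_GROUPS):
        if first in group:
            return digit
    raise ValueError(f"no bucket for letter {first!r}")


def luhn_check_digit(payload: str) -> int:
    """Luhn check digit for a string of digits (assumed rule, not the author's)."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def make_id(ssn: str, last_name: str) -> str:
    """Build last-four + letter digit + check digit, as a six-character string."""
    digits = "".join(c for c in ssn if c.isdigit())
    if len(digits) != 9:
        raise ValueError("SSN must contain exactly nine digits")
    payload = digits[-4:] + str(letter_digit(last_name))
    return payload + str(luhn_check_digit(payload))


# Made-up SSN and name, for illustration only.
print(make_id("123-45-6789", "Smith"))
```

Whatever rule is used, the check digit is there to catch a mistyped or transposed digit when the ID is keyed in.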
<sigh> A glimmer of creativity, so long ago! :-)
Hm, what else did I learn in that exercise? Oh, yeah, don't use the first
letter of the last name as part of the key. Some people get married and
change their last name -- I haven't yet figured out why they do that, :-\
-- but it did necessitate not only giving them a new ID but also recoding
their old ID in early files. Been There, Done That, Once Was Enough! Next
time I'll use first letter of first name. Now where did I put that table of
frequencies of Last Name? ... <sigh> ... back to the phone book. ... :-p
Best if viewed with monospace font like Courier.
keys: ID SSN SS# identifier social-security
Descending Frequencies (percent) of Digits in Social Security Numbers
N~=40,000
      SS3            SS2               SS4
----------------  ----------  ----------------------
   1     2     3     1     2     1     2     3     4
----  ----  ----  ----  ----  ----  ----  ----  ----
88.0  47.8  11.6  21.8  20.4  10.4  10.6  10.2  10.3
 7.1  38.9  11.3  20.7  20.1  10.1  10.3  10.1  10.3
 1.9   3.3  11.1  15.6  20.0  10.1  10.2  10.1  10.1
 1.5   3.2  10.8  11.2  19.3  10.1  10.2  10.1  10.1
 0.8   1.6  10.8  11.2  18.4  10.1   9.9  10.0  10.0
 0.7   1.5  10.8   6.7   0.5  10.0   9.9  10.0   9.9
 0.0   1.3  10.6   6.4   0.3  10.0   9.8  10.0   9.9
 0.9   1.1  10.5   2.8   0.3   9.8   9.7   9.9   9.8
 0.0   0.7  10.3   2.5   0.3   9.8   9.7   9.8   9.8
 0.0   0.6   2.1   1.1   0.3   9.7   9.7   9.8   9.7
note: the frequencies are in descending order for each digit position:
proc FREQ order=freq;
the first row is _not_ the frequency of zero in each of the digits,
but is the frequency of the most often occurring digit in that column.
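A rough sketch of the same tabulation in Python, for anyone without SAS at hand: count the digits at each of the nine positions and list them from most to least frequent, which is the ordering that order=freq gives. The function name digit_frequencies and the sample values are made up for illustration; this is not the original program.

```python
from collections import Counter


def digit_frequencies(ssns):
    """For each of the 9 digit positions, return (digit, percent) pairs,
    sorted from most to least frequent (ties broken by digit)."""
    # Keep only the digits, and only values that have exactly nine of them.
    cleaned = ["".join(c for c in s if c.isdigit()) for s in ssns]
    cleaned = [s for s in cleaned if len(s) == 9]
    n = len(cleaned)
    table = []
    for pos in range(9):
        counts = Counter(s[pos] for s in cleaned)
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        table.append([(digit, round(100.0 * c / n, 1)) for digit, c in ranked])
    return table


# Invented sample values, not real data.
sample = ["123-45-6789", "123-45-6780", "129-55-0000", "223-45-6789"]
for pos, column in enumerate(digit_frequencies(sample), start=1):
    print(pos, column)
```

Each printed row corresponds to one column of the table above, read top to bottom, with the digit itself kept alongside its percentage.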
Ah, the elegant solutions to getting paper with valuable information off
one's desk:
store it in the public archives!
Ron Fehd the macro maven CDC Atlanta GA