```
Date: Thu, 22 Oct 1998 10:14:37 -0400
Sender: "SAS(r) Discussion"
From: "Fehd, Ronald J."
Subject: info: freq of digits in social security numbers

Dear SAS-folk:

This is a frequency table that I put together a decade ago, when I had access to 40,000 obs of public school teachers in a state. I was using the social security number as part of an ID and needed to know which columns had a more or less uniform distribution. As you can see, the digits of the first triplet and of the middle duplet are not uniform, but each of the digits of the final quadruple is fairly so, at least for constructing the ID that I needed.

factoid: The ID I used was the last four digits of the SSN, one digit representing the first letter of the last name, and a check digit. I chose the digit representing the letter by-guess and by-golly, after reviewing the Soundex algorithm and some head-scratching over the distribution of last names in the Atlanta phone book. A glimmer of creativity, so long ago! :-)

Hm, what else did I learn in that exercise? Oh, yeah: don't use the first letter of the last name as part of the key. Some people get married and change their last name -- I haven't yet figured out why they do that :-\ -- but it did necessitate not only giving them a new ID but also recoding their old ID in earlier files. Been There, Done That, Once Was Enough! Next time I'll use the first letter of the first name.

Now where did I put that table of frequencies of last names? ... ... back to the phone book. ... :-p

Best if viewed in a monospace font like Courier.
keys: ID SSN SS# identifier social-security

Descending Frequencies (percent) of Digits in Social Security Numbers, N~=40,000

     SS3           SS2             SS4
--------------  ---------  -------------------
   1    2    3     1    2     1    2    3    4
---- ---- ----  ---- ----  ---- ---- ---- ----
88.0 47.8 11.6  21.8 20.4  10.4 10.6 10.2 10.3
 7.1 38.9 11.3  20.7 20.1  10.1 10.3 10.1 10.3
 1.9  3.3 11.1  15.6 20.0  10.1 10.2 10.1 10.1
 1.5  3.2 10.8  11.2 19.3  10.1 10.2 10.1 10.1
 0.8  1.6 10.8  11.2 18.4  10.1  9.9 10.0 10.0
 0.7  1.5 10.8   6.7  0.5  10.0  9.9 10.0  9.9
 0.0  1.3 10.6   6.4  0.3  10.0  9.8 10.0  9.9
 0.9  1.1 10.5   2.8  0.3   9.8  9.7  9.9  9.8
 0.0  0.7 10.3   2.5  0.3   9.8  9.7  9.8  9.8
 0.0  0.6  2.1   1.1  0.3   9.7  9.7  9.8  9.7

note: the frequencies are in descending order for each digit (proc FREQ order=freq;), so the first row is _not_ the frequency of zero in each of the digits, but the frequency of the most often occurring digit in that column.

Ah, the elegant solution to getting paper with valuable information off one's desk: store it in the public archives!

Ron Fehd  the macro maven
CDC Atlanta GA
```
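The table above came from PROC FREQ with order=freq. The sketch below computes the same kind of table, written in Python rather than SAS and not the author's program. For each digit position of a nine-digit SSN, it gives the percentage of each digit 0-9, sorted from most to least frequent. It assumes the SSNs are already clean nine-digit strings with no dashes. The function names `digit_frequency_table` and `print_table` are hypothetical, and breaking ties by digit is an arbitrary choice.

```python
from collections import Counter

# Column groups as in the table header: first three digits (SS3),
# middle two (SS2), last four (SS4), given as 0-based string slices.
SEGMENTS = [("SS3", 0, 3), ("SS2", 3, 5), ("SS4", 5, 9)]


def digit_frequency_table(ssns):
    """Map (segment, position) -> [(digit, percent), ...], most frequent first.

    Assumes every SSN is a 9-character string of digits with no dashes.
    Ties are broken by the digit itself (an arbitrary choice).
    """
    if not ssns:
        raise ValueError("need at least one SSN")
    n = len(ssns)
    table = {}
    for name, start, end in SEGMENTS:
        for pos, col in enumerate(range(start, end), start=1):
            # Seed all ten digits with 0 so every column has ten rows.
            counts = Counter({d: 0 for d in "0123456789"})
            counts.update(s[col] for s in ssns)
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            table[(name, pos)] = [(d, 100.0 * c / n) for d, c in ranked]
    return table


def print_table(table):
    """Print ten rank rows; row k holds the k-th largest percent per column."""
    keys = [(name, pos) for name, start, end in SEGMENTS
            for pos in range(1, end - start + 1)]
    print(" ".join(f"{name}-{pos}" for name, pos in keys))
    for rank in range(10):
        print(" ".join(f"{table[key][rank][1]:5.1f}" for key in keys))
```

Called as `print_table(digit_frequency_table(ssns))` on a list of cleaned SSN strings, it prints ten rows in the same rank order as the table. Row 1 holds the share of the most frequent digit in each column, whichever digit that happens to be.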

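The ID in the post has three parts: the last four SSN digits, one digit standing for a name initial, and a check digit. The post does not give the author's letter-to-digit mapping, which he tuned by hand against the phone book, or his check-digit method. This Python sketch is therefore hypothetical on both counts:

* The letter digit borrows the Soundex consonant groups and maps vowels, H, W and Y to 0.
* The check digit uses the Luhn algorithm.

It follows the author's afterthought and takes the initial of the first name. The helper names are illustrative, not from the post.

```python
# Hypothetical letter digit: Soundex consonant groups; vowels and H, W, Y -> 0.
SOUNDEX_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3",
                  "L": "4", "MN": "5", "R": "6"}
LETTER_DIGIT = {ch: d for letters, d in SOUNDEX_GROUPS.items() for ch in letters}


def letter_digit(initial: str) -> str:
    """One digit for a name initial (assumed mapping, not the author's)."""
    return LETTER_DIGIT.get(initial.upper(), "0")


def luhn_check_digit(payload: str) -> str:
    """Luhn check digit to append to a string of digits."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled because
    # the check digit will be appended to its right.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_is_valid(number: str) -> bool:
    """True if the last digit of `number` is a correct Luhn check digit."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def make_id(ssn: str, first_name: str) -> str:
    """Last four SSN digits + letter digit + check digit (six digits)."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError("expected a 9-digit SSN")
    payload = digits[-4:] + letter_digit(first_name[:1])
    return payload + luhn_check_digit(payload)
```

For example, `make_id("123-45-6789", "Ron")` returns `"678961"`:

* `"6789"` comes from the SSN.
* `"6"` stands for R under the Soundex grouping.
* `"1"` is the Luhn check digit.

A Luhn digit catches any single mistyped digit and most swaps of adjacent digits. This mapping has not been checked for how evenly it spreads real names, which was the point of the author's phone-book exercise.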
