Date: Mon, 4 Dec 2006 17:03:09 -0500
Reply-To: "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Subject: Re: Data cleaning problem
Content-Type: text/plain; charset="UTF-8"
this is a database design problem
you need two tables:
Id is the link between the two tables
your data entry person is not quite as [insert adj here]
as you think:
"Oh, Degree Granting Institute, like Birth-Date,
hasn't changed for this person!"
you'll eventually get rather better advise than redesigning your database structure.
in the meantime consider:
first. and last.Id processing
retain PrevDegree-Instit '.'
if Degree-Instit ne '.' then PrevDegree-Instit = Degree-Instit;
if Degree-Instit eq '.' then Degree-Instit = PrevDegree-Instit;
Ron Fehd the macro maven CDC Atlanta GA USA RJF2 at cdc dot gov
> From: Pat Moore
> I have a data file which contains information on faculty
> members. The file is updated annually, but not consistently.
> Each record consists of the faculty memberâ€™s ID number, and
> a number of data entries such as rank, degree institution, etc.
> Most of the data is ok, but there are two situations where I
> need to inspect, and, possibly update the data: Here are
> examples of the two cases:
> Case 1:
> OBS Term ID Rank Other data entries
> 1 2001 12345 Assist abc def ghi
> 2 2002 12345 Assist abc def ghi
> 3 2003 12345 Assist abc def ghi
> 4 2004 12345 Assoc abc def ghi
> The chances are, this is correct, and I could confirm this if
> I had a way to get the third and fourth observations into a
> separate dataset, thus reducing the observations I need to
> inspect to the few that have situations like this. I know how
> to get *duplicated* records into a separate dataset, but, for
> this purpose, I need to get two *subsequent* records into a
> dataset, where the later one differs from the earlier.
> Case 2:
> OBS Term ID Rank Degree Institution Other data entries
> 1 2001 45678 Assist Univ of Chicago abc def ghi
> 2 2002 45678 Assist Univ of Chicago abc def ghi
> 3 2003 45678 Assist Univ of Chicago abc def ghi
> 4 2004 45678 Assoc . abc def ghi
> As each term occurs, the data are re-entered, and there are
> inconsistencies from term to term for many faculty members.
> In case 2, the data entry clerk has changed. Clerk one was
> good about entering the degree institution, but clerk 2 is
> careless, and has left it blank. In these cases, I need to
> fill in the blanks, where applicable. If someone graduated
> from the University of Chicago, and taught in 2001 through
> 2003, they still graduated from the University of Chicago
> when they taught in 2004. I want to find those blanks for
> those particular professors, and pick up the preceding
> termâ€™s value, but only for those columns in the data where
> change is unlikely.
> Pat Moore