LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2006, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 4 Dec 2006 17:03:09 -0500
Reply-To:     "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Subject:      Re: Data cleaning problem
Content-Type: text/plain; charset="UTF-8"

imo this is a database design problem you need two tables: Id_Degree_Instit Id_Term

Id is the link between the two tables

your data entry person is not quite as [insert adj here] as you think: "Oh, Degree Granting Institute, like Birth-Date, hasn't changed for this person!"

you'll eventually get rather better advise than redesigning your database structure. in the meantime consider: first. and last.Id processing with a retain PrevDegree-Instit '.' if Degree-Instit ne '.' then PrevDegree-Instit = Degree-Instit; else if Degree-Instit eq '.' then Degree-Instit = PrevDegree-Instit;

Ron Fehd the macro maven CDC Atlanta GA USA RJF2 at cdc dot gov

> From: Pat Moore > I have a data file which contains information on faculty > members. The file is updated annually, but not consistently. > Each record consists of the faculty member’s ID number, and > a number of data entries such as rank, degree institution, etc. > > Most of the data is ok, but there are two situations where I > need to inspect, and, possibly update the data: Here are > examples of the two cases: > > Case 1: > OBS Term ID Rank Other data entries > 1 2001 12345 Assist abc def ghi > 2 2002 12345 Assist abc def ghi > 3 2003 12345 Assist abc def ghi > 4 2004 12345 Assoc abc def ghi > > The chances are, this is correct, and I could confirm this if > I had a way to get the third and fourth observations into a > separate dataset, thus reducing the observations I need to > inspect to the few that have situations like this. I know how > to get *duplicated* records into a separate dataset, but, for > this purpose, I need to get two *subsequent* records into a > dataset, where the later one differs from the earlier. > > Case 2: > OBS Term ID Rank Degree Institution Other data entries > 1 2001 45678 Assist Univ of Chicago abc def ghi > 2 2002 45678 Assist Univ of Chicago abc def ghi > 3 2003 45678 Assist Univ of Chicago abc def ghi > 4 2004 45678 Assoc . abc def ghi > > As each term occurs, the data are re-entered, and there are > inconsistencies from term to term for many faculty members. > In case 2, the data entry clerk has changed. Clerk one was > good about entering the degree institution, but clerk 2 is > careless, and has left it blank. In these cases, I need to > fill in the blanks, where applicable. If someone graduated > from the University of Chicago, and taught in 2001 through > 2003, they still graduated from the University of Chicago > when they taught in 2004. I want to find those blanks for > those particular professors, and pick up the preceding > term’s value, but only for those columns in the data where > change is unlikely. > > Thanks, > > Pat Moore > pat.moore@uas.alaska.edu > >


Back to: Top of message | Previous page | Main SAS-L page