LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2011)Back to main SPSSX-L pageJoin or leave SPSSX-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 29 Dec 2011 16:20:02 -0800
Reply-To:     David Marso <david.marso@gmail.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         David Marso <david.marso@gmail.com>
Subject:      Re: Specify criteria for replacing missing data
In-Reply-To:  <4EFC513C020000150074C272@mail.acponline.org>
Content-Type: text/plain; charset=UTF-8

For part 1 consider using LAG within a DO REPEAT structure. For part 2 here is some completely UNTESTED code which will most likely work like a charm (or NOT ;-). *NOTE* I am not doing *ANYTHING* to address the other 250 odd variables. You will want to segregate them and match them back later by Organization YEAR AGE etc so you don't cobber them with incoming data from the replacement cases. --- COMPUTE SCRAMBLE=UNIFORM(1). SORT CASES BY REPLACECODE AGE REGION SCRAMBLE. COMPUTE YOKED=SUM(1, (REPLACECODE EQ LAG(REPLACECODE)) * (AGE EQ LAG(AGE)) * (REGION EQ LAG(REGION)) * LAG(YOKED) ). DO IF REPLACECODE = 2. DO REPEAT COPY=@ORG @YEAR @AGE @REGION @REPLACECODE / SOURCE=ORGANIZATION YEAR AGE REGION REPLACECODE . COMPUTE COPY = SOURCE. END REPEAT. XSAVE OUTFILE "C:\TEMP\REPLACECODE2.SAV" / KEEP AGE REGION SCRAMBLE @ORG @YEAR @AGE @REGION @REPLACECODE. END IF. EXECUTE. SELECT IF REPLACECODE NE 2 AND TOTALCITIES GT 0. MATCH FILES / FILE * / DROP @ORG @YEAR @AGE @REGION @REPLACECODE. SAVE OUTFILE "C:\TEMP\ALLBUTREPLACECODE2.SAV". MATCH FILES / FILE * / FILE "C:\TEMP\REPLACECODE2.SAV" / BY AGE REGION SCRAMBLE . SELECT IF NOT(MISSING(@REPLACECODE)). COMPUTE DATAREPLACEDBY = ORGANIZATION. COMPUTE ORGANIZATION=@ORG. ADD FILES / FILE * / FILE="C:\TEMP\ALLBUTREPLACECODE2.SAV".

Dawn Wiest wrote > > Dear listserv, > I have a dataset with 15, 000 cases (organizations) and 400 variables. > The data were collected over a 10-year period. A single organization can > appear in the dataset for 10 years or for fewer years, depending when it > began to meet the criteria for inclusion. For a number of organizations > (roughly 5% of total), I need to replace data on 150 variables that > appear sequentially in the dataset. The variables are cities in which an > organization has members or affiliates, named City1 to City150 in the > dataset, where 1 = organization has members/affiliates in the city and > 0= organization has no known members/affiliates in the city. > The organizations for which the data need to be replaced with data from > another organization are given a particular code on a variable > “Replacecode”, depending on the method of replacement required. Data > are only replaced if another variable, “TotalCities” is equal to 0. If > TotalCities=0 and Replacecode=1, the data for City1-City150 will be > replaced with data from another occurrence of this organization for the > most proximate year for which TotalCities is not equal to 0. If there is > data for the same number of years prior to the missing year and the same > number of years after the missing year, data should always be replaced > with prior year data. In the table below, Organization A, year 2008, is > an example of an organization that requires this type of replacement. > If Replacecode=2, it means that City data are missing for all > occurrences of the organization in the dataset and that the data for > City1 – City150 should be replaced with data from a similar > organization. For example, in year 2010, organization B in the table > below was 1 year old and based in Region 3. For all organizations like > organization B (Age=1; Region=3) with missing data, data for > City1-City150 should be replaced with data from another organization > that is 1 year old and based in Region 3 (in this example, Organization > H matches the criteria). Ideally, for all cases where Replacecode=2, the > city from which data are copied to replace missing data will be randomly > selected from all cities in the dataset that match the criteria we > specify. > > What is the most efficient way to replace these data according to the > specifications outlined above? > > To give you an idea what the data look like: > > Organization > Year > Age > Region > Replacecode > City1 > City2 > City3 > City4 > City150 > TotalCities > > A > 2003 > 5 > 1 > 0 > 1 > 1 > 0 > 0 > 0 > 2 > > A > 2004 > 6 > 1 > 0 > 1 > 1 > 0 > 0 > 0 > 2 > > A > 2005 > 7 > 1 > 0 > 1 > 1 > 0 > 0 > 0 > 2 > > A > 2006 > 8 > 1 > 0 > 1 > 1 > 0 > 0 > 0 > 2 > > A > 2007 > 9 > 1 > 0 > 1 > 1 > 0 > 0 > 0 > 2 > > A > 2008 > 10 > 1 > 1 > 0 > > A > 2009 > 11 > 1 > 0 > 0 > 0 > 1 > 1 > 0 > 2 > > B > 2010 > 1 > 3 > 2 > 0 > > B > 2011 > 2 > 3 > 2 > 0 > > C > 2011 > 15 > 1 > 0 > 0 > 0 > 1 > 1 > 0 > 2 > > D > 2008 > 33 > 5 > 0 > 1 > 0 > 0 > 0 > 0 > 1 > > D > 2009 > 34 > 5 > 0 > 1 > 1 > 1 > 0 > 0 > 3 > > E > 2002 > 26 > 2 > 0 > 0 > 1 > 1 > 0 > 0 > 2 > > E > 2003 > 27 > 2 > 0 > 0 > 0 > 1 > 1 > 0 > 2 > > E > 2004 > 28 > 2 > 0 > 0 > 0 > 1 > 1 > 0 > 2 > > E > 2005 > 29 > 2 > 0 > 0 > 0 > 1 > 1 > 0 > 2 > > E > 2006 > 30 > 2 > 0 > 1 > 1 > 1 > 1 > 1 > 5 > > E > 2007 > 31 > 2 > 1 > 0 > > E > 2008 > 32 > 2 > 0 > 1 > 1 > 1 > 1 > 1 > 5 > > F > 2004 > 4 > 3 > 0 > 0 > 0 > 1 > 1 > 0 > 2 > > G > 2005 > 15 > 5 > 0 > 1 > 1 > 1 > 1 > 1 > 5 > > G > 2006 > 16 > 5 > 0 > 1 > 1 > 1 > 1 > 1 > 5 > > H > 2011 > 1 > 3 > 0 > 1 > 0 > 0 > 0 > 0 > 1 > > > Thank you for your help. > > Dawn Wiest, Ph.D. > Senior Research Analyst > Research Center - 7th Floor > American College of Physicians > 190 North Independence Mall West Philadelphia, PA 19106-1572 > > DWiest@.acponline > (215) 351-2561 >

-- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Specify-criteria-for-replacing-missing-data-tp5108094p5108945.html Sent from the SPSSX Discussion mailing list archive at Nabble.com.

===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD


Back to: Top of message | Previous page | Main SPSSX-L page