LISTSERV at the University of Georgia
Date:         Wed, 18 Jul 2007 12:29:44 -0500
Reply-To:     "Peck, Jon" <peck@spss.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Peck, Jon" <peck@spss.com>
Subject:      Re: recoding
In-Reply-To:  A<469E4A32.1040306@brandeis.edu>
Content-Type: text/plain; charset="UTF-8"

If you can use programmability, this kind of task lends itself well to automation. There are several tools available in Python or in SPSS Developer Central modules (www.spss.com/devcentral) that can help. I'll note just a few ideas here.

- Regular expressions to pick up patterns of the common misspellings.
- If you have a list of all the valid drug names, try spell correction based on
  levenshteindistance (calculates the similarity between two strings) or even
  soundex (calculates the soundex value of a string, a rough phonetic encoding).
- Simple Python code to look through lists and pick a new value.
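
The spell-correction idea above can be sketched in plain Python. This is only an illustration under stated assumptions: it does not call the Developer Central modules, the edit-distance function is a standalone textbook version written here, and the names `VALID_DRUGS`, `correct`, and the `max_distance` cutoff are hypothetical choices, not part of any SPSS API.

```python
import re

def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

# Hypothetical list of valid names; in practice this would be the full drug list.
VALID_DRUGS = ["aspirin", "zoloft"]

def correct(raw, valid=VALID_DRUGS, max_distance=2):
    """Return the closest valid name, or None if nothing is close enough."""
    # Regular expression step: lowercase and drop anything that is not a letter.
    cleaned = re.sub(r"[^a-z]", "", raw.lower())
    best = min(valid, key=lambda name: levenshtein(cleaned, name))
    return best if levenshtein(cleaned, best) <= max_distance else None

for answer in ["Asprin", "asperin", "Aasperin", "zolofft", "ibuprofen"]:
    print(answer, "->", correct(answer))
```

Answers that come back as None are the ones still worth checking by hand; the cutoff of 2 is arbitrary and would need tuning against real data.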

The trans.py and extendedTransforms.py modules help you integrate this approach into SPSS transformations.

Putting programmability aside, you could check the values by applying the SPSS Data Validation module, which lets you define the valid answers.

HTH, Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Graham Wright
Sent: Wednesday, July 18, 2007 12:13 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: [SPSSX-L] recoding

We've done something similar in the following way (which may or may not be appropriate in your case): Sort the data by the string variable in question, so that all the valid responses are at the top, then save a version of the dataset that only has that variable and some sort of unique ID variable. Then open the new file, make a new variable in the blank column next to the string variable, and go down the list manually in the "data view" screen, putting in a number equal to what you want the new category to be: next to "Asprin" and "asperin" put 1's, next to "Zoloft" and "zolofft" put 2's, or whatever. Then get rid of the column with the string in it and merge this file (with just the ID var and the new column) back into the original dataset (using "add variables" and "Match cases on key variables", with the ID variable as the key), and you have a new variable that is a recode of the original open-ended question.

This way seems like a lot of work, but it actually goes pretty quickly and I'm pretty sure it's faster than writing "recode ("Asprin" "asperin" "Aasperin".....etc=1) ("Zoloft", "zoloft...etc=2)" and having to retype everyone's answers. Plus you don't run into any sort of limit on recode statements.
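
For anyone who would rather script it, the same sort / code by hand / merge-on-ID workflow might look roughly like this in Python. This is a sketch only: the records, the field names `id`, `drug`, and `drug_code`, and both helper functions are made up for illustration, and the hand-coding step is simulated with a small dictionary.

```python
def build_coding_sheet(records):
    """Return (id, drug) pairs sorted by the string so similar spellings sit together."""
    return sorted(((r["id"], r["drug"]) for r in records),
                  key=lambda pair: pair[1].lower())

def merge_codes(records, coded_rows):
    """Merge (id, code) rows back onto the original records, keyed on id."""
    code_by_id = dict(coded_rows)
    return [dict(r, drug_code=code_by_id.get(r["id"])) for r in records]

# Hypothetical data standing in for the original dataset.
records = [
    {"id": 1, "drug": "Zoloft"},
    {"id": 2, "drug": "Asprin"},
    {"id": 3, "drug": "asperin"},
    {"id": 4, "drug": "zolofft"},
]

sheet = build_coding_sheet(records)

# Stand-in for typing the category numbers next to each string by hand.
manual_codes = {"Asprin": 1, "asperin": 1, "Zoloft": 2, "zolofft": 2}
coded_rows = [(case_id, manual_codes[drug]) for case_id, drug in sheet]

merged = merge_codes(records, coded_rows)
for row in merged:
    print(row)
```

The string column is dropped from the coding sheet before the merge, just as in the manual version, so only the ID and the new code travel back to the original data.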

-Graham

Noble, Lyndsay wrote:
> You know, I can't think of a more efficient way than recoding, but 600
> recodes seems daunting. Maybe someone else on the list serve has an idea?
>
> Lyndsay
>
> -----Original Message-----
> From: saygili ayca [mailto:aycasaygili@yahoo.com]
> Sent: Wednesday, July 18, 2007 9:48 AM
> To: Noble, Lyndsay
> Subject: RE: recoding
>
> I appreciate if you know an efficient way:
>
> There is a variable that contains drug name. But in
> CRF it is open ended question. There are a lot of
> miscoding error. We are trying to clean data by
> recoding. :))
>
> --- "Noble, Lyndsay" <Lyndsay.Noble@ejgallo.com> wrote:
>
>> There might be a more efficient way of doing it than
>> writing all of the recode statements. What is the
>> nature of the situation?
>>
>> Lyndsay
>>
>> -----Original Message-----
>> From: saygili ayca [mailto:aycasaygili@yahoo.com]
>> Sent: Wednesday, July 18, 2007 1:20 AM
>> To: Noble, Lyndsay
>> Subject: RE: recoding
>>
>> thanx. we need around 600, that is why I asked.
>>
>> --- "Noble, Lyndsay" <Lyndsay.Noble@ejgallo.com> wrote:
>>
>>> Hi,
>>> I've done up to around 70 and not had a problem.
>>>
>>> Lyndsay
>>>
>>> -----Original Message-----
>>> From: SPSSX(r) Discussion
>>> [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
>>> saygili ayca
>>> Sent: Tuesday, July 17, 2007 3:45 AM
>>> To: SPSSX-L@LISTSERV.UGA.EDU
>>> Subject: recoding
>>>
>>> Hi all,
>>>
>>> when we recode a variable into new variable, Is
>>> there any limitation on number of coding?
>>>
>>> Thanks

