Date: Wed, 18 Jul 2007 12:29:44 -0500
Reply-To: "Peck, Jon" <email@example.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Peck, Jon" <firstname.lastname@example.org>
Subject: Re: recoding
Content-Type: text/plain; charset="UTF-8"
If you can use programmability, this kind of task is pretty susceptible to automation. There are several tools available in Python or in SPSS Developer Central modules (www.spss.com/devcentral) that can help. I'll note here just a few ideas.
- regular expressions to pick up patterns of the common misspellings.
- if you have a list of all the valid drug names, try spell correction based on
    levenshteindistance: calculate similarity between two strings
  or even
    soundex: calculate the soundex value of a string (a rough phonetic encoding)
- simple Python code to look through lists and pick a new value.
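
As a rough illustration of the spell-correction idea in the list above, here is a minimal Python sketch. It is a hand-written stand-in, not the levenshteindistance function from the modules mentioned here; the names levenshtein, closest_valid, VALID and the max_distance cutoff of 2 are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_valid(name, valid_names, max_distance=2):
    """Return the nearest valid name within max_distance, else None.

    The comparison ignores case and surrounding blanks (an assumption).
    """
    key = name.strip().lower()
    best, best_dist = None, max_distance + 1
    for valid in valid_names:
        dist = levenshtein(key, valid.lower())
        if dist < best_dist:
            best, best_dist = valid, dist
    return best


# Hypothetical list of valid drug names.
VALID = ["Aspirin", "Zoloft"]

for raw in ["Asprin", "asperin", "zolofft", "Ibuprofen"]:
    print(raw, "->", closest_valid(raw, VALID))
```

Values that come back as None are too far from every valid name and would still need a manual look; the cutoff probably needs tuning against real data, since short drug names can sit within two edits of each other.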
The trans.py and extendedTransforms.py modules help you integrate this approach into SPSS transformations.
Putting programmability aside, you could check the values by applying the SPSS Data Validation module, which lets you define the valid answers.
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Graham Wright
Sent: Wednesday, July 18, 2007 12:13 PM
Subject: Re: [SPSSX-L] recoding
We've done similar things in the following way (which may or
may not be appropriate in your case):
Sort the data by the string variable in question, so that all the valid
responses are at the top, then save a version of the dataset that only
has that variable and some sort of unique ID variable. Then open the new
file, make a new variable in the blank column next to the string
variable and then go down the list manually in the "data view" screen
and just put a number equal to what you want the new category to be, so
next to "Asprin" and "asperin" put 1's and next to "Zoloft" and
"zolofft" put 2's or whatever. Then get rid of the column with the
string in it and merge this file (with just the ID var and the new
column) back into the original dataset (using "add variables" and "Match
cases on key variables" using the ID variable as the key) and you have a
new variable that is a recode of the original open ended question.
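
For comparison, the same lookup-and-merge idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the records, IDs, and codes are made up, and the lookup table is keyed on the distinct string values rather than on one row per case, which is a small variation on the steps described above.

```python
# Hypothetical data: (unique ID, free-text drug name) pairs.
records = [(101, "Asprin"), (102, "Zoloft"), (103, "asperin"), (104, "zolofft")]

# Step 1: list the distinct values, sorted, so they can be coded by hand.
distinct = sorted({name for _, name in records})

# Step 2: the hand-built lookup table (one code per distinct spelling).
lookup = {"Asprin": 1, "asperin": 1, "Zoloft": 2, "zolofft": 2}


def apply_lookup(records, lookup):
    """Attach the new code to each record; uncoded spellings get None."""
    return [(rec_id, name, lookup.get(name)) for rec_id, name in records]


# Step 3: merge the codes back onto the original records.
recoded = apply_lookup(records, lookup)
for row in recoded:
    print(row)
```

Coding each distinct spelling once, instead of every row, keeps the manual step short when the same misspelling occurs many times.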
This way seems like a lot of work, but it actually goes pretty quickly
and I'm pretty sure it's faster than writing
"recode ("Asprin" "asperin" "Aasperin".....etc=1) ("Zoloft",
"zoloft...etc=2)" and having to retype everyone's answers. Plus you
don't run into any sort of limit on recode statements.
Noble, Lyndsay wrote:
> You know, I can't think of a more efficient way than recoding, but 600
> recodes seems daunting. Maybe someone else on the listserv has an idea?
> -----Original Message-----
> From: saygili ayca [mailto:email@example.com]
> Sent: Wednesday, July 18, 2007 9:48 AM
> To: Noble, Lyndsay
> Subject: RE: recoding
> I'd appreciate it if you know an efficient way:
> There is a variable that contains the drug name, but in
> the CRF it is an open-ended question. There are a lot of
> miscoding errors. We are trying to clean the data by
> recoding. :))
> --- "Noble, Lyndsay" <Lyndsay.Noble@ejgallo.com>
>> There might be a more efficient way of doing it than
>> writing all of the
>> recode statements. What is the nature of the
>> -----Original Message-----
>> From: saygili ayca [mailto:firstname.lastname@example.org]
>> Sent: Wednesday, July 18, 2007 1:20 AM
>> To: Noble, Lyndsay
>> Subject: RE: recoding
>> thanx. we need around 600, that is why I asked. ---
>> "Noble, Lyndsay" <Lyndsay.Noble@ejgallo.com> wrote:
>>> I've done up to around 70 and not had a problem.
>>> -----Original Message-----
>>> From: SPSSX(r) Discussion
>>> [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
>>> saygili ayca
>>> Sent: Tuesday, July 17, 2007 3:45 AM
>>> To: SPSSX-L@LISTSERV.UGA.EDU
>>> Subject: recoding
>>> Hi all,
>>> when we recode a variable into a new variable, is there
>>> any limitation on the number of codings?