LISTSERV at the University of Georgia
Date:         Thu, 12 May 2011 12:56:48 -0400
Reply-To:     Art@DrKendall.org
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Art Kendall <Art@DrKendall.org>
Organization: Social Research Consultants
Subject:      Re: String Matches
Comments: To: JKRockStomper <jgardner@rejis.org>
In-Reply-To:  <1305217095998-4390718.post@n5.nabble.com>
Content-type: text/html; charset=UTF-8

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"> </head> <body bgcolor="#ffffff" text="#000000"> <font size="+1">Please give a little more detail. <br> <br> Please provide a small mock-up of the situation, i.e., a small number of cases with the same types of variables as the real data.<br> <br> What does the input data look like? Is there a set of variables containing strings, each of which would result in a single code? Are there strings each of which would result in several codes?<br> Are there several different strings that would yield the same code? Is there an order to the variables? Is there an order to the contents of the strings?<br> Are the substrings entered consistently, or could a substring be Robbery, robbery, robry, etc.?<br> <br> Art Kendall<br> Social Research Consultants<br> <br> <br> </font><br> On 5/12/2011 12:18 PM, JKRockStomper wrote: <blockquote cite="mid:1305217095998-4390718.post@n5.nabble.com" type="cite"> <pre wrap="">Good morning

I have inherited some legacy code that converts criminal arrest literals to NCIC offense codes. The problem at hand is that the conversion takes 11 hours on 200,000 records. The code is made up of several thousand IF INDEX(arrest_string, "substring") &gt; 0 statements.

I have been given the task of researching better approaches to the conversion process, possibly rewriting all 20,000 lines of code. I have thought about SOUNDEX or METAPHONE, but I do not want to go that route if the run time will still be about the same or the results will not be a great match. Right now we get about a 95% match on everything, but I would like to increase this if at all possible.

I am wondering what others have used in the past, or are currently using, for a similar process.

-- View this message in context: <a class="moz-txt-link-freetext" href="http://spssx-discussion.1045642.n5.nabble.com/String-Matches-tp4390718p4390718.html">http://spssx-discussion.1045642.n5.nabble.com/String-Matches-tp4390718p4390718.html</a> Sent from the SPSSX Discussion mailing list archive at Nabble.com.

===================== To manage your subscription to SPSSX-L, send a message to <a class="moz-txt-link-abbreviated" href="mailto:LISTSERV@LISTSERV.UGA.EDU">LISTSERV@LISTSERV.UGA.EDU</a> (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

</pre> </blockquote> </body> </html>
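
For readers following the thread, here is a minimal sketch of one possible table-driven alternative to several thousand sequential IF INDEX tests. It is not the poster's code or a method proposed in this message. The table contents, the code values, and the names SUBSTRING_TO_CODE and code_for are hypothetical, and the sketch assumes that the leftmost match in the string should decide the code and that case can be ignored, which may differ from the behaviour of the legacy IF statements.

```python
import re

# Hypothetical lookup table: lower-case substring -> offense code.
# The codes are placeholders for illustration, not real NCIC codes.
SUBSTRING_TO_CODE = {
    "robbery": "CODE_A",
    "robry": "CODE_A",     # a misspelling mapped to the same code
    "burglary": "CODE_B",
}

# One compiled alternation stands in for many separate substring tests.
# Longer keys are listed first so the longest alternative is tried first
# when several keys could match at the same position.
_PATTERN = re.compile(
    "|".join(
        re.escape(key)
        for key in sorted(SUBSTRING_TO_CODE, key=len, reverse=True)
    )
)


def code_for(arrest_string):
    """Return the code for the leftmost matching substring, or None."""
    match = _PATTERN.search(arrest_string.lower())
    return SUBSTRING_TO_CODE[match.group(0)] if match else None
```

Keeping the substrings in a table rather than in code means that new spellings can be added as data, and each record is scanned by one pattern search instead of thousands of separate tests. Whether this shortens the run time on the real data would need to be measured.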


