Date: Thu, 12 May 2011 12:56:48 -0400
Reply-To: Art@DrKendall.org
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Art Kendall <Art@DrKendall.org>
Organization: Social Research Consultants
Subject: Re: String Matches
In-Reply-To: <1305217095998-4390718.post@n5.nabble.com>
Content-type: text/html; charset=UTF-8
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<font size="+1">Please give a little more detail.<br>
<br>
Please provide a small mock-up of the situation, i.e., a small
number of cases with the same types of variables as the real data.<br>
<br>
What does the input data look like? Is there a set of variables
that have strings in them, each of which would result in a single
code? Are there strings each of which would result in several
codes?<br>
Are there several different strings that would yield the same
code? Is there an order to the variables? Is there an order to
the contents of the strings?<br>
Are the substrings entered consistently, or could the same substring
appear as Robbery, robbery, robry, etc.?<br>
<br>
Art Kendall<br>
Social Research Consultants<br>
<br>
<br>
</font><br>
On 5/12/2011 12:18 PM, JKRockStomper wrote:
<blockquote cite="mid:1305217095998-4390718.post@n5.nabble.com"
type="cite">
<pre wrap="">Good morning
I have inherited some legacy code to convert criminal arrest literals to
NCIC offense codes; the problem at hand is that it takes several hours
(11 hours) to do the conversion on 200,000 records. This code is made up of
several thousand IF INDEX(arrest_string, "substring") > 0 statements.
I have been given the task of researching better approaches to the
conversion process, possibly rewriting all 20,000 lines of code. I have
thought about SOUNDEX or METAPHONE but do not want to go that route if the
outcome is still going to be close to the same time or the results will not
be a great match. Right now, we get about a 95% match on everything but I
would like to increase this if at all possible.
I am wondering what others have used in the past, or are currently using, for a
similar process.
--
View this message in context: <a class="moz-txt-link-freetext" href="http://spssx-discussion.1045642.n5.nabble.com/String-Matches-tp4390718p4390718.html">http://spssx-discussion.1045642.n5.nabble.com/String-Matches-tp4390718p4390718.html</a>
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
=====================
To manage your subscription to SPSSX-L, send a message to
<a class="moz-txt-link-abbreviated" href="mailto:LISTSERV@LISTSERV.UGA.EDU">LISTSERV@LISTSERV.UGA.EDU</a> (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
</pre>
</blockquote>
</body>
</html>