Date: Thu, 29 Dec 2011 14:15:00 -0500
Reply-To: ANDRES ALBERTO BURGA LEON <aburgal@minedu.gob.pe>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: ANDRES ALBERTO BURGA LEON <aburgal@minedu.gob.pe>
Subject: Re: Searching for characters different to A, B, ...Z in string
Thanks to everybody for the answers.
Indeed, the names are not typed but captured by a scanner. The software
doesn't accept acute accents or dieresis, but it can misread a letter as
one of several other characters, such as dots, commas, etc.
In principle, the only valid characters are those specified in David's
syntax, so it works well for me (after switching to the CHAR. functions).
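For readers without SPSS at hand, the check that David's syntax performs can be sketched in Python. This is only an approximation: `spss_index` is a hypothetical helper that mimics how INDEX/CHAR.INDEX behave with a divisor (the needle is split into pieces of that many characters and the lowest matching position is returned), and the valid-character list is copied from David's syntax. Unlike SPSS, which leaves BADDATA system-missing for an empty name, this sketch returns 0.

```python
# Sketch (not SPSS): emulate the BADDATA loop from David's syntax.
VALID = "ABCDEFGHIJKLMNOPQRSTUVWXYZÑ "  # copied from the SPSS example


def spss_index(haystack: str, needle: str, divisor: int) -> int:
    """Hypothetical stand-in for SPSS INDEX/CHAR.INDEX with a divisor:
    split needle into pieces of `divisor` characters and return the lowest
    1-based position in haystack where any piece occurs (0 if none)."""
    pieces = [needle[i:i + divisor] for i in range(0, len(needle), divisor)]
    positions = [haystack.find(p) + 1 for p in pieces if p in haystack]
    return min(positions) if positions else 0


def baddata(nombre: str) -> int:
    """Count characters that are not A..Z, Ñ or a blank (case-insensitive)."""
    count = 0
    for ch in nombre.rstrip(" "):  # like RTRIM: ignore trailing blanks
        if spss_index(ch.upper(), VALID, 1) == 0:  # like UPCASE + INDEX(...) EQ 0
            count += 1
    return count


if __name__ == "__main__":
    for name in ["Juan Manuel", "Alberto", "Ana", "Teresa Marilu",
                 "Alberto2", "Te1resa Maril11u"]:
        print(f"{name:<20}{baddata(name)}")
```

Run on the sample names, this should reproduce the BADDATA values in David's LIST output: 0 for the first four names, 1 for Alberto2 and 3 for Te1resa Maril11u.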
Andrés
Mg. Andrés Burga León
Coordinator of Analysis and Information Technology
Unidad de Medición de la Calidad Educativa (UMC)
Ministerio de Educación del Perú
Av. de la Arqueología s/n (cuadra 2)
Lima 41
Perú
Phone 615-5840 - 6155800 ext. 1212
http://www2.minedu.gob.pe/umc/
Jon K Peck <peck@us.ibm.com>
Sent by: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
Date: 29/12/2011 01:13 p.m.
Please reply to: Jon K Peck <peck@us.ibm.com>
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Searching for characters different to A, B, ...Z in string
Hard to tell. The original said
> I need to check if there are any typos like a number in the names (for
> example Lu1s instead of Luis).
and
> Then, for the 15 new variables count the number of A, B .... Z. and
> create 27 new variables (I also need to count Ñ)
But Spanish orthography also uses several characters with acute accents
(á, é, í, ó, ú) and u with dieresis (ü).
The best solution might be to use a table of known first and last names
and compare against that. That would give some false positives (valid
names missing from the table), but it would catch other kinds of spelling
errors much more reliably.
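A minimal Python sketch of that lookup idea, assuming a hypothetical `KNOWN_NAMES` set; in practice it would be loaded from a reference list of first and last names, and the entries here are only names that appear in this thread. Each name is split on blanks and every token missing from the table is flagged, which catches a misspelling such as Marilou as well as stray digits, at the cost of flagging valid names the table lacks.

```python
from __future__ import annotations

# Hypothetical reference table; a real one would be much larger.
KNOWN_NAMES = {"JUAN", "MANUEL", "ALBERTO", "ANA", "TERESA", "MARILU", "LUIS"}


def unknown_tokens(nombre: str, known: set[str] = KNOWN_NAMES) -> list[str]:
    """Return the blank-separated parts of nombre that are not in the table."""
    return [tok for tok in nombre.upper().split() if tok not in known]


if __name__ == "__main__":
    for name in ["Juan Manuel", "Lu1s", "Teresa Marilou", "Andrés"]:
        print(name, unknown_tokens(name))
```

Here "Andrés" is the kind of false positive mentioned above: a valid name that is simply not in the table.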
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
peck@us.ibm.com
new phone: 720-342-5621
From: David Marso <david.marso@gmail.com>
To: SPSSX-L@listserv.uga.edu
Date: 12/29/2011 08:55 AM
Subject: Re: [SPSSX-L] Searching for characters different to A, B, ...Z in string
Sent by: "SPSSX(r) Discussion" <SPSSX-L@listserv.uga.edu>
I believe the original question was to detect any characters other than
<space>, A..Z, or Ñ, not just digits. Perhaps someone fat-fingered the
input, and commas, !, @, %, etc. may have entered the field.
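The difference between the two tests (whitelisting the valid characters versus counting only digits) shows up in a small Python sketch. The function names are hypothetical, and the valid set is the one from the original syntax (A..Z, Ñ and blank).

```python
VALID = set("ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ")
DIGITS = set("0123456789")


def count_invalid(nombre: str) -> int:
    """Whitelist test: count anything that is not A..Z, Ñ or a blank."""
    return sum(1 for ch in nombre.rstrip(" ").upper() if ch not in VALID)


def count_digits(nombre: str) -> int:
    """Blacklist test: count only the digits 0-9."""
    return sum(1 for ch in nombre if ch in DIGITS)


if __name__ == "__main__":
    for name in ["Lu1s", "Lu,s", "José"]:
        print(name, count_invalid(name), count_digits(name))
```

The whitelist catches the stray comma that a digit count misses, but it also flags the accented É. That is exactly the trade-off discussed in this thread.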
Indeed, the CHAR. functions are likely more appropriate; however, my
ancient version does not support them, and I usually like to test my code.
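Why the CHAR. functions matter can be illustrated outside SPSS. In this Python sketch, UTF-8 stands in for Unicode mode and Latin-1 for a Western European code page (an assumption; the actual code page depends on the locale). The same name has a different number of bytes in each encoding, and cutting a UTF-8 string at a byte boundary can split a character in half.

```python
name = "Ñandú"

chars = len(name)                           # number of characters
utf8_bytes = len(name.encode("utf-8"))      # Ñ and ú take 2 bytes each
latin1_bytes = len(name.encode("latin-1"))  # 1 byte per character

# Taking the "first byte" of the UTF-8 form cuts Ñ in half;
# decoding it yields only the replacement character.
first_byte = name.encode("utf-8")[:1].decode("utf-8", errors="replace")

# CJK text: 3 bytes per character in UTF-8, 2 in Shift_JIS (a Japanese code page).
kanji = "山田"
kanji_utf8 = len(kanji.encode("utf-8"))
kanji_sjis = len(kanji.encode("shift_jis"))

print(chars, utf8_bytes, latin1_bytes, repr(first_byte), kanji_utf8, kanji_sjis)
```

Byte-based functions see 7 positions in the UTF-8 form of a 5-character name, which is the mismatch the CHAR. functions avoid.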
--
Jon K Peck wrote
>
> I suggest reversing the test and counting digits.
> * CHAR.INDEX returns 1 if the one-character haystack is a digit, else 0.
> LOOP #=1 TO CHAR.LENGTH(RTRIM(nombre)).
> COMPUTE BADDATA =
>   SUM(BADDATA,CHAR.INDEX(CHAR.SUBSTR(nombre,#,1),"0123456789",1)).
> END LOOP.
>
> That way, characters that were not in the original list, such as
> accented letters, will not be counted as errors.
>
> One other thing: with Statistics 16 or later, use the CHAR. functions.
> They give the same result whether in Unicode or code page mode. The old,
> deprecated functions work on bytes, not characters, and the number of
> bytes per character can be different in Unicode and code page mode. And
> if the data above had been, say, Japanese, Korean, or Chinese, the
> original code would have failed in either mode. (Of course, it would also
> have failed because the list of alpha characters would have been
> seriously incomplete.)
>
> From: David Marso <david.marso@>
> To: SPSSX-L@.uga
> Date: 12/29/2011 07:52 AM
> Subject: Re: [SPSSX-L] Searching for characters different to A, B, ...Z in string
> Sent by: "SPSSX(r) Discussion" <SPSSX-L@.uga>
>
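> Glad that helps.
> The key is the last argument to the INDEX function, which chops the
> 'needle' (the list of valid characters) into single characters, so INDEX
> returns a nonzero position if the character matches any one of them.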
> --
>
> ANDRES ALBERTO BURGA LEON wrote
>>
>> Excellent, this definitely solves my problem. Thank you very much, David.
>>
>>
>> Andrés
>>
>> David Marso <david.marso@>
>> Sent by: "SPSSX(r) Discussion" <SPSSX-L@.UGA>
>> Date: 29/12/2011 09:27 a.m.
>> Please reply to: David Marso <david.marso@>
>> To: SPSSX-L@.UGA
>> Subject: Re: Searching for characters different to A, B, ...Z in string
>>
>> "Is there another easier way to do this task?"
>>
>> DATA LIST / nombre (A20).
>> begin data
>> Juan Manuel
>> Alberto
>> Ana
>> Teresa Marilu
>> Alberto2
>> Te1resa Maril11u
>> END DATA.
>> * Add 1 to BADDATA for each character that is not A-Z, Ñ or a blank.
>> LOOP #=1 TO LENGTH(RTRIM(nombre)).
>> COMPUTE BADDATA = SUM(BADDATA,
>>   (INDEX(UPCASE(SUBSTR(nombre,#,1)),"ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ",1) EQ 0)).
>> END LOOP.
>> LIST.
>>
>>
>> NOMBRE BADDATA
>>
>> Juan Manuel .00
>> Alberto .00
>> Ana .00
>> Teresa Marilu .00
>> Alberto2 1.00
>> Te1resa Maril11u 3.00
>>
>>
>> Number of cases read: 6 Number of cases listed: 6
>>
>> ANDRES ALBERTO BURGA LEON wrote
>>>
>>> Hello to everybody:
>>>
>>> I have a string variable (nombre, A15) whose contents are names of
>>> different lengths. For example:
>>>
>>> Juan Manuel
>>> Alberto
>>> Ana
>>> Teresa Marilu
>>> ...
>>>
>>> I need to check if there are any typos like a number in the names (for
>>> example Lu1s instead of Luis).
>>>
>>> So far I can only think of creating 15 new variables, each holding one
>>> of the string's character positions (COMPUTE name1 =
>>> CHAR.SUBSTR(nombre,1,1) and so on).
>>>
>>> Then, for the 15 new variables, count the number of A, B .... Z and
>>> create 27 new variables (I also need to count Ñ). Then sum these 27 new
>>> variables and check whether the sum equals CHAR.LENGTH(nombre).
>>>
>>> Is there another easier way to do this task?
>>>
>>> Kind regards,
>>>
>>> Andrés
>>>
>>
>>
>> --
>> View this message in context:
>> http://spssx-discussion.1045642.n5.nabble.com/Searching-for-characters-different-to-A-B-Z-in-string-tp5107681p5107733.html
>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>