LISTSERV at the University of Georgia
Date:         Thu, 29 Dec 2011 14:15:00 -0500
Reply-To:     ANDRES ALBERTO BURGA LEON <aburgal@minedu.gob.pe>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         ANDRES ALBERTO BURGA LEON <aburgal@minedu.gob.pe>
Subject:      Re: Searching for characters different to A, B, ...Z in string
Comments: To: Jon K Peck <peck@us.ibm.com>
In-Reply-To:  <OFB6129CD1.DD89ECD6-ON87257975.005C4FB4-87257975.005D08D6@us.ibm.com>

Thanks to everybody for the answers.

Indeed, the names were not typed but captured by a scanner. The software did not accept the acute accent or the dieresis, but it could misread a letter as any of several other characters, such as a period or a comma.

In principle, the only valid characters are those specified in David's syntax, so it works well for me (after switching to the CHAR. functions).

Andrés

Mg. Andrés Burga León Coordinador de Análisis e Informática Unidad de Medición de la Calidad Educativa (UMC) Ministerio de Educación del Perú Av.de la Arqeuología s/n (cuadra 2) Lima 41 Perú Teléfono 615-5840 - 6155800 anexo 1212 http://www2.minedu.gob.pe/umc/

Jon K Peck <peck@us.ibm.com>
Sent by: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
29/12/2011 01:13 p.m.
Please respond to Jon K Peck <peck@us.ibm.com>

To: SPSSX-L@LISTSERV.UGA.EDU
cc:

Subject: Re: Searching for characters different to A, B, ...Z in string

Hard to tell. The original said

> I need to check if there are any typos like a number in the names (for example Lu1s instead of Luis).

and

> Then, for the 15 new variables count the number of A, B .... Z. and create 27 new variables (I also need to count Ñ)

But Spanish orthography also uses vowels with acute accents (á, é, í, ó, ú) and u with dieresis (ü).

The best solution might be to use a table of known first and last names and compare against that. That would give some false positives, but it would be much more accurate for other kinds of spelling errors.
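The lookup-table idea can be sketched outside SPSS. The following is a minimal Python illustration, not the syntax used in this thread; the `KNOWN_NAMES` set and the `unknown_tokens` helper are hypothetical and stand in for a real reference table of first and last names.

```python
# Hypothetical reference table; a real one would be loaded from a file or database.
KNOWN_NAMES = {"JUAN", "MANUEL", "ALBERTO", "ANA", "TERESA", "MARILU", "LUIS"}


def unknown_tokens(name: str) -> list[str]:
    """Return the space-separated parts of `name` that are not in the table."""
    return [tok for tok in name.upper().split() if tok not in KNOWN_NAMES]


for sample in ["Juan Manuel", "Lu1s", "Teresa Marliu"]:
    print(sample, "->", unknown_tokens(sample))
```

A valid name that is simply missing from the table would also be reported, which is the false-positive risk mentioned above; on the other hand, a transposition such as "Marliu" is caught even though it contains only letters.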

Jon Peck (no "h") aka Kim Senior Software Engineer, IBM peck@us.ibm.com new phone: 720-342-5621

From: David Marso <david.marso@gmail.com>
To: SPSSX-L@listserv.uga.edu
Date: 12/29/2011 08:55 AM
Subject: Re: [SPSSX-L] Searching for characters different to A, B, ...Z in string
Sent by: "SPSSX(r) Discussion" <SPSSX-L@listserv.uga.edu>

I believe the original question was to detect characters other than <space>, A..Z or Ñ, not only numbers. Perhaps someone fat-fingered the input, and characters such as commas, !, @, % etc. may have entered the field. Indeed, the CHAR. functions are likely more appropriate; however, my ancient version does not support them, and I usually like to test my code.
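For readers who want to check the whitelist logic outside SPSS, here is a minimal Python sketch of the same idea: count every character that is not a space, A..Z or Ñ. The names `ALLOWED` and `count_bad_chars` are made up for this illustration; the sample names are the ones used in the thread.

```python
# Allowed characters: unaccented capital letters, Ñ and the space.
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ")


def count_bad_chars(name: str) -> int:
    """Count characters outside ALLOWED, ignoring case and trailing blanks."""
    return sum(1 for ch in name.rstrip().upper() if ch not in ALLOWED)


names = ["Juan Manuel", "Alberto", "Ana", "Teresa Marilu",
         "Alberto2", "Te1resa Maril11u"]
for n in names:
    print(f"{n:<20}{count_bad_chars(n)}")
```

Because the test is a whitelist, accented vowels would also be counted as bad, which matches the scanner behaviour described in this thread but would not suit data that legitimately contains accents.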

Jon K Peck wrote
>
> I suggest reversing the test and counting digits.
>
> LOOP #=1 TO LENGTH(RTRIM(nombre)).
> COMPUTE BADDATA =SUM(BADDATA,(CHAR.INDEX(CHAR.SUBSTR(nombre,#,1),"0123456789",1))).
> END LOOP.
>
> That way, any other characters not listed in the original solution, such as
> accented characters, will not trigger a count.
>
> One other thing: with Statistics 16 or later, use the CHAR. functions.
> They give the same result whether in Unicode or code page mode. The old,
> deprecated functions work on bytes, not characters, and the number of
> bytes per character can be different in Unicode and code page mode. And
> if the data above had been, say, Japanese, Korean, or Chinese, the
> original code would have failed in either mode. (Of course, it would also
> have failed because the list of alpha characters would have been seriously
> incomplete.)
>
> Jon Peck (no "h") aka Kim
>
> From: David Marso <david.marso@>
> To: SPSSX-L@.uga
> Date: 12/29/2011 07:52 AM
> Subject: Re: [SPSSX-L] Searching for characters different to A, B, ...Z in string
> Sent by: "SPSSX(r) Discussion" <SPSSX-L@.uga>
>
> Glad that helps.
> The key is the last argument to the INDEX function, which chops the
> 'haystack' into single characters.
>
> ANDRES ALBERTO BURGA LEON wrote
>>
>> Excellent, this definitely solves my problem. Thank you very much, David.
>>
>> Andrés
>>
>> David Marso <david.marso@>
>> Sent by: "SPSSX(r) Discussion" <SPSSX-L@.UGA>
>> 29/12/2011 09:27 a.m.
>> Please respond to David Marso <david.marso@>
>>
>> To: SPSSX-L@.UGA
>> cc:
>> Subject: Re: Searching for characters different to A, B, ...Z in string
>>
>> "Is there another easier way to do this task?"
>>
>> DATA LIST / nombre (A20).
>> BEGIN DATA
>> Juan Manuel
>> Alberto
>> Ana
>> Teresa Marilu
>> Alberto2
>> Te1resa Maril11u
>> END DATA.
>> LOOP #=1 TO LENGTH(RTRIM(nombre)).
>> COMPUTE BADDATA =SUM(BADDATA,(INDEX(UPCASE(SUBSTR(nombre,#,1))
>>    ,"ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ",1) EQ 0)).
>> END LOOP.
>> LIST.
>>
>> NOMBRE               BADDATA
>>
>> Juan Manuel              .00
>> Alberto                  .00
>> Ana                      .00
>> Teresa Marilu            .00
>> Alberto2                1.00
>> Te1resa Maril11u        3.00
>>
>> Number of cases read: 6    Number of cases listed: 6
>>
>> ANDRES ALBERTO BURGA LEON wrote
>>>
>>> Hello to everybody:
>>>
>>> I have a string variable (nombre, A15) whose contents are names of
>>> different lengths. For example:
>>>
>>> Juan Manuel
>>> Alberto
>>> Ana
>>> Teresa Marilu
>>> ...
>>>
>>> I need to check if there are any typos like a number in the names (for
>>> example Lu1s instead of Luis).
>>>
>>> So far I can only think of creating 15 new variables, each holding one
>>> of the string's character positions (COMPUTE name1 = CHAR.SUBSTR(nombre,1,1)
>>> and so on).
>>>
>>> Then, for the 15 new variables, count the number of A, B .... Z and
>>> create 27 new variables (I also need to count Ñ). Then sum these 27 new
>>> variables and check if this sum is equal to CHAR.LENGTH(nombre).
>>>
>>> Is there another easier way to do this task?
>>>
>>> Kindly
>>>
>>> Andrés
>>
>> --
>> View this message in context:
>> http://spssx-discussion.1045642.n5.nabble.com/Searching-for-characters-different-to-A-B-Z-in-string-tp5107681p5107733.html
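The byte-versus-character point in Jon Peck's quoted message can be illustrated with a small Python sketch. This is only an analogy: it assumes UTF-8 for Unicode mode and Latin-1 for a Western code page, and it does not reproduce how SPSS stores strings internally.

```python
name = "Muñoz"

n_chars = len(name)                           # length in characters
n_bytes_utf8 = len(name.encode("utf-8"))      # Ñ/ñ takes two bytes in UTF-8
n_bytes_latin1 = len(name.encode("latin-1"))  # one byte per character in Latin-1

print(n_chars, n_bytes_utf8, n_bytes_latin1)
```

A loop that runs over bytes would therefore visit six positions for this five-letter name under UTF-8, which is why a character-based function gives the same answer in both modes while a byte-based one does not.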


-- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Searching-for-characters-different-to-A-B-Z-in-string-tp5107681p5107935.html

Sent from the SPSSX Discussion mailing list archive at Nabble.com.

===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

