LISTSERV at the University of Georgia
Date:         Wed, 28 Mar 2007 13:45:35 -0400
Reply-To:     Richard Ristow <wrristow@mindspring.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <wrristow@mindspring.com>
Subject:      Re: parsing strings
Comments: To: Doug Keyser <douglas.keyser@KENEXA.COM>
In-Reply-To:  <200703281247.l2SAkwtG011381@mailgw.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 08:47 AM 3/28/2007, Doug Keyser wrote:

>I'm trying to parse a string variable with thousands of cases into new
>variables. For example, a variable named "demotext" has the following
>string:
>
>|NodeID:10115|TenureID:4|GenderID:2
>
>I'd like to be able to create separate variables titled NodeID,
>TenureID, GenderID with their respective values...10115, 4, 2.
>
>Thoughts?

Well (smile), another 'transformation program vs. Python'. Python's going to have an edge here, because it can read your string of keywords and generate the code to declare the variables. HOWEVER, here's a transformation-program solution.
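For comparison, here is roughly what the Python side of that could look like. This is a minimal plain-Python sketch, not SPSS programmability code. It assumes the raw strings are already in a Python list (getting them out of the active file is not shown), and the function names collect_keywords and declare_command are hypothetical. It scans every string for keywords and writes the text of a NUMERIC command, so the variable list doesn't have to be known in advance.

```python
def collect_keywords(strings):
    """Return keywords in first-seen order across all strings.

    Assumes each string is a repetition of "|<keyword>:<value>".
    """
    seen = []
    for s in strings:
        for pair in s.strip().strip("|").split("|"):
            key, sep, _ = pair.partition(":")
            key = key.strip()
            # Skip empty pieces and pieces with no colon.
            if sep and key and key not in seen:
                seen.append(key)
    return seen


def declare_command(keywords, fmt="F5"):
    """Build the text of an SPSS NUMERIC command declaring the keywords."""
    return "NUMERIC " + " ".join(keywords) + " (" + fmt + ")."


data = ["|NodeID:10115|TenureID:4|GenderID:2", "|NodeID:20233|GenderID:1"]
print(declare_command(collect_keywords(data)))
# prints: NUMERIC NodeID TenureID GenderID (F5).
```

The generated command text would still have to be submitted to SPSS; that step is left out here.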

I'm writing this assuming that:

. The only variables you're creating are NodeID, TenureID, and GenderID;

. The input string value is in variable StringV (your "demotext"), which is no more than 100 characters long;

. The string's internal structure is any number of repetitions of "|<keyword>:<value>". (The code doesn't check for syntax errors, which is a serious deficiency in a parser; among other things, it can let parser bugs go unnoticed.)

. The combination of a keyword and its value is never more than 25 characters long;

. There are no more than 10 keyword-value pairs;

. All values are numeric, and F5 is an OK format for all the variables;

. A keyword can't occur more than once in the string. (If it does, the value from the latest occurrence will be used, with no warning.)
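Since the transformation program doesn't check syntax, here is one way such a check could look, sketched in Python rather than transformation syntax. It assumes a keyword starts with a letter and a value is an optionally signed decimal number; those token rules, and the name check_syntax, are illustrative assumptions of this sketch, not part of the list above. The length, count, and no-repeat limits are the ones listed.

```python
import re

# Assumed token shapes (illustrative only): keyword starts with a letter,
# value is an optionally signed decimal number.
PAIR = re.compile(r"\|([A-Za-z]\w*):(-?\d+(?:\.\d+)?)")
WHOLE = re.compile(r"(?:\|[A-Za-z]\w*:-?\d+(?:\.\d+)?)*")

MAX_LEN = 100      # StringV is no more than 100 characters
MAX_PAIR_LEN = 25  # "keyword:value" fits in #Assign (A25), colon included
MAX_PAIRS = 10     # no more than 10 keyword-value pairs


def check_syntax(s):
    """Return a list of problems found in s (empty if it looks valid)."""
    problems = []
    s = s.strip()
    if len(s) > MAX_LEN:
        problems.append("string longer than %d characters" % MAX_LEN)
    if not WHOLE.fullmatch(s):
        problems.append("not a repetition of |<keyword>:<value>")
        return problems
    pairs = PAIR.findall(s)
    if len(pairs) > MAX_PAIRS:
        problems.append("more than %d pairs" % MAX_PAIRS)
    for key, value in pairs:
        if len(key) + 1 + len(value) > MAX_PAIR_LEN:
            problems.append("pair too long: %s:%s" % (key, value))
    keys = [k for k, _ in pairs]
    for k in sorted(set(keys)):
        if keys.count(k) > 1:
            problems.append("keyword repeated: " + k)
    return problems


print(check_syntax("|NodeID:10115|TenureID:4|GenderID:2"))  # prints: []
print(check_syntax("|NodeID:10115|TenureID4"))
# prints: ['not a repetition of |<keyword>:<value>']
```

Reporting a repeated keyword as a problem is a stricter choice than the transformation program, which silently keeps the last value.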

These can be changed or adjusted, of course.

Code is untested; and, I'm afraid, for even a simple parser like this, that means there'll be an error somewhere.

* Output variables, plus a place to record unrecognized keywords.
NUMERIC NodeID TenureID GenderID (F5).
STRING  BadKey (A12).
VAR LABEL BadKey '(Last) unrecognized keyword found in string'.

* Scratch work variables.
STRING  #Parsing (A100) /#Assign (A25) /#KeyStr #ValStr (A12).
NUMERIC #Value (F5).

* Drop leading blanks and the leading '|'.
COMPUTE #Parsing = LTRIM(StringV).
COMPUTE #Parsing = LTRIM(#Parsing,'|').

* Each pass peels one "<keyword>:<value>" off the front of #Parsing.
LOOP #AsgnNum = 1 TO 10 IF #Parsing NE ' '.
.  COMPUTE #Index = INDEX(#Parsing,'|').
.  DO IF #Index GT 0.
.     COMPUTE #Assign  = SUBSTR(#Parsing,1,#Index-1).
.     COMPUTE #Parsing = SUBSTR(#Parsing,#Index).
.  ELSE.
.     COMPUTE #Assign  = #Parsing.
.     COMPUTE #Parsing = ''.
.  END IF.
.  COMPUTE #Parsing = LTRIM(#Parsing,'|').
.  COMPUTE #Assign  = LTRIM(#Assign).

* Split at the colon; the keyword must NOT include the colon.
.  COMPUTE #Index  = INDEX(#Assign,':').
.  COMPUTE #KeyStr = SUBSTR(#Assign,1,#Index-1).
.  COMPUTE #ValStr = SUBSTR(#Assign,#Index+1).
.  COMPUTE #Value  = NUMBER(#ValStr,F12).

* Store the value in the variable whose name matches the keyword.
.  COMPUTE #Matched = 0.
.  DO REPEAT KeyWord = 'NodeID' 'TenureID' 'GenderID'
           /TgtVbl  =  NodeID   TenureID   GenderID.
.     DO IF KeyWord = #KeyStr.
.        COMPUTE TgtVbl   = #Value.
.        COMPUTE #Matched = 1.
.     END IF.
.  END REPEAT.

.  IF #Matched NE 1 BadKey = #KeyStr.
END LOOP.
EXECUTE.
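If you want to check the parsing logic outside SPSS, here is a rough Python rendering of the same loop for a single case. It is a sketch, not something guaranteed to match SPSS in every edge case. The name parse, and the use of None in place of system-missing, are assumptions of this sketch. Like the transformation program, it handles at most 10 pairs, lets a later duplicate keyword overwrite an earlier one, and keeps only the last unrecognized keyword.

```python
KEYWORDS = ("NodeID", "TenureID", "GenderID")


def parse(s, keywords=KEYWORDS, max_pairs=10):
    """Parse one "|key:value|key:value..." string.

    Returns (values, bad_key): values maps each known keyword to a float,
    or None (standing in for system-missing); bad_key is the last
    unrecognized keyword, or "" if there was none.
    """
    values = {k: None for k in keywords}
    bad_key = ""
    parsing = s.strip().lstrip("|")
    for _pass in range(max_pairs):
        if not parsing:
            break
        # Peel one assignment off the front, as the LOOP does.
        assign, _, parsing = parsing.partition("|")
        parsing = parsing.lstrip("|")
        # Keyword is everything before the colon, excluding the colon.
        key, _, val = assign.strip().partition(":")
        key = key.strip()
        if key in values:
            try:
                values[key] = float(val)  # later duplicates overwrite
            except ValueError:
                values[key] = None        # like NUMBER() giving sysmis
        else:
            bad_key = key
    return values, bad_key


print(parse("|NodeID:1|Age:30|NodeID:7"))
# prints: ({'NodeID': 7.0, 'TenureID': None, 'GenderID': None}, 'Age')
```

Running the example strings from the question through both versions is a cheap way to catch off-by-one errors in the SUBSTR arguments.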

