Date: Wed, 28 Mar 2007 13:45:35 -0400
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: parsing strings
In-Reply-To: <200703281247.l2SAkwtG011381@mailgw.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"; format=flowed
At 08:47 AM 3/28/2007, Doug Keyser wrote:
>I'm trying to parse a string variable with thousands of cases into new
>variables. For example, a variable named "demotext" has the following
>string:
>
>|NodeID:10115|TenureID:4|GenderID:2
>
>I'd like to be able to create separate variables titled NodeID,
>TenureID, GenderID with their respective values...10115, 4, 2.
>
>Thoughts?
Well (smile), another 'transformation program vs. Python' question. Python's
going to have an edge here, because it can read your string of keywords
and generate the code to declare the variables. HOWEVER, here's a
transformation-program solution.
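To illustrate what I mean by Python's edge, here is a minimal sketch in plain Python (no SPSS integration) that reads the keywords out of the strings and builds the declaration command as text. The function names keywords_in and declare_syntax are made up for this example, and F5 is the same format assumption as below; running the generated command from inside SPSS is left out.

```python
def keywords_in(strings):
    """Collect keywords, in first-seen order, from '|key:value' strings."""
    seen = []
    for s in strings:
        for pair in s.strip().strip("|").split("|"):
            key, sep, _value = pair.partition(":")
            key = key.strip()
            # Skip empty pieces and pieces with no colon.
            if sep and key and key not in seen:
                seen.append(key)
    return seen


def declare_syntax(keywords, fmt="F5"):
    """Build the text of an SPSS NUMERIC command, one variable per keyword."""
    return "NUMERIC " + " ".join(keywords) + " (" + fmt + ")."


sample = ["|NodeID:10115|TenureID:4|GenderID:2"]
print(declare_syntax(keywords_in(sample)))
# prints: NUMERIC NodeID TenureID GenderID (F5).
```

The point is only that the list of variables comes from the data rather than being typed into the program by hand.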
I'm writing assuming
. The only variables you're creating are NodeID, TenureID, and
GenderID;
. The input string value is in variable StringV (your "demotext"), which is no more than
100 characters long;
. The string's internal structure is indefinite repetition of
"|<keyword>:<value>". (Code doesn't check for syntax errors, which is a
bad deficiency in a parser; among other things, it can let parser bugs
get by.)
. The combination of a keyword and its value is never more than 25
characters long;
. There are no more than 10 keyword-value pairs;
. All values are numeric, and F5 is an OK format for all the variables;
. A keyword can't occur more than once in the string. (If it does, the
value from the latest occurrence will be used, with no warning.)
These can be changed or adjusted, of course.
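Since the syntax checking is missing, it may be worth screening the strings against these assumptions first. The following is a rough sketch in Python, not part of the solution itself. The function name check_string is hypothetical, and the pattern bakes in two extra assumptions of mine: keywords are letters, digits and underscores, and values are unsigned integers.

```python
import re

# One "|keyword:value" pair. ASSUMPTION: keyword is letters/digits/underscore,
# value is an unsigned integer.
PAIR = re.compile(r"\|([A-Za-z][A-Za-z0-9_]*):(\d+)")


def check_string(text, max_pairs=10, max_pair_len=25):
    """Return a list of problems found in text; an empty list means none."""
    problems = []
    text = text.strip()
    pos = 0
    keys = []
    while pos < len(text):
        m = PAIR.match(text, pos)
        if m is None:
            problems.append(
                "cannot parse at position %d: %r" % (pos, text[pos:pos + 15]))
            break
        # Length of "keyword:value" without the leading '|'.
        if len(m.group(0)) - 1 > max_pair_len:
            problems.append("pair too long: %r" % m.group(0))
        if m.group(1) in keys:
            problems.append("duplicate keyword: %s" % m.group(1))
        keys.append(m.group(1))
        pos = m.end()
    if len(keys) > max_pairs:
        problems.append("more than %d pairs" % max_pairs)
    return problems


print(check_string("|NodeID:10115|TenureID:4|GenderID:2"))
# prints: []
print(check_string("|NodeID:10115|NodeID:7"))
# prints: ['duplicate keyword: NodeID']
```

Any string that comes back with a non-empty list is one where the transformation code would silently do something you may not want.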
Code is untested; and, I'm afraid, for even a simple parser like this,
that means there'll be an error somewhere.
NUMERIC NodeID TenureID GenderID (F5).
STRING BadKey (A10).
VAR LABEL BadKey
   '(Last) unrecognized keyword found in string'.
STRING  #Parsing        (A100)
       /#Assign         (A25)
       /#KeyStr #ValStr (A12).
NUMERIC #Value (F5).
COMPUTE #Parsing = LTRIM(StringV).
COMPUTE #Parsing = LTRIM(#Parsing,'|').
LOOP #AsgnNum = 1 TO 10
   IF #Parsing NE ' '.
* Split the next keyword:value pair off the front of #Parsing.
.  COMPUTE #Index  = INDEX(#Parsing,'|').
.  DO IF   #Index GT 0.
.     COMPUTE #Assign  = SUBSTR(#Parsing,1,#Index-1).
.     COMPUTE #Parsing = SUBSTR(#Parsing,#Index).
.  ELSE.
.     COMPUTE #Assign  = #Parsing.
.     COMPUTE #Parsing = ''.
.  END IF.
.  COMPUTE #Parsing = LTRIM(#Parsing,'|').
* Separate the keyword (before the colon) from the value (after it).
.  COMPUTE #Assign = LTRIM(#Assign).
.  COMPUTE #Index  = INDEX(#Assign,':').
.  COMPUTE #KeyStr = SUBSTR(#Assign,1,#Index-1).
.  COMPUTE #ValStr = SUBSTR(#Assign,#Index+1).
.  COMPUTE #Value  = NUMBER(#ValStr,F12).
* Store the value in the matching variable, or record the keyword in BadKey.
.  COMPUTE #Matched = 0.
.  DO REPEAT KeyWord = 'NodeID' 'TenureID' 'GenderID'
            /TgtVbl  =  NodeID   TenureID   GenderID.
.     DO IF KeyWord = #KeyStr.
.        COMPUTE TgtVbl   = #Value.
.        COMPUTE #Matched = 1.
.     END IF.
.  END REPEAT.
.  IF #Matched NE 1 BadKey = #KeyStr.
END LOOP.
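Since the syntax above is untested, one way to decide what it ought to produce is to write the same logic in a language that's easy to run on a few sample strings. Here is a rough Python mirror of the loop, as a sketch only: parse_case is a made-up name, None stands in for system-missing, values are assumed to be integers, and the truncation that the A12 and A25 string widths would cause is not imitated.

```python
def parse_case(string_v, targets=("NodeID", "TenureID", "GenderID"),
               max_pairs=10):
    """Mimic the LOOP for one case; return (values, bad_key)."""
    values = dict.fromkeys(targets)      # None stands in for system-missing
    bad_key = ""
    parsing = string_v.strip().lstrip("|")
    for _ in range(max_pairs):
        if not parsing:
            break
        # Split the next "keyword:value" pair off the front.
        assign, _sep, parsing = parsing.partition("|")
        parsing = parsing.lstrip("|")
        # Keyword is the text before the colon, value the text after it.
        key, _colon, val = assign.strip().partition(":")
        try:
            number = int(val)            # ASSUMPTION: integer values only
        except ValueError:
            number = None
        if key in values:
            values[key] = number         # a later occurrence overwrites
        else:
            bad_key = key                # remember the last unrecognized key
    return values, bad_key


print(parse_case("|NodeID:10115|TenureID:4|GenderID:2"))
# prints: ({'NodeID': 10115, 'TenureID': 4, 'GenderID': 2}, '')
```

Comparing a handful of cases from the SPSS run against this is a cheap way to find where the two disagree.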