| Date: | Thu, 29 Mar 2007 19:07:03 -0400 |
| Reply-To: | Richard Ristow <wrristow@mindspring.com> |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU> |
| From: | Richard Ristow <wrristow@mindspring.com> |
| Subject: | Re: parsing strings |
| Content-Type: | text/plain; charset="us-ascii"; format=flowed |
|---|
Update: test data, and tested code.
At 08:47 AM 3/28/2007, Doug Keyser wrote:
>I'm trying to parse a string variable with thousands of cases into new
>variables. For example, a variable named "demotext" has the following
>string:
>
>|NodeID:10115|TenureID:4|GenderID:2
>
>I'd like to be able to create separate variables titled NodeID,
>TenureID, GenderID with their respective values...10115, 4, 2.

This is tested; it fixes one simple bug that prevented the code from
running, and adds a couple of features. It's SPSS 15 draft output, but
all language features used should work back at least through release 9.

In this version, intermediate variables are regular variables whose
names begin with '@'. Change those to '#' to make them scratch
variables, which leaves the output file uncluttered.
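
As a sketch of that change (untested, and not part of the run shown in
this post), the working-variable declarations would look something like
the following. It assumes every later '@' reference (@Parsing, @Index,
and so on) is renamed to match.

```spss
* Working variables, declared as scratch variables ('#' prefix) ***.
* Scratch variables are not saved with the working file.
STRING  #Parsing (A255)
       /#Assign  (A25)
       /#KeyStr  (A12)
       /#ValStr  (A12).
NUMERIC #Value   (F5)
       /#Index   (F5)
       /#AsgnNum (F5).
```

Two things to keep in mind. Scratch variables are not reinitialized
from case to case; that should be harmless here, since each one is
assigned before it is read. And the flag variable Matched has no '@'
prefix, so it would stay in the file unless it is renamed as well (to,
say, #Matched, a name chosen only for illustration).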
|-----------------------------|---------------------------|
|Output Created |29-MAR-2007 18:59:45 |
|-----------------------------|---------------------------|
[Keyser] C:\Documents and Settings\Richard\My Documents
\Eudora mail\Attachments\parsing1.sav
demoText
|NodeID:10132|RecDate:1/18/2007|TenureID:|GenderID:
|NodeID:10115|RecDate:1/18/2007|TenureID:4|GenderID:2
|NodeID:10134|RecDate:1/18/2007|TenureID:2|GenderID:2
|NodeID:10134|RecDate:1/18/2007|TenureID:1|GenderID:2
|NodeID:10134|RecDate:1/18/2007|TenureID:4|GenderID:2
|NodeID:10133|RecDate:1/18/2007|TenureID:1|GenderID:1
|NodeID:10134|RecDate:1/18/2007|TenureID:3|GenderID:2
|NodeID:10133|RecDate:1/18/2007|TenureID:7|GenderID:2
|NodeID:10115|RecDate:1/18/2007|TenureID:1|GenderID:2
|NodeID:10134|RecDate:1/18/2007|TenureID:7|GenderID:3
Number of cases read: 10 Number of cases listed: 10
NUMERIC NodeID RecDate TenureID genderid (F5).
FORMATS RecDate (DATE11).
STRING BadKey (A10).
VAR LABEL BadKey
'(Last) unrecognized keyword found in string'.
* Working variables ***.
STRING @Parsing /* Remaining unparsed part on input */ (A255)
/@Assign /* Assignment pair (<keyword>:<value>) */ (A25)
/@KeyStr /* Keyword part of <keyword>:<value> pair */ (A12)
/@ValStr /* Value part of <keyword>:<value> pair */ (A12).
NUMERIC @Value /* Value part, converted to numeric */ (F5)
/@Index /* Result of "INDEX" search in a string */ (F5)
/@AsgnNum /* Counter, through 'assignment' pairs */ (F5).
COMPUTE @Parsing = LTRIM(demotext).
COMPUTE @Parsing = LTRIM(@Parsing,'|').
LOOP @AsgnNum = 1 TO 10
IF @Parsing NE ' '.
. COMPUTE @Index = INDEX(@Parsing,'|').
. DO IF @Index GT 0.
. COMPUTE @Assign = SUBSTR(@Parsing,1,@Index-1).
. COMPUTE @Parsing = SUBSTR(@Parsing,@Index).
. ELSE.
. COMPUTE @Assign = @Parsing.
. COMPUTE @Parsing = ''.
. END IF.
. COMPUTE @Parsing = LTRIM(@Parsing,'|').
. COMPUTE @Assign = LTRIM(@Assign).
. COMPUTE @Index = INDEX (@Assign,':').
. COMPUTE @KeyStr = SUBSTR(@Assign,1,@Index-1).
. COMPUTE @ValStr = SUBSTR(@Assign,@Index+1).
. COMPUTE Matched = 0.
. DO REPEAT KeyWord = 'NodeID' 'RecDate' 'TenureID' 'genderid'
/TgtVbl = NodeID RecDate TenureID genderid.
. DO IF UPCASE(@KeyStr) = UPCASE(KeyWord).
. DO IF UPCASE(@KeyStr) = 'RECDATE'.
* Special case: value is a date, not an integer *** .
. COMPUTE @Value = NUMBER(@ValStr,ADATE12).
. ELSE.
* Integer values: *** .
. COMPUTE @Value = NUMBER(@ValStr,F12).
. END IF.
. COMPUTE TgtVbl = @Value.
. COMPUTE Matched = 1.
. END IF.
. END REPEAT.
. IF Matched NE 1 BadKey = @KeyStr.
END LOOP.
LIST demoText TO BadKey.
List
|-----------------------------|---------------------------|
|Output Created |29-MAR-2007 18:59:47 |
|-----------------------------|---------------------------|
[Keyser] C:\Documents and Settings\Richard\My Documents
\Eudora mail\Attachments\parsing1.sav
The variables are listed in the following order:
LINE 1: demoText
LINE 2: NodeID RecDate TenureID genderid BadKey
demoText: |NodeID:10132|RecDate:1/18/2007|TenureID:|GenderID:
NodeID: 10132 18-JAN-2007 . .
demoText: |NodeID:10115|RecDate:1/18/2007|TenureID:4|GenderID:2
NodeID: 10115 18-JAN-2007 4 2
demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:2|GenderID:2
NodeID: 10134 18-JAN-2007 2 2
demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:1|GenderID:2
NodeID: 10134 18-JAN-2007 1 2
demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:4|GenderID:2
NodeID: 10134 18-JAN-2007 4 2
demoText: |NodeID:10133|RecDate:1/18/2007|TenureID:1|GenderID:1
NodeID: 10133 18-JAN-2007 1 1
demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:3|GenderID:2
NodeID: 10134 18-JAN-2007 3 2
demoText: |NodeID:10133|RecDate:1/18/2007|TenureID:7|GenderID:2
NodeID: 10133 18-JAN-2007 7 2
demoText: |NodeID:10115|RecDate:1/18/2007|TenureID:1|GenderID:2
NodeID: 10115 18-JAN-2007 1 2
demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:7|GenderID:3
NodeID: 10134 18-JAN-2007 7 3
Number of cases read: 10 Number of cases listed: 10