LISTSERV at the University of Georgia
Date:   Thu, 29 Mar 2007 19:07:03 -0400
Reply-To:   Richard Ristow <wrristow@mindspring.com>
Sender:   "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:   Richard Ristow <wrristow@mindspring.com>
Subject:   Re: parsing strings
Content-Type:   text/plain; charset="us-ascii"; format=flowed

Update: test data, and tested code.

At 08:47 AM 3/28/2007, Doug Keyser wrote:

>I'm trying to parse a string variable with thousands of cases into new
>variables. For example, a variable named "demotext" has the following
>string:
>
>|NodeID:10115|TenureID:4|GenderID:2
>
>I'd like to be able to create separate variables titled NodeID,
>TenureID, GenderID with their respective values...10115, 4, 2.

This version is tested. It fixes one simple bug that kept the code from running, and adds a couple of features. What follows is SPSS 15 draft output, but all language features used should work back at least through release 9.

In this version, intermediate variables are regular variables whose names begin with '@'. Change those to '#' to make them scratch variables, to leave the output file uncluttered.
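As a sketch of that change (not part of the tested run), the working-variable declarations from the code below might look like this with '#' names. Matched is created implicitly by COMPUTE in the code below, so it is assumed here to be renamed #Matched as well; every reference to these names in the transformation code would need the same renaming. Scratch variables are not reset when a new case is read, which should be harmless here because each one is assigned before it is used within a case.

```spss
* Sketch only, untested in this form.
* Names are the originals with '@' replaced by '#'.
* #Matched is an assumed rename of the implicitly created Matched.
STRING  #Parsing (A255)
       /#Assign  (A25)
       /#KeyStr  (A12)
       /#ValStr  (A12).
NUMERIC #Value #Index #AsgnNum #Matched (F5).
```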

|-----------------------------|---------------------------|
|Output Created               |29-MAR-2007 18:59:45       |
|-----------------------------|---------------------------|
[Keyser] C:\Documents and Settings\Richard\My Documents
          \Eudora mail\Attachments\parsing1.sav

demoText

|NodeID:10132|RecDate:1/18/2007|TenureID:|GenderID:
|NodeID:10115|RecDate:1/18/2007|TenureID:4|GenderID:2
|NodeID:10134|RecDate:1/18/2007|TenureID:2|GenderID:2
|NodeID:10134|RecDate:1/18/2007|TenureID:1|GenderID:2
|NodeID:10134|RecDate:1/18/2007|TenureID:4|GenderID:2
|NodeID:10133|RecDate:1/18/2007|TenureID:1|GenderID:1
|NodeID:10134|RecDate:1/18/2007|TenureID:3|GenderID:2
|NodeID:10133|RecDate:1/18/2007|TenureID:7|GenderID:2
|NodeID:10115|RecDate:1/18/2007|TenureID:1|GenderID:2
|NodeID:10134|RecDate:1/18/2007|TenureID:7|GenderID:3

Number of cases read: 10 Number of cases listed: 10
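For anyone without parsing1.sav, a few of the test cases listed above could be recreated inline along these lines. This is only a sketch: the column width (60) is an assumption chosen to fit the strings shown, not the width of demoText in the original file.

```spss
* Sketch: recreate part of the test file inline.
* The width (columns 1-60) is an assumption, not taken from parsing1.sav.
DATA LIST FIXED /demoText 1-60 (A).
BEGIN DATA
|NodeID:10132|RecDate:1/18/2007|TenureID:|GenderID:
|NodeID:10115|RecDate:1/18/2007|TenureID:4|GenderID:2
|NodeID:10134|RecDate:1/18/2007|TenureID:7|GenderID:3
END DATA.
LIST.
```

The first line is worth keeping in any test set, since it exercises the empty-value case (TenureID and GenderID with nothing after the colon).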

NUMERIC NodeID RecDate TenureID genderid (F5).
FORMATS RecDate (DATE11).
STRING  BadKey (A10).
VAR LABEL BadKey '(Last) unrecognized keyword found in string'.

*  Working variables    ***.
STRING  @Parsing /* Remaining unparsed part on input       */ (A255)
       /@Assign  /* Assignment pair (<keyword>:<value>)    */ (A25)
       /@KeyStr  /* Keyword part of <keyword>:<value> pair */ (A12)
       /@ValStr  /* Value   part of <keyword>:<value> pair */ (A12).
NUMERIC @Value   /* Value part, converted to numeric       */ (F5)
       /@Index   /* Result of "INDEX" search in a string   */ (F5)
       /@AsgnNum /* Counter, through 'assignment' pairs    */ (F5).

COMPUTE @Parsing = LTRIM(demotext).
COMPUTE @Parsing = LTRIM(@Parsing,'|').

LOOP @AsgnNum = 1 TO 10 IF @Parsing NE ' '.
.  COMPUTE @Index   = INDEX(@Parsing,'|').
.  DO IF   @Index GT 0.
.     COMPUTE @Assign  = SUBSTR(@Parsing,1,@Index-1).
.     COMPUTE @Parsing = SUBSTR(@Parsing,@Index).
.  ELSE.
.     COMPUTE @Assign  = @Parsing.
.     COMPUTE @Parsing = ''.
.  END IF.

.  COMPUTE @Parsing = LTRIM(@Parsing,'|').
.  COMPUTE @Assign  = LTRIM(@Assign).

.  COMPUTE @Index   = INDEX (@Assign,':').
.  COMPUTE @KeyStr  = SUBSTR(@Assign,1,@Index-1).
.  COMPUTE @ValStr  = SUBSTR(@Assign,@Index+1).

.  COMPUTE Matched = 0.
.  DO REPEAT KeyWord = 'NodeID' 'RecDate' 'TenureID' 'genderid'
            /TgtVbl  =  NodeID   RecDate   TenureID   genderid.
.     DO IF UPCASE(@KeyStr) = UPCASE(KeyWord).
.        DO IF UPCASE(@KeyStr) = 'RECDATE'.
*           Special case:  value is a date, not an integer *** .
.           COMPUTE @Value = NUMBER(@ValStr,ADATE12).
.        ELSE.
*           Integer values:                                *** .
.           COMPUTE @Value = NUMBER(@ValStr,F12).
.        END IF.
.        COMPUTE TgtVbl  = @Value.
.        COMPUTE Matched = 1.
.     END IF.
.  END REPEAT.

.  IF Matched NE 1 BadKey = @KeyStr.
END LOOP.

LIST demoText TO BadKey.

List
|-----------------------------|---------------------------|
|Output Created               |29-MAR-2007 18:59:47       |
|-----------------------------|---------------------------|
[Keyser] C:\Documents and Settings\Richard\My Documents
          \Eudora mail\Attachments\parsing1.sav

The variables are listed in the following order:

LINE 1: demoText
LINE 2: NodeID RecDate TenureID genderid BadKey

demoText: |NodeID:10132|RecDate:1/18/2007|TenureID:|GenderID:
  NodeID: 10132 18-JAN-2007     .     .

demoText: |NodeID:10115|RecDate:1/18/2007|TenureID:4|GenderID:2
  NodeID: 10115 18-JAN-2007     4     2

demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:2|GenderID:2
  NodeID: 10134 18-JAN-2007     2     2

demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:1|GenderID:2
  NodeID: 10134 18-JAN-2007     1     2

demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:4|GenderID:2
  NodeID: 10134 18-JAN-2007     4     2

demoText: |NodeID:10133|RecDate:1/18/2007|TenureID:1|GenderID:1
  NodeID: 10133 18-JAN-2007     1     1

demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:3|GenderID:2
  NodeID: 10134 18-JAN-2007     3     2

demoText: |NodeID:10133|RecDate:1/18/2007|TenureID:7|GenderID:2
  NodeID: 10133 18-JAN-2007     7     2

demoText: |NodeID:10115|RecDate:1/18/2007|TenureID:1|GenderID:2
  NodeID: 10115 18-JAN-2007     1     2

demoText: |NodeID:10134|RecDate:1/18/2007|TenureID:7|GenderID:3
  NodeID: 10134 18-JAN-2007     7     3

Number of cases read: 10 Number of cases listed: 10
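With thousands of cases, it may be worth checking BadKey after the run rather than scanning the listing by eye. A minimal sketch, assuming the variables are named as in the code above:

```spss
* Sketch: list only the cases where an unrecognized keyword was found.
TEMPORARY.
SELECT IF (BadKey NE ' ').
LIST demoText BadKey.
```

BadKey is blank in every case of the listing above, so on the ten test cases this should list nothing. TEMPORARY keeps the selection from dropping cases from the working file.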

