| Date: | Thu, 29 Jul 2004 14:29:51 -0400 |
| Reply-To: | Richard Ristow <wrristow@mindspring.com> |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU> |
| From: | Richard Ristow <wrristow@mindspring.com> |
| Subject: | Re: Creating Two String Variables From One |
| In-Reply-To: | <s107dd0d.001@GroupWise> |
| Content-Type: | text/plain; charset="us-ascii"; format=flowed |
|---|
At 05:06 PM 7/28/2004, Kurt Wilkening wrote:
>I have one string variable called name that contains the last name
>followed by a space, then first name. For example,
>
>Smith John R
>
>I would like to run syntax that creates two variables from the one so that
>I end up with one variable called last name that contains "smith" and
>another variable called first name that contains "John R".
I'm sure you'll have a correct answer posted by the time this hits the
list, but I think it's worth a comment.
Extracting logically different parts from a string, based on punctuation or
other context in the string, is called 'parsing', and it's fascinating,
challenging, and causes good people to tear their hair out. I have *been* there.
Your problem, as you've stated it, is an easy one; here's the solution, which
has probably been posted already. If your original variable is called FULLNAME, then:
. STRING LASTNAME FRSTNAME (A20).
. NUMERIC #BRKPNT (F3).
. COMPUTE #BRKPNT = INDEX (FULLNAME,' ').
. COMPUTE LASTNAME = SUBSTR(FULLNAME,1,#BRKPNT).
. COMPUTE FRSTNAME = SUBSTR(FULLNAME,#BRKPNT+1).
The only 'cute' trick is the scratch variable for the location of the first
space.
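For readers who want to trace the logic without SPSS, the five statements above can be modeled roughly in Python. This is an illustrative translation, not SPSS syntax: `split_name` and its `width` parameter are hypothetical names, and the padding only imitates SPSS's fixed-width, blank-padded strings and the 1-based result of INDEX.

```python
def split_name(fullname, width=50):
    """Rough model of the basic split: break at the first blank."""
    # SPSS strings are fixed-width and blank-padded, so a blank is always
    # found unless the name fills the whole width.
    padded = fullname.ljust(width)
    # str.find is 0-based and returns -1 on failure; adding 1 gives the
    # 1-based position (0 when nothing is found) that INDEX reports.
    brkpnt = padded.find(" ") + 1
    lastname = padded[:brkpnt]   # like SUBSTR(FULLNAME,1,#BRKPNT)
    frstname = padded[brkpnt:]   # like SUBSTR(FULLNAME,#BRKPNT+1)
    # Trailing blanks are only padding, so strip them for display.
    return frstname.rstrip(), lastname.rstrip()


print(split_name("Smith John R"))   # ('John R', 'Smith')
```

The break position is kept in its own variable, as in the syntax, so that both substrings are cut from the same place.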
>Can someone provide quick help?
OK, there's quick help. Now, here's non-quick help. Real inputs for parsing
almost NEVER follow a simple rule consistently. The following just scratch
the surface:
"Smith John R" -> "John R" "Smith" (correct)
* Parsing company and institution names can give ludicrous results(*):
"The Acme Co." -> "Acme Co." "The"
"North American Van Lines" ->
"American Van Lines" "North"
"Supreme Court of Alabama" ->
"Court of Alabama" "Supreme"
* A simple deviation: leading blanks on the name will give a blank last name,
" Smith John R" -> "Smith John R" " "
So,
. STRING LASTNAME FRSTNAME (A20).
. STRING #TRMNAME (A50).
. COMPUTE #TRMNAME=LTRIM(FULLNAME).
. NUMERIC #BRKPNT (F3).
. COMPUTE #BRKPNT = INDEX (#TRMNAME,' ').
. COMPUTE LASTNAME = SUBSTR(#TRMNAME,1,#BRKPNT).
. COMPUTE FRSTNAME = SUBSTR(#TRMNAME,#BRKPNT+1).
* Second simple deviation: TWO blanks between first and last name give a
leading blank on first name,
"Smith  John R" -> " John R" "Smith"
So, combining the previous correction with this one,
. STRING LASTNAME FRSTNAME (A20).
. STRING #TRMNAME (A50).
. COMPUTE #TRMNAME=LTRIM(FULLNAME).
. NUMERIC #BRKPNT (F3).
. COMPUTE #BRKPNT = INDEX (#TRMNAME,' ').
. COMPUTE LASTNAME = SUBSTR(#TRMNAME,1,#BRKPNT).
. COMPUTE FRSTNAME = LTRIM(SUBSTR(#TRMNAME,#BRKPNT+1)).
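The effect of the two LTRIM calls can be checked with a rough Python model. As before, this is an illustrative sketch rather than SPSS syntax, and `split_name_trimmed` and `width` are hypothetical names.

```python
def split_name_trimmed(fullname, width=50):
    """Rough model of the split with both LTRIM corrections applied."""
    # First correction: drop leading blanks before looking for the break.
    trmname = fullname.lstrip(" ").ljust(width)
    brkpnt = trmname.find(" ") + 1            # 1-based, 0 if no blank
    lastname = trmname[:brkpnt]
    # Second correction: drop any extra blanks in front of the first name.
    frstname = trmname[brkpnt:].lstrip(" ")
    return frstname.rstrip(), lastname.rstrip()


for raw in ["Smith John R", "  Smith John R", "Smith   John R"]:
    print(repr(raw), "->", split_name_trimmed(raw))
```

All three inputs should come out as first name "John R" and last name "Smith".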
* Third deviation: A comma instead of blank separating the names:
"Smith,John R" -> "R" "Smith,John"
Adding this to the previous corrections,
. STRING LASTNAME FRSTNAME (A20).
. STRING #TRMNAME (A50).
. COMPUTE #TRMNAME=LTRIM(FULLNAME).
. NUMERIC #BRKPNT (F3).
. COMPUTE #BRKPNT = INDEX (#TRMNAME,' ,',1).
. COMPUTE LASTNAME = SUBSTR(#TRMNAME,1,#BRKPNT).
. COMPUTE FRSTNAME = LTRIM(SUBSTR(#TRMNAME,#BRKPNT+1)).
(Note the 'cute' use of the optional third argument of INDEX: it splits the
search string ' ,' into one-character pieces, so INDEX finds the first blank
or comma, whichever comes first.)
And that still doesn't fix
"Smith,John R" -> "John R" "Smith,"
"Smith ,John R" -> ",John R" "Smith"
Good luck! If your input wasn't machine-generated, you may see every one of
these, and almost certainly more that I haven't thought of.
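The idea of breaking at the first blank or comma, whichever comes first, can also be modeled in Python to try further inputs. This is only a sketch under assumptions: `index_any` and `split_name_sep` are hypothetical helpers that imitate the intended search, not SPSS functions.

```python
def index_any(haystack, needles):
    """1-based position of the first character of haystack that occurs in
    needles, or 0 if there is none."""
    for pos, ch in enumerate(haystack, start=1):
        if ch in needles:
            return pos
    return 0


def split_name_sep(fullname, width=50):
    """Rough model of the split that breaks at the first blank or comma."""
    trmname = fullname.lstrip(" ").ljust(width)
    brkpnt = index_any(trmname, " ,")
    # As in the syntax, the last name includes the break character itself,
    # so a comma separator stays attached to it.
    lastname = trmname[:brkpnt]
    frstname = trmname[brkpnt:].lstrip(" ")
    return frstname.rstrip(), lastname.rstrip()


for raw in ["Smith John R", "Smith, John R", "Smith ,John R"]:
    print(repr(raw), "->", split_name_sep(raw))
```

In this model a blank before the comma leaves the comma on the first name, and a comma directly after the last name stays attached to the last name, because the substring runs up to and including the break position.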
..........
(*) The John Carter Brown Library is an endowed rare-book library at Brown
University. Years ago, it received a solicitation letter:
Mrs. John Carter Brown Library
<correct address>
Dear Mrs. Library: