Date: Thu, 5 Aug 2010 07:56:26 +0000
Reply-To: Ruben van den Berg <ruben_van_den_berg@hotmail.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Ruben van den Berg <ruben_van_den_berg@hotmail.com>
Subject: Re: grouping and sequencing cases
In-Reply-To: <186780.8590.qm@web59307.mail.re1.yahoo.com>
Dear Tufayel,
Your syntax is lovely. I had completely forgotten about CREATE, and I didn't even know FIRST and LAST were functions in AGGREGATE!
However, when I ran it, the variable TYPE_B you created did not correspond to the variable TYPE in your test data. For $casenum=12, TYPE=2, but your syntax rendered TYPE_B=1.
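One possible reading of this mismatch, offered as a guess rather than a diagnosis: a one-case lead over the whole file takes no account of personID, so for the last case of person 1 it picks up the first location of person 2. The Python sketch below only illustrates that effect on the test data; the `lead` helper is a hypothetical stand-in and not part of SPSS.

```python
# Test data from this thread as (personID, location), in file order.
rows = [
    (1, 1), (1, 3), (1, 3), (1, 1), (1, 3), (1, 2),
    (1, 3), (1, 3), (1, 3), (1, 2), (1, 3), (1, 1),
    (2, 1), (2, 3), (2, 2), (2, 3), (2, 3), (2, 1),
    (3, 2), (3, 3), (3, 3), (3, 2),
]

def lead(values):
    """Value of the next case in file order; None for the last case."""
    return values[1:] + [None]

persons = [person for person, _ in rows]
locations = [location for _, location in rows]
lead_location = lead(locations)

case = 12          # 1-based case number, like $casenum
i = case - 1
print("case", case, "belongs to person", persons[i])
print("its lead location is", lead_location[i],
      "which comes from person", persons[i + 1])
```

If this is indeed the cause, computing the lead within personID (or setting it to missing on each person's last case) would be the natural thing to try.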
I'll paste the entire syntax below; I suffixed 'your' variables with _T (for Tufayel ;-)).
Thanks for the lovely teamwork!
Ruben van den Berg
Consultant Models & Methods
TNS NIPO
Email: ruben.van.den.berg@tns-nipo.com
Mobiel: +31 6 24641435
Telefoon: +31 20 522 5738
Internet: www.tns-nipo.com
data list free/personID location tourID type.
begin data
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
2 1 1 2
2 3 1 2
2 2 2 2
2 3 2 2
2 3 2 2
2 1 3 2
3 2 1 3
3 3 1 3
3 3 1 3
3 2 2 3
end data.
dataset name d1.
*Create visit.
compute visit=1.
if personID=lag(personID) visit=lag(visit)+1.
*Compute tourID.
compute tourID_B=1.
do if (personID=lag(personID)) and (location=1 or location=2) and (location ne lag(location)).
compute tourID_B=lag(tourID_B)+1.
else if (personID=lag(personID)).
compute tourID_B=lag(tourID_B).
end if.
execute.
*********Scratch copy of data.
dataset copy d2.
dataset activate d2.
aggregate
/outfile * overwrite=yes mode addvariables
/break personid
/maxvis=max(visit).
execute.
select if location ne 3.
compute type_B=0.
if visit=maxvis and location=1 and lag(location)=1 type_b=1.
if visit=maxvis and location=1 and lag(location)=2 type_b=2.
if visit=maxvis and location=2 and lag(location)=1 type_b=2.
if visit=maxvis and location=2 and lag(location)=2 type_b=3.
sort cases personid(a)visit(d).
compute newcount=1.
if personID=lag(personID) newcount=lag(newcount)+1.
compute t1=lag(type_b).
execute.
if t1 gt 0 type_b=t1.
execute.
aggregate
/outfile * overwrite=yes mode addvariables
/break personid
/maxnewcount=max(newcount).
loop #i=3 to maxnewcount.
if newcount=#i and location=1 and lag(location)=1 type_b=1.
if newcount=#i and location=1 and lag(location)=2 type_b=2.
if newcount=#i and location=2 and lag(location)=1 type_b=2.
if newcount=#i and location=2 and lag(location)=2 type_b=3.
end loop.
sort cases personid visit.
match files file *
/keep personid visit type_b.
execute.
match files file d1
/file d2
/by personid visit.
execute.
dataset close all.
dataset name d1.
if mis (type_b) type_b=lag(type_b).
execute.
compute check=type-type_b.
descriptives check.
delete variables check.
*********Tufayel solution.
*Make a little change in tourID.
compute tourID_T=1.
do if (location=1 or location=2) and (location ne lag(location)).
compute tourID_T=lag(tourID_T)+1.
else.
compute tourID_T=lag(tourID_T).
end if.
execute.
*Group the tours.
compute group=1.
if tourID = lag(tourID) group = lag(group).
if tourID <> lag(tourID) group = 1+ lag(group).
EXECUTE.
*Compute the first location of the tour.
aggregate
/outfile * overwrite=yes mode addvariables
/break group
/firstlocation = first(location).
EXECUTE.
*Compute the last location of the tour.
create leadlocation = lead(location,1).
EXECUTE.
aggregate
/outfile * overwrite=yes mode addvariables
/break group
/lastlocation = last(leadlocation).
EXECUTE.
*Compute type.
compute type_T = 0.
if firstlocation = 1 and lastlocation = 1 type_T = 1.
if firstlocation = 1 and lastlocation = 2 type_T = 2.
if firstlocation = 2 and lastlocation = 1 type_T = 2.
if firstlocation = 2 and lastlocation = 2 type_T = 3.
EXECUTE.
compute check=type_b-type_T.
exe.
Date: Wed, 4 Aug 2010 16:27:00 -0700
From: tufayel_02@yahoo.com
Subject: Re: grouping and sequencing cases
To: SPSSX-L@LISTSERV.UGA.EDU
Hi Ruben,
Thanks a lot for the syntax. I'm sorry that I couldn't explain the logic clearly, but you got it right. I understood your syntax and was able to write a (probably) simpler one to create the variable 'type'. I couldn't have done it without going through yours first.
*Make a little change in tourID.
compute tourID=1.
do if (location=1 or location=2) and (location ne lag(location)).
compute tourID=lag(tourID)+1.
else.
compute tourID=lag(tourID).
end if.
execute.
*Group the tours.
compute group=1.
if tourID = lag(tourID) group = lag(group).
if tourID <> lag(tourID) group = 1+ lag(group).
EXECUTE.
*Compute the first location of the tour.
aggregate
/outfile * overwrite=yes mode addvariables
/break group
/firstlocation = first(location).
EXECUTE.
*Compute the last location of the tour.
create leadlocation = lead(location,1).
EXECUTE.
aggregate
/outfile * overwrite=yes mode addvariables
/break group
/lastlocation = last(leadlocation).
EXECUTE.
*Compute type.
compute type_b = 0.
if firstlocation = 1 and lastlocation = 1 type_b = 1.
if firstlocation = 1 and lastlocation = 2 type_b = 2.
if firstlocation = 2 and lastlocation = 1 type_b = 2.
if firstlocation = 2 and lastlocation = 2 type_b = 3.
EXECUTE.
Thanks a lot.
-Tufayel
From: Ruben van den Berg <ruben_van_den_berg@hotmail.com>
To: SPSSX-L@LISTSERV.UGA.EDU
Sent: Tue, August 3, 2010 5:40:19 AM
Subject: Re: grouping and sequencing cases
Dear Tufayel,
I'm sorry but you did not answer my question and I still don't understand the logic. However, I tried to 'extract' the logic from the data and wrote some syntax that exactly reproduces 'Type' in your data (but only for these example respondents). The syntax is rather long and clumsy, but I didn't see any better options to get it done. Perhaps the List can suggest some improvements?
This comes without any warranty whatsoever, and I suggest you check the actual results meticulously; I'm not overly confident it will work properly.
HTH,
Ruben van den Berg
*Test data.
data list free/personID location tourID type.
begin data
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
2 1 1 2
2 3 1 2
2 2 2 2
2 3 2 2
2 3 2 2
2 1 3 2
3 2 1 3
3 3 1 3
3 3 1 3
3 2 2 3
end data.
dataset name d1.
*Create visit.
compute visit=1.
if personID=lag(personID) visit=lag(visit)+1.
*Compute tourID.
compute tourID_B=1.
do if (personID=lag(personID)) and (location=1 or location=2) and (location ne lag(location)).
compute tourID_B=lag(tourID_B)+1.
else if (personID=lag(personID)).
compute tourID_B=lag(tourID_B).
end if.
execute.
*********Scratch copy of data.
dataset copy d2.
dataset activate d2.
aggregate
/outfile * overwrite=yes mode addvariables
/break personid
/maxvis=max(visit).
execute.
select if location ne 3.
compute type_B=0.
if visit=maxvis and location=1 and lag(location)=1 type_b=1.
if visit=maxvis and location=1 and lag(location)=2 type_b=2.
if visit=maxvis and location=2 and lag(location)=1 type_b=2.
if visit=maxvis and location=2 and lag(location)=2 type_b=3.
sort cases personid(a)visit(d).
compute newcount=1.
if personID=lag(personID) newcount=lag(newcount)+1.
compute t1=lag(type_b).
execute.
if t1 gt 0 type_b=t1.
execute.
aggregate
/outfile * overwrite=yes mode addvariables
/break personid
/maxnewcount=max(newcount).
loop #i=3 to maxnewcount.
if newcount=#i and location=1 and lag(location)=1 type_b=1.
if newcount=#i and location=1 and lag(location)=2 type_b=2.
if newcount=#i and location=2 and lag(location)=1 type_b=2.
if newcount=#i and location=2 and lag(location)=2 type_b=3.
end loop.
sort cases personid visit.
match files file *
/keep personid visit type_b.
execute.
match files file d1
/file d2
/by personid visit.
execute.
dataset close all.
dataset name d1.
if mis (type_b) type_b=lag(type_b).
execute.
compute check=type-type_b.
descriptives check.
delete variables check.
Date: Mon, 2 Aug 2010 12:28:40 -0700
From: tufayel_02@yahoo.com
Subject: Re: grouping and sequencing cases
To: ruben_van_den_berg@hotmail.com
Hi Ruben,
I should have been more explicit about the context. In my dataset each case represents one of a person's activities over a whole day (24 hours). For example, sleeping at home > taking breakfast at home > taking kid to daycare > going to office > from office to grocery > from grocery to home... etc. For my convenience, I have recoded the activity locations into three main categories: 1=home, 2=office, 3=other places.
I define a TOUR based on two anchors: home and office. Thus, if a person travels like this: home > car > daycare > car > home (1>3>3>3>1), this is a home-home tour. A tour is complete when a person starts from any of the two anchors (home or office) and goes/returns to any of those, via other places (i.e. 3).
The variable 'type' is defined by the tour's origin and destination. The only variable used to define 'type' should be location; personID shouldn't matter. type=1 if 1>3>3>3>1 (home-home tour), type=2 if 1>3>3>2 (home-office) or 2>3>3>3>1 (office-home), and type=3 if 2>3>3>2 (office-office).
location tourID type
1 1 2
3 1 2
3 1 2
2 2 3
3 2 3
3 2 3
2 3 3
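The rule above can be sketched outside SPSS as a quick check. This is a minimal Python sketch, not the syntax used in this thread: the names `TOUR_TYPE` and `tour_types` are made up for illustration, and the handling of the closing anchor (it inherits the type of the tour before it) is an assumption read off the last row of the example table.

```python
# (origin anchor, destination anchor) -> type, as described above.
TOUR_TYPE = {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 3}

def tour_types(locations):
    """Return one type per case for a single sequence of locations.

    A tour starts at an anchor (1=home, 2=office) and ends at the next
    anchor; cases with location 3 belong to the tour they follow.
    """
    anchors = [i for i, loc in enumerate(locations) if loc in (1, 2)]
    types = [None] * len(locations)
    last_type = None
    for k, start in enumerate(anchors):
        if k + 1 < len(anchors):
            nxt = anchors[k + 1]
            last_type = TOUR_TYPE[(locations[start], locations[nxt])]
            end = nxt
        else:
            # Assumption: the closing anchor has no later destination,
            # so it keeps the type of the previous tour.
            end = len(locations)
        for i in range(start, end):
            types[i] = last_type
    return types

example = [1, 3, 3, 2, 3, 3, 2]
print(tour_types(example))
```

Cases before the first anchor, or a sequence with only one anchor, get no type (None) in this sketch.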
Hope I have cleared things up. And thanks so much for your help.
-Tufayel
From: Ruben van den Berg <ruben_van_den_berg@hotmail.com>
To: tufayel_02@yahoo.com
Sent: Mon, August 2, 2010 3:56:33 AM
Subject: RE: grouping and sequencing cases
Dear Tufayel,
This logic is pretty hard to grasp. From the data, it seems that for
1
3
1
3
2
The first 'tour' is 1-3-1 and the second 'tour' is 1-3-3. The type for the SECOND '1' is determined by the second tour (so it's type=2, not 1). Is type always 1 for the first location within a personID?
Best,
Ruben van den Berg
Date: Fri, 30 Jul 2010 17:28:30 -0700
From: tufayel_02@yahoo.com
Subject: Re: grouping and sequencing cases
To: SPSSX-L@LISTSERV.UGA.EDU
Hi Ruben,
Thank you for the help with tourID. I'm sorry for being ambiguous about the variable 'type'. The thing is, 1 and 2 are fixed locations (home and office respectively) and 3 is any kind of vehicle/walk. You'll notice in the example that within a personID the location changes from 1-1, 1-2, 2-1 or 2-2 via 3. To rephrase, the tours are always like 1-3-1, 1-3-2, 2-3-1 or 2-3-2. For a tour 1-3-1 (tourID=1 in the example), type=1; for 1-3-2 or 2-3-1 (tourID=2 and 4), type=2; and for 2-3-2 (tourID=3), type=3. I hope this clears things up.
personID location tourID type
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
Thanks
Tufayel
From: Ruben van den Berg <ruben_van_den_berg@hotmail.com>
To: SPSSX-L@LISTSERV.UGA.EDU
Sent: Fri, July 30, 2010 4:38:44 AM
Subject: Re: grouping and sequencing cases
Dear Tufayel,
The order of cases in your example data is vital information, isn't it? The first thing I'd do, if these were my raw data, is store this order in a variable. Otherwise, if you sorted your records randomly, you'd destroy part of the information contained in the data. I created a new variable 'visit', which is the nth visit for each personID. The next block of syntax should create tourID (I called it tourID_B so you can compare it to your desired tourID).
Your third request, however, was somewhat unclear to me. I think in total you have 9 location sequences:
1-1
1-2
1-3
2-1
2-2
2-3
3-1
3-2
3-3
Four of these (within personID) cause the tourID to change:
1-2
2-1
3-1
3-2
So within tourID groups there are 5 possible sequences:
1-1
1-3
2-2
2-3
3-3
As I understood, you want to create type within tourID group, so for each of these 5 sequences the value of type should be specified (even if (system) missing).
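The counts above (9 sequences, 4 that change tourID, 5 that stay within a tourID group) can be checked by brute force. The Python sketch below is only an illustration; `changes_tour` is a hypothetical helper that restates the DO IF condition in the syntax further down (the current location is 1 or 2 and differs from the previous one).

```python
from itertools import product

LOCATIONS = (1, 2, 3)

def changes_tour(prev, cur):
    """True if moving from prev to cur starts a new tourID."""
    return cur in (1, 2) and cur != prev

# All ordered pairs (previous location, current location).
pairs = list(product(LOCATIONS, repeat=2))
changing = [p for p in pairs if changes_tour(*p)]
within = [p for p in pairs if not changes_tour(*p)]

print(len(pairs), "sequences in total")
print("change tourID:", changing)
print("stay within tourID:", within)
```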
Could you please help us out a bit more?
Best,
Ruben van den Berg
data list free/personID location tourID type.
begin data
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
2 1 1 2
2 3 1 2
2 2 2 2
2 3 2 2
2 3 2 2
2 1 3 2
3 2 1 3
3 3 1 3
3 3 1 3
3 2 2 3
end data.
compute visit=1.
if personID=lag(personID) visit=lag(visit)+1.
compute tourID_B=1.
do if (personID=lag(personID)) and (location=1 or location=2) and (location ne lag(location)).
compute tourID_B=lag(tourID_B)+1.
else if (personID=lag(personID)).
compute tourID_B=lag(tourID_B).
end if.
execute.
Date: Thu, 29 Jul 2010 16:58:27 -0700
From: tufayel_02@yahoo.com
Subject: grouping and sequencing cases
To: SPSSX-L@LISTSERV.UGA.EDU
Hi all,
I am trying to create two variables (tourID and type) from two existing variables (personID and location). tourID is a counter that stays the same until 'location' 1 or 2 occurs, at which point it increases by 1. tourID always restarts at 1 when personID changes.
The variable 'type' can take three values: 1, 2 and 3. If the 'location' changes from 1 to 1, type=1 (for that tourID group); if it changes from 1 to 2 or from 2 to 1, type=2; if it's 2 to 2, type=3 (last four cases).
personID location tourID type
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
2 1 1 2
2 3 1 2
2 2 2 2
2 3 2 2
2 3 2 2
2 1 3 2
3 2 1 3
3 3 1 3
3 3 1 3
3 2 2 3
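The tourID rule as stated can be sketched in a few lines of Python to make it unambiguous. This is an illustration under stated assumptions, not SPSS syntax: `tour_ids` is a made-up name, the rows are assumed to be in the file order shown, and a person's first case is assumed to open tour 1 whatever its location.

```python
def tour_ids(rows):
    """rows: (personID, location) pairs in file order; returns tourID per row."""
    ids = []
    prev_person = None
    current = 0
    for person, location in rows:
        if person != prev_person:
            current = 1        # a new person always restarts at tour 1
        elif location in (1, 2):
            current += 1       # location 1 or 2 opens the next tour
        ids.append(current)
        prev_person = person
    return ids

# The example data from the table above.
rows = [
    (1, 1), (1, 3), (1, 3), (1, 1), (1, 3), (1, 2),
    (1, 3), (1, 3), (1, 3), (1, 2), (1, 3), (1, 1),
    (2, 1), (2, 3), (2, 2), (2, 3), (2, 3), (2, 1),
    (3, 2), (3, 3), (3, 3), (3, 2),
]
print(tour_ids(rows))
```

Note that this sketch increments on every 1 or 2, so two consecutive cases with the same anchor location would count as two tours; whether that is wanted depends on the real data.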
Can anyone please help me out?
Thanks in advance!
Tufayel