Date: Fri, 6 Aug 2010 13:52:01 -0700
Reply-To: Tufayel Chowdhury <tufayel_02@yahoo.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Tufayel Chowdhury <tufayel_02@yahoo.com>
Subject: Re: grouping and sequencing cases
Hi Ruben,
Yes, it was wonderful; I've learned a lot. As for $casenum=12, my syntax
gives the correct result (type_b = 2); I checked it a couple of times. The syntax
also works fine with my dataset. But thanks a lot: I'm pretty new to syntax, and
your syntax helped a lot.
Regards
Tufayel
________________________________
From: Ruben van den Berg <ruben_van_den_berg@hotmail.com>
To: SPSSX-L@LISTSERV.UGA.EDU
Sent: Thu, August 5, 2010 3:56:26 AM
Subject: Re: grouping and sequencing cases
Dear Tufayel,
Your syntax is lovely! I completely forgot about CREATE, and I didn't even
know FIRST and LAST were functions in AGGREGATE!
However, when I ran it, the variable TYPE_B you created did not correspond to
the variable TYPE in your test data. For $casenum=12 TYPE=2 but your syntax
rendered TYPE_B=1.
I'll paste the entire syntax below, I suffixed 'your' variables with _T (from
Tufayel ;-)).
Thanks for the lovely teamwork!
Ruben van den Berg
Consultant Models & Methods
TNS NIPO
Email: ruben.van.den.berg@tns-nipo.com
Mobile: +31 6 24641435
Phone: +31 20 522 5738
Internet: www.tns-nipo.com
data list free/personID location tourID type.
begin data
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
2 1 1 2
2 3 1 2
2 2 2 2
2 3 2 2
2 3 2 2
2 1 3 2
3 2 1 3
3 3 1 3
3 3 1 3
3 2 2 3
end data.
dataset name d1.
*Create visit.
compute visit=1.
if personID=lag(personID) visit=lag(visit)+1.
*Compute tourID.
compute tourID_B=1.
do if (personID=lag(personID)) and (location=1 or location=2) and (location ne
lag(location)).
compute tourID_B=lag(tourID_B)+1.
else if (personID=lag(personID)).
compute tourID_B=lag(tourID_B).
end if.
execute.
*********Scratch copy of data.
dataset copy d2.
dataset activate d2.
aggregate
/outfile * overwrite=yes mode addvariables
/break personid
/maxvis=max(visit).
execute.
select if location ne 3.
compute type_B=0.
if visit=maxvis and location=1 and lag(location)=1 type_b=1.
if visit=maxvis and location=1 and lag(location)=2 type_b=2.
if visit=maxvis and location=2 and lag(location)=1 type_b=2.
if visit=maxvis and location=2 and lag(location)=2 type_b=3.
sort cases personid(a)visit(d).
compute newcount=1.
if personID=lag(personID) newcount=lag(newcount)+1.
compute t1=lag(type_b).
execute.
if t1 gt 0 type_b=t1.
execute.
aggregate
/outfile * overwrite=yes mode addvariables
/break personid
/maxnewcount=max(newcount).
loop #i=3 to maxnewcount.
if newcount=#i and location=1 and lag(location)=1 type_b=1.
if newcount=#i and location=1 and lag(location)=2 type_b=2.
if newcount=#i and location=2 and lag(location)=1 type_b=2.
if newcount=#i and location=2 and lag(location)=2 type_b=3.
end loop.
sort cases personid visit.
match files file *
/keep personid visit type_b.
execute.
match files file d1
/file d2
/by personid visit.
execute.
dataset close all.
dataset name d1.
if mis (type_b) type_b=lag(type_b).
execute.
compute check=type-type_b.
descriptives check.
delete variables check.
*********Tufayel solution.
*Make a little change in tourID.
compute tourID_T=1.
do if (location=1 or location=2) and (location ne lag(location)).
compute tourID_T=lag(tourID_T)+1.
else.
compute tourID_T=lag(tourID_T).
end if.
execute.
*Group the tours.
compute group=1.
if tourID = lag(tourID) group = lag(group).
if tourID <> lag(tourID) group = 1+ lag(group).
EXECUTE.
*Compute the first location of the tour.
aggregate
/outfile * overwrite=yes mode addvariables
/break group
/firstlocation = first(location).
EXECUTE.
*Compute the last location of the tour.
create leadlocation = lead(location,1).
EXECUTE.
aggregate
/outfile * overwrite=yes mode addvariables
/break group
/lastlocation = last(leadlocation).
EXECUTE.
*Compute type.
compute type_T = 0.
if firstlocation = 1 and lastlocation = 1 type_T = 1.
if firstlocation = 1 and lastlocation = 2 type_T = 2.
if firstlocation = 2 and lastlocation = 1 type_T = 2.
if firstlocation = 2 and lastlocation = 2 type_T = 3.
EXECUTE.
compute check=type_b-type_T.
exe.
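A possible explanation for the disagreement at $casenum=12: in the pasted copy, the *Group the tours step compares tourID, which is the test data's original column and restarts for each person. It does not compare the recomputed tourID_T. In Tufayel's own version tourID was overwritten, so his grouping used the recomputed counter.

The Python sketch below is a hedged port of his steps, not SPSS itself. It assumes system-missing behaves like None and that AGGREGATE's LAST ignores missing values. The names tufayel_types, recomputed_tour_ids and mismatches are hypothetical. Grouping on the original tourID leaves case 12 in a tour of its own whose next location is 1, giving type 1. Grouping on the recomputed counter merges cases 12 to 14 and gives type 2.

```python
# Test data from the thread: (personID, location, tourID, type)
DATA = [
    (1, 1, 1, 1), (1, 3, 1, 1), (1, 3, 1, 1), (1, 1, 2, 2), (1, 3, 2, 2),
    (1, 2, 3, 3), (1, 3, 3, 3), (1, 3, 3, 3), (1, 3, 3, 3), (1, 2, 4, 2),
    (1, 3, 4, 2), (1, 1, 5, 2), (2, 1, 1, 2), (2, 3, 1, 2), (2, 2, 2, 2),
    (2, 3, 2, 2), (2, 3, 2, 2), (2, 1, 3, 2), (3, 2, 1, 3), (3, 3, 1, 3),
    (3, 3, 1, 3), (3, 2, 2, 3),
]
# (first location, location after the tour's last case) -> type
TYPE = {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 3}


def recomputed_tour_ids(locations):
    """tourID_T: +1 at an anchor (1 or 2) that differs from the previous
    location. Like the SPSS code, it does not restart for a new person."""
    ids = [1]
    for prev, loc in zip(locations, locations[1:]):
        ids.append(ids[-1] + 1 if loc in (1, 2) and loc != prev else ids[-1])
    return ids


def tufayel_types(locations, tour_ids):
    """Port of the group / FIRST / LEAD / LAST steps; None stands in for sysmis."""
    groups = [1]  # a new group whenever the tour counter changes
    for prev, cur in zip(tour_ids, tour_ids[1:]):
        groups.append(groups[-1] + int(cur != prev))
    lead = locations[1:] + [None]  # LEAD(location, 1) is missing on the last case
    first, last = {}, {}
    for g, loc, nxt in zip(groups, locations, lead):
        first.setdefault(g, loc)  # FIRST(location)
        if nxt is not None:
            last[g] = nxt  # LAST(leadlocation), ignoring missing values
    # type_T stays 0 when (first, last) matches no rule, e.g. last is missing
    return [TYPE.get((first[g], last.get(g)), 0) for g in groups]


def mismatches(got, expected):
    """1-based case numbers ($casenum) where the computed type differs."""
    return [i + 1 for i, (g, e) in enumerate(zip(got, expected)) if g != e]


locs = [row[1] for row in DATA]
expected = [row[3] for row in DATA]
fixed = tufayel_types(locs, recomputed_tour_ids(locs))  # group on tourID_T
as_pasted = tufayel_types(locs, [row[2] for row in DATA])  # group on original tourID
print(mismatches(fixed, expected))      # [22]
print(mismatches(as_pasted, expected))  # [12, 22]
```

In this port, the very last case in the file has no following case. Its lead location is therefore missing, and its type stays 0 in both variants. That case may need separate handling, for example by falling back to the previous tour's type, as the posted type column appears to do for each person's final anchor.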
________________________________
Date: Wed, 4 Aug 2010 16:27:00 -0700
From: tufayel_02@yahoo.com
Subject: Re: grouping and sequencing cases
To: SPSSX-L@LISTSERV.UGA.EDU
Hi Ruben,
Thanks a lot for the syntax. I'm sorry I couldn't state the logic clearly,
but you got it right. I understood your syntax and was able to write (probably)
an easier one to create the variable 'type'. I couldn't have done it without
going through yours.
*Make a little change in tourID.
compute tourID=1.
do if (location=1 or location=2) and (location ne lag(location)).
compute tourID=lag(tourID)+1.
else.
compute tourID=lag(tourID).
end if.
execute.
*Group the tours.
compute group=1.
if tourID = lag(tourID) group = lag(group).
if tourID <> lag(tourID) group = 1+ lag(group).
EXECUTE.
*Compute the first location of the tour.
aggregate
/outfile * overwrite=yes mode addvariables
/break group
/firstlocation = first(location).
EXECUTE.
*Compute the last location of the tour.
create leadlocation = lead(location,1).
EXECUTE.
aggregate
/outfile * overwrite=yes mode addvariables
/break group
/lastlocation = last(leadlocation).
EXECUTE.
*Compute type.
compute type_b = 0.
if firstlocation = 1 and lastlocation = 1 type_b = 1.
if firstlocation = 1 and lastlocation = 2 type_b = 2.
if firstlocation = 2 and lastlocation = 1 type_b = 2.
if firstlocation = 2 and lastlocation = 2 type_b = 3.
EXECUTE.
Thanks a lot.
-Tufayel
________________________________
From: Ruben van den Berg <ruben_van_den_berg@hotmail.com>
To: SPSSX-L@LISTSERV.UGA.EDU
Sent: Tue, August 3, 2010 5:40:19 AM
Subject: Re: grouping and sequencing cases
Dear Tufayel,
I'm sorry, but you did not answer my question and I still don't understand the
logic. However, I tried to 'extract' the logic from the data and wrote some
syntax that exactly reproduces 'Type' in your data (but only for these example
respondents). The syntax is rather long and clumsy, but I didn't see any better
options to get it done. Perhaps the List can suggest some improvements?
This comes without any warranty whatsoever and I suggest you check the actual
results meticulously, I'm not overly confident it will work properly.
HTH,
Ruben van den Berg
*Test data.
data list free/personID location tourID type.
begin data
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
2 1 1 2
2 3 1 2
2 2 2 2
2 3 2 2
2 3 2 2
2 1 3 2
3 2 1 3
3 3 1 3
3 3 1 3
3 2 2 3
end data.
dataset name d1.
*Create visit.
compute visit=1.
if personID=lag(personID) visit=lag(visit)+1.
*Compute tourID.
compute tourID_B=1.
do if (personID=lag(personID)) and (location=1 or location=2) and (location ne
lag(location)).
compute tourID_B=lag(tourID_B)+1.
else if (personID=lag(personID)).
compute tourID_B=lag(tourID_B).
end if.
execute.
*********Scratch copy of data.
dataset copy d2.
dataset activate d2.
aggregate
/outfile * overwrite=yes mode addvariables
/break personid
/maxvis=max(visit).
execute.
select if location ne 3.
compute type_B=0.
if visit=maxvis and location=1 and lag(location)=1 type_b=1.
if visit=maxvis and location=1 and lag(location)=2 type_b=2.
if visit=maxvis and location=2 and lag(location)=1 type_b=2.
if visit=maxvis and location=2 and lag(location)=2 type_b=3.
sort cases personid(a)visit(d).
compute newcount=1.
if personID=lag(personID) newcount=lag(newcount)+1.
compute t1=lag(type_b).
execute.
if t1 gt 0 type_b=t1.
execute.
aggregate
/outfile * overwrite=yes mode addvariables
/break personid
/maxnewcount=max(newcount).
loop #i=3 to maxnewcount.
if newcount=#i and location=1 and lag(location)=1 type_b=1.
if newcount=#i and location=1 and lag(location)=2 type_b=2.
if newcount=#i and location=2 and lag(location)=1 type_b=2.
if newcount=#i and location=2 and lag(location)=2 type_b=3.
end loop.
sort cases personid visit.
match files file *
/keep personid visit type_b.
execute.
match files file d1
/file d2
/by personid visit.
execute.
dataset close all.
dataset name d1.
if mis (type_b) type_b=lag(type_b).
execute.
compute check=type-type_b.
descriptives check.
delete variables check.
________________________________
Date: Mon, 2 Aug 2010 12:28:40 -0700
From: tufayel_02@yahoo.com
Subject: Re: grouping and sequencing cases
To: ruben_van_den_berg@hotmail.com
Hi Ruben,
I should have been more explicit about the context. In my dataset each case
represents a person's activity throughout the whole day (24 hours). For example,
sleeping at home > taking breakfast at home > taking kid to daycare > going to
office > from office to grocery > from grocery to home... etc. For my
convenience, I have recoded the activity-locations as three main categories:
1=home, 2=office, 3=other places.
I define a TOUR based on two anchors: home and office. Thus, if a person travels
like this: home > car > daycare > car > home (1>3>3>3>1), this is a home-home
tour. A tour is complete when a person starts from any of the two anchors (home
or office) and goes/returns to any of those, via other places (i.e. 3).
The variable 'type' is defined by the tour's origin and destination. Only
location should be used to define 'type'; personID shouldn't matter. type=1 for
1>3>3>3>1 (home-home tour); type=2 for 1>3>3>2 (home-office) or 2>3>3>3>1
(office-home); and type=3 for 2>3>3>2 (office-office).
location tourID type
1 1 2
3 1 2
3 1 2
2 2 3
3 2 3
3 2 3
2 3 3
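The tour definition above can be sketched in Python. The sketch assumes that every anchor (1 or 2) closes the current tour and opens the next one, so consecutive tours share their boundary anchor. split_tours, tour_type and TOUR_TYPE are hypothetical names used only for illustration.

```python
# 1 = home, 2 = office, 3 = other places
TOUR_TYPE = {
    (1, 1): 1,  # home-home
    (1, 2): 2,  # home-office
    (2, 1): 2,  # office-home
    (2, 2): 3,  # office-office
}


def split_tours(locations):
    """Split one person's day into anchor-to-anchor tours (assumption: anchors are shared)."""
    anchors = [i for i, loc in enumerate(locations) if loc in (1, 2)]
    return [locations[a:b + 1] for a, b in zip(anchors, anchors[1:])]


def tour_type(tour):
    """Type depends only on the tour's origin and destination anchors."""
    return TOUR_TYPE[(tour[0], tour[-1])]


day = [1, 3, 3, 1, 3, 2, 3, 3, 3, 2, 3, 1]  # person 1 in the test data
print([tour_type(t) for t in split_tours(day)])  # [1, 2, 3, 2]
```

These four values match type for tourIDs 1 to 4 of person 1. The final home arrival (tourID 5 in the test data, type 2) does not start a tour in this sketch, so a rule for it would still be needed.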
Hope I have cleared things up. And thanks so much for your help.
-Tufayel
________________________________
From: Ruben van den Berg <ruben_van_den_berg@hotmail.com>
To: tufayel_02@yahoo.com
Sent: Mon, August 2, 2010 3:56:33 AM
Subject: RE: grouping and sequencing cases
Dear Tufayel,
This logic is pretty hard to grasp. From the data, it seems that for
1
3
1
3
2
The first 'tour' is 1-3-1 and the second 'tour' is 1-3-2. The type for
the SECOND '1' is determined by the second tour (so its type=2, not 1). Is type
always 1 for the first location within a personID?
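Judging from the test data posted in the thread, the answer to the last question is no. This small Python check lists the type of each person's first case. DATA copies the personID, location, tourID and type columns, and first_type_per_person is a hypothetical helper name.

```python
# Test data from the thread: (personID, location, tourID, type)
DATA = [
    (1, 1, 1, 1), (1, 3, 1, 1), (1, 3, 1, 1), (1, 1, 2, 2), (1, 3, 2, 2),
    (1, 2, 3, 3), (1, 3, 3, 3), (1, 3, 3, 3), (1, 3, 3, 3), (1, 2, 4, 2),
    (1, 3, 4, 2), (1, 1, 5, 2), (2, 1, 1, 2), (2, 3, 1, 2), (2, 2, 2, 2),
    (2, 3, 2, 2), (2, 3, 2, 2), (2, 1, 3, 2), (3, 2, 1, 3), (3, 3, 1, 3),
    (3, 3, 1, 3), (3, 2, 2, 3),
]


def first_type_per_person(rows):
    """Return {personID: type of that person's first case}; rows must be in file order."""
    first = {}
    for person, _location, _tour, typ in rows:
        first.setdefault(person, typ)
    return first


print(first_type_per_person(DATA))  # {1: 1, 2: 2, 3: 3}
```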
Best,
Ruben van den Berg
________________________________
Date: Fri, 30 Jul 2010 17:28:30 -0700
From: tufayel_02@yahoo.com
Subject: Re: grouping and sequencing cases
To: SPSSX-L@LISTSERV.UGA.EDU
Hi Ruben,
Thank you for the help with tourID. I'm sorry for being ambiguous about the
variable 'type'. Thing is, 1 and 2 are fixed locations (home and office
respectively) and 3 is any kind of vehicle/walk. You'd notice in the example
that within a personID the location changes from 1-1, 1-2, 2-1 or 2-2 via 3. To
rephrase, the tours are always like 1-3-1, 1-3-2, 2-3-1 or 2-3-2. For a tour
1-3-1, type=1 (tourID=1 in the example), for 1-3-2 or 2-3-1 (tourID=2 and 4),
type=2, and for 2-3-2 (tourID=3), type=3. I hope this clears things up.
personID location tourID type
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
Thanks
Tufayel
________________________________
From: Ruben van den Berg <ruben_van_den_berg@hotmail.com>
To: SPSSX-L@LISTSERV.UGA.EDU
Sent: Fri, July 30, 2010 4:38:44 AM
Subject: Re: grouping and sequencing cases
Dear Tufayel,
The order of cases in your example data is vital information, isn't it? The
first thing I'd do, if these were my raw data, is store this order in the data.
Otherwise, if you sorted your records in a different order, you'd destroy part of the
information contained in the data. I created a new variable 'visit' which is the
nth visit for each personID. The next block of syntax should create tourID (I
called it tourID_B so you can compare it to your desired tourID).
Your third request, however, was somewhat unclear to me. I think in total you
have 9 location sequences:
1-1
1-2
1-3
2-1
2-2
2-3
3-1
3-2
3-3
Four of these (within personID) cause the tourID to change:
1-2
2-1
3-1
3-2
So within tourID groups there are 5 possible sequences:
1-1
1-3
2-2
2-3
3-3
As I understood, you want to create type within tourID group, so for each of
these 5 sequences the value of type should be specified (even if (system)
missing).
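The split into four tour-changing and five within-tour sequences follows mechanically from the tourID_B rule. This Python sketch checks it by brute force. It encodes that rule: a new tour starts when location is 1 or 2 and differs from the previous location. The personID condition is left out because only pairs within one person are considered. The sketch then enumerates all nine pairs.

```python
from itertools import product

LOCATIONS = (1, 2, 3)  # 1 = home, 2 = office, 3 = other places


def starts_new_tour(prev, cur):
    """tourID_B rule: a new tour starts when an anchor (1 or 2) follows a different location."""
    return cur in (1, 2) and cur != prev


pairs = list(product(LOCATIONS, repeat=2))  # all 9 (previous, current) sequences
changing = [p for p in pairs if starts_new_tour(*p)]
within = [p for p in pairs if not starts_new_tour(*p)]
print(changing)  # [(1, 2), (2, 1), (3, 1), (3, 2)]
print(within)    # [(1, 1), (1, 3), (2, 2), (2, 3), (3, 3)]
```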
Could you please help us out a bit more?
Best,
Ruben van den Berg
data list free/personID location tourID type.
begin data
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
2 1 1 2
2 3 1 2
2 2 2 2
2 3 2 2
2 3 2 2
2 1 3 2
3 2 1 3
3 3 1 3
3 3 1 3
3 2 2 3
end data.
compute visit=1.
if personID=lag(personID) visit=lag(visit)+1.
compute tourID_B=1.
do if (personID=lag(personID)) and (location=1 or location=2) and (location ne
lag(location)).
compute tourID_B=lag(tourID_B)+1.
else if (personID=lag(personID)).
compute tourID_B=lag(tourID_B).
end if.
execute.
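The visit and tourID_B syntax above can be mirrored in a few lines of Python to make the rule explicit. This is an illustrative port, not a replacement for the SPSS syntax, and the function name visit_and_tour_ids is hypothetical. Both counters restart when personID changes. tourID increases only at an anchor (1 or 2) that differs from the previous location. On the test data it reproduces the desired tourID column.

```python
# Test data from the thread: (personID, location, tourID, type)
DATA = [
    (1, 1, 1, 1), (1, 3, 1, 1), (1, 3, 1, 1), (1, 1, 2, 2), (1, 3, 2, 2),
    (1, 2, 3, 3), (1, 3, 3, 3), (1, 3, 3, 3), (1, 3, 3, 3), (1, 2, 4, 2),
    (1, 3, 4, 2), (1, 1, 5, 2), (2, 1, 1, 2), (2, 3, 1, 2), (2, 2, 2, 2),
    (2, 3, 2, 2), (2, 3, 2, 2), (2, 1, 3, 2), (3, 2, 1, 3), (3, 3, 1, 3),
    (3, 3, 1, 3), (3, 2, 2, 3),
]


def visit_and_tour_ids(rows):
    """Mirror the visit / tourID_B syntax: both counters restart at 1 for a new personID."""
    result = []
    prev_person = prev_location = None
    visit = tour = 0
    for person, location in rows:
        if person != prev_person:
            visit, tour = 1, 1
        else:
            visit += 1
            # A new tour starts at an anchor (1 or 2) that differs from the previous location.
            if location in (1, 2) and location != prev_location:
                tour += 1
        result.append((visit, tour))
        prev_person, prev_location = person, location
    return result


ids = visit_and_tour_ids([(p, loc) for p, loc, _, _ in DATA])
print(all(tour == row[2] for (_, tour), row in zip(ids, DATA)))  # True
```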
________________________________
Date: Thu, 29 Jul 2010 16:58:27 -0700
From: tufayel_02@yahoo.com
Subject: grouping and sequencing cases
To: SPSSX-L@LISTSERV.UGA.EDU
Hi all,
I am trying to create two variables (tourID and type) from two existing
variables (personID and location). TourID is a sequence where the numbers remain
the same until 'location' 1 or 2 arrives. TourID always starts from 1 when
personID changes.
The variable 'type' can take three values - 1, 2 and 3. If the 'location'
changes from 1 to 1, type=1 (for that tourID group); if it changes from 1 - 2,
or 2 - 1, type=2; if it's 2 - 2, type=3 (last four cases).
personID location tourID type
1 1 1 1
1 3 1 1
1 3 1 1
1 1 2 2
1 3 2 2
1 2 3 3
1 3 3 3
1 3 3 3
1 3 3 3
1 2 4 2
1 3 4 2
1 1 5 2
2 1 1 2
2 3 1 2
2 2 2 2
2 3 2 2
2 3 2 2
2 1 3 2
3 2 1 3
3 3 1 3
3 3 1 3
3 2 2 3
Can anyone please help me out?
Thanks in advance!
Tufayel