Date: Thu, 22 Sep 2005 09:21:36 +0200
Reply-To: Spousta Jan <JSpousta@CSAS.CZ>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Spousta Jan <JSpousta@CSAS.CZ>
Subject: Re: 2nd Attempt: Grouping Zip Codes
Content-Type: text/plain; charset="us-ascii"
Yes Simon, something like this can indicate which pairs of ZIPs are "close
enough". But how do you build the clusters from it? Even in your simple
example there are two candidate solutions:
(1,2);(3);(4) and (1);(2);(3);(4)
and the procedure has to find out that the first one is better in Deepak's sense.
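A small brute-force check makes the point concrete. It is only an illustration under two assumptions. First, it reads the rule as "every pair of ZIPs inside a group is at most 10 minutes apart". Second, because the Zip4 row in the quoted table is empty, it treats Zip4 as far from every other ZIP. The names TIMES, LIMIT, close, partitions and feasible are hypothetical and do not come from the thread.

```python
from itertools import combinations

# Drive times (minutes) from the quoted table, keyed by (smaller ZIP, larger ZIP).
# Assumption: Zip4 has no entries, so it counts as far from everyone.
TIMES = {(1, 2): 5, (1, 3): 15, (2, 3): 12}
LIMIT = 10  # "within 10 minutes"


def close(a, b):
    """True if ZIPs a and b are within LIMIT minutes (unknown pairs count as far)."""
    t = TIMES.get((min(a, b), max(a, b)))
    return t is not None and t <= LIMIT


def partitions(items):
    """Yield every set partition of items (Bell(len(items)) of them)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a block of its own
        yield [[first]] + part


def feasible(part):
    """A grouping is acceptable if every pair inside every group is close."""
    return all(close(a, b) for block in part for a, b in combinations(block, 2))


zips = [1, 2, 3, 4]
candidates = [p for p in partitions(zips) if feasible(p)]
best = min(candidates, key=len)  # prefer the grouping with the fewest groups
print(len(candidates), sorted(map(sorted, best)))  # 2 [[1, 2], [3], [4]]
```

It finds the same two candidates and prefers (1,2);(3);(4). Because partitions() walks every set partition, this only works for a handful of ZIPs.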
If you have n ZIPs, the number of possible solutions is at most the Bell
number Bn (see
http://mathworld.wolfram.com/BellNumber.html and
http://www.research.att.com/cgi-bin/access.cgi/as/njas/sequences/eisA.cgi?Anum=A000110).
These numbers grow super-exponentially with n: faster than any c**n, though
slower than n**n. For example:
B10 = 115,975
B20 = 51,724,158,235,372 - here a brute-force search becomes impractical
in a reasonable time, even on today's best computers
B30 = 846,749,014,511,809,332,450,147 - here I can no longer pronounce the
number correctly, even in my mother tongue :-)
B100 =
47585391276764833658790768841387207826363669686825611466616334637559114497892442622672724044217756306953557882560751
An average country has thousands of ZIP codes...
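The Bell numbers quoted above can be reproduced with the Bell triangle. In that triangle, each row starts with the last entry of the previous row, and each further entry is the sum of its left neighbour and the entry above that neighbour. The first entry of row k is B_k. Here is a minimal Python sketch; the function name bell_numbers is hypothetical. Python's arbitrary-precision integers handle B100 exactly.

```python
def bell_numbers(n_max):
    """Return [B0, B1, ..., B_n_max] using the Bell triangle."""
    bells = [1]  # B0 = 1
    row = [1]    # row 0 of the triangle
    for _ in range(n_max):
        # a new row starts with the last entry of the previous row...
        new_row = [row[-1]]
        # ...and each next entry adds the entry above the current last one
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])  # first entry of row k is B_k
    return bells


B = bell_numbers(100)
print(B[10], B[20], B[30])  # 115975 51724158235372 846749014511809332450147
print(len(str(B[100])))     # B100 has 116 digits
```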
***
Regarding Richard's suggestion to use SQL: I think, Richard, that
C/C++/... would be much faster, and Python or Ruby much more convenient.
All these languages are better suited to algorithms with loops and
branching.
But I am not an SQL expert; if you are, you will probably do best in SQL.
In my opinion, the programmer matters more to the solution than the tool.
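Exhaustive search is out of reach, so a heuristic written in one of those languages is the practical route. One common choice borrows the greedy set-cover idea. It is sketched here in Python purely as an illustration and was not proposed in this thread. The loop repeatedly picks as a "starting" ZIP the one that reaches the most still-ungrouped ZIPs within the limit, then removes those ZIPs.

Two caveats apply:
- It groups ZIPs around a centre, so two members of one group may be more than 10 minutes apart from each other.
- The result is not guaranteed to use the fewest groups.

The nested-dict layout and the names reachable, greedy_centres and TIMES are hypothetical.

```python
def reachable(times, centre, candidates, limit):
    """ZIPs in candidates within `limit` minutes of centre (centre always included)."""
    found = {centre}
    for z in candidates:
        minutes = times[centre].get(z)  # None when the table has no value
        if minutes is not None and minutes <= limit:
            found.add(z)
    return found


def greedy_centres(times, limit=10):
    """Greedy heuristic: return a list of (centre, sorted members) pairs."""
    ungrouped = set(times)
    groups = []
    while ungrouped:
        best_centre, best_members = None, set()
        for centre in sorted(ungrouped):  # sorted() makes ties deterministic
            members = reachable(times, centre, ungrouped, limit)
            if len(members) > len(best_members):
                best_centre, best_members = centre, members
        groups.append((best_centre, sorted(best_members)))
        ungrouped -= best_members  # always shrinks: the centre is a member
    return groups


TIMES = {
    "Zip1": {"Zip1": 0, "Zip2": 5, "Zip3": 15},
    "Zip2": {"Zip1": 5, "Zip2": 0, "Zip3": 12},
    "Zip3": {"Zip1": 15, "Zip2": 12, "Zip3": 0},
    "Zip4": {},  # the quoted table gives no drive times for Zip4
}

print(greedy_centres(TIMES))
# [('Zip1', ['Zip1', 'Zip2']), ('Zip3', ['Zip3']), ('Zip4', ['Zip4'])]
```

Each pass scans every remaining ZIP against every other, so the cost is roughly cubic in the number of ZIPs. That is fine for thousands of ZIPs, unlike the Bell-number search.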
Best regards
Jan
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Simon Freidin
Sent: Wednesday, September 21, 2005 11:36 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: 2nd Attempt: Grouping Zip Codes
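* Read the drive-time matrix (the Zip4 row has no values, so they are missing).
data list list /origin (a5) Zip1 Zip2 Zip3 .
begin data.
Zip1 0 5 15
Zip2 5 0 12
Zip3 15 12 0
Zip4
end data.
* Combine all cases into one wide case, then transpose so each row is one matrix cell.
casestovars/separator='_'.
flip.
* Keep only the distance rows.
sel if index(case_lbl,'ORIGIN')=0.
* Flag cells under 10 minutes; missing distances stay missing.
compute lt10=(var001<10).
match files file=*/drop=var001.
formats lt10 (f1).
list.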
CASE_LBL LT10
ZIP1_1 1
ZIP1_2 1
ZIP1_3 0
ZIP1_4 .
ZIP2_1 1
ZIP2_2 1
ZIP2_3 0
ZIP2_4 .
ZIP3_1 0
ZIP3_2 0
ZIP3_3 1
ZIP3_4 .
Number of cases read: 12 Number of cases listed: 12
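For readers without SPSS, the LT10 flag in the listing above can be cross-checked with a short Python sketch. It is an illustration, not Simon's code. The table is hard-coded in a hypothetical dict, and keys are (row ZIP, column ZIP) pairs, so Simon's ZIP1_2 corresponds to ("Zip2", "Zip1"). Like the SPSS syntax, it treats "close" as strictly under 10 minutes and leaves missing distances missing.

```python
ROWS = {  # origin ZIP -> drive times (minutes) to Zip1, Zip2, Zip3
    "Zip1": [0, 5, 15],
    "Zip2": [5, 0, 12],
    "Zip3": [15, 12, 0],
    "Zip4": [None, None, None],  # the Zip4 row is empty in the table
}
COLUMNS = ["Zip1", "Zip2", "Zip3"]


def flag_pairs(rows, columns, limit=10):
    """Map (row ZIP, column ZIP) to 1 if under `limit` minutes, 0 if not, None if missing."""
    flags = {}
    for origin, values in rows.items():
        for column, minutes in zip(columns, values):
            flags[(origin, column)] = None if minutes is None else int(minutes < limit)
    return flags


flags = flag_pairs(ROWS, COLUMNS)
# Print in the same order as the SPSS listing: by column, then by row.
for (origin, column), flag in sorted(flags.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(f"{column}_{origin[3:]}", "." if flag is None else flag)
```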
On 21/09/2005, at 11:19 PM, Deepak Jethwani wrote:
> Hi Listers,
> This is my second mail to the list. I am still struggling with the
> following issue.
> We have a list of zip codes and a table that lists out the drive time
> distance from one zip code to the other in the following format.
>
> Zip1 Zip2 Zip3 .....
> Zip1 0 5 15
> Zip2 5 0 12
> Zip3 15 12 0
> Zip4
>
>
>
> Now, we need to put the zip codes that are within 10 minutes' drive
> time of a given zip code into one group.
> The problem is actually about identifying starting zip codes around
> which to build these groupings.
>
> I would really welcome any comments from anyone who has faced a
> similar problem or any suggested approaches that come to mind for a
> possible solution or any suggested texts which I can use to tackle
> this problem.
>
> Best regards
> Deepak Jethwani
>