LISTSERV at the University of Georgia
Date:         Thu, 22 Sep 2005 09:21:36 +0200
Reply-To:     Spousta Jan <JSpousta@CSAS.CZ>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Spousta Jan <JSpousta@CSAS.CZ>
Subject:      Re: 2nd Attempt: Grouping Zip Codes
Content-Type: text/plain; charset="us-ascii"

Yes Simon, something like this can indicate which cases are "close enough". But how do you then create the clusters? Even in your simple case there are two candidate solutions:

(1,2);(3);(4) and (1);(2);(3);(4)

And you should find that the first one is better in Deepak's sense.
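A minimal brute-force sketch of this search, in Python, might look as follows. It rests on assumptions that are not in the thread: the missing Zip4 drive times are treated as "not close", "close enough" means strictly under 10 minutes (as in Simon's lt10 flag), and "better in Deepak's sense" is read as "fewer groups". The names DRIVE_TIME, partitions, is_valid and best_partition are made up for illustration.

```python
from itertools import combinations

# Drive times in minutes from the example. Zip4 has no data in the thread,
# so its pairs are absent (assumption: unknown means "not close").
DRIVE_TIME = {(1, 2): 5, (1, 3): 15, (2, 3): 12}
LIMIT = 10  # minutes; pairs strictly under this count as "close enough"


def partitions(items):
    """Yield every way to split items into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing group in turn ...
        for i, group in enumerate(smaller):
            yield smaller[:i] + [[first] + group] + smaller[i + 1:]
        # ... or into a new group of its own.
        yield [[first]] + smaller


def is_valid(partition):
    """True if every pair inside every group is under LIMIT minutes."""
    for group in partition:
        for a, b in combinations(sorted(group), 2):
            if DRIVE_TIME.get((a, b), float("inf")) >= LIMIT:
                return False
    return True


def best_partition(zips):
    """Among all valid partitions, return one with the fewest groups."""
    valid = [p for p in partitions(zips) if is_valid(p)]
    return min(valid, key=len)


if __name__ == "__main__":
    for p in partitions([1, 2, 3, 4]):
        if is_valid(p):
            print(p)
    print("best:", best_partition([1, 2, 3, 4]))
```

For the four ZIPs of the example this checks all 15 partitions and keeps only the two candidates listed above. The same exhaustive approach stops being practical very quickly as the number of ZIPs grows.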

If you have n ZIP codes, then the maximum number of possible solutions is given by the Bell number Bn (see http://mathworld.wolfram.com/BellNumber.html and http://www.research.att.com/cgi-bin/access.cgi/as/njas/sequences/eisA.cgi?Anum=A000110). These numbers grow almost as quickly with n as n**n. For example:

B10 = 115,975
B20 = 51,724,158,235,372 - here a brute-force search becomes infeasible in a reasonable time even for the best of today's computers
B30 = 846,749,014,511,809,332,450,147 - here I am no longer able to pronounce the number correctly even in my mother tongue :-)
B100 = 47585391276764833658790768841387207826363669686825611466616334637559114497892442622672724044217756306953557882560751
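The Bell numbers quoted here can be checked with a few lines of Python. The sketch below uses the Bell triangle, a standard recurrence; the function name bell_numbers is only illustrative.

```python
def bell_numbers(n_max):
    """Return [B_0, B_1, ..., B_n_max], computed with the Bell triangle."""
    bells = [1]  # B_0 = 1
    row = [1]    # first row of the triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the neighbour from the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        bells.append(new_row[0])
        row = new_row
    return bells


if __name__ == "__main__":
    b = bell_numbers(100)
    for n in (10, 20, 30, 100):
        print(f"B{n} = {b[n]:,}")
```

Python integers have arbitrary precision, so even B100 is computed exactly. Computing the count is cheap; it is enumerating that many partitions that is out of reach.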

In an average country, there are thousands of ZIP codes...

***

Regarding Richard's suggestion to use SQL - I think, Richard, that C/C++/... would be much quicker and Python or Ruby much more convenient. All these languages are better suited to the loops and branching that such algorithms need.

But I am not an SQL expert - if you are, you will do your best work in SQL. In my opinion, the programmer matters more for the solution than the tool.

Best regards

Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Simon Freidin
Sent: Wednesday, September 21, 2005 11:36 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: 2nd Attempt: Grouping Zip Codes

data list list /origin (a5) Zip1 Zip2 Zip3 .
begin data.
Zip1 0 5 15
Zip2 5 0 12
Zip3 15 12 0
Zip4
end data.

casestovars/separator='_'.
flip.
sel if index(case_lbl,'ORIGIN')=0.
compute lt10=(var001<10).
match files file=*/drop=var001.
formats lt10 (f1).
list.

CASE_LBL LT10

ZIP1_1      1
ZIP1_2      1
ZIP1_3      0
ZIP1_4      .
ZIP2_1      1
ZIP2_2      1
ZIP2_3      0
ZIP2_4      .
ZIP3_1      0
ZIP3_2      0
ZIP3_3      1
ZIP3_4      .

Number of cases read: 12    Number of cases listed: 12

On 21/09/2005, at 11:19 PM, Deepak Jethwani wrote:

> Hi Listers,
> This is my second mail to the list. I am still struggling with the
> following issue.
> We have a list of zip codes and a table that lists out the drive time
> distance from one zip code to the other in the following format.
>
>       Zip1  Zip2  Zip3 .....
> Zip1     0     5    15
> Zip2     5     0    12
> Zip3    15    12     0
> Zip4
>
> Now, we need to group the zip codes which fall within 10 minutes of
> drive time distance from a zip code into a group.
> The problem is actually about identifying starting zip codes around
> which to build these groupings.
>
> I would really welcome any comments from anyone who has faced a
> similar problem or any suggested approaches that come to mind for a
> possible solution or any suggested texts which I can use to tackle
> this problem.
>
> Best regards
> Deepak Jethwani

