Date: Thu, 20 Aug 2009 12:11:19 -0400
Reply-To: msz03@albany.edu
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mike Zdeb <msz03@ALBANY.EDU>
Subject: Re: shortest distance algorithm
Content-Type: text/plain;charset=iso-8859-1
hi ... one more thought on this and the suggestion about being satisfied with zip-2-zip distance
the suggestions by Mark Miller are all ways to reduce the number of calculations you'd have to make
one way to to reduce the number of zip-2-zip calculations is to make the assumption that the
closest Zip to any given zip is either within the same state or a bordering state
so ... you can make a table of states and bordering states using one of the SAS-supplied maps
*
merge a SAS/GRAPH map data set with itself
find common vertices (lat/long locations with same values)
;
* state 72 ... Puerto Rico;
proc sql;
create table borders as
select distinct d0.state as state0,
d1.state as state1
from maps.states d0, maps.states d1
where (d0.x eq d1.x and d0.y eq d1.y) and state0 ne 72;
quit;
*
rearrange data to create a matrix of bordering states
;
proc transpose data=borders out=new (drop=_:) prefix=state;
by state0;
run;
then use that info to restrict the number of calculations you'd have to make
when creating the lookup table of zip-2-zip minimum distances
--
Mike Zdeb
U@Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
> hi ... in addition to this paper (it's another great paper by Howard Schreier,
> though I'm not sure if your distances need the 3-dimensional distances calculated by Howard),
> you can also look at ...
>
> http://www.sascommunity.org/wiki/Driving_Distances_and_Drive_Times_using_SAS_and_Google_Maps
>
> it shows how to calculate straight line distance using three different methods,
> the Haversine distance formula and two new V9.2 function (ZIPCITYDISTANCE and GEODIST)
>
>
> it also shows how to access Google maps using the URL access method and get
> driving distances (though 37 million x 5,000 is a lot of Google hits)
>
>
> you can reduce the problem size if you are willing to make some compromises
>
>
> even with the 37 million unique customers and 5000 dealers, you might be
> satisfied with ZIP centroid-to-ZIP centroid distance
>
> in that case, you can build a table of distances between ZIP centroids using
> the SASHELP.ZIPCODE data set and the ZIPCITYDISTANCE function
> (there are only ~42,000 observations in SASHELP.ZIPCODE and you
> could reduce the problem even more by limiting the distance calculations to
> only those zips in your data base)
>
> then, you can create a lookup table with the closest zip to any given zip,
> throw it into a hash object in a data step and process your customer table
>
> note: if you can be satisfied with zip centroid-to zip centroid distances, you
> can get driving distances from Google maps with just zip codes, you
> don't need a complete address ... that would reduce the number of Google hits,
> but you still might get "booted"
>
>
> the "art" here would be in coming up with some efficient way to establish
> the lookup table (and that's where Howard's paper could help)
>
>
> --
> Mike Zdeb
> U@Albany School of Public Health
> One University Place
> Rensselaer, New York 12144-3456
> P/518-402-6479 F/630-604-1475
>
>> The may be helpful.
>>
>> http://tinyurl.com/lg5r86
>>
>> http://www.nesug.org/Proceedings/nesug03/at/at008.pdf
>>
>> On 8/20/09, Suresh Ramanathan <ramanathansuresh@yahoo.com> wrote:
>>> Hello SAS Users,
>>>
>>> I am looking for an efficient way to find the closest dealer location for
>>> every customer in our db.
>>> We have a customer table (37 mil unique customers) and a dealer table
>>> (5000 unique dealers).
>>> Customer table has the latitude & the longitude of the customer location
>>> in decimal degrees and the
>>> Dealer table has the latitude & the longitude of the dealer location in
>>> decimal degrees
>>> I was able to get the formula for calculating the distance between two
>>> points based on lat & long from wiki.
>>> Is there a way other than using the brute force to get the shortest
>>> dealer location for every customer.
>>> Please share your thoughts.
>>>
>>> Thanks
>>> Suresh
>>>
>>
>