Date: Thu, 20 Aug 2009 15:39:11 -0500
Reply-To: Kevin Myers <KevinMyers@AUSTIN.RR.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Kevin Myers <KevinMyers@AUSTIN.RR.COM>
Subject: Re: shortest distance algorithm
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original
For any who may be interested, I just evaluated computing great circle
distances using several different formulas in SAS, across a very wide range
of latitude and longitude values, including point pairs spaced both very
closely and almost diametrically opposed to each other.
Mathematically, the "simple" formula, the Haversine formula, and the
Vincenty formula for great circle distances should all produce identical
results. However, they can produce different results in digital
computations due to round off error (see:
http://en.wikipedia.org/wiki/Great-circle_distance).
My results indicate that using the simple version of the formula is entirely
adequate in SAS for almost any typical application, producing a maximum
difference of roughly 0.2 inches on the surface of the earth in the worst
case scenario vs. the more accurate Vincenty formula. The reason this turns
out so well in SAS (and in most other modern computer languages with modern
compilers and CPUs) is due to the large number of digits that are used in
computations (64 bit double precision floats, with 80 bit intermediate
results internal to the CPU). This results in much less round off error
than in older systems that used less precision.
Here is the "simple" great circle distance formula:
d = r * arcos( sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) *
cos(long2-long1) )
Where:
d is the great circle distance in the same units as r.
r is the earth (or other sphere) radius in whatever distance unit that you
want the results in.
lat1 and long1 are the latitude and longtitude coordinates for one point,
both in radians.
lat2 and long2 are the latitude and longitude coordinates for the second
point, both in radians.
Note that the above does not take into account eccentricity of the earth,
which is important over long distances, but can generally be ignored for
nearest neighbor types of problems.
Since the simple version of the great circle distance involves fewer
computations, it will execute faster than the Haversine and Vicenty
formulas. Since it still produces quite accurate results, you might want to
consider using it for computationally intensive problems such as determining
nearest neighbors for tables with millions of rows.
The Vincenty formula is used internally in the version 9.2 geodist()
function. The SAS docs don't say whether the geodist() implementation
accounts for earth eccentricity, and what ellipsoid is used if so.
Hope this is helpful.
s/KAM
----- Original Message -----
From: "Mike Zdeb" <msz03@ALBANY.EDU>
To: <SAS-L@LISTSERV.UGA.EDU>
Sent: Thursday, August 20, 2009 10:08
Subject: Re: shortest distance algorithm
> hi ... in addition to this paper (it's another great paper by Howard
> Schreier,
> though I'm not sure if your distances need the 3-dimensional distances
> calculated by Howard),
> you can also look at ...
>
> http://www.sascommunity.org/wiki/Driving_Distances_and_Drive_Times_using_SAS_and_Google_Maps
>
> it shows how to calculate straight line distance using three different
> methods,
> the Haversine distance formula and two new V9.2 function (ZIPCITYDISTANCE
> and GEODIST)
>
>
> it also shows how to access Google maps using the URL access method and
> get
> driving distances (though 37 million x 5,000 is a lot of Google hits)
>
>
> you can reduce the problem size if you are willing to make some
> compromises
>
>
> even with the 37 million unique customers and 5000 dealers, you might be
> satisfied with ZIP centroid-to-ZIP centroid distance
>
> in that case, you can build a table of distances between ZIP centroids
> using
> the SASHELP.ZIPCODE data set and the ZIPCITYDISTANCE function
> (there are only ~42,000 observations in SASHELP.ZIPCODE and you
> could reduce the problem even more by limiting the distance calculations
> to
> only those zips in your data base)
>
> then, you can create a lookup table with the closest zip to any given zip,
> throw it into a hash object in a data step and process your customer table
>
> note: if you can be satisfied with zip centroid-to zip centroid
> distances, you
> can get driving distances from Google maps with just zip codes, you
> don't need a complete address ... that would reduce the number of Google
> hits,
> but you still might get "booted"
>
>
> the "art" here would be in coming up with some efficient way to establish
> the lookup table (and that's where Howard's paper could help)
>
>
> --
> Mike Zdeb
> U@Albany School of Public Health
> One University Place
> Rensselaer, New York 12144-3456
> P/518-402-6479 F/630-604-1475
>
>> The may be helpful.
>>
>> http://tinyurl.com/lg5r86
>>
>> http://www.nesug.org/Proceedings/nesug03/at/at008.pdf
>>
>> On 8/20/09, Suresh Ramanathan <ramanathansuresh@yahoo.com> wrote:
>>> Hello SAS Users,
>>>
>>> I am looking for an efficient way to find the closest dealer location
>>> for
>>> every customer in our db.
>>> We have a customer table (37 mil unique customers) and a dealer table
>>> (5000 unique dealers).
>>> Customer table has the latitude & the longitude of the customer
>>> location
>>> in decimal degrees and the
>>> Dealer table has the latitude & the longitude of the dealer location in
>>> decimal degrees
>>> I was able to get the formula for calculating the distance between two
>>> points based on lat & long from wiki.
>>> Is there a way other than using the brute force to get the shortest
>>> dealer location for every customer.
>>> Please share your thoughts.
>>>
>>> Thanks
>>> Suresh
>>>
>>
>