```Date: Thu, 20 Aug 2009 15:39:11 -0500 Reply-To: Kevin Myers Sender: "SAS(r) Discussion" From: Kevin Myers Subject: Re: shortest distance algorithm Comments: To: msz03@albany.edu Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original For any who may be interested, I just evaluated computing great circle distances using several different formulas in SAS, across a very wide range of latitude and longitude values, including point pairs spaced both very closely and almost diametrically opposed to each other. Mathematically, the "simple" formula, the Haversine formula, and the Vincenty formula for great circle distances should all produce identical results. However, they can produce different results in digital computations due to round off error (see: http://en.wikipedia.org/wiki/Great-circle_distance). My results indicate that using the simple version of the formula is entirely adequate in SAS for almost any typical application, producing a maximum difference of roughly 0.2 inches on the surface of the earth in the worst case scenario vs. the more accurate Vincenty formula. The reason this turns out so well in SAS (and in most other modern computer languages with modern compilers and CPUs) is due to the large number of digits that are used in computations (64 bit double precision floats, with 80 bit intermediate results internal to the CPU). This results in much less round off error than in older systems that used less precision. Here is the "simple" great circle distance formula: d = r * arcos( sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(long2-long1) ) Where: d is the great circle distance in the same units as r. r is the earth (or other sphere) radius in whatever distance unit that you want the results in. lat1 and long1 are the latitude and longtitude coordinates for one point, both in radians. lat2 and long2 are the latitude and longitude coordinates for the second point, both in radians. Note that the above does not take into account eccentricity of the earth, which is important over long distances, but can generally be ignored for nearest neighbor types of problems. Since the simple version of the great circle distance involves fewer computations, it will execute faster than the Haversine and Vicenty formulas. Since it still produces quite accurate results, you might want to consider using it for computationally intensive problems such as determining nearest neighbors for tables with millions of rows. The Vincenty formula is used internally in the version 9.2 geodist() function. The SAS docs don't say whether the geodist() implementation accounts for earth eccentricity, and what ellipsoid is used if so. Hope this is helpful. s/KAM ----- Original Message ----- From: "Mike Zdeb" To: Sent: Thursday, August 20, 2009 10:08 Subject: Re: shortest distance algorithm > hi ... in addition to this paper (it's another great paper by Howard > Schreier, > though I'm not sure if your distances need the 3-dimensional distances > calculated by Howard), > you can also look at ... > > http://www.sascommunity.org/wiki/Driving_Distances_and_Drive_Times_using_SAS_and_Google_Maps > > it shows how to calculate straight line distance using three different > methods, > the Haversine distance formula and two new V9.2 function (ZIPCITYDISTANCE > and GEODIST) > > > it also shows how to access Google maps using the URL access method and > get > driving distances (though 37 million x 5,000 is a lot of Google hits) > > > you can reduce the problem size if you are willing to make some > compromises > > > even with the 37 million unique customers and 5000 dealers, you might be > satisfied with ZIP centroid-to-ZIP centroid distance > > in that case, you can build a table of distances between ZIP centroids > using > the SASHELP.ZIPCODE data set and the ZIPCITYDISTANCE function > (there are only ~42,000 observations in SASHELP.ZIPCODE and you > could reduce the problem even more by limiting the distance calculations > to > only those zips in your data base) > > then, you can create a lookup table with the closest zip to any given zip, > throw it into a hash object in a data step and process your customer table > > note: if you can be satisfied with zip centroid-to zip centroid > distances, you > can get driving distances from Google maps with just zip codes, you > don't need a complete address ... that would reduce the number of Google > hits, > but you still might get "booted" > > > the "art" here would be in coming up with some efficient way to establish > the lookup table (and that's where Howard's paper could help) > > > -- > Mike Zdeb > U@Albany School of Public Health > One University Place > Rensselaer, New York 12144-3456 > P/518-402-6479 F/630-604-1475 > >> The may be helpful. >> >> http://tinyurl.com/lg5r86 >> >> http://www.nesug.org/Proceedings/nesug03/at/at008.pdf >> >> On 8/20/09, Suresh Ramanathan wrote: >>> Hello SAS Users, >>> >>> I am looking for an efficient way to find the closest dealer location >>> for >>> every customer in our db. >>> We have a customer table (37 mil unique customers) and a dealer table >>> (5000 unique dealers). >>> Customer table has the latitude & the longitude of the customer >>> location >>> in decimal degrees and the >>> Dealer table has the latitude & the longitude of the dealer location in >>> decimal degrees >>> I was able to get the formula for calculating the distance between two >>> points based on lat & long from wiki. >>> Is there a way other than using the brute force to get the shortest >>> dealer location for every customer. >>> Please share your thoughts. >>> >>> Thanks >>> Suresh >>> >> > ```

Back to: Top of message | Previous page | Main SAS-L page