Calculating Many-to-Many Distances from GPS Coordinates

  • I
  • Thread starter MarneMath
  • Start date
In summary: The only way to do this would be to sort the data first.This sounds like an interesting problem. It sounds like you have a lot of data that you would like to use efficiently. One way to do this would be to use a sorting algorithm to get the data in order. This would reduce the amount of time needed to calculate the distance between points.
  • #1
MarneMath
Education Advisor
550
198
Here's my problem: I have approximately 500 million pairs of GPS coordinates. Each pair of point is unique and points (lat1,long1,lat2,long2) are considered to be the same as (lat2,long2,lat1,long1). Write now, I'm building a rather large table that'll contain the drive time calculations between each coordinate. I'm doing it in a rather naive way using an openstreemap API that calculates the distances. I'm essentially calculating the distance for each pair. Now, it occurs to me that there should exist a way to build a matrix that uses previously calculated distances in order to reduce the query times for long distance.

For example, if point A and B is calculated and we move onto point A C and if A B appears in the route for A C then we shouldn't have to search all the routes from A to C. We should be able to automatically select route A B and start searching from there.

I was wondering if anyone knows of an algorithm that does this. I vaguely recall something similar many years ago but I haven't found anything via Dr. Google.

Thanks!
 
Physics news on Phys.org
  • #2
MarneMath said:
if A B appears in the route for A C
How do you know if AB is on the route for AC?
 
  • #3
The entirety of the mapped has a contraction hierarchy layer and locally indexed.
 
  • #4
MarneMath said:
Here's my problem: I have approximately 500 million pairs of GPS coordinates. Each pair of point is unique and points (lat1,long1,lat2,long2) are considered to be the same as (lat2,long2,lat1,long1). Write now, I'm building a rather large table that'll contain the drive time calculations between each coordinate. I'm doing it in a rather naive way using an openstreemap API that calculates the distances. I'm essentially calculating the distance for each pair. Now, it occurs to me that there should exist a way to build a matrix that uses previously calculated distances in order to reduce the query times for long distance.

For example, if point A and B is calculated and we move onto point A C and if A B appears in the route for A C then we shouldn't have to search all the routes from A to C. We should be able to automatically select route A B and start searching from there.

I was wondering if anyone knows of an algorithm that does this. I vaguely recall something similar many years ago but I haven't found anything via Dr. Google.

Thanks!
Well, I guess you don't mind flogging the old computer with these calculations!

What you are doing sounds like a variant of the Traveling Salesman Problem. Given an arbitrarily long list of cities and the distances between pairs of cities (by whatever route), the TSP solution finds a single ordered list of these cities which has the least total distance to cover to visit all of them in turn.

https://en.wikipedia.org/wiki/Travelling_salesman_problem

Even Dr. Google has something on this.

It's not clear what is contained in this massive DB of 500 million points, but you should be able to whittle down the point data to something more managable, otherwise your grandkids might be waiting for the final answer to pop out of the optimization process.
 
  • #5
I may have not been clear regarding the exact nature of the problem. The route of the problem has been implemented already by a well known algorithm. The query time for a route from LA to NY isn't the issue at play. The problem deals with the repetitiveness of the problem and my inefficient method. So, let me clear this up. As of this morning, I already calculated the distances for all 500 million points. It took 8 days; however, this is rather inefficient if I ever receive a different set of 500 million points. What I would like to do is utilize the nodes created from a previous paths found to speed up of the time calculation in the process.

I think I basically came up with a solution, but it isn't elegant. I'm using Apache Spark to cache the nodes so when the program identifies a node in the cache it stop searching different paths and begins a new node from the end point of the previously calculated node.

edit: There's also other badness that comes with my implementation. Since a shuffle step is guaranteed and the data is not read in order I can't guarantee that the closest points are calculated globally, but locally I can. It's rather annoying though.
 
Last edited:
  • #6
MarneMath said:
For example, if point A and B is calculated and we move onto point A C and if A B appears in the route for A C then we shouldn't have to search all the routes from A to C. We should be able to automatically select route A B and start searching from there.

Since you mention "searching" it appears that your goal is to determine the shortest driving time between each pair of points. Is that correct? If so, I don't understand the statement "if A B appears in the route for A C". Is this assuming A B appears in "the" route knowm to the be quickest route between A and C ? Or are you saying that if A B appears in a given route between A and C then we can incorporate what we know about the quickest route between A and C in searching for ways to improve that given route ? - namely we don't try to improve the given route by changing the part that goes between A and B.

If you are saying the latter, I don't think it is always a safe assumption. For example, the quickest route between A and C might bypass B. If you know that any route between A and C must go through B then I think it's a correct assumption.
 
  • #7
Hey Stephen, you're right that isn't a good assumption. I think the way around this would be to enable priority queueing into the mix. However, programmatically this makes the problem quadratic in nature. I think by adding an adjacent listing method to the function then that'll reduce the time complexity. I'll bang my head on this tomorrow.
 
  • #8
Stephen Tashi said:
For example, the quickest route between A and C might bypass B.
That's the Law of the Short Cut. :wink:

It's still not clear where all of these 500 million points come from.

Are these points all located on existing roads? Are some scattered randomly across the countryside, where no transportation infrastructure exists?
 
  • #9
They should exists on roads. Thankfully, it doesn't particularly matter. If a graph cannot be generated then my code moves the GPS coordinates to a secondary table for it to be reevaluated by a data analyst.
 

FAQ: Calculating Many-to-Many Distances from GPS Coordinates

How do you calculate distances between multiple GPS coordinates?

To calculate the distance between multiple GPS coordinates, you can use the Haversine formula. This formula takes into account the curvature of the Earth and calculates the shortest distance between two points on a sphere. It uses the latitude and longitude of each coordinate to calculate the distance.

Can you use any GPS coordinates to calculate distance?

Yes, as long as you have the latitude and longitude for each coordinate, you can use the Haversine formula to calculate the distance between them. However, keep in mind that the accuracy of the distance calculation may vary depending on the quality and precision of the GPS coordinates.

What units are used for the distance calculation?

The Haversine formula calculates distances in kilometers. If you need the distance in miles, you can convert it by multiplying the result by 0.621371. Alternatively, you can use the Vincenty formula, which calculates distances in meters by default, and then convert it to your desired unit.

Can you calculate distances between multiple GPS coordinates at once?

Yes, you can input multiple sets of GPS coordinates into the Haversine formula or use a programming language to loop through the coordinates and calculate the distances between them. This allows you to calculate many-to-many distances efficiently.

Are there any limitations to using GPS coordinates for distance calculations?

While GPS coordinates can be useful for calculating distances, there are a few limitations to keep in mind. The accuracy of the distance calculation can be affected by factors such as signal interference, atmospheric conditions, and the quality of the GPS device. Additionally, the Haversine formula assumes a spherical Earth, so it may not be as accurate for long distances or when calculating distances near the poles.

Back
Top