Link

Manan Shah (@cs-mshah)

My own Google Map

Table of Contents

  1. Introduction
  2. Collection and Stroring data
  3. Preprocessing
  4. Computing Quickest/Shortest Path
  5. Efficiency, Feasibility and Features
  6. References
  7. Implementation
  8. Conclusion

Introduction

In this document I have tried to model how a navigation system works. During my study, I found that it is truly an extremely complex system and requires several algorithms and heuristics. I have tried my best to descibe the details which I understood and thought that are used in actual systems.

Collection and Stroring data

Data collection is a very big task and we need several sources to get map data.

  • Map partners: getting data from local map agencies.
  • satellite images.
  • location data from phone.
  • user corrections for a location and verification from a local team.
  • croud-sourcing to get real-time traffic data through location enabled.

With all the data, on a 2D grid, paths and locations can be plotted according to their coordinates. With several computational geometry algorithms, the curves are drawn removing the difference in coordinates from various sources and a 2d map is generated which we view. We store all the coordinates of locations and edge attributes in a look up tabe which has $O(1)$ complexity for insertion and deletion of new/old data. The next section describes what attributes are required and how we construct the graph of our map.

Preprocessing

There is a lot of preprocessing required.

Vertices:

What all has to be taken care for vertices? Vertices denote locations and junctions/traffic signals. So the time to stop at that junction has to be taken care of. So for every vertex, we store the time required to pass through that junction. If it is not known, we just store the average time required to take a turn.

There is one more interesting thing that has to be taken care of- what if we have the same coodinate for several locations? Which is the exact case that would happen with a building having lets say several different companies, or a place with different societies/shops. We cannot store take all these vertices in a graph as it would drastically increase their number.

So,we will encode location coodinates in a systematic way to have a parent vertex which will be in the graph. There are several standard map encodings available such as plus codes. So for all the locations if we store the encoding (upto a length which denotes a few buildings) in a set , then this DS will automatically have unique elements which will serve as our vertices for a graph.

Edges:

This plays the most important part of making the graph. They are formed between 2 connected vertices. It it would be quite logical to have the edge weights as time required to travel between the vertices. All parameters will be converted to time equivalents. Here are some parameters:

  • type of road
type of roadavg speed(km/hr)(just to represent)
Expressways120
national highways100
state highways80
major district roads60
village roads20
arterial/subarterial streets40
local streets30
  • no. of lanes
  • incline
  • sinuosity: it is the euclidean distance divided by the travelled distance. So a road with sinuosity index close to 1 is preferred.
  • data for different days/times of the week

with all these parameters from the collected data, a formula based on croud-sourcing can be devised to have a fairly good estimate of speed on a road having certain parameters. Thus, with the speed, we can now get the time required to traverse that road.

we keep the weights as seconds and can easily store them as integers to remove any complexity arrising due to floats.

rough estimate:
$max\ weight=30 * 24 * 3600=2592000s$
so we can safely use long long

With the vertices and edges, we now construct a graph. As some roads are one-way, what we have here is a directed and weighted graph.

There’s still one thing to do: how to deal with weights on the vertices?

unprocessed graph

We use a very clever trick to trasfer the weights on vetices to edges:

$w(u,v)’=w(u,v)+c(v)$

processed graph

Now as the roads weights change, we need to used a DS that makes us do fast insertion/update operations.
So we use a modified adjacency list.

unordered_map<ll,<unordered_map<ll,ll> > > This allows us to:

  • update an edge weight $(u,v)$ from $w\ to\ w’$ by $adj[u][v]=w’$ in constant time.
  • delete a vertex in $O(e)$ where $e$ is the number of edges from vetex.
  • delete an edge in $O(1)$ time.

Also we used unordered_map as the vertices might not be a contiguous array because of updates.

  • For differnt vehicles we could have different graph copies with different weights so that we estimate paths accordingly.
  • Also, we could modify weights according to user profiles: For example, if some user is ok with scenic paths but is not much concerned over time,we could increase the deciding factor of that. If some user does not like uneven roads at all, those paths could be given more weight.

Computing Quickest/Shortest Path

Algorithms:

  • classsical dijkstra’s algorithm is too slow to compute the shortest path on a graph of the scale of a country.
  • So, to speed up the algorithm, we make use of a technique which is called ‘meet in the middle’ and try a bidirectional dijkstra’s algorithm.
  • both algorithms are same asymptotically, but the bidirectional dijkstra improves the algorithm by a constant factor of 2.

####bidirectional dijkstra’s algorithm:

  • We start by alternately running the dijkstra algorithm from the source and from the destination.
  • For this, we need to reverse all the directions of the directed graph for running in the reversed direction.
  • Once a node is processed (i.e extracted from the priority queue) from both the dijkstra’s, we terminate.
  • But the shortest path is not necessarily from source to this point and to the destination.

lemma

  • so for every operation, we maintain the least distance from source to some u and from u to target and after termination, we construct the path by maintaining parent nodes.

Heuristics:

A* search algorithm:

We can imagine the dijkstra’s algorithm as an increasing search circle of processed vertices with increasing path lengths from the start. The bidirectional version can be imagined as two circles increasing-each from its source and stopping when they touch. In the process we see that several nodes which aren’t relevant to the path have been visited. So what should be done to direct the search more linearly towards the destination? We still want to use the dijkstra, but want to direct its flow.

  • We define the potentials $\pi(v)$ for every vertex to be some real number.
  • And for every edge: $l_\pi(u,v)=l(u,v)-\pi(u)+\pi(v)$
  • We can eaily see that for any path $P$, between two vertices,$l_\pi(P)=l(P)-\pi(u)+\pi(v)$ is invarient of the path path $P$. Here $P=(v_1,v_2,..,v_k)$ and the edges between the vertices. (this can be seen from the telescopic summation). This also shows that the shortest path found previously doesn’t change practically but only its weight is shifted by a constant.
  • While assigning these potentials,care must be taken to have the new edge weights non-negative.
  • Our goal is to direct the search towards the destination. So the function should increase the priority of points close to the ending in the priority queue.
  • So we naturally define $\pi(v)=d(v,t)/(avg\ vehicle\ speed)$ where $d(v,t)$ is the distance of a vertex to the destination $t$. We divide by the avg vehicle speed as our weights denote time.

Running the dijkstra:

  • the priority becomes: $time[v]-\pi(s)+\pi(v)$ and we extract min vertex having this value from the priority queue.
  • As $\pi(s)$ is same for all, the expression becomes $time[v]+time(v,t)$ which is like the directed search that we want.
  • We can optimise this further by doing a bidirectional run but with $p_f(u)=\frac{\pi_f(u)-\pi_r(u)}{2}$ and $p_r(u)=-p_f(u)$ where $\pi_f$ and $\pi_r$ are defined as usual for only forward and reversed graphs, as the new forward modified potential. (we do this to have the same weight of an edge-both in forward and reverse graphs)

Efficiency, Feasibility and other Features

How do we deploy this at world level?

For multiple queries and even faster processing, we would require further optimisations like using locations, reaches and contraction hierarchies (out of scope of this article!!). One might be tempted to store all paths between all vertices but the preprocessing would require several months and storage in PB. So that is not feasible.

How do we reroute if we take a wrong turn?

  • we simply run a dijkstra from the the vertex where we are and terminate the run when we extract any vertex which was part of the original path.
  • This usually would be something like a U-turn.
  • If the user wishes to simply get another route, he can go on his wished path and call for a reroute. We run our A* again. But if the user reaches the destination without our route but faster, then the path can be considered for analysis and weights can be modified accordingly.

What happens if a road is blocked or there’s huge traffic because of accidents?

  • we use croud-sourcing to get an estimate of traffic data. But for getting a good estimate the data should be available for more than a certain minimum number of vehicles on a road. Depending on the speed of traffic, the edge weight can be modified and set to critical modification.
  • road blocks are set to critical modifications.
  • after every traversal of a vertex or on some specific time interval, we check if the verties/edges of path are in the critical modified set. If they are, we re-route the path from current vertex using new edge weights.

How do we use offline maps when we have bad connectivity?

GPS doesn’t require internet connectivity. So we could have the map data of our city/route downloaded offline and run the algorithm locally. This won’t allow live updates on edges.

Getting nearby shops/hospitals/hotels/restaurants:

  • we simply run a dijkstra from the source and add the nodes in a set of the type which are requested in the query and terminate when extracted node has a time greater than 15-25 mins depending on what is nearby.
  • We then show the set to the user and ask where to go.
  • We can afford a dijkstra as we would terminate in a locality.

SOS FEATURES!

  • we could add an SOS button where the user could set certain other users as friends and in case of an emergency, the SOS button would inform the current location to the friends.
  • In case of serious emergencies like an accident, where the person is even unable to press the SOS, new car technologies could have IoT devices which tell the state of the vehicle(engine temperature,air bags state etc) and if it amounts to a serious accident, it would immediately give the directions to the location to nearby friends, police and hospitals.

References

[1] Ron Gutman’s overview
[2] Routing considerations
[3] bidirectional dijkstra MIT OCW
[4] bidirectional dijkstra
[5] A* algorithm

additional resources:

visual run of various algorithms, application
bidirectional,A* algos and results

Implementation

I solved cf Dijkstra? by 2 methods to compare performance:
normal dijkstra
bidirectional dijkstra
By expanding the test case details, we observe that for tests having edges>vertices, the bidirectional algorithm runs faster than the normal dijkstra.

In actual road networks, this is the case: edges>vertices.
Ex: USA road network has 20M vertices and 50M edges.

Conclusion

In this entire process, I tried to read countless no. of papers, blogs and tried to understand several things. The actual working in the real world is truly complex and that thoght of appreciating the complexity and trying to model on my own gives me a lot of pleasure.