CS494 Lecture Notes - A-Star

A-Star (yes, you can write it A*, but I prefer to write it out. I'm old) is a shortest path algorithm which tweaks Dijkstra's shortest path algorithm to get some drastic performance gains. There are some great resources for A-Star on the web, and I urge you to read one of them (especially the first one). The problem is as follows: You want to find the shortest path from node Start to node End in a directed, weighted graph. The idea behind A-Star is simple and elegant, and it is most easily explained with respect to Dijkstra's algorithm. With Dijkstra, you maintain a set of closed nodes: these are the nodes for which you already know the shortest paths from Start. You also maintain a multimap of open nodes ordered by their shortest known distance from Start. At each step, you are assured that the first node in the multimap, let's call it n, is one that you can remove from the open set and add to the closed set. At the same time, you look at each edge from n, and see if you can add or update any nodes in the open set, as a result of taking a path through n to these nodes. You continue in this manner until End is in the closed set. Then you're done.
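The bookkeeping above can be sketched in C++. This is a minimal sketch, not the actual code from the testers below; the adjacency-list representation (a vector of (neighbor, weight) pairs per node) is an assumption for illustration:

```cpp
#include <map>
#include <utility>
#include <vector>
#include <limits>

// Minimal Dijkstra sketch.  adj[n] holds (neighbor, edge-weight) pairs.
// Returns the shortest distance from start to end, or -1 if unreachable.
double dijkstra(const std::vector<std::vector<std::pair<int,double> > > &adj,
                int start, int end)
{
  std::vector<double> dist(adj.size(), std::numeric_limits<double>::max());
  std::vector<bool> closed(adj.size(), false);
  std::multimap<double,int> open;            // shortest known distance -> node

  dist[start] = 0;
  open.insert(std::make_pair(0.0, start));

  while (!open.empty()) {
    int n = open.begin()->second;            // the front node is safe to close
    open.erase(open.begin());
    if (closed[n]) continue;                 // stale entry: n was closed already
    closed[n] = true;
    if (n == end) return dist[n];            // done when End reaches the closed set

    for (const auto &e : adj[n]) {           // look at each edge from n
      int m = e.first;
      double d = dist[n] + e.second;
      if (!closed[m] && d < dist[m]) {
        dist[m] = d;                         // add or update m in the open set
        open.insert(std::make_pair(d, m));
      }
    }
  }
  return -1;                                 // End is unreachable
}
```

Rather than hunting down and updating a node's old entry in the multimap, the sketch simply inserts a new entry and skips stale ones when they reach the front; that trades a little multimap space for much simpler code.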

With A-Star, you assign an extra value to each node: the estimated distance from that node to End. We will call it H(n) for each node n. To make the algorithm work, this estimate must be less than or equal to the node's actual distance to End. Now, when you add a node to the multimap from Dijkstra's algorithm, instead of adding it based on the distance from Start, you add it based on the distance from Start plus the estimate of the distance to End. You'll note that when you consider the first node on the multimap, you actually know the shortest distance to that node, which allows you to move it from the open set to the closed set. You are guaranteed of this by the fact that your estimates are always on the low side.

What A-Star does to Dijkstra's algorithm is give preference on the multimap to nodes that are likely to belong on the shortest path from Start to End. The cool thing is that when End is the first node in the multimap, you are assured that you have found the shortest path to it. That's because all of the remaining nodes on the multimap have (distance from Start to node plus estimate of node to End) values that are greater than the path already discovered to End.
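The only change A-Star makes to that loop is the multimap key. Here is a sketch under the same assumed adjacency-list representation, with node coordinates supplied so that H(n) can be the Euclidean distance to End:

```cpp
#include <map>
#include <utility>
#include <vector>
#include <cmath>
#include <limits>

// A-Star sketch: identical to Dijkstra, except that the multimap key is
// (distance from Start) + H(n), where H(n) is the Euclidean distance
// from n to End -- an estimate that is never higher than the real path.
double a_star(const std::vector<std::vector<std::pair<int,double> > > &adj,
              const std::vector<double> &x, const std::vector<double> &y,
              int start, int end)
{
  auto H = [&](int n) {          // admissible estimate: straight line to End
    return std::sqrt((x[n]-x[end])*(x[n]-x[end]) +
                     (y[n]-y[end])*(y[n]-y[end]));
  };
  std::vector<double> dist(adj.size(), std::numeric_limits<double>::max());
  std::vector<bool> closed(adj.size(), false);
  std::multimap<double,int> open;            // key = dist[n] + H(n)

  dist[start] = 0;
  open.insert(std::make_pair(H(start), start));

  while (!open.empty()) {
    int n = open.begin()->second;
    open.erase(open.begin());
    if (closed[n]) continue;
    closed[n] = true;
    if (n == end) return dist[n];

    for (const auto &e : adj[n]) {
      int m = e.first;
      double d = dist[n] + e.second;
      if (!closed[m] && d < dist[m]) {
        dist[m] = d;
        open.insert(std::make_pair(d + H(m), m));  // the only change from Dijkstra
      }
    }
  }
  return -1;
}
```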

Let's take a really simple example:

The edges are weighted by their Euclidean distances, and I have labeled each node's H value to be its Euclidean distance to End. It should be clear that the actual path length from any node to End will have to be greater than or equal to its H value. In this example, only Start's path length is greater than its H value. The rest are equal.

When you run Dijkstra's algorithm on this graph, you go through the following steps:

Now, let's see how A-Star works.

The two drawings below illustrate the ending state of the two algorithms. In the drawing, the Start and End nodes are labeled in yellow, as are the edges on the shortest path between them. The nodes on the shortest path are labeled green too. The other nodes are labeled in blue if they are in the closed set, and red if they are in the open set (they'll be white if the algorithm never touches them). Finally, edges are red if the algorithm has visited them.



As you can see, the difference is that the A node is in the closed set with Dijkstra, but in the open set with A-Star. As a result, the edge from A to End was not processed in A-Star.

How good do those estimates need to be?

I'm not going to write much here, because the Patel pages above do a fantastic job with it. However, I'll go over a few key points:

Some programs to explore A-Star

I've written a few programs to explore A-Star. Let's start with a-star-tester-0.cpp. You call it as follows:

a-star-tester-0 seed xmin ymin xmax ymax nn connections-per-node Dijkstra|A-Star|Nothing print(Y|N|G)

It will create a random graph on the XY coordinate plane bounded by (xmin,ymin) and (xmax,ymax). The graph will have nn nodes and it will be fully connected. Each node will have roughly connections-per-node edges to nearby nodes. The Start node will be on the left side somewhere, and the End node will be on the right side. When the graph is created, you can then run Dijkstra, A-Star, or nothing on it. At the end, if you specify Y, it will print out the graph using the colors above, using jgraph. If you specify G, it simply prints out a text file representation of the graph before running the algorithm. If you specify N, it only prints timing information, and the sizes of the open and closed sets. Finally, if you specify 0 as connections-per-node, it doesn't create the graph, but instead reads its text representation from standard input.

Let's try a simple example:

UNIX> a-star-tester-0 106 -10 -10 10 10 40 5 Nothing G > G-40-5-106.txt
UNIX> a-star-tester-0 106 -10 -10 10 10 40 0 Dijkstra N < G-40-5-106.txt
(* Time:          0.000028 *)
(* Path Length:     12.875 *)
(* Closed Set Size:     25 *)
(* Open   Set Size:     10 *)
(* Unvisited Nodes:      5 *)
UNIX> a-star-tester-0 106 -10 -10 10 10 40 0 A-Star N < G-40-5-106.txt
(* Time:          0.000011 *)
(* Path Length:     12.875 *)
(* Closed Set Size:      7 *)
(* Open   Set Size:     14 *)
(* Unvisited Nodes:     19 *)
If you use the jgraph option (and do a little tweaking, as I am wont to do), you get the following pictures of the above calls:



The pictures and the output of the programs convey the same thing -- both algorithms have found the same shortest path from Start to End. However, Dijkstra's algorithm visits more nodes and edges, and has many more nodes in its closed set. A-Star, on the other hand, is much smarter about its closed set, which only has two nodes that aren't on the shortest path.

(Graph Generation)

This isn't about A-Star, but it is interesting, because generating good relevant graphs was a bit of a challenge.

I'd be remiss if I didn't talk a little about how the program generates its graphs. My intent was to have each node have connections-per-node edges to its closest neighbors. There are some issues with this, of course. First, "closest neighbor" is not a symmetric relation. Take a look at the rightmost node in the graphs directly above, and suppose that connections-per-node were one instead of five, and suppose that we call the node A, and the one closest to it B. It's pretty clear that A is not the closest node to B, or even one of the four closest nodes to B.

So, what I settled on was the following. I considered the nodes in random order (the order in which they were created). When I considered a node, it may already have had edges on its adjacency list. So, I needed to generate z = (connections-per-node - adjacency.size()) new edges. To do that, I maintained a map of closest nodes, ordered by their distance to the node. It starts empty, and I never let it get bigger than z elements (if it does, I delete the biggest elements on it).

Now, I consider four nodes as candidates for the map:

- xlow: the node whose x value is the largest one smaller than the node's.
- xhigh: the node whose x value is the smallest one larger than the node's.
- ylow: the node whose y value is the largest one smaller than the node's.
- yhigh: the node whose y value is the smallest one larger than the node's.

If these nodes aren't already on the adjacency list, and if I haven't visited them before, I insert them into the map. After considering the four nodes, I move on to the next node along each axis. For example, I set the new xlow to be the node whose x value is the largest value smaller than the current xlow's.

I can stop when the distance along the given axis is big enough. For example, I can stop considering xlow nodes when the distance between xlow and the node along the x axis is bigger than the largest distance in the map.

It's hard to do a formal analysis of this, but it should do a decent job of considering a fairly small subset of the nodes, especially when the number of nodes is large and connections-per-node is small. I would probably do better to break up the grid into squares whose sides are sqrt(|V|) or something like that, and then only look for edges within certain squares. I don't have the time to play with it.

When I'm done with this process, I add the nodes in the map to the node's adjacency list (and add the reverse edges, because these graphs are undirected).
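The bounded candidate map can be sketched as follows. This is a sketch with names of my own choosing, not the tester's code; it keeps at most z candidates, discarding the farthest whenever the map overflows:

```cpp
#include <map>
#include <utility>
#include <cstddef>

// Sketch of the bounded candidate map: insert a node keyed by its
// distance; if the map grows past z elements, delete the farthest one,
// so the map always holds the z closest candidates seen so far.
void consider(std::multimap<double,int> &candidates,
              double distance, int node, size_t z)
{
  candidates.insert(std::make_pair(distance, node));
  if (candidates.size() > z) {
    auto it = candidates.end();
    --it;                          // last element = biggest distance
    candidates.erase(it);
  }
}
```

The largest distance in the map (the stopping threshold for the axis scans) is then just `candidates.rbegin()->first`.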

At that point, the graph may be disconnected. To connect it, during the graph generation process, I maintain disjoint sets of connected components. Then, after generating the edges above, I connect the graph by going through the following process -- I choose a random node, and find the closest node to it that is not in the same set. Then, I find the closest node to that one that is not in the same set. The logic there is that the first node that I have chosen may be in the "middle" of its disjoint set. However, the node closest to it will be on the "edge" of its disjoint set. So that's a good node to connect with another disjoint set. That part of the algorithm is O(|V|) for each disjoint set.

I'm left with a fully connected graph, and the running time of generating it shouldn't be O(|V|^2), which is what I was trying to avoid.
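The disjoint-set bookkeeping can be sketched with a standard union-find structure (a sketch; the tester's actual representation may differ):

```cpp
#include <vector>
#include <utility>

// Union-find sketch for tracking connected components during graph
// generation: find() with path compression, union by size.
struct DisjointSets {
  std::vector<int> parent, size;
  DisjointSets(int n) : parent(n), size(n, 1) {
    for (int i = 0; i < n; i++) parent[i] = i;
  }
  int find(int a) {
    while (parent[a] != a) {       // walk to the root, halving the path
      parent[a] = parent[parent[a]];
      a = parent[a];
    }
    return a;
  }
  bool join(int a, int b) {        // returns false if already connected
    a = find(a); b = find(b);
    if (a == b) return false;
    if (size[a] < size[b]) std::swap(a, b);
    parent[b] = a;
    size[a] += size[b];
    return true;
  }
};
```

Two nodes are in the same component exactly when their find() results match, so adding an edge and merging components is one join() call.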

Trying to make it faster and better

I have a few modifications to the program. The first is a-star-tester-1.cpp, which stores the H value of each node when it is first calculated. The previous program simply calculated it every time it was needed. I don't explore this in these lecture notes, but if you're interested, you should. It's a memory vs. instruction tradeoff, and those are often more subtle than you think.

The second program is in a-star-tester-2.cpp, which tries to make H better. What it does is the following: When it needs to calculate a node's H value, it doesn't use Euclidean distance. Instead, it considers every node adjacent to it that is not already in the closed set. It calculates the distance to that node, plus that node's Euclidean distance to End, and sets its H value to the minimum of these values. You should see how that gives you H values that are higher than the Euclidean distance, but still less than or equal to the actual shortest path length.

If a node is in the open set, and all of its edges are to nodes in the closed set, then there is no way that the node can be on a shortest path. When we discover such a node, we set its H value to ∞.
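The improved estimate can be sketched as follows. This is a sketch, not the tester's code; the closed-set vector and coordinate vectors are assumed representations:

```cpp
#include <vector>
#include <utility>
#include <cmath>
#include <limits>

// Improved-H sketch: instead of node n's own Euclidean distance to End,
// take the minimum over n's non-closed neighbors m of
// (edge weight to m) + (m's Euclidean distance to End).
// If every neighbor is closed, the result is infinity, since n
// can no longer be on a shortest path.
double improved_h(int n, int end,
                  const std::vector<std::vector<std::pair<int,double> > > &adj,
                  const std::vector<bool> &closed,
                  const std::vector<double> &x, const std::vector<double> &y)
{
  if (n == end) return 0;          // End's own H must be zero (the tester-2 bug)
  double best = std::numeric_limits<double>::infinity();
  for (const auto &e : adj[n]) {
    int m = e.first;
    if (closed[m]) continue;       // closed neighbors can't start a shortest path
    double euclid = std::sqrt((x[m]-x[end])*(x[m]-x[end]) +
                              (y[m]-y[end])*(y[m]-y[end]));
    if (e.second + euclid < best) best = e.second + euclid;
  }
  return best;                     // infinity if every neighbor is closed
}
```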

Look at a simple example, which is the first graph I showed you:

Using our new calculation, the Start node's H value is 4, rather than 3.16. That is because Start's H value is now the minimum, over its neighbors A and B, of the edge distance plus the neighbor's Euclidean distance to End, rather than Start's own Euclidean distance. You should see how that results in a higher, but still legal, H value.

Obviously, the tradeoff in this program is going to be smaller closed-set size, versus more expensive calculation of H. When I first implemented it, here's the picture I got with the above example:

That doesn't look right, does it? The closed set size is bigger than in the previous example. The reason is that I used the algorithm to set End's H value, rather than just setting it to zero. That's a bug, and I discovered it by looking at the pictures. I show this just to highlight how important it is to test your programs as you write them!!! That program is in a-star-tester-2.cpp.

The bug is fixed in a-star-tester-3.cpp. Here's the output -- you can see that there is one fewer node in the closed set than in the previous A-Star example:

UNIX> a-star-tester-3 106 -10 -10 10 10 40 5 A-Star N < G-40-5-106.txt
(* Time:          0.000008 *)
(* Path Length:     12.875 *)
(* Closed Set Size:      6 *)
(* Open   Set Size:     14 *)
(* Unvisited Nodes:     20 *)

Using H values that are too high

My final program is a-star-tester-4.cpp. In this program, rather than specify "Dijkstra|A-Star|Nothing," you specify a factor. The H value is set to the Euclidean distance multiplied by the factor. This means that factors less than or equal to one will find the shortest paths, but smaller factors will yield bigger closed set sizes. Factors greater than one will have smaller closed set sizes, but they are not guaranteed to find the shortest paths. For example:
UNIX> a-star-tester-4 106 -10 -10 10 10 40 5 1.1 N < G-40-5-106.txt
(* Time:          0.000011 *)
(* Path Length:     12.875 *)
(* Closed Set Size:      7 *)
(* Open   Set Size:     14 *)
(* Unvisited Nodes:     19 *)
UNIX> a-star-tester-4 106 -10 -10 10 10 40 5 2 N < G-40-5-106.txt
(* Time:          0.000015 *)
(* Path Length:     14.672 *)
(* Closed Set Size:      6 *)
(* Open   Set Size:     15 *)
(* Unvisited Nodes:     19 *)
As you can see, that last call didn't find the shortest path. This is the one that it found. When the factors get really high, the program becomes a greedy best-first search, using only the Euclidean distances as its heuristic:
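The factor's effect is easy to state in code. A tiny sketch (the function name is mine): H becomes the Euclidean distance scaled by the factor, so factors at or below one keep the estimate admissible, while larger factors can overshoot the true path length and trade optimality for a smaller closed set.

```cpp
#include <cmath>

// Weighted-H sketch: the Euclidean distance to End times a factor.
// factor <= 1 keeps H at or below the true path length (admissible);
// factor > 1 can overshoot it, so the shortest path is no longer guaranteed.
double weighted_h(double x, double y, double xend, double yend, double factor)
{
  return factor * std::sqrt((x-xend)*(x-xend) + (y-yend)*(y-yend));
}
```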

Pretty Pictures of Large Graphs

In this example, I create a 10,000 node graph:
UNIX> a-star-tester-0 1 -10 -10 10 10 10000 3 Nothing G > G-10000-3-1.txt
Below, I show the various programs finding the shortest paths (sometimes):

The graph
Finding the yellow nodes is like finding Waldo...

Dijkstra
Path Length: 30.706
Time: 0.004342 seconds
Closed Set Size: 9729

A-Star
Path Length: 30.706
Time: 0.002506 seconds
Closed Set Size: 4893

A-Star-3 (improved H)
Path Length: 30.706
Time: 0.002754 seconds
Closed Set Size: 4816

A-Star, Factor = 1.1
Path Length: 30.721
Time: 0.002091 seconds
Closed Set Size: 3971

A-Star, Factor = 2
Path Length: 32.212
Time: 0.000481 seconds
Closed Set Size: 891

A few items of note:

Below, I graph the algorithms on the 100,000 node graph in G100K.txt. The leftmost Y axes are log scale, and you can see the various tradeoffs very clearly:

Further study

There's more probing that you can do with A-Star; I will leave these explorations to the inquisitive student!