## CS494 Lecture Notes - A-Star

• James S. Plank
• Directory: /home/plank/cs494/Notes/A-Star
• Original notes: February, 2015
• Most recent revision: Mon Jan 25 11:03:21 EST 2016

A-Star (yes, you can write it A*, but I prefer to write it out. I'm old) is a shortest path algorithm which tweaks Dijkstra's shortest path algorithm to get some drastic performance gains. There are some great resources for A-Star on the web, and I urge you to read one of them (especially the first one).
• Amit Patel's series of pages on A-Star are here. They are wonderful.
• Mr. Patel has written another set of pages here.
• Of course, there's Wikipedia.
The problem is as follows: You want to find the shortest path from node Start to node End in a directed, weighted graph. The idea behind A-Star is simple and elegant, and it is most easily explained with respect to Dijkstra's algorithm. With Dijkstra, you maintain a set of closed nodes: these are the nodes for which you already know the shortest paths from Start. You also maintain a multimap of open nodes ordered by their shortest known distance from Start. At each step, you are assured that the first node in the multimap, let's call it n, is one that you can remove from the open set and add to the closed set. At the same time, you look at each edge from n, and see if you can add or update any nodes in the open set, as a result of taking a path through n to these nodes. You continue in this manner until End is in the closed set. Then you're done.

With A-Star, you assign an extra value to each node. This is the estimated distance from that node to End. We will call it H(n) for each node n. To make the algorithm work, this distance must be less than or equal to the node's actual distance to End. Now, when you add a node to the multimap from Dijkstra's algorithm, instead of adding it based on the distance from Start, you add it based on the distance from Start, plus the estimate of the distance to End. You'll note that when you consider the first node on the multimap, you actually know the shortest distance to that node, which allows you to move it from the open set to the closed set. You are guaranteed this by the fact that your estimates are always on the low side.

What A-Star does to Dijkstra's algorithm is give preference on the multimap to nodes that are likely to belong on the shortest path from Start to End. The cool thing is that when End is the first node in the multimap, you are assured that you have found the shortest path to it. That's because all of the remaining nodes on the multimap have (distance from Start to node plus estimate of node to End) values that are greater than or equal to the length of the path already discovered to End.

Let's take a really simple example:

The edges are weighted by their Euclidean distances, and I have labeled each node's H value to be its Euclidean distance to End. It should be clear that the actual path length from any node to End will have to be greater than or equal to its H value. In this example, only Start's path length is greater than its H value. The rest are equal.

When you run Dijkstra's algorithm on this graph, you go through the following steps:

• Add Start to the closed set. Nodes A and B are added to the multimap, with distances of 1.4 and 3 respectively.
• Node A is at the beginning of the multimap, so you remove it and add it to the closed set. You add End to the multimap with a distance of 5.4.
• Node B is at the beginning of the multimap, so you remove it and add it to the closed set. The path to End has been improved to 4, so you replace End in the multimap to have a distance of 4.
• Finally End is at the beginning of the multimap, so you know its distance is 4. The end.

Now, let's see how A-Star works.

• Add Start to the closed set. Nodes A and B are added to the multimap, but instead of using their distances as keys, you use their distances plus H values. Therefore, A has a value of 5.4, and B has a value of 4. You still maintain their actual distances from Start, because you'll need that in subsequent calculations.
• Node B is at the beginning of the multimap, so you remove it and add it to the closed set. You add End to the multimap with a distance of 4.
• End is now at the beginning of the multimap, so you know its distance is 4. You never had to visit A, because you knew that the path to End through A has to be greater than or equal to 5.4.
The two drawings below illustrate the ending state of the two algorithms. In the drawing, the Start and End nodes are labeled in yellow, as are the edges on the shortest path between them. The nodes on the shortest path are labeled green too. The other nodes are labeled in blue if they are in the closed set, and red if they are in the open set (they'll be white if the algorithm never touches them). Finally, edges are red if the algorithm has visited them.

[Figures: Dijkstra and A-Star]

As you can see, the difference is that the A node is in the closed set with Dijkstra, but in the open set with A-Star. As a result, the edge from A to End was not processed in A-Star.
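The walkthrough above can be sketched in a few lines of C++. This is a minimal illustration, not the code from a-star-tester-0.cpp: the `Edge` and `Result` types and the `a_star` function are hypothetical names, and instead of replacing a node's entry in the multimap when its distance improves, the sketch inserts a second entry and skips the stale one when it surfaces. It also puts Start on the open set and pops it right away, which has the same effect as adding Start to the closed set first.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for this sketch; they are not from a-star-tester-0.cpp.
struct Edge { int to; double weight; };

struct Result {
  double distance;  // Length of the path found from start to end.
  int closed_size;  // How many nodes ended up in the closed set.
};

// A-Star with a multimap as the open set, keyed on (distance from start + H).
// Passing H values that are all zero turns this into Dijkstra's algorithm.
Result a_star(const std::vector<std::vector<Edge>> &adj,
              const std::vector<double> &H, int start, int end)
{
  assert(H.size() == adj.size());
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> dist(adj.size(), inf);
  std::vector<bool> closed(adj.size(), false);
  std::multimap<double, int> open;
  int closed_size = 0;

  dist[start] = 0;
  open.insert(std::make_pair(H[start], start));

  while (!open.empty()) {
    int n = open.begin()->second;
    open.erase(open.begin());
    if (closed[n]) continue;  // Stale entry: n was re-inserted with a better key.
    closed[n] = true;
    closed_size++;
    if (n == end) break;

    for (const Edge &e : adj[n]) {
      double d = dist[n] + e.weight;
      if (!closed[e.to] && d < dist[e.to]) {
        dist[e.to] = d;  // Keep the true distance from start...
        open.insert(std::make_pair(d + H[e.to], e.to));  // ...but order by d + H.
      }
    }
  }
  Result r = { dist[end], closed_size };
  return r;
}
```

If you number the example's nodes Start = 0, A = 1, B = 2, End = 3, and use the H values 3.16, 4, 1 and 0, this sketch should close three nodes (Start, B, End). With all H values set to zero it should close all four, which matches the two walkthroughs.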

### How good do those estimates need to be?

I'm not going to write much here, because the Patel pages above do a fantastic job with it. However, I'll go over a few key points:
• If you always set H to zero, you have just turned A-Star into Dijkstra.

• So long as each node's H value is less than or equal to its actual distance to End, A-Star will always find the shortest path.

• The number of nodes in the closed set will be reduced by having higher, but still legal, H values.

• If each node has its actual distance as its H value, and there is only one shortest path, then A-Star will find it without adding any additional nodes to the closed set.

• However, if there are multiple shortest paths, your closed set can get pretty big. The reason is that for all nodes on the shortest path, the sum of shortest path from Start and the estimate is equal to the path length from Start to End. So this sum is the same for all nodes that are on any shortest path.

To make A-Star effective here, you need to tweak either the algorithm or your estimates -- the Patel page goes into great detail on this, and it's fascinating.

• If your H values are too big, then you may not find the actual shortest path, but instead one that is hopefully close to the shortest path. That may be good enough for your purposes, and your closed set will be smaller than if you had used better estimates.
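One way to convince yourself that a set of H values is legal is to compute every node's actual distance to End and compare. The sketch below does that under some assumptions of mine: the graph is undirected (so running Dijkstra from End gives each node's distance to End), and the names `Edge`, `dijkstra`, `is_admissible` and the `tolerance` parameter are hypothetical, chosen for illustration. You would only do this as a sanity check on small graphs, since it costs a full Dijkstra.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Hypothetical edge type for this sketch.
struct Edge { int to; double weight; };

// Plain Dijkstra from source, with a multimap as the open set.
// Returns the shortest distance from source to every node.
std::vector<double> dijkstra(const std::vector<std::vector<Edge>> &adj, int source)
{
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> dist(adj.size(), inf);
  std::vector<bool> closed(adj.size(), false);
  std::multimap<double, int> open;

  dist[source] = 0;
  open.insert(std::make_pair(0.0, source));
  while (!open.empty()) {
    int n = open.begin()->second;
    open.erase(open.begin());
    if (closed[n]) continue;  // Stale entry from an earlier, worse distance.
    closed[n] = true;
    for (const Edge &e : adj[n]) {
      double d = dist[n] + e.weight;
      if (d < dist[e.to]) {
        dist[e.to] = d;
        open.insert(std::make_pair(d, e.to));
      }
    }
  }
  return dist;
}

// True if every H value is <= the node's actual distance to end.
// Assumes an undirected graph; tolerance absorbs floating point noise.
bool is_admissible(const std::vector<std::vector<Edge>> &adj,
                   const std::vector<double> &H, int end,
                   double tolerance = 1e-9)
{
  assert(H.size() == adj.size());
  std::vector<double> actual = dijkstra(adj, end);
  for (std::size_t i = 0; i < H.size(); i++) {
    if (H[i] > actual[i] + tolerance) return false;
  }
  return true;
}
```

On an undirected version of the four-node example, the H values 3.16, 4, 1 and 0 should pass this check, all-zero H values should pass trivially, and doubling every H value should fail.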

### Some programs to explore A-Star

I've written a few programs to explore A-Star. Let's start with a-star-tester-0.cpp. You call it as follows:

```
a-star-tester-0 seed xmin ymin xmax ymax nn connections-per-node Dijkstra|A-Star|Nothing print(Y|N|G)
```

It will create a random graph on the XY coordinate plane bounded by (xmin,ymin) and (xmax,ymax). The graph will have nn nodes and it will be connected (every node is reachable from every other node). Each node will have roughly connections-per-node edges to nearby nodes. The Start node will be on the left side somewhere, and the End node will be on the right side. Once the graph is created, you can run Dijkstra, A-Star, or nothing on it. At the end, if you specify Y, it will print out the graph using the colors above, using jgraph. If you specify G, it simply prints out a text file representation of the graph before running the algorithm. If you specify N, it only prints timing information, and the sizes of the open and closed sets. Finally, if you specify 0 as connections-per-node, it doesn't create the graph, but instead reads its text representation from standard input.

Let's try a simple example:

```
UNIX> a-star-tester-0 106 -10 -10 10 10 40 5 Nothing G > G-40-5-106.txt
UNIX> a-star-tester-0 106 -10 -10 10 10 40 0 Dijkstra N < G-40-5-106.txt
(* Time:          0.000028 *)
(* Path Length:     12.875 *)
(* Closed Set Size:     25 *)
(* Open   Set Size:     10 *)
(* Unvisited Nodes:      5 *)
UNIX> a-star-tester-0 106 -10 -10 10 10 40 0 A-Star N < G-40-5-106.txt
(* Time:          0.000011 *)
(* Path Length:     12.875 *)
(* Closed Set Size:      7 *)
(* Open   Set Size:     14 *)
(* Unvisited Nodes:     19 *)
UNIX>
```
If you use the jgraph option (and do a little tweaking, as I am wont to do), you get the following pictures of the above calls:

[Figures: Dijkstra and A-Star]

The pictures and the output of the programs convey the same thing -- they have both found the same shortest path from Start to the End. However, Dijkstra's algorithm visits more nodes and edges, and has many more nodes in its closed set. A-Star, on the other hand, is much smarter about its closed set, which only has two nodes that aren't on the shortest path.

### (Graph Generation)

This isn't about A-Star, but it is interesting, because generating good relevant graphs was a bit of a challenge.

I'd be remiss if I didn't talk a little about how the program generates its graphs. My intent was to have each node have connections-per-node edges to its closest neighbors. There are some issues with this, of course. First, there is an issue of symmetry. Take a look at the rightmost node in the graphs directly above, and suppose that connections-per-node were one instead of five, and suppose that we call the node A, and the one closest to it B. It's pretty clear that A is not the closest node to B, or even one of the four closest nodes to B.

So, what I settled on was the following. I considered the nodes in random order (the order in which they were created). When I considered a node, it may already have had edges on its adjacency list. So, I needed to generate z = (connections-per-node - adjacency.size()) new edges. To do that, I maintained a map of closest nodes, ordered by their distance to the node. It starts empty, and I never let it get bigger than z elements (if it does, I delete the biggest elements on it).

Now, I consider four nodes as candidates for the map:

• The node whose x value is the largest value smaller than the node's. Let's call this node xlow.
• The node whose x value is the smallest value greater than the node's.
• The node whose y value is the largest value smaller than the node's.
• The node whose y value is the smallest value greater than the node's.
If these nodes aren't already on the adjacency list, and if I haven't visited them before, I insert them into the map. After considering the four nodes, I look at the next node along the given axis. For example, I set xlow to be the node whose x value is the largest value smaller than xlow's.

I can stop when the distance along the given axis is big enough. For example, once the map holds z elements, I can stop considering xlow nodes when the distance between xlow and the node along the x axis is bigger than the largest distance in the map.
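Here is a simplified sketch of that search, to make the stopping rule concrete. It is not the code from the tester programs, and it makes assumptions that I should flag: it scans only along the x axis (the process above also scans along y), it ignores the adjacency-list and already-visited checks, and the names `Point`, `consider` and `z_closest` are hypothetical. The points are assumed to be sorted by x.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

// Hypothetical point type for this sketch.
struct Point { double x, y; };

static double point_distance(const Point &a, const Point &b)
{
  return std::hypot(a.x - b.x, a.y - b.y);
}

// Insert a candidate, then trim the map so it never holds more than z entries.
static void consider(std::multimap<double, int> &closest, std::size_t z,
                     double d, int id)
{
  closest.insert(std::make_pair(d, id));
  if (closest.size() > z) closest.erase(std::prev(closest.end()));
}

// pts must be sorted by x. Returns the indices of the z points closest
// to pts[i], nearest first, scanning outward from i along the x axis only.
std::vector<int> z_closest(const std::vector<Point> &pts, int i, std::size_t z)
{
  std::vector<int> result;
  if (z == 0) return result;
  std::multimap<double, int> closest;  // Distance to pts[i] -> index.
  int n = static_cast<int>(pts.size());
  int lo = i - 1, hi = i + 1;          // Next candidate on each side.

  while (lo >= 0 || hi < n) {
    if (lo >= 0) {
      double dx = pts[i].x - pts[lo].x;
      if (closest.size() == z && dx > closest.rbegin()->first) {
        lo = -1;  // Everything further left is at least dx away: stop.
      } else {
        consider(closest, z, point_distance(pts[i], pts[lo]), lo);
        lo--;
      }
    }
    if (hi < n) {
      double dx = pts[hi].x - pts[i].x;
      if (closest.size() == z && dx > closest.rbegin()->first) {
        hi = n;   // Same argument for the right side.
      } else {
        consider(closest, z, point_distance(pts[i], pts[hi]), hi);
        hi++;
      }
    }
  }
  for (std::multimap<double, int>::iterator it = closest.begin();
       it != closest.end(); ++it) {
    result.push_back(it->second);
  }
  return result;
}
```

The stopping rule is safe because a node's true distance is never smaller than its distance along one axis, and the largest distance in a full map can only shrink as better candidates arrive.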

It's hard to do a formal analysis of this, but it should do a decent job of considering a fairly small subset of the nodes, especially when the number of nodes is large and connections-per-node is small. I would probably do better to break up the grid into squares whose sides are sqrt(|V|) or something like that, and then only look for edges within certain squares. I don't have the time to play with it.

When I'm done with this process, I add the nodes in the map to the node's adjacency list (and add the reverse edges, because these graphs are undirected).

At that point, the graph may be disconnected. To connect it, during the graph generation process, I maintain disjoint sets of connected components. Then, after generating the edges above, I connect the graph by going through the following process -- I choose a random node, and find the closest node to it that is not in the same set. Then, I find the closest node to that one that is not in the same set. The logic there is that the first node that I have chosen may be in the "middle" of its disjoint set. However, the node closest to it will be on the "edge" of its disjoint set. So that's a good node to connect with another disjoint set. That part of the algorithm is O(|V|) for each disjoint set.
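The connection step might look roughly like the following sketch. The `DisjointSets` struct, `closest_outside` and `connect` are hypothetical names, not the ones in the tester programs. To keep the sketch deterministic, it starts every round from one fixed node (`seed_node`) rather than a random one; that is my simplification.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical point type for this sketch.
struct Point { double x, y; };

// A small disjoint-sets (union-find) structure with path halving.
struct DisjointSets {
  std::vector<int> parent;
  explicit DisjointSets(int n) : parent(n)
  {
    std::iota(parent.begin(), parent.end(), 0);  // Every node starts alone.
  }
  int find(int a)
  {
    while (parent[a] != a) {
      parent[a] = parent[parent[a]];
      a = parent[a];
    }
    return a;
  }
  void unite(int a, int b) { parent[find(a)] = find(b); }
};

// The closest node to `from` that is in a different set, or -1 if there
// is none. This is a linear scan, so it is O(|V|).
int closest_outside(const std::vector<Point> &pts, DisjointSets &ds, int from)
{
  int best = -1;
  double best_d = 0;
  for (int i = 0; i < static_cast<int>(pts.size()); i++) {
    if (ds.find(i) == ds.find(from)) continue;
    double d = std::hypot(pts[i].x - pts[from].x, pts[i].y - pts[from].y);
    if (best == -1 || d < best_d) { best = i; best_d = d; }
  }
  return best;
}

// Adds edges until every node is in one set. Returns the edges added.
std::vector<std::pair<int, int>> connect(const std::vector<Point> &pts,
                                         DisjointSets &ds, int seed_node)
{
  std::vector<std::pair<int, int>> added;
  while (true) {
    int a = closest_outside(pts, ds, seed_node);
    if (a == -1) break;                   // Everything is already connected.
    int b = closest_outside(pts, ds, a);  // Never -1: seed_node is outside a's set.
    ds.unite(a, b);
    added.push_back(std::make_pair(a, b));
  }
  return added;
}
```

Each round merges two sets, so the loop runs once per extra component. With two components {0, 1} and {2, 3} laid out along a line at x = 0, 1, 10 and 11, starting from node 0 should add the edge between nodes 2 and 1, which are on the "edges" of their sets, rather than an edge from node 0 itself.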

I'm left with a connected graph, and the running time of generating it shouldn't be O(|V|^2), which is what I was trying to avoid.

### Trying to make it faster and better

I have a few modifications to the program. The first is a-star-tester-1.cpp, which stores the H value of each node when it is first calculated. The previous program simply calculated it every time it was needed. I don't explore this in these lecture notes, but if you're interested, you should. It's a memory vs. instruction tradeoff, and those are often more subtle than you think.

The second program is in a-star-tester-2.cpp, which tries to make H better. What it does is the following: When it needs to calculate a node's H value, it doesn't use Euclidean distance. Instead, it considers every node adjacent to it that is not already in the closed set. It calculates the distance to that node, plus that node's Euclidean distance to End, and sets its H value to the minimum of these values. You should see how that gives you H values that are higher than the Euclidean distance, but still less than or equal to the actual shortest path length.

If a node is in the open set, and all of its edges are to nodes in the closed set, then there is no way that the node can be on a shortest path. When we discover such a node, we set its H value to ∞.

Look at a simple example, which is the first graph I showed you:

Using our new calculation, the Start node's H value is 4, rather than 3.16. That is because Start chose its H value to be the minimum distance-plus-H value to A and B, rather than its Euclidean distance. You should see how that results in a higher, but still legal H value.
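A sketch of that calculation is below. The function name `improved_h` and the `Point` and `Edge` types are hypothetical, and the sketch sets End's H value to zero directly rather than running the calculation on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical types for this sketch.
struct Point { double x, y; };
struct Edge { int to; double weight; };

double euclid(const Point &a, const Point &b)
{
  return std::hypot(a.x - b.x, a.y - b.y);
}

// H for node n: the minimum, over neighbors that are not in the closed set,
// of (edge weight + the neighbor's Euclidean distance to end).
// Returns infinity when every neighbor is already closed.
double improved_h(const std::vector<Point> &pts,
                  const std::vector<std::vector<Edge>> &adj,
                  const std::vector<bool> &closed, int n, int end)
{
  if (n == end) return 0.0;  // End's H value is always zero.
  double best = std::numeric_limits<double>::infinity();
  for (const Edge &e : adj[n]) {
    if (closed[e.to]) continue;
    best = std::min(best, e.weight + euclid(pts[e.to], pts[end]));
  }
  return best;
}
```

To try it on the example, you need coordinates. The ones I would use are a guess of mine, picked only because they reproduce the distances quoted in these notes (1.4, 3, 4, 1 and 3.16): Start at (0,0), A at (-1,1), B at (3,0) and End at (3,1). With those, Start's H value should come out as min(1.4 + 4, 3 + 1) = 4.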

Obviously, the tradeoff in this program is going to be smaller closed-set size, versus more expensive calculation of H. When I first implemented it, here's the picture I got with the above example:

That doesn't look right, does it? The closed set size is bigger than in the previous example. The reason is that I used the algorithm to set End's H value, rather than just setting it to zero. That's a bug, and I discovered it by looking at the pictures. I show this just to highlight how important it is to test your programs as you write them! That program is in a-star-tester-2.cpp.

The bug is fixed in a-star-tester-3.cpp. Here's the output -- you can see that there is one fewer node in the closed set than in the previous A-Star example:

```
UNIX> a-star-tester-3 106 -10 -10 10 10 40 5 A-Star N < G-40-5-106.txt
(* Time:          0.000008 *)
(* Path Length:     12.875 *)
(* Closed Set Size:      6 *)
(* Open   Set Size:     14 *)
(* Unvisited Nodes:     20 *)
UNIX>
```

### Using H values that are too high

My final program is a-star-tester-4.cpp. In this program, rather than specify "Dijkstra|A-Star|Nothing," you specify a factor. The H value is set to the Euclidean distance multiplied by the factor. This means that factors less than or equal to one will find the shortest paths, but smaller factors will yield bigger closed set sizes. Factors greater than one will have smaller closed set sizes, but they are not guaranteed to find the shortest paths. For example:
```
UNIX> a-star-tester-4 106 -10 -10 10 10 40 5 1.1 N < G-40-5-106.txt
(* Time:          0.000011 *)
(* Path Length:     12.875 *)
(* Closed Set Size:      7 *)
(* Open   Set Size:     14 *)
(* Unvisited Nodes:     19 *)
UNIX> a-star-tester-4 106 -10 -10 10 10 40 5 2 N < G-40-5-106.txt
(* Time:          0.000015 *)
(* Path Length:     14.672 *)
(* Closed Set Size:      6 *)
(* Open   Set Size:     15 *)
(* Unvisited Nodes:     19 *)
UNIX>
```
As you can see, that last call didn't find the shortest path. This is the one that it found. When the factors get really high, the program becomes a greedy DFS using only the Euclidean distances as heuristics:

### Pretty Pictures of Large Graphs

In this example, I create a 10,000 node graph:
```
UNIX> a-star-tester-0 1 -10 -10 10 10 10000 3 Nothing G > G-10000-3-1.txt
```
Below, I show the various programs finding the shortest paths (sometimes):

[Figure: the graph. Finding the yellow nodes is like finding Waldo...]

| Algorithm | Path Length | Time (seconds) | Closed Set Size |
|---|---|---|---|
| Dijkstra | 30.706 | 0.004342 | 9729 |
| A-Star | 30.706 | 0.002506 | 4893 |
| A-Star-3 (improved H) | 30.706 | 0.002754 | 4816 |
| A-Star, Factor = 1.1 | 30.721 | 0.002091 | 3971 |
| A-Star, Factor = 2 | 32.212 | 0.000481 | 891 |

A few items of note:

• The improvement from Dijkstra to A-Star is impressive!
• Although A-Star-3 did improve the size of the closed set, the improvement wasn't enough to offset the complexity of calculating H.
• Using a factor of 1.1 reduced the size of the closed set by 19 percent, while it also resulted in a path that was 0.05 percent longer than the optimal one. Its running time was 17 percent faster than our original A-Star.
• Using a factor of 2.0 reduced the size of the closed set by a whopping 82 percent, and improved the running time by 81 percent. However, the path it found was 5 percent longer than the optimal one. You can see that the "obstacle" in the graph caused this one the biggest problem. I find this stuff fascinating....
Below, I graph the algorithms on the 100,000 node graph in G100K.txt. The leftmost Y axes are log scale, and you can see the various tradeoffs very clearly:

### Further study

There's more probing that you can do with A-Star. For example:
• When is a-star-tester-3 faster than a-star-tester-0?
• When is a-star-tester-1 faster than a-star-tester-0?
• What is the real tradeoff of the factor in a-star-tester-4?
I will leave these explorations to the inquisitive student!