[ CogSci Summaries home | UP | email ]
http://www.jimdavies.org/summaries/

S. Russell & E. Wefald (1991), Do the Right Thing: Studies in Limited Rationality, MIT Press.

Chapter 5: Application to Problem-Solving Search

@Book{Russell91,
  author =       "Stuart Russell and Eric Wefald",
  title =        "Do the right thing : studies in limited rationality",
  publisher =    "MIT Press",
  year =         "1991",
  series =       "Artificial intelligence",
  address =      "Cambridge, Mass",
}

Author of the summary: Patrawadee Prasangsit, 1999, pp@cc.gatech.edu

Cite this paper for:

Summary:

This chapter describes an application of rational metareasoning to the problem of controlling heuristic search.

It discusses Korf's existing RTA*, presents a new and somewhat simplified algorithm SRTA*, which allows for more efficient search than RTA*, and then introduces DTA*, which combines a search algorithm governed by a decision-theoric control method.

A* Revisited

A* expands nodes in increasing order of their f values, where f(n) = g(n) + h(n) and
g(n) is the exact cost of getting from the current node to n by the best path so far
h(n) is the heuristic estimate of the cost of getting from n to the goal
The heuristic value h(n) is admissible if it is a lower bound on the cost of getting from n to a goal state (i.e., not overestimating the actual value).  It is consistent if it satisfies the triangle inequality (h(s) <= h(s') + c(s,s') where s' is a successor of s).  Consistency is stronger than admissibility.

Lemma 1 (Pearl) If heuristic function h is consistent, then the f values are non-decreasing along any path away from the root.

Lemma 2 If the heuristic function h is consistent and admissible, the error in the backed-up cost estimate cannot increase with search depth.

RTA*

RTA* uses the idea of minimin (taking the minimum heuristic estimate of all frontier nodes descended from a given top-level move as the move's value).  The algorithm, given in figure 5.5 and 5.6 as outer and inner loops, respectively, can be simply stated as follows: for each successor s' of a current node s, do minimin (with alpha pruning) starting from s' to a given preset depth; s updates its f value by taking that of best successor (one with minimum f value) plus some value (to be discussed shortly); the best successor becomes the next current node.

The algorithm can avoid cycles as long as (paraphrasing lemma 3) the update procedure in the outer loop has the property that f values of each node in the cycle strictly increase between visits, specifically, the new value of current node is greater than or equal to that of the best child (lemma 4).  Korf's RTA* simply updates a current node's f value with that of the second-best child.  Alternative update procedures are, for example, best+1, best+10%, second-best+1, second-best+10%, etc.  Since an accurate estimate of the second-best f value is not necessary, this allows for a greater degree of pruning, ... which leads us to the next algorithm.

SRTA* (Simplified RTA*)

Since the estimates at intermediate nodes are lower bounds on the estimates at the frontier nodes (i.e. the f value of a frontier node must be at least that of its intermediate nodes), it is possible to determine that a given move is not the best without having to know what its exact estimate is.  We can accomplish this extra pruning by using a single alpha bound for the entire search of all of the possible moves, rather than performing an independent search of each move using its own alpha.  The algorithm is given in figure 5.7 and 5.8 as outer and inner loops, respectively.

The comparison between RTA* and SRTA* (section 5.2.5), unsurprisingly, shows SRTA* as more superior in several aspects.

DTA*

To improve the algorithm even further, we wish to be able to decide to cut off some paths earlier and continue others more deeply, depending on our confidence in the heuristic estimates of the nodes at the ends of those paths.  This is possible using a dynamically expanding tree with no fixed depth limit.  At each step of the search, a stopping criterion is applied to decide whether to end the search and return a result, or continue expanding.  If the latter, a node-selection criterion is applied to decide on a leaf to expand.

The idea behind DTA* is that, for the node-selection criterion, we wish to choose the leaf node whose expansion has the greatest expected utility, and for the stopping criterion, we wish to stop searching when no node expansions have non-negative expected utility.

Let alpha and beta1 be the current best move (one with smallest backed-up estimate), and second-best move, respectively.  No amount of searching can make us change our view that alpha is the best move to take unless it raises the backed-up cost estimate of alpha above the current backed-up estimate of beta1.  If the new search is eventually successful (i.e., if alpha's heuristic estimate is raised above f'(beta1)), then all of the nodes in alpha's subtree with estimates lower than f'(beta1) will have been expanded (which will have been a lot of work).  If the new search eventually fails to change our minds about alpha, we will wish to find this out as quickly as possible, in order to avoid as much unnecessary search as possible.  It follows that hardest-first search is rational; always expand the hardest alpha-leaf, where this is the leaf descended from the current best move whose heuristic estimate currently has the lowest probability of exceeding f'(beta1) after expansion.

This probability depends on, among other things, the amount by which the leaf node has to change its value.  The probability of a given size error is usually a decreasing function of the size of the error; hence the best leaf will be the one with minimum current estimate, i.e. whose current cost estimate is f(alpha).  In case of tie, usually among nodes with minimum f estimate, the best leaf to expand will be the one with minimum h estimate, since the probability of a given size error is also in general a decreasing function of h; note that this will also be the deepest node among those with minimum f, in the case of uniform edge costs, since in that case g is equivalent to depth.

The algorithm uses the same outer loop as that of SRTA* (figure 5.7).  Its inner loop is given in figure 5.14.  The comparison between DTA* and SRTA* is also given in section 5.4, showing experiments on the 15-puzzle, in robotic path-planning, and applications in computer vision.


Back to the Cognitive Science Summaries homepage
Cognitive Science Summaries Webmaster:
JimDavies ( jim@jimdavies.org )

Last modified: July 13, 1999