
S. Russell and E. Wefald (1991). Do the Right Thing: Studies in Limited Rationality (Chapter 4: Application to Game-Playing). MIT Press.

@Book{Russell91,
  author =       "Stuart Russell and Eric Wefald",
  title =        "Do the Right Thing: Studies in Limited Rationality",
  publisher =    "MIT Press",
  year =         "1991",
  series =       "Artificial Intelligence",
  address =      "Cambridge, MA",
}

Author of the summary: David Furcy, 1999, dfurcy@cc.gatech.edu

Summary

This chapter applies to game-playing the general ideas behind rational metareasoning developed in chapters 1-3. The general equations for the value of computation obtained in chapter 3 are instantiated as follows. First, the equations are simplified using the subtree independence assumption. Then, the propagation function 'f' specific to minimax search is derived from an example, along with pruning rules. Methods for estimating the needed probability distributions are provided. Finally, the resulting equations are implemented in two algorithms, MGSS* and MGSS2, whose performance is evaluated against alpha-beta, with Othello as the main application domain.

Detailed outline

Let's start with some terminology

Minimax tree

   MAX                        current
                     ________/ |  \  \
                    /          |   \  \    
   MIN           alpha      beta-1 ... beta-i ... 
                /  |  \     /  |  \
               /   |   \   /   |   \
   MAX            ...
                 / | \
                /  |  \
   MIN             j
where alpha is the successor with the (currently) best value, beta-1 is the successor with the (currently) second-best value, and j is a leaf node whose expansion is being considered (shown here in alpha's subtree).

Subtree Independence (or SI)

Subtree Independence holds when a given computation affects the utility estimate of only one action.

Clearly, SI holds for minimax search: a computation consists of expanding one node, say j, and this computation can affect the utility estimate of only one action, namely the move to the top-level ancestor of j (here alpha), leaving every other beta-i unchanged.

Now, since SI holds, a deliberation is useful only if it changes the agent's intention. The current intention is to move to alpha, and it will change only if either a computation step Sj in some beta-i's subtree raises U(beta-i) above U(alpha), or a computation step Sk in alpha's subtree lowers U(alpha) below U(beta-1) (cf. Eqs. 4.3.3 and 4.3.4).

Reminder from chapter 3

Assume that computation Sj provides an estimate of the value of a random variable j and that the estimate of the value of action Ai is changed to f(j) where 'f' is the propagation function describing the object level (OL). Then Eq. 4.3.5 can be used to compute the probability distribution pij for the effect of computation Sj on the utility of action Ai.

Plugging Eq. 4.3.5 into 4.3.3 (resp. 4.3.4) yields Eq. 4.3.6 (resp. 4.3.7) for computing the expected utility of computation Sj (resp. Sk).
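To make these quantities concrete, here is a minimal numerical sketch (my own construction, not the book's code), assuming the instantiated equations take the familiar "expected positive gain" form: a computation bearing on a beta-i is worth the expected amount by which the new U(beta-i) exceeds U(alpha), and a computation bearing on alpha is worth the expected amount by which the new U(alpha) falls below U(beta-1). The distributions pjj and pkk are represented as samples.

import numpy as np

def voc_beta(new_beta_samples, u_alpha):
    # Expected gain from re-estimating a beta-i: positive only in the
    # cases where its new value rises above U(alpha) (spirit of Eq. 4.3.6).
    return np.mean(np.maximum(0.0, new_beta_samples - u_alpha))

def voc_alpha(new_alpha_samples, u_beta1):
    # Expected gain from re-estimating alpha: positive only in the cases
    # where its new value drops below U(beta-1) (spirit of Eq. 4.3.7),
    # since the agent would then switch its move to beta-1.
    return np.mean(np.maximum(0.0, u_beta1 - new_alpha_samples))

# Toy usage: alpha currently looks best (0.6), beta-1 second best (0.5);
# both re-estimates are modeled as current value plus Gaussian error.
rng = np.random.default_rng(0)
print(voc_alpha(0.6 + rng.normal(0, 0.1, 100000), 0.5))  # ~0.008
print(voc_beta(0.5 + rng.normal(0, 0.1, 100000), 0.6))   # ~0.008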

Note that the chapter-3 assumption holds here, with j being the value obtained by expanding a leaf node under, e.g., alpha, and f(j) being the new value of alpha after j is propagated upward. Therefore f summarizes the OL of minimax search, namely the way values propagate up the ancestor hierarchy.
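To see what f looks like, here is a minimal sketch (my construction, not the book's code) of the backing-up process: the revised leaf value is combined at each min/max ancestor with the current values of the siblings on the way up to alpha.

def propagate(j, path):
    # f(j): back a revised leaf value j up the ancestor hierarchy.
    # path lists (node_type, sibling_values) pairs from the leaf's
    # parent up to the top-level successor (e.g. alpha), where
    # sibling_values are the current values of the other children.
    v = j
    for node_type, siblings in path:
        if node_type == 'max':
            v = max([v] + list(siblings))
        else:  # 'min'
            v = min([v] + list(siblings))
    return v

# Example: j's parent is a MAX node with another child valued 0.3;
# that node's parent is the MIN node alpha with another child at 0.8:
print(propagate(0.5, [('max', [0.3]), ('min', [0.8])]))  # 0.5

Since f is a composition of maxes and mins with constants, it simply clamps j between two bounds determined by the sibling values, returning j itself inside the bounds and a constant outside; this monotone structure is what makes a closed-form expression, and the pruning rules mentioned above, possible.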

Roadmap

f is a model of the OL computation steps, but there are different levels of granularity at which to analyze minimax search. One can view an atomic step as the expansion of all successors of a node at once (cf. section 4.4), leading to algorithm MGSS* in section 4.5; alternatively, a step can be construed more finely as the expansion of a single successor of a node (cf. section 4.6), leading to algorithm MGSS2 in section 4.7. We will now focus on MGSS*.

To be able to compute the value of computations using Eq. 4.3.6 & 4.3.7, we need:
  1. To obtain a closed-form expression for f(j), and
  2. To have a method for estimating pjj and pkk.
Step 1: Expressing f

In outline, f(j) is the result of backing the new leaf value j up through the min/max ancestors, which clamps j between bounds determined by the sibling values (see the sketch above).

Step 2: Estimating pjj(u) and pkk(u)

Let Vold(j) and Vnew(j) denote, respectively, the current value of node j and its new value after a depth-one search.

pjj(u) = P(Vnew(j) = u) where Vnew(j) = Vold(j) + error
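The estimation method itself is not reproduced in this summary. One plausible sketch, consistent with the additive-error model above (the offline procedure below is my assumption, not a quote from the book), is to collect (Vold, Vnew) pairs over many training positions and use the empirical distribution of the errors, bucketed by Vold:

from collections import defaultdict
import numpy as np

def build_error_model(training_pairs, bucket=0.1):
    # training_pairs: (v_old, v_new) pairs gathered offline, where v_old
    # is a node's static value and v_new its value after a depth-one
    # search. Returns a map from bucketed v_old to the observed errors,
    # i.e. an empirical model of pjj.
    errors = defaultdict(list)
    for v_old, v_new in training_pairs:
        errors[round(v_old / bucket)].append(v_new - v_old)
    return {k: np.asarray(v) for k, v in errors.items()}

def sample_v_new(model, v_old, n, bucket=0.1, rng=None):
    # Draw n samples of Vnew(j) = Vold(j) + error from v_old's bucket.
    rng = rng or np.random.default_rng()
    return v_old + rng.choice(model[round(v_old / bucket)], size=n)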

MGSS*

Terminology: The algorithm maintains a Queue of nodes ordered by decreasing SearchValue, where a node's SearchValue estimates the net expected benefit of expanding it (the value of computation from Eqs. 4.3.6 & 4.3.7 minus the time cost). The Queue is initialized with the relevant successors of the current node. Then, at each step, the node at the head of the Queue is expanded (all of its successors at once), the new values are backed up, and the Queue is updated; search stops when no remaining computation has a positive SearchValue. A schematic sketch of this loop follows the results below.

MGSS* was tested on Othello. Note that "computing search values is a good deal quicker than computing the static evaluation function, enabling MGSS* to perform with very low overhead." The search is very selective (i.e. it goes down to uneven depths). The results indicate that "with moderately small time allocation, MGSS* is significantly better than an alpha-beta search using the same evaluation function, even though MGSS* generates significantly fewer nodes."
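The control loop just described can be sketched as follows (my rendering of the structure above, not the book's code; successors, search_value, expand, and best_move are hypothetical placeholders for the move generator, the net value of computation, node expansion with value back-up, and the final move choice):

def mgss_star(current, successors, search_value, expand, best_move):
    # Queue of candidate nodes, kept ordered by decreasing SearchValue.
    queue = list(successors(current))
    while queue:
        # Re-rank each time, since expansions elsewhere may have changed
        # U(alpha), U(beta-1), and hence the search values.
        queue.sort(key=search_value, reverse=True)
        node = queue[0]
        if search_value(node) <= 0:  # no computation is worth its cost
            break
        queue.pop(0)
        # One atomic step: expand *all* successors of the node at once
        # and back the resulting values up the tree.
        queue.extend(expand(node))
    return best_move(current)  # move to the successor with the best value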

Some disadvantages of MGSS* include its coarse granularity: a single computation step expands all successors of a node at once, which motivates the finer-grained MGSS2 described next.

MGSS2

MGSS2 has the same overall structure as MGSS*, but since only one successor is expanded at a time, the algorithm needs probability distributions for the value of the minimum of the n successors of a node, given that k of the n have already been evaluated. Some new irrelevance criteria are also derived. The results obtained with MGSS2 are only preliminary but encouraging.
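For example (my own sketch, assuming independent successor values): once k of a MIN node's n successor values are known exactly, the survival function of the minimum factors into an indicator on the known values and a product over the unevaluated ones.

def min_survival(u, known_values, unevaluated_cdfs):
    # P(min over all n successor values > u), given k known values and
    # n-k unevaluated successors with independent CDFs F(u) = P(V <= u).
    if any(v <= u for v in known_values):
        return 0.0
    prob = 1.0
    for F in unevaluated_cdfs:
        prob *= 1.0 - F(u)
    return prob

# Toy usage: two successors already evaluated (0.7 and 0.9), one still
# unevaluated and modeled as uniform on [0, 1]:
uniform_cdf = lambda u: min(1.0, max(0.0, u))
print(min_survival(0.5, [0.7, 0.9], [uniform_cdf]))  # 0.5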

Other games

Chess: No results were available at the time of writing.
Backgammon: An extension of this application to probabilistic games was underway. Equations are given, and encouraging preliminary results against *-alphabeta were obtained.
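The equations for the probabilistic case are not reproduced here; at the object level they build on the standard expectiminimax backup (sketched below as a generic illustration under my own Node structure, not code from the book), in which chance nodes average their children's values by outcome probability; *-alphabeta prunes trees of this form.

from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str = 'leaf'  # 'max', 'min', 'chance', or 'leaf'
    static_value: float = 0.0
    children: list = field(default_factory=list)
    probs: list = field(default_factory=list)  # used by chance nodes

def expectiminimax(node):
    # Standard backup rule for game trees with chance nodes
    # (e.g. the dice rolls in backgammon).
    if node.kind == 'leaf':
        return node.static_value
    values = [expectiminimax(child) for child in node.children]
    if node.kind == 'max':
        return max(values)
    if node.kind == 'min':
        return min(values)
    # Chance node: probability-weighted average over the outcomes.
    return sum(p * v for p, v in zip(node.probs, values))

# Example: a fair coin-flip chance node over two MAX choices.
leaf = lambda v: Node('leaf', v)
tree = Node('chance',
            children=[Node('max', children=[leaf(0.2), leaf(0.6)]),
                      Node('max', children=[leaf(0.1), leaf(0.4)])],
            probs=[0.5, 0.5])
print(expectiminimax(tree))  # 0.5 * 0.6 + 0.5 * 0.4 = 0.5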

Related work on game-playing

"The mainstream of game-playing research [...] has been dominated by fixed-depth minimax search using alpha-beta pruning" but its metalevel policy is too rigid.

Some limitations of the approach stem from the simplifying assumptions (e.g. subtree independence) adopted to make the value-of-computation equations tractable.
