
S. Russell and E. Wefald (1991). Do the Right Thing: Studies in Limited Rationality (Chapter 4: Application to Game-Playing). MIT Press.

@Book{Russell91,
  author =       "Stuart Russell and Eric Wefald",
  title =        "Do the Right Thing: Studies in Limited Rationality",
  publisher =    "MIT Press",
  year =         "1991",
  series =       "Artificial Intelligence",
  address =      "Cambridge, MA",
}

Author of the summary: David Furcy, 1999, dfurcy@cc.gatech.edu

Summary

This chapter applies to game-playing the general ideas behind rational metareasoning developed in chapters 1-3. The general equations for the value of computation obtained in chapter 3 are instantiated as follows. First, the equations are simplified using the subtree independence assumption. Then, the propagation function 'f' specific to minimax search is derived from an example, along with pruning rules. Methods for estimating the needed probability distributions are provided. Finally, the resulting equations are implemented in two algorithms, MGSS* and MGSS2, whose performance is evaluated against alpha-beta, with Othello as the main application domain.

Detailed outline

Let's start with some terminology

Minimax tree

   MAX                        current
                     ________/ |  \  \
                    /          |   \  \    
   MIN           alpha      beta-1 ... beta-i ... 
                /  |  \     /  |  \
               /   |   \   /   |   \
   MAX            ...
                 / | \
                /  |  \
   MIN             j
where alpha is the successor with the (currently) best value, beta-1 is the successor with the (currently) second-best value, and j is a leaf node whose expansion is being considered (shown here in alpha's subtree).

Subtree Independence (or SI)

Subtree Independence holds when a given computation affects the utility estimate of only one action.

Clearly, SI holds for minimax search: a computation consists of expanding one node, say j, and this computation can affect the utility estimate of only one action, namely the move to the top-level ancestor of j (here alpha), leaving every other beta-i unchanged.

Now, since SI holds, a deliberation is useful only if it changes the agent's intention. The current intention is to move to alpha, and it will change only if either a computation step Sj in some beta-i's subtree raises U(beta-i) above U(alpha), or a computation step Sk in alpha's subtree lowers U(alpha) below U(beta-1) (cf. Eqs. 4.3.3 and 4.3.4).

Reminder from chapter 3

Assume that computation Sj provides an estimate of the value of a random variable j and that the estimate of the value of action Ai is changed to f(j) where 'f' is the propagation function describing the object level (OL). Then Eq. 4.3.5 can be used to compute the probability distribution pij for the effect of computation Sj on the utility of action Ai.

Plugging Eq. 4.3.5 into 4.3.3 (resp. 4.3.4) yields Eq. 4.3.6 (resp. 4.3.7) for computing the expected utility of computation Sj (resp. Sk).
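To make these quantities concrete, here is a minimal numerical sketch (my own construction, not the book's code), assuming the instantiated equations take the familiar "expected positive gain" form: a computation bearing on a beta-i is worth the expected amount by which the new U(beta-i) exceeds U(alpha), and a computation bearing on alpha is worth the expected amount by which the new U(alpha) falls below U(beta-1). The distributions pjj and pkk are represented as samples.

import numpy as np

def voc_beta(new_beta_samples, u_alpha):
    # Expected gain from re-estimating a beta-i: positive only in the
    # cases where its new value rises above U(alpha) (spirit of Eq. 4.3.6).
    return np.mean(np.maximum(0.0, new_beta_samples - u_alpha))

def voc_alpha(new_alpha_samples, u_beta1):
    # Expected gain from re-estimating alpha: positive only in the cases
    # where its new value drops below U(beta-1) (spirit of Eq. 4.3.7),
    # since the agent would then switch its move to beta-1.
    return np.mean(np.maximum(0.0, u_beta1 - new_alpha_samples))

# Toy usage: alpha currently looks best (0.6), beta-1 second best (0.5);
# both re-estimates are modeled as current value plus Gaussian error.
rng = np.random.default_rng(0)
print(voc_alpha(0.6 + rng.normal(0, 0.1, 100000), 0.5))  # ~0.008
print(voc_beta(0.5 + rng.normal(0, 0.1, 100000), 0.6))   # ~0.008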

Note that the chapter-3 assumption holds here, with j being the value obtained by expanding a leaf node under, e.g., alpha, and f(j) being the new value of alpha after j is propagated upward. Therefore f summarizes the OL of minimax search, namely the way values propagate up the ancestor hierarchy.
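To see what f looks like, here is a minimal sketch (my construction, not the book's code) of the backing-up process: the revised leaf value is combined at each min/max ancestor with the current values of the siblings on the way up to alpha.

def propagate(j, path):
    # f(j): back a revised leaf value j up the ancestor hierarchy.
    # path lists (node_type, sibling_values) pairs from the leaf's
    # parent up to the top-level successor (e.g. alpha), where
    # sibling_values are the current values of the other children.
    v = j
    for node_type, siblings in path:
        if node_type == 'max':
            v = max([v] + list(siblings))
        else:  # 'min'
            v = min([v] + list(siblings))
    return v

# Example: j's parent is a MAX node with another child valued 0.3;
# that node's parent is the MIN node alpha with another child at 0.8:
print(propagate(0.5, [('max', [0.3]), ('min', [0.8])]))  # 0.5

Since f is a composition of maxes and mins with constants, it simply clamps j between two bounds determined by the sibling values, returning j itself inside the bounds and a constant outside; this monotone structure is what makes a closed-form expression, and the pruning rules mentioned above, possible.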

Roadmap

f is a model of the OL computation steps, but there are different levels of granularity at which to analyze minimax search. One can view an atomic step as the expansion of all successors of a node at once (cf. section 4.4), leading to algorithm MGSS* in section 4.5; alternatively, a step can be construed more finely as the expansion of a single successor of a node (cf. section 4.6), leading to algorithm MGSS2 in section 4.7. We will now focus on MGSS*.

To be able to compute the value of computations using Eq. 4.3.6 & 4.3.7, we need:
  1. To obtain a closed-form expression for f(j), and
  2. To have a method for estimating pjj and pkk.
Step 1: Expressing f

In outline, f(j) is the result of backing the new leaf value j up through the min/max ancestors, which clamps j between bounds determined by the sibling values (see the sketch above).

Step 2: Estimating pjj(u) and pkk(u)

Let Vold(j) and Vnew(j) denote, respectively, the current value of node j and its new value after a depth-one search.

pjj(u) = P(Vnew(j) = u) where Vnew(j) = Vold(j) + error
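The estimation method itself is not reproduced in this summary. One plausible sketch, consistent with the additive-error model above (the offline procedure below is my assumption, not a quote from the book), is to collect (Vold, Vnew) pairs over many training positions and use the empirical distribution of the errors, bucketed by Vold:

from collections import defaultdict
import numpy as np

def build_error_model(training_pairs, bucket=0.1):
    # training_pairs: (v_old, v_new) pairs gathered offline, where v_old
    # is a node's static value and v_new its value after a depth-one
    # search. Returns a map from bucketed v_old to the observed errors,
    # i.e. an empirical model of pjj.
    errors = defaultdict(list)
    for v_old, v_new in training_pairs:
        errors[round(v_old / bucket)].append(v_new - v_old)
    return {k: np.asarray(v) for k, v in errors.items()}

def sample_v_new(model, v_old, n, bucket=0.1, rng=None):
    # Draw n samples of Vnew(j) = Vold(j) + error from v_old's bucket.
    rng = rng or np.random.default_rng()
    return v_old + rng.choice(model[round(v_old / bucket)], size=n)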

MGSS*

Terminology: The algorithm maintains a Queue of nodes ordered by decreasing SearchValue, where a node's SearchValue estimates the net expected benefit of expanding it (the value of computation from Eqs. 4.3.6 & 4.3.7 minus the time cost). The Queue is initialized with the relevant successors of the current node. Then, at each step, the node at the head of the Queue is expanded (all of its successors at once), the new values are backed up, and the Queue is updated; search stops when no remaining computation has a positive SearchValue. A schematic sketch of this loop follows the results below.

MGSS* was tested on Othello. Note that "computing search values is a good deal quicker than computing the static evaluation function, enabling MGSS* to perform with very low overhead." The search is very selective (i.e. it goes down to uneven depths). The results indicate that "with moderately small time allocation, MGSS* is significantly better than an alpha-beta search using the same evaluation function, even though MGSS* generates significantly fewer nodes."
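The control loop just described can be sketched as follows (my rendering of the structure above, not the book's code; successors, search_value, expand, and best_move are hypothetical placeholders for the move generator, the net value of computation, node expansion with value back-up, and the final move choice):

def mgss_star(current, successors, search_value, expand, best_move):
    # Queue of candidate nodes, kept ordered by decreasing SearchValue.
    queue = list(successors(current))
    while queue:
        # Re-rank each time, since expansions elsewhere may have changed
        # U(alpha), U(beta-1), and hence the search values.
        queue.sort(key=search_value, reverse=True)
        node = queue[0]
        if search_value(node) <= 0:  # no computation is worth its cost
            break
        queue.pop(0)
        # One atomic step: expand *all* successors of the node at once
        # and back the resulting values up the tree.
        queue.extend(expand(node))
    return best_move(current)  # move to the successor with the best value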

Some disadvantages of MGSS* include its coarse granularity: a single computation step expands all successors of a node at once, which motivates the finer-grained MGSS2 described next.

MGSS2

MGSS2 has the same overall structure as MGSS*, but since only one successor is expanded at a time, the algorithm needs probability distributions for the value of the minimum of the n successors of a node, given that k of the n have already been evaluated. Some new irrelevance criteria are also derived. The results obtained with MGSS2 are only preliminary but encouraging.
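For example (my own sketch, assuming independent successor values): once k of a MIN node's n successor values are known exactly, the survival function of the minimum factors into an indicator on the known values and a product over the unevaluated ones.

def min_survival(u, known_values, unevaluated_cdfs):
    # P(min over all n successor values > u), given k known values and
    # n-k unevaluated successors with independent CDFs F(u) = P(V <= u).
    if any(v <= u for v in known_values):
        return 0.0
    prob = 1.0
    for F in unevaluated_cdfs:
        prob *= 1.0 - F(u)
    return prob

# Toy usage: two successors already evaluated (0.7 and 0.9), one still
# unevaluated and modeled as uniform on [0, 1]:
uniform_cdf = lambda u: min(1.0, max(0.0, u))
print(min_survival(0.5, [0.7, 0.9], [uniform_cdf]))  # 0.5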

Other games

Chess: No results were available at the time of writing.
Backgammon: An extension of this application to probabilistic games was underway. Equations are given, and encouraging preliminary results against *-alphabeta were obtained.
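The equations for the probabilistic case are not reproduced here; at the object level they build on the standard expectiminimax backup (sketched below as a generic illustration under my own Node structure, not code from the book), in which chance nodes average their children's values by outcome probability; *-alphabeta prunes trees of this form.

from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str = 'leaf'  # 'max', 'min', 'chance', or 'leaf'
    static_value: float = 0.0
    children: list = field(default_factory=list)
    probs: list = field(default_factory=list)  # used by chance nodes

def expectiminimax(node):
    # Standard backup rule for game trees with chance nodes
    # (e.g. the dice rolls in backgammon).
    if node.kind == 'leaf':
        return node.static_value
    values = [expectiminimax(child) for child in node.children]
    if node.kind == 'max':
        return max(values)
    if node.kind == 'min':
        return min(values)
    # Chance node: probability-weighted average over the outcomes.
    return sum(p * v for p, v in zip(node.probs, values))

# Example: a fair coin-flip chance node over two MAX choices.
leaf = lambda v: Node('leaf', v)
tree = Node('chance',
            children=[Node('max', children=[leaf(0.2), leaf(0.6)]),
                      Node('max', children=[leaf(0.1), leaf(0.4)])],
            probs=[0.5, 0.5])
print(expectiminimax(tree))  # 0.5 * 0.6 + 0.5 * 0.4 = 0.5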

Related work on game-playing

"The mainstream of game-playing research [...] has been dominated by fixed-depth minimax search using alpha-beta pruning" but its metalevel policy is too rigid.

Some limitations of the approach stem from the simplifying assumptions (e.g. subtree independence) adopted to make the value-of-computation equations tractable.
