http://www.jimdavies.org/summaries/

@Book{Russell91, author = "Stuart Russell and Eric Wefald", title = "Do the right thing : studies in limited rationality", publisher = "MIT Press", year = "1991", series = "Artificial intelligence", address = "Cambridge, Mass", }

- SYSTEMS: MGSS* and MGSS2
- Application of rational metareasoning to game-playing.
- Subtree independence assumption.
- Pruning rules for minimax search.

    MAX                  current
                      /     |            \
    MIN           alpha   beta-1  ...  beta-i  ...
                 /  |  \
    MAX             ...
                 /  |  \
    MIN              j

where alpha is the successor with the (currently) best value and beta-1 is the successor with the (currently) second-best value; j is a leaf node (e.g., under alpha) being considered for expansion.

Clearly, SI holds for minimax search: a computation step consists of expanding one node, say j, and such a step can affect the utility of only one action (namely the move to alpha, since j lies below alpha) but not that of any other beta-i.

Now, since SI holds, a deliberation is useful only if it changes the agent's intention. The latter is to move to alpha and will change only if either a computation step, say Sj, raises U(beta-i) above U(alpha), or another computation step, say Sk, lowers U(alpha) below U(beta-1). Cf. Eqs. 4.3.3 and 4.3.4.
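These two conditions can be sketched as follows. This is my own illustrative code, not the book's: the book derives closed-form expressions (Eqs. 4.3.6/4.3.7), whereas here the expectations are simply estimated by sampling from caller-supplied distributions over the post-computation value.

```python
# Sketch: under SI, a computation has positive expected value only insofar
# as it can flip the intention.  Sj (expanding under some beta_i) pays off
# when beta_i's new value rises above U(alpha); Sk (expanding under alpha)
# pays off when alpha's new value drops below U(beta_1).
# `sample_new_*` are caller-supplied samplers of the post-computation value.

def expected_gain_raise(u_alpha, sample_new_beta_i, n=10_000):
    """Estimate E[max(0, Vnew(beta_i) - U(alpha))] by sampling."""
    return sum(max(0.0, sample_new_beta_i() - u_alpha) for _ in range(n)) / n

def expected_gain_lower(u_beta_1, sample_new_alpha, n=10_000):
    """Estimate E[max(0, U(beta_1) - Vnew(alpha))] by sampling."""
    return sum(max(0.0, u_beta_1 - sample_new_alpha()) for _ in range(n)) / n
```

If neither expectation exceeds the cost of the computation, the deliberation cannot be worthwhile.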

Plugging Eq. 4.3.5 into 4.3.3 (resp. 4.3.4) yields Eq. 4.3.6 (resp. 4.3.7) for computing the expected utility of computation Sj (resp. Sk).

Note that this assumption holds here, with j denoting the new value of a leaf node after its expansion (e.g., under alpha) and f(j) the new value of alpha after propagating j upward. Therefore f summarizes the object level (OL) of minimax search, namely the way values propagate up the ancestor hierarchy.

To be able to compute the value of computations using Eq. 4.3.6 & 4.3.7, we need:

- To obtain a closed-form expression for f(j), and
- To have a method for estimating pjj and pkk.

- A first expression of f(j) for minimax without pruning is given as Eq. 4.4.8.
- Eq. 4.4.9 is a modified version of Eq. 4.4.8 that takes some relevance criteria into account. It uses the notion of **delta-bound** defined on page 90. So Eq. 4.4.9, i.e. f(j) = min(delta-bound(j), j), is the final version of f(j) and models minimax with pruning (a la alpha-beta).
- Reducing 4.4.9 to 4.4.10 and plugging it into 4.3.6 (resp. 4.3.7) yields Eq. 4.4.11 (resp. 4.4.12).
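To see why f takes this capped form, here is a toy sketch (my own notation, not the book's) of how a single new leaf value propagates up alternating MIN/MAX levels. Roughly: below the bound set by the min-level siblings, the leaf value passes through; above it, the backed-up value saturates, which is the min(delta-bound(j), j) shape of Eq. 4.4.9.

```python
def backup(leaf_value, ancestors):
    """Propagate a single new leaf value up alternating MIN/MAX ancestors.

    ancestors: list of ('min' | 'max', sibling_values) pairs, ordered from
    the leaf's parent up to the root move.  Toy illustration only.
    """
    v = leaf_value
    for node_type, siblings in ancestors:
        vals = list(siblings) + [v]
        v = min(vals) if node_type == 'min' else max(vals)
    return v
```

For example, with a min-parent whose other children have values 7 and 9, a new leaf value of 5 passes through, while a new leaf value of 10 is capped at 7.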

Let Vold(j) and Vnew(j) denote the current value and the new value (after a depth-one search) of node j.

pjj(u) = P(Vnew(j) = u) where Vnew(j) = Vold(j) + error

- Sample the error term over a large number of depth-one expansions to get a list of {state, error} pairs.
- Group these pairs into buckets according to some chosen state features which are game-dependent. This is to allow for generalization across states.
- pjj and pkk can be approximated by a normal distribution with a mean and variance specific to each bucket.
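The estimation procedure above can be sketched as follows. The feature function and data here are illustrative; the book's buckets are built from game-dependent state features.

```python
import math
from statistics import mean, stdev

def fit_buckets(samples, feature):
    """Fit a normal (mu, sigma) to the error term per feature bucket.

    samples: list of (state, error) pairs from depth-one expansions.
    feature: maps a state to a game-dependent bucket key.
    """
    buckets = {}
    for state, err in samples:
        buckets.setdefault(feature(state), []).append(err)
    return {key: (mean(errs), stdev(errs))
            for key, errs in buckets.items() if len(errs) > 1}

def p_new_value(u, v_old, mu, sigma):
    """Approximate density of Vnew = Vold + error at u (normal error)."""
    z = (u - v_old - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
```

The per-bucket (mu, sigma) pairs then stand in for pjj and pkk when evaluating candidate computations.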
- Plugging the equations for the normal curves into 4.4.11 (resp. 4.4.12), we get the **final equations for the value of computation** in 4.4.13 (resp. 4.4.14).

- GameValue(node) = StaticValue(node) if node is a leaf, or its current backed-up value otherwise
- SearchValue(node) = expected utility gain from expanding the node.

- Pop the Queue to get j.
- If its expected utility minus TimeCost is zero or negative, stop the search and return the current best move; otherwise perform computation Sj (i.e., expand j) and update the Queue and all required values.
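The control loop just described might look like this. This is a hedged sketch, not the book's actual code; `search_value`, `expand`, and `best_move` are placeholders for the machinery derived above.

```python
import heapq

def mgss_loop(frontier, search_value, expand, time_cost, best_move):
    """Pop the most promising leaf; stop when search no longer pays.

    heapq is a min-heap, so SearchValues are negated to pop the best
    (highest-valued) node first.
    """
    queue = [(-search_value(n), n) for n in frontier]
    heapq.heapify(queue)
    while queue:
        neg_value, j = heapq.heappop(queue)
        if -neg_value - time_cost <= 0:
            break  # expected gain no longer covers the cost of thinking
        for child in expand(j):  # computation Sj: expand j, update values
            heapq.heappush(queue, (-search_value(child), child))
    return best_move()
```

Note how the stopping test makes the search self-terminating: no fixed depth or node budget is needed.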

Some disadvantages of MGSS* include:

- Some opportunities for pruning are lost (Cf. alpha-beta).
- Satisficing effects are also lost.

- B* (Berliner) uses a two-valued evaluation function (a lower and an upper bound on the presumed true value) and tries to prove one move better than all other moves. But it does not take TimeCost into account and will favor computations that distinguish between two actions even when their utilities are symmetrical.
- PB* (Palay) -- I am getting very tired here and this summary is getting long anyway 8^) so I will just list the remaining approaches mentioned.
- min/max approximation (Rivest)
- conspiracy number (McAllester)

- The unit of computation does NOT always affect only the utility of a single action as is assumed here.
- TimeCost is NOT always independent of the move chosen.

- This chapter contains many equations, none of which are copied into this summary. Instead, the latter provides a roadmap for their derivation.
- This summary focuses on sections 4.1 through 4.5.


Cognitive Science Summaries Webmaster: JimDavies ( jim@jimdavies.org ) Last modified: Tue Jul 13 08:55:37 EDT 1999