[ CogSci Summaries home | UP | email ]
http://www.jimdavies.org/summaries/

## Rational Metareasoning, Chapter 3: Do the Right Thing, S. Russell and E. Weld.

### Author of the summary: Yaxin Liu, 1999, yxliu@cc

#### Cite this paper for:

• Rational metareasoning
• Decision theoretic meta-control
• Meta-greedy strategy
• Single-step assumption

#### Main Ideas:

It discusses how decision theory principles can be applied to meta-reasoning, what assumption about reasoning is needed, and what implications we can have.  More importantly, what information we can make use of and what simplifications are necessary to do meta-reasoning.

### Basic principles

• Computations as internal actions, to be selected among based on their expected utilities
• Utility is from expected effects: time (and environmental changes) and revision of agent's intended external actions

### What computation is NOT for

+ Time cost outweigh the gain
+ Computation cannot make differences

### Assumptions

+ Outcome of external actions are known (deterministic)
+ Utility of each outcome unknown

### Models of deliberation

#### Notation

* A: external actions
* S: internal actions, computation steps
* W: world state, both external and internal
* [X]: the world state resulted from action X and current world state
* [X, W]: the world state resulted from action X and state W
* U(W): utility of state W, only depends on the external part
* S: sequence of computation, T for future computation
* S.S_j: sequence of S followed by S_j
* e: evidence made available by S, e_j is by S_j
* \hat{Q}^S: estimate of Q, based on S
* \alpha: current default action
* \alpha_T: external action supported by computation T
* \beta_1, \beta_2, ...: current second-, third- best external actions

#### External model:

Consider utility/probability of external actions and internal states; a default (current best) external action \alpha; computation is to refine \alpha; random choice can be the baseline

* Basic formula: E[U([A])] = \sum_k P(W_k) U([A, W_k])
* Value of computation: V(S) = U([S]) - U(\alpha)
* For complete computation, S is followed by an external action:
U([S]) = U([\alpha_S, [S]])
* For partial computation:
U([S]) = \sum_T P(T) U([\alpha_T, [S.T])

* Ideal control algorithm:
+ Keep doing the available computation S with the highest V(S) until all are negative
+ Do action \alpha

* We definitely need approximation, because the utilities are not known in advance, so we have to guess!  In this model, one simplification (approximation) usually holds, which is time cost.

* Time and its cost, derivation as follows
+ Intrinsic utility: U([A, [S]]) = U_I(A) - C(A, S)
+ Ideally, C(A, S) = C(S), independent of A
+ Further, assume only the duration of S matters, C(S) = TC(|S|)
+ Benefit of computation: \Delta(S) = U(\alpha_S) - U(\alpha)
+ The value of computation S then is: U(S) = \Delta(S) - TC(|S|)

#### Estimated utility model:

\alpha maximize utility; computation is to refine utility estimation; posterior estimation of utility based on computation results (or evidences)

* Estimates and partial information
+ \hat{U}^S([A]) = E[U([A]) | e] for sequence S (up to now)
+ \hat{U}^{S.S_j}([A]) = E[U(A) | e and e_j] for further computation
+ Discussion: for non-axiometic probabilities, [more like possibilities]

* Analysis for complete computations
+ From the previous model:
\hat{V}^S(S_j) = E[(U(\alpha_{S_j}) - U(\alpha)) | e and e_j] - TC(|S_j|)
+ E[\hat{V}(S_j)] =
\int_{u} max(u) p_j(u) d u -  \int_{-\inf}^{\inf} u p_{\alpha j}(u) d u
[This we speculate is the \Delta part elaborated]
+ p are to be collected either by statistics, or by using current distribution if calculation is exact; require further simplification assumptions

* Simplifying assumptions
* Meta-greedy algorithms: expand one step, then estimate the ultimate effects; no commitment to which external action to take
* Single-step assumption: any computation is complete, assume commit to action after one step of computation

* Partial computations
* Problem without partial computation formulation: credit assignment, seperability of benefit \Delta

* Qualitative behavior: Only compute when it helps

#### Concrete model:

use object-level features, as what computation changes what value

* Principles:
+ A computation only affects a certain components of the internal structure (seperability of j)
+ Changes to these components affect the agent's choice of external action in known ways (possibility for f)

* The formulas are confusing, seems the utility of external action is independent of what these actions are.  It is used in chapter 4 and 5 as the starting point.

### Summary author's notes:

• none

Back to the Cognitive Science Summaries homepage
Cognitive Science Summaries Webmaster:
JimDavies (jim@jimdavies.org)