
Lebiere, C. (1998). The dynamics of cognition: An ACT-R model of cognitive arithmetic. Ph.D. Dissertation. CMU Computer Science Dept Technical Report CMU-CS-98-186. Pittsburgh, PA.

(The actual paper can be found at
http://reports-archive.adm.cs.cmu.edu/anon/1998/abstracts/98-186.html . It will be summarized in a published paper:
Lebiere, C. (1999). Cognitive dynamics: arithmetic as case study. To appear in the special English-language cognitive modeling issue of Kognitionswissenschaft [Cognitive Science].)

Author of the summary: Jim Davies, 1999, jim@jimdavies.org

(Author's note: This summary focuses on the chunks and productions and glosses over much of the subsymbolic detail.)

Cite this paper for:

Chunks get into memory by environmental encoding or from popped goals.
An answer must be computed enough times for the corresponding chunk to be reinforced sufficiently to be retrieved.
Problems with an argument of zero show no problem-size effect. (p19)
Smaller problems are easier to retrieve because they are better practiced.
Tie problems are naturally faster in ACT-R because the same number appearing twice makes the activation strong. (p21)
Errors can be made by retrieving an incorrect fact, or by retrieving the wrong fact due to partial matching.
The model incorrectly predicts near-tie and corner effects. (p64)
ACT-R is criticized for having too many parameters to tweak. (p74)
Ironically, connectionist models are nothing but parameter tweaking. (p74)
There has been a concerted effort to constrain the values of global parameters in ACT-R.
Noise level has become increasingly constrained.
Base level decay has essentially become fixed.

Contents:

Chapter 1: Introduction

Since we didn't evolve to do arithmetic, it must be learned with general learning principles. We have no special purpose functions for arithmetic. Generally, when getting the answer to an arithmetic fact, people either retrieve it from memory or compute it. People take longer and make more mistakes on problems with large numbers. (p3) This makes sense with computation, but it happens with retrieval too. Why? It might be because smaller problems (problems with smaller numbers) are seen more often and thus retrieved more easily. With multiplication, as numbers get larger they are seen less frequently in the addition subtasks you have to do. (p4)

Chapter 2: Model and Data (p6)

ACT-R has chunks of declarative memory and productions that correspond to procedural memory. Each chunk has an activation, which reflects the log posterior odds that the chunk is relevant in the current context. A chunk's activation is its base-level activation plus, summed over all context elements, each element's attentional weight (source activation level) times the strength of association between that element and the chunk.
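
A minimal sketch of the activation sum described above, in Python. The function and argument names are illustrative rather than ACT-R identifiers, and the numbers in the usage line are made up.

```python
def activation(base_level, context):
    """Return base-level activation plus the weighted associative boost.

    context is a list of (attentional_weight, association_strength) pairs,
    one pair per element of the current context (a hypothetical layout).
    """
    spread = sum(weight * strength for weight, strength in context)
    return base_level + spread


# Made-up values: two context elements sharing attention equally.
example = activation(1.0, [(0.5, 2.0), (0.5, 1.0)])
```

With no context elements the result is just the base-level activation, which is why a well-practiced chunk can be retrieved even without supporting cues.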

(Summary writer's note: There are lots of equations in this dissertation; html is bad at them so I won't try to reproduce them. If you are interested see the actual paper. The link to it is at the top.)

The base-level activation of a chunk can be learned to reflect the past history of use of that chunk. This involves the time since the chunk was used, the lifetime of the chunk, the number of references to the chunk, and the decay rate. The strengths of association are learned as well. Chunks are retrieved by productions. ACT-R has two retrieval modes: exact matching (where the chunk must match the production's requirements exactly) and partial matching (where any chunk of the correct type can be retrieved, but each difference from the requirements incurs a penalty on the chunk's activation). (p8) When more than one chunk matches, the one that gets used is determined by the Boltzmann equation. (p9) More info on ACT-R is available at http://act.psy.cmu.edu/.
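
Since the equations are not reproduced here, the following Python sketch uses one commonly published form of ACT-R base-level learning (log of summed, power-decayed reference ages) and a Boltzmann (softmax) choice among competing chunks. The exact forms, the decay of 0.5, and the temperature of 1.0 are assumptions for illustration and should be checked against the dissertation.

```python
import math


def base_level(ages, decay=0.5):
    """Log of the summed decayed traces; ages are times since each use."""
    return math.log(sum(age ** -decay for age in ages))


def boltzmann(activations, temperature=1.0):
    """Probability of choosing each chunk, given its activation."""
    top = max(activations.values())  # subtract the max for numeric stability
    weights = {name: math.exp((a - top) / temperature)
               for name, a in activations.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# A chunk used more often, or more recently, has a higher base level.
practiced = base_level([1.0, 2.0, 4.0])
rusty = base_level([4.0])
choice = boltzmann({"2+3=5": practiced, "2+4=6": rusty})
```

Under these assumptions the better-practiced chunk is the more likely retrieval, but the weaker chunk still has a nonzero chance of winning, which is one route to errors.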

Model

Facts are represented like this:


Fact-2+3=5
 isa arithmetic
 first 2
 operator +
 second 3
 result 5

Each value (2, +, 5, etc.) is a chunk in itself. The simplest production is retrieval, which just retrieves the appropriate fact. (p11) Where do these facts come from? They can be seen (in a book, on a calculator, etc.) or they can have been calculated previously: in ACT-R, popped goals become new chunks. So when you have to count to figure out what 2+2 is, you thereafter have a chunk in memory corresponding to the fact. You may not be able to retrieve it right away, but it is there.
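
As a rough illustration, the fact chunk above can be written as a Python dictionary of slots, with exact-match retrieval as a search for a chunk whose slots agree with the request. The helper names `matches` and `retrieve` are hypothetical and this ignores activation entirely.

```python
fact_2_plus_3 = {
    "isa": "arithmetic",
    "first": 2,
    "operator": "+",
    "second": 3,
    "result": 5,
}


def matches(chunk, pattern):
    """True if every slot named in the pattern has the same value in the chunk."""
    return all(chunk.get(slot) == value for slot, value in pattern.items())


def retrieve(memory, pattern):
    """Return the first chunk matching the pattern exactly, or None."""
    for chunk in memory:
        if matches(chunk, pattern):
            return chunk
    return None


memory = [fact_2_plus_3]
request = {"isa": "arithmetic", "first": 2, "operator": "+", "second": 3}
answer = retrieve(memory, request)
```

The request leaves the result slot unspecified, so the retrieved chunk supplies the answer.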

There is an increment production too. After the answer is found, an answer production outputs the answer and pops the goal. (p12) Popped goals either become new chunks or reinforce old chunks (if the chunk is already there). After this happens enough times, the chunk will be retrievable. This implements Logan's (1988) transition from algorithmic solution to direct retrieval.
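
The transition from computing to retrieving can be sketched as a toy memory in Python. Each popped goal adds a chunk or reinforces an existing one, and a chunk becomes retrievable once it has been reinforced enough. The class name, the threshold of 1.0, and the use of the log of the reference count as a stand-in for activation are all simplifying assumptions, not the model's actual equations.

```python
import math


class DeclarativeMemory:
    """Toy store where each popped goal adds or reinforces a chunk."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold  # assumed retrieval threshold
        self.references = {}        # chunk key -> number of times popped

    @staticmethod
    def _key(goal):
        # Slot names are unique, so sorting the items never compares values.
        return tuple(sorted(goal.items()))

    def pop_goal(self, goal):
        """Create the chunk if it is new, otherwise reinforce it."""
        key = self._key(goal)
        self.references[key] = self.references.get(key, 0) + 1

    def retrievable(self, goal):
        """Stand-in for activation: log of the reference count."""
        count = self.references.get(self._key(goal), 0)
        return count > 0 and math.log(count) >= self.threshold


store = DeclarativeMemory()
fact = {"isa": "arithmetic", "first": 2, "operator": "+", "second": 2, "result": 4}
store.pop_goal(fact)
after_one = store.retrievable(fact)
store.pop_goal(fact)
store.pop_goal(fact)
after_three = store.retrievable(fact)
```

In this toy version the chunk exists after the first computation but only crosses the threshold after repeated reinforcement.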

Because of partial matching, incorrect chunks can be retrieved. This allows the error rate to be modeled. (p17) Errors can also arise from chunks misretrieved during iteration.

Data

Problems with an argument of zero show no problem-size effect. To handle this there is a special zero production. Smaller problems are easier to retrieve because they are better practiced. The ratio of frequencies from the smallest problem to the largest is 4 (0+0) to 1 (9+9). That is, for every time you see 9+9, you see 0+0 four times. (p20) (Hamann & Ashcraft, 1986) This simulation assumes that five hundred thousand problems were presented according to this frequency at about 100 problems per day.
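
A small Python sketch of what such a presentation schedule implies. Only the endpoints (0+0 seen four times as often as 9+9), the total of five hundred thousand problems, and the rate of 100 per day come from the text. The linear interpolation on the sum of the arguments is an assumption made here for illustration; the dissertation may interpolate differently.

```python
def relative_frequency(a, b, smallest_weight=4.0, largest_weight=1.0):
    """Weight of problem a+b, interpolated linearly on a+b (an assumption)."""
    size = (a + b) / 18.0  # 0.0 for 0+0, 1.0 for 9+9
    return smallest_weight + (largest_weight - smallest_weight) * size


TOTAL_PROBLEMS = 500_000
PROBLEMS_PER_DAY = 100

weights = {(a, b): relative_frequency(a, b)
           for a in range(10) for b in range(10)}
total_weight = sum(weights.values())

# Expected number of presentations of each problem over the whole run.
expected = {problem: TOTAL_PROBLEMS * w / total_weight
            for problem, w in weights.items()}

days_of_practice = TOTAL_PROBLEMS / PROBLEMS_PER_DAY
years_of_practice = days_of_practice / 365
```

At 100 problems per day the run covers 5000 days of practice, and under the assumed interpolation every problem falls somewhere between the 0+0 and 9+9 exposure counts.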

Tie problems are naturally faster in ACT-R because the same number appearing twice makes the activation strong. (p21) Using only retrieval, this model can very precisely predict young children's accuracy data for problems of size 0-5. (p25) No statistics were done.

Multiplication is modeled too. Problems with 5 in them are easier because the second digit of the answer is always 0 or 5. One effect the model does not capture correctly is the advantage for multiplying even numbers. (p28) No statistics were done.

Chapter 3: Learning Over Time (p29)

Ashcraft (1987) found that as grade level increased, RT went down and the problem-size effect flattened. (p30) This is parsimoniously described by an increase in base-level activation. Only retrieval was used in modeling this, which is probably why the model overestimates the improvement on large problems.

Another way an error can be made is by retrieving an incorrect fact: not the wrong fact for the problem, but a fact that is itself incorrect (e.g. 2+1=4). This can happen because chunks of popped goals are remembered, and sometimes a goal pops with an error in it.

Other models of arithmetic:

Ashcraft (1987): Increases the network strength values for correct associations (percentage of correct retrievals) yearly. It predicts an exponential decrease in error over time.
Siegler & Shrager (1984): Uses a reinforcement rule that increments associations for correct answers twice as much as for wrong ones. Probability is linear until the ceiling is reached.

In contrast, the ACT-R model predicts a power law decrease in error.

Chapter 4: The Lifetime Simulation (p44)

What is wrong with the way these sims have been done?

They require additional assumptions about the state of knowledge at any given time.
They allow for parameter tweaking for every model.
They give an incomplete understanding of how the parts fit together.

In order to remedy the situation, a lifetime simulation was made with one set of parameters, and extensive learning. During the sim, the sub-symbolic parameter learning is off.

Here are the productions used in retrieval:

arithmetic-retrieval
done-arithmetic (outputting answer)
first-plus-zero (for x + 0)
zero-plus-second (for 0 + x)
double-recoding (for ties: if a tie then recode as x op double)

And for computing:

addition-counting
done-count
iterate-count (sets subgoal to increment)
count-up (does the actual counting)
double-counting

Students are exposed to about 2000 addition and 2000 multiplication problems per year. This means an average inter-problem delay of about 2 hours. The sim runs through 20 years of training. (p49) It took ACT-R about an hour and a half to run.
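
The exposure figures can be checked with a little arithmetic, shown here in Python. The variable names are illustrative, and the delay is averaged over all hours of the year, which is an assumption about how the 2-hour figure was derived.

```python
HOURS_PER_YEAR = 365 * 24

additions_per_year = 2000
multiplications_per_year = 2000
years_of_training = 20

problems_per_year = additions_per_year + multiplications_per_year
inter_problem_delay_hours = HOURS_PER_YEAR / problems_per_year
total_problems = problems_per_year * years_of_training
```

This gives a delay slightly above two hours between problems and 80,000 problems over the whole lifetime run.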

There were some problems modeling ties. To overcome this, ties were encoded differently. There is some evidence that ties are more deeply processed than other facts. (p62)

Problems like 6+7 are likely to get confused with 7+6. This yields the right answer, so the effect is beneficial. But when the numbers are very dissimilar, as in 2+9, the effect goes away because 2 is so distant from 9 that the mismatch is too large to overcome. The model is set up so that close numbers are more similar than distant ones. The result is that the model predicts that near-tie problems will be less error prone. As you move away from the diagonal, the errors should increase. This is not shown in the subject data.
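
The near-tie prediction follows from how the mismatch penalty scales with numeric distance. A Python sketch, assuming a similarity that falls off linearly with the difference between numbers and a made-up penalty scale of 0.5 (neither value is taken from the dissertation):

```python
def mismatch(query, chunk):
    """Summed distance between the requested and stored arguments."""
    return abs(query[0] - chunk[0]) + abs(query[1] - chunk[1])


def match_score(activation, query, chunk, penalty=0.5):
    """Activation reduced by a penalty proportional to the mismatch."""
    return activation - penalty * mismatch(query, chunk)


# Asking for 6+7, the commuted fact 7+6 is only slightly penalized.
near_tie = match_score(1.0, (6, 7), (7, 6))

# Asking for 2+9, the commuted fact 9+2 is penalized heavily.
far_from_tie = match_score(1.0, (2, 9), (9, 2))
```

With equal starting activation, the commuted fact remains a strong candidate near the diagonal and is effectively ruled out far from it, which is why the model expects fewer errors for near-tie problems.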

Here's another problem with the model: 1+8=? retrieves 1+7=8 because of the common 8. This happens a lot in the corners because of the high mismatch. This effect is not borne out in the data. The solution is to change the reinforcement so that a chunk in memory is reinforced only in terms of the slots that were filled at the goal chunk's creation. So when 1+7=8 pops, it reinforces only 1+7, not 8, since the result slot was blank when the goal was created.
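
The revised reinforcement rule can be sketched in Python as follows. The association table, the function names, and the use of None for a blank slot are illustrative assumptions. The `isa` slot is left out of the goal for brevity.

```python
from collections import Counter


def sources_at_creation(goal_at_creation):
    """Values of the slots that were already filled when the goal was created."""
    return [value for value in goal_at_creation.values() if value is not None]


def reinforce(associations, goal_at_creation, chunk_name):
    """Strengthen links to the chunk only from the originally filled slots."""
    for source in sources_at_creation(goal_at_creation):
        associations[(source, chunk_name)] += 1


associations = Counter()

# The goal 1+7=? as it looked when created: the result slot was blank.
goal = {"first": 1, "operator": "+", "second": 7, "result": None}
reinforce(associations, goal, "Fact-1+7=8")
```

After the goal pops, 1, + and 7 are linked to the fact but 8 is not, so a later problem containing an 8 no longer gets a spurious boost toward 1+7=8.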

Another problem (and solution): When counting, chunks get reinforced that probably shouldn't be. (p72) For example, using counting to solve 4+3, you must activate (0 next is 1) and (4 next is 5) to increment the count and the result respectively. All of these get reinforced. To deal with the problem, the counting happens through subgoaling. This way, the reinforcement happens with respect to the subgoal rather than the main goal, thus avoiding unwanted reinforcement.
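
To make the 4+3 example concrete, here is a Python sketch of addition by counting that records every "next" fact it has to retrieve. It only illustrates which counting facts are touched; it does not model goals, subgoals, or reinforcement, and the function name is hypothetical.

```python
def add_by_counting(first, second):
    """Add by counting up, recording each (n, n + 1) fact that is retrieved."""
    count, result = 0, first
    retrieved = []
    while count < second:
        retrieved.append((count, count + 1))    # increment the count
        retrieved.append((result, result + 1))  # increment the result
        count += 1
        result += 1
    return result, retrieved


answer, facts_used = add_by_counting(4, 3)
```

Solving 4+3 this way starts with (0 next is 1) and (4 next is 5) and touches six counting facts in total, all of which would pick up reinforcement from the main goal unless the increments are handled in a subgoal.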

SUMMARY is UNFINISHED

