Lebiere, C. (1998). The dynamics of cognition: An ACT-R model of
cognitive arithmetic. Ph.D. Dissertation. CMU Computer Science Dept
Technical Report CMU-CS-98-186. Pittsburgh, PA.
(The actual paper can be found at
. It will be summarized in a published paper:
Lebiere, C. (1999). Cognitive dynamics: arithmetic as case study. To
appear in the special English-language cognitive modeling issue of
Kognitionswissenschaft [Cognitive Science].)
Author of the summary: Jim Davies, 1999, email@example.com
(Author's note: This summary focuses on the chunks and productions and
glosses over much of the subsymbolic detail.)
Cite this paper for:
Since we didn't evolve to do arithmetic, it must be learned with general
learning principles. We have no special purpose functions for arithmetic.
Generally, when getting the answer to an arithmetic fact, people either
retrieve it from memory or compute it. People take longer and make more
mistakes on problems with large numbers. (p3) This makes sense with
computation, but it happens with retrieval too. Why? It might be because
smaller problems (problems with smaller numbers) are seen more often and
thus retrieved more easily. With multiplication, as numbers get larger
they are seen less frequently in the addition subtasks you have to do. (p4)
ACT-R has chunks of declarative memory and productions that correspond to
procedural memory. Chunks can be activated, and a chunk's activation
reflects the log posterior odds that the chunk is relevant in the current
context. Activation is the chunk's base-level activation plus a sum over
all context elements of each element's attentional weight (activation
source level) times the strength of association between that context
element and the chunk.
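The activation sum described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout, and all numbers are hypothetical, and the equations in the dissertation should be treated as authoritative.

```python
def activation(base_level, source_weights, association_strengths):
    """Activation = base level + sum over context elements of
    (attentional weight) * (strength of association to the chunk)."""
    return base_level + sum(
        weight * association_strengths.get(element, 0.0)
        for element, weight in source_weights.items()
    )

# Hypothetical values: the context holds the operands 3 and 4, each with
# attentional weight 0.5, and we score one addition fact against them.
weights = {"3": 0.5, "4": 0.5}
strengths_to_fact = {"3": 1.2, "4": 0.8}
fact_activation = activation(0.3, weights, strengths_to_fact)
```

A context element with no association to the chunk simply contributes nothing to the sum.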
- Chunks enter memory through environmental encoding or popped goals.
- An answer must be computed enough times for the chunk in
question to be sufficiently reinforced before it can be retrieved.
- Problems with an argument of zero show no problem-size effect. (p19)
- Smaller problems are easier to retrieve because they are better
practiced.
- Tie problems are naturally faster in ACT-R because the same number
appearing twice makes the activation strong. (p21)
- Errors can be made by retrieving an incorrect fact, or retrieving the
wrong fact due to partial matching.
- The model predicts near-tie and corner effects that do not appear in
the data. (p64)
- ACT-R is criticized for having too many parameters to tweak. (p74)
- Ironically, connectionist models are nothing but parameter tweaking. (p74)
- There has been a concerted effort to constrain the values of global
parameters in ACT-R.
- Noise level has become increasingly constrained.
- Base level decay has essentially become fixed.
(Summary writer's note: There are lots of equations in this dissertation;
html is bad at them so I won't try to reproduce them. If you are interested
see the actual paper. The link to it is at the top.)
The base level activation of a chunk can be learned to reflect the
past history of use of that chunk. This involves time since the chunk
was used, lifetime of the chunk, number of references to the chunk and
the decay rate. The strengths of associations are learned as well.
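As a rough illustration of how those quantities could combine, here is a sketch of the base-level learning equation as it is commonly stated for ACT-R, together with the usual approximation in terms of number of references and chunk lifetime. The function names and the decay value of 0.5 are assumptions for the example; the dissertation's own equations take precedence.

```python
import math

def base_level(times_since_use, decay=0.5):
    """B = ln(sum_j t_j ** -d), where t_j is the time since the j-th use.
    The decay value 0.5 is an assumed default, not taken from the summary."""
    return math.log(sum(t ** -decay for t in times_since_use))

def base_level_approx(n_references, lifetime, decay=0.5):
    """Approximation that assumes references are spread evenly over the
    chunk's lifetime: B ~ ln(n / (1 - d)) - d * ln(L)."""
    return math.log(n_references / (1.0 - decay)) - decay * math.log(lifetime)

# Hypothetical histories: a chunk used 10 and 100 time units ago
# versus a chunk used only once, 100 time units ago.
practiced = base_level([10.0, 100.0])
neglected = base_level([100.0])
```

More references and more recent references both raise the base level, which is the sense in which practice makes a fact easier to retrieve.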
Chunks are retrieved by productions. ACT-R has two modes for
this: exact matching mode (where the chunk must match the
production's requirements exactly) and partial matching mode (where
any chunk of the correct type can be retrieved, but each difference
from the requirements incurs a penalty on the chunk's activation).
(p8) When more than one chunk matches, the one that gets used is
determined by the Boltzmann equation. (p9) More info on ACT-R is
Facts are represented like this:
Each value (2, +, 5, etc.) is a chunk in itself. The simplest
production is retrieval, which just retrieves the appropriate fact
(p11). Where do these facts come from? They can be seen (in a book, on
a calculator, etc.) or they can come from an earlier calculation: in ACT-R
popped goals become new chunks. So when you have to count to figure
out what 2+2 is, you thereafter have a chunk in memory corresponding
to the fact. You may not be able to retrieve it right away, but it is
There is an increment production too. After the answer is found, an
answer production outputs the answer and pops the goal. (p12) Popped
goals either become new chunks or reinforce old chunks (if the chunk
is already there.) After this happens enough times, the chunk will be
retrievable. This implements Logan's (1988) transition from
algorithmic solution to direct retrieval.
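The shift from counting to retrieval can be illustrated with a deliberately simplified sketch. It replaces ACT-R's activation-based retrieval with a plain reference count and a made-up threshold of 3, so the names `solve_addition` and `threshold` and the memory layout are assumptions, not the model's actual productions.

```python
def solve_addition(a, b, memory, threshold=3):
    """Retrieve a + b if the fact has been reinforced often enough,
    otherwise count up; either way the popped goal reinforces the fact."""
    key = (a, b)
    fact = memory.get(key)
    if fact is not None and fact["references"] >= threshold:
        fact["references"] += 1          # popped goal reinforces the old chunk
        return fact["sum"], "retrieval"
    result = a
    for _ in range(b):                   # stand-in for the counting productions
        result += 1
    if fact is None:
        memory[key] = {"sum": result, "references": 1}  # popped goal becomes a new chunk
    else:
        fact["references"] += 1
    return result, "counting"

memory = {}
strategies = [solve_addition(2, 2, memory)[1] for _ in range(5)]
```

In this toy version the first three attempts at 2+2 are solved by counting and the later ones by retrieval.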
Because of partial matching, incorrect chunks can be retrieved. This
allows a modeling of the error rate. (p17) Errors can also happen on
misretrieved chunks during iteration.
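A small sketch may help show how partial matching produces errors. It assumes a mismatch penalty proportional to the numeric distance between operands and a Boltzmann (softmax) choice among candidate facts; the penalty size, the temperature, and the function names are hypothetical choices for illustration, not values from the dissertation.

```python
import math

def match_scores(activations, requested, mismatch_penalty=1.0):
    """Subtract a penalty from each fact's activation for every unit of
    distance between its operands and the requested operands (assumed metric)."""
    scores = {}
    for (a, b), act in activations.items():
        mismatch = abs(a - requested[0]) + abs(b - requested[1])
        scores[(a, b)] = act - mismatch_penalty * mismatch
    return scores

def boltzmann_probabilities(scores, temperature=0.5):
    """Probability of retrieving each fact: exp(score / t), normalized."""
    top = max(scores.values())           # subtract the max for numerical stability
    exps = {k: math.exp((s - top) / temperature) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical memory with three equally active facts; the goal asks for 1+8.
activations = {(1, 7): 1.0, (1, 8): 1.0, (2, 9): 1.0}
probs = boltzmann_probabilities(match_scores(activations, requested=(1, 8)))
```

The exact match is the most likely retrieval, but near neighbors keep a nonzero probability, which is what lets the model produce an error rate at all.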
Problems with an argument of zero show no problem-size effect. To
handle this there is a special zero production. Smaller problems are
easier to retrieve because they are better practiced. The ratio of
presentation frequencies between the smallest problem (0+0) and the
largest (9+9) is 4 to 1. That is, for every time you see 9+9, you see
0+0 four times. (p20) (Hamann and Ashcraft 1986). This simulation
assumes that five hundred thousand problems were presented according to
this frequency at about 100 problems per day.
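The presentation schedule can be checked with a short calculation. The summary gives only the 4-to-1 endpoint ratio, so the linear per-operand weighting below is an assumption made purely to get a concrete distribution; the totals (500,000 problems at 100 per day) are from the text.

```python
def operand_weight(n):
    """Assumed weighting: falls linearly from 2.0 for operand 0 to 1.0 for operand 9."""
    return 2.0 - n / 9.0

def problem_weight(a, b):
    return operand_weight(a) * operand_weight(b)

total_weight = sum(problem_weight(a, b) for a in range(10) for b in range(10))

def presentation_probability(a, b):
    """Chance that a randomly presented problem is a + b under the assumed weighting."""
    return problem_weight(a, b) / total_weight

ratio = presentation_probability(0, 0) / presentation_probability(9, 9)

TOTAL_PROBLEMS = 500_000
PROBLEMS_PER_DAY = 100
days = TOTAL_PROBLEMS / PROBLEMS_PER_DAY   # 5000 days
years = days / 365                         # about 13.7 years
```

Under this weighting 0+0 is presented four times as often as 9+9, and the whole schedule spans roughly 13.7 years of daily practice.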
Tie problems are naturally faster in ACT-R because the same number
appearing twice makes the activation strong. (p21) The model closely
fits the accuracy data for problems of size 0-5 in young
children using only retrieval. (p25) No statistics were done.
Multiplication is modeled too. Problems with 5 in them are easier because
the answer always ends in 0 or 5. One effect it doesn't correctly
model is the aid of multiplying even numbers. (p28) No statistics were
done.
Ashcraft (1987) found that as grade level increased, RT went down and
the problem size effect flattened. (p30) This is
parsimoniously described by an increase in base-level activation. In
modeling this, only retrieval was used. This is probably the reason the
model overestimates the improvement on large problems.
Another way an error can be made is by retrieving an incorrect fact.
Not the wrong fact, but an incorrect one (e.g. 2+1=4). This can
happen because chunks of popped goals are remembered, and sometimes
the goal pops with an error in it.
Other models of arithmetic:
- Ashcraft 1987: Increases the network strength values for
correct associations (percentage of correct retrievals) yearly.
It predicts an exponential decrease in error over time.
- Siegler & Shrager (1984): Uses a reinforcement rule that increments
associations for correct answers twice as much as wrong ones.
Probability is linear until ceiling is reached.
In contrast, the ACT-R model predicts a power law decrease in error.
What is wrong with the way these sims have been done?
- They require additional assumptions about the state of knowledge at
any given time.
- They allow for parameter tweaking for every model.
- They give an incomplete understanding of how the parts fit together.
In order to remedy the situation, a lifetime simulation was made with
one set of parameters and extensive learning. During the sim, the
sub-symbolic parameter learning is off.
Here are the productions used in retrieval:
- done-arithmetic (outputting answer)
- first-plus-zero (for x + 0)
- zero-plus-second (for 0 + x)
- double-recoding (for ties: if a tie then recode as x op double)
And for computing:
- iterate-count (sets subgoal to increment)
- count-up (does the actual counting)
Students are exposed to about 2000 addition and 2000 multiplication
problems per year. This means an average inter-problem delay of about 2 hours.
The sim runs through 20 years of training. (p49) It took ACT-R about an
hour and a half to run.
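The training schedule of about 2000 addition and 2000 multiplication problems per year over 20 years can be checked with a little arithmetic. The sketch assumes problems are spread evenly around the clock, which is the only reading under which the quoted delay of about 2 hours comes out; the variable names are illustrative.

```python
ADDITION_PER_YEAR = 2000
MULTIPLICATION_PER_YEAR = 2000
YEARS_OF_TRAINING = 20
HOURS_PER_YEAR = 365 * 24   # 8760, assuming problems are spread over all hours

problems_per_year = ADDITION_PER_YEAR + MULTIPLICATION_PER_YEAR
inter_problem_delay_hours = HOURS_PER_YEAR / problems_per_year   # about 2.19 hours
total_problems = problems_per_year * YEARS_OF_TRAINING           # 80,000 problems
```

So the lifetime simulation covers on the order of 80,000 problem presentations, with a little over two hours between problems on average.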
There were some problems modeling ties. To overcome this, ties were
encoded differently. There is some evidence that ties are more deeply
processed than other facts. (p62)
Problems like 6+7 are likely to get confused with 7+6. This results in
the right answer, a beneficial effect. But when the numbers are very
dissimilar, like 2+9, this effect goes away because 2 is so distant
from 9 that the mismatch is so large it can't be overcome.
The model is set up so that close numbers are more similar than
distant ones. One consequence
is that the model predicts that near-tie problems will be less error
prone. As you move away from the diagonal, the errors should
increase. This is not shown in the subject data.
Here's another model problem: 1+8=? retrieves
1+7=8 because of the common 8. This happens a lot in the corners
because of the high mismatch. This effect is not borne out in
the data. The solution is to change the reinforcement so that a
chunk in memory is only reinforced in terms of the slots that were filled
at the goal chunk's creation-- so 1+7=8 pops and only reinforces
1+7, but not 8, since when the goal was created the answer slot was blank.
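The restricted reinforcement rule can be sketched as follows. This is a toy bookkeeping version with invented names (`reinforce_on_pop`, the slot names, and the integer strength counts are all assumptions); it only shows which associations would be strengthened when a goal pops, not how ACT-R actually updates association strengths.

```python
def reinforce_on_pop(strengths, goal_at_creation, popped_goal):
    """Strengthen links from a value to the popped fact only for slots
    that were already filled when the goal was created."""
    fact = tuple(popped_goal[slot] for slot in ("arg1", "op", "arg2", "result"))
    for slot, value in goal_at_creation.items():
        if value is not None:            # slot was filled at creation time
            key = (value, fact)
            strengths[key] = strengths.get(key, 0) + 1
    return strengths

# The goal 1+7=? is created with an empty result slot and pops as 1+7=8.
created = {"arg1": 1, "op": "+", "arg2": 7, "result": None}
popped = dict(created, result=8)
strengths = reinforce_on_pop({}, created, popped)
fact = (1, "+", 7, 8)
```

Here 1, + and 7 are linked more strongly to the fact 1+7=8, while 8 is not, so a later goal containing 8 as an operand gets no extra pull toward that fact.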
Another problem (and solution): When counting, chunks get
reinforced that probably shouldn't. (p72) For example, using
counting to solve 4+3, you must activate (0 next is 1) and
(4 next is 5) to increment the count and the result respectively.
All of these get reinforced. To deal with the problem, the count
happens through subgoaling. This way, the reinforcement does not
happen with respect to the main goal, but with the subgoal, thus
avoiding unwanted reinforcement.
ACT-R is criticized for having too many parameters to tweak.
Ironically, connectionist models are nothing but parameter tweaking.
The optimal noise level (fastest retrieval with best accuracy) is .25,
which is kind of standard in ACT-R. (p79)
SUMMARY is UNFINISHED