This work was presented as a poster at the 1999 Cognitive Science conference.

An evaluation of SIRRINE2
as a cognitive architecture
based on a model of human arithmetic

By Jim R. Davies
Winter 1999

The web page for this research can be found at:
http://www.jimdavies.org/research/despina/

Part 1: Introduction

SIRRINE2 is an Artificial Intelligence (AI) architecture written by J. William Murdock. Its original purpose was to serve as a tool for describing, executing, and manipulating intelligent software agents. It is based on a "Task, Method, and Knowledge" (TMK) structure, which is cognitively inspired. This paper investigates whether SIRRINE2, or some later version of it, could be proposed as a cognitive architecture. To give a framework for discussion, I have created a simple cognitive model of arithmetic (called Despina) in SIRRINE2.

Part 2: SIRRINE2

SIRRINE2 evolved from the reasoning shell of AUTOGNOSTIC (Stroulia 1994; Stroulia & Goel 1994). It has a semi-formal TMK language that gives an agent explicit knowledge about what it can do and how. Tasks specify what kind of information is taken as input and produced as output, methods describe the control of subtasks, and knowledge is the data that gets manipulated.

It was originally made to model high-level reflective thinking where the smallest operations take a few seconds.

Part 3: Cognitive architectures

A cognitive architecture is a high-level modeling language that makes claims about basic cognitive processes. Models written in an architecture are built using the primitive elements of cognition allowed by the architecture. For example, in ACT-R, a cognitive architecture from John Anderson's lab (1998), there are two kinds of memory: Procedural (consisting of productions) and Declarative (consisting of chunks of information).

To evaluate the quality of a model, one must compare the results of the model to human subject data. Cognitive psychology focuses on performance (often accuracy) and reaction time, so it is important that a cognitive architecture is able to make predictions of these sorts.

Part 4: Arithmetic

The small part of arithmetic under study is addition. Several models have been proposed for what people do when adding. In the MIN model (Groen & Parkman 1972), people count up from the larger number. The reaction-time (RT) curve can be fit to the increase of the minimum number in the problem (Ashcraft & Stazyk 1981). The SUM-squared model (Ashcraft & Battaglia 1978), in which RT grows with the square of the sum, was shown to fit better than MIN, because an exponentially increasing RT is difficult to reconcile with a constant-increment model. In adults, most facts are simply retrieved, which suggested the fast-access model (Groen & Parkman 1972): the facts that are not retrieved (about 5%) must be counted out at a rate of about 400 ms per increment. The exponential increase in RT captured by SUM-squared is problematic for this model as well.
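The contrast between the two models can be sketched in a few lines. The intercept and slope values below are hypothetical placeholders for illustration, not fitted parameters from any of the cited studies:

```python
def rt_min_model(a, b, intercept=850.0, ms_per_increment=400.0):
    """MIN model: count up from the larger addend, one increment per
    unit of the smaller addend, so RT is linear in min(a, b)."""
    return intercept + ms_per_increment * min(a, b)

def rt_sum_squared_model(a, b, intercept=850.0, slope=8.0):
    """SUM-squared model: RT grows with the square of the sum, rising
    faster for large problems than any constant-increment account."""
    return intercept + slope * (a + b) ** 2
```

For the problem 7 + 5, the MIN model charges five 400 ms increments (`rt_min_model(7, 5)` gives 2850.0 ms under these placeholder values), while the SUM-squared curve accelerates with problem size rather than growing linearly.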

A four-stage retrieval/decision model (Ashcraft & Battaglia 1978) has also been suggested. Facts are functionally represented as a table, and RT is proportional to the distance traveled during the table search. The table is "stretched" for larger sums (a post-hoc assumption) to account for the exponential RT increase. Next a decision is made (in a verification task, comparing the result found in the table to the stimulus). In this model the decision takes constant time for true answers, but for false answers it takes time proportional to the difference between the correct and presented sums. This accounts for the split effect: people can verify that a fact is false faster the further off the presented answer is. In fact, large splits are rejected faster than true facts are confirmed in some cases. That is, 234 + 321 = 4 is identified as false faster than 234 + 321 = 555 is identified as true.
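The decision stage described above can be sketched as a simple timing function. All parameter values here are assumptions chosen only to reproduce the qualitative pattern (constant time for true answers, faster rejection for larger splits, with very large splits beating true answers):

```python
def decision_time(correct_sum, stimulus, true_time=300.0,
                  base_false_time=700.0, ms_per_split=50.0, floor=150.0):
    """Decision time (ms) in a verification task, per the split effect."""
    split = abs(correct_sum - stimulus)
    if split == 0:
        return true_time  # true answer: constant decision time
    # false answers: rejection speeds up as the split grows, down to a floor
    return max(floor, base_false_time - ms_per_split * split)
```

With these placeholder values, 234 + 321 = 555 (true) takes 300 ms to confirm, while 234 + 321 = 4 (split of 551) bottoms out at the 150 ms floor and is rejected faster than the true fact is confirmed.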

In answer-production tasks, the decision stage does not take place. The revised fast-access model accounts for the previous results by making the following modification: the probability of retrieval failure is a function of the size of the minimum number. With this change the model can account for the exponential increase in RT.

Part 5: Despina

Despina has two methods for answering math questions: retrieval and counting. If retrieval is impossible because the fact in question is not in memory, the system uses the counting method. Despina has a large memory of all addition and subtraction facts involving the numbers one through ten whose result is positive. The following is an example of Despina's math facts:
(defconcept-instance math-fact_7+5=12-concept-instance
    :domain-concept math-fact-domain-concept
    :symbol fact_7+5=12-value)
(setf fact_7+5=12-value '(seven plus five twelve))
The lower number is always second, and the reverse is not represented (5 + 7 = 12 is not in the memory, only 7 + 5 = 12). If asked what the sum of 7 and 5 is, Despina checks its fact memory. A table indexed by the numbers and the operator points to the fact if it is in memory. If the fact is there, it is retrieved and output. If not, the system counts up from the larger number using the following count-up-method.
(defmmethod count-up-method
  :transitions
  (deftransition
;    normally you would find the min here and make
;    that the start sum. For now we will assume that
;    the minimum number is given second. So set-sum
;    sets the sum to the first argument.
    (:initial :start
	      :subtask set-sum-task
	      :succeed s1
	      :fail )
;    how many so far starts at zero.
    (:initial s1
	      :subtask set-how-many-so-far-task
	      :succeed s2
	      :fail )
;    set-count-up-to sets count-up-to to the second arg
    (:initial s2
	      :subtask set-count-up-to-task
	      :succeed s3
	      :fail )
;    when they are equal, the sum is correct 
    (:initial s3
	      :subtask confirm-how-many-so-far-equals-count-up-to-task
	      :succeed :succeed
	      :fail s4)
;    if the sum is not correct, then increment one to the sum. 
    (:initial s4
	      :subtask increment-sum-task
	      :succeed s5
	      :fail :fail)
;    Also increment how-many-so-far.
;    Then check to see if the sum is correct again.
    (:initial s5
	      :subtask increment-how-many-so-far-task
	      :succeed s3
	      :fail :fail)))
It starts with the first (larger) number as the sum, then keeps incrementing until the number of increments equals the smaller number being added. When these numbers are equal, the method succeeds.
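As an illustrative translation (not SIRRINE2 code), the transitions above amount to a simple loop. Here the larger addend is found explicitly rather than assumed to be given first, as the comments in the method note it normally would be:

```python
def count_up(a, b):
    """Plain-Python rendering of Despina's count-up-method."""
    sum_so_far = max(a, b)       # set-sum-task: start from the larger addend
    how_many_so_far = 0          # set-how-many-so-far-task
    count_up_to = min(a, b)      # set-count-up-to-task: increments needed
    # confirm-how-many-so-far-equals-count-up-to-task: loop until equal
    while how_many_so_far != count_up_to:
        sum_so_far += 1          # increment-sum-task
        how_many_so_far += 1     # increment-how-many-so-far-task
    return sum_so_far
```

So `count_up(7, 5)` starts at 7 and performs five increment cycles, succeeding with 12.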

Part 6: Discussion

Despina can attempt to retrieve an answer to a math question and return the answer if it is found. If that method fails, it can try its second strategy, counting up. Could Despina be used to model experimental data?

Modeling Accuracy

In arithmetic experiments, accuracy and reaction time are measured. Right now Despina never makes any mistakes, so it is unable to replicate the mistakes humans make. It either retrieves the answer from memory or it counts up to get it, but in either case the correct answer is guaranteed. Lebiere's ACT-R model (1998) makes mistakes by retrieving the wrong chunk at some point. This happens because of partial matching and differences in activation. If 3 + 1 is retrieved much more often than 3 + 2, then when asked what 3 + 2 is, the 3 + 1 fact may be retrieved, because the correct fact is insufficiently active and the incorrect fact matches the goal partially. The other way the model could make a mistake is that it could retrieve the wrong next number at some point while counting up.

In the current version of SIRRINE2 there is no way to model this. In Despina the table used for finding the appropriate chunk in memory is deterministic, in that the same question will always result in the same fact being retrieved, if any fact is retrieved at all. There is a similar deterministic lookup table for incrementing numbers.

SIRRINE2 could be modified to allow this kind of error. The most straightforward and uncontroversial way to do this would be to introduce an activation level for concept-instances or values, and to allow spreading activation to occur. In the underlying architecture there could be a retrieval threshold specifying how activated a concept-instance must be. This might prevent the correct fact from being retrieved at times. Doing this would affect which strategy gets used, and might produce trials where no answer is given, but it would never result in a wrong answer.

To get a wrong answer a question must have the potential to retrieve different concept-instances at different times. ACT-R's method of partial matching seems to be a straightforward solution to the problem. It seems that partial matching of some kind would be necessary, even if it were implicit in some kind of retrieval by activation level. For example, the question is activated in the task, and that spreads activation to facts, and the fact most activated gets retrieved. Even in this scenario a fact wouldn't become the most activated unless it shared features of the question. This retrieval method could allow errors in retrieval during counting as well.
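A minimal sketch of this activation-plus-partial-matching scheme follows. The fact memory, base activations, and parameter values are all invented for illustration; the point is only that a heavily practiced neighbor like 3 + 1 can outcompete the correct but weakly active 3 + 2:

```python
import random

FACTS = {(3, 1): 4, (3, 2): 5, (7, 5): 12}       # fact memory: addends -> sum
BASE = {(3, 1): 2.0, (3, 2): 0.4, (7, 5): 1.0}   # higher = more often practiced

def retrieve(a, b, threshold=0.5, noise=0.5, match_weight=1.0):
    """Retrieve the most active fact; may fail, or return a wrong answer."""
    best, best_act = None, float("-inf")
    for fact, base in BASE.items():
        # partial matching: one unit of spread per addend shared with the probe
        overlap = len({a, b} & set(fact))
        act = base + match_weight * overlap + random.gauss(0, noise)
        if act > best_act:
            best, best_act = fact, act
    if best_act < threshold:
        return None          # retrieval failure: fall back to counting
    return FACTS[best]       # possibly an error, e.g. 4 retrieved for 3 + 2
```

With the noise turned off, the probe 3 + 2 retrieves 4 rather than 5: the 3 + 1 fact shares the addend 3 and its high base activation dominates, reproducing the error pattern described above.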

Another way to get around the problem might be to have the table refer to a list of things to return in order of priority. In this way retrieval attempts would resemble the :by slot in a task. For example, upon seeing the problem 1 + 2, the table would try to return 1 + 2 = 3 first, and failing that return 1 + 3 = 4. This would be consistent with the way that a task has an unchangeable order in which it tries different strategies. On the downside, it is unclear what the rationale would be for which concept-instances go in the list and in what order.
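This priority-list alternative can be sketched deterministically. The candidate ordering below is invented purely for illustration:

```python
# lookup table: problem -> candidate facts, tried in fixed priority order,
# analogous to the fixed strategy ordering in a task's :by slot
CANDIDATES = {
    (1, 2): ["1 + 2 = 3", "1 + 3 = 4"],
}

def retrieve_by_priority(a, b, available):
    """Return the first candidate fact that is currently retrievable."""
    for fact in CANDIDATES.get((a, b), []):
        if fact in available:
            return fact
    return None  # no candidate retrievable: fall back to counting
```

Unlike the activation scheme, this always produces the same answer for the same state of memory, which preserves determinism but, as noted, leaves the ordering of candidates unmotivated.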

Modeling Reaction Time

SIRRINE2 makes no explicit reaction-time predictions. Different strategies take different numbers of steps, though, and the number of steps could be claimed to be roughly proportional to the amount of time taken. For example, one could say that the entire cycle required to count up by one takes somewhere between 20 and 400 ms. Since people often choose the biggest number and count up, such an interpretation would yield a model that fits the data fairly well (recall that the RT curve can be modeled reasonably well by the minimum of the two numbers).
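Reading RT off Despina's step counts might look like this; the per-step durations are assumptions within the range just mentioned, not claims of SIRRINE2 itself:

```python
def despina_rt(a, b, fact_in_memory, retrieval_time=600.0, cycle_time=400.0):
    """Hypothetical RT: charge a fixed cost per Despina step."""
    if fact_in_memory:
        return retrieval_time  # one retrieval step: flat RT for known facts
    # retrieval fails, then one count-up cycle per unit of the smaller addend
    return retrieval_time + cycle_time * min(a, b)
```

This yields a flat RT for retrieved facts and a MIN-shaped linear increase for counted ones, which is the pattern the fast-access model describes.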

However it also seems to be the case that simple retrieval takes a variable amount of time as well (Lebiere 1998). That is, you can retrieve facts 1 + 1 = 2 and 5 + 2 = 7, but the latter will take longer than the former. ACT-R solves this problem by making the retrieval time for a chunk dependent on the activation level.

Conclusion

SIRRINE2 was designed to model high-level thinking on the order of seconds or longer. Arithmetic was chosen because it is a fairly simple phenomenon with interesting properties (retrieval, multiple strategies, etc.) and because it had already been modeled in ACT-R, which allows comparison. Arithmetic may not be a fair test because SIRRINE2 was not designed to model cognition at this level; we are asking SIRRINE2 to do something it was not meant to do. Soar and ACT-R focus on theoretical claims about mental events that take a second or less (Newell 1990, pp. 80-81), leaving higher-level mental events for the modeler to program. SIRRINE2, on the other hand, can be seen as an architecture approaching cognition from the other direction. That is, SIRRINE2 makes claims about high-level cognition, leaving the low-level details ambiguous. Unlike other architectures, though, SIRRINE2 does not allow the modeler to specify the details of the lower-level cognition. The quality of SIRRINE2 as a cognitive architecture should be determined through an examination of the kinds of models it was meant to run.

Modeling arithmetic shows the limits of SIRRINE2. But if we intend for it to be a cognitive architecture, then tasks of this sort will need to be dealt with sooner or later. Perhaps if high- and low-level cognitive tasks are considered from the start, the architecture will have a better chance of avoiding early assumptions that prove unworkable when the range of phenomena to be modeled expands. In other words, we want to avoid making theoretical claims now that will prevent SIRRINE2 from modeling low-level tasks later.

This article shows that for SIRRINE2 to successfully model arithmetic (and likely other tasks as well), it requires the following: 1) some way to make retrieval errors, and 2) theoretical claims about the time it takes to do things. When these changes are made, SIRRINE2 will be a better cognitive architecture, capable of modeling accuracy and reaction time, two of the most common measures in cognitive psychology.

Part 7: References

Anderson, J. R., & Lebiere, C. (1998). The Atomic Components of Thought. Mahwah, NJ: Lawrence Erlbaum.

Ashcraft, M. H., & Battaglia, J. (1978). Cognitive arithmetic: Evidence for retrieval and decision processes in mental addition. Journal of Experimental Psychology: Human Learning and Memory, 4, 527-538.

Ashcraft, M. H., & Stazyk, E. H. (1981). Mental addition: A test of three verification models. Memory & Cognition, 9, 185-196.

Groen, G. J., & Parkman, J. M. (1972). A chronometric analysis of simple addition. Psychological Review, 79, 329-343.

Lebiere, C. (1998). The dynamics of cognition: An ACT-R model of cognitive arithmetic. Ph.D. Dissertation. CMU Computer Science Dept Technical Report CMU-CS-98-186. Pittsburgh, PA.

Newell, A. (1990) Unified Theories of Cognition. Harvard University Press. Cambridge, Massachusetts.

Stroulia, E. (1994). Failure-Driven Learning as a Model-Based Self-Redesign. Ph.D. dissertation, Georgia Institute of Technology, College of Computing.

Stroulia, E. & Goel, A. K. (1994). Reflective self-adaptive problem solvers. In G. S. Luc Steels & W. V. de Velde (Eds.), Proceedings of the 1994 European Conference on Knowledge Acquisition: A Future for Knowledge Acquisition. Germany: Springer-Verlag.


JimDavies ( jim@jimdavies.org )
Last modified: Mon Apr 24 14:22:14 EDT 2000