
Thomaz, A. L., & Breazeal, C. (2008). Teachable robots: Understanding human teaching behavior to build more effective robot learners. Artificial Intelligence, 172(6-7), 716-737.

@article{thomaz2008,
  author  = {Thomaz, Andrea L. and Breazeal, Cynthia},
  title   = {Teachable robots: Understanding human teaching behavior to build more effective robot learners},
  journal = {Artificial Intelligence},
  year    = {2008},
  volume  = {172},
  number  = {6-7},
  pages   = {716--737}
}

Author of the summary: Leah MacQuarrie, 2012, leah.macquarrie@gmail.com

Cite this paper for:

Robots have difficulty learning in real-time environments that are partially observable, dynamic, and continuous (Mataric, 1997; Thrun, 2002; Thrun & Mitchell, 1993). However, interacting with a human user makes these tasks even more challenging. [716]

This paper seeks to demonstrate the importance of Situated Learning in human-computer interaction: teacher input and learner output interact, and both parties adjust their behavior to become more efficient. [719]

This paper discusses five experiments, each building on the previous one, taking steps to make human-computer interaction more efficient so that robots can be designed to be easily taught by non-expert users. [716]

Experiment 1: Users interacted with a game in which a robot named Sophie bakes a cake. Sophie can perform several actions and interact with several objects. Users cannot tell Sophie what to do, but they can give her positive rewards for good actions, and negative rewards for bad actions. The game is complete when Sophie has learned to bake a cake by herself.
Results: 17 of 18 participants succeeded in getting Sophie to bake a cake on her own. Beyond this, three main findings emerged about users' teaching behavior. [722]
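The reward-only teaching setup in Experiment 1 can be sketched as a simple interactive learner that adjusts action values from human feedback. This is an illustrative sketch only; the class and action names are assumptions, not the authors' implementation.

```python
import random

class RewardOnlyLearner:
    """Toy agent that learns action values solely from human reward signals."""

    def __init__(self, actions, lr=0.3):
        self.q = {a: 0.0 for a in actions}  # value estimate per action
        self.lr = lr                        # how strongly feedback shifts values

    def choose(self):
        # Pick the currently best-valued action, breaking ties randomly.
        best = max(self.q.values())
        return random.choice([a for a, v in self.q.items() if v == best])

    def human_feedback(self, action, reward):
        # Move the action's value toward the human's reward signal.
        self.q[action] += self.lr * (reward - self.q[action])

# Hypothetical usage with made-up cake-baking actions:
learner = RewardOnlyLearner(["pick-up-flour", "turn-on-oven", "pour-batter"])
a = learner.choose()
learner.human_feedback(a, -1.0)  # the user flags the chosen action as bad
```

With only this channel, the user can shape behavior but cannot direct attention, which motivates the guidance channel added in Experiment 2.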

Experiment 2: The game was modified so that there were two conditions: a "no guidance" condition, in which users could only give feedback, and a "guidance" condition, in which users could give both anticipatory guidance and feedback.
Results: Participants in the guidance condition taught Sophie much more efficiently. The number of Sophie's actions necessary to learn the task decreased by 39%. This condition helped the robot spend less time in unhelpful states, and allowed the user to see progress being made more quickly. [728]
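One way anticipatory guidance can make learning more efficient is by restricting exploration to actions involving the object the teacher indicates. The function below is a hedged sketch of that idea under assumed names; it is not the paper's actual mechanism in detail.

```python
import random

def select_action(actions, guided_object=None):
    """Pick an action, preferring ones that involve the teacher's guided object."""
    if guided_object is not None:
        candidates = [a for a in actions if guided_object in a]
        if candidates:                 # follow guidance when it applies
            return random.choice(candidates)
    return random.choice(actions)      # otherwise explore freely

acts = ["pick-up bowl", "pick-up spoon", "use oven"]
select_action(acts, guided_object="spoon")  # → "pick-up spoon"
```

Narrowing the candidate set this way is what lets the agent avoid spending time in unhelpful states.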

Experiment 3: The game was modified to include two conditions: a "guidance" condition (the same as in Experiment 2) and a "gaze-guidance" condition, in which the robot paused longer, gazing at an object before performing an action on it. Longer time spent gazing at an object indicated a higher level of uncertainty in the robot. [729]
Results: The gaze condition allowed the robot to better communicate her internal state to the user. Thus, the participants in the gaze-guidance condition gave less guidance to Sophie when she was more certain of what to do, and they gave her more guidance when she was uncertain. [731]
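The gaze-guidance idea can be sketched as gaze time scaling with the agent's uncertainty, here measured by how close the top two action values are. The linear scaling and parameter names are assumptions for illustration, not the paper's formula.

```python
def gaze_duration(action_values, base=0.5, max_extra=2.0):
    """Return a gaze time (seconds) that grows as the best actions get closer in value."""
    vals = sorted(action_values, reverse=True)
    if len(vals) < 2:
        return base                     # nothing to be uncertain between
    margin = vals[0] - vals[1]          # small margin -> high uncertainty
    uncertainty = 1.0 / (1.0 + margin)  # maps margin 0 -> 1.0, large margin -> near 0
    return base + max_extra * uncertainty

gaze_duration([0.9, 0.1])  # confident: shorter gaze
gaze_duration([0.5, 0.5])  # uncertain: longer gaze
```

A longer pause gives the teacher a window to intervene with guidance exactly when the robot needs it, which matches the result that users gave more guidance during uncertainty.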

Experiment 4: The game was modified so that participants could administer a reward directly on the robot, as opposed to on the general environment.
Results: This evened out the amount of positive and negative rewards administered by the user. [732]

Experiment 5: The game was modified so that the robot could undo an undesirable action.
Results: This helped participants feel like their negative rewards were not being ignored, and led to the robot learning faster. [733]
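Experiment 5's undo mechanism can be sketched as keeping a history of states so that a negative human reward reverts the last action. The class and method names are hypothetical; the paper's implementation details differ.

```python
class UndoableWorld:
    """Toy world state that can roll back the most recent action on negative feedback."""

    def __init__(self, state):
        self.state = state
        self.history = []

    def apply(self, action):
        self.history.append(self.state)      # remember the state before acting
        self.state = self.state + [action]   # record the action's effect

    def on_feedback(self, reward):
        if reward < 0 and self.history:
            self.state = self.history.pop()  # undo the undesirable action

world = UndoableWorld([])
world.apply("add-eggs")
world.on_feedback(-1.0)  # negative reward reverts "add-eggs"
```

Making the negative reward visibly change the world is what gives users the sense that their feedback is not being ignored.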

Enabling non-expert users to interact with and teach robots is an important next step in Machine Learning research. [735]

Summary author's notes:

Back to the Cognitive Science Summaries homepage
Cognitive Science Summaries Webmaster:
JimDavies (jim@jimdavies.org)