Motivation (under revision)

Foreword
Anticipation
The Need for Intelligent Reinforcement
Motivation Through Habituation
  Motor Commands as Goals
  Competition Among Goals
  Appetitive and Aversive Goals
  Goal Control
The Motivational System

Note: Due to recent developments, this page is now obsolete. Stay tuned for an update.

Foreword

In this section I use the words pain and pleasure solely for their convenience as naming labels. In no way do these terms imply that Animal or any other artificial intelligence consciously feels pain or pleasure. I use pleasure to refer to appetitive stimuli and pain to aversive stimuli. There is a monumental chasm between consciously feeling pain or pleasure and reacting appetitively or aversively to certain stimuli. Reactive behavior is not consciousness; otherwise one should be ready to ascribe consciousness to thermostats, and I am sure many do.

Many of the concepts used in this section (such as effectors, command neurons, etc.) are explained in the Motor Learning section. Note that I have not yet implemented the motivation mechanism in Animal. Stay tuned for an update.

Anticipation

Motivation is the tendency of an intelligent system to behave either appetitively or aversively to certain stimuli. This tendency is due to the logic of the reinforcement mechanism. Behavior reinforcement is always anticipatory in nature, even in situations where the behaving system appears to react to a stimulus. For example, we may spit out a bitter-tasting fruit, not because we are reacting to the unpleasant taste, but because we anticipate that the unpleasantness will cease soon afterwards. Likewise, we may go to the dentist in anticipation of our toothache diminishing after treatment or because we expect that doing so will prevent more toothaches in the future. These are all examples of aversive anticipatory behavior, but appetitive behavior, too, is anticipatory. For example, we decide to eat ice cream because we anticipate that the taste will be pleasant.

To anticipate means to participate ahead of time, i.e., to take advance action based on the likelihood of future events. In other words, to anticipate is to react to events before they actually happen. To adapt, an intelligent system must be able to anticipate the likelihood of future pain and pleasure stimuli. It can do so only through experience and learning. The motivational system itself neither learns nor anticipates anything. Learning to anticipate is the job of the memory subsystem. All inputs into the motivational system come from the memory layer, which is the output layer of the perceptual system.

The Animal network diagram is reproduced below for your convenience. Note: Due to recent changes and additions to the memory page, the network diagram does not yet incorporate motivation. Stay tuned for an update. 

The Need for Intelligent Reinforcement

The problem of reinforcement learning is knowing what to reinforce. Motivation cannot rely on a blind mechanism that strengthens or weakens connections based on their temporal proximity to pain or pleasure stimuli. While temporal difference reinforcement may work well enough in small systems, it becomes prohibitive in large systems. At any given moment, the motor area receives a huge number of signals from afferent pathways. Many signals repeat, and a blind mechanism is likely to associate the wrong motor commands with a given reinforcement stimulus. Also, there is a huge number of pain and pleasure sensors that must be associated with specific behaviors and sensory inputs. For example, if I hurt my toe, I will not waste time attending to my fingers; an earache is not the same as a stomachache; thirst is not the same as hunger. In sum, there is a need for an intelligent and associative response mechanism to pain and pleasure.
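To make the objection concrete, here is a minimal sketch of the kind of blind mechanism the paragraph argues against (it is not part of Animal). It is a simple fixed-window credit rule rather than temporal-difference learning proper: every signal active within a few steps before a pain or pleasure stimulus receives equal credit. The function name, window size and signal names are hypothetical.

```python
from collections import defaultdict

def blind_reinforce(history, reward_times, window=2, amount=1.0):
    """Credit every signal active within `window` steps before each reward.

    history[t] is the set of signal names active at step t; reward_times lists
    the steps at which a pain or pleasure stimulus arrived.
    """
    credit = defaultdict(float)
    for r in reward_times:
        # Look back over a fixed window: nothing here can tell which signal
        # actually caused the stimulus, so all of them are credited equally.
        for t in range(max(0, r - window), r):
            for signal in history[t]:
                credit[signal] += amount
    return dict(credit)

# Hypothetical trace: "reach" is the action that produced the reward, while
# "hum" is a background signal that repeats on every step.
history = [{"hum"}, {"hum", "reach"}, {"hum"}, {"hum"}, {"hum", "reach"}, {"hum"}]
rewards = [2, 5]
print(blind_reinforce(history, rewards))  # {'hum': 4.0, 'reach': 2.0}
```

Because the background signal repeats, it collects twice as much credit as the action that actually mattered, which is the misattribution the paragraph warns about.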

Motivation Through Habituation

Motor Commands as Goals

An intelligent system learns through trial and error. It makes assumptions or expectations about the temporal evolution of sensory and proprioceptive signals. Assumptions are synaptic pathways that are ultimately connected to motor output. That is to say, if the assumption is satisfied, it triggers a motor command. In this light, a command can be viewed as a primitive goal waiting to be realized or activated at the earliest opportunity. In sum, every command is a goal and every goal is based on a temporal assumption.
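A minimal sketch of one temporal assumption wired to a motor command, assuming the simplest possible form of assumption: "signal A is followed by signal B within a few steps". The class, its fields and the signal names are hypothetical illustrations, not Animal's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Assumption:
    """Hypothetical assumption: `first` is followed by `then` within `max_gap` steps.
    When the assumption is satisfied, it triggers `command` (the goal is realized)."""
    first: str
    then: str
    max_gap: int
    command: str
    last_first: Optional[int] = field(default=None, repr=False)

    def observe(self, t: int, signal: str) -> Optional[str]:
        if signal == self.first:
            self.last_first = t  # remember when the first half of the pattern occurred
        elif signal == self.then and self.last_first is not None:
            if t - self.last_first <= self.max_gap:
                self.last_first = None
                return self.command  # assumption satisfied: fire the motor command
        return None

a = Assumption(first="light_on", then="tone", max_gap=2, command="turn_head")
events = [(0, "light_on"), (1, "tone"), (5, "light_on"), (9, "tone")]
print([a.observe(t, s) for t, s in events])  # [None, 'turn_head', None, None]
```

The command sits idle until its assumption is met, which is the sense in which a command behaves like a goal waiting for its earliest opportunity.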

Competition Among Goals

The pattern of signals that flow through the system is directly affected by the system's own behavior. This creates a homeostatic effect whereby the making of a temporal assumption in one place may turn out to be in conflict with one or more assumptions somewhere else. Over time, only the strongest, oldest and most reinforced goals survive. New and weaker goals have a hard time displacing older and stronger ones. This should not be surprising as we tend to behave in rehearsed ways. For example, we are motivated to speak a certain language because of repeated exposure to the language.
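The entrenchment effect can be sketched with a simple conflict rule, assuming that conflicting goals are compared by a single strength value and that whichever goal gets to act is reinforced by repetition. The tie-breaking rule (the older goal wins), the unit increment and the goal names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Goal:
    name: str
    strength: float  # accumulated reinforcement (arbitrary units)
    created_at: int  # step at which the goal was formed; smaller means older

def resolve(conflicting):
    """Let the strongest goal act; on a tie, the older goal wins.
    The winner is reinforced by one unit because it got to act again."""
    winner = max(conflicting, key=lambda g: (g.strength, -g.created_at))
    winner.strength += 1.0
    return winner

old = Goal("speak_english", strength=5.0, created_at=0)
new = Goal("speak_french", strength=1.0, created_at=100)
for _ in range(3):
    resolve([old, new])
print(old.strength, new.strength)  # 8.0 1.0
```

Because only the winner is reinforced, the gap between an entrenched goal and a newcomer widens with every round.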

Appetitive and Aversive Goals

As required by the Principle of Complementarity, a goal can be either positive or negative. That is to say, a positive goal initiates an action or movement while a negative goal terminates an ongoing action. Both types of goal are appetitive in the sense that each is a command waiting to be realized at the earliest opportunity. In terms of their effect on behavior, however, the former is appetitive (as in pleasure) and the latter aversive (as in pain). Learned goals are reinforced through habituation or repetition. There is a conservation principle that works to maintain a sort of overall parity between the number of positive and negative goals. This form of natural reinforcement is strictly based on the temporal correlations discovered during normal perceptual learning. Thus motivation is not necessarily connected to a dedicated aversive and appetitive mechanism. It is a system-wide phenomenon. This is apparent in Animal's behavior since Animal, as it stands, does not yet incorporate a dedicated reinforcement mechanism. Yet it often exhibits strong, albeit shifting, goal-directed behavior.
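A minimal sketch of complementary goals acting on one action, assuming an action is simply on or off. A positive goal can only start the action, a negative goal can only stop it, and each goal counts the times it actually fires as a stand-in for reinforcement by repetition. The class names and the counter are hypothetical, and the parity principle is not modeled.

```python
class Action:
    """An on/off action such as a movement (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.active = False

class PolarGoal:
    """A positive goal starts `action`; a negative goal stops it if it is ongoing."""
    def __init__(self, action, positive):
        self.action = action
        self.positive = positive
        self.repetitions = 0  # stand-in for reinforcement through repetition

    def trigger(self):
        """Try to realize the goal; return True if it actually changed the action."""
        if self.positive and not self.action.active:
            self.action.active = True
        elif not self.positive and self.action.active:
            self.action.active = False
        else:
            return False  # nothing to start or stop, so no reinforcement
        self.repetitions += 1
        return True

walk = Action("walk")
go, halt = PolarGoal(walk, positive=True), PolarGoal(walk, positive=False)
print(go.trigger(), go.trigger(), halt.trigger(), halt.trigger())  # True False True False
print(go.repetitions, halt.repetitions)  # 1 1
```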

Goal Control

Without a dedicated reinforcement mechanism, an intelligent system's preference for certain goals is dictated solely by the haphazard strengthening of its connections as it interacts with its environment. A way must be found to selectively modify goal preferences via the application of pain and pleasure stimuli. The solution should not be in the form of a stand-alone mechanism but should take advantage of the system's own goal creation and conflict resolution processes. This is the purpose of the motivational system.

The Motivational System

Appetitive and Aversive Control

The motivational system is designed to be simple, scalable and adaptable to every type of behavioral system, from software agents to autonomous robots. It consists of two complementary subsystems, one for pleasure and one for pain. One of the keys to understanding how the system works is to realize that pain and pleasure are not sensations in the sense normally used for sensory signals. Rather, pain and pleasure have to do with the appetitive and aversive control of actions. The system is designed with internal pain and pleasure effectors that generate internal actions called affects. The effectors are said to be internal because they are not connected to external movement mechanisms.

In both cases, the agent has command control over a single aspect of an affect, that is, it can either start or stop an affect but not both. In the case of pleasure, the control is appetitive, i.e., the agent can only develop connections that start the affect. In the case of pain, the control is aversive, i.e., the agent can only develop connections that terminate the affect. So the affects are not pain or pleasure per se. What differentiates them are the control pathways.

Control Neurons

These are special hardwired neurons that have command control over one side of an affect. There is one control neuron for each pleasure or pain effector. A control neuron fires under specific conditions. On the pleasure side, the neuron fires soon after its corresponding effector is activated. On the pain side, the neuron fires soon after its corresponding effector is deactivated. So pleasure control neurons continually deactivate their corresponding pleasure effectors while pain control neurons continually activate their pain effectors.
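The timing behavior of a control neuron can be sketched in discrete time. The assumptions here are not from the text: time advances in integer steps, the control neuron fires a fixed `delay` steps after its effector changes state, and its effect appears on the next step. No perceptual input is modeled, so the trace shows only what the hardwired neuron does on its own.

```python
def simulate_control(kind, initial_state, steps, delay=2):
    """Trace one effector under its hardwired control neuron only (no perceptual input).

    kind: 'pleasure' (control neuron switches the effector off) or
          'pain' (control neuron switches it back on).
    Returns the effector state (True = active) at each step.
    """
    state = initial_state
    changed_at = 0  # step at which the current state began
    trace = []
    for t in range(steps):
        trace.append(state)
        if t - changed_at >= delay:
            if kind == "pleasure" and state:
                state, changed_at = False, t + 1  # control neuron stops the affect
            elif kind == "pain" and not state:
                state, changed_at = True, t + 1   # control neuron restarts the affect
    return trace

print(simulate_control("pleasure", True, 5))  # [True, True, True, False, False]
print(simulate_control("pain", False, 5))     # [False, False, False, True, True]
```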

Complementary Control Pathways

The two subsystems look almost identical, but note that the control pathways from the perceptual system and the control neurons are reversed. On the pleasure side, the perceptual system controls the start command neurons that activate the pleasure effectors, while the control neurons control the stop command neurons. On the pain side, the control neurons control the start command neurons that activate the pain effectors, while the perceptual system controls the stop command neurons.
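The reversal is easy to lose track of, so here it is written out as a small lookup table. The dictionary keys and helper names are illustrative only; the point is that the same two sources appear on both sides with their roles swapped.

```python
# Source of each command neuron on each side (labels are hypothetical).
WIRING = {
    "pleasure": {"start": "perceptual system", "stop": "control neuron"},
    "pain": {"start": "control neuron", "stop": "perceptual system"},
}

def command_driven_by(side, source):
    """Return the command ('start' or 'stop') that `source` drives on `side`."""
    for command, src in WIRING[side].items():
        if src == source:
            return command
    raise ValueError(f"{source!r} drives no command on the {side!r} side")

for side in WIRING:
    print(side,
          "learned:", command_driven_by(side, "perceptual system"),
          "hardwired:", command_driven_by(side, "control neuron"))
# pleasure learned: start hardwired: stop
# pain learned: stop hardwired: start
```

Read row by row, the table recovers the earlier rule: the agent can learn only to start pleasure and only to stop pain.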

Pain and Pleasure Receptors

Both types of receptors work exactly the same way, i.e., they inhibit the stop command neurons. Note that while the pleasure receptors block the control neurons from inhibiting pleasure activation, the pain receptors prevent the perceptual system from inhibiting pain activation. The main idea is that pain and pleasure stimuli have a direct inhibitory effect on specific control pathways.
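One update step of a single affect can be sketched by combining the reversed pathways with the receptor rule. This is a boolean simplification under stated assumptions: signals are on or off, a stop command overrides a start command within the same step, and a receptor simply vetoes its stop command neuron while its stimulus is present. The function and argument names are hypothetical.

```python
def step(side, active, perceptual_fires, control_fires, stimulus):
    """One update of a single affect effector (boolean sketch).

    side: 'pleasure' or 'pain'; active: current effector state;
    stimulus: whether the matching pleasure or pain receptor is firing.
    """
    if side == "pleasure":
        start, stop = perceptual_fires, control_fires
    else:  # pain side: the two pathways are reversed
        start, stop = control_fires, perceptual_fires
    stop = stop and not stimulus  # the receptor inhibits the stop command neuron
    if stop:  # assumption: stop overrides start within the same step
        return False
    if start:
        return True
    return active

# Pleasure: the control neuron cannot end the affect while the stimulus lasts.
print(step("pleasure", True, False, True, stimulus=True))   # True
print(step("pleasure", True, False, True, stimulus=False))  # False
# Pain: the agent's learned stop command is blocked while the stimulus lasts.
print(step("pain", True, True, False, stimulus=True))       # True
print(step("pain", True, True, False, stimulus=False))      # False
```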

Pain and Pleasure Effectors

The pain and pleasure effectors are no different from normal effectors except that they are not connected to external movement mechanisms. Every effector has a pre-wired duration and activation strength (spike frequency). Both types of effector work exactly the same way. As I mentioned above, what makes the generated affects different are the control pathways, which determine whether the affect is aversive or appetitive. The output signals of all effectors are fed back to the perceptual system via internal proprioceptive feedback sensors.
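A sketch of an internal effector under the properties listed in the paragraph: a pre-wired duration and spike frequency, no external motor output, and an output that is copied to a proprioceptive feedback channel on every step. Time is discretized into steps, and the class, field names and units are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Effector:
    """Internal affect effector: no motor output, only proprioceptive feedback."""
    name: str
    duration: int       # pre-wired number of steps one activation lasts
    spike_rate: float   # pre-wired activation strength (spikes per step, hypothetical unit)
    remaining: int = 0  # steps left in the current activation
    feedback: List[Tuple[int, str, float]] = field(default_factory=list)

    def activate(self):
        self.remaining = self.duration

    def deactivate(self):
        self.remaining = 0

    def tick(self, t):
        """Produce this step's output and copy it to the feedback channel."""
        out = self.spike_rate if self.remaining > 0 else 0.0
        if self.remaining > 0:
            self.remaining -= 1
        self.feedback.append((t, self.name, out))  # seen by the perceptual system
        return out

e = Effector("pleasure_1", duration=3, spike_rate=40.0)
e.activate()
print([e.tick(t) for t in range(5)])  # [40.0, 40.0, 40.0, 0.0, 0.0]
```

Pain and pleasure effectors would be two instances of the same class; only the pathways wired to `activate` and `deactivate` differ.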

How It Works

The primary advantage of this approach to motivation is that it does not decide which goals to reinforce and which ones to suppress. All reinforcement decisions are made by the normal temporal learning mechanism of the adaptive agent. The perceptual system quickly creates associative connections that continually seek pleasure and avoid pain. The connections become very strong over time. The system uses proprioceptive feedback and short-term memory to associate the onset and offset of various activities with pain or pleasure. Pain and pleasure stimuli cause the system to adjust the strength of its goals automatically via feedback.

Next: Motor Learning

 

©2004-2006 Louis Savain

Copy and distribute freely