Models of Behavioural Learning

NIPS Workshop at Whistler, December 10th 2005

 

Organisers:

Chris Watkins, Nicolo Cesa-Bianchi, Stefan Schaal, John Shawe-Taylor

 

Workshop Programme (in pdf format)

 

Theme:

 

Reinforcement learning (RL) is a very attractive idea. When people first encounter RL, they usually find it compelling. But although the last 10 years of RL research have produced deep advances in theory and a range of new algorithms, practical uses are still less widespread than we would all wish.

 

The intense research effort – and considerable progress – in RL theory is partly because RL is a clean and simple abstract model of behavioural learning.

 

The aim of this workshop is to go back to first principles, in the spirit of Cartesian systematic doubt, and to question whether there may be other plausible abstract models of behavioural learning that could serve as starting points for theoretical research.

 

The Deceptive Appeal of Reinforcement Learning

 

RL is a very appealing theory, for several reasons.

 

On closer examination, however, these apparent attractions of RL may seem less convincing.

 

Subjectivity of Rewards

 

First, in animal learning, the rewards are subjective – they are computed by the animal itself. For robots this is not a problem: the designer can give the robot the capacity to compute its own rewards, or may broadcast the rewards to it.  In animals the reward system needs to be innate. Is specifying an innate reward system the most effective way for evolution to specify complex or adaptive behaviour? This seems unclear.

 

For some types of task, such as foraging for scraps of food, it seems common sense (though this is unformalised) that subjective rewards would be a natural way to specify behaviour. For other tasks, this is not so clear: for example, it is not nearly so plausible that learning to avoid predators can be naturally modelled as reinforcement learning.

 

Much evidence from animal studies shows that learning from rewards (also known as instrumental or operant conditioning) explains only a fraction of animal behavioural learning, and that even when instrumental conditioning occurs it can be overridden by other learning mechanisms.

 

Single Reward Signal Inadequate for Learning Complex Behaviour

 

Second, optimising a single measure of reward seems an inadequate model of learning more complex skills. One type of behavioural learning that is difficult to cast naturally as reinforcement learning is acquiring the competence to navigate from any starting point to any final destination within some locale. One can specify the agent’s state as including both its current position and its intended destination, but this doubles the dimensionality of the state space without appearing to provide any natural simplification of the problem.
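To make the cost of the doubled state space concrete, here is a small illustrative calculation. It assumes a tabular value function over a hypothetical rectangular grid world with four movement actions; the grid, the action count, and the function name table_sizes are assumptions made for illustration and are not part of the workshop description.

```python
def table_sizes(width, height, n_actions=4):
    """Count tabular Q-value entries for a hypothetical grid world.

    Returns a pair:
      - entries needed when the destination is fixed
        (state = current position only)
      - entries needed when the destination is part of the state
        (state = (current position, intended destination))
    """
    n_positions = width * height
    single_goal = n_positions * n_actions
    any_goal = n_positions * n_positions * n_actions
    return single_goal, any_goal


if __name__ == "__main__":
    fixed, any_to_any = table_sizes(10, 10)
    print(fixed, any_to_any)
```

Under these assumptions the number of entries grows with the square of the number of positions, and nothing in the formulation itself says how values learned for one destination should transfer to another.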

 

RL seems in some ways problematic both for very simple and for complex types of behavioural learning: are there other natural formal models that could be profitably studied?

 

Minimum Regret Formulations of Reinforcement Learning

 

The conventional formulation of RL assumes a stationary world, in which state transitions and rewards can be statistically modelled, either explicitly or implicitly. A surprising recent development is that competitive (minimum-regret) learning approaches are feasible for bandit problems, and for other types of RL as well, even though they make very weak assumptions about the statistical properties of the world. Whether these approaches are appropriate models of animal learning is an intriguing question.
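As one illustration of a minimum-regret approach to bandit problems, the sketch below implements Exp3, a standard algorithm for adversarial bandits. It is offered only as an example of the family of methods referred to above, not as the specific approach the workshop has in mind. It assumes rewards lie in [0, 1]; the function name, the reward_fn callback, and the exploration parameter gamma are choices made for this sketch.

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Run Exp3 for n_rounds and return (total_reward, pulls_per_arm).

    reward_fn(t, arm) must return a reward in [0, 1]. No statistical
    assumption is made about how those rewards are generated.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    total_reward = 0.0
    for t in range(n_rounds):
        total_w = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        pulls[arm] += 1
        # Importance-weighted reward estimate for the arm actually played.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1; this leaves probs unchanged
        # and avoids floating-point overflow on long runs.
        max_w = max(weights)
        weights = [w / max_w for w in weights]
    return total_reward, pulls


if __name__ == "__main__":
    # Toy environment (an assumption for the demo): only arm 1 ever pays.
    total, pulls = exp3(lambda t, arm: 1.0 if arm == 1 else 0.0,
                        n_arms=3, n_rounds=2000)
    print(total, pulls)
```

The learner never estimates a reward distribution; it only reweights arms according to the rewards it observes, which is why no stationarity assumption is needed.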