Organisers:
Chris Watkins, Nicolo Cesa-Bianchi, Stefan Schaal, John Shawe-Taylor
Workshop Programme (in PDF format)
Theme:
Reinforcement learning (RL) is a very attractive idea. When people
first encounter RL, they usually find it compelling. But although
the last 10 years of RL research have produced deep advances in theory and a
range of new algorithms, practical uses are still less widespread than we would
all wish.
The intense research
effort – and considerable progress – in RL theory is partly because RL is a
clean and simple abstract model of behavioural learning.
The aim of this
workshop is to go back to first principles, in the spirit of Cartesian
systematic doubt, and to question whether there may be other plausible abstract models of behavioural learning that could
serve as starting points for theoretical research.
The Deceptive Appeal of Reinforcement Learning
RL is a very appealing
theory, for several reasons.
On closer examination,
however, these apparent attractions of RL may seem less convincing.
Subjectivity of Rewards
First, in animal
learning, the rewards are subjective
– they are computed by the animal itself. For robots this is not a problem: the
designer can give the robot the capacity to compute its own rewards, or may
broadcast the rewards to it. In animals
the reward system needs to be innate. Is specifying an innate reward system the
most effective way for evolution to specify complex or adaptive behaviour? This
seems unclear.
For some types of task
such as foraging for scraps of food, it seems common-sensical
– though unformalised – that subjective rewards would be a
natural way to specify behaviour. For
other tasks, this is not so clear: for example, it is not nearly so plausible
that learning to avoid predators can be naturally modelled as reinforcement
learning.
Much evidence from
animal studies shows that learning from rewards – also known as instrumental or
operant conditioning – explains only a fraction of animal behavioural learning,
and even when instrumental conditioning does occur, it can be overridden by other
learning mechanisms.
Single Reward Signal Inadequate for Learning Complex Behaviour
Second, optimising a
single measure of reward seems an inadequate model of learning more complex
skills. One natural type of behavioural learning that is difficult
to cast as reinforcement learning is acquiring the competence to
navigate from any starting point to any final destination within some locale. One can specify the agent’s state as
including both its current position and its intended destination, but this doubles
the dimensionality of the state space without appearing to provide any natural simplification of the problem.
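The cost of folding the destination into the state can be illustrated with a small sketch. It assumes a hypothetical square grid world in which a position is an (x, y) cell; the grid and the function names are illustrative assumptions, not part of the workshop material. Pairing every position with every possible destination doubles the number of state variables and squares the number of discrete states.

```python
from itertools import product


def position_states(width, height):
    """All (x, y) cells of a hypothetical grid world."""
    return list(product(range(width), range(height)))


def goal_conditioned_states(width, height):
    """All (position, destination) pairs: the state once the goal is included."""
    cells = position_states(width, height)
    return list(product(cells, cells))


if __name__ == "__main__":
    for n in (5, 10, 20):
        plain = len(position_states(n, n))
        augmented = len(goal_conditioned_states(n, n))
        # Two state variables become four; n*n states become (n*n)**2.
        print(f"{n}x{n} grid: {plain} position states, {augmented} goal-conditioned states")
```

Nothing in the enlarged state space expresses the fact that routes to nearby destinations share most of their structure, which is the sense in which the reformulation offers no natural simplification.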
RL seems in some ways
problematic both for very simple and for complex types of behavioural learning:
are there other natural formal models that could be profitably studied?
Minimum Regret Formulations of Reinforcement Learning
The conventional
formulation of RL assumes a stationary world, in which state transitions and
rewards can be statistically modelled, either explicitly or implicitly. A surprising recent development is that
competitive (minimum-regret) learning approaches are feasible for bandit problems,
and for other types of RL as well. These approaches make very weak assumptions
about the statistical properties of the world, and yet they remain workable. Whether
these approaches are appropriate models of animal learning is an intriguing
question.
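As a concrete illustration of a minimum-regret bandit method, the sketch below implements Exp3, one well-known algorithm of this kind. The text above does not name a particular algorithm, so the choice of Exp3, the parameter value gamma = 0.1, and the toy reward function are all assumptions made for illustration. The learner never models how rewards are generated; it only requires that each reward lies in [0, 1].

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, rng=None):
    """Run Exp3 and return (total reward, pull counts per arm).

    reward_fn(t, arm) must return a reward in [0, 1]; it may depend on the
    round t in any way, since no statistical model of rewards is assumed.
    """
    rng = rng or random.Random(0)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    total = 0.0
    for t in range(n_rounds):
        w_sum = sum(weights)
        # Mix the weight-based distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        pulls[arm] += 1
        # Importance-weighted estimate: unbiased although only one arm is observed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the weights cannot overflow; the probabilities are unchanged.
        largest = max(weights)
        weights = [w / largest for w in weights]
    return total, pulls


if __name__ == "__main__":
    # Toy environment (an assumption): arm 2 always pays 1, the others pay 0.
    total, pulls = exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, n_arms=3, n_rounds=2000)
    print("total reward:", total, "pulls per arm:", pulls)
```

The uniform exploration term keeps every arm's probability at least gamma divided by the number of arms, which bounds the importance-weighted estimates and is part of what allows regret guarantees without stationarity assumptions.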