Reader in Artficial Intelligence at the Department of Computer Science in Royal Holloway, University of London.
This project started in Apri 2011, and ran for four years. CompLACS stands for "Composing Learning for Artificial Cognitive Systems " : the aim is to construct multi-modular cognitive systems from interacting learning components. We want to make a significant step towards inventing artificial intelligence, in the form of multiple interacting learning systems. We seek to create new models of learning and representing multiple skills, and of exploration for learning multiple skills. As part of this, we also seek new approaches to abstracting higher-level representations in which it is easy for an agent to plan.
I have no recent publications in reinforcement learning – my only qualification to do research in it is that many years ago, as a Cambridge graduate student, I was one of the people who invented the current theory. The story is this...
A central problem of artificial intelligence is to understand how animals and people learn from experience to improve their skills, to achieve new goals in new ways, and to form new and more insightful representations of the world and of their own abilities. These problems fascinated me as a graduate student, and they still do now. It is remarkable how little progress we have made on these basic questions after so much research.
As a graduate student in the early 1980s, I was free to visit a local primary school, and take children out of the class and set them problems of my choosing, and video-tape them as they tried to solve them. I started by repeating some of Piaget's experiments. I found that his observations were superb, although of course his theories were incoherent. I added plenty of problems that I devised myself. What I found extraordinary is children's flexibility in learning – faced with nearly any simple concrete problem, after a few attempts they solve it better than they did at first. How is it that children's performance gets better, not worse? How do they have such a general and wide-ranging ability to learn? How could such general abilities be modelled or replicated in computers? I still don't know.
Children seemed too complicated, so I started to read about animal learning, a topic that I had shunned as an undergraduate because it had seemed dull. It struck me that animal learning psychologists were modelling experiments rather than animals: there was no reason for an animal to know when it was in an experiment, and an animal's experience could be thought of as a continuous stream of sensations, actions, and good and bad events. Clearly, animals would need to learn to take some unpleasant actions to obtain delayed rewards later, and the theories of animal learning that were written in the books did not allow for this. I found that behavioural ecologists were modelling foraging strategies and animal life-cycles using dynamic programming to determine optimal strategies – but they were not much considering how animals could learn.
In computer science, there was the well known actor-critic control algorithm of Sutton, Barto, and Anderson, and the temporal difference method of Sutton. These were ingenious: was there a general explanation of why they worked?
It occurred to me to consider reinforcement learning as a form of incremental dynamic programming. My old, old 1989 PhD thesis "Learning from Delayed Rewards", introduced a model of reinforcement learning ( learning from rewards and punishments ) as incrementally optimising control of a Markov Decision Process (MDP), and proposed a new algorithm – which was dubbed "Q-learning" – that could in principle learn optimal control directly without modelling the transition probabilities or expected rewards of the MDP. The first rigourous proof of convergence was in Watkins and Dayan (1992). These innovations helped to stimulate much subsequent research in reinforcement learning. The notion that animals, or 'learning agents' inhabit a MDP, or a POMDP, and that learning consists of finding an optimal policy, has been dominant in reinforcement learning research since, and perhaps these basic assumptions have not been sufficiently examined.
From 1990 to 1995, I worked for hedge funds in London, so I stopped doing active research in reinforcement learning. In 1995 I visited AT&T Bell Labs at Holmdel for a year, in the Adaptive Systems Group, that was led by Larry Jackel, and then by Yann Le Cun. Vladimir Vapnik was in this immensely productive group, and I became interested in kernel methods, which were just becoming popular.
Since that time I have watched RL research closely. I have long felt the standard model of RL is deceptively attractive, but limited. Part of the aim of the CompLACS project is to find more capable models of learning.
An early paper on kernel methods in which I assisted is Weston and Watkins (1999), which presents Jason Weston's method of formulating the SV optimisation to distinguish multiple classes.
After that I became interested in whether kernel methods could be applied to non-vectorial objects, such as strings of different lengths. Watkins (1999) introduced a class of kernels that can be applied to strings. These kernels were used in Lodhi et al (2002), which was an early paper that presented positive results using string kernels. This work helped to stimulate considerable subsequent research on kernels of this and similar types.
My colleague Alex Clark then suggested that string kernels might be used to define "planar languages", which are languages in which all sentences lie on a hyperplane in a feature space induced by a string kernel. The paper Clark et al (2006) won the innovative contribution award at ECML that year. We gave examples of simple languages that could be described as hyperplanes in a kernel-induced feature space, and which therefore could be learned efficiently using kernel methods. Further results are presented in Clark et al (2010).
How complex can organisms in principle become? How can we define the complexity of an organism? Are certain types of genetic codes (codes in a general sense – not just triplet codes) more evolutionarily efficient than others? These are rather basic questions in theoretical biology, and a new approach to answering them is outlined in a preliminary paper Selective Breeding Analysed as a Communication Channel, Watkins (2008).
The trick is to define communication channel(s) based on selective breeding: the capacities of suitably chosen channels can be calculated, or estimated from simulations, and these provide strong upper bounds on possible organismal complexity. This approach can be extended to consider the relative efficiencies of different schemes of genetic encoding, and to provide insight into the adaptive complexities of communities of organisms.
I collaborated with Prof Vincent Jansen in in the Biology Department at Royal Holloway in a theoretical project on the spread of epidemics. The aim was to study how an epidemic might interact with the spread of information about the epidemic, which itself would alter people's behaviour. One possible model of this is presented in Funk et al (2009). Although perhaps elegant, this model is theoretical. As I read about epidemiology during this project, I wondered if it would be possible to do something practical that would help to defend against a sudden epidemic of a serious infectious disease, such as SARS or a new and virulent strain of flu.
During the last two decades, there has been a revolution in communications technology, in the form of mobile phones and a huge variety of uses of the Internet. These technologies can enable people to reduce social contact, and they could also be used to provide real-time monitoring of an epidemic. We leave digital footprints everywhere we go: during a severe epidemic, these footprints could in principle be used to track contacts, to study the disease in real time, and to provide real-time personalised warnings of infection risk – as well as the more obvious uses in enabling people to temporarily isolate themselves from infection if they chose to do so. I am currently seeking to develop research into these possibilities, which so far seem largely unexplored by the public health research community.
I have recently become interested in the visualisation of transcriptome data, in collaboration with James Briscoe and Natascha Bushati at the National Institute of Medical Research. We have developed a new approach to presenting a low-dimensional scatterplot of high-dimensional data, which seems to change the game in this old problem. A paper describing this has been submitted and is under review.
In 1996, Robert Macrae (now at the Systemic Risk Centre at LSE) and I considered the problem of estimating the price volatility of investment portfolios, and of using these estimates in portfolio optimisation (Macrae and Watkins 1996). To our surprise, we found that standard investment textbooks recommended optimising portfolios using entirely unsuitable risk estimators. Indeed, the textbook methods of portfolio optimisation – which were also recommended by major investment banks – could recommend meaningless portfolios and serious overtrading.
The standard method was to use the empirical covariance matrix of price changes as the covariance estimate for optimisation of a portfolio. The problem with this is that portfolio optimisation requires an estimate of the inverse of the covariance matrix, which is also known as the precision matrix — but simply inverting the empirical covariance matrix gives a terrible estimate of the precision matrix. We recommended simple and practical ways of avoiding this error.
Our papers were never published, although a short summary of our results did appear in Macrae and Watkins (1999). With 15 years of hindsight, it appears that we were among the first people to write about this problem, and elements of our research turned out to apply to the problems of both LTCM and the 2008 financial crisis.
Languages as Hyperplanes: Grammatical Inference with String Kernels
A. Clark, C. Costa Florencio, C. Watkins
Machine Learning (2010)
Published online http://dx.doi.org/10.1007/s10994-010-5218-3
The Spread of Awareness and its Impact on Epidemic Outbreaks
S. Funk, E. Gilad, C. Watkins, and V. Jansen
Proceedings of the National Academy of Sciences 106 (2009)
Selective Breeding Analysed as a Communication Channel
C.Watkins, SYNASC 2008 (Timisoara)
Languages as Hyperplanes: grammatical inference with string kernels
A. Clark, C. Costa Florencio, and C. Watkins
European Conference on Machine Learning 2006 (Berlin)
This paper won the ECML Innovative Contribution Award
Text Classification using String Kernels
H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, C. Watkins
Journal of Machine Learning Research 2 2002 (Feb) : pp419-444
Dynamic Alignment Kernels
In A.J. Smola, P. Bartlett, B. Scholkopf, and D Schuurmans (eds)
Advances in Large Margin Classifiers, MIT Press 1999
Multi-Class Support Vector Machines
J. Weston and C. Watkins,
6th European Symposium on Artficial Neural Networks (ESANN 99)
Safe Portfolio Optimisation
Robert Macrae and Chris Watkins
IX Symposium on Applied Stochastic Models and Data Analysis (ASMDA), Lisbon, 1999
A Disaster Waiting to Happen [pdf]
Robert Macrae and Chris Watkins,
a working document from 1996, unpublished.
Q Learning: Technical Note
C. Watkins and P. Dayan,
Machine Learning, 8, 279-292, (1992), Boston MA: Kluwer Academic Publishers
Sequential decision problems and neural networks
A.G. Barto, R.S. Sutton and C. Watkins
Department of Computer Science,
Royal Holloway University of London
Surrey TW20 0EX
tel.: (44) 01784 443419
fax: (44) 01784 439786