A LITTLE THEORY, SOME INTUITION AND A LOT OF EXPERIMENTS: DEVELOPING PROBABILISTIC MODELS FOR INFORMATION RETRIEVAL
Professor Stephen Robertson, Microsoft Research Centre, Cambridge, and City University London
Abstract: The probabilistic approach to information retrieval has proved itself to be a significant and valuable way of formulating and theorising about problems in IR. It has provided some valuable insights, concepts, and arguments, which have contributed substantially to the state of the art. In addition, it appears to provide a sound theoretical basis for IR models. However, in common with other models, it fails to answer many questions, or provides only partial answers. In many cases, indeed, it does little more than suggest answers. There are several reasons for this; one is that it simply has little to say about some of the phenomena which affect retrieval. For example, while it may be the case that some linguistic phenomena could be described in probabilistic terms, there are other aspects of language which seem much less susceptible to probabilistic modelling. Some of these questions (not all) may be answered by experimental means.
The combination of probabilistic models and experimentation can be more than the sum of its parts: theoretical modelling and experimentation can be seen to feed on and contribute to each other in the best scientific tradition. However, it sometimes seems that our approach to experimental evaluation is antithetical to theory.
In this talk, I will discuss the experiences of some years' work in this area. These experiences are mainly associated with City University London and the team which developed the Okapi experimental retrieval system, as well as with the huge international experimental programme known as TREC. I will also discuss some recent trends in probabilistic IR and the variety of ways of looking at retrieval probabilistically.
This seminar was held at the Department of Computer Science, Royal Holloway, University of London on 11 December 2000.