GENERATIVE SEMANTIC KERNELS FOR DOCUMENT CLASSIFICATION AND RETRIEVAL
Alexei Vinokourov, School of Communication and Information Technologies, University of Paisley
Abstract: In this work I present a theoretical framework for deriving an appropriate similarity metric (kernel) for information retrieval. The metric emerges from a generative model of the word generation process underlying a set of documents. The parameters of the generative model are estimated from the data and so provide an adaptive method for capturing the changing interests or tastes of a user. Because the similarity metric reflects the generative model, it adapts whenever the model is adapted to changing tastes. I will talk about generative models based on singular value decomposition, non-negative matrix factorisation and a hierarchic probabilistic model. Results from information retrieval and Support Vector Machine document classification experiments highlight the power of this approach in providing an appropriate similarity measure that adapts naturally to changing user requirements.
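To give a rough idea of the singular value decomposition case only, here is a minimal sketch. It assumes the kernel is the inner product of documents after projection onto the leading left singular vectors of a term-by-document matrix (a standard latent-semantic construction, not necessarily the exact kernel derived in the talk). The function name `svd_semantic_kernel`, the parameter `k` and the toy matrix are hypothetical and chosen purely for illustration.

```python
import numpy as np


def svd_semantic_kernel(X, k):
    """Document-by-document kernel from a rank-k SVD of X.

    X is a term-by-document matrix (rows: terms, columns: documents).
    Assumption: similarity is the inner product of documents after
    projecting them onto the k leading left singular vectors of X.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk = U[:, :k]          # k leading "semantic" directions in term space
    Z = Uk.T @ X           # documents expressed in the k-dimensional space
    return Z.T @ Z         # Gram matrix: symmetric and positive semi-definite


# Hypothetical toy corpus: documents 0 and 1 share terms 0 and 1,
# documents 2 and 3 share terms 2 and 3.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

K = svd_semantic_kernel(X, k=2)
```

Because the singular vectors are estimated from the documents themselves, re-estimating them on a new collection changes the kernel, which is the sense in which such a similarity measure is adaptive. The resulting matrix `K` could be passed to any kernel method that accepts a precomputed Gram matrix.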
This seminar was held at the Department of Computer Science, Royal Holloway, University of London on 29 May 2001.