TEXT CLASSIFICATION WITH SUPPORT VECTOR MACHINES: METHODS AND THEORY
Thorsten Joachims, Knowledge Discovery Group, National German Research Center for Computer Science (GMD), Bonn, Germany
Abstract: Supervised learning of text classifiers, widely recognized as a key method for organizing online information, has recently seen a lot of attention. By now, seemingly every machine learning algorithm ever invented has been applied to this learning problem - many with success on some tasks. What we now have is a wealth of empirical results, but not much of a theory. What we want are robust learning methods that do not require expert parameter tuning, that offer good classification performance, and that come with a theory operational enough to guide real world applications. I will show in this talk that a maximum-margin approach to text classification using Support Vector Machines (SVMs) offers all this.
One key advantage of Support Vector Machines is that they are accessible for theoretical analysis. I will show how separation margin - the central complexity measure in Support Vector Machines - relates to the statistical properties of text. The analysis shows when and why SVMs can learn accurate text classifiers from relatively little training data despite the high-dimensional feature space. The result gives an understanding of which text classification tasks SVMs are most appropriate for.
The insight about the importance of margins for text inspired a transductive approach to text classification using Transductive Support Vector Machines (TSVMs). While regular Support Vector Machines try to induce a general decision function for a learning task, Transductive Support Vector Machines take into account a particular test set and try to minimize misclassifications of just those particular examples. This setting, which is very natural for information retrieval, allows the TSVM to exploit margin information from the test examples. Experiments show substantial improvements over inductive methods on many text classification tasks. I will connect these findings to leave-one-out estimation. I will also show how the basic co-training algorithm of Mitchell and Blum can be seen as a special case of Transductive Support Vector Machines.
This seminar was held at the Department of Computer Science, Royal Holloway, University of London on 2 October 2000.