Royal Holloway logo with departmental theme Royal Holloway, University of London

A PAC-BAYESIAN BOUND FOR LINEAR CLASSIFIERS
Dr R Herbrich, Department of Computer Science, Technical University Berlin, Berlin, Germany

Abstract: We present a new bound on the generalisation error of linear classifiers which is by magnitudes tighter than previous margin bounds and gives non-trivial results in parameter regimes relevant to real world problems. The new bound is inherently connected to the luckiness approach of Shawe-Taylor et.al. (1996) by using a sample-based complexity measure: the volume of a subset of version space relative to the volume of parameter space. For linear classifiers such a quantity is given by the margin which characterises the volume of the largest ball that fits into version space. The improvement over previous bounds can be explained by a fundamentally different reasoning. While standard VC theory treats each classifier separately and aims at bounding its generalisation error, the PAC-Bayesian framework (McAllester, 1998) considers subsets of the model space and bounds the average generalisation error over the subset. These bounds involve the Bayesian prior probability of the subset of classifiers. In constrast to standard Bayesian performance guarantees the PAC-Bayesian results do not assume a correctly specified prior. In order to apply these bounds to single classifiers it is necessary to identify a subvolume of version space such that its average generalisation error is strictly related to the generalisation error of the single classifier. We show that the generalisation error of a linear classifier is bounded from above by twice the average generalisation error of the subset of all classifiers contained in the ball around it. This relation makes it possible to apply the PAC-Bayesian result to linear classifiers and leads to the new bound. In a simple yet suggestive experiment we illustrate that our bound is able to capture the shape of the learning curve. Furthermore it is argued that bound based model selection makes an implicit distributional assumption about the input space.

This seminar was held at the Department of Computer Science, Royal Holloway, University of London on 14 December, 1999.

back


Last updated Mon, 15-Dec-2008 14:51 GMT / PS
Department of Computer Science, University of London, Egham, Surrey TW20 0EX
Tel/Fax : +44 (0)1784 443421 /439786
@@('' )@@
@@('' )@@
@@('' )@@
@@('' )@@
@@('' )@@
@@('' )@@
@@('' )@@
@@('' )@@
@@('' )@@