Royal Holloway, University of London

A notion of stability for statistical clustering with applications to model selection

Thursday 7 July 2005, 10.30 am, Room 325, McCrea Building
Shai Ben-David, University of Waterloo, Ontario

The goal of this talk is to offer a theoretical analysis of some statistical aspects of clustering and, in particular, of some model selection considerations. We work in a framework where the data to be clustered has been sampled from some unknown probability distribution, and the aim is to gain insight into the structure of that distribution. We address the question of how to verify that a sample-based clustering reveals structure in the underlying distribution rather than artifacts of the random sampling process.

We develop a formal notion of stability for sample-based clustering, which captures a necessary requirement for the 'meaningfulness' of a clustering, and prove that stability can be reliably estimated from samples. We go on to show that this notion reflects a fit, or alignment, between a clustering algorithm and a given data distribution, and can therefore serve as a model selection tool.
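The abstract does not spell out the formal definition. As a rough illustration only, the sketch below assumes one common way of making "stability" concrete: cluster two independent samples drawn from the same distribution, compare the two resulting clusterings on a shared set of probe points, and average the fraction of point pairs on which they disagree (0 means perfectly stable). The use of exact 1-D k-means, the pair-counting disagreement measure, the probe-point scheme, the toy three-blob distribution, and all function names (`kmeans_1d`, `disagreement`, `instability`, `three_blobs`) are hypothetical choices made for this example, not the method presented in the talk.

```python
import random
from itertools import combinations


def kmeans_1d(xs, k):
    """Exact k-means on 1-D data via dynamic programming; returns the k centers, sorted."""
    xs = sorted(xs)
    n = len(xs)
    # Prefix sums of x and x^2 give the within-cluster SSE of any run xs[i:j] in O(1).
    pre, pre2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, x in enumerate(xs):
        pre[i + 1] = pre[i] + x
        pre2[i + 1] = pre2[i] + x * x

    def sse(i, j):
        s = pre[j] - pre[i]
        return (pre2[j] - pre2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[c][j] = least SSE for splitting xs[:j] into c contiguous, non-empty clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                v = cost[c - 1][i] + sse(i, j)
                if v < cost[c][j]:
                    cost[c][j], back[c][j] = v, i
    # Walk the back-pointers to recover each cluster's mean.
    centers, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        centers.append((pre[j] - pre[i]) / (j - i))
        j = i
    return sorted(centers)


def assign(x, centers):
    """Index of the center nearest to x."""
    return min(range(len(centers)), key=lambda c: abs(x - centers[c]))


def disagreement(labels_a, labels_b):
    """Fraction of point pairs that one clustering groups together and the other separates."""
    pairs = list(combinations(range(len(labels_a)), 2))
    diff = sum((labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j]) for i, j in pairs)
    return diff / len(pairs)


def instability(draw, k, n=90, m=60, trials=20, seed=0):
    """Average disagreement between k-means solutions fitted to pairs of independent samples."""
    rng = random.Random(seed)
    probe = [draw(rng) for _ in range(m)]  # fixed points on which clusterings are compared
    total = 0.0
    for _ in range(trials):
        c1 = kmeans_1d([draw(rng) for _ in range(n)], k)
        c2 = kmeans_1d([draw(rng) for _ in range(n)], k)
        total += disagreement([assign(x, c1) for x in probe], [assign(x, c2) for x in probe])
    return total / trials


def three_blobs(rng):
    """Hypothetical toy distribution: equal mixture of unit Gaussians at 0, 10 and 20."""
    return rng.choice([0.0, 10.0, 20.0]) + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, instability(three_blobs, k))
```

On this toy mixture, k = 3 should come out essentially perfectly stable, while k = 2 should be clearly unstable, because each sample's blob counts decide whether the middle blob is merged to the left or to the right. Note that k = 1 is trivially stable as well, which fits the abstract's framing of stability as a necessary, rather than sufficient, requirement for a meaningful clustering.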


Last updated Tue, 16-Dec-2008 11:21 GMT / PS
Department of Computer Science, University of London, Egham, Surrey TW20 0EX
Tel/Fax: +44 (0)1784 443421 / 439786