Learnable Representations of Language

A course at ESSLLI 2010, Copenhagen, August 2010


Alexander Clark, Royal Holloway, University of London.

Course Overview

This course addresses the learnability of language: we will discuss the theory of grammatical inference as it relates to learnability issues in language acquisition. None of the classes of the Chomsky hierarchy is learnable, for reasons of computational complexity; however, we now understand that this does not rule out the existence of representations that are learnable. We will discuss a hierarchy of models based on observationally defined representations.

The first model is the deterministic finite-state automaton, whose canonical representation is learnable through the Myhill-Nerode theorem; associated learning algorithms include Angluin's reversible automata and stochastic models such as Clark and Thollard (2004). The second model consists of context-free grammars whose non-terminals correspond to congruence classes of the language; this gives rise to the substitutable languages and to the learning algorithms for NTS languages considered in Clark and Eyraud (2007). The final class is a richer, context-sensitive class based on the theory of residuated lattices; these are perhaps powerful enough for natural language syntax. A preliminary version of this work was published as Clark (2009) at Formal Grammar.
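The congruence-class idea behind the second model can be illustrated with a small sketch (an illustration of the underlying notion, not any of the published algorithms): from a finite sample of strings, collect the contexts in which each substring occurs, and group together substrings whose context sets coincide. This gives an empirical approximation of the syntactic congruence whose classes serve as non-terminals.

```python
from collections import defaultdict

def contexts(sample):
    """Map each non-empty substring u to the set of contexts (l, r)
    such that l + u + r is a string in the sample."""
    ctx = defaultdict(set)
    for w in sample:
        n = len(w)
        for i in range(n):
            for j in range(i + 1, n + 1):
                ctx[w[i:j]].add((w[:i], w[j:]))
    return ctx

def congruence_classes(sample):
    """Group substrings with identical context sets: an empirical
    approximation of the congruence classes of the language."""
    ctx = contexts(sample)
    classes = defaultdict(set)
    for u, c in ctx.items():
        classes[frozenset(c)].add(u)
    return list(classes.values())

# A few strings from the language {a^n b^n : n >= 1}.
sample = ["ab", "aabb", "aaabbb"]
for cls in congruence_classes(sample):
    print(sorted(cls))
```

On a larger sample from an NTS language, substrings that are congruent in the language would fall into the same class, and those classes could be read off as candidate non-terminals; the actual algorithms of Clark and Eyraud (2007) of course work with substitutability and careful sample-complexity arguments rather than this exhaustive enumeration.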

Slides and Materials

Course material -- a bundle of papers relevant to the course, available here.

  1. Monday: Lecture 1 slides here.
  2. Tuesday: Lecture 2 slides here.
  3. Wednesday: Lecture 3 slides here. Some relevant papers: Learning substitutable languages. MAT learning of congruential CFGs. PAC learning NTS languages.
  4. Thursday: Lecture 4 slides here.

Alexander Clark alexc@cs.rhul.ac.uk