Alex Clark's Publications

[Cla15] Alexander Clark. Learnability. In Brian MacWhinney and William O'Grady, editors, The Handbook of Language Emergence, pages 377-395. John Wiley & Sons, Inc., 2015. [ bib ]
[ACZZ14] Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles. Editors' introduction. In Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles, editors, Algorithmic Learning Theory, volume 8776 of Lecture Notes in Computer Science, pages 1-7. Springer International Publishing, 2014. [ bib | DOI | http ]
[CY14] Alexander Clark and Ryo Yoshinaka. An algebraic approach to multiple context-free grammars. In Sergei Soloviev and Nicholas Asher, editors, Logical Aspects of Computational Linguistics, pages 57-69. Springer, 2014. [ bib | .pdf ]
[Cla14] Alexander Clark. Learning trees from strings: A strong learning algorithm for some context-free grammars. Journal of Machine Learning Research, 14:3537-3559, 2014. [ bib | .pdf ]
Standard models of language learning are concerned with weak learning: the learner, receiving as input only information about the strings in the language, must learn to generalise and to generate the correct, potentially infinite, set of strings generated by some target grammar. Here we define the corresponding notion of strong learning: the learner, again only receiving strings as input, must learn a grammar that generates the correct set of structures or parse trees. We formalise this using a modification of Gold's identification in the limit model, requiring convergence to a grammar that is isomorphic to the target grammar. We take as our starting point a simple learning algorithm for substitutable context-free languages, based on principles of distributional learning, and modify it so that it will converge to a canonical grammar for each language. We prove a corresponding strong learning result for a subclass of context-free grammars.

[CGL13a] Alexander Clark, Gianluca Giorgolo, and Shalom Lappin. Statistical representation of grammaticality judgements: The limits of n-gram models. In Proceedings of the ACL Workshop on Cognitive Modelling and Computational Linguistics, Sofia, Bulgaria, 2013. [ bib ]
We use a set of enriched n-gram models to track grammaticality judgements for different sorts of passive sentences in English. We construct these models by specifying scoring functions to map the log probabilities (logprobs) of an n-gram model for a test set of sentences onto scores which depend on properties of the string related to the parameters of the model. We test our models on classification tasks for different kinds of passive sentences. Our experiments indicate that our n-gram models achieve high accuracy in identifying ill-formed passives in which ill-formedness depends on local relations within the n-gram frame, but they are far less successful in detecting non-local relations that produce unacceptability in other types of passive construction. We take these results to indicate some of the strengths and the limitations of word and lexical class n-gram models as candidate representations of speakers' grammatical knowledge.
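The scoring functions referred to here map raw n-gram logprobs onto normalised scores before classification. Below is a minimal Python sketch of two such normalisations, assuming per-token logprobs are already available from some n-gram model; the function names, the SLOR-style unigram normalisation and the placeholder numbers are illustrative, not the exact functions used in the paper.

    def mean_logprob(logprobs):
        """Mean log probability per token: a simple length normalisation."""
        return sum(logprobs) / len(logprobs)

    def unigram_normalised(logprobs, unigram_logprobs):
        """SLOR-style score: subtract unigram logprobs so that sentences with
        rare words are not penalised as if they were ungrammatical."""
        return (sum(logprobs) - sum(unigram_logprobs)) / len(logprobs)

    # Hypothetical per-token logprobs for "the cat was chased" from a trigram
    # model and a unigram model (placeholder numbers, not real estimates).
    lp_trigram = [-1.2, -3.5, -2.0, -6.1]
    lp_unigram = [-1.0, -4.0, -1.5, -5.5]

    print(mean_logprob(lp_trigram))
    print(unigram_normalised(lp_trigram, lp_unigram))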

[CGL13b] Alexander Clark, Gianluca Giorgolo, and Shalom Lappin. Towards a statistical model of grammaticality. In Proceedings of the 35th Annual Conference of the Cognitive Science Society, 2013. [ bib ]
The question of whether it is possible to characterise grammatical knowledge in probabilistic terms is central to determining the relationship of linguistic representation to other cognitive domains. We present a statistical model of grammaticality which maps the probabilities of a statistical model for sentences in parts of the British National Corpus (BNC) into grammaticality scores, using various functions of the parameters of the model. We test this approach with a classifier on test sets containing different levels of syntactic infelicity. With appropriate tuning, the classifiers achieve encouraging levels of accuracy. These experiments suggest that it may be possible to characterise grammaticality judgements in probabilistic terms using an enriched language model.

[CY13] Alexander Clark and Ryo Yoshinaka. Distributional learning of parallel multiple context-free grammars. Machine Learning, 2013. [ bib | DOI ]
Natural languages require grammars beyond context-free for their description. Here we extend a family of distributional learning algorithms to the class of Parallel Multiple Context-Free Grammars, which add two operations beyond the simple context-free operation of concatenation: the ability to interleave or intercalate strings, and an operation for copying strings. The copying operation allows the grammars to generate non-semilinear languages, which lie outside the class of mildly context-sensitive grammars. These grammars, if augmented with a suitable feature mechanism, are capable of representing all of the syntactic phenomena that have been claimed to exist in natural language.

We present a learning algorithm for a large subclass of these grammars, that includes all regular languages but not all context-free languages. This algorithm relies on a generalisation of the notion of distribution as a function from tuples of strings to entire sentences; we define nonterminals using finite sets of these functions. Our learning algorithm uses a nonprobabilistic learning paradigm which allows for membership queries as well as positive samples; it runs in polynomial time.

[Cla13] Alexander Clark. The syntactic concept lattice: Another algebraic theory of the context-free languages? Journal of Logic and Computation, 2013. [ bib | DOI | Journal website | .pdf ]
The syntactic concept lattice is a residuated lattice associated with a given formal language; it arises naturally as a generalization of the syntactic monoid in the analysis of the distributional structure of the language. In this article we define the syntactic concept lattice, present its basic properties, and describe its relationship to the universal automaton and the syntactic congruence. We consider several equivalent definitions: as a Galois connection, as a maximal factorization, and finally, using universal algebra, as an object with a certain universal (terminal) property in the category of complete idempotent semirings that recognize a given language, applying techniques from automata theory to the theory of context-free grammars (CFGs). We conclude that grammars that are minimal, in a certain weak sense, will always have non-terminals that correspond to elements of this lattice, and claim that the syntactic concept lattice provides a natural system of categories for representing the denotations of CFGs.
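The Galois connection that generates the lattice can be stated very concretely: a set of strings maps to the set of contexts they all share, and a set of contexts maps to the set of strings that fit them all; concepts are the closed pairs. Here is a minimal Python sketch over a finite fragment of a^n b^n, with the language given as a plain set and the candidate substrings and contexts fixed in advance; the bounding is an assumption made so that the toy example terminates.

    def contexts_of(strings, language, candidate_contexts):
        """Polar map S -> S': the contexts (l, r) shared by every string in S,
        i.e. l + w + r is in the language for all w in S."""
        return {(l, r) for (l, r) in candidate_contexts
                if all((l + w + r) in language for w in strings)}

    def strings_of(contexts, language, candidate_strings):
        """Polar map C -> C': the strings that fit every context in C."""
        return {w for w in candidate_strings
                if all((l + w + r) in language for (l, r) in contexts)}

    # Finite fragment of a^n b^n, with fixed candidate substrings and contexts.
    language = {"a" * n + "b" * n for n in range(5)}
    substrings = {"", "a", "b", "ab", "aabb"}
    contexts = {("", ""), ("a", "b"), ("aa", "bb"), ("a", "")}

    S = {"ab", "aabb"}
    C = contexts_of(S, language, contexts)        # shared contexts of S
    print(C)                                      # {('', ''), ('a', 'b'), ('aa', 'bb')}
    print(strings_of(C, language, substrings))    # closure of S: {'', 'ab', 'aabb'}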

[CL13] Alexander Clark and Shalom Lappin. Complexity in language acquisition. Topics in Cognitive Science, 5(1):89-110, 2013. [ bib | DOI | http ]
Learning theory has frequently been applied to language acquisition, but discussion has largely focused on information theoretic problems—in particular on the absence of direct negative evidence. Such arguments typically neglect the probabilistic nature of cognition and learning in general. We argue first that these arguments, and analyses based on them, suffer from a major flaw: they systematically conflate the hypothesis class and the learnable concept class. As a result, they do not allow one to draw significant conclusions about the learner. Second, we claim that the real problem for language learning is the computational complexity of constructing a hypothesis from input data. Studying this problem allows for a more direct approach to the object of study—the language acquisition device—rather than the learnable class of languages, which is epiphenomenal and possibly hard to characterize. The learnability results informed by complexity studies are much more insightful. They strongly suggest that target grammars need to be objective, in the sense that the primitive elements of these grammars are based on objectively definable properties of the language itself. These considerations support the view that language acquisition proceeds primarily through data-driven learning of some form.

Keywords: Computational complexity, Language acquisition, Computational learning theory
[CY12] Alexander Clark and Ryo Yoshinaka. Beyond semilinearity: Distributional learning of parallel multiple context-free grammars. In Jeffrey Heinz, Colin de la Higuera, and Tim Oates, editors, Proceedings of the Eleventh International Conference on Grammatical Inference, volume 21 of JMLR Workshop and Conference Proceedings, pages 84-96, 2012. [ bib | .pdf ]
Semilinearity is widely held to be a linguistic invariant but, controversially, some linguistic phenomena in languages like Old Georgian and Yoruba seem to violate this constraint. In this paper we extend distributional learning to the class of parallel multiple context-free grammars, a class which as far as is known includes all attested natural languages, even taking an extreme view on these examples. These grammars may have a copying operation that can recursively copy constituents, allowing them to generate non-semilinear languages. We generalise the notion of a context to a class of functions that include copying operations. The congruential approach is ineffective at this level of the hierarchy; accordingly we extend this using dual approaches, defining nonterminals using sets of these generalised contexts. As a corollary we also extend the multiple context free grammars using the lattice based approaches.

[Cla12] Alexander Clark. Logical grammars, logical theories. In Denis Bechet and Alexandre Dikovsky, editors, Logical Aspects of Computational Linguistics, pages 1-20. Springer, 2012. [ bib | .pdf ]
Residuated lattices form one of the theoretical backbones of the Lambek Calculus as the standard free models. They also appear in grammatical inference as the syntactic concept lattice, an algebraic structure canonically defined for every language L based on the lattice of all distributionally definable subsets of strings. Recent results show that it is possible to build representations, such as context-free grammars, based on these lattices, and that these representations will be efficiently learnable using distributional learning. In this paper we discuss the use of these syntactic concept lattices as models of Lambek grammars, and use the tools of algebraic logic to try to link the proof theoretic ideas of the Lambek calculus with the more algebraic approach taken in grammatical inference. We can then reconceive grammars of various types as equational theories of the syntactic concept lattice of the language. We then extend this naturally from models based on concatenation of strings, to ones based on concatenations of discontinuous strings, which takes us from context-free formalisms to mildly context sensitive formalisms (multiple context-free grammars) and Morrill's displacement calculus.

[CL12] Alexander Clark and Shalom Lappin. Computational learning theory and language acquisition. In R. Kempson, N. Asher, and T. Fernando, editors, Philosophy of Linguistics, pages 445-475. Elsevier, 2012. [ bib | .pdf ]
[YC12a] Ryo Yoshinaka and Alexander Clark. Polynomial time learning of some multiple context-free languages with a minimally adequate teacher. In Philippe de Groote and Mark-Jan Nederhof, editors, Formal Grammar, volume 7395 of Lecture Notes in Computer Science, pages 192-207. Springer Berlin Heidelberg, 2012. [ bib | DOI | http ]
[YC12b] Ryo Yoshinaka and Alexander Clark. Polynomial time learning of some multiple context-free languages with a minimally adequate teacher. In Philippe de Groote and Mark-Jan Nederhof, editors, Formal Grammar: 15th and 16th International Conference on Formal Grammar, pages 192-207. Springer-Verlag, 2012. [ bib ]
[CS11] Alexander Clark and William Gregory Sakas. Computational models of first language acquisition. Research on Language and Computation, 8(2-3):101-106, 2011. [ bib ]
[Cla11b] Alexander Clark. A language theoretic approach to syntactic structure. In Makoto Kanazawa, András Kornai, Marcus Kracht, and Hiroyuki Seki, editors, The Mathematics of Language, volume 6878 of Lecture Notes in Computer Science, pages 39-56. Springer Berlin Heidelberg, 2011. [ bib | DOI | http ]
[Cla11c] Alexander Clark. A language theoretic approach to syntactic structure. In Makoto Kanazawa, András Kornai, Marcus Kracht, and Hiroyuki Seki, editors, Proceedings of the 12th Meeting on the Mathematics of Language (MOL), pages 39-56, Nara, Japan, 2011. Springer Berlin Heidelberg. [ bib | .pdf ]
We consider the idea of defining syntactic structure relative to a language, rather than to a grammar for a language. This allows us to define a notion of hierarchical structure that is independent of the particular grammar, and that depends rather on the properties of various algebraic structures canonically associated with a language. Our goal is not necessarily to recover the traditional ideas of syntactic structure invented by linguists, but rather to come up with an objective notion of syntactic structure that can be used for semantic interpretation. The role of syntactic structure is to bring together words and constituents that are apart on the surface, so they can be combined appropriately. The approach is based on identifying concatenation operations which are non-trivial and using these to constrain the allowable local trees in a structural description.

[Cla11a] Alexander Clark. Inference of inversion transduction grammars. In Proceedings of ICML, pages 201-208, Bellevue, Washington, 2011. [ bib | .pdf ]
We present the first polynomial algorithm for learning a class of inversion transduction grammars (ITGs) that implement context-free transducers, that is, functions from strings to strings. The class of transductions that we can learn properly includes all subsequential transductions. The algorithm is based on a generalisation of distributional learning; we prove its correctness under an identification in the limit model.

[GSTC+11] D. Glowacka, J. Shawe-Taylor, A. Clark, C. de la Higuera, and M. Johnson. Introduction to the special topic on grammar induction, representation of language and language learning. Journal of Machine Learning Research, 12:1425-1428, 2011. [ bib ]
[CCW11] Alexander Clark, Christophe Costa Florêncio, and Chris Watkins. Languages as hyperplanes: grammatical inference with string kernels. Machine Learning, 82(3):351-373, 2011. doi:10.1007/s10994-010-5218-3. [ bib | http ]
Using string kernels, languages can be represented as hyperplanes in a high dimensional feature space. We discuss the language-theoretic properties of this formalism with particular reference to the implicit feature maps defined by string kernels, considering the expressive power of the formalism, its closure properties and its relationship to other formalisms. We present a new family of grammatical inference algorithms based on this idea. We demonstrate that some mildly context-sensitive languages can be represented in this way and that it is possible to efficiently learn these using kernel PCA. We experimentally demonstrate the effectiveness of this approach on some standard examples of context-sensitive languages using small synthetic data sets.
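The "languages as hyperplanes" idea can be illustrated with the simplest possible feature map, the Parikh map of letter counts: the positive examples span an affine subspace of feature space, its normal vectors are recovered with an SVD, and membership is a test for lying on that subspace. The Python sketch below assumes numpy and deliberately uses this crude feature map, which cannot distinguish ab from ba; the paper's experiments use richer string kernels and kernel PCA.

    import numpy as np

    def parikh(w, alphabet):
        """Parikh feature map: the vector of letter counts of w."""
        return np.array([w.count(a) for a in alphabet], dtype=float)

    def learn_plane(positives, alphabet, tol=1e-9):
        """Recover the affine subspace spanned by the feature vectors of the
        positive sample: the normals are the right singular vectors of the
        centred data matrix with (near-)zero singular value."""
        X = np.stack([parikh(w, alphabet) for w in positives])
        mean = X.mean(axis=0)
        _, s, vt = np.linalg.svd(X - mean)
        rank = int(np.sum(s > tol))
        return mean, vt[rank:]

    def member(w, alphabet, mean, normals, tol=1e-9):
        """Accept w iff its feature vector lies on the learned subspace."""
        x = parikh(w, alphabet) - mean
        return bool(np.all(np.abs(normals @ x) < tol))

    # Toy example: the Parikh image of a^n b^n is the plane #a - #b = 0.
    alphabet = ["a", "b"]
    mean, normals = learn_plane(["ab", "aabb", "aaabbb"], alphabet)
    print(member("aaaabbbb", alphabet, mean, normals))   # True
    print(member("aab", alphabet, mean, normals))        # False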

[CL11] Alexander Clark and Shalom Lappin. Linguistic Nativism and the Poverty of the Stimulus. Wiley-Blackwell, Malden, MA, 2011. [ bib | .html ]
[CCFW11] A. Clark, C. Costa Florêncio, and C. Watkins. Languages as hyperplanes: grammatical inference with string kernels. Machine Learning, 82(3):351-373, 2011. [ bib ]
[Cla11d] Alexander Clark. A learnable representation for syntax using residuated lattices. In Philippe de Groote, Markus Egg, and Laura Kallmeyer, editors, Formal Grammar, volume 5591 of Lecture Notes in Computer Science, pages 183-198. Springer Berlin Heidelberg, 2011. [ bib | DOI | http ]
[CEH10] Alexander Clark, Rémi Eyraud, and Amaury Habrard. Using contextual representations to efficiently learn context-free languages. Journal of Machine Learning Research, 11:2707-2744, October 2010. [ bib | .pdf ]
We present a polynomial update time algorithm for the inductive inference of a large class of context-free languages using the paradigm of positive data and a membership oracle. We achieve this result by moving to a novel representation, called Contextual Binary Feature Grammars (CBFGs), which are capable of representing richly structured context-free languages as well as some context sensitive languages. These representations explicitly model the lattice structure of the distribution of a set of substrings and can be inferred using a generalisation of distributional learning. This formalism is an attempt to bridge the gap between simple learnable classes and the sorts of highly expressive representations necessary for linguistic representation: it allows the learnability of a large class of context-free languages, that includes all regular languages and those context-free languages that satisfy two simple constraints. The formalism and the algorithm seem well suited to natural language and in particular to the modeling of first language acquisition. Preliminary experimental results confirm the effectiveness of this approach.

[Cla10e] Alexander Clark. Towards general algorithms for grammatical inference. In M. Kanazawa et al., editors, Proceedings of ALT, pages 11-30. Springer-Verlag, Canberra, Australia, October 2010. Invited paper. [ bib | .pdf ]
Many algorithms for grammatical inference can be viewed as instances of a more general algorithm which maintains a set of primitive elements, which distributionally define sets of strings, and a set of features or tests that constrain various inference rules. Using this general framework, which we cast as a process of logical inference, we re-analyse Angluin's famous L* algorithm and several recent algorithms for the inference of context-free grammars and multiple context-free grammars. Finally, to illustrate the advantages of this approach, we extend it to the inference of functional transductions from positive data only, and we present a new algorithm for the inference of finite state transducers.

[Cla10a] Alexander Clark. Distributional learning of some context-free languages with a minimally adequate teacher. In José Sempere and Pedro Garcia, editors, Grammatical Inference: Theoretical Results and Applications. Proceedings of the International Colloquium on Grammatical Inference, pages 24-37. Springer-Verlag, September 2010. [ bib ]
Angluin showed that the class of regular languages could be learned from a Minimally Adequate Teacher (MAT) providing membership and equivalence queries. Clark and Eyraud (2007) showed that some context free grammars can be identified in the limit from positive data alone by identifying the congruence classes of the language. In this paper we consider learnability of context free languages using a MAT. We show that there is a natural class of context free languages, that includes the class of regular languages, that can be polynomially learned from a MAT, using an algorithm that is an extension of Angluin's LSTAR algorithm.
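The data structure behind this congruence-based extension of L* is an observation table whose rows are strings, whose columns are contexts, and whose cells are filled in by membership queries; strings with identical rows are treated as congruent and become candidate non-terminals. Below is a minimal Python sketch of that table for a toy a^n b^n oracle; equivalence queries and the construction of grammar rules are omitted, and the string and context sets are invented for illustration.

    def fill_table(strings, contexts, member):
        """Observation table: the row of w records, for every context (l, r),
        whether l + w + r is in the language according to the membership oracle."""
        return {w: tuple(member(l + w + r) for (l, r) in contexts)
                for w in strings}

    def congruence_classes(table):
        """Strings with identical rows are treated as congruent; the classes
        become the candidate non-terminals of the hypothesis grammar."""
        classes = {}
        for w, row in table.items():
            classes.setdefault(row, []).append(w)
        return list(classes.values())

    # Toy membership oracle for the language a^n b^n.
    def member(w):
        return len(w) % 2 == 0 and w == "a" * (len(w) // 2) + "b" * (len(w) // 2)

    strings = ["", "a", "b", "ab", "aab", "abb", "aabb"]
    contexts = [("", ""), ("a", ""), ("", "b"), ("a", "b")]
    table = fill_table(strings, contexts, member)
    for cls in congruence_classes(table):
        print(cls)   # ['', 'ab', 'aabb'], ['a', 'aab'], ['b', 'abb']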

[Cla10b] Alexander Clark. Efficient, correct, unsupervised learning of context-sensitive languages. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 28-37, Uppsala, Sweden, July 2010. Association for Computational Linguistics. [ bib | .pdf ]
A central problem for NLP is grammar induction: the development of unsupervised learning algorithms for syntax. In this paper we present a lattice-theoretic representation for natural language syntax, called Distributional Lattice Grammars. These representations are objective or empiricist, based on a generalisation of distributional learning, and are capable of representing all regular languages, some but not all context-free languages and some non-context-free languages. We present a simple algorithm for learning these grammars together with a complete self-contained proof of the correctness and efficiency of the algorithm.

[YC10] Ryo Yoshinaka and Alexander Clark. Polynomial time learning of some multiple context-free languages with a minimally adequate teacher. In Proceedings of the 15th Conference on Formal Grammar, Copenhagen, Denmark, 2010. [ bib | .pdf ]
We present an algorithm for the inference of some Multiple Context-Free Grammars from Membership and Equivalence Queries, using the Minimally Adequate Teacher model of Angluin. This is an extension of the congruence based methods for learning some Context-Free Grammars proposed by Clark (ICGI 2010). We define the natural extension of the syntactic congruence to tuples of strings, and demonstrate we can efficiently learn the class of Multiple Context-Free Grammars where the non-terminals correspond to the congruence classes under this relation.

[Cla10c] Alexander Clark. Learning context free grammars with the syntactic concept lattice. In José Sempere and Pedro Garcia, editors, Grammatical Inference: Theoretical Results and Applications. Proceedings of the International Colloquium on Grammatical Inference, pages 38-51. Springer-Verlag, 2010. [ bib | .pdf ]
The Syntactic Concept Lattice is a residuated lattice based on the distributional structure of a language; the natural representation based on this is a context sensitive formalism. Here we examine the possibility of basing a context free grammar (cfg) on the structure of this lattice; in particular by choosing non-terminals to correspond to concepts in this lattice. We present a learning algorithm for context free grammars which uses positive data and membership queries, and prove its correctness under the identification in the limit paradigm. Since the lattice itself may be infinite, we consider only a polynomially bounded subset of the set of concepts, in order to get an efficient algorithm. We compare this on the one hand to learning algorithms for context free grammars, where the non-terminals correspond to congruence classes, and on the other hand to the use of context sensitive techniques such as Binary Feature Grammars and Distributional Lattice Grammars. The class of cfgs that can be learned in this way includes inherently ambiguous and thus non-deterministic languages; this approach therefore breaks through an important barrier in cfg inference. ERRATA: there are two significant errors in this paper. I am grateful to Hans Leiss for pointing these out to me, and proposing modifications to the algorithm that correct the errors.

[CFL10] Alexander Clark, Chris Fox, and Shalom Lappin, editors. The Handbook of Computational Linguistics and Natural Language Processing. Wiley-Blackwell, 2010. [ bib | .html ]
[CL10] Alexander Clark and Shalom Lappin. Unsupervised learning and grammar induction. In Alexander Clark, Chris Fox, and Shalom Lappin, editors, The Handbook of Computational Linguistics and Natural Language Processing, pages 197-220. Wiley-Blackwell, 2010. [ bib ]
[Cla10d] Alexander Clark. Three learnable models for the description of language. In Adrian-Horia Dediu, Henning Fernau, and Carlos Martín-Vide, editors, Language and Automata Theory and Applications, Fourth International Conference, LATA 2010, LNCS, pages 16-31. Springer-Verlag, 2010. [ bib | .pdf ]
Learnability is a vital property of formal grammars: representation classes should be defined in such a way that they are learnable. One way to build learnable representations is by making them objective or empiricist: the structure of the representation should be based on the structure of the language. Rather than defining a function from representation to language we should start by defining a function from the language to the representation: following this strategy gives classes of representations that are easy to learn. We illustrate this approach with three classes, defined in analogy to the lowest three levels of the Chomsky hierarchy. First, we recall the canonical deterministic finite automaton, where the states of the automaton correspond to the right congruence classes of the language. Secondly, we define context free grammars where the non-terminals of the grammar correspond to the syntactic congruence classes, and where the productions are defined by the syntactic monoid; finally we define a residuated lattice structure from the Galois connection between strings and contexts, which we call the syntactic concept lattice, and base a representation on this, which allows us to define a class of languages that includes some non-context free languages, many context-free languages and all regular languages. All three classes are efficiently learnable under suitable learning paradigms.

[CL09] Alexander Clark and Shalom Lappin. Another look at indirect negative evidence. In Proceedings of the EACL Workshop on Cognitive Aspects of Computational Language Acquisition, Athens, March 2009. [ bib | .pdf ]
Indirect negative evidence is clearly an important way for learners to constrain overgeneralisation, and yet a good learning theoretic analysis has yet to be provided for this, whether in a PAC or a probabilistic identification in the limit framework. In this paper we suggest a theoretical analysis of indirect negative evidence that allows the presence of ungrammatical strings in the input and also accounts for the relationship between grammaticality/acceptability and probability. Given independently justified assumptions about lower bounds on the probabilities of grammatical strings, we establish that a limited number of membership queries of some strings can be probabilistically simulated.

[CEH09] Alexander Clark, Rémi Eyraud, and Amaury Habrard. A note on contextual binary feature grammars. In EACL 2009 workshop on Computational Linguistic Aspects of Grammatical Inference, Athens, Greece, 2009. ACL. [ bib | .pdf ]
Contextual Binary Feature Grammars were recently proposed by Clark et al. (2008) as a learnable representation for richly structured context-free and context-sensitive languages. In this paper we examine the representational power of the formalism, its relationship to other standard formalisms and language classes, and its appropriateness for modelling natural language.

[Cla09] Alexander Clark. A learnable representation for syntax using residuated lattices. In Proceedings of the 14th Conference on Formal Grammar, Bordeaux, France, 2009. [ bib | .pdf ]
We propose a representation for natural language syntax based on the theory of residuated lattices: in particular on the Galois lattice between contexts and substrings, which we call the syntactic concept lattice. The natural representation derived from this is a richly structured context sensitive formalism that can be learned using a generalisation of distributional learning. In this paper we define the basic algebraic properties of the syntactic concept lattice, together with a representation derived from this lattice and discuss the generative power of the formalism. We establish some basic results which show that these representations, because they are defined language theoretically, can be inferred from information about the set of grammatical strings of the language. We also discuss the relation to other grammatical formalisms notably categorial grammar and context free grammars. We claim that this lattice based formalism is plausibly both learnable from evidence about the grammatical strings of a language and may be powerful enough to represent natural languages, and thus presents a potential solution to the central problem of theoretical linguistics.

[CEH08] Alexander Clark, Rémi Eyraud, and Amaury Habrard. A polynomial algorithm for the inference of context free languages. In Proceedings of International Colloquium on Grammatical Inference, pages 29-42. Springer-Verlag, September 2008. [ bib | .pdf ]
We present a polynomial algorithm for the inductive inference of a large class of context free languages, that includes all regular languages. The algorithm uses a representation which we call Binary Feature Grammars based on a set of features, capable of representing richly structured context free languages as well as some context sensitive languages. More precisely, we focus on a particular case of this representation where the features correspond to contexts appearing in the language. Using the paradigm of positive data and a membership oracle, we can establish that all context free languages that satisfy two constraints on the context distributions can be identified in the limit by this approach. The polynomial time algorithm we propose is based on a generalisation of distributional learning and uses the lattice of context occurrences. The formalism and the algorithm seem well suited to natural language and in particular to the modelling of first language acquisition.

[CCM08] Alexander Clark, François Coste, and Laurent Miclet, editors. Grammatical Inference: Algorithms and Applications, 9th International Colloquium, ICGI 2008, Saint-Malo, France, September 22-24, 2008, Proceedings, volume 5278 of Lecture Notes in Computer Science. Springer, 2008. [ bib | http ]
[CW08] Alexander Clark and Chris Watkins. Some alternatives to Parikh matrices using string kernels. Fundamenta Informaticae, 84(3-4):291-303, 2008. [ bib ]
We describe methods of representing strings as real valued vectors or matrices; we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices, which have been studied intensively over the last few years as a powerful tool in the study of combinatorics over words. In the field of machine learning, there is widespread use of string kernels, which use analogous mappings into high dimensional feature spaces based on the occurrences of subwords or factors. In this paper we show how one can use string kernels to construct two alternatives to Parikh matrices, that overcome some of the limitations of the Parikh matrix construction. These are morphisms from the free monoid to rings of real-valued matrices under multiplication: one is based on the subsequence kernel and the other on the gap-weighted string kernel. For the latter kernel we demonstrate that for many values of the gap-weight hyper-parameter the resulting morphism is injective.
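For readers unfamiliar with the construction the paper generalises, the classical Parikh matrix is itself a morphism into upper-triangular integer matrices: each letter of an ordered alphabet maps to the identity plus a single off-diagonal 1, and the product over a word counts its scattered subwords. The short Python/numpy sketch below shows that classical construction, not the kernel-based alternatives proposed in the paper.

    import numpy as np

    def letter_matrix(i, k):
        """Image of the i-th letter (0-based) of an ordered alphabet of size k:
        the (k+1) x (k+1) identity with an extra 1 in entry (i, i+1)."""
        m = np.eye(k + 1, dtype=int)
        m[i, i + 1] = 1
        return m

    def parikh_matrix(word, alphabet):
        """Morphism into upper-triangular integer matrices: entry (i, j+1) of the
        result counts the scattered subword alphabet[i]...alphabet[j] in word."""
        k = len(alphabet)
        m = np.eye(k + 1, dtype=int)
        for c in word:
            m = m @ letter_matrix(alphabet.index(c), k)
        return m

    print(parikh_matrix("abab", ["a", "b"]))
    # [[1 2 3]
    #  [0 1 2]
    #  [0 0 1]]
    # entry (0,1): two a's; entry (1,2): two b's; entry (0,2): three scattered "ab"s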

[CE07] Alexander Clark and Rémi Eyraud. Polynomial identification in the limit of substitutable context-free languages. Journal of Machine Learning Research, 8:1725-1745, August 2007. [ bib | .pdf ]
This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free languages. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. It transpires that it is not necessary to identify constituents in order to learn a context-free language: it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. We also discuss modifications to the algorithm that produce a reduction system rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin's notion of reversibility for regular languages. We also demonstrate that an implementation of this algorithm is capable of learning a classic example of structure-dependent syntax in English: this constitutes a refutation of an argument that has been used in support of nativist theories of language.
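Harris-style substitutability is a purely observational relation: if two substrings ever occur in the same context in the positive sample, they are assumed intersubstitutable everywhere, and the classes obtained by closing this relation transitively play the role of the syntactic congruence classes. Below is a minimal Python sketch of just that clustering step over word sequences, using a small union-find; building the grammar from the syntactic monoid is omitted, and the toy sample is invented for illustration.

    from itertools import combinations

    def substitutability_classes(sample):
        """Cluster the substrings (as word tuples) of a positive sample:
        substrings that occur in the same context (l, r) go into the same
        class, and the relation is closed transitively with union-find."""
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        def union(x, y):
            parent[find(x)] = find(y)

        by_context = {}
        for sentence in sample:
            w = tuple(sentence.split())
            for i in range(len(w)):
                for j in range(i + 1, len(w) + 1):
                    by_context.setdefault((w[:i], w[j:]), set()).add(w[i:j])

        for subs in by_context.values():
            for u, v in combinations(subs, 2):
                union(u, v)

        classes = {}
        for u in parent:
            classes.setdefault(find(u), set()).add(u)
        return list(classes.values())

    # Invented toy sample: "cat"/"dog" and "sleeps"/"barks" end up congruent.
    sample = ["the cat sleeps", "the dog sleeps", "the dog barks"]
    for cls in substitutability_classes(sample):
        print(cls)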

[Cla07] Alexander Clark. Learning deterministic context free grammars: the Omphalos competition. Machine Learning, 66(1):93-110, January 2007. [ bib | .pdf ]
This paper describes the winning entry to the Omphalos context free grammar learning competition. We describe a context-free grammatical inference algorithm operating on positive data only, which integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency. The competition is discussed from the perspective of a competitor. We discuss a class of deterministic grammars, the Non-terminally Separated (NTS) grammars, that have a property relied on by our algorithm, and consider the possibilities of extending the algorithm to larger classes of languages.

[CCFW06] Alexander Clark, Christophe Costa Florêncio, and Chris Watkins. Languages as hyperplanes: grammatical inference with string kernels. In Proceedings of the European Conference on Machine Learning (ECML), pages 90-101, 2006. [ bib | .pdf ]
Using string kernels, languages can be represented as hyperplanes in a high dimensional feature space. We present a new family of grammatical inference algorithms based on this idea. We demonstrate that some mildly context sensitive languages can be represented in this way and it is possible to efficiently learn these using kernel PCA. We present some experiments demonstrating the effectiveness of this approach on some standard examples of context sensitive languages using small synthetic data sets.

[Cla06a] Alexander Clark. Large scale inference of deterministic transductions: Tenjinno problem 1. In Proceedings of the 8th International Colloquium on Grammatical Inference (ICGI), pages 227-239, 2006. [ bib | .pdf ]
We discuss the problem of large scale grammatical inference in the context of the Tenjinno competition, with reference to the inference of deterministic finite state transducers, and discuss the design of the algorithms and the design and implementation of the program that solved the first problem. Though the OSTIA algorithm has good asymptotic guarantees for this class of problems, the amount of data required is prohibitive. We therefore developed a new strategy for inferring large scale transducers that is more adapted for large random instances of the type in question, which involved combining traditional state merging algorithms for inference of finite state automata with EM based alignment algorithms and state splitting algorithms.

[Cla06b] Alexander Clark. PAC-learning unambiguous NTS languages. In Proceedings of the 8th International Colloquium on Grammatical Inference (ICGI), pages 59-71, 2006. [ bib | .pdf ]
Non-terminally separated (NTS) languages are a subclass of deterministic context free languages where there is a stable relationship between the substrings of the language and the non-terminals of the grammar. We show that when the distribution of samples is generated by a PCFG, based on the same grammar as the target language, the class of unambiguous NTS languages is PAC-learnable from positive data alone, with polynomial bounds on data and computation.

[CCFWS06] Alexander Clark, Christophe Costa Florêncio, Chris Watkins, and Mariette Serayet. Planar languages and learnability. In Proceedings of the 8th International Colloquium on Grammatical Inference (ICGI), pages 148-160, 2006. [ bib | .pdf ]
Strings can be mapped into Hilbert spaces using feature maps such as the Parikh map. Languages can then be defined as the pre-image of hyperplanes in the feature space, rather than using grammars or automata. These are the planar languages. In this paper we show that using techniques from kernel-based learning, we can represent and efficiently learn, from positive data alone, various linguistically interesting context-sensitive languages. In particular we show that the cross-serial dependencies in Swiss German, that established the non-context-freeness of natural language, are learnable using a standard kernel. We demonstrate the polynomial-time identifiability in the limit of these classes, and discuss some language theoretic properties of these classes, and their relationship to the choice of kernel/feature map.

[CE06a] Alexander Clark and Rémi Eyraud. Learning auxiliary fronting with grammatical inference. In Proceedings of CoNLL, pages 125-132, New York, 2006. [ bib | .pdf ]
We present a simple context-free grammatical inference algorithm, and prove that it is capable of learning an interesting subclass of context-free languages. We also demonstrate that an implementation of this algorithm is capable of learning auxiliary fronting in polar interrogatives (AFIPI) in English. This has been one of the most important test cases in language acquisition over the last few decades. We demonstrate that learning can proceed even in the complete absence of examples of particular constructions, and thus that debates about the frequency of occurrence of such constructions are irrelevant. We discuss the implications of this on the type of innate learning biases that must be hypothesized to explain first language acquisition.

[GCA06b] Maria Georgescul, Alexander Clark, and Susan Armstrong. Word distributions for thematic segmentation in a support vector machine approach. In Proceedings of CoNLL, pages 101-108, New York, 2006. [ bib | .pdf ]
We investigate the appropriateness of using a technique based on support vector machines for identifying thematic structure of text streams. The thematic segmentation task is modeled as a binary-classification problem, where the different classes correspond to the presence or the absence of a thematic boundary. Experiments are conducted with this approach by using features based on word distributions through text. We provide empirical evidence that our approach is robust, by showing good performance on three different data sets. In particular, substantial improvement is obtained over previously published results of word-distribution based systems when evaluation is done on a corpus of recorded and transcribed multi-party dialogs.
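The binary-classification framing is easy to make concrete: every gap between consecutive sentences is an instance, described by features computed from the word distributions on either side, and a classifier decides boundary versus non-boundary. The sketch below assumes scikit-learn is available and uses a single, much cruder feature than the paper (cosine similarity of word counts in a small window) on an invented toy document.

    from collections import Counter
    from math import sqrt
    from sklearn.svm import SVC

    def cosine(c1, c2):
        """Cosine similarity of two word-count vectors."""
        num = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
        den = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
        return num / den if den else 0.0

    def gap_features(sentences, window=2):
        """One instance per gap between consecutive sentences: the lexical
        similarity of the word distributions in the surrounding windows."""
        feats = []
        for i in range(1, len(sentences)):
            left = Counter(w for s in sentences[max(0, i - window):i] for w in s.split())
            right = Counter(w for s in sentences[i:i + window] for w in s.split())
            feats.append([cosine(left, right)])
        return feats

    # Invented toy document with a topic shift after the third sentence.
    sentences = [
        "the cat chased the mouse", "the mouse hid from the cat",
        "the cat gave up the chase",
        "interest rates rose sharply", "the bank raised its rates",
        "markets reacted to the rates",
    ]
    X = gap_features(sentences)
    y = [0, 0, 1, 0, 0]                 # 1 marks the thematic boundary
    clf = SVC(kernel="linear").fit(X, y)
    print(clf.predict(gap_features(sentences)))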

[GCA06a] Maria Georgescul, Alexander Clark, and Susan Armstrong. An analysis of methodological and quantitative aspects in the evaluation of thematic segmentation algorithms. In Proceedings of SIGDIAL, Sydney, Australia, 2006. [ bib | .pdf ]
We consider here the task of linear thematic segmentation of text documents, by using features based on word distributions in the text. For this task, a typical and often implicit assumption in previous studies is that a document has just one topic and therefore many algorithms have been tested and have shown encouraging results on artificial data sets, generated by putting together parts of different documents. We show that evaluation on synthetic data is potentially misleading and fails to give an accurate evaluation of the performance on real data. Moreover, we provide a critical review of existing evaluation metrics in the literature and we propose an improved evaluation metric.

[CE06b] Alexander Clark and Rémi Eyraud. Learning auxiliary fronting with grammatical inference. In Proceedings of the 28th Annual Meeting of the Cognitive Science Society, Vancouver, Canada, 2006. This paper is closely related to the paper in CoNLL, but is more directed to a cognitive science audience. [ bib | .pdf ]
We present a simple context-free grammatical inference algorithm, and prove that it is capable of learning an interesting subclass of context-free languages. We also demonstrate that an implementation of this algorithm is capable of learning auxiliary fronting in polar interrogatives (AFIPI) in English. This has been one of the most important test cases in language acquisition over the last few decades. We demonstrate that learning can proceed even in the complete absence of examples of particular constructions, and thus that debates about the frequency of occurrence of such constructions are irrelevant. We discuss the implications of this on the type of innate learning biases that must be hypothesized to explain first language acquisition.

[SCCX05] William Gregory Sakas, Alexander Clark, James Cussens, and Aris Xanthos, editors. Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition. Association for Computational Linguistics, Ann Arbor, Michigan, June 2005. [ bib ]
[CE05] Alexander Clark and Rémi Eyraud. Identification in the limit of substitutable context free languages. In Sanjay Jain, Hans Ulrich Simon, and Etsuji Tomita, editors, Proceedings of the 16th International Conference on Algorithmic Learning Theory, pages 283-296. Springer-Verlag, 2005. [ bib | .pdf ]
This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free grammars. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. We also discuss modifications to the algorithm that produce a reduction system rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin's notion of reversibility for regular languages.

[Cla04] Alexander Clark. Grammatical inference and first language acquisition. In Workshop on Psycho-computational Models of Human Language Acquisition, Geneva, Switzerland, August 2004. [ bib | .pdf ]
One argument for parametric models of language has been learnability in the context of first language acquisition. The claim is made that “logical” arguments from learnability theory require non-trivial constraints on the class of languages. Initial formalisations of the problem (Gold, 1967) are however inapplicable to this particular situation. In this paper we construct an appropriate formalisation of the problem using a modern vocabulary drawn from statistical learning theory and grammatical inference, looking in detail at the relevant empirical facts. We claim that a variant of the Probably Approximately Correct (PAC) learning framework (Valiant, 1984) with positive samples only, modified so it is not completely distribution free, is the appropriate choice. Some negative results derived from cryptographic problems appear to apply in this situation, but the existence of algorithms with provably good performance, and subsequent work, shows that these negative results are not as strong as they initially appear, and that recent algorithms for learning regular languages partially satisfy our criteria. We then discuss the applicability of these results to parametric and non-parametric models.

[LCL14] Jey Han Lau, Alexander Clark, and Shalom Lappin. Measuring gradience in speakers’ grammaticality judgements. In Proceedings of the 36th Annual Conference of the Cognitive Science Society, Quebec City, July 2014. [ bib ]
[CT04a] Alexander Clark and Franck Thollard. PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, 5:473-497, May 2004. [ bib | .pdf ]
We study the learnability of Probabilistic Deterministic Finite State Automata under a modified PAC-learning criterion. We argue that it is necessary to add additional parameters to the sample complexity polynomial, namely a bound on the expected length of strings generated from any state, and a bound on the distinguishability between states. With this, we demonstrate that the class of PDFAs is PAC-learnable using a variant of a standard state-merging algorithm and the Kullback-Leibler divergence as error function.
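The role of the distinguishability parameter can be illustrated with the basic state-merging test: candidate states are identified with prefixes, each has an empirical distribution over the next symbol (or end of string), and two candidates are merged only when those distributions are close. Below is a minimal Python sketch using an L-infinity test on an invented sample; the paper's actual algorithm, smoothing and KL-based error analysis differ in detail.

    from collections import Counter

    def next_symbol_distribution(sample, prefix):
        """Empirical distribution over the next symbol (or end-of-string '$')
        for the candidate state reached by `prefix`."""
        counts = Counter()
        for w in sample:
            if w.startswith(prefix):
                rest = w[len(prefix):]
                counts[rest[0] if rest else "$"] += 1
        total = sum(counts.values())
        return {a: c / total for a, c in counts.items()} if total else {}

    def l_infinity(p, q):
        """L-infinity distance between two next-symbol distributions: the test
        used to decide whether two candidate states may be merged."""
        return max(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in set(p) | set(q))

    # Invented positive sample; candidate states are identified with prefixes.
    sample = ["ab", "aab", "ab", "b", "aaab", "b"]
    p_empty = next_symbol_distribution(sample, "")
    p_a = next_symbol_distribution(sample, "a")
    # Merge the two candidate states only if this distance is below the
    # distinguishability parameter mu assumed by the PAC analysis.
    print(l_infinity(p_empty, p_a))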

[CT04b] Alexander Clark and Franck Thollard. Partially distribution-free learning of regular languages from positive samples. In Proceedings of COLING, pages 85-91, Geneva, Switzerland, 2004. [ bib | .pdf ]
Regular languages are widely used in NLP today in spite of their shortcomings. Efficient algorithms that can reliably learn these languages, and which must in realistic applications only use positive samples, are necessary. These languages are not learnable under traditional distribution free criteria. We claim that an appropriate learning framework is PAC learning where the distributions are constrained to be generated by a class of stochastic automata with support equal to the target concept. We discuss how this is related to other learning paradigms. We then present a simple learning algorithm for regular languages, and a self-contained proof that it learns according to this partially distribution free criterion.

[Cla03b] Alexander Clark. Pre-processing very noisy text. In Proceedings of Workshop on Shallow Processing of Large Corpora, Corpus Linguistics, Lancaster, 2003. [ bib | .pdf ]
Existing techniques for tokenisation and sentence boundary identification are extremely accurate when the data is perfectly clean (Mikheev, 2002), and have been applied successfully to corpora of news feeds and other post-edited corpora. Informal written texts are readily available, and with the growth of other informal text modalities (IRC, ICQ, SMS etc.) are becoming an interesting alternative, perhaps better suited as a source for lexical resources and language models for studies of dialogue and spontaneous speech. However, the high degree of spelling errors, irregularities and idiosyncrasies in the use of punctuation, white space and capitalisation require specialised tools. In this paper we study the design and implementation of a tool for pre-processing and normalisation of noisy corpora. We argue that rather than having separate tools for tokenisation, segmentation and spelling correction organised in a pipeline, a unified tool is appropriate because of certain specific sorts of errors. We describe how a noisy channel model can be used at the character level to perform this. We describe how the sequence of tokens needs to be divided into various types depending on their characteristics, and also how the modelling of white-space needs to be conditioned on the type of the preceding and following tokens. We use trainable stochastic transducers to model typographical errors, and other orthographic changes and a variety of sequence models for white space and the different sorts of tokens. We discuss the training of the models and various efficiency issues related to the decoding algorithm, and illustrate this with examples from a 100 million word corpus of Usenet news.
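The noisy channel factorisation described here chooses, for each observed token, the candidate that maximises P(candidate) · P(observed | candidate). Below is a minimal Python sketch with a unigram source model and a crude channel model that charges a fixed log-cost per character edit; the lexicon and probabilities are invented, and this edit-distance channel merely stands in for the trainable stochastic transducers used in the paper.

    import math

    def edit_distance(a, b):
        """Levenshtein distance by dynamic programming over a single row."""
        d = list(range(len(b) + 1))
        for i in range(1, len(a) + 1):
            prev, d[0] = d[0], i
            for j in range(1, len(b) + 1):
                prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                       d[j - 1] + 1,    # insertion
                                       prev + (a[i - 1] != b[j - 1]))  # substitution
        return d[len(b)]

    def normalise(token, lexicon, error_rate=0.1):
        """Noisy channel decoding: argmax over candidates of
        log P(candidate) + log P(token | candidate), with the channel model
        charging log(error_rate) per character edit."""
        def score(candidate):
            return math.log(lexicon[candidate]) + edit_distance(token, candidate) * math.log(error_rate)
        return max(lexicon, key=score)

    # Invented lexicon with made-up unigram probabilities.
    lexicon = {"the": 0.05, "they": 0.01, "then": 0.008, "he": 0.02}
    print(normalise("teh", lexicon))   # "the"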

[Cla03a] Alexander Clark. Combining distributional and morphological information for part of speech induction. In Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 59-66, 2003. [ bib | .pdf | .ps ]
[ACC+03] S. Armstrong, A. Clark, G. Coray, M. Georgescul, V. Pallotta, A. Popescu-Belis, D. Portabella, M. Rajman, and M. Starlander. Natural language queries on natural language data: a database of meeting dialogues. In 8th International Conference on Applications of Natural Language to Information Systems, Burg/Cottbus, Germany, 2003. [ bib ]
This paper describes an integrated system that enables the storage and retrieval of meeting transcripts (e.g. staff meetings). The system gives users who have not attended a meeting, or who want to review a particular point, enhanced access to an annotated version of the recorded data. This paper describes the various stages in the processing, storage and query of the data. First, we put forward the idea of shallow dialogue processing, in order to extract significant features of the meeting transcriptions for storage in a database, whose structure is briefly outlined. Low-level access to the database is provided as a Web service, which can be connected to several interfaces. A description of how multimodal input can be used with VoiceXML is also provided, thus offering an easy solution for voice and web based access to the dialogue data. The paper ends with considerations about the available data and its use in the current version of the system.

[Cla02] Alexander Clark. Memory-based learning of morphology with stochastic transducers. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 513-520, 2002. [ bib | .pdf ]
This paper discusses the supervised learning of morphology using stochastic transducers, trained using the Expectation-Maximization (EM) algorithm. Two approaches are presented: first, using the transducers directly to model the process, and secondly using them to define a similarity measure, related to the Fisher kernel method, and then using a Memory-Based Learning (MBL) technique. These are evaluated and compared on data sets from English, German, Slovene and Arabic.

[Cla01b] Alexander Clark. Partially supervised learning of morphology with stochastic transducers. In Proc. of Natural Language Processing Pacific Rim Symposium, NLPRS 2001, pages 341-348, Tokyo, Japan, November 2001. [ bib | .pdf ]
[Cla01a] Alexander Clark. Learning morphology with Pair Hidden Markov Models. In Proc. of the Student Workshop at the 39th Annual Meeting of the Association for Computational Linguistics, pages 55-60, Toulouse, France, July 2001. [ bib | .pdf | .ps ]
In this paper I present a novel Machine Learning approach to the acquisition of stochastic string transductions based on Pair Hidden Markov Models (PHMMs), a model used in computational biology. I show how these models can be used to learn morphological processes in a variety of languages, including English, German and Arabic. Previous techniques for learning morphology have been restricted to languages with essentially concatenative morphology.

[Cla01c] Alexander Clark. Unsupervised induction of stochastic context free grammars with distributional clustering. In Proc. of Conference on Computational Natural Language Learning, pages 105-112, Toulouse, France, July 2001. [ bib | .ps | .pdf ]
[Cla01d] Alexander Clark. Unsupervised Language Acquisition: Theory and Practice. PhD thesis, COGS, University of Sussex, 2001. [ bib | .pdf ]
[Cla00] Alexander Clark. Inducing syntactic categories by context distribution clustering. In Proc. of Conference on Computational Natural Language Learning, pages 91-94, Lisbon, Portugal, 2000. [ bib | .ps | .pdf ]
[Cla98] Alexander Clark. Random fields in NLP. M.Sc. dissertation, COGS, University of Sussex, 1998. [ bib | .ps.gz ]

This file was generated by bibtex2html 1.96.