[Cla15]  Alexander Clark. Learnability. In Brian MacWhinney and William O'Grady, editors, Handbook of Language Emergence, pages 377–395. John Wiley & Sons, Inc, 2015. [ bib ] 
[ACZZ14]  Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles. Editors' introduction. In Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles, editors, Algorithmic Learning Theory, volume 8776 of Lecture Notes in Computer Science, pages 1–7. Springer International Publishing, 2014. [ bib  DOI  http ] 
[CY14]  Alexander Clark and Ryo Yoshinaka. An algebraic approach to multiple context-free grammars. In Sergei Soloviev and Nicholas Asher, editors, Logical Aspects of Computational Linguistics, pages 57–69. Springer, 2014. [ bib  .pdf ] 
[Cla14] 
Alexander Clark.
Learning trees from strings: A strong learning algorithm for some
context-free grammars.
Journal of Machine Learning Research, 14:3537–3559, 2014.
[ bib 
.pdf ]
Standard models of language learning are concerned with weak learning: the learner, receiving as input only information about the strings in the language, must learn to generalise and to generate the correct, potentially infinite, set of strings generated by some target grammar. Here we define the corresponding notion of strong learning: the learner, again only receiving strings as input, must learn a grammar that generates the correct set of structures or parse trees. We formalise this using a modification of Gold's identification in the limit model, requiring convergence to a grammar that is isomorphic to the target grammar. We take as our starting point a simple learning algorithm for substitutable context-free languages, based on principles of distributional learning, and modify it so that it will converge to a canonical grammar for each language. We prove a corresponding strong learning result for a subclass of context-free grammars.

[CGL13a] 
Alexander Clark, Gianluca Giorgolo, and Shalom Lappin.
Statistical representation of grammaticality judgements: The limits
of n-gram models.
In Proceedings of the ACL Workshop on Cognitive Modelling and
Computational Linguistics, Sofia, Bulgaria, 2013.
[ bib ]
We use a set of enriched n-gram models to track grammaticality judgements for different sorts of passive sentences in English. We construct these models by specifying scoring functions to map the log probabilities (log-probs) of an n-gram model for a test set of sentences onto scores which depend on properties of the string related to the parameters of the model. We test our models on classification tasks for different kinds of passive sentences. Our experiments indicate that our n-gram models achieve high accuracy in identifying ill-formed passives in which ill-formedness depends on local relations within the n-gram frame, but they are far less successful in detecting non-local relations that produce unacceptability in other types of passive construction. We take these results to indicate some of the strengths and the limitations of word and lexical class n-gram models as candidate representations of speakers' grammatical knowledge.

[CGL13b] 
Alexander Clark, Gianluca Giorgolo, and Shalom Lappin.
Towards a statistical model of grammaticality.
In Proceedings of the 35th Annual Conference of the Cognitive
Science Society, 2013.
[ bib ]
The question of whether it is possible to characterise grammatical knowledge in probabilistic terms is central to determining the relationship of linguistic representation to other cognitive domains. We present a statistical model of grammaticality which maps the probabilities of a statistical model for sentences in parts of the British National Corpus (BNC) into grammaticality scores, using various functions of the parameters of the model. We test this approach with a classifier on test sets containing different levels of syntactic infelicity. With appropriate tuning, the classifiers achieve encouraging levels of accuracy. These experiments suggest that it may be possible to characterise grammaticality judgements in probabilistic terms using an enriched language model.

[CY13] 
Alexander Clark and Ryo Yoshinaka.
Distributional learning of parallel multiple context-free grammars.
Machine Learning, 2013.
[ bib 
DOI ]
Natural languages require grammars beyond context-free for their description. Here we extend a family of distributional learning algorithms to the class of Parallel Multiple Context-Free Grammars, which add two operations beyond the simple context-free operation of concatenation: the ability to interleave or intercalate strings, and an operation for copying strings, which allows the grammars to generate non-semilinear languages outside the class of mildly context-sensitive grammars. These grammars, if augmented with a suitable feature mechanism, are capable of representing all of the syntactic phenomena that have been claimed to exist in natural language.

[Cla13] 
Alexander Clark.
The syntactic concept lattice: Another algebraic theory of the
context-free languages?
Journal of Logic and Computation, 2013.
[ bib 
DOI 
Journal website 
.pdf ]
The syntactic concept lattice is a residuated lattice associated with a given formal language; it arises naturally as a generalization of the syntactic monoid in the analysis of the distributional structure of the language. In this article we define the syntactic concept lattice and present its basic properties, and its relationship to the universal automaton and the syntactic congruence; we consider several different equivalent definitions, as Galois connections, as maximal factorizations and finally using universal algebra to define it as an object that has a certain universal (terminal) property in the category of complete idempotent semirings that recognize a given language, applying techniques from automata theory to the theory of context-free grammars (CFGs). We conclude that grammars that are minimal, in a certain weak sense, will always have nonterminals that correspond to elements of this lattice, and claim that the syntactic concept lattice provides a natural system of categories for representing the denotations of CFGs.

[CL13] 
Alexander Clark and Shalom Lappin.
Complexity in language acquisition.
Topics in Cognitive Science, 5(1):89–110, 2013.
[ bib 
DOI 
http ]
Learning theory has frequently been applied to language acquisition, but discussion has largely focused on information-theoretic problems—in particular on the absence of direct negative evidence. Such arguments typically neglect the probabilistic nature of cognition and learning in general. We argue first that these arguments, and analyses based on them, suffer from a major flaw: they systematically conflate the hypothesis class and the learnable concept class. As a result, they do not allow one to draw significant conclusions about the learner. Second, we claim that the real problem for language learning is the computational complexity of constructing a hypothesis from input data. Studying this problem allows for a more direct approach to the object of study—the language acquisition device—rather than the learnable class of languages, which is epiphenomenal and possibly hard to characterize. The learnability results informed by complexity studies are much more insightful. They strongly suggest that target grammars need to be objective, in the sense that the primitive elements of these grammars are based on objectively definable properties of the language itself. These considerations support the view that language acquisition proceeds primarily through data-driven learning of some form. Keywords: Computational complexity, Language acquisition, Computational learning theory

[CY12] 
Alexander Clark and Ryo Yoshinaka.
Beyond semilinearity: Distributional learning of parallel multiple
context-free grammars.
In Jeffrey Heinz, Colin de la Higuera, and Tim Oates, editors,
Proceedings of the Eleventh International Conference on Grammatical
Inference, volume 21 of JMLR Workshop and Conference Proceedings,
pages 84–96, 2012.
[ bib 
.pdf ]
Semilinearity is widely held to be a linguistic invariant but, controversially, some linguistic phenomena in languages like Old Georgian and Yoruba seem to violate this constraint. In this paper we extend distributional learning to the class of parallel multiple context-free grammars, a class which, as far as is known, includes all attested natural languages, even taking an extreme view on these examples. These grammars may have a copying operation that can recursively copy constituents, allowing them to generate non-semilinear languages. We generalise the notion of a context to a class of functions that include copying operations. The congruential approach is ineffective at this level of the hierarchy; accordingly we extend this using dual approaches, defining nonterminals using sets of these generalised contexts. As a corollary we also extend the multiple context-free grammars using the lattice-based approaches.

[Cla12] 
Alexander Clark.
Logical grammars, logical theories.
In Denis Bechet and Alexandre Dikovsky, editors, Logical Aspects
of Computational Linguistics, pages 1–20. Springer, 2012.
[ bib 
.pdf ]
Residuated lattices form one of the theoretical backbones of the Lambek Calculus as the standard free models. They also appear in grammatical inference as the syntactic concept lattice, an algebraic structure canonically defined for every language L based on the lattice of all distributionally definable subsets of strings. Recent results show that it is possible to build representations, such as context-free grammars, based on these lattices, and that these representations will be efficiently learnable using distributional learning. In this paper we discuss the use of these syntactic concept lattices as models of Lambek grammars, and use the tools of algebraic logic to try to link the proof theoretic ideas of the Lambek calculus with the more algebraic approach taken in grammatical inference. We can then reconceive grammars of various types as equational theories of the syntactic concept lattice of the language. We then extend this naturally from models based on concatenation of strings, to ones based on concatenations of discontinuous strings, which takes us from context-free formalisms to mildly context-sensitive formalisms (multiple context-free grammars) and Morrill's displacement calculus.

[CL12]  Alexander Clark and Shalom Lappin. Computational learning theory and language acquisition. In R. Kempson, N. Asher, and T. Fernando, editors, Philosophy of Linguistics, pages 445–475. Elsevier, 2012. [ bib  .pdf ] 
[YC12a]  Ryo Yoshinaka and Alexander Clark. Polynomial time learning of some multiple context-free languages with a minimally adequate teacher. In Philippe de Groote and Mark-Jan Nederhof, editors, Formal Grammar, volume 7395 of Lecture Notes in Computer Science, pages 192–207. Springer Berlin Heidelberg, 2012. [ bib  DOI  http ] 
[YC12b]  Ryo Yoshinaka and Alexander Clark. Polynomial time learning of some multiple context-free languages with a minimally adequate teacher. In Philippe de Groote and Mark-Jan Nederhof, editors, Formal Grammar: 15th and 16th International Conference on Formal Grammar, pages 192–207. Springer-Verlag, 2012. [ bib ] 
[CS11]  Alexander Clark and William Gregory Sakas. Computational models of first language acquisition. Research on Language and Computation, 8(2–3):101–106, 2011. [ bib ] 
[Cla11b]  Alexander Clark. A language theoretic approach to syntactic structure. In Makoto Kanazawa, András Kornai, Marcus Kracht, and Hiroyuki Seki, editors, The Mathematics of Language, volume 6878 of Lecture Notes in Computer Science, pages 39–56. Springer Berlin Heidelberg, 2011. [ bib  DOI  http ] 
[Cla11c] 
Alexander Clark.
A language theoretic approach to syntactic structure.
In Makoto Kanazawa, András Kornai, Marcus Kracht, and Hiroyuki Seki,
editors, Proceedings of the 12th Meeting on the Mathematics of Language
(MOL), pages 39–56, Nara, Japan, 2011. Springer Berlin Heidelberg.
[ bib 
.pdf ]
We consider the idea of defining syntactic structure relative to a language, rather than to a grammar for a language. This allows us to define a notion of hierarchical structure that is independent of the particular grammar, and that depends rather on the properties of various algebraic structures canonically associated with a language. Our goal is not necessarily to recover the traditional ideas of syntactic structure invented by linguists, but rather to come up with an objective notion of syntactic structure that can be used for semantic interpretation. The role of syntactic structure is to bring together words and constituents that are apart on the surface, so they can be combined appropriately. The approach is based on identifying concatenation operations which are nontrivial and using these to constrain the allowable local trees in a structural description.

[Cla11a] 
Alexander Clark.
Inference of inversion transduction grammars.
In Proceedings of ICML, pages 201–208, Bellevue, Washington,
2011.
[ bib 
.pdf ]
We present the first polynomial algorithm for learning a class of inversion transduction grammars (ITGs) that implement context-free transducers, that is, functions from strings to strings. The class of transductions that we can learn properly includes all subsequential transductions. These algorithms are based on a generalisation of distributional learning; we prove correctness of our algorithm under an identification in the limit model.

[GSTC^{+}11]  D. Glowacka, J. Shawe-Taylor, A. Clark, C. de la Higuera, and M. Johnson. Introduction to the special topic on grammar induction, representation of language and language learning. Journal of Machine Learning Research, 12:1425–1428, 2011. [ bib ] 
[CCW11] 
Alexander Clark, Christophe Costa Florêncio, and Chris Watkins.
Languages as hyperplanes: grammatical inference with string kernels.
Machine Learning, 82:351–373, 2011.
10.1007/s10994-010-5218-3.
[ bib 
http ]
Using string kernels, languages can be represented as hyperplanes in a high dimensional feature space. We discuss the language-theoretic properties of this formalism with particular reference to the implicit feature maps defined by string kernels, considering the expressive power of the formalism, its closure properties and its relationship to other formalisms. We present a new family of grammatical inference algorithms based on this idea. We demonstrate that some mildly context-sensitive languages can be represented in this way and that it is possible to efficiently learn these using kernel PCA. We experimentally demonstrate the effectiveness of this approach on some standard examples of context-sensitive languages using small synthetic data sets.
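The hyperplane representation above rests on the feature map implicitly defined by a string kernel. As a minimal illustrative sketch (not the paper's algorithm; all names here are made up), the gap-weighted subsequence feature map can be computed by brute force for short strings, and the kernel is then just a dot product of feature vectors:

```python
from itertools import combinations
from collections import defaultdict

def gap_weighted_features(s, lam=0.5):
    """Explicit feature map of the gap-weighted subsequence kernel:
    every (possibly non-contiguous) subsequence u of s contributes
    lam**span for each occurrence, where span is the length of the
    window of s that the occurrence stretches over."""
    phi = defaultdict(float)
    for n in range(1, len(s) + 1):
        for idx in combinations(range(len(s)), n):
            u = ''.join(s[i] for i in idx)
            span = idx[-1] - idx[0] + 1
            phi[u] += lam ** span
    return phi

def kernel(s, t, lam=0.5):
    """Kernel value = dot product of the two feature vectors."""
    ps, pt = gap_weighted_features(s, lam), gap_weighted_features(t, lam)
    return sum(v * pt[u] for u, v in ps.items() if u in pt)
```

This enumeration is exponential in string length; for realistic inputs one would use the standard dynamic-programming formulation of the kernel instead, but the explicit map makes the hyperplane picture concrete.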

[CL11]  Alexander Clark and Shalom Lappin. Linguistic Nativism and the Poverty of the Stimulus. Wiley-Blackwell, Malden, MA, 2011. [ bib  .html ] 
[CCFW11]  A. Clark, C. Costa Florêncio, and C. Watkins. Languages as hyperplanes: grammatical inference with string kernels. Machine Learning, 82(3):351–373, 2011. [ bib ] 
[Cla11d]  Alexander Clark. A learnable representation for syntax using residuated lattices. In Philippe de Groote, Markus Egg, and Laura Kallmeyer, editors, Formal Grammar, volume 5591 of Lecture Notes in Computer Science, pages 183–198. Springer Berlin Heidelberg, 2011. [ bib  DOI  http ] 
[CEH10] 
Alexander Clark, Rémi Eyraud, and Amaury Habrard.
Using contextual representations to efficiently learn context-free
languages.
Journal of Machine Learning Research, 11:2707–2744, October
2010.
[ bib 
.pdf ]
We present a polynomial update time algorithm for the inductive inference of a large class of context-free languages using the paradigm of positive data and a membership oracle. We achieve this result by moving to a novel representation, called Contextual Binary Feature Grammars (CBFGs), which are capable of representing richly structured context-free languages as well as some context-sensitive languages. These representations explicitly model the lattice structure of the distribution of a set of substrings and can be inferred using a generalisation of distributional learning. This formalism is an attempt to bridge the gap between simple learnable classes and the sorts of highly expressive representations necessary for linguistic representation: it allows the learnability of a large class of context-free languages, that includes all regular languages and those context-free languages that satisfy two simple constraints. The formalism and the algorithm seem well suited to natural language and in particular to the modeling of first language acquisition. Preliminary experimental results confirm the effectiveness of this approach.

[Cla10e] 
Alexander Clark.
Towards general algorithms for grammatical inference.
In M. Kanazawa et al., editors, Proceedings of ALT, pages 11–30.
Springer-Verlag, Canberra, Australia, October 2010.
Invited Paper.
[ bib 
.pdf ]
Many algorithms for grammatical inference can be viewed as instances of a more general algorithm which maintains a set of primitive elements, which distributionally define sets of strings, and a set of features or tests that constrain various inference rules. Using this general framework, which we cast as a process of logical inference, we reanalyse Angluin's famous L* algorithm and several recent algorithms for the inference of context-free grammars and multiple context-free grammars. Finally, to illustrate the advantages of this approach, we extend it to the inference of functional transductions from positive data only, and we present a new algorithm for the inference of finite state transducers.

[Cla10a] 
Alexander Clark.
Distributional learning of some context-free languages with a
minimally adequate teacher.
In José Sempere and Pedro Garcia, editors, Grammatical
Inference: Theoretical Results and Applications. Proceedings of the
International Colloquium on Grammatical Inference, pages 24–37.
Springer-Verlag, September 2010.
[ bib ]
Angluin showed that the class of regular languages could be learned from a Minimally Adequate Teacher (MAT) providing membership and equivalence queries. Clark and Eyraud (2007) showed that some context-free grammars can be identified in the limit from positive data alone by identifying the congruence classes of the language. In this paper we consider learnability of context-free languages using a MAT. We show that there is a natural class of context-free languages, that includes the class of regular languages, that can be polynomially learned from a MAT, using an algorithm that is an extension of Angluin's L* algorithm.

[Cla10b] 
Alexander Clark.
Efficient, correct, unsupervised learning of context-sensitive
languages.
In Proceedings of the Fourteenth Conference on Computational
Natural Language Learning, pages 28–37, Uppsala, Sweden, July 2010.
Association for Computational Linguistics.
[ bib 
.pdf ]
A central problem for NLP is grammar induction: the development of unsupervised learning algorithms for syntax. In this paper we present a lattice-theoretic representation for natural language syntax, called Distributional Lattice Grammars. These representations are objective or empiricist, based on a generalisation of distributional learning, and are capable of representing all regular languages, some but not all context-free languages and some non-context-free languages. We present a simple algorithm for learning these grammars together with a complete self-contained proof of the correctness and efficiency of the algorithm.

[YC10] 
Ryo Yoshinaka and Alexander Clark.
Polynomial time learning of some multiple context-free languages with
a minimally adequate teacher.
In Proceedings of the 15th Conference on Formal Grammar,
Copenhagen, Denmark, 2010.
[ bib 
.pdf ]
We present an algorithm for the inference of some Multiple Context-Free Grammars from Membership and Equivalence Queries, using the Minimally Adequate Teacher model of Angluin. This is an extension of the congruence-based methods for learning some Context-Free Grammars proposed by Clark (ICGI 2010). We define the natural extension of the syntactic congruence to tuples of strings, and demonstrate that we can efficiently learn the class of Multiple Context-Free Grammars where the nonterminals correspond to the congruence classes under this relation.

[Cla10c] 
Alexander Clark.
Learning context-free grammars with the syntactic concept lattice.
In José Sempere and Pedro Garcia, editors, Grammatical
Inference: Theoretical Results and Applications. Proceedings of the
International Colloquium on Grammatical Inference, pages 38–51.
Springer-Verlag, 2010.
[ bib 
.pdf ]
The Syntactic Concept Lattice is a residuated lattice based on the distributional structure of a language; the natural representation based on this is a context-sensitive formalism. Here we examine the possibility of basing a context-free grammar (CFG) on the structure of this lattice; in particular by choosing nonterminals to correspond to concepts in this lattice. We present a learning algorithm for context-free grammars which uses positive data and membership queries, and prove its correctness under the identification in the limit paradigm. Since the lattice itself may be infinite, we consider only a polynomially bounded subset of the set of concepts, in order to get an efficient algorithm. We compare this on the one hand to learning algorithms for context-free grammars, where the nonterminals correspond to congruence classes, and on the other hand to the use of context-sensitive techniques such as Binary Feature Grammars and Distributional Lattice Grammars. The class of CFGs that can be learned in this way includes inherently ambiguous and thus nondeterministic languages; this approach therefore breaks through an important barrier in CFG inference. ERRATA: there are two significant errors in this paper. I am grateful to Hans Leiss for pointing these out to me, and proposing modifications to the algorithm that correct the errors.

[CFL10]  Alexander Clark, Chris Fox, and Shalom Lappin, editors. The Handbook of Computational Linguistics and Natural Language Processing. Wiley-Blackwell, 2010. [ bib  .html ] 
[CL10]  Alexander Clark and Shalom Lappin. Unsupervised learning and grammar induction. In Alexander Clark, Chris Fox, and Shalom Lappin, editors, The Handbook of Computational Linguistics and Natural Language Processing, pages 197–220. Wiley-Blackwell, 2010. [ bib ] 
[Cla10d] 
Alexander Clark.
Three learnable models for the description of language.
In Adrian-Horia Dediu, Henning Fernau, and Carlos Martín-Vide, editors,
Language and Automata Theory and Applications, Fourth International
Conference, LATA 2010, LNCS, pages 16–31. Springer-Verlag, 2010.
[ bib 
.pdf ]
Learnability is a vital property of formal grammars: representation classes should be defined in such a way that they are learnable. One way to build learnable representations is by making them objective or empiricist: the structure of the representation should be based on the structure of the language. Rather than defining a function from representation to language we should start by defining a function from the language to the representation: following this strategy gives classes of representations that are easy to learn. We illustrate this approach with three classes, defined in analogy to the lowest three levels of the Chomsky hierarchy. First, we recall the canonical deterministic finite automaton, where the states of the automaton correspond to the right congruence classes of the language. Secondly, we define context-free grammars where the nonterminals of the grammar correspond to the syntactic congruence classes, and where the productions are defined by the syntactic monoid; finally we define a residuated lattice structure from the Galois connection between strings and contexts, which we call the syntactic concept lattice, and base a representation on this, which allows us to define a class of languages that includes some non-context-free languages, many context-free languages and all regular languages. All three classes are efficiently learnable under suitable learning paradigms.
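The first of the three constructions, the canonical DFA whose states are the right congruence classes, can be made concrete for a finite language: two prefixes belong to the same state exactly when they have the same residual. This is only an illustrative sketch (function names are hypothetical, and a finite language sidesteps the issues an infinite language raises):

```python
from collections import defaultdict

def residual_states(language):
    """States of the canonical (minimal) DFA of a finite language:
    prefixes u are grouped by their residual u^{-1}L = {v : uv in L},
    i.e. by the right congruence class they fall into."""
    prefixes = {w[:i] for w in language for i in range(len(w) + 1)}
    residuals = defaultdict(set)
    for u in prefixes:
        res = frozenset(w[len(u):] for w in language if w.startswith(u))
        residuals[res].add(u)
    return list(residuals.values())
```

For L = {ab, aab, aaab}, the three accepting prefixes share the residual {""} and hence collapse into a single state, while "", "a", "aa" and "aaa" each get their own.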

[CL09] 
Alexander Clark and Shalom Lappin.
Another look at indirect negative evidence.
In Proceedings of the EACL Workshop on Cognitive Aspects of
Computational Language Acquisition, Athens, March 2009.
[ bib 
.pdf ]
Indirect negative evidence is clearly an important way for learners to constrain overgeneralisation, and yet a good learning-theoretic analysis has yet to be provided for this, whether in a PAC or a probabilistic identification in the limit framework. In this paper we suggest a theoretical analysis of indirect negative evidence that allows the presence of ungrammatical strings in the input and also accounts for the relationship between grammaticality/acceptability and probability. Given independently justified assumptions about lower bounds on the probabilities of grammatical strings, we establish that a limited number of membership queries of some strings can be probabilistically simulated.

[CEH09] 
Alexander Clark, Rémi Eyraud, and Amaury Habrard.
A note on contextual binary feature grammars.
In EACL 2009 workshop on Computational Linguistic Aspects of
Grammatical Inference, Athens, Greece, 2009. ACL.
[ bib 
.pdf ]
Contextual Binary Feature Grammars were recently proposed by Clark et al. (2008) as a learnable representation for richly structured context-free and context-sensitive languages. In this paper we examine the representational power of the formalism, its relationship to other standard formalisms and language classes, and its appropriateness for modelling natural language.

[Cla09] 
Alexander Clark.
A learnable representation for syntax using residuated lattices.
In Proceedings of the 14th Conference on Formal Grammar,
Bordeaux, France, 2009.
[ bib 
.pdf ]
We propose a representation for natural language syntax based on the theory of residuated lattices: in particular on the Galois lattice between contexts and substrings, which we call the syntactic concept lattice. The natural representation derived from this is a richly structured context-sensitive formalism that can be learned using a generalisation of distributional learning. In this paper we define the basic algebraic properties of the syntactic concept lattice, together with a representation derived from this lattice and discuss the generative power of the formalism. We establish some basic results which show that these representations, because they are defined language-theoretically, can be inferred from information about the set of grammatical strings of the language. We also discuss the relation to other grammatical formalisms, notably categorial grammar and context-free grammars. We claim that this lattice-based formalism is plausibly both learnable from evidence about the grammatical strings of a language and may be powerful enough to represent natural languages, and thus presents a potential solution to the central problem of theoretical linguistics.

[CEH08] 
Alexander Clark, Rémi Eyraud, and Amaury Habrard.
A polynomial algorithm for the inference of context-free languages.
In Proceedings of International Colloquium on Grammatical
Inference, pages 29–42. Springer-Verlag, September 2008.
[ bib 
.pdf ]
We present a polynomial algorithm for the inductive inference of a large class of context-free languages that includes all regular languages. The algorithm uses a representation which we call Binary Feature Grammars, based on a set of features, capable of representing richly structured context-free languages as well as some context-sensitive languages. More precisely, we focus on a particular case of this representation where the features correspond to contexts appearing in the language. Using the paradigm of positive data and a membership oracle, we can establish that all context-free languages that satisfy two constraints on the context distributions can be identified in the limit by this approach. The polynomial time algorithm we propose is based on a generalisation of distributional learning and uses the lattice of context occurrences. The formalism and the algorithm seem well suited to natural language and in particular to the modelling of first language acquisition.

[CCM08]  Alexander Clark, François Coste, and Laurent Miclet, editors. Grammatical Inference: Algorithms and Applications, 9th International Colloquium, ICGI 2008, Saint-Malo, France, September 22–24, 2008, Proceedings, volume 5278 of Lecture Notes in Computer Science. Springer, 2008. [ bib  http ] 
[CW08] 
Alexander Clark and Chris Watkins.
Some alternatives to Parikh matrices using string kernels.
Fundamenta Informaticae, 84(3–4):291–303, 2008.
[ bib ]
We describe methods of representing strings as real-valued vectors or matrices; we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices, which have been studied intensively over the last few years as a powerful tool in the study of combinatorics over words. In the field of machine learning, there is widespread use of string kernels, which use analogous mappings into high dimensional feature spaces based on the occurrences of subwords or factors. In this paper we show how one can use string kernels to construct two alternatives to Parikh matrices, that overcome some of the limitations of the Parikh matrix construction. These are morphisms from the free monoid to rings of real-valued matrices under multiplication: one is based on the subsequence kernel and the other on the gap-weighted string kernel. For the latter kernel we demonstrate that for many values of the gap-weight hyperparameter the resulting morphism is injective.
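The classical Parikh matrix construction that these morphisms generalise is easy to state concretely: each letter of the ordered alphabet maps to an elementary upper-triangular matrix, and a word maps to the product of its letters' matrices, whose entries then count scattered subwords. A minimal sketch (illustrative only, not the paper's alternatives; names are made up):

```python
def mat_mul(A, B):
    """Multiply two square integer matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def parikh_matrix(word, alphabet):
    """Parikh matrix of `word` over the ordered `alphabet`: the entry
    [i][j] above the diagonal counts occurrences of the scattered
    subword alphabet[i]...alphabet[j-1] in `word`."""
    k = len(alphabet)
    M = [[int(i == j) for j in range(k + 1)] for i in range(k + 1)]
    for c in word:
        i = alphabet.index(c)
        # elementary matrix for letter c: identity plus a 1 at (i, i+1)
        E = [[int(r == s) for s in range(k + 1)] for r in range(k + 1)]
        E[i][i + 1] = 1
        M = mat_mul(M, E)
    return M
```

For "abab" over the alphabet "ab", the matrix records 2 occurrences of a, 2 of b, and 3 of the scattered subword ab; being a morphism, the map sends concatenation of words to multiplication of matrices.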

[CE07] 
Alexander Clark and Rémi Eyraud.
Polynomial identification in the limit of substitutable context-free
languages.
Journal of Machine Learning Research, 8:1725–1745, August
2007.
[ bib 
.pdf ]
This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free languages. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. It transpires that it is not necessary to identify constituents in order to learn a context-free language: it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. We also discuss modifications to the algorithm that produce a reduction system, rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin's notion of reversibility for regular languages. We also demonstrate that an implementation of this algorithm is capable of learning a classic example of structure dependent syntax in English: this constitutes a refutation of an argument that has been used in support of nativist theories of language.
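The substitutability principle at the heart of this paper can be sketched in a few lines: extract every substring of a finite sample together with its contexts, then merge two substrings as soon as they share a single context. This toy sketch (not the paper's full algorithm; it assumes word-level tokenisation and all names are illustrative) computes the resulting classes:

```python
from collections import defaultdict

def congruence_classes(sample):
    """Group the substrings of a finite sample by Harris substitutability:
    substrings u and v are merged whenever they share a context (l, r)
    such that both l u r and l v r occur in the sample. Sentences are
    tuples of words, so substrings and contexts are tuples too."""
    contexts = defaultdict(set)              # substring -> its contexts
    for w in sample:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                contexts[w[i:j]].add((w[:i], w[j:]))
    parent = {u: u for u in contexts}        # union-find over substrings
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    by_context = defaultdict(list)           # context -> substrings seen in it
    for u, cs in contexts.items():
        for c in cs:
            by_context[c].append(u)
    for us in by_context.values():           # merge everything sharing a context
        for u in us[1:]:
            parent[find(u)] = find(us[0])
    classes = defaultdict(set)
    for u in contexts:
        classes[find(u)].add(u)
    return list(classes.values())
```

On the sample "the cat sleeps" / "the dog sleeps", the words "cat" and "dog" share the context (the, sleeps) and so end up in one class, which is exactly the generalisation step the learner exploits.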

[Cla07] 
Alexander Clark.
Learning deterministic context-free grammars: the Omphalos
competition.
Machine Learning, 66(1):93–110, January 2007.
[ bib 
.pdf ]
This paper describes the winning entry to the Omphalos context-free grammar learning competition. We describe a context-free grammatical inference algorithm operating on positive data only, which integrates an information-theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency. The competition is discussed from the perspective of a competitor. We discuss a class of deterministic grammars, the Nonterminally Separated (NTS) grammars, that have a property relied on by our algorithm, and consider the possibilities of extending the algorithm to larger classes of languages.

[CCFW06] 
Alexander Clark, Christophe Costa Florêncio, and Chris Watkins.
Languages as hyperplanes: grammatical inference with string kernels.
In Proceedings of the European Conference on Machine Learning
(ECML), pages 90–101, 2006.
[ bib 
.pdf ]
Using string kernels, languages can be represented as hyperplanes in a high-dimensional feature space. We present a new family of grammatical inference algorithms based on this idea. We demonstrate that some mildly context-sensitive languages can be represented in this way, and that it is possible to learn them efficiently using kernel PCA. We present some experiments demonstrating the effectiveness of this approach on some standard examples of context-sensitive languages, using small synthetic data sets.

[Cla06a] 
Alexander Clark.
Large scale inference of deterministic transductions: Tenjinno
problem 1.
In Proceedings of the 8th International Colloquium on
Grammatical Inference (ICGI), pages 227–239, 2006.
[ bib 
.pdf ]
We discuss the problem of large-scale grammatical inference in the context of the Tenjinno competition, with reference to the inference of deterministic finite state transducers, and discuss the design of the algorithms and the design and implementation of the program that solved the first problem. Though the OSTIA algorithm has good asymptotic guarantees for this class of problems, the amount of data required is prohibitive. We therefore developed a new strategy for inferring large transducers, better adapted to large random instances of the type in question, which combines traditional state-merging algorithms for the inference of finite state automata with EM-based alignment algorithms and state-splitting algorithms.

[Cla06b] 
Alexander Clark.
PAC-learning unambiguous NTS languages.
In Proceedings of the 8th International Colloquium on
Grammatical Inference (ICGI), pages 59–71, 2006.
[ bib 
.pdf ]
Non-terminally separated (NTS) languages are a subclass of deterministic context-free languages where there is a stable relationship between the substrings of the language and the nonterminals of the grammar. We show that when the distribution of samples is generated by a PCFG based on the same grammar as the target language, the class of unambiguous NTS languages is PAC-learnable from positive data alone, with polynomial bounds on data and computation.

[CCFWS06] 
Alexander Clark, Christophe Costa Florêncio, Chris Watkins, and Mariette
Serayet.
Planar languages and learnability.
In Proceedings of the 8th International Colloquium on
Grammatical Inference (ICGI), pages 148–160, 2006.
[ bib 
.pdf ]
Strings can be mapped into Hilbert spaces using feature maps such as the Parikh map. Languages can then be defined as the preimage of hyperplanes in the feature space, rather than using grammars or automata. These are the planar languages. In this paper we show that, using techniques from kernel-based learning, we can represent and efficiently learn, from positive data alone, various linguistically interesting context-sensitive languages. In particular, we show that the cross-serial dependencies in Swiss German, which established the non-context-freeness of natural language, are learnable using a standard kernel. We demonstrate the polynomial-time identifiability in the limit of these classes, discuss some of their language-theoretic properties, and consider their relationship to the choice of kernel/feature map.
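The Parikh map mentioned in the abstract has a one-line definition. This toy sketch (alphabet and example language chosen for illustration, not taken from the paper) shows how a non-regular language becomes a hyperplane under the map:

```python
def parikh(s, alphabet=("a", "b")):
    """Parikh map: send a string to its vector of symbol counts."""
    return tuple(s.count(c) for c in alphabet)

# Under this map, the context-free language {a^n b^n : n >= 1} lies on
# the hyperplane x_a - x_b = 0, so it is 'planar' in the paper's sense.
for s in ("ab", "aabb", "aaabbb"):
    x = parikh(s)
    assert x[0] - x[1] == 0
```

Richer feature maps (e.g. counting substrings rather than single symbols) give higher-dimensional spaces in which more languages are preimages of hyperplanes.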

[CE06a] 
Alexander Clark and Remi Eyraud.
Learning auxiliary fronting with grammatical inference.
In Proceedings of CoNLL, pages 125–132, New York, 2006.
[ bib 
.pdf ]
We present a simple context-free grammatical inference algorithm, and prove that it is capable of learning an interesting subclass of context-free languages. We also demonstrate that an implementation of this algorithm is capable of learning auxiliary fronting in polar interrogatives (AFIPI) in English. This has been one of the most important test cases in language acquisition over the last few decades. We demonstrate that learning can proceed even in the complete absence of examples of particular constructions, and thus that debates about the frequency of occurrence of such constructions are irrelevant. We discuss the implications of this for the type of innate learning biases that must be hypothesized to explain first language acquisition.

[GCA06b] 
Maria Georgescul, Alexander Clark, and Susan Armstrong.
Word distributions for thematic segmentation in a support vector
machine approach.
In Proceedings of CoNLL, pages 101–108, New York, 2006.
[ bib 
.pdf ]
We investigate the appropriateness of using a technique based on support vector machines for identifying the thematic structure of text streams. The thematic segmentation task is modeled as a binary classification problem, where the different classes correspond to the presence or the absence of a thematic boundary. Experiments are conducted with this approach using features based on word distributions through the text. We provide empirical evidence that our approach is robust, by showing good performance on three different data sets. In particular, substantial improvement is obtained over previously published results of word-distribution-based systems when evaluation is done on a corpus of recorded and transcribed multi-party dialogs.

[GCA06a] 
Maria Georgescul, Alexander Clark, and Susan Armstrong.
An analysis of methodological and quantitative aspects in the
evaluation of thematic segmentation algorithms.
In Proceedings of SIGDIAL, Sydney, Australia, 2006.
[ bib 
.pdf ]
We consider here the task of linear thematic segmentation of text documents, using features based on word distributions in the text. For this task, a typical and often implicit assumption in previous studies is that a document has just one topic, and therefore many algorithms have been tested, and have shown encouraging results, on artificial data sets generated by putting together parts of different documents. We show that evaluation on synthetic data is potentially misleading and fails to give an accurate evaluation of performance on real data. Moreover, we provide a critical review of the evaluation metrics existing in the literature, and we propose an improved evaluation metric.

[CE06b] 
Alexander Clark and Remi Eyraud.
Learning auxiliary fronting with grammatical inference.
In Proceedings of the 28th Annual meeting of the Cognitive
Science Society, Vancouver, Canada, 2006.
This paper is closely related to the paper at CoNLL, but is more
directed to a cognitive science audience.
[ bib 
.pdf ]
We present a simple context-free grammatical inference algorithm, and prove that it is capable of learning an interesting subclass of context-free languages. We also demonstrate that an implementation of this algorithm is capable of learning auxiliary fronting in polar interrogatives (AFIPI) in English. This has been one of the most important test cases in language acquisition over the last few decades. We demonstrate that learning can proceed even in the complete absence of examples of particular constructions, and thus that debates about the frequency of occurrence of such constructions are irrelevant. We discuss the implications of this for the type of innate learning biases that must be hypothesized to explain first language acquisition.

[SCCX05]  William Gregory Sakas, Alexander Clark, James Cussens, and Aris Xanthos, editors. Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition. Association for Computational Linguistics, Ann Arbor, Michigan, June 2005. [ bib ] 
[CE05] 
Alexander Clark and Remi Eyraud.
Identification in the limit of substitutable context free languages.
In Sanjay Jain, Hans Ulrich Simon, and Etsuji Tomita, editors,
Proceedings of The 16th International Conference on Algorithmic Learning
Theory, pages 283–296. Springer-Verlag, 2005.
[ bib 
.pdf ]
This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm, from positive data only, for a subclass of context-free grammars. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. We also discuss modifications to the algorithm that produce a reduction system rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin's notion of reversibility for regular languages.

[Cla04] 
Alexander Clark.
Grammatical inference and first language acquisition.
In Workshop on Psychocomputational Models of Human Language
Acquisition, Geneva, Switzerland, August 2004.
[ bib 
.pdf ]
One argument for parametric models of language has been learnability in the context of first language acquisition. The claim is made that “logical” arguments from learnability theory require non-trivial constraints on the class of languages. Initial formalisations of the problem (Gold, 1967) are, however, inapplicable to this particular situation. In this paper we construct an appropriate formalisation of the problem, using a modern vocabulary drawn from statistical learning theory and grammatical inference, and looking in detail at the relevant empirical facts. We claim that a variant of the Probably Approximately Correct (PAC) learning framework (Valiant, 1984), with positive samples only, modified so that it is not completely distribution-free, is the appropriate choice. Some negative results derived from cryptographic problems appear to apply in this situation, but the existence of algorithms with provably good performance, together with subsequent work, shows that these negative results are not as strong as they initially appear, and that recent algorithms for learning regular languages partially satisfy our criteria. We then discuss the applicability of these results to parametric and non-parametric models.

[LCL04]  Jey Han Lau, Alexander Clark, and Shalom Lappin. Measuring gradience in speakers’ grammaticality judgements. In Proceedings of the 36th Annual Conference of the Cognitive Science Society, Quebec City, July 2004. [ bib ] 
[CT04a] 
Alexander Clark and Franck Thollard.
PAC-learnability of probabilistic deterministic finite state
automata.
Journal of Machine Learning Research, 5:473–497, May 2004.
[ bib 
.pdf ]
We study the learnability of Probabilistic Deterministic Finite State Automata under a modified PAC-learning criterion. We argue that it is necessary to add additional parameters to the sample complexity polynomial, namely a bound on the expected length of strings generated from any state, and a bound on the distinguishability between states. With this, we demonstrate that the class of PDFAs is PAC-learnable using a variant of a standard state-merging algorithm and the Kullback-Leibler divergence as the error function.

[CT04b] 
Alexander Clark and Franck Thollard.
Partially distribution-free learning of regular languages from
positive samples.
In Proceedings of COLING, pages 85–91, Geneva, Switzerland,
2004.
[ bib 
.pdf ]
Regular languages are widely used in NLP today, in spite of their shortcomings. Efficient algorithms that can reliably learn these languages, and which in realistic applications must use only positive samples, are necessary. These languages are not learnable under traditional distribution-free criteria. We claim that an appropriate learning framework is PAC learning where the distributions are constrained to be generated by a class of stochastic automata with support equal to the target concept. We discuss how this is related to other learning paradigms. We then present a simple learning algorithm for regular languages, and a self-contained proof that it learns according to this partially distribution-free criterion.

[Cla03b] 
Alexander Clark.
Preprocessing very noisy text.
In Proceedings of Workshop on Shallow Processing of Large
Corpora, Corpus Linguistics, Lancaster, 2003.
[ bib 
.pdf ]
Existing techniques for tokenisation and sentence boundary identification are extremely accurate when the data is perfectly clean (Mikheev, 2002), and have been applied successfully to corpora of news feeds and other post-edited corpora. Informal written texts are readily available, and with the growth of other informal text modalities (IRC, ICQ, SMS etc.) are becoming an interesting alternative, perhaps better suited as a source for lexical resources and language models for studies of dialogue and spontaneous speech. However, the high degree of spelling errors, irregularities and idiosyncrasies in the use of punctuation, white space and capitalisation require specialised tools. In this paper we study the design and implementation of a tool for preprocessing and normalisation of noisy corpora. We argue that rather than having separate tools for tokenisation, segmentation and spelling correction organised in a pipeline, a unified tool is appropriate because of certain specific sorts of errors. We describe how a noisy channel model can be used at the character level to perform this. We describe how the sequence of tokens needs to be divided into various types depending on their characteristics, and also how the modelling of whitespace needs to be conditioned on the type of the preceding and following tokens. We use trainable stochastic transducers to model typographical errors and other orthographic changes, and a variety of sequence models for white space and the different sorts of tokens. We discuss the training of the models and various efficiency issues related to the decoding algorithm, and illustrate this with examples from a 100 million word corpus of Usenet news.
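The character-level noisy channel idea can be caricatured in a few lines. The priors and error model below are invented for illustration, not the trained stochastic transducers described in the paper: a candidate correction c of an observed token o is scored by P(c)·P(o | c).

```python
import math

# Hypothetical unigram priors over candidate corrections.
prior = {"the": 0.9, "teh": 0.001, "then": 0.099}

def channel(observed, candidate, p_err=0.01):
    """Crude channel model: each character survives with prob. 1 - p_err;
    any mismatch between the tokens is treated as a single error."""
    if observed == candidate:
        return (1 - p_err) ** len(candidate)
    return p_err

def correct(observed, candidates):
    """Pick the candidate maximising log P(c) + log P(observed | c)."""
    return max(candidates,
               key=lambda c: math.log(prior[c]) + math.log(channel(observed, c)))

assert correct("teh", ["the", "teh", "then"]) == "the"
```

The real system replaces both toy components: the channel is a trainable stochastic transducer over characters, and the source model is conditioned on token type and surrounding whitespace.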

[Cla03a]  Alexander Clark. Combining distributional and morphological information for part of speech induction. In Proceedings of the tenth Annual Meeting of the European Association for Computational Linguistics (EACL), pages 59–66, 2003. [ bib  .pdf  .ps ] 
[ACC+03] 
S. Armstrong, A. Clark, G. Coray, M. Georgescul, V. Pallotta, A. PopescuBelis,
D. Portabella, M. Rajman, and M. Starlander.
Natural language queries on natural language data: a database of
meeting dialogues.
In 8th International Conference on Applications of Natural
Language to Information Systems, Burg/Cottbus, Germany, 2003.
[ bib ]
This paper describes an integrated system that enables the storage and retrieval of meeting transcripts (e.g. staff meetings). The system gives users who have not attended a meeting, or who want to review a particular point, enhanced access to an annotated version of the recorded data. This paper describes the various stages in the processing, storage and query of the data. First, we put forward the idea of shallow dialogue processing, in order to extract significant features of the meeting transcriptions for storage in a database, whose structure is briefly outlined. Low-level access to the database is provided as a Web service, which can be connected to several interfaces. A description of how multimodal input can be used with VoiceXML is also provided, thus offering an easy solution for voice- and web-based access to the dialogue data. The paper ends with considerations about the available data and its use in the current version of the system.

[Cla02] 
Alexander Clark.
Memorybased learning of morphology with stochastic transducers.
In Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACL), pages 513–520, 2002.
[ bib 
.pdf ]
This paper discusses the supervised learning of morphology using stochastic transducers, trained using the Expectation-Maximization (EM) algorithm. Two approaches are presented: first, using the transducers directly to model the process, and secondly, using them to define a similarity measure, related to the Fisher kernel method, and then using a Memory-Based Learning (MBL) technique. These are evaluated and compared on data sets from English, German, Slovene and Arabic.

[Cla01b]  Alexander Clark. Partially supervised learning of morphology with stochastic transducers. In Proc. of Natural Language Processing Pacific Rim Symposium, NLPRS 2001, pages 341–348, Tokyo, Japan, November 2001. [ bib  .pdf ] 
[Cla01a] 
Alexander Clark.
Learning morphology with Pair Hidden Markov Models.
In Proc. of the Student Workshop at the 39th Annual Meeting of
the Association for Computational Linguistics, pages 55–60, Toulouse,
France, July 2001.
[ bib 
.pdf 
.ps ]
In this paper I present a novel Machine Learning approach to the acquisition of stochastic string transductions based on Pair Hidden Markov Models (PHMMs), a model used in computational biology. I show how these models can be used to learn morphological processes in a variety of languages, including English, German and Arabic. Previous techniques for learning morphology have been restricted to languages with essentially concatenative morphology.

[Cla01c]  Alexander Clark. Unsupervised induction of stochastic context free grammars with distributional clustering. In Proc. of Conference on Computational Natural Language Learning, pages 105–112, Toulouse, France, July 2001. [ bib  .ps  .pdf ] 
[Cla01d]  Alexander Clark. Unsupervised Language Acquisition: Theory and Practice. PhD thesis, COGS, University of Sussex, 2001. [ bib  .pdf ] 
[Cla00]  Alexander Clark. Inducing syntactic categories by context distribution clustering. In Proc. of Conference on Computational Natural Language Learning, pages 91–94, Lisbon, Portugal, 2000. [ bib  .ps  .pdf ] 
[Cla98]  Alexander Clark. Random fields in NLP. M.Sc. dissertation, COGS, University of Sussex, 1998. [ bib  .ps.gz ] 
This file was generated by bibtex2html 1.96.