Standard grammars

Most programming language standards include formal syntax descriptions using some variant of Backus-Naur Form (BNF).

These pages are an archive of such grammars.

Perhaps oddly, real compilers rarely use the actual grammar from the standards document. Sometimes this is because the standard grammar is not LALR(1) or LL(1) and so cannot be directly used with well-known parser generator tools. Sometimes, especially in the case of languages like C++, compilers were developed in parallel with, or in advance of, the standard and so ended up with different grammars. Sometimes, people just want to do it their way.

We think this is an uncomfortable situation, given the undecidability of grammar equivalence and the level of trust we all place in the correctness of compilers. Implementing language translators is a rather error-prone task: implementors need usable grammars with clear provenance available in electronic form.

On a more mundane level, sometimes people ask 'how many states are there in the parse table for C++'. This is a pretty ill-formed kind of question because (i) there are at least six main versions of C++, (ii) there are (infinitely) many grammars for any given language, and (iii) there are a variety of parsing automata. A reasonable question is 'how many states are there in the LALR(1) automaton for the grammar in Appendix A of ISO 14882:1998?'

An added complication is that many grammars are expressed in some form of extended BNF which may need to be translated into regular BNF before parse tables can be constructed. There are many ways to do the translations: if you want an efficient YACC-generated parser then you will translate iteration into left recursion, but a left recursive grammar yields a non-terminating recursive descent parser so you would need to translate to right recursion instead if you wanted an LL(1) grammar.

Ideally, there would be a place where you could find standard grammars in their raw form, along with versions suitable for use with YACC and other parser generators and even pre-built tables for LL(1), LR(0), SLR(1), LR(1) and LALR(1) parsing laid out in an easy-to-use standard form.

Well, this is that place. We've developed a variety of tools to support this work. gramex is used to extract grammar rules from the visually formatted grammars that are used by the C, C++ and Java specifications; gramconv performs EBNF to BNF conversions and converts between various grammar formats; and gtb (the Grammar ToolBox) is used to make parse tables and produce other grammar-related objects.

Java Modula-2 Oberon Occam Pascal Verilog VHDL