Centre for Software Language Engineering, Royal Holloway, University of London

This is a brief, informal overview of the GLL algorithm, assuming a familiarity with context free grammars and parsing. It is designed as a reference for use when reading example parsers. Fill in the detail by reading the full published material, an introduction to the GLL recognition algorithm can be found here.

In a GLL parser the parse functions associated with a traditional
recursive descent (RD) parser are replaced with algorithm line labels,
goto statements and an explicit stack. For each
nonterminal *A* there is a labelled
block of code corresponding to each alternate of
*A* and a return label for each position immediately after each
nonterminal instance in the context free grammar.
Think of a GLL parser as traversing the grammar using the input string.
At each step of the traversal,
(i) if the grammar pointer is immediately before a terminal we attempt to
match it to the current input symbol; (ii) if it is immediately before a
nonterminal *B* then the pointer moves to the start of the block of code
associated with *B* (equivalent to a function call in an RD parser);
(iii) if it is at the end of an alternate of *A* then
it moves to the position immediately after the instance of *A* from which
it came (equivalent to a function return in an RD parser).
At step (ii) the return label is pushed onto the stack and at step
(iii) the stack is popped to retrieve the return label.

In general there can be multiple grammar traversals, and multiple
corresponding stacks. These are combined into a *graph structured
stack* (GSS), the stack elements are nodes of the graph and
there is a directed edge from node *u* to node *v* if
*u* is immediately
above *v* on a stack. For a GLL parser the edges of the GSS are each
labelled with a derivation tree node. This node will be the left child
of the derivation tree node constructed when the associated subparse
is complete. The GSS is built using the function
create()
and return labels are retrieved using the function
pop().

The FIRST sets of alternates are used as guards, and a travesal continues into an alternate only if the current input symbol belongs to the guard set. The function test() performs this check.

Grammar traversals may fork at the start of a block of code for
nonterminal *A* if more than one alternate of *A*
has the current input symbol in its FIRST set.
Each of the possible traversal threads is stored in a
process descriptor, a 4-tupl
(L,c_{U},c_{I},c_{N}).
The descriptors waiting to
be continued are held in a set ** R**, and a set of all
the descriptors created is also maintained to avoid repetition.
The GLL algorithm works by removing descriptors from

A GSS is the result of merging
many stacks so a node *u* is often the top element of
several stacks. If a new child is
added to *u* after it has been
popped, then the pop has to be applied to the new child. It is not
possible, in general, to order the actions so that no pop is applied
until after all the children have been added, so this situation is
handled by keeping a record of
the pop actions that have been performed so that they can be applied to new
children if necessary. A set ** P** is used for this, elements are
added to it by

Derivation trees are constructed by a GLL parser as it traverses the
grammar. The potentially multiple trees (the underlying grammar may
be ambiguous) are combined into a *shared packed parse forest*
(SPPF) which is partially binarised to ensure that it always of at
most cubic size in the length of the input string.
The functions
getNodeT() and getNodeP()
build the SPPF.

Here is a glossary of the notation that we commonly use in our algorithm descriptions.

© Elizabeth Scott and Adrian Johnstone

Centre for Software Language Engineering, Royal Holloway, University of London