Formalisation and Static Analysis of CSS3 SelectorsMatthew Hague Anthony Widjaja LinRoyal Holloway, University of London YaleNUS College, Singapore 
Abstract: Cascading Style Sheets (CSS) is the de facto language for styling a webpage, as developed and maintained by World Wide Web Consortium (W3C). Together with HTML and JavaScript, CSS has played an integral role in modern web technologies. At the heart of CSS are CSS selectors, which are formal specifications of which nodes in a given document object model (DOM) are to be selected for declarations of certain visual properties (e.g. set colour to red). Despite the apparent simplicity of CSS selectors, CSS selectors allow many complex features, e.g., tree traversals, string constraints on attributes, constraints for counting the number of children, and restricted negations. This paper describes the first precise formalisation of CSS Level 3 (CSS3) selector language, currently the stable version of the language. Unlike previous formalisations, our formalisation attempts to capture CSS3 in its entirety, as specified in the latest W3C Recommendation. We then study two fundamental static analysis problems which have found applications in optimisation of CSS files: (1) satisfiability: given a CSS selector, decide whether it can ever select a node in some DOM tree (2) intersection: given two CSS selectors, decide whether they can ever select the same node in some DOM tree. The second problem, in particular, has immediate applications in minimisation of CSS files by refactoring similar styling rules. We show that they are decidable and, in fact, NPcomplete. Furthermore, we show that both problems can be reduced in polynomialtime to satisfiability of quantifierfree Presburger formulas, for which highly optimised SMT solvers are available.
Cascading Style Sheets (CSS), together with HTML and JavaScript, has been a cornerstone language for the web since its inception in 1996. The language facilitates a clear separation between presentation and web content, and plays a pivotal role in the design of a semantically meaningful HTML.
CSS is widely perceived to be a simple language. One simply needs to define a sequence of (styling) rules, each containing a set of (node) selectors (e.g. h1, h2, which select h1 and h2 HTML elements in a given document) and a set of property declarations (e.g. fontweight: normal; and marginbottom: 10px;) that will be assigned to all selected nodes^{1}. The power of CSS comes from its powerful selectors in that they can express navigational properties over web documents, constraints for counting the number of children, string constraints for reasoning about attributes, conjunctions, and negations, to name a few. For example, the CSS selector
selects all nodes with class .moon but not class .water such that it has an ancestor with class .planet such that it is the ith child of the parent where i ≡ 1 (mod 7 ). In this respect, CSS selectors exhibit a striking similarity to XPath [1] which is a popular query language for selecting nodes or nodesets in an XML document. Both CSS and XPath are defined and maintained by World Wide Web Consortium (W3C).
Despite the apparent simplicity of the language, CSS selectors can be quite subtle. For example, let us take the latest evolution of the CSS language, i.e., CSS3 [7]. Consider the following properties over web documents:
It might come as a surprise that (1), (4), (5) are expressible as CSS selectors, while (2), (3), and (6) are not expressible as CSS selectors. [See Remark 1 for why Property (6) is not expressible.] Again, it might come as a surprise that (3) is in fact expressible in the XPath query language^{2}. This suggests that the resemblance to XPath, whose expressivity issues have received a lot of attention in databases [21, 22, 32, 33, 31], is again not as straightforward.
In recent years there have been a lot of work on static analysis and optimisation/minimisation of CSS styling rules in HTML applications, e.g., [12, 24, 6, 23, 14, 29]. Such work has been motivated by the wellknown issues of CSS bloat — typically caused by HTML plugins with generic style sheets and CSS preprocessors, to name a few — whose amount has been estimated to be around 60% on average for typical industrial applications [24]. The latest W3C Recommendation of CSS Selectors Level 3 [7] gives an informal, natural language specification (more than 30 pages in printed form), which is not easy to work with when developing static analysis and optimisation tools for CSS. For this reason, researchers have attempted to develop concise formalisations of the CSS selector language, e.g., in µcalculus [12].
Despite the amount of work in developing practical static analysis and optimisation tools for CSS, many basic foundational issues for static analysis of CSS have remained unsolved. Perhaps the simplest and the most fundamental static analysis problem for CSS concerns the decidability of satisfiability (a.k.a. nonemptiness) of CSS3 selectors: given a CSS selector, check whether it is satisfiable (i.e. the selector may match some node in some tree). If decidable, what is its computational complexity? Satisfiability often forms a basis of other static analysis problems. One such static analysis problem that is of immediate practical importance concerns the nondisjointness (a.k.a. intersection) problem for the CSS3 selectors: given two CSS selectors, decide whether they can simultaneously select the same node on some tree. Decidability and complexity for the intersection problem for CSS is currently not known. A solution to this problem is required to be able to perform CSS file minimisation [23] via the reordering and merging of styling rules (see Section 2 for an example). In particular, if two styling rules match the same node, the order of declarations in a file can determine the appearance of the document. To be able to safely reorder two rules, one must be sure that the selectors of the two rules will never match the same node in a tree. Existing solutions (e.g. [23]) only overapproximate the order dependency relation. A more precise relation would allow more CSS rules to be reordered (and therefore optimised).
Finally, we shall remark that these questions are not only of theoretical interest, but they are also useful for actual static analysis tools, which currently cannot handle the CSS selector language in its entirety (e.g. [6] handle ∼ 70% of the selectors from their benchmarks). Part of the problem seems to be that hitherto there is no formalisation of CSS3 selectors as specified in the W3C Recommendation [7]. For example, the tree logic defined in [12] can express Property (3) above, which is not expressible as a CSS3 selector, but did not incorporate constraints on attributes (i.e. strings).
In this paper we present a 1.5page formalisation of the CSS3 selector language. Unlike previous formalisations, our formalisation attempt captures CSS3 in its entirety, as specified in the latest W3C Recommendation [7]. In particular, our formalisation models CSS3 constraints on attributes (i.e. strings), which yields semantics of CSS selectors as a set of finite “data trees” (i.e. whose node labels range over an infinite set). Equipped with the new formalisation, we then study the two aforementioned fundamental static analysis problems for CSS3: (1) satisfiability (a.k.a. nonemptiness) of CSS3 selectors, and (2) nondisjointness (a.k.a. nonemptinessofintersection, or simply “intersection”) of two CSS3 selectors. We show that both problems are decidable and, in fact, NPcomplete. This is in contrast to the previous formalisation of CSS selectors as a tree logic (which do not cover CSS3 in its entirety), in which both problems would be EXPTIMEcomplete.
With a very simple set of navigational axes (i.e. up, left, but neither down nor right), it might be a surprise that CSS satisfiability is NPhard especially in the light of the polynomialtime solvability result for satisfiability for the XPath fragment^{3} that uses only this set of navigational axes, e.g., see [15]. It turns out that one cause of this NPhardness is CSS selectors’ ability to do modulo arithmetic, conjunctions, and restricted negations; these can be used to encode the satisfiability of intersections of negations of arithmetic progressions, which is known to be NPcomplete [30].
For the upper bound, we show that both problems can be reduced in polynomialtime to satisfiability of existential Presburger formulas, for which highly optimised SMT solvers are available. Our result suggests the potential application of SMT solvers for providing an efficient solution for static analysis of CSS3 selectors. Several tricky features of CSS3 that have to be handled in this reduction includes constraints for counting the number of children (which could express the existence of exponentially many children in the tree), builtin regular string matching constraints for the attribute values, uniqueness of id attributes in the tree, conjunctions, and restricted negations. A careful analysis of the language is required in pinpointing the exact computational complexity. For example, if CSS3 were to allow unrestricted unrestricted regular string matching constraints, it would be able to encode intersection of regular languages, which is PSPACEcomplete [16].
Although there has not been much work on the foundation of CSS, a lot of fundamental work exists on its cousin XPath. The first formalisation of XPath was given by Wadler [34]. Database researchers then developed several the core logical versions of the language that allows them to study expressiveness and static analysis issues in a cleaner setting, e.g., see [21, 5, 13, 32, 33, 31, 22]. To obtain a cleaner version of XPath, one often makes the assumption that there are only finitely many attribute values, which is not true for XPath in its entirety. XPath is also known to share many similarities to logics and automata over unranked trees, e.g., see [18] for a readable survey on this. Many results on the decidability and complexity of static analysis of XPath in this setting (with or without DTDs, and possibly with some navigational directions disallowed) are available, e.g., see [15, 4, 26, 10, 20, 11]. Note that the static analysis problems for CSS that we consider in this paper do not come with a schema (e.g. in DTD), which is also the case for most existing work in static analysis and optimisation/minimisation on CSS [12, 24, 6, 23, 14, 29]. As we previously mentioned, if one were to make a similar simplification (e.g. finitely many attributes, no global id constraints, and no modular counting constraints on the number of children), CSS would be an XPath fragment in the database setting with only upward and leftward navigational combinators, which is known to be polynomialtime solvable [15]. This is in contrast to our NPcompleteness for satisfiability for CSS3 selectors. The simplified setting of XML has recently been extended (e.g. see [9]) to models that allow reasoning of attribute values (from an abstract infinite domain), but this mostly results in models whose satisfiability problems exhibit high computational complexity (at least EXPTIMEhard) since they allow general comparisons of data values.
We also mention the work on automata on unranked trees (labels in the trees from a finite alphabet) that allow counting constraints on the number of children, e.g., see [28]. Such work typically allow counting constraints that are much more complex than what CSS3 allow, which is the reason for that checking emptiness for such automata is PSPACEcomplete.
Section 2 provides a CSS primer. We fix mathematical notation and terminologies that we use throughout the paper in Section 3. Section 4 provides a formalisation of CSS3 selectors. In Section 5 we provide an overview of results on satisfiability and intersection of CSS3 selectors and, in particular, proofs of the NP lower bounds. Section 6 and 7 develop the machinery for proving an NP upper bound for these problems. We conclude with future work in Section 8. See Appendix for omitted details.
We give a brief overview of CSS selectors and motivate the problems studied in this paper by means of an example. We refer the readers to [2] for a gentle and more complete introduction to CSS.
A very simple web page is given in Figure 1. In this section we describe both HTML and CSS. The page contents are given by the HTML document shown in Figure 2. In Figure 3 we give the CSS file used to style the page. Although our example uses HTML, we note that CSS can also be used to identify nodes in XML trees.
HTML documents are written in a markup language that gives a tree structure. Each node in the tree is an element e. The description of the node begins with a tag ⟨e⟩ and ends with a tag ⟨/e⟩. Nodes appearing between these tags describe the children. As can be seen in Figure 2 the root of the tree is an HTML element containing two children: a head and a body. The head element contains metadata about the page, while the body element contains the page contents. The h1 element is the heading, while the table element contains the prices. Each row of the table is a tr element, and each data item is a td element.
The opening tag of some of these elements contains additional information. E.g. id=‘‘prices’’ and class=‘‘fruit’’. These are attributes that label a node. There are many possible attributes, but the id and class attributes have a special purpose for CSS selectors, as will be seen below. Note the table row containing tomatoes has two class values: fruit and vegetable.
The CSS stylesheet given in Figure 3 is used to customise its appearance. The stylesheet is a list of rules, each having the form
where selectors is a commaseparated list of selectors and declarations is a semicolon separated list of (property) declarations. Selectors are used to identify nodes in the document tree. A rule is applied to a node if one of its selectors matches the node. Each declaration in a rule specifies how some aspect of the node is displayed.
For example, the simplest rule is the first
In this rule the selector identifies any h1 element. The declaration specifies that the background should be light green. Hence, the heading of the webpage has a light green background.
The next rule
has a selector that identifies any node whose class attribute contains “fruit”. The syntax “.” is used to specify classes. However, this is just syntactic sugar. CSS allows selectors to specify the contents of arbitrary attributes. The above rule could have been alternatively written
where ~= is an operator specifying that the attribute is a spaceseparated list of words, one of which is the value on the right. Similarly, #ι asserts that the id attribute is ι, or, equivalently, [id = ι].
The selector
demonstrates several more complex features of CSS. The first thing to notice is that the selector is split into two parts: table#prices and tr:nthchild(2n + 1). These two parts put constraints on two separate nodes. The node identified by the selector is the node matched by tr:nthchild(2n + 1). The whitespace between the two parts indicates that the second (matched) node should be a descendent of the first. CSS also allows the operators >, +, and ~ to indicate a childof, neighbourof, and siblingof relationship respectively.
The first part table#prices is a conjunction, asserting that the first node is both a table element and has the ID “prices”. The second part tr:nthchild(2n + 1) is also a conjunction. It asserts that the matched element is a tr element and satisfies :nthchild(2n + 1). The :nthchild(2n + 1) asserts that the node is the ιth child of its parent such that ι = 2n + 1 for some integer value n. [The leftmost child is the first, instead of zeroth, child of the parent node.] Thus, the stylesheet in Figure 3 renders even rows of the table with a normal font weight, and odd rows in bold.
In this section we discuss applications of the intersection problem. Consider from Figure 3. Readers familiar with CSS might be tempted to minimise the CSS in Figure 3 by replacing the first three rules with
Here, the first rule has a commaseparated list of selectors that identify which nodes should have a light green background. Merging the rules like this avoids the declaration “background: lightgreen” appearing twice, reducing the size of the file. However, such a transformation will change the appearance of the page. In particular, the tomato row will display in yellow rather than green. This is because the tomato row matches both .fruit and .vegetable. In cases when such a conflict arises, the CSS specification first uses a notion called specificity – which is a measure easily calculated from a selector’s syntax that we will not discuss here. In this case both selectors have the same specificity so the rule appearing latest in the file is used. The optimisation proposed above changes the order of the rules in the file and as a result changes the appearance. Hence, before attempting to minimise a file by merging related rules, we must first check whether selectors may overlap. Note, since industrial CSS files may contain thousands of rules, potentially millions of (pairwise) comparisons must be performed. Hence, it is important to pinpoint the precise complexity of the problem.
As usual, ℤ denotes the set of all integers. We use ℕ to denote the set {0,1,…,} of all natural numbers. Let ℕ_{>0} = ℕ ∖{0} denote the set of all positive integers. For an integer x we define x to be the absolute value of x. In the sequel, for a set S, we will use S^{*} (resp. S^{+}) to denote the set of sequences (resp. nonempty sequences) of elements from S. When the meaning is clear, if S is a singleton {s}, we will denote {s}^{*} (resp. {s}^{+}) by s^{*} (resp. s^{+}).
A tree domain is a set D ⊆ (ℕ_{>0})^{∗} of nodes that is both prefixclosed and youngersibling closed. That is ηι ∈ D implies both η ∈ D, and ηι′ ∈ D for all ι′ < ι.
A Σlabelled tree is a pair T = (D, λ) where D is a tree domain, and λ : D → Σ is a labelling function of the nodes of T.
Next we recall terminologies for relationships between nodes in trees. To avoid notational clutter, we deliberately choose notation that resembles the syntax of CSS. In the following, take η, η′ ∈ D. We write η ≫ η′ if η is a (strict) ancestor of η′, i.e., there is some η″ ∈ ℕ_{>0}^{+} such that η′ = ηη″. We write η > η′ if η is the parent of η′, i.e., there is some ι ∈ ℕ_{>0} such that η′ = η ι. We write η + η′ if η is the direct younger sibling of η′, i.e., there is some η″ and ι ∈ ℕ_{>0} such that η = η″ ι and η′ = η″ (ι − 1). We write η ∼ η′ if η is a younger sibling of η′, i.e., there is some η″ and ι, ι′ ∈ ℕ_{>0} with ι < ι′ such that η = η″ ι′ and η′ = η″ ι. Finally, є is the root node in any tree.
Recall that Presburger arithmetic is a firstorder theory for reasoning about integer linear arithmetic [17]. In the sequel, we will only use the existential fragment of the theory (a.k.a. existential Presburger arithmetic), which is wellknown to be NPcomplete [27]. Formulas θ of existential Presburger arithmetic have the form
θ, θ′ = E ∼ E  ⎪ ⎪  ∃ x . θ  ⎪ ⎪  θ ∧ θ′  ⎪ ⎪  θ ∨ θ′ 
where ∼ ∈ {=, >, ≥, <, ≤} and E are (integer) expressions E of the form a_{0} + a_{1}x_{1} + ⋯ + a_{n}x_{n} where each a_{i} is an integer (with a binary representation) and each x_{i} denotes an (integer) variable. We assume a standard semantics of Presburger arithmetic. Note, the existential operator ∃ quantifies over nonnegative integers ℕ. We will implicitly assume that all free variables are existentially quantified. Moreover, we will occasionally allow quantification over a finite set of elements. It is straightforward to encode this into a quantification over integers of a bounded range.
In this section we give a formal definition of document trees and CSS3 selectors. Our formalisation follows the latest version of W3C recommendation of CSS3 selectors [7].
In this section we introduce Documents Objects Models (DOMs), which we also refer to as document trees. A document consists of a number of elements, which in turn may have subelements as children. Each node in the tree is given a type consisting of an element name and a namespace. For example an element P in the HTML namespace is a paragraph in an HTML document. In addition each node may be labelled by attributes. Each attribute is identified by a namespace (which may differ from that of the node) and an attribute name. The attributes take string values. Finally, a node may be labelled by a number of pseudoclasses which specify properties of the node in the document. For example :enabled means that the node is enabled and the user may interact with it, while :empty indicates that the node has no children. The set of pseudoclasses is fixed by the CSS specification.
In the formal definition below we permit a possibly infinite set of namespace, element, and attribute names. This is because the document writer can use any string of characters as a name. Thus, the sets of possible names cannot be fixed to a finite set from the point of view of CSS selectors, which may be applied to any document.
We denote the set of pseudoclasses as
P =  ⎧ ⎪ ⎨ ⎪ ⎩ 
 ⎫ ⎪ ⎬ ⎪ ⎭  . 
Then, given a possibly infinite set of namespaces NS, a possibly infinite set of elements E, a possibly infinite set of attribute names A, and a possibly infinite alphabet^{4} Γ containing the special characters ␣ and  (space and dash respectively), a document tree is a Σlabelled tree (D, λ), where
Σ :=  ⎛ ⎝ 
 ⎞ ⎠  . 
Here the notation F_{fin}(NS × A, Γ^{∗}) denotes the set of partial functions from (NS × A) to Γ^{∗} whose domain is finite. In other words, each node in a document tree is labeled by a namespace, an element, a function associating a finite number of namespaceattribute pairs to an attribute value (string), and one or more of the pseudoclasses. For a function f_{A} ∈ F_{fin}(NS × A, Γ^{∗}) we say f_{A}(s, a) = ⊥ when f_{A} is undefined over s ∈ NS and a ∈ A, where ⊥ ∉ Γ^{∗} is a special undefined value.
Furthermore, we assume special attribute names class, id ∈ A that will be used to attach classes (see later) and IDs to nodes.
We will write s:e for an element e with namespace s in NS × E and s:a for an attribute a with namespace s in NS × A. For convenience, when λ(η) = (s, e, f_{A}, P) we write
λ_{S}  ⎛ ⎝  η  ⎞ ⎠  = s, λ_{E}  ⎛ ⎝  η  ⎞ ⎠  = e, λ_{A}  ⎛ ⎝  η  ⎞ ⎠  = f_{A}, λ_{P}  ⎛ ⎝  η  ⎞ ⎠  = P . 
There are several consistency constraints on the pseudoclasses labelling a node.
In the sequel, we will tacitly assume that document trees satisfy these consistency constraints.
We write Trees(NS, E, A, Γ) for the set of such trees.
Note, readers familiar with HTML may have expected clauses asserting, for example, that if an node matches :hover, then its parent and “label controls” should also match :hover. However, this is part of the HTML5 specification, not of CSS3. In fact, the CSS3 selectors specification explicitly states that a node matching :hover does not imply its parent must also match :hover.
Finally, note that CSS selectors can be applied to any document tree, e.g. XML or HTML. The semantics of CSS3 from the CSS3 selector specification [7] may allow document formats that are forbidden by the HTML5 specification.
In the following sections we define CSS syntax and semantics. In formally, as CSS selector consists of node selectors σ – which match individual nodes in the tree – combined using the operators >>, >, +, and ~. These operators express the descendantof, childof, neighbourof, and siblingof relations respectively. Note, we use slightly different syntax to their counterpart semantical operators ≫, >, +, and ∼ in order to distinguish syntax from meaning.
A node selector σ has the form τΘ where τ constrains the type of the node. That is, τ places restrictions on the namespace and element labels of the node. The rest of the selector is a set Θ of simple selectors that assert atomic properties of the node. These may take several forms.
Class and ID selectors .v and #v assert that the node has a class or ID v respectively. Note, a node may have several classes (given as a string by a spaceseparated list of classes) but only one ID.
The next type of simple selector is the attribute selectors that generally take the form [sa op v] for some namespace s, attribute a, operator op ∈ {=, ~=, =, ^=, $=, *=}, and some string v ∈ Γ^{∗}. The namespace may be omitted if the specification is agnostic about the namespace of the attribute. The operators =, ^=, $=, and *= take their meaning from regular expressions. That is, equals, beginswith, endswith, and contains respectively. The remaining operators are more subtle. The ~= operator means the attribute is a string of spaceseparated values, one of which is v. The = operator is intended for use with language identification (e.g. enGB means “English” as spoken in Great Britain). Thus = asserts that either the attribute has value v or is a string of the form vv′ where  is the dash character, and v′ is some string.
Attribute selectors may also take the form [sa] or [a] to assert that the attribute is merely defined on the node.
Next, simple selectors may specify which pseudoclasses label a node. E.g. the selector :enabled ensures the node is currently enabled in the document.
There are several further kinds of pseudoclasses. Of particular interest are selectors such as :nthchild(αn + β). These assert that the node has a particular position in the sibling order. For example :nthchild(2n + 1) means there is some n ≥ 0 such that the node is the (2n + 1)th node in the sibling order. That is, the node is at an odd position.
Finally, simple selectors may be negations :not(θ) of a simple selector θ with the condition that negations cannot be nested.
Fix NS, E, A, and Γ as in the previous section. We define Sel for the set of (CSS) selectors and NSel for the set of node selectors. The set Sel is the set of formulas ψ defined as:

where
ϕ ::= σ  ⎪ ⎪  ϕ >> σ  ⎪ ⎪  ϕ > σ  ⎪ ⎪  ϕ + σ  ⎪ ⎪  ϕ ~ σ  ⎪ ⎪ 
where σ ∈ NSel is a node selector with syntax σ ::= τ Θ with τ having the form
τ ::= *  ⎪ ⎪  (s*)  ⎪ ⎪  e  ⎪ ⎪  (se) 
where s ∈ NS and e ∈ E and Θ is a possibly empty set of conditions θ with syntax
θ ::= θ_{¬}  ⎪ ⎪  :not(σ_{¬}) 
where θ_{¬} and σ_{¬} are conditions that do not contain negation, i.e.:
σ_{¬} ::= *  ⎪ ⎪  (s*)  ⎪ ⎪  e  ⎪ ⎪  (se)  ⎪ ⎪  θ_{¬} 
and θ_{¬} =

with s ∈ NS, e ∈ E, a ∈ A, v ∈ Γ^{∗}, and α, β ∈ ℤ. Note, we will omit Θ when is it empty.
CSS selectors can also finish with a pseudoelement. For example ϕ::before. These match nodes that are not formally part of a document tree. In the case of ϕ::before the selector matches a phantom node appearing before the node matched by ϕ. These can be used to insert content into the tree for stylistic purposes. For example places a “>” symbol before the rendering of any node with class a.
We divide CSS selectors into five different types depending on the pseudoelement appearing at the end of the selector.

We are interested here in the nodes matched by a selector. The pseudoelements ::firstline, ::firstletter, ::before, and ::after essentially match nodes inserted into the DOM tree. The CCS3 specification outlines how these nodes should be created. For our purposes we only need to know that the five syntactic cases in the above grammar can never match the same inserted node, and the selectors ::firstletter and ::firstline require that the node matched by ϕ is not empty.
Since we are interested here in the nonemptiness and nonemptinessofintersection problems, we will omit pseudoelements in the sequel, under the assumptions that
In this way, we can test nonemptiness of a selector by testing its replacement. For nonemptinessofintersection, we know if two selectors end with different pseudoelements (or one does not contain a pseudoelement, and one does), their intersection is necessarily empty. Thus, to check nonemptinessofintersection, we immediately return “empty” for any two selectors ending with different pseudoelements. To check two selectors ending with the same pseudoelement, the problem reduces to testing the intersection of their replacements.
We now define the semantics of CSS3 selectors. The semantics of a selector is defined with respect to a document tree and a node in the tree.
More precisely, the semantics of CSS3 selectors ϕ is defined inductively with respect to a document tree T = (D, λ) and a node η ∈ D as follows:

and for node selectors we have

and, for the remaining selectors we have (noting vv′ is the concatenation of the strings v and v′ and vv′ is the concatenation of v and v′ with a “” in between)

with the missing attribute selector being (noting v␣v′ is the concatenation of v and v′ with the space character ␣ in between)

then, for the counting selectors

Note, we diverge from the full CSS specification in a number of places. However, we do not lose expressivity.
Having formalised CSS3 selectors, we now consider basic static analsis problems for CSS3, especially nonemptiness (a.k.a. satisfiability) and intersection (a.k.a. nondisjointness) problems. Our main result is that both problems are NPcomplete. The complexity holds whether or not numbers in CSS3 selectors are represented in unary or binary.
We give the hardness results below. Membership is shown in the following sections. We introduce CSS automata for this purpose. In Proposition 5 we show each CSS selector can be translated in polynomial time into an equivalent CSS automaton. In Lemma 10 we show nonemptiness of CSS automata can be reduced to satisfiability of existential Presburger arithmetic, which is wellknown to be in NP. This gives us membership of nonemptiness in NP. For intersection, we show in Proposition 6 that CSS automata are closed under intersection, and two automata can be intersected in polynomial time. Thus we reduce intersection in polynomial time to nonemptiness of a CSS automaton, which is in NP.
We give a polynomialtime reduction from the NPcomplete problem of nonuniversality of unions of arithmetic progressions [30, Proof of Theorem 6.1]. To define this, we first fix some notation. Given a pair (α, β) ∈ ℕ × ℕ, we define [[(α,β) ]] to be the set of natural numbers of the form α n + β for n ∈ ℕ. That is, [[(α,β) ]] represents an arithmetic progression, where α represents the period and β represents the offset. Let E ⊆ ℕ × ℕ be a finite subset of pairs (α, β). We define [[E ]] = ∪_{(α,β)∈ E} [[(α,β) ]]. The NPcomplete problem is: given E (where numbers may be represented in unary or in binary representation), is [[E ]] ≠ ℕ? Observe that this problem is equivalent to checking whether [[E+1 ]] ≠ ℕ_{>0} where E+1 is defined by adding 1 to the period β of each arithmetic progression (α,β) in E. By complementation, this last problem is equivalent to checking whether ℕ_{>0} ∖ [[E+1 ]] ≠ ∅. Since ℕ_{>0} ∖ [[E+1 ]] = ∩_{(α,β) ∈ E} [[α,β+1 ]], the problem can be seen to be equivalent to testing the nonemptiness of
*  ⎧ ⎨ ⎩  :not(:nthchild(αn + (β+1)))  
 ⎫ ⎬ ⎭  . 
Thus, nonemptiness is NPhard. ▫
Nonemptiness reduces to nonemptinessofintersection. That is, ϕ is not empty iff the intersection of ϕ and * is nonempty. From Lemma 3 we get the result. ▫
We will now develop a technique that allows us to simultaneously derive an NP upper bound for both the nonemptiness and the intersection problems for CSS3 selectors. Loosely speaking, these problems would be equivalent if CSS3 selectors were closed under intersection, which unfortunately are not due to the presence of CSS combinators in general. To this end, we follow a standard automatatheoretic approach: develop a general class of (tree) automata that subsume CSS3 selectors and are furthermore closed under intersection. We call these automata CSS automata. The result in this section shows that both nonemptiness and intersection problems for CSS3 selectors can be cast into the nonemptiness problem for CSS automata.
A CSS automaton navigates a tree structure in a similar manner to a CSS selector. That is, transitions may only move down the tree, or to a sibling. Furthermore, at each transition, a number of properties can be checked.
The reason we use automata as an intermediate step is that the automata navigate the tree in a uniform fashion. A direct intersection of two selectors is nontrivial since selectors may descend to different child nodes, and then meet again after the application of a number of sibling combinators. Thus their paths may diverge and combine several times. The automata are designed to always navigate to the first child, and then move from sibling to sibling. Thus, the intersection of CSS automata can be done with a straightforward product construction.
Intuitively, the modalities ↓, →, →_{+}, and ∘ perform the following operations. The ↓ operator moves to the first child of the current node. The → operator moves to the next sibling of the current node. Finally, the →_{+} operator moves an arbitrary number of siblings to the right.
Note, we only allow self loops in the automata, since CSS does not have loops. Loops in the automata are used to skip over nodes that are not matched by a node selector in the full selector (e.g. .v ~ .v′ matches two sibling nodes with classes v and v′ respectively; loops will be used to skip over all nodes appearing between these two matched nodes). We do not allow → to label a loop – this is for the purposes of the NP proof: a sequence of transitions over a loop → can be more succinctly represented as a single loop transition labelled →_{+}.
An astute reader may complain that →_{+} does not need to appear on a loop since it can already pass over an arbitrary number of nodes. However, the product construction used for intersection becomes easier if →_{+} appears only on loops. This is because a single →_{+} transition in one automaton may correspond to several transitions of the other. Hence, it is convenient to be able to use the →_{+} transition several times instead of splitting a single transition into several parts.
There is no analogue of →_{+} for ↓ because →_{+} is needed to handle selectors such as :nthchild(αn + β) which enforce the existence of a number of preceding children. There is no such CSS selector to count the depth of the tree. Finally ∘ marks the node matched by the selector.
We write q –[σ, d]→ q′ to denote a transition (q, d, σ, q′) ∈ Δ.
A document tree T = (D, λ) and node η ∈ D is accepted by a CSS automaton A, written (T, η) ∈ L(A), if there exists a sequence
q_{0}, η_{0}, q_{1}, η_{1}, …, q_{ℓ}, η_{ℓ}, q_{ℓ+1} ∈  ⎛ ⎝  Q × D  ⎞ ⎠  ^{∗}×  ⎧ ⎨ ⎩  q_{f}  ⎫ ⎬ ⎭ 
such that
For each CSS selector ϕ we can construct a CSS automaton A_{ϕ} such that for all trees T
T, η ⊨ ϕ ⇔  ⎛ ⎝  T, η  ⎞ ⎠  ∈ L  ⎛ ⎝  A_{ϕ}  ⎞ ⎠  . 
Given a CSS selector ϕ, we define A_{ϕ} as follows. We can write ϕ uniquely in the form σ_{1} o_{1} σ_{2} o_{2} ⋯ o_{n−1} σ_{n} where each σ_{i} is a node selector, and each o_{i}∈ {>>, >, +, ~}. We define
A_{ϕ} =  ⎛ ⎝  Q, E, Δ, q^{in}, q_{f}  ⎞ ⎠ 
where

and we define the transition relation Δ to contain the following transitions, which are used to navigate from the root of the tree to the node matched by σ_{1}, and to read the final node matched by σ_{n} (and the selector as a whole) respectively.
∘_{1} –[*, ↓, →_{+}]→ ∘_{1} and ∘_{n} –[σ_{n}, ∘]→ q_{f} 
and for each 1 ≤ i < n,
We prove the following in Lemma 11 (soundness) and Lemma 12 (completeness) in Appendix A.1.
Intersection of CSS automata is a standard product construction where →_{+} can match a sequence of → transitions. We first define the intersection of two node selectors. Note node selectors are of the form τ Θ where τ ∈ { *, (s*), e, (se)  s ∈ NS ∧ e ∈ E }. Hence, letting Θ = Θ_{1} ∪ Θ_{2}. we define
τ_{1} Θ_{1} ⋂ τ_{2} Θ_{2} = 

We give the product construction required for intersection below. Note that the synchronisation of a transition labelled → with a transition labelled →_{+} relies on →_{+} only labelling loops^{7}.
A_{1} ⋂ A_{2} =  ⎛ ⎝ 
 ⎞ ⎠ 

The following proposition asserts correctness of the above construction, which we prove in Appendix A.2.
⎛ ⎝  T, η  ⎞ ⎠  ∈ L  ⎛ ⎝  A_{1}  ⎞ ⎠  ∧  ⎛ ⎝  T, η  ⎞ ⎠  ∈ L  ⎛ ⎝  A_{2}  ⎞ ⎠  ⇔  ⎛ ⎝  T, η  ⎞ ⎠  ∈ L  ⎛ ⎝  A_{1} ⋂ A_{2}  ⎞ ⎠  . 
We show nonemptiness of a CSS automaton can be decided in NP by a polynomialtime reduction to existential Presburger arithmetic. That is, given a CSS automaton A, our algorithm constructs an existential Presburger formula θ_{A} such that A accepts a nonempty language iff θ_{A} is satisfiable. Our existential Presburger encoding relies on the fact that any loop in the automaton is a self loop that only needs to be taken at most once. In the case of loops labelled ↓, a CSS formula cannot track the depth in the tree, so repeated uses of the loop will only introduce redundant nodes. For loops labelled →_{+}, it may be that selectors such as :nthchild(αn + β) enforce the existence of a number of intermediate nodes. However, since →_{+} can cross several nodes, such loops also only needs to be taken once. Hence, each transition only needs to appear once in an accepting run. That is, if there is an accepting run of a CSS automaton with n transitions, there is also an accepting run of length at most n.
For the remainder of the section, we fix a CSS automaton A = (Q, Δ, q^{in}, q_{f}) and show how to construct the formula θ_{A} in polynomialtime.
Before giving the details of the reduction to existential Presburger, we first show that the number of namespaces and elements can be bounded linearly in the size of the automaton.
We write
Let {τ_{1}, …, τ_{Aσ}} be a set of fresh namespaced elements and
⇃  ⎛ ⎝  E_{A}  ⎞ ⎠  = E_{A} ⊎  ⎧ ⎨ ⎩  τ_{1}, …, τ 
 ⎫ ⎬ ⎭  ⊎  ⎧ ⎨ ⎩  ⊥  ⎫ ⎬ ⎭ 
where there is a bijection θ : S_{A} → {τ_{1}, …, τ_{Aσ}} such that for each t ∈ S_{A} we have θ(t) = τ and
We prove the following proposition in Appendix B.1. Intuitively, since only a finite number of nodes can be directly inspected by a CSS automaton, all others can be relabelled to a dummy type unless their type matches one of the inspected nodes.
Thus, we can define bounded sets of namespaces and elements

When encoding nonemptiness checks we will need to encode the satisfiability of conjunctions of attribute selectors. We show here how to perform this encoding. First we collect attribute selectors that apply to individual attributes. Next, we argue that if a collection of selectors can be satisfied by an attribute value, then there is a value of polynomially bounded length. Using this bound, we can encode the selectors as an existential Presburger formula.
We begin by arguing the existence of a polynomial bound for the solutions to any finite set C of constraints of the form [sa op v] or :not([sa op v]), for some fixed s and a. We say that C is a set of constraints over s and a.
In fact, the situation is a little more complicated because it may be the case that a is id. In this case we need to be able to enforce a global uniqueness constraint on the attribute values. Thus, for constraints on an ID attribute, we need a bound that is large enough to allow to all constraints on the same ID appearing throughout the automaton to be satisfied by unique values. Thus, for a given automaton, we might ask for a bound N such that if there exists unique ID values for each transition, then there exist values of length bounded by N.
However, the bound on the length must still work when we account for the fact that not all transitions in the automaton will be used during a run. Consider the following illustrative example.
In this case we have two transitions with ID constraints, and hence two sets of constraints C_{1} = C_{2} = {[sid = v]}. Since these two sets of constraints cannot be satisfied simultaneously with unique values, even the bound N = 0 will satisfy our naive formulation of the required property (since the property had the existence of a solution as an antecedent). However, it is easy to see that any run of the automaton does not use both sets of constraints, and that the bound N = v should suffice. Hence, we formulate the property of our bound to hold for all subcollections of the collection of sets of constraints appearing in the automaton.
The proof uses ideas from Muscholl and Walukiewicz’s NP fragment of LTL [25]. We first, for each set of constraints C, construct a deterministic finite word automaton A that accepts only words satisfying all constraints in C. This automaton has a polynomial number of states and can easily be seen to have a short solution by a standard pumping argument. Given automata A_{1}, …, A_{n} with at most N_{s} states and N_{c} constraints in each set of constraints, we can again use pumping to show there is a sequence of distinct words v_{1}, …, v_{n} such that each v_{i} is accepted by A_{i} and the length of v_{i} is at most n·N_{s}·N_{c}.
We define a type of word automata based on a model by Muscholl and Walukiewicz to show and NP upper bound for a variant of LTL. These automata read words and keep track of which constraints in C have been satisfied or violated. They accept once all positive constraints have been satisfied and no negative constraints have been observed.
In the following, let Prefs(C) be the set of words v′ such that v′ is a prefix of some v with [sa op v] ∈ C or :not([sa op v]) ∈ C. Moreover, let ^ and $ be characters not in Γ that will mark the beginning and end of the word respectively. Additionally, let ε denote the empty word. Finally, we write v ≼ v′ if v is a factor of v′, i.e., v′ = v_{1} v v_{2} for some v_{1} and v_{2}.
Observe that the size of the automaton A_{C} is polynomial in the size of C.
A run of A_{C} over a word with beginning and end marked a_{1}…a_{n}∈ ^ Γ^{∗}$ is
⎛ ⎝  v_{0}, S_{0}, V_{0}  ⎞ ⎠  –[a_{1}]→  ⎛ ⎝  v_{1}, S_{1}, V_{1}  ⎞ ⎠  –[a_{2}]→ ⋯ –[a_{n}]→  ⎛ ⎝  v_{n}, S_{n}, V_{n}  ⎞ ⎠ 
where v_{0} = ε and for all 1 ≤ i ≤ n we have v_{i − 1} –[a_{i}]→ v_{i} and S_{i}, V_{i}⊆ C track the satisfied and violated constraints respectively. That is S_{0} = V_{0} = ∅, and for all 1 ≤ i ≤ n we have (noting ^v ≼ v_{i} implies ^v is a prefix of v_{i}, and similar for v$) S_{i}=

and V_{i}=

Such a run is accepting if S_{n}= {[sa op v]  [sa op v] ∈ C} and V_{n}= ∅. That is, all positive constraints have been satisfied and no negative constraints have been violated.
We show the existence of short solutions via the following lemma. The proof of this lemma is a simple pumping argument which appears in Appendix B.2. Intuitively, if a satisfying word is shorter than N_{s} · N_{c} we do not change it. If it is longer than N_{s} · N_{c} any accepting run of the automaton on this word must contain a repeated (v, S, V). We can thus pump down this word to ensure that it is shorter than N_{s} · N_{c}. Then, to ensure it is unique, we pump it up to some unique length of at most n · N_{s} · N_{c}.
To obtain Lemma 8 (Bounded Attribute Values) we observe that for any subsequence C_{i1}, …, Ci_{m} we have m · N_{s}′ · N_{c}′ ≤ n · N_{s} · N_{c} since m ≤ n and the max number of states N_{s}′ and constraints N_{c}′ in the subsequence have N_{s}′ ≤ N_{s} and N_{c}′ ≤ N_{c}.
Since short witnessing strings suffice for satisfiable attribute attribute selectors, we may use quantifierfree Presburger formulas to “guess” these witnessing strings; the letter in each position in a witnessing string is encoded by a number. [In fact, boolean formulas suffice, though we shall use integer variables since they are cleaner to work with and do not yield worse computational complexity.]
In the following encoding, we use the fact that we can assume each positive attribute selector that does not specify a namespace applies to a unique, fresh, namespace. Thus, these selectors do not interact with any other positive attribute selectors. Note, these fresh namespaces do not appear in ⇃(NS).
Let op range over {=, ~=, =, ^=, $=, *=}. Let τΘ be a node selector. For each s and a, let Θ_{a}^{s} be the set of conditions in Θ of the form θ or :not(θ) where θ is of the form [sa] or [sa op v]. Recall we are encoding runs of a CSS automaton of length at most n. For a given position i in the run, we define AttsPres(τΘ, i) to be the conjunction of the following constraints, where the encoding for sets of selectors is presented in the sequel. Let Neg(s, a) = {:not([sa op v])  :not([a op v]) ∈ Θ}.
AttsPres_{s:a}  ⎛ ⎝ 
 ⎞ ⎠ 
AttsPres_{s:a}  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠ 
AttsPres_{s:a}  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠ 
It remains to encode AttsPres_{s:a}(C, i) for some set of attribute selectors C all applying to s and a.
We can obtain a polynomiallysized global bound (N − 1) on the length of any satisfying value of an attribute s:a at some position i of the run. We obtain this by, for each s and a, collecting the sets of constraints over s and a from each transition, and applying Lemma 8 (Bounded Attribute Values). We then take the maximum of all bounds obtained in this way^{8}. Finally, we increment the bound by one to allow space for a trailing null character.
Once we have a bound on the length of a satisfying value, we can introduce variables for each character position of the satisfying value, and encode the constraints almost directly. That is, letting θ range over positive attribute selectors, we define^{9}

where x^{→} = x_{i, 1}^{s:a}, …, x_{i, N}^{s:a} will be existentially quantified later in the encoding and whose values will range^{10} over Γ ⊎ {0} where 0 is a null character used to pad the suffix of each word. We define AttsPres(θ, x^{→}) for several θ, the rest can be defined in the same way (see Appendix B.3). Letting v = a_{1}…a_{m},

Finally, we enforce correct use of the null character
Nulls  ⎛ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎠  = 

 x_{i, j′}^{s:a} = 0 . 
We encode nonemptiness using the following variables for 0 ≤ i ≤ n, representing the node at the ith step of the run:
Note, we do not need a variable for :root since it necessarily holds uniquely at the 0th position of the run.
Before we give the reduction for CSS automaton we first show how to negate a formula of the form ∃ n . x = α n + β where x is a variable, and α and β are constants, while still remaining inside existential Presburger arithmetic. This is not difficult, but one has to take care since the numbers α and β are represented in binary. Then we show how to construct an existential Presburger formula from a node selector, and a position i.
We need formulas to negate selectors like :nthchild(αn + β), For completeness, we give the definition of the negation below. We prove its correctness in Appendix B.4.
We decompose β according to the period α. I.e. β = α β_{1} + β_{2}, where β_{1} and β_{2} are the unique integers such that β_{2} < α and β_{1} α < 0 implies β_{2} ≤ 0 and β_{1} α > 0 implies β_{2} ≥ 0.

We define the encoding of node selectors below. This encoding uses the variables defined in the previous section. Note, this translation is not correct in isolation: global constraints such as “no ID appears twice in the tree” will be enforced later.
In the following, let NoAtts(Θ) be Θ less all selectors of the form [sa], [sa op v], [a], or [a op v], or :not([sa]), :not([sa op v]), :not([a]), or :not([a op v]).
Pres  ⎛ ⎝  τΘ, i  ⎞ ⎠  =  ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 


Pres  ⎛ ⎝  :nthchild(αn + β), i  ⎞ ⎠  = ∃ 
 . 
 _{i} = α 
 + β 
Pres  ⎛ ⎝  :nthlastchild(αn + β), i  ⎞ ⎠  = ∃ 
 . 
 _{i} = α 
 + β 
Pres  ⎛ ⎝  :nthoftype(αn + β), i  ⎞ ⎠  = 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 
Pres  ⎛ ⎝  :nthlastoftype(αn + β), i  ⎞ ⎠  = 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 
Pres  ⎛ ⎝  :onlychild, i  ⎞ ⎠  = 
 _{i} = 1 ∧ 
 _{i} = 1 
Pres  ⎛ ⎝  :onlyoftype, i  ⎞ ⎠  = 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 


Pres_{¬}  ⎛ ⎝  :nthchild(αn + β), i  ⎞ ⎠  = NoMatch  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠ 
Pres_{¬}  ⎛ ⎝  :nthlastchild(αn + β), i  ⎞ ⎠  = NoMatch  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠ 
Pres_{¬}  ⎛ ⎝  :nthoftype(αn + β), i  ⎞ ⎠  = 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 
Pres_{¬}  ⎛ ⎝  :nthlastoftype(αn + β), i  ⎞ ⎠  = 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 
Pres_{¬}  ⎛ ⎝  :onlychild, i  ⎞ ⎠  = 
 _{i} > 1 ∨ 
 _{i} > 1 
Pres_{¬}  ⎛ ⎝  :onlyoftype, i  ⎞ ⎠  = 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠  . 
Since we know, if there is an accepting run, that there is a run of length at most n where n is the number of transitions in Δ, we encode the possibility of an accepting run using the variables discussed above for all 0 ≤ i ≤ n. We encode nonemptiness of a CSS automaton using the encoding below.
θ_{A} =  ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 
Intuitively, the first two conjuncts asserts that a final state is reached from an initial state. Next, we use Tran(i) to encodes a single step of the transition relation, or allows the run to finish early. Finally Consistent asserts consistency constraints.
We define Tran(i) = ∨_{t ∈ Δ} Tran(i, t) where Tran(i, t) is defined below by cases. To ease presentation, we write s_{i}:e_{i} = s:e as shorthand for ( s_{i} = s ∧ e_{i} = e ) and s_{i}:e_{i} ≠ s:e as shorthand for ( s_{i} ≠ s ∨ e_{i} ≠ e ).



⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠  ∧  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠  ∧ Pres  ⎛ ⎝  σ, i  ⎞ ⎠  . 
The consistency constraint on the run asserts
Consistent = Consistent_{n} ∧ Consistent_{i} ∧ Consistent_{p} 
where each conjunct is defined below.
 ⎛ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎠  ∧  ⎛ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎠  . 

 x_{i, j}^{s:id} ≠ x_{i′, j}^{s:id} . 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠  . 
See Lemma 14 and Lemma 15 in Appendix B.5. ▫
This paper presents a formalisation of CSS3 selectors and fundamental results concerning static analysis of CSS3 selectors. In particular, we chose to study the problems of satisfiability and disjointness of CSS3 selectors since they are arguably the most basic static analysis problems for CSS3 selectors and they have found applications in CSS optimisation. To the best of our knowledge, we are the first to successfully argue that these static analysis problems are decidable and, in fact, NPcomplete. Our reduction to existential Presburger formulas (for which highly optimised SMT solvers are available) could potentially lead to efficient solutions to these static analysis problems that can handle CSS3 in its entirety. We leave this implementation task for future work.
Another future research avenue is to study other fundamental static analysis and optimisation problems for CSS3. One such problem is the problem of inclusionchecking for CSS selectors (which have found applications in CSS optimisation; see [6]). It is possible to use the technique developed in this paper to show that the problem is decidable, though it is far from obvious what the actual computational complexity is. Finally, since CSS is an evolving language (e.g. see [8] for the latest draft for the currently unstable CSS4, but seems to be substantially more expressive than CSS3), we believe that academics could play an important role in providing constructive feedback on the expressiveness of the language. Our formalisation of CSS3 is a starting point. We leave it for future work to address this issue in more depth.
We show both soundness and completeness of A_{ϕ}.
Suppose (T, η) ∈ L(A_{ϕ}). By construction of A_{ϕ} we know that the accepting run must pass through all states ∘_{1}, …, ∘_{n} where ϕ = σ_{1} o_{1} ⋯ o_{n−1} σ_{n}. Notice, in order to exit each state ∘_{i} a transition labelled by σ_{i} must be taken. Let η_{i} we the node read by this transition, which necessarily satisfies σ_{i}. Observe η_{n}= η. We proceed by induction. We have η_{1} satisfies σ_{1}. Hence, assume η_{i} satisfies σ_{1} o_{1} ⋯ o_{i−1} σ_{i}. We show η_{i+1} satisfies σ_{1} o_{1} ⋯ o_{i} σ_{i+1}.
We case split on o_{i}.
Thus, by induction, η_{n}= η satisfies σ_{1} o_{1} ⋯ o_{n−1} σ_{n}= ϕ. ▫
Assume T, η ⊨ ϕ. Thus, since ϕ = σ_{1} o_{1} ⋯ o_{n−1} σ_{n}, we have a sequence of nodes η_{1}, …, η_{n} such that for each i we have T, η_{i}⊨ σ_{1} o_{1} ⋯ o_{i−1} σ_{i}. Note η_{n}= η. We build a run of A_{ϕ} from σ_{1} to ∘_{i} by induction. When i = 1 we have the run constructed by taking the loops on the initial state ∘_{1} labelled ↓ and →_{+} to navigate to η_{1}. Assume we have a run to ∘_{i}. We build a run to ∘_{i+1} we consider o_{i}.
Thus, by induction, we construct a run to η_{n} ending in state ∘_{n}. We transform this to an accepting run by taking the transition ∘_{n} –[σ_{n}, ∘]→ q_{f}, using the fact that η_{n} satisfies σ_{n}. ▫
We show that
⎛ ⎝  T, η  ⎞ ⎠  ∈ L  ⎛ ⎝  A_{1}  ⎞ ⎠  ∧  ⎛ ⎝  T, η  ⎞ ⎠  ∈ L  ⎛ ⎝  A_{2}  ⎞ ⎠  ⇔  ⎛ ⎝  T, η  ⎞ ⎠  ∈ L  ⎛ ⎝  A_{1} ⋂ A_{2}  ⎞ ⎠  . 
We begin by observing that that all runs of a CSS automaton showing acceptance of a node η in T must follow a sequence of nodes η_{1}, …, η_{n} such that
that defines the path taken by the automaton. Each node is “read” by some transition on each run. Note a transition labelled →_{+} may read sequence nodes that is a factor of the path above. However, since these transitions are loops that do not check the nodes, without loss of generality we can assume each →_{+} in fact reads only a single node. That is, →_{+} behaves like →. Recall, →_{+} was only introduced to ensure the existence of “short” runs.
Because of the above, any two runs accepting η in T must follow the same sequence of nodes and be of the same length.
We have (T, η) ∈ L(A_{1}) ∧ (T, η) ∈ L(A_{2}) iff there are accepting runs
q_{1}^{i} –[σ_{1}^{i}, d_{1}^{i}]→ ⋯ –[σ_{n}^{i}, d_{n}^{i}]→ q_{n+1}^{i} 
of A_{i} over T reaching node η for both i ∈ {1,2}. We argue these two runs exist iff we have a run
⎛ ⎝  q_{1}^{1}, q_{1}^{2}  ⎞ ⎠  –[σ_{1}, d_{1}]→ ⋯ –[σ_{n}, d_{n}]→  ⎛ ⎝  q_{n+1}^{1}, q_{n+1}^{2}  ⎞ ⎠ 
of A_{1} ∩ A_{2} where each d_{j} and σ_{j} depends on (d_{j}^{1}, d_{j}^{2}).
The existence of the transitions comes from the definition of A_{1} ∩ A_{2}. We have to argue that η_{j} satisfies both σ_{j}^{i} iff it also satisfies σ_{j}. By observing σ ∩ * = * ∩ σ = σ we always have σ_{j}= σ_{j}^{1}∩ σ_{j}^{2}.
Let σ_{j}^{i}= τ_{i}Θ_{i} and σ_{j}= τΘ. It is immediate that η_{j} satisfies Θ = Θ_{1} ∪ Θ_{2} iff it satisfies both Θ_{i}.
To complete the proof we need to show η_{j} satisfies τ iff it satisfies both τ_{i}. Note, we must have some s and e such that τ, τ_{1}, τ_{2} ∈ {*,(s*),(se),e} else the type selectors cannot be satisfied (either τ = :not(*) or τ_{1} and τ_{2} assert conflicting namespaces or elements).
If some τ_{i}= * the property follows by definition. Otherwise, if τ = τ_{2} then in all cases the conjunction of τ_{1} and τ_{2} is equivalent to τ_{2} and we are done. The situation is similar when τ = τ_{1}. Otherwise τ = (se) and τ_{1} = (s*) and τ_{2} = e or vice versa, and it is easy to see τ is equivalent to the intersection of τ_{1} and τ_{2}. Thus, we are done.
We show Proposition 7 (Bounded Types). That is, if some tree T is accepted A, we can define another tree T′ that also is accepted by A but only uses types in ⇃(E_{A}).
We take (T, η) ∈ L(A) with T = (D, λ) and we define T′ = (D, λ′) satisfying the proposition. Let
q_{0}, η_{0}, q_{1}, η_{1}, …, q_{ℓ}, η_{ℓ}, q_{ℓ+1} 
be the accepting run of A, by the sequence of transitions t_{0}, …, t_{ℓ}. As noted above, we can assume each transition in Δ appears only once in this sequence. Let {σ_{1}, …, σ_{Aσ}} be the set of selectors appearing in A. We perform the following modifications to λ to obtain λ′.
We obtain λ′ from λ by changing the element labelling. We first consider all 0 ≤ i ≤ ℓ such that η_{i} is labelled by some element s:e ∈ E_{A}. Let Nodes_{s:e} be the set of nodes labelled by s:e in λ. In λ′ we label all nodes in Nodes_{s:e} by s:e. That is, we do not relabel nodes labelled by s:e. Let Nodes be the union of all such Nodes_{s:e}.
Next we consider all 0 ≤ i ≤ ℓ such that η_{i}∉ Nodes (i.e. was not labelled in the previous case) and t_{i}= q_{i}–[σ, d]→ q_{i+1} with σ ≠ *. Let s:e ∉ E_{A} be the element labelling of η_{i} in λ. Moreover, take τ such that θ(t_{i}) = τ. In λ′ we label all nodes in Nodes_{s:e} (i.e. labelled by s:e) in λ by τ. That is, we globally replace s:e by τ. Let Nodes′ be Nodes union all such Nodes_{e}.
Finally, we label all nodes not in Nodes′ with the null element ⊥.
To see that
q_{0}, η_{0}, q_{1}, η_{1}, …, q_{ℓ}, η_{ℓ}, q_{ℓ+1} 
via t_{0}, …, t_{ℓ} is an accepting run of (D, λ′) we only need to show that for each t_{i}= q_{i}–[σ, d]→ q_{i+1} that η_{i} satisfies σ. This can be shown by induction over σ. Most atomic cases are straightforward (e.g. the truth of :hover is not affected by our transformations). The case of e, (s*), or (se) appearing positively follows since in these cases the labelling remained the same or changed to some τ consistent with the selector. When such selectors appear negatively, the result follows since we only changed elements and namespaces to fresh ones. The truth of attribute selectors remains unchanged since we did not change the attribute labelling. The cases of :nthchild(αn + β) and :nthlastchild(αn + β) follow since we did not change the number of nodes. For the selectors :nthoftype(αn + β) and :nthlastoftype(αn + β) there are two cases. If we did not change the element label s:e of η_{i}, then we also did not change the label of its siblings. Moreover, we did not add any s:e labels elsewhere in the tree. Hence the truth of the formulas remains the same. If we did change the label from s:e to τ for some τ then observe that we also relabelled all other nodes in the tree labelled by s:e. In particular, all siblings of η_{i}. Moreover, since θ is a bijection and each transition appears only once in the run, we did not label any node not labelled s:e with τ. Hence the truth of the formulas also remains the same. Similar arguments hold for :onlychild and :onlyoftype.
Thus, (D, λ′) is accepted, and only uses elements in ⇃(E_{A}) as required.
We give the proof of Lemma 9. That is, given a sequence of sets of constraint automata A_{C1}, …, A_{Cn} each with at most N_{s} states and at most N_{c} constraints in each C_{i}, if there is a sequence of pairwise unique words v_{1}, …, v_{n} such that for all 1 ≤ i ≤ n there is an accepting run of A_{Ci} over v_{i}, then there exists such a sequence where the length of each v_{i} is at most n · N_{s} · N_{c}.
To prove the lemma, take a sequence v_{1}, …, v_{n} such that each v_{i} is unique and accepted by A_{Ci}. We proceed by induction, constructing v′_{1}, …, v′_{i} such that each v′_{j} is unique, accepted by A_{Cj}, and of length ℓ such that either
When i = 0 the result is vacuous. For the induction there are two cases.
When the length ℓ of v_{i} is such that ℓ ≤ N_{s} · N_{c} we set v′_{i}= v_{i}. We know v′_{i} is unique amongst v′_{1}, …, v′_{i} since for all j < i either v′_{j} is longer than v′_{i} or is equal to v_{j} and thus distinct from v_{i}.
When ℓ > N_{s} · N_{c} we use a pumping argument to pick some v′_{i} of length ℓ′ such that i · N_{s} · N_{c} ≤ ℓ′ ≤ (i + 1) · N_{s} · N_{c}. This ensures that v′_{i} is unique since it is the only word whose length lies within the bound. We take the accepting run
⎛ ⎝  u_{0}, S_{0}, V_{0}  ⎞ ⎠  –[a_{1}]→  ⎛ ⎝  u_{1}, S_{1}, V_{1}  ⎞ ⎠  –[a_{2}]→ ⋯ –[a_{n}]→  ⎛ ⎝  u_{ℓ}, S_{n}, V_{ℓ}  ⎞ ⎠ 
of v_{i} and observe that the values of S_{j} and V_{j} are increasing by definition. That is S_{j}⊆ S_{j+1} and V_{j}⊆ V_{j+1}. By a standard down pumping argument, we can construct a short accepting run containing only distinct configurations of length bound by N_{s} · N_{c}. We construct this run by removing all cycles from the original run. This maintains the acceptance condition. Next we obtain an accepted word of length i · N_{s} · N_{c} ≤ ℓ′ ≤ (i + 1) · N_{s} · N_{c}. Since ℓ > N_{s} · N_{c} we know there exists at least one configuration (u, S, V) in the short run that appeared twice in the original run. Thus there is a run of the automaton from (u, S, V) back to (u, S, V) which can be bounded by N_{s} · N_{c} by the same downward pumping argument as before. Thus, we insert this run into the short run the required number of times to obtain an accepted word v′_{i} of the required length.
Thus, by induction, we are able to obtain the required short words v′_{1}, …, v′_{n} as needed.
AttsPres  ⎛ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎠  = 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎠  ∨ 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ 
AttsPres  ⎛ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎠  = 
 x_{i, j}^{s:a} = a_{j}∧  ⎛ ⎝  x_{i, m+1}^{s:a} = 0 ∨ x_{i, m+1}^{s:a} =   ⎞ ⎠ 
AttsPres  ⎛ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎠  = 
 ⎛ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎠ 
We show that our negation of periodic constraints is correct.
¬  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠  ⇔NoMatch  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠  . 
We first consider α = 0. Since there is no β′_{2} with β′_{2} < 0 we have to prove
¬  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠  ⇔  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠  ∨  ⎛ ⎜ ⎝ 
 ⎞ ⎟ ⎠ 
which is immediate.
In all other cases, the conditions on β_{2} and β′_{2} ensure that we always have 0 < β_{2} − β′_{2} < α.
If ∃ n . x = α n + α β_{1} + β_{2} then if α > 0 it is easy to verify that we don’t have x < β (since n = 0 gives x = β as the smallest value of x) and similarly when α < 0 we don’t have x < β. To disprove the final disjunct of of NoMatch(x, α, β) we observe there can be no n′ s.t. x = α n′ + α β_{1} + β′_{2} since x = α n + α β_{1} + β_{2} and 0 < β_{2} − β′_{2} < α.
In the other direction, we have three cases depending on the satisfied disjunct of NoMatch(x, α, β). Consider α > 0 and x < β. In this case there is no n such that x = α n + α β_{1} + β_{2} = α n + β since n = 0 gives the smallest value of x, which is β. The case is similar for the second disjunct with α < 0.
The final disjunct gives some n′ such that x = α n′ + α β_{1} + β′_{2} with 0 < β_{2} − β′_{2} < α. Hence, there can be no n with x = α n + α β_{1} + β_{2}. ▫
We prove soundness and completeness of the Presburger encoding of CSS automata nonemptiness in the two lemmas below.
We take a run of A and construct a satisfying assignment to the variables in θ_{A}. That is take a document tree T = (D, λ), node η ∈ D, and sequence
q_{0}, η_{0}, q_{1}, η_{1}, …, q_{ℓ}, η_{ℓ}, q_{ℓ+1} ∈  ⎛ ⎝  Q × D  ⎞ ⎠  ^{∗}×  ⎧ ⎨ ⎩  q_{f}  ⎫ ⎬ ⎭ 
that is an accepting run. We know from Proposition 7 (Bounded Types) that T can only use namespaces from ⇃(NS) and elements from ⇃(E). Let t_{0}, …, t_{ℓ} be the sequence of transitions used in the accepting run. We assume (w.l.o.g.) that no transition is used twice. We construct a satisfying assignment to the variables as follows.
j =  ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪  ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩  η′ι′  
 ⎫ ⎪ ⎪ ⎬ ⎪ ⎪ ⎭  ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪  . 
j =  ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪  ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩  η′ι′  
 ⎫ ⎪ ⎪ ⎬ ⎪ ⎪ ⎭  ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪  . 
It remains to prove that the given assignment satisfies the formula.
Recall
θ_{A} =  ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠  . 
The first two conjuncts follow immediately from our assignment to q_{i} and that the chosen run was accepting. Next we look at the third conjunct and simultaneously prove Consistent_{n}. When i ≥ ℓ + 1 we assigned q_{f} to q_{i} and can choose any assignment that satisfies Consistent_{n}. Otherwise we show we satisfy Tran(i) by showing we satisfy Tran(i, t_{i}). We also show Consistent_{n} is satsfied by induction, noting it is immediate for i = 0 and that for i = 1 we must have either the first orlast case which do not depend on the induction hypothesis. Consider the form of t_{i}.
We can easily check the values of q_{i} and q_{i+1}. We defer the argument for Pres(σ, i) until after the case split. By induction we immediately obtain Consistent_{n}.
We show Pres(σ, i) is satisfied for each η_{i} and σ labelling t_{i}. Take a node η and σ = τΘ from this sequence. Note η satisfies σ since thw run is accepting. Recall
Pres  ⎛ ⎝  τΘ, i  ⎞ ⎠  =  ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 
 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠  . 
From the type information of η we immediately satisfy Pres(τ, i).
For a positive θ ∈ Θ there are several cases. If θ = :root then we know we are in η_{0} and the encoding is ⊤. If θ is some other pseudo class p then the encoding of θ is p_{i} and we assigned true to this variable. For :nthchild(αn + β) and :nthlastchild(αn + β) satisfaction of the encoding follows immediately from η satisfying θ and our assignment to n_{i} and N_{i}. Similarly we satisfy encodings of :nthoftype(αn + β), :nthlastoftype(αn + β), :onlychild, and :onlyoftype. The latter follow since an only child is position 1 from the start and end, and an only of type node has 0 strict predecessors or successors of thesame type.
For a negative θ ∈ Θ there are several cases. If θ = :not(:root) then we know we are not in η_{0} and the encoding is ⊤. If θ is the negation of some other pseudo class p then the encoding of θ is ¬p_{i} and we assigned false to this variable. For the selectors :not(:nthchild(αn + β)) and the opposite selector :not(:nthlastchild(αn + β)) satisfaction of the encoding follows immediately from η satisfying θ, our assignment to n_{i} and N_{i} as well as Proposition 13 (Correctness of NoMatch(x, α, β)). Similarly we satisfy encodings of :not(:nthoftype(αn + β)) and also of the selector :not(:nthlastoftype(αn + β)). For :not(:onlychild), and :not(:onlyoftype) the latter follow since an with multiple children the node must either not be position 1 from the start or end, and a not only of type node has more than 0 strict predecessors or successors of the same type.
Next, to satisfy AttsPres(τΘ, i) we have to satisfy a number of conjuncts. First, if we have a word a_{1} … a_{n} we assign it to the variables x_{i, j}^{s:a} (where η is the ith in the run and j ranges over all word positions within the cimputed bound) ny assigning x_{i, j}^{s:a} = a_{j} when j ≤ n and x_{i, j}^{s:a} = 0 otherwise.
In all cases below, it is straightforward to observe that it a word (within the computed length bound) satisfies [sa op v] or :not([sa op v]) then the encoding AttsPres_{s:a}([sa op v], i) or ¬AttsPres_{s:a}([sa op v], i) is satisfied by our variable assignment. Similarly Nulls(x^{→}) is straightforwardly satisfied. Hence, if a word satisfies C then our assignment to the variables means AttsPres_{s:a}(C, i) is also satisfied.
There are a number of cases of conjuncts for attribute selectors. The simplest is for sets Θ_{a}^{s} where we see immediately that all constraints are satisfied for λ_{A}(η)(s, a) and hence we assign this value to the appropriate variables and the conjuct is satisfied also. For each [a] and [a op v] ∈ Θ we have in the document some namespace s such that λ_{A}(η)(s, a) satisfies the attribute selector and all negative selectors applying to all namespaces. Let s′ be the fresh name space assigned to the selector during the encoding and C be the full set of constraints belonging to the conjunct (i.e. including negative ones). We assign to the variable x_{i, j}^{s:a} the jth character of λ_{A}(η)(s, a) (where η is the ith in the run) and satisfy the conjuct as above. Note here that a single value of s:a is assigned to several s′:a. This is benign with respect to the global uniqueness required by ID attributes because each copy has a different namespace.
Finally, we have to satisfy the consistency constraints. We showed Consistent_{n} above. The remaining consistency constraints are easily seen to be satisfied: Consistent_{i} because each ID is unique causing at least one pair of characters to differ in every value; Consistent_{p} since it encodes basic consistency constraints on the appearence of pseudo elements in the tree.
Thus, we have satisfied the encoded formula, completing the first direction of the proof. ▫
Take a satisfying assignment ρ to the free variables of θ_{A}. We construct a tree and node (T, η) as well as a run of A accepting (T, η).
We begin by taking the sequence of states q_{0}, …, q_{ℓ+1} which is the prefix of the assignment to q_{0}, …, q_{n} where q_{ℓ} is the first occurrence of q_{f}. We will construct a series of transitions t_{0}, …, t_{ℓ} with t_{i}= q_{i}–[σ_{i}, d_{i}]→ q_{i+1} for all 0 ≤ i ≤ ℓ. We will define each d_{i} and σ_{i}, as well as construct T and η by induction. We construct the tree inductively, then show σ_{i} is satisfied for each i.
At first let T_{0} contain only a root node. Thus η_{0} is necessarily this root. Throughout the proof we label each η_{i} as follows.
 

We pick t_{i} as the transition corresponding to a satisfied disjunct of Tran(i) (of which there is at least one since q_{i}≠ q_{f} when i ≤ ℓ). Thus, take t_{i}= q_{i}–[σ_{i}, d_{i}]→ q_{i}. We proceed by a case split on d_{i}. Note only cases d_{i}= ↓ and d_{i}= ∘ may apply when i = 0.
The tree and node we require are the tree and node obtained after reaching some d_{i}= ∘, for which we necessairily have i = ℓ since ∘ must be and can only be used to reach q_{f}. In constructing this tree we have almost demonstrated an accepting run of A. To complete the proof we need to argue that all σ_{i} are satisfied by η_{i} and that the obtained is valid. Let τΘ = σ_{i}.
To check τ we observe that Pres(τ, i) constrains s_{i} and e_{i} to values, which when assigned to η_{i} as above mean η_{i} directly satisfies τ.
Now, take some θ ∈ Θ. In each case we argue that Pres(τ, i) ensures the needed properties. Note this is straightforward for the attribute selectors due to the directness of the Presburger encoding. Consider the remaining selectors.
First assume θ is positive. If it is :root then we must have i = 0 and η_{i} is the root node as required. For other pseudo classes p we asserted p_{i} hence we have p ∈ λ_{P}(η_{i}). The encoding of the remaining positive constraints can only be satisfied when i > 0. That is, η_{i} is not the root node.
For :nthchild(αn + β) observe we constructed T such that η_{i}= η ρ(n_{i}) for some η. From the defined encoding of Pres(:nthchild(αn + β), i) we directly obtain that η_{i} satisfies :nthchild(αn + β). Similarly for :nthlastchild(αn + β) as we always pad the end of the sibling order to ensure the correct number of succeeding siblings.
For :nthoftype(αn + β) and :nthlastoftype(αn + β) selectors, by similar arguments to the previous selectors, we have ensured that there are enough preceeding or succeeding nodes (along with the directness of their Presburger encoding) to ensure these selectors are satisfied by η_{i} in T.
For :onlychild we know there are no other children since ρ(n_{i}) = ρ(N_{i}) = 1. Finally for :onlyoftype we know there are no other children of the same type since ρ(n_{i}^{s:e}) = ρ(N_{i}^{s:e}) = 0 where η_{i} has type s:e.
When θ is negative there are several cases. If it is :not(:root) then we must have i > 0 and η_{i} is not the root node. For other pseudo classes p we asserted ¬p_{i} hence we have p ∉ λ_{P}(η_{i}). The encoding of the remaining positive constraints are always satisfied on the root node. That is, i = 0. When η_{i} is not the root node we have i > 0.
For :not(:nthchild(αn + β)) observe we constructed T such that η_{i}= η ρ(n_{i}) for some η. From the definition of Pres(:not(:nthchild(αn + β)), i) we obtain via Proposition 13 (Correctness of NoMatch(x, α, β)) that η_{i} does not satisfy the selector :nthchild(αn + β). Similarly for the last child selector :not(:nthlastchild(αn + β)).
For the :not(:nthoftype(αn + β)) selector and the opposite selector :not(:nthlastoftype(αn + β)), by similar arguments to the previous selectors, we have ensured that there are enough preceeding or succeeding nodes (along with their Presburger encodings and Proposition 13 (Correctness of NoMatch(x, α, β))) to ensure these selectors are satisfied by η_{i} in T.
For :not(:onlychild) we know there are some other children since ρ(n_{i}) > 1 or ρ(N_{i}) > 1. Finally for :not(:onlyoftype) we know there are other children of the same type since ρ(n_{i}^{s:e}) > 0 or ρ(N_{i}^{s:e}) > 0 where η_{i} has type s:e.
Thus we have an accepting run of A over some (T, η). However, we finally have to argue that T is a valid document tree. This is enforced by Consistent_{i} and Consistent_{p}.
First, Consistent_{i} ensures all IDs satisfying the Presburger encoding are unique. Since we transferred these values directly to T our tree also has unique IDs.
Next, we have to ensure properties such as no node is both active and inactive. These are all directly taken care of by Consistent_{p}. Thus, we are done. ▫
This document was translated from L^{A}T_{E}X by H^{E}V^{E}A.