
Optimising Right Recursion

Update, 14/12/2014: Jeffrey Kegler, the author of the Marpa parser, posted a response to this post. You probably should take a look.

Note: I have decided not to include this optimisation in the rest of the series. It complicates the core algorithm, has further repercussions in the parsing phase, and its performance benefits are unclear — they need to be tested. (Leo didn’t lie about the complexity bounds, but he omitted less interesting considerations, such as the ever-growing memory gap.) Simply put, more research is needed. I may perform it. Someday.

The problem

Consider the following grammar:

A -> A 'a'
A ->

Let it parse this input: "aaaaa". For now, there’s no problem:

=== 0 ===
A -> • A 'a'  (0)
A -> •        (0)
A -> A • 'a'  (0)

=== 1 ===
A -> A 'a' •  (0)
A -> A • 'a'  (0)

=== 2 ===
A -> A 'a' •  (0)
A -> A • 'a'  (0)

=== 3 ===
A -> A 'a' •  (0)
A -> A • 'a'  (0)

=== 4 ===
A -> A 'a' •  (0)
A -> A • 'a'  (0)

=== 5 ===
A -> A 'a' •  (0)
A -> A • 'a'  (0)
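
(By the way, if you would like to follow along in code, here is one minimal way to represent these items. This is just a sketch in Python, with names of my own choosing; it is only there to make the snippets further down concrete.)

from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """One Earley item: a rule, a dot position, and an origin."""
    lhs: str      # left hand side, e.g. 'A'
    rhs: tuple    # right hand side, e.g. ('A', 'a')
    dot: int      # how many symbols of rhs lie before the •
    origin: int   # index of the Earley set the item started in

# A -> A • 'a'  (0)   would be   Item('A', ('A', 'a'), 1, 0)
# A -> A 'a' •  (0)   would be   Item('A', ('A', 'a'), 2, 0)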

But then, consider this slightly different grammar (the two symbols in the first rule’s right hand side have been swapped):

A -> 'a' A
A ->

Upon the same input, it yields this:

=== 0 ===
A -> • 'a' A  (0)
A -> •        (0)

=== 1 ===
A -> 'a' • A  (0)
A -> • 'a' A  (1)
A -> 'a' A •  (0)
A -> •        (1)

=== 2 ===
A -> 'a' • A  (1)
A -> • 'a' A  (2)
A -> 'a' A •  (1)
A -> •        (2)
A -> 'a' A •  (0)

=== 3 ===
A -> 'a' • A  (2)
A -> • 'a' A  (3)
A -> 'a' A •  (2)
A -> •        (3)
A -> 'a' A •  (1)
A -> 'a' A •  (0)

=== 4 ===
A -> 'a' • A  (3)
A -> • 'a' A  (4)
A -> 'a' A •  (3)
A -> •        (4)
A -> 'a' A •  (2)
A -> 'a' A •  (1)
A -> 'a' A •  (0)

=== 5 ===
A -> 'a' • A  (4)
A -> • 'a' A  (5)
A -> 'a' A •  (4)
A -> •        (5)
A -> 'a' A •  (3)
A -> 'a' A •  (2)
A -> 'a' A •  (1)
A -> 'a' A •  (0)

Now we have a problem: the previous grammar yielded a constant number of Earley items per set, but with this one, each set holds one more item than the last: the total grows quadratically with the length of the input. Such behaviour is expected of Earley’s algorithm in the worst case, but come on! This grammar could be parsed by a simple LR parser in linear time! We should not need that many Earley items.

The solution

The trick was published by Joop Leo in 1991. The idea is to avoid completing certain key Earley items.

Deterministic reduction path

Picture this Earley item (Greek letters denote sequences of symbols):

A -> α • (i)

The deterministic reduction path, if any, is a chain of Earley items. The first item of this chain, if there is such a thing, satisfies 2 conditions:

- It is of the form X -> β A •  (k). (The symbol before the dot is the left hand side of our item.)
- S(i) contains exactly one item of the form X -> β • A  (k).

Now we can picture this Earley item, X -> β A • (k) (yes, it’s the one we just discovered). It is a completed item as well, so we can apply the same two conditions to it, and find what’s just above it. We can then go up the chain until we can’t. The last Earley item we find is called the topmost complete item on the deterministic reduction path.

By the way, you may notice the eerie similarity with the regular completion step. See, when performing a completion, we check for all items of the form X -> β • A δ (k) in S(i), so we can add X -> β A • δ (k) to the current Earley set. The reduction path business is only a special case, where δ happens to be empty, and where the matching item is unique.
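
In code, the “exactly one item below” test could look like this. (A sketch, building on the item representation above; unique_candidate is a name of my own.)

def unique_candidate(sets, i, lhs):
    """Search S(i) for the item of the form X -> β • A  (k), where
    A is `lhs`: the dot sits right before the last symbol of the
    rule.  Return that item, or None if there are zero or several."""
    found = None
    for item in sets[i]:
        if item.dot == len(item.rhs) - 1 and item.rhs[item.dot] == lhs:
            if found is not None:
                return None  # several candidates: no deterministic path
            found = item
    return found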

The algorithm

We modify the completion step. Instead of performing a regular completion right away, we first examine our Earley item, and check if there is a deterministic reduction path.

The effect of these shenanigans is fairly simple: if we were doing the regular completion, we would end up adding the whole reduction path to the current Earley set. Instead, we just add the topmost item, and skip all the others.
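
Here is how the modified completion step might look, using unique_candidate from above. (Still a sketch: regular_complete stands for the usual completion step, and is not shown.)

def complete(sets, current, item):
    """Complete `item`, a finished Earley item being processed in
    S(current): if it has a deterministic reduction path, add only
    the topmost item of that path; otherwise, complete as usual."""
    top = None
    lhs, origin = item.lhs, item.origin
    while True:
        below = unique_candidate(sets, origin, lhs)
        if below is None:
            break
        # The item just above: X -> β A •  (k), dot pushed past A.
        top = Item(below.lhs, below.rhs, below.dot + 1, below.origin)
        lhs, origin = top.lhs, top.origin
    if top is None:
        regular_complete(sets, current, item)  # no reduction path
    else:
        sets[current].add(top)                 # skip the rest of the chain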

Let’s take the construction of the final Earley set in our last example. With the regular algorithm, we get this:

=== 5 ===
A -> 'a' • A  (4)  -- Scanned previously
A -> • 'a' A  (5)  -- Predicted
A -> 'a' A •  (4)  -- Special completion (because A is nullable)
A -> •        (5)  -- Predicted
A -> 'a' A •  (3)  -- Completed
A -> 'a' A •  (2)  -- Completed
A -> 'a' A •  (1)  -- Completed
A -> 'a' A •  (0)  -- Completed

Now see the last four completions? They’re the deterministic reduction path of A -> • (5). With our new and improved algorithm, we would only add the last one. Here is how it would go. First, we would proceed as usual:

=== 5 ===
A -> 'a' • A  (4)  -- Scanned previously
A -> • 'a' A  (5)  -- Predicted
A -> 'a' A •  (4)  -- Special completion (because A is nullable)
A -> •        (5)  -- Predicted

Then, things change as we complete A -> • (5). First, we check for the presence of exactly one item of the form X -> β • A (k) in S(5). There is one, and only one:

A -> 'a' • A  (4)

Now, we know that

A -> 'a' A •  (4)

is on the deterministic reduction path. So, instead of putting it in S(5) right away (which, by the way, wouldn’t change a thing, since it is already present), we go up the chain, and look for an item of the form X -> β • A (k), but this time in S(4). And lo and behold, we find…

A -> 'a' • A  (3)

So the next item up the chain is

A -> 'a' A •  (3)

And so on until we find the topmost item:

A -> 'a' A •  (0)

And we just add that. We therefore get this:

=== 5 ===
A -> 'a' • A  (4)  -- Scanned previously
A -> • 'a' A  (5)  -- Predicted
A -> 'a' A •  (4)  -- Special completion (because A is nullable)
A -> •        (5)  -- Predicted
A -> 'a' A •  (0)  -- Topmost item in the deterministic reduction path

Now the number of items per set can stop growing at each step. Here’s what it would do to the whole parse:

=== 0 ===
A -> • 'a' A  (0)
A -> •        (0)

=== 1 ===
A -> 'a' • A  (0)
A -> • 'a' A  (1)
A -> 'a' A •  (0)
A -> •        (1)

=== 2 ===
A -> 'a' • A  (1)
A -> • 'a' A  (2)
A -> 'a' A •  (1)
A -> •        (2)
A -> 'a' A •  (0)

=== 3 ===
A -> 'a' • A  (2)
A -> • 'a' A  (3)
A -> 'a' A •  (2)
A -> •        (3)
A -> 'a' A •  (0)

=== 4 ===
A -> 'a' • A  (3)
A -> • 'a' A  (4)
A -> 'a' A •  (3)
A -> •        (4)
A -> 'a' A •  (0)

=== 5 ===
A -> 'a' • A  (4)
A -> • 'a' A  (5)
A -> 'a' A •  (4)
A -> •        (5)
A -> 'a' A •  (0)

Okay, that’s still more than the left recursive version. Still, the number of Earley items per set is now bounded. We got our linear parsing back…

The real algorithm

…mostly. In case you didn’t notice, checking the reduction path takes time. And the further we go into the parse, the longer the reduction path gets, and the longer it takes to walk to its top. Even worse, we check and re-check the same reduction path over and over! So, while we got the space complexity back to normal, the time complexity is still quadratic.

The obvious solution: cache the topmost item of each reduction path. Next time we check, we won’t have to go up the whole chain all over again. To implement the cache, we use what Leo calls transitive items. A transitive item is something of the form:

X -> α •  (i), Y

where X and Y are non-terminal symbols, and α is a sequence of symbols (terminal or non-terminal). Basically, it is a completed Earley item, with a symbol attached. The Earley item itself is the topmost item on some reduction path. The symbol tells us which reduction path is involved.

Oh, I didn’t tell you: at any given point in the parse, you can’t find more deterministic paths than non-terminal symbols in the grammar. Looking back at the definition of deterministic paths, it should have been obvious. (I actually missed it, and got the tip from Jeffrey Kegler, in his Marpa parser paper.) In our example above, for instance, there is only one non-terminal symbol (A). So, there is only one deterministic reduction path.

Now, when we find the topmost item of a reduction path, we add the corresponding transitive item to the current Earley set. And to all the Earley sets we just examined, while we’re at it.

(That last one bothers me. We used to touch only two Earley sets: the current one (predictions and completions), and the next one (scans). This could enable the use of a neat dynamic array for the whole thing. But with this new “optimisation”, we now have to deal with arbitrary insertions. This could be a performance killer: arrays aren’t exactly insertion friendly, and anything else likely involves an uncanny amount of pointer chasing. Maybe we can mitigate the damage with some kind of cache-aware data structure, but I shudder at the sheer complexity.)
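
To make the bookkeeping concrete, here is how the cached completion might look. (A sketch following the description above, with the same made-up names as before; transitives[i] is a per-set dictionary mapping a non-terminal to its transitive item, which is exactly what the bound on deterministic paths keeps small.)

def complete_cached(sets, transitives, current, item):
    """Like `complete` above, but memoise the topmost item."""
    cached = transitives[item.origin].get(item.lhs)
    if cached is not None:              # cache hit: no walk at all
        sets[current].add(cached)
        transitives[current][item.lhs] = cached  # carry the cache along
        return
    visited = []   # the (set index, symbol) pairs examined on the way up
    top = None
    lhs, origin = item.lhs, item.origin
    while True:
        below = unique_candidate(sets, origin, lhs)
        if below is None:
            break
        visited.append((origin, lhs))
        top = Item(below.lhs, below.rhs, below.dot + 1, below.origin)
        lhs, origin = top.lhs, top.origin
    if top is None:
        regular_complete(sets, current, item)    # no reduction path
        return
    sets[current].add(top)
    transitives[current][item.lhs] = top  # the current set, plus every set
    for i, sym in visited:                # we examined: these are the
        transitives[i][sym] = top         # arbitrary insertions lamented above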

In our example it would go like this:

=== 0 ===
A -> • 'a' A  (0) -- as usual
A -> •        (0) -- as usual

=== 1 ===
A -> 'a' • A  (0) -- as usual
A -> • 'a' A  (1) -- as usual
A -> 'a' A •  (0) -- Search for the topmost item: there is none.
A -> •        (1) -- as usual
A : A -> 'a' A • (0) <─────────this transitive item────┐
                                                       │
=== 2 ===                                              │
A -> 'a' • A  (1) -- as usual                          │
A -> • 'a' A  (2) -- as usual                          │
A -> 'a' A •  (1) -- Search for the topmost item       │
                     Finds A -> 'a' • A  (0), creates ─┘
A : A -> 'a' A • (0) <─── And this one as well
                          (Yes, it is the same)
A -> •        (2) -- Search for the topmost item,
                     finds the transitive item above
A -> 'a' A •  (0) -- Search for the same topmost item,
                     there is none.

This little dance then goes on in the same fashion:

=== 3 ===
A -> 'a' • A  (2)
A -> • 'a' A  (3)
A -> 'a' A •  (2)
A : A -> 'a' A • (0) -- copy of the transitive item above
A -> •        (3)
A -> 'a' A •  (0)

=== 4 ===
A -> 'a' • A  (3)
A -> • 'a' A  (4)
A -> 'a' A •  (3)
A : A -> 'a' A • (0) -- copy of the transitive item above
A -> •        (4)
A -> 'a' A •  (0)

=== 5 ===
A -> 'a' • A  (4)
A -> • 'a' A  (5)
A -> 'a' A •  (4)
A : A -> 'a' A • (0) -- copy of the transitive item above
A -> •        (5)
A -> 'a' A •  (0)

More or less. Sorry for the lack of details (especially with respect to empty rules); I’m not very motivated to flesh it out right now: I’m not sure this optimisation is worth the trouble, and the rest of the series is more important anyway.