Blog
LL(k) parsing, explained simply
The parser generator is an LL(k) generator. That label is doing a lot of work, so this post unpacks it without the formal machinery.
Reading the name
The three parts each mean something concrete:
- L — the input is read Left to right.
- L — it builds a Leftmost derivation (it always expands the leftmost rule first).
- k — it may look ahead k tokens before deciding which rule to take.
The interesting part is the k. With one token of lookahead, some grammars are
ambiguous; with two, the ambiguity disappears.
A concrete example
Consider two rules that both start with an identifier:
statement : assignment | call ;
assignment: ID '=' expr ;
call : ID '(' args ')' ;
With a single token (k = 1) the parser only sees ID and cannot tell which rule
applies. With k = 2 it also sees the next token — = versus ( — and the choice is
immediate:
// generated decision, conceptually:
if (la(2).type == ASSIGN) {
return parseAssignment();
} else {
return parseCall();
}
No backtracking, no guessing — just a fixed amount of lookahead.
Why it stays fast
Because the lookahead is bounded by k, every decision is made in constant time. The
generated parser is predictable: its worst case is its average case. That predictability
is exactly what makes generated parsers pleasant to embed in editors and build tools.
Try it
Point the generator at your own grammar and read the code it produces:
hivevm-cc generate MyGrammar.g --target java
The source lives in hivevm/cc.
Back to the blog