Top-down parsing

A compiler parses input from a programming language to assembly language or an internal representation by matching the incoming symbols to Backus-Naur form production rules. An LL parser, also called a top-bottom parser or top-down parser, applies each production rule to the incoming symbols by working from the left-most symbol yielded on a production rule and then proceeding to the next production rule for each non-terminal symbol encountered. In this way the parsing starts on the Left of the result side (right side) of the production rule and evaluates non-terminals from the Left first and, thus, proceeds down the parse tree for each new non-terminal before continuing to the next symbol for a production rule.

  • A->aBC
  • B->c|cd
  • C->df|eg
would match A->aB and attempt to match B->c|cd next. Then C->df|eg would be tried. As one may expect, some languages are more ambiguous than others. For a non-ambiguous language in which all productions for a non-terminal produce distinct strings: the string produced by one production will not start with the same symbol as the string produced by another production. A non-ambiguous language may be parsed by an LL(1) grammar where the (1) signifies the parser reads ahead one token at a time. For an ambiguous language to be parsed by an LL parser, the parser must lookahead >1 symbol, e.g. LL(3). The common solution is to use an LR parser (a.k.a. bottom-up or shift-reduce parser).

This article is licensed under the GNU Free Documentation License.
It uses material from the Wikipedia article "Top-down parsing".