jparsec Overview
In a typical parser program written in jparsec, the programmer creates a number of Parser objects and combines them together. Each of these Parser objects represents a piece of parsing logic.
How do I start?
With jparsec, one constructs a Parser object in terms of the production rules of the grammar. Once a Parser object is created, it can be used to parse source text. Depending on your need, the value it returns can be either the calculation result or an abstract syntax tree.
So how does one create a Parser object? The most important classes include Parser itself and the Terminals and OperatorTable classes discussed further down.
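The idea of small Parser objects that are combined into larger ones, and that return a value when run, can be sketched in plain Java. This is a hand-rolled illustration under stated assumptions, not the jparsec API: MiniParser, MiniResult, MiniDemo, and the map and sepBy1 helpers defined here are hypothetical names made up for this sketch. In jparsec itself the corresponding entry point is Parser.parse().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical mini-combinator for illustration only; NOT the jparsec API.
// A successful step yields a value and the position where parsing continues.
final class MiniResult<T> {
    final T value;
    final int next;
    MiniResult(T value, int next) { this.value = value; this.next = next; }
}

interface MiniParser<T> {
    // Returns null when this piece of parsing logic does not match at pos.
    MiniResult<T> run(String in, int pos);

    // Turns the parsed value into a calculation result or an AST node.
    default <R> MiniParser<R> map(Function<T, R> f) {
        return (in, pos) -> {
            MiniResult<T> r = run(in, pos);
            if (r == null) return null;
            return new MiniResult<>(f.apply(r.value), r.next);
        };
    }

    // One or more occurrences of this parser, separated by delim.
    default MiniParser<List<T>> sepBy1(MiniParser<?> delim) {
        return (in, pos) -> {
            MiniResult<T> first = run(in, pos);
            if (first == null) return null;
            List<T> values = new ArrayList<>();
            values.add(first.value);
            int at = first.next;
            while (true) {
                MiniResult<?> d = delim.run(in, at);
                if (d == null) break;
                MiniResult<T> item = run(in, d.next);
                if (item == null) break; // leave the dangling delimiter unconsumed
                values.add(item.value);
                at = item.next;
            }
            return new MiniResult<>(values, at);
        };
    }

    // Runs the parser over the whole input and returns its value.
    default T parse(String in) {
        MiniResult<T> r = run(in, 0);
        if (r == null || r.next != in.length()) {
            throw new IllegalArgumentException("parse error");
        }
        return r.value;
    }
}

final class MiniDemo {
    // Atom: an unsigned decimal integer.
    static final MiniParser<Integer> INTEGER = (in, pos) -> {
        int end = pos;
        while (end < in.length() && Character.isDigit(in.charAt(end))) end++;
        if (end == pos) return null;
        return new MiniResult<>(Integer.parseInt(in.substring(pos, end)), end);
    };

    // Atom: a fixed piece of text.
    static MiniParser<String> literal(String s) {
        return (in, pos) -> in.startsWith(s, pos) ? new MiniResult<>(s, pos + s.length()) : null;
    }

    // Production rule: sum := integer ('+' integer)*, evaluated while parsing.
    static final MiniParser<Integer> SUM =
        INTEGER.sepBy1(literal("+")).map(xs -> xs.stream().mapToInt(Integer::intValue).sum());
}
```

With these definitions, MiniDemo.SUM.parse("1+20+300") evaluates to 321. Here the return value is a calculation result; mapping to node objects instead would give an abstract syntax tree.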
What are the top 5 combinators that I need to familiarize myself with?
Lexical analysis vs. syntactical analysis
In a simple scenario, all work can be done in the scanning phase. However, when the grammar grows more complex and there are whitespaces and comments to ignore, one-pass parsing becomes awkward. A 2-pass approach can then be used. That is, a lexical analysis phase scans the source into a list of Tokens, and a second syntactical analysis phase then parses the tokens.
The Terminals class provides common tokenizers that scan the source string and turn it into tokens. It also provides corresponding syntactic parsers that recognize these tokens in the syntactical analysis phase.
A syntactical parser takes a list of tokens as input; this list needs to come from the output of a lexer. The Parser.from() API can be used to chain a syntactical parser with a lexer.
What are the typical steps in creating a conventional 2-pass parser?
Step 1: Terminals
Use the pre-defined tokenizers and terminal syntactical parsers in Terminals to define the atoms of your language. For example, an "integers" parser can parse a list of integers separated by commas, with whitespace and block comments ignored.
Step 2: Production rule
The next step is to build the syntactical parser following the production rules. The "integers" parser described above is a simple example. Real parsers can be arbitrarily complex.
For operator precedence grammars, OperatorTable can be used to declare operator precedences and associativities and to construct a parser based on the declaration.
As in most recursive descent parsers, left-recursion needs to be avoided. Beware not to write a parser whose production rule begins by invoking itself: it will fail with stack overflow! A less obvious left-recursion is a production rule that begins with a many repetition and then refers to itself. As many can occur 0 times, we have a potential left recursion here.
Although left recursive grammars aren't generally supported, the most common case of left recursion stems from left associative binary operators, which are handled by OperatorTable.
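The two-pass approach and the left-associative operator case can be sketched in plain Java. This is a hand-rolled illustration, not jparsec code: TwoPassSketch and all of its methods are hypothetical names. In jparsec the same roles are played by the tokenizers in Terminals, by Parser.from(), and by OperatorTable. The sketch assumes a toy language of unsigned integers, commas, minus signs, whitespace, and /* block comments */.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled illustration of the two-pass idea; none of these names are jparsec API.
final class TwoPassSketch {
    // Pass 1: lexical analysis. Turns the source into tokens and drops
    // whitespace and /* block comments */ so the second pass never sees them.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                if (end < 0) throw new IllegalArgumentException("unterminated comment");
                i = end + 2;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (c == ',' || c == '-') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character at " + i);
            }
        }
        return tokens;
    }

    // Pass 2, rule: integers := integer (',' integer)*
    static List<Integer> integers(List<String> tokens) {
        List<Integer> result = new ArrayList<>();
        int pos = 0;
        result.add(integer(tokens, pos++));
        while (pos < tokens.size()) {
            expect(tokens, pos++, ",");
            result.add(integer(tokens, pos++));
        }
        return result;
    }

    // Pass 2, rule: expr := integer ('-' integer)*
    // The left-recursive form expr := expr '-' integer would call itself
    // before consuming any token and overflow the stack. The loop below keeps
    // left associativity by folding each operand into the running value.
    static int subtract(List<String> tokens) {
        int pos = 0;
        int value = integer(tokens, pos++);
        while (pos < tokens.size()) {
            expect(tokens, pos++, "-");
            value = value - integer(tokens, pos++);
        }
        return value;
    }

    // Terminal parser: the token at pos must be an integer literal.
    static int integer(List<String> tokens, int pos) {
        if (pos >= tokens.size() || !Character.isDigit(tokens.get(pos).charAt(0))) {
            throw new IllegalArgumentException("integer expected at token " + pos);
        }
        return Integer.parseInt(tokens.get(pos));
    }

    // Terminal parser: the token at pos must be exactly the given symbol.
    static void expect(List<String> tokens, int pos, String symbol) {
        if (pos >= tokens.size() || !tokens.get(pos).equals(symbol)) {
            throw new IllegalArgumentException("'" + symbol + "' expected at token " + pos);
        }
    }
}
```

For instance, TwoPassSketch.integers(TwoPassSketch.tokenize("1, /* two */ 2 ,3")) yields the list 1, 2, 3, and TwoPassSketch.subtract(TwoPassSketch.tokenize("10 - 3 - 2")) yields 5, that is (10 - 3) - 2 rather than 10 - (3 - 2). Keeping comment and whitespace handling in the first pass is what lets the second pass follow the production rules directly.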
Tips
Please see jparsec Tips for tips and pitfalls.
Copyright 2003-2006 - The Codehaus. All rights reserved unless otherwise noted.