



jparsec2 Tutorial
A Tutorial for jparsec FrameworkIn this tutorial, we will use the classical example: calculator, to show how jparsec's declarative API can be used to construct a parser. Our calculator should be able to:
A typical parsing process takes 2 steps:
TokenizerFirst, let's create our tokenizer. The tokenizer should be able to recognize decimal number, parenthesis and the operators. Input character sequence will be recognized and translated to token. Number literalThe prebuilt Terminals.DecimalLiteral can be used to tokenize decimal literals. WhitespacesThis calculator allows any number of white spaces before and after any number or operator. Put them togetherWith the number literal tokenizer and the OPERATORS constant, our tokenizer will be: And given any syntactical parser, we will be able to chain it with the tokenizer and ignore the whitespaces and comments: Parser and InterpreterWith the tokenizer taking care of lexical analysis, we can now focus on grammar rules. Decimal NumberFrom the tokenizer, we could have two different kinds of tokens: decimal numbers and operators. To convert the decimal number token to a Double object, we use the corresponding syntactic parser for decimal literal and transform the string result: Modeling OperatorsBefore proceeding to parsing the operators, let's first model them using enum. The binary operators (+, , *, /) implement Binary to facilitate calculation; and negative operator () implement Unary. We will later on declare them in a OperatorTable. Parsing OperatorsAll the operators are managed by the OPERATORS constant. The Terminals.token() method returns a parser that recognizes operator. A convenience method can be created on top of it to save a few keystrokes for us: For each operator, the parser will first recognize the token using the term() method and then return the corresponding BinaryOperator/UnaryOperator instance, as in: Whitespace OperatorAt the beginning, we said we want to allow whitespace to be an alternative syntax for multiplication. Well, formally speaking, whitespace itself is not enough to make an operator. For example, "2 3" should still be parsed as a "minus", not a multiplication between "2" and "3". Therefore, our whitespace operator should only happen when none of "+", "", "*", "/" is present. Therefore we have: Operator PrecedenceIn a classic recursivedesent parser, operators with different precedence are handled by defining different production rules, for example: This solution can lead to messy production rules when the number of operators and precedence levels scale up. It is more desirable to be able to specify the operator precedence declaratively. Left RecursionAnother drawback of recursivedesent parser is left recursion. In order to preserve the leftassociative nature of the operators, the above production rules, are defined leftresursively (the first rule at the right hand side of muldiv_expr is muldiv_expr itself). Such leftrecursive definition will lead to infiniterecursion in a naive recursivedesent parser implementation. OperatorTableJparsec provides support for operator precedence grammar and automatically handles left recursion incured by leftassociative operators. Our target syntax should look like: The higher the precedence number, the more tightly it binds the operands. Parens and RecursionIn order to support parens, recursion needs to be involved. An expression inside a pair of parens is itself another expression, which can be nested inside another pair of parens. The way to handle recursion is to use Parser.Reference: If you run the "parenthesized" parser above, it will fail with some error message about the reference is not set yet. And that is accurately what we are missing here. The reference needs to finally be set with the parser for expression, because, well, literally any expression can be nested within a pair of parens. So, putting it together with the operator table and whitespace operator, we have: And that is almost everything we need. Oh, wait, let's pass the NUMBER parser in and hook it up with the lexer to get the final parser: And it doesn't hurt to run it and see how it happily calculates, does it? The EndThe final parser code is as following: SummaryIn this tutorial, we chose to run the calculation directly in the semantic actions. We could defer this calculation and create intermediary abstract syntax tree for further analysis and processing (such as type checking and code optimisation). But that makes no real difference in the parser code. We just need to replace the direct calculation code with the code that creates the abstract syntax tree (of course, we also need to replace <Double> with <whatever your expression type>). This short tutorial cannot cover every detail of the jparsec API. There are many other useful utility methods provided to help the process of constructing a complex parser. Please refer to the jparsec API Documentation for further information. 

Copyright 20032006  The Codehaus. All rights reserved unless otherwise noted.
Powered by Atlassian Confluence
