From 41ee4aff044b8655e3ac4f5c6607ee9cd9b009ad Mon Sep 17 00:00:00 2001
From: David Beazley
-When a syntax error occurs, yacc.py performs the following steps:
+When a syntax error occurs, SLY performs the following steps:
-
-
-
-
-
-
This type of recovery is sometimes known as parser resynchronization.
-The error token acts as a wildcard for any bad input text and
-the token immediately following error acts as a
+The ``error`` token acts as a wildcard for any bad input text and
+the token immediately following ``error`` acts as a
synchronization token.
-
-It is important to note that the error token usually does not appear as the last token
-on the right in an error rule. For example:
+It is important to note that the ``error`` token usually does not
+appear as the last token on the right in an error rule. For example::
-
-Panic mode recovery is implemented entirely in the p_error() function. For example, this
-function starts discarding tokens until it reaches a closing '}'. Then, it restarts the
-parser in its initial state.
+ def error(self, p):
+ print("Whoa. You are seriously hosed.")
+ if not p:
+ print("End of File!")
+ return
-
-This function simply discards the bad token and tells the parser that the error was ok.
+ def error(self, p):
+ if p:
+ print("Syntax error at token", p.type)
+ # Just discard the token and tell the parser it's okay.
+ self.errok()
+ else:
+ print("Syntax error at EOF")
-
-More information on these methods is as follows:
-
-
-
-
-To supply the next lookahead token to the parser, p_error() can return a token. This might be
-useful if trying to synchronize on special characters. For example:
+ def error(self, tok):
+ # Read ahead looking for a terminating ";"
+            while True:
+                tok = next(self.tokens, None)    # Get the next token
+                if not tok or tok.type == 'SEMI':
+                    break
+            self.errok()
-
-Keep in mind in that the above error handling functions,
-parser is an instance of the parser created by
-yacc(). You'll need to save this instance someplace in your
-code so that you can refer to it during error handling.
-
-One important aspect of manually setting an error is that the p_error() function will NOT be
-called in this case. If you need to issue an error message, make sure you do it in the production that
-raises SyntaxError.
-
-
-Note: This feature of PLY is meant to mimic the behavior of the YYERROR macro in yacc.
-
-
-In most cases, yacc will handle errors as soon as a bad input token is
-detected on the input. However, be aware that yacc may choose to
-delay error handling until after it has reduced one or more grammar
-rules first. This behavior might be unexpected, but it's related to
-special states in the underlying parsing table known as "defaulted
-states." A defaulted state is parsing condition where the same
-grammar rule will be reduced regardless of what valid token
-comes next on the input. For such states, yacc chooses to go ahead
-and reduce the grammar rule without reading the next input
-token. If the next token is bad, yacc will eventually get around to reading it and
-report a syntax error. It's just a little unusual in that you might
-see some of your grammar rules firing immediately prior to the syntax
-error.
-
-Usually, the delayed error reporting with defaulted states is harmless
-(and there are other reasons for wanting PLY to behave in this way).
-However, if you need to turn this behavior off for some reason. You
-can clear the defaulted states table like this:
-
-Disabling defaulted states is not recommended if your grammar makes use
-of embedded actions as described in Section 6.11.
-Although it may be convenient for PLY to track position information on
-all grammar symbols, this is often unnecessary. For example, if you
-are merely using line number information in an error message, you can
-often just key off of a specific token in the grammar rule. For
-example:
-
-
-Similarly, you may get better parsing performance if you only
-selectively propagate line number information where it's needed using
-the p.set_lineno() method. For example:
-
- A minimal way to construct a tree is to simply create and
+AST Construction
+^^^^^^^^^^^^^^^^
+
+SLY provides no special functions for constructing an abstract syntax
+tree. However, such construction is easy enough to do on your own.
+
+A minimal way to construct a tree is to simply create and
propagate a tuple or list in each grammar rule function. There
are many possible ways to do this, but one example would be something
-like this:
+like this::
-
Another approach is to create a set of data structures for different
-kinds of abstract syntax tree nodes and assign nodes to p[0]
-in each rule. For example:
+kinds of abstract syntax tree nodes and create different node types
+in each rule::
-
-To simplify tree traversal, it may make sense to pick a very generic
-tree structure for your parse tree nodes. For example:
+The parsing technique used by SLY only allows actions to be executed
+at the end of a rule. For example, suppose you have a rule like this::
-
In this case, the supplied action code only executes after all of the
-symbols A, B, C, and D have been
+symbols ``A``, ``B``, ``C``, and ``D`` have been
parsed. Sometimes, however, it is useful to execute small code
fragments during intermediate stages of parsing. For example, suppose
-you wanted to perform some action immediately after A has
-been parsed. To do this, write an empty rule like this:
+you wanted to perform some action immediately after ``A`` has
+been parsed. To do this, write an empty rule like this::
-
-In this example, the empty seen_A rule executes immediately
-after A is shifted onto the parsing stack. Within this
-rule, p[-1] refers to the symbol on the stack that appears
-immediately to the left of the seen_A symbol. In this case,
-it would be the value of A in the foo rule
-immediately above. Like other rules, a value can be returned from an
-embedded action by simply assigning it to p[0]
+The use of embedded actions can sometimes introduce extra shift/reduce
+conflicts. For example, this grammar has no conflicts::
-
-The use of embedded actions can sometimes introduce extra shift/reduce conflicts. For example,
-this grammar has no conflicts:
+ @_('abcd',
+ 'abcx')
+ def foo(self, p):
+ pass
-
A common use of embedded rules is to control other aspects of parsing
-such as scoping of local variables. For example, if you were parsing C code, you might
-write code like this:
+such as scoping of local variables. For example, if you were parsing
+C code, you might write code like this::
-
-
-
-Normally, the parsetab.py file is placed into the same directory as
-the module where the parser is defined. If you want it to go somewhere else, you can
-given an absolute package name for tabmodule instead. In that case, the
-tables will be written there.
-
-
-Note: Be aware that unless the directory specified is also on Python's path (sys.path), subsequent
-imports of the table file will fail. As a general rule, it's better to specify a destination using the
-tabmodule argument instead of directly specifying a directory using the outputdir argument.
-
-
-
-
-It should be noted that table generation is reasonably efficient, even for grammars that involve around a 100 rules
-and several hundred states.
-
-
-
-The different states that appear in this file are a representation of
-every possible sequence of valid input tokens allowed by the grammar.
-When receiving input tokens, the parser is building up a stack and
-looking for matching rules. Each state keeps track of the grammar
-rules that might be in the process of being matched at that point. Within each
-rule, the "." character indicates the current location of the parse
-within that rule. In addition, the actions for each valid input token
-are listed. When a shift/reduce or reduce/reduce conflict arises,
-rules not selected are prefixed with an !. For example:
-
-
-Unused terminals:
-
-
-Grammar
-
-Rule 1 expression -> expression PLUS expression
-Rule 2 expression -> expression MINUS expression
-Rule 3 expression -> expression TIMES expression
-Rule 4 expression -> expression DIVIDE expression
-Rule 5 expression -> NUMBER
-Rule 6 expression -> LPAREN expression RPAREN
-
-Terminals, with rules where they appear
-
-TIMES : 3
-error :
-MINUS : 2
-RPAREN : 6
-LPAREN : 6
-DIVIDE : 4
-PLUS : 1
-NUMBER : 5
-
-Nonterminals, with rules where they appear
-
-expression : 1 1 2 2 3 3 4 4 6 0
-
-
-Parsing method: LALR
-
-
-state 0
-
- S' -> . expression
- expression -> . expression PLUS expression
- expression -> . expression MINUS expression
- expression -> . expression TIMES expression
- expression -> . expression DIVIDE expression
- expression -> . NUMBER
- expression -> . LPAREN expression RPAREN
-
- NUMBER shift and go to state 3
- LPAREN shift and go to state 2
-
-
-state 1
-
- S' -> expression .
- expression -> expression . PLUS expression
- expression -> expression . MINUS expression
- expression -> expression . TIMES expression
- expression -> expression . DIVIDE expression
-
- PLUS shift and go to state 6
- MINUS shift and go to state 5
- TIMES shift and go to state 4
- DIVIDE shift and go to state 7
-
-
-state 2
-
- expression -> LPAREN . expression RPAREN
- expression -> . expression PLUS expression
- expression -> . expression MINUS expression
- expression -> . expression TIMES expression
- expression -> . expression DIVIDE expression
- expression -> . NUMBER
- expression -> . LPAREN expression RPAREN
-
- NUMBER shift and go to state 3
- LPAREN shift and go to state 2
-
-
-state 3
-
- expression -> NUMBER .
-
- $ reduce using rule 5
- PLUS reduce using rule 5
- MINUS reduce using rule 5
- TIMES reduce using rule 5
- DIVIDE reduce using rule 5
- RPAREN reduce using rule 5
-
-
-state 4
-
- expression -> expression TIMES . expression
- expression -> . expression PLUS expression
- expression -> . expression MINUS expression
- expression -> . expression TIMES expression
- expression -> . expression DIVIDE expression
- expression -> . NUMBER
- expression -> . LPAREN expression RPAREN
-
- NUMBER shift and go to state 3
- LPAREN shift and go to state 2
-
-
-state 5
-
- expression -> expression MINUS . expression
- expression -> . expression PLUS expression
- expression -> . expression MINUS expression
- expression -> . expression TIMES expression
- expression -> . expression DIVIDE expression
- expression -> . NUMBER
- expression -> . LPAREN expression RPAREN
-
- NUMBER shift and go to state 3
- LPAREN shift and go to state 2
-
-
-state 6
-
- expression -> expression PLUS . expression
- expression -> . expression PLUS expression
- expression -> . expression MINUS expression
- expression -> . expression TIMES expression
- expression -> . expression DIVIDE expression
- expression -> . NUMBER
- expression -> . LPAREN expression RPAREN
-
- NUMBER shift and go to state 3
- LPAREN shift and go to state 2
-
-
-state 7
-
- expression -> expression DIVIDE . expression
- expression -> . expression PLUS expression
- expression -> . expression MINUS expression
- expression -> . expression TIMES expression
- expression -> . expression DIVIDE expression
- expression -> . NUMBER
- expression -> . LPAREN expression RPAREN
-
- NUMBER shift and go to state 3
- LPAREN shift and go to state 2
-
-
-state 8
-
- expression -> LPAREN expression . RPAREN
- expression -> expression . PLUS expression
- expression -> expression . MINUS expression
- expression -> expression . TIMES expression
- expression -> expression . DIVIDE expression
-
- RPAREN shift and go to state 13
- PLUS shift and go to state 6
- MINUS shift and go to state 5
- TIMES shift and go to state 4
- DIVIDE shift and go to state 7
-
-
-state 9
-
- expression -> expression TIMES expression .
- expression -> expression . PLUS expression
- expression -> expression . MINUS expression
- expression -> expression . TIMES expression
- expression -> expression . DIVIDE expression
-
- $ reduce using rule 3
- PLUS reduce using rule 3
- MINUS reduce using rule 3
- TIMES reduce using rule 3
- DIVIDE reduce using rule 3
- RPAREN reduce using rule 3
-
- ! PLUS [ shift and go to state 6 ]
- ! MINUS [ shift and go to state 5 ]
- ! TIMES [ shift and go to state 4 ]
- ! DIVIDE [ shift and go to state 7 ]
-
-state 10
-
- expression -> expression MINUS expression .
- expression -> expression . PLUS expression
- expression -> expression . MINUS expression
- expression -> expression . TIMES expression
- expression -> expression . DIVIDE expression
-
- $ reduce using rule 2
- PLUS reduce using rule 2
- MINUS reduce using rule 2
- RPAREN reduce using rule 2
- TIMES shift and go to state 4
- DIVIDE shift and go to state 7
-
- ! TIMES [ reduce using rule 2 ]
- ! DIVIDE [ reduce using rule 2 ]
- ! PLUS [ shift and go to state 6 ]
- ! MINUS [ shift and go to state 5 ]
-
-state 11
-
- expression -> expression PLUS expression .
- expression -> expression . PLUS expression
- expression -> expression . MINUS expression
- expression -> expression . TIMES expression
- expression -> expression . DIVIDE expression
-
- $ reduce using rule 1
- PLUS reduce using rule 1
- MINUS reduce using rule 1
- RPAREN reduce using rule 1
- TIMES shift and go to state 4
- DIVIDE shift and go to state 7
-
- ! TIMES [ reduce using rule 1 ]
- ! DIVIDE [ reduce using rule 1 ]
- ! PLUS [ shift and go to state 6 ]
- ! MINUS [ shift and go to state 5 ]
-
-state 12
-
- expression -> expression DIVIDE expression .
- expression -> expression . PLUS expression
- expression -> expression . MINUS expression
- expression -> expression . TIMES expression
- expression -> expression . DIVIDE expression
-
- $ reduce using rule 4
- PLUS reduce using rule 4
- MINUS reduce using rule 4
- TIMES reduce using rule 4
- DIVIDE reduce using rule 4
- RPAREN reduce using rule 4
-
- ! PLUS [ shift and go to state 6 ]
- ! MINUS [ shift and go to state 5 ]
- ! TIMES [ shift and go to state 4 ]
- ! DIVIDE [ shift and go to state 7 ]
-
-state 13
-
- expression -> LPAREN expression RPAREN .
-
- $ reduce using rule 6
- PLUS reduce using rule 6
- MINUS reduce using rule 6
- TIMES reduce using rule 6
- DIVIDE reduce using rule 6
- RPAREN reduce using rule 6
-
-
-
-
-By looking at these rules (and with a little practice), you can usually track down the source
-of most parsing conflicts. It should also be stressed that not all shift-reduce conflicts are
-bad. However, the only way to be sure that they are resolved correctly is to look at parser.out.
+debugging, you can have SLY produce a debugging file when it
+constructs the parsing tables. Add a ``debugfile`` attribute to your
+class like this::
+
+ class CalcParser(Parser):
+ debugfile = 'parser.out'
+ ...
+
+When present, this will write the entire grammar along with all parsing
+states to the file you specify. Each state of the parser is shown
+as output that looks something like this::
+
+ state 2
+
+ (7) factor -> LPAREN . expr RPAREN
+ (1) expr -> . term
+ (2) expr -> . expr MINUS term
+ (3) expr -> . expr PLUS term
+ (4) term -> . factor
+ (5) term -> . term DIVIDE factor
+ (6) term -> . term TIMES factor
+ (7) factor -> . LPAREN expr RPAREN
+ (8) factor -> . NUMBER
+ LPAREN shift and go to state 2
+ NUMBER shift and go to state 3
+
+ factor shift and go to state 1
+ term shift and go to state 4
+ expr shift and go to state 6
+
+Each state keeps track of the grammar rules that might be in the
+process of being matched at that point. Within each rule, the "."
+character indicates the current location of the parse within that
+rule. In addition, the actions for each valid input token are listed.
+By looking at these rules (and with a little practice), you can
+usually track down the source of most parsing conflicts. It should
+also be stressed that not all shift-reduce conflicts are bad.
+However, the only way to be sure that they are resolved correctly is
+to look at the debugging file.
-
- ! TIMES [ reduce using rule 2 ]
- ! DIVIDE [ reduce using rule 2 ]
- ! PLUS [ shift and go to state 6 ]
- ! MINUS [ shift and go to state 5 ]
-
-6.8 Syntax Error Handling
-
+Syntax Error Handling
+^^^^^^^^^^^^^^^^^^^^^
If you are creating a parser for production use, the handling of
syntax errors is important. As a general rule, you don't want a
parser to simply throw up its hands and stop at the first sign of
-trouble. Instead, you want it to report the error, recover if possible, and
-continue parsing so that all of the errors in the input get reported
-to the user at once. This is the standard behavior found in compilers
-for languages such as C, C++, and Java.
+trouble. Instead, you want it to report the error, recover if
+possible, and continue parsing so that all of the errors in the input
+get reported to the user at once. This is the standard behavior found
+in compilers for languages such as C, C++, and Java.
-In PLY, when a syntax error occurs during parsing, the error is immediately
+In SLY, when a syntax error occurs during parsing, the error is immediately
detected (i.e., the parser does not read any more tokens beyond the
source of the error). However, at this point, the parser enters a
recovery mode that can be used to try and continue further parsing.
As a general rule, error recovery in LR parsers is a delicate
topic that involves ancient rituals and black-magic. The recovery mechanism
-provided by yacc.py is comparable to Unix yacc so you may want
+provided by SLY is comparable to Unix yacc, so you may want to
consult a book like O'Reilly's "Lex and Yacc" for some of the finer details.
-
-
+6. If the top item of the parsing stack is ``error``, lookahead tokens will be discarded until the
+   parser can successfully shift a new symbol or reduce a rule involving ``error``.
-6.8.1 Recovery and resynchronization with error rules
+Recovery and resynchronization with error rules
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The most well-behaved approach for handling syntax errors is to write
+grammar rules that include the ``error`` token. For example,
+suppose your language had a grammar rule for a print statement like
+this::
-The most well-behaved approach for handling syntax errors is to write grammar rules that include the error
-token. For example, suppose your language had a grammar rule for a print statement like this:
+ @_('PRINT expr SEMI')
+ def statement(self, p):
+ ...
-
-
+To account for the possibility of a bad expression, you might write an
+additional grammar rule like this::
-To account for the possibility of a bad expression, you might write an additional grammar rule like this:
+ @_('PRINT error SEMI')
+ def statement(self, p):
+ print("Syntax error in print statement. Bad expression")
-
-def p_statement_print(p):
- 'statement : PRINT expr SEMI'
- ...
-
-
-
-
-In this case, the error token will match any sequence of
+In this case, the ``error`` token will match any sequence of
tokens that might appear up to the first semicolon that is
encountered. Once the semicolon is reached, the rule will be
-invoked and the error token will go away.
+invoked and the ``error`` token will go away.
-
-def p_statement_print_error(p):
- 'statement : PRINT error SEMI'
- print("Syntax error in print statement. Bad expression")
-
-
-
-
+ @_('PRINT error')
+ def statement(self, p):
+ print("Syntax error in print statement. Bad expression")
This is because the first bad token encountered will cause the rule to
be reduced--which may make it difficult to recover if more bad tokens
immediately follow.
-
-def p_statement_print_error(p):
- 'statement : PRINT error'
- print("Syntax error in print statement. Bad expression")
-
-6.8.2 Panic mode recovery
+Panic mode recovery
+~~~~~~~~~~~~~~~~~~~
+An alternative error recovery scheme is to enter a panic mode recovery
+in which tokens are discarded to a point where the parser might be
+able to recover in some sensible manner.
-An alternative error recovery scheme is to enter a panic mode recovery in which tokens are
-discarded to a point where the parser might be able to recover in some sensible manner.
+Panic mode recovery is implemented entirely in the ``error()``
+function. For example, this function starts discarding tokens until
+it reaches a closing '}'. Then, it restarts the parser in its initial
+state::
-
-
+This function simply discards the bad token and tells the parser that
+the error was ok::
-
-def p_error(p):
- print("Whoa. You are seriously hosed.")
- if not p:
- print("End of File!")
- return
+    def error(self, tok):
+        # Read ahead looking for a closing '}'
+ while True:
+ tok = next(self.tokens, None)
+ if not tok or tok.type == 'RBRACE':
+ break
+ self.restart()
- # Read ahead looking for a closing '}'
- while True:
- tok = parser.token() # Get the next token
- if not tok or tok.type == 'RBRACE':
- break
- parser.restart()
-
-
-
+A few additional details about some of the attributes and methods being used:
-
-def p_error(p):
- if p:
- print("Syntax error at token", p.type)
- # Just discard the token and tell the parser it's okay.
- parser.errok()
- else:
- print("Syntax error at EOF")
-
-
-
+To supply the next lookahead token to the parser, ``error()`` can return a token. This might be
+useful if trying to synchronize on special characters. For example::
-
-
+When Do Syntax Errors Get Reported?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-def p_error(p):
- # Read ahead looking for a terminating ";"
- while True:
- tok = parser.token() # Get the next token
- if not tok or tok.type == 'SEMI': break
- parser.errok()
+ # Return SEMI to the parser as the next lookahead token
+ return tok
- # Return SEMI to the parser as the next lookahead token
- return tok
-
-6.8.3 Signalling an error from a production
+General comments on error handling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+For normal types of languages, error recovery with error rules and
+resynchronization characters is probably the most reliable
+technique. This is because you can instrument the grammar to catch
+errors at selected places where it is relatively easy to recover and
+continue parsing. Panic mode recovery is really only useful in
+certain specialized applications where you might want to discard huge
+portions of the input text to find a valid restart point.
-If necessary, a production rule can manually force the parser to enter error recovery. This
-is done by raising the SyntaxError exception like this:
-
-
-
-
-The effect of raising SyntaxError is the same as if the last symbol shifted onto the
-parsing stack was actually a syntax error. Thus, when you do this, the last symbol shifted is popped off
-of the parsing stack and the current lookahead token is set to an error token. The parser
-then enters error-recovery mode where it tries to reduce rules that can accept error tokens.
-The steps that follow from this point are exactly the same as if a syntax error were detected and
-p_error() were called.
-
-
-def p_production(p):
- 'production : some production ...'
- raise SyntaxError
-
-6.8.4 When Do Syntax Errors Get Reported
-
-
-
-
-
-
-parser = yacc.yacc()
-parser.defaulted_states = {}
-
-6.8.5 General comments on error handling
-
-
-For normal types of languages, error recovery with error rules and resynchronization characters is probably the most reliable
-technique. This is because you can instrument the grammar to catch errors at selected places where it is relatively easy
-to recover and continue parsing. Panic mode recovery is really only useful in certain specialized applications where you might want
-to discard huge portions of the input text to find a valid restart point.
-
-6.9 Line Number and Position Tracking
-
+Line Number and Position Tracking
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Position tracking is often a tricky problem when writing compilers.
-By default, PLY tracks the line number and position of all tokens.
-This information is available using the following functions:
+By default, SLY tracks the line number and position of all tokens.
+The following attributes may be useful in a production method:
-
-
+- ``p.lineno``. Line number of the left-most terminal in a production.
+- ``p.index``. Lexing index of the left-most terminal in a production.
-For example:
+For example::
-
-
-
-As an optional feature, yacc.py can automatically track line
-numbers and positions for all of the grammar symbols as well.
-However, this extra tracking requires extra processing and can
-significantly slow down parsing. Therefore, it must be enabled by
-passing the
-tracking=True option to yacc.parse(). For example:
-
-
-def p_expression(p):
- 'expression : expression PLUS expression'
- line = p.lineno(2) # line number of the PLUS token
- index = p.lexpos(2) # Position of the PLUS token
-
-
-
-
-Once enabled, the lineno() and lexpos() methods work
-for all grammar symbols. In addition, two additional methods can be
-used:
-
-
-yacc.parse(data,tracking=True)
-
-
-
-
-For example:
-
-
-
-
-Note: The lexspan() function only returns the range of values up to the start of the last grammar symbol.
-
-
-def p_expression(p):
- 'expression : expression PLUS expression'
- p.lineno(1) # Line number of the left expression
- p.lineno(2) # line number of the PLUS operator
- p.lineno(3) # line number of the right expression
- ...
- start,end = p.linespan(3) # Start,end lines of the right expression
- starti,endi = p.lexspan(3) # Start,end positions of right expression
-
-
-
-
-
-
-def p_bad_func(p):
- 'funccall : fname LPAREN error RPAREN'
- # Line number reported from LPAREN token
- print("Bad function call at line", p.lineno(2))
-
-
-
-
-PLY doesn't retain line number information from rules that have already been
-parsed. If you are building an abstract syntax tree and need to have line numbers,
-you should make sure that the line numbers appear in the tree itself.
-
-
-def p_fname(p):
- 'fname : ID'
- p[0] = p[1]
- p.set_lineno(0,p.lineno(1))
-
-6.10 AST Construction
+ @_('expr PLUS expr')
+ def expr(self, p):
+ line = p.lineno # line number of the PLUS token
+ index = p.index # Index of the PLUS token in input text
-yacc.py provides no special functions for constructing an
-abstract syntax tree. However, such construction is easy enough to do
-on your own.
+SLY doesn't propagate line number information to non-terminals. If you need
+this, you'll need to store line number information yourself and propagate it
+in AST nodes or some other data structure.
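Since SLY only records positions on terminals, here is a minimal, hypothetical sketch of carrying line numbers in your own AST nodes. The ``Node`` class and ``first_line`` helper are illustrative names, not part of SLY:

```python
# Hypothetical sketch: recording line numbers in AST nodes yourself.
# In a production you might write something like:
#
#     @_('expr PLUS expr')
#     def expr(self, p):
#         return Node('binop', [p.expr0, p.expr1], lineno=p.lineno)

class Node:
    def __init__(self, type, children=None, lineno=None):
        self.type = type
        self.children = children or []
        self.lineno = lineno          # recorded at construction time

def first_line(node):
    # Walk the tree and return the smallest recorded line number
    lines = [node.lineno] if node.lineno is not None else []
    lines += [first_line(c) for c in node.children]
    return min(lines)

tree = Node('binop',
            [Node('number', lineno=3), Node('number', lineno=4)],
            lineno=3)
print(first_line(tree))   # -> 3
```

Later passes (error reporting, code generation) can then consult ``node.lineno`` without any help from the parser.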
-
-
-
-
-def p_expression_binop(p):
- '''expression : expression PLUS expression
- | expression MINUS expression
- | expression TIMES expression
- | expression DIVIDE expression'''
+ @_('expr PLUS expr',
+ 'expr MINUS expr',
+ 'expr TIMES expr',
+ 'expr DIVIDE expr')
+ def expr(self, p):
+ return ('binary-expression', p[1], p[0], p[2])
- p[0] = ('binary-expression',p[2],p[1],p[3])
+ @_('LPAREN expr RPAREN')
+ def expr(self, p):
+        return ('group-expression', p[1])
-def p_expression_group(p):
- 'expression : LPAREN expression RPAREN'
- p[0] = ('group-expression',p[2])
+ @_('NUMBER')
+ def expr(self, p):
+ return ('number-expression', p[0])
-def p_expression_number(p):
- 'expression : NUMBER'
- p[0] = ('number-expression',p[1])
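As a sketch of how such tuples might be consumed later, here is a small walker dispatching on the node tags used above. It assumes the operator token's value is its source character (e.g. ``'+'``), which depends on how your lexer is written:

```python
# Illustrative walker over the tuple-based AST sketched above.
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def evaluate(node):
    kind = node[0]
    if kind == 'number-expression':
        return node[1]
    if kind == 'group-expression':
        return evaluate(node[1])
    if kind == 'binary-expression':
        _, op, left, right = node
        return OPS[op](evaluate(left), evaluate(right))
    raise ValueError(f'unknown node {kind!r}')

# Corresponds to the input "3 + (4)"
tree = ('binary-expression', '+',
        ('number-expression', 3),
        ('group-expression', ('number-expression', 4)))
print(evaluate(tree))   # -> 7
```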
-
-
-
+The advantage to this approach is that it may make it easier to attach
+more complicated semantics, type checking, code generation, and other
+features to the node classes.
-The advantage to this approach is that it may make it easier to attach more complicated
-semantics, type checking, code generation, and other features to the node classes.
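For instance, one kind of feature that is easy to attach to node classes is a recursive ``evaluate()`` pass. This is only a sketch; it re-declares the classes so the example is self-contained:

```python
# Sketch: attaching behavior (an evaluate() method) to AST node classes.

class Expr:
    pass

class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def evaluate(self):
        ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
               '*': lambda a, b: a * b, '/': lambda a, b: a / b}
        return ops[self.op](self.left.evaluate(), self.right.evaluate())

class Number(Expr):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

# Corresponds to the input "(3 + 4) * 2"
tree = BinOp('*', BinOp('+', Number(3), Number(4)), Number(2))
print(tree.evaluate())    # -> 14
```

Type checking or code generation passes can be added the same way, as additional methods on each node class.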
+Embedded Actions
+^^^^^^^^^^^^^^^^
-
-class Expr: pass
+ class Expr:
+ pass
-class BinOp(Expr):
- def __init__(self,left,op,right):
- self.type = "binop"
- self.left = left
- self.right = right
- self.op = op
+ class BinOp(Expr):
+        def __init__(self, op, left, right):
+ self.op = op
+ self.left = left
+ self.right = right
-class Number(Expr):
- def __init__(self,value):
- self.type = "number"
- self.value = value
+ class Number(Expr):
+ def __init__(self, value):
+ self.value = value
-def p_expression_binop(p):
- '''expression : expression PLUS expression
- | expression MINUS expression
- | expression TIMES expression
- | expression DIVIDE expression'''
+ @_('expr PLUS expr',
+ 'expr MINUS expr',
+ 'expr TIMES expr',
+ 'expr DIVIDE expr')
+ def expr(self, p):
+ return BinOp(p[1], p[0], p[2])
- p[0] = BinOp(p[1],p[2],p[3])
+ @_('LPAREN expr RPAREN')
+ def expr(self, p):
+ return p[1]
-def p_expression_group(p):
- 'expression : LPAREN expression RPAREN'
- p[0] = p[2]
+ @_('NUMBER')
+ def expr(self, p):
+ return Number(p[0])
-def p_expression_number(p):
- 'expression : NUMBER'
- p[0] = Number(p[1])
-
-
-
-
-
-class Node:
- def __init__(self,type,children=None,leaf=None):
- self.type = type
- if children:
- self.children = children
- else:
- self.children = [ ]
- self.leaf = leaf
-
-def p_expression_binop(p):
- '''expression : expression PLUS expression
- | expression MINUS expression
- | expression TIMES expression
- | expression DIVIDE expression'''
+ @_('A B C D')
+ def foo(self, p):
+ print("Parsed a foo", p[0],p[1],p[2],p[3])
- p[0] = Node("binop", [p[1],p[3]], p[2])
-
-6.11 Embedded Actions
-
-
-The parsing technique used by yacc only allows actions to be executed at the end of a rule. For example,
-suppose you have a rule like this:
-
-
-
-
-
-def p_foo(p):
- "foo : A B C D"
- print("Parsed a foo", p[1],p[2],p[3],p[4])
-
-
-
+In this example, the empty ``seen_A`` rule executes immediately after
+``A`` is shifted onto the parsing stack. Within this rule, ``p[-1]``
+refers to the symbol on the stack that appears immediately to the left
+of the ``seen_A`` symbol. In this case, it would be the value of
+``A`` in the ``foo`` rule immediately above. Like other rules, a
+value can be returned from an embedded action by returning it.
-
-def p_foo(p):
- "foo : A seen_A B C D"
- print("Parsed a foo", p[1],p[3],p[4],p[5])
- print("seen_A returned", p[2])
+ @_('A seen_A B C D')
+ def foo(self, p):
+ print("Parsed a foo", p[0],p[2],p[3],p[4])
+ print("seen_A returned", p[1])
-def p_seen_A(p):
- "seen_A :"
- print("Saw an A = ", p[-1]) # Access grammar symbol to left
- p[0] = some_value # Assign value to seen_A
+ @_('')
+ def seen_A(self, p):
+ print("Saw an A = ", p[-1]) # Access grammar symbol to the left
+        return 'some_value'            # Value returned as seen_A
-
-
-
+However, if you insert an embedded action into one of the rules like this::
-However, if you insert an embedded action into one of the rules like this,
+ @_('abcd',
+ 'abcx')
+ def foo(self, p):
+ pass
-
-def p_foo(p):
- """foo : abcd
- | abcx"""
+ @_('A B C D')
+ def abcd(self, p):
+ pass
-def p_abcd(p):
- "abcd : A B C D"
+ @_('A B C X')
+ def abcx(self, p):
+ pass
-def p_abcx(p):
- "abcx : A B C X"
-
-
-
+ @_('')
+ def seen_AB(self, p):
+ pass
an extra shift-reduce conflict will be introduced. This conflict is
-caused by the fact that the same symbol C appears next in
-both the abcd and abcx rules. The parser can either
-shift the symbol (abcd rule) or reduce the empty
-rule seen_AB (abcx rule).
+caused by the fact that the same symbol ``C`` appears next in
+both the ``abcd`` and ``abcx`` rules. The parser can either
+shift the symbol (``abcd`` rule) or reduce the empty
+rule ``seen_AB`` (``abcx`` rule).
-
-def p_foo(p):
- """foo : abcd
- | abcx"""
+ @_('A B C D')
+ def abcd(self, p):
+ pass
-def p_abcd(p):
- "abcd : A B C D"
+ @_('A B seen_AB C X')
+ def abcx(self, p):
+ pass
-def p_abcx(p):
- "abcx : A B seen_AB C X"
-
-def p_seen_AB(p):
- "seen_AB :"
-
-
-
-
-In this case, the embedded action new_scope executes
-immediately after a LBRACE ({) symbol is parsed.
+In this case, the embedded action ``new_scope`` executes
+immediately after a ``LBRACE`` (``{``) symbol is parsed.
This might adjust internal symbol tables and other aspects of the
-parser. Upon completion of the rule statements_block, code
-might undo the operations performed in the embedded action
-(e.g., pop_scope()).
+parser. Upon completion of the rule ``statements``, code
+might undo the operations performed in the embedded action
+(e.g., ``pop_scope()``)::
-
-
-def p_statements_block(p):
- "statements: LBRACE new_scope statements RBRACE"""
- # Action code
- ...
- pop_scope() # Return to previous scope
+ @_('LBRACE new_scope statements RBRACE')
+ def statements(self, p):
+ # Action code
+ ...
+ pop_scope() # Return to previous scope
+
+ @_('')
+ def new_scope(self, p):
+ # Create a new scope for local variables
+ create_scope()
+ ...
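+The ``create_scope()`` and ``pop_scope()`` helpers in this example are
+assumed, not part of SLY.  A minimal sketch, assuming a simple stack of
+dictionaries as symbol tables (all names here are hypothetical), might
+look like this:

```python
# Hypothetical scope helpers for the statements example above.  SLY does
# not provide these; a stack of dictionaries is one simple way to do it.
_scope_stack = [{}]          # the global scope is always present

def create_scope():
    # Enter a new innermost scope (called from the embedded action)
    _scope_stack.append({})

def pop_scope():
    # Leave the innermost scope and return its symbol table
    return _scope_stack.pop()

def define(name, value):
    # Bind a name in the current (innermost) scope
    _scope_stack[-1][name] = value

def lookup(name):
    # Resolve a name, searching from the innermost scope outward
    for scope in reversed(_scope_stack):
        if name in scope:
            return scope[name]
    raise NameError(name)
```

+With helpers like these, bindings made inside a block shadow outer
+bindings until ``pop_scope()`` restores the enclosing scope.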
-def p_new_scope(p):
- "new_scope :"
- # Create a new scope for local variables
- s = new_scope()
- push_scope(s)
- ...
-
-6.12 Miscellaneous Yacc Notes
-
-
-
-
-
-
-in this case, x must be a Lexer object that minimally has a x.token() method for retrieving the next
-token. If an input string is given to yacc.parse(), the lexer must also have an x.input() method.
-
-
-parser = yacc.parse(lexer=x)
-
-
-
-
-
-parser = yacc.yacc(debug=False)
-
-
-
-
-
-parser = yacc.yacc(tabmodule="foo")
-
-
-
-
-
-parser = yacc.yacc(tabmodule="foo",outputdir="somedirectory")
-
-
-
-
-Note: If you disable table generation, yacc() will regenerate the parsing tables
-each time it runs (which may take awhile depending on how large your grammar is).
-
-
-parser = yacc.yacc(write_tables=False)
-
-
-
-
-
-parser = yacc.parse(debug=True)
-
-
-
-
-
-    from functools import wraps
-    from nodes import Collection
-
-    def strict(*types):
-        def decorate(func):
-            @wraps(func)
-            def wrapper(p):
-                func(p)
-                if not isinstance(p[0], types):
-                    raise TypeError
-
-            wrapper.co_firstlineno = func.__code__.co_firstlineno
-            return wrapper
-
-        return decorate
-
-    @strict(Collection)
-    def p_collection(p):
-        """
-        collection : sequence
-                   | map
-        """
-        p[0] = p[1]
-
-As a general rule this isn't a problem.  However, to make it work,
-you need to carefully make sure everything gets hooked up correctly.
-First, make sure you save the objects returned by lex() and
-yacc().  For example:
-
-    lexer  = lex.lex()       # Return lexer object
-    parser = yacc.yacc()     # Return parser object
-
-Next, when parsing, make sure you give the parse() function a
-reference to the lexer it should be using.  For example:
-
-    parser.parse(text, lexer=lexer)
-
-If you forget to do this, the parser will use the last lexer
-created--which is not always what you want.
-
-Within lexer and parser rule functions, these objects are also
-available.  In the lexer, the "lexer" attribute of a token refers to
-the lexer object that triggered the rule.  For example:
-
-    def t_NUMBER(t):
-        r'\d+'
-        ...
-        print(t.lexer)       # Show lexer object
-
-In the parser, the "lexer" and "parser" attributes refer to the lexer
-and parser objects respectively.
-
-    def p_expr_plus(p):
-        'expr : expr PLUS expr'
-        ...
-        print(p.parser)      # Show parser object
-        print(p.lexer)       # Show lexer object
-
-If necessary, arbitrary attributes can be attached to the lexer or
-parser object.  For example, if you wanted to have different parsing
-modes, you could attach a mode attribute to the parser object and look
-at it later.
-
-    lex.lex(optimize=1)
-    yacc.yacc(optimize=1)
-
-then PLY can later be used when Python runs in optimized mode.  To
-make this work, make sure you first run Python in normal mode.  Once
-the lexing and parsing tables have been generated the first time, run
-Python in optimized mode.  PLY will use the tables without the need
-for doc strings.
-
-Beware: running PLY in optimized mode disables a lot of error
-checking.  You should only do this when your project has stabilized
-and you don't need to do any debugging.  One of the purposes of
-optimized mode is to substantially decrease the startup time of
-your compiler (by assuming that everything is already properly
-specified and works).
-
-Debugging a compiler is typically not an easy task.  PLY provides some
-advanced diagnostic capabilities through the use of Python's
-logging module.  The next two sections describe this:
-
-Both the lex() and yacc() commands have a debugging
-mode that can be enabled using the debug flag.  For example:
-
-    lex.lex(debug=True)
-    yacc.yacc(debug=True)
-
-Normally, the output produced by debugging is routed to either
-standard error or, in the case of yacc(), to a file
-parser.out.  This output can be more carefully controlled
-by supplying a logging object.  Here is an example that adds
-information about where different debugging messages are coming from:
-
-    # Set up a logging object
-    import logging
-    logging.basicConfig(
-        level = logging.DEBUG,
-        filename = "parselog.txt",
-        filemode = "w",
-        format = "%(filename)10s:%(lineno)4d:%(message)s"
-    )
-    log = logging.getLogger()
-
-    lex.lex(debug=True, debuglog=log)
-    yacc.yacc(debug=True, debuglog=log)
-
-If you supply a custom logger, the amount of debugging
-information produced can be controlled by setting the logging level.
-Typically, debugging messages are either issued at the DEBUG,
-INFO, or WARNING levels.
-
-PLY's error messages and warnings are also produced using the logging
-interface.  This can be controlled by passing a logging object
-using the errorlog parameter.
-
-    lex.lex(errorlog=log)
-    yacc.yacc(errorlog=log)
-
-If you want to completely silence warnings, you can either pass in a
-logging object with an appropriate filter level or use the NullLogger
-object defined in either lex or yacc.  For example:
-
-    yacc.yacc(errorlog=yacc.NullLogger())
-
-To enable run-time debugging of a parser, use the debug option to
-parse().  This option can either be an integer (which simply turns
-debugging on or off) or an instance of a logger object.  For example:
-
-    log = logging.getLogger()
-    parser.parse(input, debug=log)
-
-If a logging object is passed, you can use its filtering level to
-control how much output gets generated.  The INFO level is used to
-produce information about rule reductions.  The DEBUG level will show
-information about the parsing stack, token shifts, and other details.
-The ERROR level shows information related to parsing errors.
-
-For very complicated problems, you should pass in a logging object
-that redirects to a file where you can more easily inspect the output
-after execution.