From fb43a50f8a6bd192bf1edd4cf0f6d57070420392 Mon Sep 17 00:00:00 2001
From: xpvpc <>
Date: Mon, 14 May 2018 15:43:42 +0200
Subject: [PATCH 1/2] fix typos in docs

---
 docs/sly.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/sly.rst b/docs/sly.rst
index 9509abd..0831448 100644
--- a/docs/sly.rst
+++ b/docs/sly.rst
@@ -57,7 +57,7 @@ described by the following list of token tuples::
     [ ('ID','x'), ('EQUALS','='), ('NUMBER','3'),
       ('PLUS','+'), ('NUMBER','42'), ('TIMES','*'),
       ('LPAREN','('), ('ID','s'), ('MINUS','-'),
-      ('ID','t'), ('RPAREN',')' ]
+      ('ID','t'), ('RPAREN',')') ]
 
 The SLY ``Lexer`` class is used to do this.  Here is a sample of a simple
 lexer that tokenizes the above text::
@@ -1212,7 +1212,7 @@ appear as the last token on the right in an error rule.  For example::
 This is because the first bad token encountered will cause the rule to
 be reduced--which may make it difficult to recover if more bad tokens
 immediately follow.  It's better to have some kind of landmark such as
-a semicolon, closing parenthesese, or other token that can be used as
+a semicolon, closing parentheses, or other token that can be used as
 a synchronization point.
 
 Panic mode recovery

From 715222a0fc865f751d39c08805689c987db0096d Mon Sep 17 00:00:00 2001
From: xpvpc <>
Date: Mon, 14 May 2018 15:44:21 +0200
Subject: [PATCH 2/2] remove trailing whitespace

---
 docs/sly.rst | 66 ++++++++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/docs/sly.rst b/docs/sly.rst
index 0831448..26c6462 100644
--- a/docs/sly.rst
+++ b/docs/sly.rst
@@ -2,9 +2,9 @@ SLY (Sly Lex Yacc)
 ==================
 
 This document provides an overview of lexing and parsing with SLY.
-Given the intrinsic complexity of parsing, I would strongly advise 
+Given the intrinsic complexity of parsing, I would strongly advise
 that you read (or at least skim) this entire document before jumping
-into a big development project with SLY. 
+into a big development project with SLY.
 
 SLY requires Python 3.6 or newer.  If you're using an older version,
 you're out of luck.  Sorry.
@@ -54,7 +54,7 @@ The first step of parsing is to break the text into tokens where each
 token has a type and value.  For example, the above text might be
 described by the following list of token tuples::
 
-    [ ('ID','x'), ('EQUALS','='), ('NUMBER','3'), 
+    [ ('ID','x'), ('EQUALS','='), ('NUMBER','3'),
       ('PLUS','+'), ('NUMBER','42'), ('TIMES','*'),
       ('LPAREN','('), ('ID','s'), ('MINUS','-'),
       ('ID','t'), ('RPAREN',')') ]
@@ -68,7 +68,7 @@ lexer that tokenizes the above text::
 
     class CalcLexer(Lexer):
         # Set of token names.   This is always required
-        tokens = { ID, NUMBER, PLUS, MINUS, TIMES, 
+        tokens = { ID, NUMBER, PLUS, MINUS, TIMES,
                    DIVIDE, ASSIGN, LPAREN, RPAREN }
 
         # String containing ignored characters between tokens
@@ -108,7 +108,7 @@ When executed, the example will produce the following output::
 A lexer only has one public method ``tokenize()``.  This is a
 generator function that produces a stream of ``Token`` instances.
 The ``type`` and ``value`` attributes of ``Token`` contain the
-token type name and value respectively. 
+token type name and value respectively.
 
 The tokens set
 ^^^^^^^^^^^^^^^
@@ -122,11 +122,11 @@ In the example, the following code specified the token names::
     class CalcLexer(Lexer):
         ...
         # Set of token names.   This is always required
-        tokens = { ID, NUMBER, PLUS, MINUS, TIMES, 
+        tokens = { ID, NUMBER, PLUS, MINUS, TIMES,
                    DIVIDE, ASSIGN, LPAREN, RPAREN }
         ...
 
-Token names should be specified using all-caps as shown. 
+Token names should be specified using all-caps as shown.
 
 Specification of token match patterns
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -139,7 +139,7 @@ names of the tokens provided in the ``tokens`` set.  For example::
     MINUS = r'-'
 
 Regular expression patterns are compiled using the ``re.VERBOSE`` flag
-which can be used to help readability. However, 
+which can be used to help readability. However,
 unescaped whitespace is ignored and comments are allowed in this mode.
 If your pattern involves whitespace, make sure you use ``\s``. If
 you need to match the ``#`` character, use ``[#]`` or ``\#``.
@@ -189,8 +189,8 @@ comments and newlines::
         ...
 
     if __name__ == '__main__':
-        data = '''x = 3 + 42 
-               * (s    # This is a comment 
+        data = '''x = 3 + 42
+               * (s    # This is a comment
                   - t)'''
         lexer = CalcLexer()
         for tok in lexer.tokenize(data):
@@ -219,7 +219,7 @@ object should be returned as a result.  If no value is returned by the
 function, the token is discarded and the next token read.
 
 The ``@_()`` decorator is defined automatically within the ``Lexer``
-class--you don't need to do any kind of special import for it. 
+class--you don't need to do any kind of special import for it.
 It can also accept multiple regular expression rules.  For example::
 
     @_(r'0x[0-9a-fA-F]+',
@@ -249,8 +249,8 @@ behavior.
 Token Remapping
 ^^^^^^^^^^^^^^^
 
-Occasionally, you might need to remap tokens based on special cases. 
-Consider the case of matching identifiers such as "abc", "python", or "guido". 
+Occasionally, you might need to remap tokens based on special cases.
+Consider the case of matching identifiers such as "abc", "python", or "guido".
 Certain identifiers such as "if", "else", and "while" might need to be
 treated as special keywords.  To handle this, include token remapping rules when
 writing the lexer like this::
@@ -272,7 +272,7 @@ writing the lexer like this::
         ID['else'] = ELSE
         ID['while'] = WHILE
 
-When parsing an identifier, the special cases will remap certain matching 
+When parsing an identifier, the special cases will remap certain matching
 values to a new token type. For example, if the value of an identifier is
 "if" above, an ``IF`` token will be generated.
 
@@ -300,7 +300,7 @@ it does record positional information related to each token in the token's
 column information as a separate step.  For instance, you can search
 backwards until you reach the previous newline::
 
-    # Compute column. 
+    # Compute column.
     #     input is the input text string
     #     token is a token instance
     def find_column(text, token):
@@ -389,13 +389,13 @@ some other kind of error handling.
 A More Complete Example
 ^^^^^^^^^^^^^^^^^^^^^^^
 
-Here is a more complete example that puts many of these concepts 
+Here is a more complete example that puts many of these concepts
 into practice::
 
     # calclex.py
 
     from sly import Lexer
-    
+
     class CalcLexer(Lexer):
         # Set of token names.   This is always required
         tokens = { NUMBER, ID, WHILE, IF, ELSE, PRINT,
@@ -420,7 +420,7 @@ into practice::
         GE = r'>='
         GT = r'>'
         NE = r'!='
-    
+
         @_(r'\d+')
         def NUMBER(self, t):
             t.value = int(t.value)
@@ -505,7 +505,7 @@ specification like this::
     expr       : expr + term
                | expr - term
               | term
-    
+
     term       : term * factor
               | term / factor
               | factor
@@ -532,7 +532,7 @@ example, given the expression grammar above, you might write the
 specification for the operation of a simple calculator like this::
 
     Grammar                   Action
-    ------------------------  -------------------------------- 
+    ------------------------  --------------------------------
     expr0   : expr1 + term       expr0.val = expr1.val + term.val
             | expr1 - term       expr0.val = expr1.val - term.val
             | term               expr0.val = term.val
@@ -549,7 +549,7 @@ values then propagate according to the actions described above.  For
 example, ``factor.val = int(NUMBER.val)`` propagates the value from
 ``NUMBER`` to ``factor``.  ``term0.val = factor.val`` propagates the
 value from ``factor`` to ``term``.  Rules such as ``expr0.val =
-expr1.val + term1.val`` combine and propagate values further.  Just to 
+expr1.val + term1.val`` combine and propagate values further.  Just to
 illustrate, here is how values propagate in the expression ``2 + 3 * 4``::
 
     NUMBER.val=2 + NUMBER.val=3 * NUMBER.val=4    # NUMBER -> factor
@@ -560,7 +560,7 @@ illustrate, here is how values propagate in the expression ``2 + 3 * 4``::
     expr.val=2 + term.val=3 * NUMBER.val=4        # NUMBER -> factor
     expr.val=2 + term.val=3 * factor.val=4        # term * factor -> term
     expr.val=2 + term.val=12                      # expr + term -> expr
-    expr.val=14 
+    expr.val=14
 
 SLY uses a parsing technique known as LR-parsing or shift-reduce
 parsing.  LR parsing is a bottom up technique that tries to recognize
@@ -1050,7 +1050,7 @@ generate the same set of symbols.  For example::
 
     assignment : ID EQUALS NUMBER
                | ID EQUALS expr
-    
+
     expr       : expr PLUS expr
                | expr MINUS expr
                | expr TIMES expr
@@ -1101,7 +1101,7 @@ states to the file you specify.  Each state of the parser is shown
 as output that looks something like this::
 
     state 2
-    
+
         (7) factor -> LPAREN . expr RPAREN
         (1) expr -> . term
         (2) expr -> . expr MINUS term
@@ -1113,7 +1113,7 @@ as output that looks something like this::
         (8) factor -> . NUMBER
         LPAREN          shift and go to state 2
         NUMBER          shift and go to state 3
-    
+
         factor                         shift and go to state 1
         term                           shift and go to state 4
         expr                           shift and go to state 6
@@ -1127,7 +1127,7 @@ usually track down the source of most parsing conflicts.  It should
 also be stressed that not all shift-reduce conflicts are bad.
 However, the only way to be sure that they are resolved correctly is to
 look at the debugging file.
-    
+
 Syntax Error Handling
 ^^^^^^^^^^^^^^^^^^^^^
 
@@ -1236,7 +1236,7 @@ state::
         # Read ahead looking for a closing '}'
         while True:
             tok = next(self.tokens, None)
-            if not tok or tok.type == 'RBRACE': 
+            if not tok or tok.type == 'RBRACE':
                 break
         self.restart()
 
@@ -1271,12 +1271,12 @@ useful if trying to synchronize on special characters.  For example::
         # Read ahead looking for a terminating ";"
         while True:
             tok = next(self.tokens, None)       # Get the next token
-            if not tok or tok.type == 'SEMI': 
+            if not tok or tok.type == 'SEMI':
                 break
         self.errok()
 
         # Return SEMI to the parser as the next lookahead token
-        return tok 
+        return tok
 
 When Do Syntax Errors Get Reported?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1339,7 +1339,7 @@ are many possible ways to do this, but one example is something like
 this::
 
     @_('expr PLUS expr',
-       'expr MINUS expr', 
+       'expr MINUS expr',
        'expr TIMES expr',
        'expr DIVIDE expr')
     def expr(self, p):
@@ -1357,7 +1357,7 @@ Another approach is to create a set of data structure for different
 kinds of abstract syntax tree nodes and create different node types
 in each rule::
 
-    class Expr: 
+    class Expr:
         pass
 
     class BinOp(Expr):
@@ -1371,7 +1371,7 @@ in each rule::
             self.value = value
 
     @_('expr PLUS expr',
-       'expr MINUS expr', 
+       'expr MINUS expr',
        'expr TIMES expr',
        'expr DIVIDE expr')
     def expr(self, p):
@@ -1494,7 +1494,7 @@ C code, you might write code like this::
         # Action code
         ...
         pop_scope()        # Return to previous scope
-    
+
     @_('')
     def new_scope(self, p):
         # Create a new scope for local variables