Version 0.5
-----------
10/25/2022 ***IMPORTANT NOTE*** This is the last release to be made
           on PyPI. If you want the latest version go to
           https://github.com/dabeaz/sly.

09/06/2022 Modernization of the packaging infrastructure. Slight
           project reorganization.

03/25/2022 Added automatic location tracking to the parser. Use
           Parser.line_position(value) to return the line number
           and Parser.index_position(value) to return a (start, end)
           index pair. value is *any* object returned by one of
           the various methods in the parser definition. Typically,
           it would be an AST node. The parser tracks the data using
           the value of id(value).
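
           The id()-keyed bookkeeping can be sketched with a plain
           dictionary. This is a hypothetical illustration (PositionTable
           and Num are made-up names), not SLY's internal code. It shows
           why keying on id(value) works even for AST nodes that are
           unhashable or that define value equality.

```python
# Hypothetical sketch of id()-keyed location tracking; not SLY's code.

class PositionTable:
    """Maps id(value) to (value, lineno, start, end)."""

    def __init__(self):
        self._positions = {}

    def record(self, value, lineno, start, end):
        # Store the value itself too, so its id() cannot be recycled by a
        # new object while the entry is still in the table.
        self._positions[id(value)] = (value, lineno, start, end)

    def line_position(self, value):
        entry = self._positions.get(id(value))
        return entry[1] if entry is not None else None

    def index_position(self, value):
        entry = self._positions.get(id(value))
        return (entry[2], entry[3]) if entry is not None else None


class Num:
    # Stand-in for an AST node returned by a grammar rule.
    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        # Value equality would make equal nodes collide in a value-keyed
        # dict; keying on id() keeps them apart.
        return isinstance(other, Num) and self.n == other.n

    __hash__ = None  # unhashable, like many mutable AST nodes


table = PositionTable()
a, b = Num(1), Num(1)          # equal but distinct nodes
table.record(a, lineno=1, start=0, end=1)
table.record(b, lineno=2, start=6, end=7)
print(table.line_position(a), table.index_position(a))   # 1 (0, 1)
print(table.line_position(b), table.index_position(b))   # 2 (6, 7)
```

           Keeping a reference to each value is a deliberate choice in
           this sketch: CPython may reuse the id() of an object once it
           has been garbage collected.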

03/25/2022 Added an .end attribute to tokens that specifies the ending
           index of the matched text. This is used to do more
           precise location tracking for the purpose of issuing
           more useful error messages.
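
           To see why an end index helps, here is a hypothetical helper
           (underline() is not part of SLY) that takes a start and an end
           index, standing in for a token's .index and .end, and underlines
           the whole offending text. It assumes the end index is exclusive,
           as with Python slices.

```python
# Hypothetical error-message helper; start/end stand in for a token's
# .index and .end attributes (end assumed exclusive).

def underline(text, start, end):
    # Find the boundaries of the source line containing `start`.
    line_start = text.rfind('\n', 0, start) + 1
    line_end = text.find('\n', start)
    if line_end == -1:
        line_end = len(text)
    line = text[line_start:line_end]
    column = start - line_start
    width = max(end - start, 1)   # always show at least one caret
    return line + '\n' + ' ' * column + '^' * width


source = "x = 1\ny = foo + 2\n"
start = source.index("foo")
end = start + len("foo")
print(underline(source, start, end))
# y = foo + 2
#     ^^^
```

           With only a start index, a message can point at the first
           character; the end index lets it mark the full token.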

05/09/2020 Experimental support for EBNF choices. For example:

              @_('term { PLUS|MINUS term }')
              def expr(self, p):
                  lterm = p.term0
                  for op, rterm in p[1]:
                      lterm = BinOp(op, lterm, rterm)
                  return lterm

           One open issue is how to refer to the values of a choice.
           There is no unified name to pick, so you have to use a
           numeric index like p[1]. In this case, p[1] is a list of
           all of the repeated items (represented as tuples).
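
           The loop in expr() folds the (operator, term) tuples into a
           left-associative tree. The sketch below runs the same loop
           outside SLY. BinOp is a stand-in namedtuple and p[1] is
           replaced by a hand-written list, assuming term evaluates to
           plain numbers for the input "1 + 2 - 3".

```python
from collections import namedtuple

# Stand-in AST node; the real BinOp would be defined by the user.
BinOp = namedtuple('BinOp', ['op', 'left', 'right'])


def fold_left(first, rest):
    # Same loop as the expr() rule: each (op, rterm) pair wraps the
    # tree built so far, giving left associativity.
    lterm = first
    for op, rterm in rest:
        lterm = BinOp(op, lterm, rterm)
    return lterm


# What p.term0 and p[1] might hold for "1 + 2 - 3".
tree = fold_left(1, [('+', 2), ('-', 3)])
print(tree)
# BinOp(op='-', left=BinOp(op='+', left=1, right=2), right=3)
```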

05/09/2020 Changed the internal names used for EBNF rules to make them
           a bit easier to debug in the parser.out file.

Version 0.4
-----------

03/06/2020 Added experimental support for EBNF repetition and optional
           syntax. For example, here is a rule for a comma-separated
           expression list:

              @_('expr { COMMA expr }')
              def exprlist(self, p):
                  return [ p.expr0 ] + p.expr1

           In this code, the { ... } means zero-or-more repetitions.
           It turns all symbols inside into lists. So, instead of
           representing a single value, p.expr1 is now a list of
           values.
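
           The shape of the values can be mimicked with a stand-in for
           the production object. This is a hypothetical simulation using
           types.SimpleNamespace, assuming each expr evaluates to a
           string; it is not how SLY builds p internally.

```python
from types import SimpleNamespace

# Hypothetical stand-in for p after 'expr { COMMA expr }' matches "a, b, c".
# Every symbol inside { ... } holds a list, one entry per repetition.
p = SimpleNamespace(expr0='a', COMMA=[',', ','], expr1=['b', 'c'])


def exprlist(p):
    return [p.expr0] + p.expr1


print(exprlist(p))      # ['a', 'b', 'c']

# Zero repetitions ("a" alone): the repeated symbols are empty lists.
single = SimpleNamespace(expr0='a', COMMA=[], expr1=[])
print(exprlist(single))  # ['a']
```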

           An optional value can be enclosed in brackets like this:

              @_('VAR NAME [ EQUAL expr ] SEMI')
              def variable_declaration(self, p):
                  print(f"Defining {p.NAME}. Initial value={p.expr}")

           In this case, all symbols inside [ ... ] either have a value
           if present or are assigned to None if missing.
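
           Because a missing optional part shows up as None, a rule body
           usually branches on it. The sketch below is a hypothetical
           simulation with SimpleNamespace stand-ins for p, assuming expr
           evaluates to a plain number.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for p after 'VAR NAME [ EQUAL expr ] SEMI' matches
# "var x = 0;" and "var y;". Symbols inside [ ... ] are None when absent.
with_init = SimpleNamespace(VAR='var', NAME='x', EQUAL='=', expr=0, SEMI=';')
without_init = SimpleNamespace(VAR='var', NAME='y', EQUAL=None, expr=None, SEMI=';')


def variable_declaration(p):
    # Compare against None, not truthiness: an initial value of 0 is valid.
    if p.expr is None:
        return (p.NAME, 'uninitialized')
    return (p.NAME, p.expr)


print(variable_declaration(with_init))     # ('x', 0)
print(variable_declaration(without_init))  # ('y', 'uninitialized')
```

           Testing with "is None" rather than "if not p.expr" matters
           whenever a present value can itself be falsy.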

           In both cases, you continue to use the same name indexing
           scheme used by the rest of SLY. For example, in the first
           example above, you use "expr0" and "expr1" to refer to the
           different "expr" symbols since that name appears in more
           than one place.

04/09/2019 Fixed a very mysterious error message that resulted if you
           defined a grammar rule called "start". start can now
           be a string or a function.

04/09/2019 Minor refinement to the reporting of reduce/reduce conflicts.
           If a top grammar rule wasn't specified, SLY could fail with
           a mysterious "unknown conflict" exception. This should be
           fixed.

11/18/2018 Various usability fixes observed during the last compilers course.

           - Errors encountered during grammar construction are now
             reported as part of the raised GrammarError exception
             instead of via logging. This places them in the same
             visual position as normal Python errors (at the end
             of the traceback).

           - Repeated warning messages about unused tokens have
             been consolidated into a single warning message to make
             the output less verbose.

           - Grammar attributes (e.g., p.TOKEN) used during parsing
             are now read-only.

           - The error about "infinite recursion" is only checked
             if there are no undefined grammar symbols. Sometimes
             you'd get this message and be confused when the only
             mistake was a bad token name or similar.

9/8/2018   Fixed Issue #14. YaccProduction index property causes
           AttributeError if index is 0.

9/5/2018   Added support for getattr() and related functions on
           productions.

Version 0.3
-----------

4/1/2018   Support for Lexer inheritance added. For example:

              from sly import Lexer

              class BaseLexer(Lexer):
                  tokens = { NAME, NUMBER }
                  ignore = ' \t'

                  NAME = r'[a-zA-Z]+'
                  NUMBER = r'\d+'


              class ChildLexer(BaseLexer):
                  tokens = { PLUS, MINUS }
                  PLUS = r'\+'
                  MINUS = r'-'

           In this example, the ChildLexer class gets all of the tokens
           from the parent class (BaseLexer) in addition to the new
           definitions it adds of its own.

           One quirk of Lexer inheritance is that definition order has
           an impact on the low-level regular expression matching. By
           default, new definitions are always processed AFTER any previous
           definitions. You can change this using the before() function
           like this:

              class GrandChildLexer(ChildLexer):
                  tokens = { PLUSPLUS, MINUSMINUS }
                  PLUSPLUS = before(PLUS, r'\+\+')
                  MINUSMINUS = before(MINUS, r'--')

           In this example, the PLUSPLUS token is checked before the
           PLUS token in the base class. Thus, an input text of '++'
           will be lexed as a single PLUSPLUS token, not two PLUS tokens.
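
           The ordering quirk follows from how regex-driven lexers are
           commonly built: the rules are joined into one alternation, and
           Python's re module takes the first alternative that matches,
           not the longest. The sketch below assumes that design and uses
           only the standard re module; tokenize() is a hypothetical
           helper, not SLY's API.

```python
import re


def tokenize(rules, text):
    # Join (name, pattern) pairs into one alternation of named groups.
    # re tries the alternatives left to right at each position.
    master = re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in rules))
    return [m.lastgroup for m in master.finditer(text)]


plus_first = [('PLUS', r'\+'), ('PLUSPLUS', r'\+\+')]
plusplus_first = [('PLUSPLUS', r'\+\+'), ('PLUS', r'\+')]

print(tokenize(plus_first, '++'))      # ['PLUS', 'PLUS']
print(tokenize(plusplus_first, '++'))  # ['PLUSPLUS']
```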

4/1/2018   Better support for lexing states. Each lexing state can be defined
           as a separate class. Use the begin(cls) method to switch to a
           different state. For example:

              from sly import Lexer

              class LexerA(Lexer):
                  tokens = { NAME, NUMBER, LBRACE }

                  ignore = ' \t'

                  NAME = r'[a-zA-Z]+'
                  NUMBER = r'\d+'
                  LBRACE = r'\{'

                  def LBRACE(self, t):
                      self.begin(LexerB)
                      return t

              class LexerB(Lexer):
                  tokens = { PLUS, MINUS, RBRACE }

                  ignore = ' \t'

                  PLUS = r'\+'
                  MINUS = r'-'
                  RBRACE = r'\}'

                  def RBRACE(self, t):
                      self.begin(LexerA)
                      return t

           In this example, LexerA switches to a new state LexerB when
           a left brace ({) is encountered. The begin() method causes
           the state transition. LexerB switches back to state LexerA
           when a right brace (}) is encountered.

           As an alternative to the begin() method, you can also use the
           push_state(cls) and pop_state() methods. These manage the
           lexing states as a stack. The pop_state() method returns to
           the previous lexing state.
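
           The difference between begin() and the push/pop pair is just
           whether the previous state is remembered. The sketch below
           models that bookkeeping with a list used as a stack.
           StateTracker is a hypothetical name, and states are plain
           strings rather than lexer classes; it is not SLY's code.

```python
# Hypothetical model of begin() versus push_state()/pop_state().

class StateTracker:
    def __init__(self, initial):
        self.current = initial
        self._stack = []

    def begin(self, state):
        # Switch outright; nothing records where we came from.
        self.current = state

    def push_state(self, state):
        # Remember the current state, then switch.
        self._stack.append(self.current)
        self.current = state

    def pop_state(self):
        # Return to whatever state was active before the last push.
        self.current = self._stack.pop()


s = StateTracker('LexerA')
s.push_state('LexerB')   # e.g. on '{'
s.push_state('LexerC')   # a nested construct
s.pop_state()
print(s.current)         # LexerB
s.pop_state()
print(s.current)         # LexerA
```

           A stack suits nested constructs (such as braces inside
           braces), where a fixed begin() target would lose track of
           the enclosing state.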

1/27/2018  Tokens no longer have to be specified as strings. For example, you
           can now write:

              from sly import Lexer

              class TheLexer(Lexer):
                  tokens = { ID, NUMBER, PLUS, MINUS }

                  ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
                  NUMBER = r'\d+'
                  PLUS = r'\+'
                  MINUS = r'-'

           This convention also carries over to the parser for things such
           as precedence specifiers:

              from sly import Parser
              class TheParser(Parser):
                  tokens = TheLexer.tokens

                  precedence = (
                      ('left', PLUS, MINUS),
                      ('left', TIMES, DIVIDE),
                      ('right', UMINUS),
                  )
                  ...

           Never mind the fact that ID, NUMBER, PLUS, and MINUS appear to be
           undefined identifiers. It all works.
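
           The trick that makes bare names like ID legal can be sketched
           with a metaclass whose __prepare__() returns a dict subclass:
           a missing all-caps name evaluates to its own name as a string.
           This is a simplified illustration of the idea with made-up
           class names (_AutoNames, AutoNameMeta, DemoLexer), not SLY's
           actual metaclass.

```python
# Simplified illustration of auto-defined uppercase names in a class body.

class _AutoNames(dict):
    def __missing__(self, key):
        # Only all-caps names become tokens; anything else (print,
        # __name__, ...) raises KeyError so normal global/builtin
        # lookup still happens.
        if key.isupper() and not key.startswith('_'):
            return key
        raise KeyError(key)


class AutoNameMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body executes with this mapping as its namespace.
        return _AutoNames()

    def __new__(mcls, name, bases, namespace, **kwargs):
        return super().__new__(mcls, name, bases, dict(namespace), **kwargs)


class DemoLexer(metaclass=AutoNameMeta):
    tokens = { ID, NUMBER, PLUS, MINUS }


print(sorted(DemoLexer.tokens))   # ['ID', 'MINUS', 'NUMBER', 'PLUS']
```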

1/27/2018  Tokens now allow special-case remapping. For example:

              from sly import Lexer

              class TheLexer(Lexer):
                  tokens = { ID, IF, ELSE, WHILE, NUMBER, PLUS, MINUS }

                  ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
                  ID['if'] = IF
                  ID['else'] = ELSE
                  ID['while'] = WHILE

                  NUMBER = r'\d+'
                  PLUS = r'\+'
                  MINUS = r'-'

           In this code, the ID rule matches any identifier. However,
           special cases have been made for IF, ELSE, and WHILE tokens.
           Previously, this had to be handled in a special action method
           such as this:

              def ID(self, t):
                  if t.value in { 'if', 'else', 'while' }:
                      t.type = t.value.upper()
                  return t

           Never mind the fact that the syntax appears to suggest that strings
           work as a kind of mutable mapping.
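
           The effect of the remapping can be modeled as a lookup on the
           whole matched text after the ID pattern succeeds. The sketch
           below is a hypothetical stand-in (classify() and ID_REMAP are
           illustrative names) that behaves like the old ID() action
           method.

```python
import re

# Hypothetical model of ID['if'] = IF and friends: match the ID pattern,
# then replace the token type if the full matched text is in the table.
ID_PATTERN = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')
ID_REMAP = {'if': 'IF', 'else': 'ELSE', 'while': 'WHILE'}


def classify(word):
    if not ID_PATTERN.fullmatch(word):
        raise ValueError(f'not an identifier: {word!r}')
    return ID_REMAP.get(word, 'ID')


print([classify(w) for w in ['if', 'x', 'while', 'iffy']])
# ['IF', 'ID', 'WHILE', 'ID']
```

           Because the lookup uses the complete identifier, a name such
           as "iffy" stays an ID. A separate IF = r'if' rule tried before
           ID would instead match just the "if" prefix.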

1/16/2018  Usability improvement on Lexer class. Regular expression rules
           specified as strings that don't match any name in tokens are
           now reported as errors.
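
           The check amounts to comparing rule names against the declared
           token set. The sketch below is hypothetical (undefined_rules()
           and the message text are made up) and assumes the rules have
           already been collected into a {name: pattern} dict. A real
           lexer also has to exempt special names; SLY, for instance,
           accepts ignore_-prefixed patterns that are not tokens.

```python
# Hypothetical sketch of reporting string rules missing from `tokens`.

def undefined_rules(rules, tokens):
    # Any rule whose name is not a declared token is reported.
    return sorted(name for name in rules if name not in tokens)


rules = {'NAME': r'[a-zA-Z]+', 'NUMBER': r'\d+', 'NUMBR': r'\d+\.\d+'}
tokens = {'NAME', 'NUMBER'}
for name in undefined_rules(rules, tokens):
    print(f'{name} does not match a name in tokens')
# NUMBR does not match a name in tokens
```

           Catching a misspelled rule name at class-definition time is
           much easier to diagnose than a pattern that silently never
           produces a token.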

Version 0.2
-----------

12/24/2017 The error(self, t) method of lexer objects now receives a
           token as input. The value attribute of this token contains
           all remaining input text. If the passed token is returned
           by error(), then it shows up in the token stream where it
           can be processed by the parser.
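
           The protocol can be illustrated with a toy lexer. This is a
           hypothetical sketch (Token and MiniLexer are not SLY classes)
           in which error() records the remaining text, trims the token
           to the offending character, skips past it via self.index, and
           returns the token so it appears in the stream.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str
    index: int


class MiniLexer:
    def tokenize(self, text):
        self.index = 0
        self.errors = []
        while self.index < len(text):
            ch = text[self.index]
            if ch.isdigit():
                yield Token('NUMBER', ch, self.index)
                self.index += 1
            elif ch == ' ':
                self.index += 1
            else:
                # Hand error() a token holding ALL remaining input.
                tok = self.error(Token('ERROR', text[self.index:], self.index))
                if tok is not None:
                    yield tok

    def error(self, t):
        self.errors.append(t.value)   # everything from the bad char onward
        t.value = t.value[0]          # keep only the offending character
        self.index += 1               # resume lexing after it
        return t                      # returned tokens join the stream


lexer = MiniLexer()
print([(t.type, t.value) for t in lexer.tokenize('1 $2')])
# [('NUMBER', '1'), ('ERROR', '$'), ('NUMBER', '2')]
print(lexer.errors)  # ['$2']
```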