Version 0.4
04/09/2019 Minor refinement to the reporting of reduce/reduce conflicts.
If a top grammar rule wasn't specified, SLY could fail with
a mysterious "unknown conflict" exception. This should be
11/18/2018 Various usability fixes observed from last compilers course.
- Errors encountered during grammar construction are now
reported as part of the raised GrammarError exception
instead of via logging. This places them in the same
visual position as normal Python errors (at the end
of the traceback)
- Repeated warning messages about unused tokens have
been consolidated into a single warning message to make
the output less verbose.
- Grammar attributes (e.g., p.TOKEN) used during parsing
are now read-only.
- The error about "infinite recursion" is only checked
if there are no undefined grammar symbols. Sometimes
you'd get this message and be confused when the only
mistake was a bad token name or similar.
9/8/2018 Fixed Issue #14. YaccProduction index property causes
AttributeError if index is 0
9/5/2018 Added support for getattr() and related functions on
Version 0.3
4/1/2018 Support for Lexer inheritance added. For example:
    from sly import Lexer

    class BaseLexer(Lexer):
        tokens = { NAME, NUMBER }
        ignore = ' \t'
        NAME = r'[a-zA-Z]+'
        NUMBER = r'\d+'

    class ChildLexer(BaseLexer):
        tokens = { PLUS, MINUS }
        PLUS = r'\+'
        MINUS = r'-'
In this example, the ChildLexer class gets all of the tokens
from the parent class (BaseLexer) in addition to the new
definitions it added of its own.
One quirk of Lexer inheritance is that definition order has
an impact on the low-level regular expression parsing. By
default new definitions are always processed AFTER any previous
definitions. You can change this using the before() function
like this:
    class GrandChildLexer(ChildLexer):
        PLUSPLUS = before(PLUS, r'\+\+')
        MINUSMINUS = before(MINUS, r'--')
In this example, the PLUSPLUS token is checked before the
PLUS token in the base class. Thus, an input text of '++'
will be parsed as a single token PLUSPLUS, not two PLUS tokens.
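The ordering effect described above can be reproduced with the standard
re module alone. The following is a minimal sketch, not SLY's actual
implementation: the tokenize() helper and the two rule lists are
hypothetical names invented for illustration. It assumes the rules are
joined into one alternation, which the regex engine tries left to right.

```python
import re

def tokenize(text, rules):
    """Yield (name, lexeme) pairs, trying rules in the order given."""
    # One named group per rule, joined with '|'. The regex engine tries
    # the alternatives left to right and takes the first that matches.
    master = re.compile('|'.join(f'(?P<{name}>{pattern})'
                                 for name, pattern in rules))
    pos = 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None:
            raise ValueError(f'illegal character {text[pos]!r}')
        yield m.lastgroup, m.group()
        pos = m.end()

# PLUS listed first: '++' is split into two PLUS tokens.
plus_first = [('PLUS', r'\+'), ('PLUSPLUS', r'\+\+')]
# PLUSPLUS listed first: '++' is matched as one token.
plusplus_first = [('PLUSPLUS', r'\+\+'), ('PLUS', r'\+')]

print(list(tokenize('++', plus_first)))
print(list(tokenize('++', plusplus_first)))
```

Placing the longer pattern first is what before() arranges in the
GrandChildLexer example.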
4/1/2018 Better support for lexing states. Each lexing state can be defined
as a separate class. Use the begin(cls) method to switch to a
different state. For example:
    from sly import Lexer

    class LexerA(Lexer):
        tokens = { NAME, NUMBER, LBRACE }
        ignore = ' \t'
        NAME = r'[a-zA-Z]+'
        NUMBER = r'\d+'
        LBRACE = r'\{'

        def LBRACE(self, t):
            self.begin(LexerB)    # switch to state LexerB on '{'
            return t

    class LexerB(Lexer):
        tokens = { PLUS, MINUS, RBRACE }
        ignore = ' \t'
        PLUS = r'\+'
        MINUS = r'-'
        RBRACE = r'\}'

        def RBRACE(self, t):
            self.begin(LexerA)    # switch back to state LexerA on '}'
            return t
In this example, LexerA switches to a new state LexerB when
a left brace ({) is encountered. The begin() method causes
the state transition. LexerB switches back to state LexerA
when a right brace (}) is encountered.
As an alternative to the begin() method, you can also use the push_state(cls)
and pop_state() methods. These manage the lexing states as a
stack. The pop_state() method returns to the previous
lexing state.
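The difference between a plain switch and the stack-based methods can be
modeled in a few lines. This is a toy sketch only: the StateStack class
is a hypothetical stand-in, not SLY's Lexer, and states are represented
as plain strings rather than lexer classes.

```python
class StateStack:
    """Toy model of lexer state switching (not SLY's implementation)."""

    def __init__(self, initial):
        self.current = initial
        self._stack = []

    def begin(self, state):
        # Plain switch: the previous state is forgotten.
        self.current = state

    def push_state(self, state):
        # Remember the current state, then switch.
        self._stack.append(self.current)
        self.current = state

    def pop_state(self):
        # Return to whichever state was active before the last push.
        self.current = self._stack.pop()

states = StateStack('LexerA')
states.push_state('LexerB')   # e.g. on an outer '{'
states.push_state('LexerB')   # e.g. on a nested '{'
states.pop_state()            # inner '}' is seen
inner = states.current        # still inside the outer braces
states.pop_state()            # outer '}' is seen
outer = states.current        # back where we started
print(inner, outer)
```

With begin() alone, the code handling '}' would have to know which state
to go back to. The stack records that instead.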
1/27/2018 Tokens no longer have to be specified as strings. For example, you
can now write:
    from sly import Lexer

    class TheLexer(Lexer):
        tokens = { ID, NUMBER, PLUS, MINUS }
        ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
        NUMBER = r'\d+'
        PLUS = r'\+'
        MINUS = r'-'
This convention also carries over to the parser for things such
as precedence specifiers:
    from sly import Parser

    class TheParser(Parser):
        tokens = TheLexer.tokens
        precedence = (
            ('left', PLUS, MINUS),
            ('left', TIMES, DIVIDE),
            ('right', UMINUS),
        )
Never mind the fact that ID, NUMBER, PLUS, and MINUS appear to be
undefined identifiers. It all works.
1/27/2018 Tokens now allow special-case remapping. For example:
    from sly import Lexer

    class TheLexer(Lexer):
        tokens = { ID, IF, ELSE, WHILE, NUMBER, PLUS, MINUS }
        ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
        ID['if'] = IF
        ID['else'] = ELSE
        ID['while'] = WHILE
        NUMBER = r'\d+'
        PLUS = r'\+'
        MINUS = r'-'
In this code, the ID rule matches any identifier. However,
special cases have been made for IF, ELSE, and WHILE tokens.
Previously, this had to be handled in a special action method
such as this:
    def ID(self, t):
        if t.value in { 'if', 'else', 'while' }:
            t.type = t.value.upper()
        return t
Never mind the fact that the syntax appears to suggest that strings
work as a kind of mutable mapping.
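Conceptually, the remapping amounts to a table lookup applied after the
general ID pattern has matched. A rough sketch of that idea, using only
the standard library, is shown here. The REMAP table and the classify()
helper are hypothetical names, and this is not how SLY implements the
feature internally.

```python
import re

ID = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')

# Hypothetical table standing in for ID['if'] = IF, and so on.
REMAP = {'if': 'IF', 'else': 'ELSE', 'while': 'WHILE'}

def classify(word):
    """Return the token type for an identifier-like word."""
    if ID.fullmatch(word) is None:
        raise ValueError(f'not an identifier: {word!r}')
    # Exact matches in the table win; everything else stays an ID.
    return REMAP.get(word, 'ID')

print(classify('if'), classify('iffy'), classify('while'))
```

Because the lookup is on the whole matched text, a name such as 'iffy'
is still an ordinary ID.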
1/16/2018 Usability improvement on Lexer class. Regular expression rules
specified as strings that don't match any name in tokens are
now reported as errors.
Version 0.2
12/24/2017 The error(self, t) method of lexer objects now receives a
token as input. The value attribute of this token contains
all remaining input text. If the passed token is returned
by error(), then it shows up in the token stream where it
can be processed by the parser.
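The behavior described above can be imitated with a small stand-alone
tokenizer. This is a sketch under stated assumptions, not SLY code: the
Token class, the tokenize() function, the 'ERROR' type name, and the
rule that the loop skips one character after each error are all choices
made for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Token:
    type: str
    value: str
    index: int

NUMBER = re.compile(r'\d+')

def tokenize(text, on_error):
    """Yield NUMBER tokens; route unmatched input through on_error."""
    pos = 0
    while pos < len(text):
        m = NUMBER.match(text, pos)
        if m:
            yield Token('NUMBER', m.group(), pos)
            pos = m.end()
            continue
        # No rule matched: the handler gets a token whose value is
        # all of the remaining input text.
        result = on_error(Token('ERROR', text[pos:], pos))
        if result is not None:
            yield result      # a returned token joins the stream
        pos += 1              # sketch's simplification: skip one char

seen = []

def keep_bad_char(tok):
    seen.append(tok.value)    # record what the handler received
    tok.value = tok.value[0]  # trim to the offending character
    return tok

def drop(tok):
    return None               # swallow the error silently

kept = [(t.type, t.value) for t in tokenize('12?34', keep_bad_char)]
dropped = [(t.type, t.value) for t in tokenize('12?34', drop)]
print(kept)
print(dropped)
print(seen)
```

Returning the token lets a parser see the bad input and report it in
context. Returning nothing discards it.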