Version 0.3
-----------

4/1/2018 Support for Lexer inheritance added. For example:

    from sly import Lexer

    class BaseLexer(Lexer):
        tokens = { NAME, NUMBER }
        ignore = ' \t'

        NAME = r'[a-zA-Z]+'
        NUMBER = r'\d+'

    class ChildLexer(BaseLexer):
        tokens = { PLUS, MINUS }
        PLUS = r'\+'
        MINUS = r'-'

In this example, the ChildLexer class gets all of the tokens
from the parent class (BaseLexer) in addition to the new
definitions it adds of its own.

One quirk of Lexer inheritance is that definition order has
an impact on the low-level regular expression matching. By
default, new definitions are always processed AFTER any previous
definitions. You can change this using the before() function
like this:

    class GrandChildLexer(ChildLexer):
        tokens = { PLUSPLUS, MINUSMINUS }
        PLUSPLUS = before(PLUS, r'\+\+')
        MINUSMINUS = before(MINUS, r'--')

In this example, the PLUSPLUS token is checked before the
PLUS token in the base class. Thus, an input text of '++'
will be parsed as a single token PLUSPLUS, not two PLUS tokens.

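Why the ordering matters: the token patterns are combined into one
master regular expression, and the first alternative that matches wins.
A minimal sketch with the standard re module (not sly itself; the
master patterns and the tokenize() helper here are illustrative
stand-ins) shows the effect:

```python
import re

# Hypothetical master patterns; the first alternative that matches wins.
master_first = re.compile(r'(?P<PLUSPLUS>\+\+)|(?P<PLUS>\+)')
master_second = re.compile(r'(?P<PLUS>\+)|(?P<PLUSPLUS>\+\+)')

def tokenize(pattern, text):
    """Return the names of the token rules matched, left to right."""
    names = []
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            break                     # no rule matches; stop for this sketch
        names.append(m.lastgroup)     # name of the alternative that matched
        pos = m.end()
    return names

print(tokenize(master_first, '++'))   # ['PLUSPLUS']
print(tokenize(master_second, '++'))  # ['PLUS', 'PLUS']
```

With PLUSPLUS listed first, '++' is one token; listed after PLUS, the
same input comes out as two PLUS tokens.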
4/1/2018 Better support for lexing states. Each lexing state can be
defined as a separate class. Use the begin(cls) method to switch to a
different state. For example:

    from sly import Lexer

    class LexerA(Lexer):
        tokens = { NAME, NUMBER, LBRACE }

        ignore = ' \t'

        NAME = r'[a-zA-Z]+'
        NUMBER = r'\d+'
        LBRACE = r'\{'

        def LBRACE(self, t):
            self.begin(LexerB)
            return t

    class LexerB(Lexer):
        tokens = { PLUS, MINUS, RBRACE }

        ignore = ' \t'

        PLUS = r'\+'
        MINUS = r'-'
        RBRACE = r'\}'

        def RBRACE(self, t):
            self.begin(LexerA)
            return t

In this example, LexerA switches to a new state LexerB when
a left brace ({) is encountered. The begin() method causes
the state transition. LexerB switches back to state LexerA
when a right brace (}) is encountered.

As an alternative to the begin() method, you can also use the
push_state(cls) and pop_state() methods. These manage the lexing
states as a stack. The pop_state() method returns to the previous
lexing state.

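The stack behavior can be illustrated with a minimal plain-Python
sketch (this is not sly's implementation; the StatefulLexer class is a
hypothetical stand-in that mirrors the push_state/pop_state semantics
described above):

```python
class StatefulLexer:
    """Sketch of stack-managed lexing states (hypothetical stand-in)."""
    def __init__(self, start_state):
        self.state = start_state
        self.state_stack = []

    def push_state(self, new_state):
        # Save the current state, then switch to the new one.
        self.state_stack.append(self.state)
        self.state = new_state

    def pop_state(self):
        # Return to whatever state was active before the last push.
        self.state = self.state_stack.pop()

lexer = StatefulLexer('LexerA')
lexer.push_state('LexerB')   # enter a nested state
lexer.push_state('LexerC')   # nest one level deeper
lexer.pop_state()            # back to 'LexerB'
print(lexer.state)           # 'LexerB'
```

Because the states form a stack, nested constructs (such as braces
inside braces) unwind back through each enclosing state in order.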
1/27/2018 Tokens no longer have to be specified as strings. For example,
you can now write:

    from sly import Lexer

    class TheLexer(Lexer):
        tokens = { ID, NUMBER, PLUS, MINUS }

        ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
        NUMBER = r'\d+'
        PLUS = r'\+'
        MINUS = r'-'

This convention also carries over to the parser for things such
as precedence specifiers:

    from sly import Parser

    class TheParser(Parser):
        tokens = TheLexer.tokens

        precedence = (
            ('left', PLUS, MINUS),
            ('left', TIMES, DIVIDE),
            ('right', UMINUS),
        )
        ...

Never mind the fact that ID, NUMBER, PLUS, and MINUS appear to be
undefined identifiers. It all works.

1/27/2018 Tokens now allow special-case remapping. For example:

    from sly import Lexer

    class TheLexer(Lexer):
        tokens = { ID, IF, ELSE, WHILE, NUMBER, PLUS, MINUS }

        ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
        ID['if'] = IF
        ID['else'] = ELSE
        ID['while'] = WHILE

        NUMBER = r'\d+'
        PLUS = r'\+'
        MINUS = r'-'

In this code, the ID rule matches any identifier. However,
special cases have been made for IF, ELSE, and WHILE tokens.
Previously, this had to be handled in a special action method
such as this:

    def ID(self, t):
        if t.value in { 'if', 'else', 'while' }:
            t.type = t.value.upper()
        return t

Never mind the fact that the syntax appears to suggest that strings
work as a kind of mutable mapping.

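Behaviorally, the remapping amounts to a dictionary lookup on the
matched text. A plain-Python sketch (not sly's internals; the remap
table and classify() helper are hypothetical) of what the special
cases do:

```python
# Hypothetical remap table mirroring ID['if'] = IF, etc.
remap = {'if': 'IF', 'else': 'ELSE', 'while': 'WHILE'}

def classify(value):
    # An identifier whose text appears in the remap table gets the
    # special token type; everything else stays a plain ID.
    return remap.get(value, 'ID')

print(classify('if'))     # 'IF'
print(classify('count'))  # 'ID'
```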
1/16/2018 Usability improvement on Lexer class. Regular expression rules
specified as strings that don't match any name in tokens are
now reported as errors.

Version 0.2
-----------

12/24/2017 The error(self, t) method of lexer objects now receives a
token as input. The value attribute of this token contains
all remaining input text. If the passed token is returned
by error(), then it shows up in the token stream where it
can be processed by the parser.
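A plain-Python sketch of the idea (not sly itself; the Token class,
the toy tokenize() loop, and the ERROR token type are simplified
stand-ins for illustration):

```python
import re

class Token:
    def __init__(self, type, value):
        self.type = type
        self.value = value

def error(t):
    # Returning the token forwards it into the token stream, where the
    # parser can see it. Keep just the offending character as its value.
    t.value = t.value[0]
    return t

def tokenize(text):
    """Toy lexer: numbers only; anything else goes through error()."""
    pos = 0
    while pos < len(text):
        m = re.match(r'\d+', text[pos:])
        if m:
            yield Token('NUMBER', m.group())
            pos += m.end()
        else:
            # error() receives a token whose value is the remaining input.
            t = error(Token('ERROR', text[pos:]))
            pos += 1              # skip past the bad character
            if t is not None:
                yield t           # returned token shows up in the stream

print([t.type for t in tokenize('12$34')])  # ['NUMBER', 'ERROR', 'NUMBER']
```

If error() returns None instead, the bad input is simply discarded and
the parser never sees it.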