fspl/lexer
Sasha Koshka 3297f6671e Lexer skips over zero runes now 2024-02-23 01:08:58 -05:00
README.md DESIGN.md -> README.md 2024-02-09 18:02:03 -05:00
doc.go Added module overview doc comment to lexer 2024-02-08 23:05:36 -05:00
lexer.go Lexer skips over zero runes now 2024-02-23 01:08:58 -05:00
lexer_test.go FINALLY errors and lexer agree on row/col positions properly 2024-02-06 22:11:46 -05:00
test-common.go Changed repository import paths 2024-02-22 19:22:53 -05:00

README.md

lexer

Responsibilities

  • Defining the token type and token kinds
  • Turning streams of data into streams of tokens
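As a rough illustration of the first responsibility, the token type and its kinds might be declared along these lines in Go. This is a minimal sketch: the names Token, TokenKind and Position, the fields they carry, and the four kinds shown are all assumptions for illustration, not the package's actual declarations.

```go
package main

import "fmt"

// Position marks where a token sits in the source.
// Hypothetical layout: a row plus a [Start, End) column span on that row.
type Position struct {
	File  string
	Row   int
	Start int // column of the first rune
	End   int // column one past the last rune
}

// TokenKind identifies the category of a token.
type TokenKind int

// Hypothetical, illustrative kinds; the real lexer defines its own set.
const (
	EOF TokenKind = iota
	Ident
	Int
	Symbol
)

// String makes kinds readable in error messages and debug output.
func (kind TokenKind) String() string {
	switch kind {
	case EOF:
		return "EOF"
	case Ident:
		return "Ident"
	case Int:
		return "Int"
	case Symbol:
		return "Symbol"
	default:
		return fmt.Sprintf("TokenKind(%d)", int(kind))
	}
}

// Token is one lexeme: its kind, its raw text, and where it came from.
type Token struct {
	Kind     TokenKind
	Value    string
	Position Position
}

// String renders a token as Kind:'value' for debugging.
func (tok Token) String() string {
	return fmt.Sprintf("%v:'%s'", tok.Kind, tok.Value)
}
```

Giving TokenKind a String method means kinds print by name wherever they are formatted with %v, which is convenient when reporting errors.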

Organization

The lexer is split into its interface and implementation:

  • Lexer: public facing lexer interface
  • fsplLexer: private implementation of Lexer, with public constructors

The lexer is bound to a data stream at the time of creation, and its Next() method may be called to read and return the next token from the stream.

Operation

fsplLexer carries state recording which rune of the data stream is currently being processed. This current rune must always be populated as long as the stream still has data to read. Every lexer routine starts by using this rune and finishes by advancing to the next rune, so that the following routine can start from it.

The lexer follows this general flow:

  1. Upon creation, grab the first rune to initialize the lexer state
  2. When Next() is called, carry out steps 3 through 6:
  3. Create a new token
  4. Set the token's position
  5. Switch on the current rune to set the token's kind and invoke the matching lexing routine
  6. Expand the token's position to cover the full range of runes consumed

When an EOF is detected, the lexer is marked as spent (eof: true) and will only return EOF tokens. The lexer will only return an error alongside an EOF token if the EOF was unexpected.

The lexer also keeps track of its current position so that it can embed positions into tokens and report them in errors. The lowest-level operation used to advance the lexer must always be fsplLexer.nextRune(), because it contains the logic that keeps the position correct and maintains the current lexer state; advancing any other way would let the position drift out of sync with the data.
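A sketch of how a nextRune() routine might keep the position correct, assuming zero-based rows and columns where a newline moves the following rune to the start of the next row. The positionLexer type, its fields, and newPositionLexer are hypothetical; the real fsplLexer.nextRune() may track position differently.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// positionLexer (hypothetical) keeps only the fields nextRune needs.
type positionLexer struct {
	reader  *bufio.Reader
	current rune
	eof     bool
	row     int  // zero-based row of the current rune
	column  int  // zero-based column of the current rune
	started bool // false until the first rune has been requested
}

// newPositionLexer (hypothetical) primes the lexer with its first rune.
func newPositionLexer(input string) (*positionLexer, error) {
	lexer := &positionLexer{reader: bufio.NewReader(strings.NewReader(input))}
	return lexer, lexer.nextRune()
}

// nextRune advances to the next rune and updates the position. The position
// of the incoming rune is derived from the rune being left behind, so a
// newline moves the rune after it onto a new row.
func (lexer *positionLexer) nextRune() error {
	if lexer.eof {
		return nil // a spent lexer never moves again
	}
	if lexer.started {
		if lexer.current == '\n' {
			lexer.row++
			lexer.column = 0
		} else {
			lexer.column++
		}
	}
	lexer.started = true

	char, _, err := lexer.reader.ReadRune()
	if err == io.EOF {
		lexer.eof = true
		return nil
	}
	if err != nil {
		return err
	}
	lexer.current = char
	return nil
}
```

Keeping all of this bookkeeping in one routine is the point of the rule above: any code that read from the reader directly would skip the row and column updates.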