# lexer

## Responsibilities

- Define the token type and token kinds
- Turn streams of data into streams of tokens

## Organization

The lexer is split into its interface and implementation:

- Lexer: public-facing lexer interface
- fsplLexer: private implementation of Lexer, with public constructors

The lexer is bound to a data stream when it is created, and its Next() method
may be called to read and return the next token from the stream.

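The split described above can be sketched roughly as follows. This is a
minimal illustration, not the package's actual code: the NewLexer constructor
name, the Token fields, the Word kind, and the exact signature of Next() are
all assumptions, and the body of Next() is a placeholder that only splits on
whitespace.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// TokenKind and Token are simplified stand-ins for the real token types.
type TokenKind int

const (
	EOF TokenKind = iota
	Word
)

type Token struct {
	Kind  TokenKind
	Value string
}

// Lexer is the public-facing interface: callers only ever hold this type.
type Lexer interface {
	Next() (Token, error)
}

// fsplLexer is the private implementation of Lexer.
type fsplLexer struct {
	scanner *bufio.Scanner
}

// NewLexer is a public constructor (name and signature assumed). The data
// stream is bound here, at creation time.
func NewLexer(reader io.Reader) Lexer {
	scanner := bufio.NewScanner(reader)
	scanner.Split(bufio.ScanWords)
	return &fsplLexer{scanner: scanner}
}

// Next is a placeholder that yields whitespace-separated words.
func (lexer *fsplLexer) Next() (Token, error) {
	if lexer.scanner.Scan() {
		return Token{Kind: Word, Value: lexer.scanner.Text()}, nil
	}
	return Token{Kind: EOF}, lexer.scanner.Err()
}

// words drains a lexer bound to input and returns the value of every token.
func words(input string) []string {
	lexer := NewLexer(strings.NewReader(input))
	var values []string
	for {
		token, err := lexer.Next()
		if err != nil || token.Kind == EOF {
			return values
		}
		values = append(values, token.Value)
	}
}

func main() {
	fmt.Println(words("func main int"))
}
```

Because the constructor returns the interface rather than the concrete type,
callers depend only on Next() and the implementation stays free to change.
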
## Operation

fsplLexer carries state information about which rune from the data stream is
currently being processed. This state must always be filled in as long as there
is still data in the stream to read. All lexer routines start by using this
rune, and end by advancing to the next rune for the next routine to use.

The lexer follows this general flow:

1. Upon creation, grab the first rune to initialize the lexer state
2. When Next() is called...
3. Create a new token
4. Set the token's position
5. Switch on the current rune to set the token's kind and invoke specific
   lexing behavior
6. Expand the token's position to cover the full range

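The steps above could look something like the sketch below. It is only an
approximation under stated assumptions: the token kinds (Ident, Int, Symbol),
the Position type with rune offsets, the takeWhile and lexAll helpers, and the
whitespace skipping are all invented for illustration. Only the names
fsplLexer, Next(), nextRune() and the eof field come from this document.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"unicode"
)

type TokenKind int

const (
	EOF TokenKind = iota
	Ident
	Int
	Symbol
)

// Position is a hypothetical range of rune offsets; End is exclusive.
type Position struct{ Start, End int }

type Token struct {
	Kind     TokenKind
	Value    string
	Position Position
}

type fsplLexer struct {
	reader *bufio.Reader
	rune   rune // the rune currently being processed
	offset int  // offset of that rune in the stream
	eof    bool
}

// Step 1: grab the first rune so the lexer state is filled in from the start.
func newLexer(reader io.Reader) *fsplLexer {
	lexer := &fsplLexer{reader: bufio.NewReader(reader), offset: -1}
	lexer.nextRune()
	return lexer
}

// nextRune advances to the next rune. In this sketch any read error is
// treated as the end of the input.
func (lexer *fsplLexer) nextRune() {
	if lexer.eof {
		return
	}
	char, _, err := lexer.reader.ReadRune()
	lexer.offset++
	if err != nil {
		lexer.eof, lexer.rune = true, 0
		return
	}
	lexer.rune = char
}

// takeWhile collects runes while accept holds, leaving the lexer on the first
// rune that was not accepted.
func (lexer *fsplLexer) takeWhile(accept func(rune) bool) string {
	var builder strings.Builder
	for !lexer.eof && accept(lexer.rune) {
		builder.WriteRune(lexer.rune)
		lexer.nextRune()
	}
	return builder.String()
}

// Step 2: Next is called. This sketch never returns a non-nil error.
func (lexer *fsplLexer) Next() (Token, error) {
	lexer.takeWhile(unicode.IsSpace)    // skip whitespace (assumed behavior)
	token := Token{}                    // step 3: create a new token
	token.Position.Start = lexer.offset // step 4: set its position
	switch {                            // step 5: switch on the current rune
	case lexer.eof:
		token.Kind = EOF
	case unicode.IsLetter(lexer.rune):
		token.Kind, token.Value = Ident, lexer.takeWhile(unicode.IsLetter)
	case unicode.IsDigit(lexer.rune):
		token.Kind, token.Value = Int, lexer.takeWhile(unicode.IsDigit)
	default:
		token.Kind, token.Value = Symbol, string(lexer.rune)
		lexer.nextRune()
	}
	token.Position.End = lexer.offset // step 6: expand to the full range
	return token, nil
}

// lexAll returns every token in input except the final EOF token.
func lexAll(input string) []Token {
	lexer := newLexer(strings.NewReader(input))
	var tokens []Token
	for {
		token, _ := lexer.Next()
		if token.Kind == EOF {
			return tokens
		}
		tokens = append(tokens, token)
	}
}

func main() {
	for _, token := range lexAll("foo + 42") {
		fmt.Printf("%d %q [%d,%d)\n", token.Kind, token.Value,
			token.Position.Start, token.Position.End)
	}
}
```

Every branch of the switch leaves the lexer on the first rune it did not
consume, which is what lets the next call start from the current rune without
any lookahead bookkeeping.
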
When an EOF is detected, the lexer is marked as spent (eof: true) and will only
return EOF tokens. The lexer will only return an error alongside an EOF token if
the EOF was unexpected.

The lexer also keeps track of its current position in order to embed it into
tokens and to print errors. It is important that the lowest-level operation
used to advance the lexer's position is fsplLexer.nextRune(), as it contains
logic for keeping the position correct and maintaining the current lexer state.

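One way nextRune() might keep the position and the current rune in step is
sketched below. The zero-based Row/Column layout of Position, the started
field, and the positionOf helper are assumptions for illustration, not the
package's actual fields.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Position is a hypothetical zero-based row and column.
type Position struct{ Row, Column int }

type fsplLexer struct {
	reader   *bufio.Reader
	rune     rune     // the rune currently being processed
	position Position // position of that rune
	started  bool     // false until the first rune has been read
	eof      bool
}

func newLexer(reader io.Reader) *fsplLexer {
	lexer := &fsplLexer{reader: bufio.NewReader(reader)}
	lexer.nextRune()
	return lexer
}

// nextRune is the only place where the position and the current rune change,
// so the two cannot drift apart.
func (lexer *fsplLexer) nextRune() {
	if lexer.eof {
		return
	}
	// Step the position past the rune we are leaving behind.
	if lexer.started {
		if lexer.rune == '\n' {
			lexer.position.Row++
			lexer.position.Column = 0
		} else {
			lexer.position.Column++
		}
	}
	lexer.started = true

	// Read the new current rune. Any read error is treated as EOF here.
	char, _, err := lexer.reader.ReadRune()
	if err != nil {
		lexer.eof, lexer.rune = true, 0
		return
	}
	lexer.rune = char
}

// positionOf reports where target first appears in input.
func positionOf(input string, target rune) (Position, bool) {
	lexer := newLexer(strings.NewReader(input))
	for !lexer.eof {
		if lexer.rune == target {
			return lexer.position, true
		}
		lexer.nextRune()
	}
	return Position{}, false
}

func main() {
	position, found := positionOf("ab\ncd", 'd')
	fmt.Println(position.Row, position.Column, found)
}
```

If a lexing routine read from the underlying reader directly instead of going
through nextRune(), the row and column would fall behind the stream and every
later token and error message would point at the wrong place.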