diff --git a/lexer/DESIGN.md b/lexer/DESIGN.md
new file mode 100644
index 0000000..c2fcf0d
--- /dev/null
+++ b/lexer/DESIGN.md
@@ -0,0 +1,36 @@
+# lexer
+
+## Responsibilities
+
+- Define token type, token kinds
+- Turning streams of data into streams of tokens
+
+## Organization
+
+The lexer is split into its interface and implementation:
+
+- Lexer: public facing lexer interface
+- fsplLexer: private implementation of Lexer, with public constructors
+
+The lexer is bound to a data stream at the time of creation, and its Next()
+method may be called to read and return the next token from the stream.
+
+## Operation
+
+fsplLexer carries state information about what rune from the data stream is
+currently being processed. This must always be filled out as long as there is
+still data in the stream to read from. All lexer routines start off by using
+this rune, and end by advancing to the next rune for the next routine to use.
+
+The lexer follows this general flow:
+
+1. Upon creation, grab the first rune to initialize the lexer state
+2. When next is called...
+3. Create a new token
+4. Set out the token's position
+5. Switch off of the current rune and call specialized lexing routines if needed
+6. Expand the token's position to cover the full range
+
+When an EOF is detected, the lexer is marked as spent (eof: true) and will only
+return EOF tokens. The lexer will only return an error alongside an EOF token if
+the EOF was unexpected.
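
Review note: the flow described under "Operation" in DESIGN.md could look roughly like the Go sketch below. It is not the code from this change. Only `Lexer`, `fsplLexer`, `Next()` and the `eof` flag come from the design document; `Token`, `TokenKind`, the token kinds, the struct fields, `NewLexer`, `advance`, `takeWhile`, `lexString` and `lexAll` are hypothetical names, and taking an `io.RuneReader` as the data stream is an assumption made to keep the sketch short.

```go
package main

import (
	"io"
	"strings"
	"unicode"
)

// TokenKind and Token are hypothetical stand-ins for the real token types.
type TokenKind int
const (
	EOF TokenKind = iota
	Word
	String
)

type Token struct {
	Kind       TokenKind
	Value      string
	Start, End int // rune offsets; End is exclusive
}

// Lexer is the public-facing interface.
type Lexer interface{ Next() (Token, error) }

// fsplLexer is the private implementation; its fields are assumptions.
type fsplLexer struct {
	reader  io.RuneReader
	current rune // rune currently being processed
	offset  int  // offset of current within the stream
	eof     bool // true once the stream is spent
}

// NewLexer binds a lexer to a stream and grabs the first rune (step 1).
func NewLexer(r io.RuneReader) Lexer {
	l := &fsplLexer{reader: r, offset: -1}
	l.advance()
	return l
}

// advance loads the next rune; this sketch treats any read error as EOF.
func (l *fsplLexer) advance() {
	var err error
	l.offset++
	l.current, _, err = l.reader.ReadRune()
	l.eof = err != nil
}

// Next follows steps 2 to 6 of the flow.
func (l *fsplLexer) Next() (tok Token, err error) {
	l.takeWhile(unicode.IsSpace) // skip whitespace between tokens
	tok.Start = l.offset         // steps 3 and 4: new token, initial position
	switch {                     // step 5: switch off of the current rune
	case l.eof:
		tok.Kind = EOF
	case l.current == '"':
		tok.Kind, tok.Value, err = l.lexString()
	default:
		tok.Kind = Word
		tok.Value = l.takeWhile(func(r rune) bool { return !unicode.IsSpace(r) && r != '"' })
	}
	tok.End = l.offset // step 6: expand the position over the full range
	return tok, err
}

// takeWhile consumes runes while pred holds and leaves the next rune current.
func (l *fsplLexer) takeWhile(pred func(rune) bool) string {
	var b strings.Builder
	for !l.eof && pred(l.current) {
		b.WriteRune(l.current)
		l.advance()
	}
	return b.String()
}

// lexString is a specialized routine; an unterminated string is an unexpected EOF.
func (l *fsplLexer) lexString() (TokenKind, string, error) {
	l.advance() // skip the opening quote
	value := l.takeWhile(func(r rune) bool { return r != '"' })
	if l.eof {
		return EOF, "", io.ErrUnexpectedEOF
	}
	l.advance() // skip the closing quote
	return String, value, nil
}

// lexAll is a hypothetical helper that collects tokens up to and including EOF.
func lexAll(src string) ([]Token, error) {
	l := NewLexer(strings.NewReader(src))
	var toks []Token
	for {
		tok, err := l.Next()
		toks = append(toks, tok)
		if tok.Kind == EOF {
			return toks, err
		}
	}
}
```

Under these assumptions, `lexAll("ab 12")` yields two `Word` tokens followed by one `EOF` token with a nil error, while `lexAll("\"abc")` yields an `EOF` token together with `io.ErrUnexpectedEOF`, matching the rule that an error accompanies an EOF token only when the EOF was unexpected.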