History

Sasha Koshka d8f82d3646 Parse switch expressions		2024-03-25 02:18:18 -04:00
..
README.md	Parse default match cases	2024-03-25 01:53:40 -04:00
doc.go	Separated parser into two packages	2024-02-13 13:03:22 -05:00
expression.go	Parse switch expressions	2024-03-25 02:18:18 -04:00
literal.go	Remove the star from array literals	2024-03-24 20:31:25 -04:00
misc.go	Updated parser	2024-03-14 03:14:08 -04:00
parser.go	Changed repository import paths	2024-02-22 19:22:53 -05:00
parser_test.go	Parse switch expressions	2024-03-25 02:18:18 -04:00
test-common.go	Changed repository import paths	2024-02-22 19:22:53 -05:00
toplevel.go	Updated parser	2024-03-14 03:14:08 -04:00
tree.go	Changed repository import paths	2024-02-22 19:22:53 -05:00
type.go	Change interface symbol from ~ to &	2024-03-13 23:05:58 -04:00
unicode_test.go	Real quick add unicode test to parser	2024-02-15 12:43:09 -05:00

README.md

parser

Responsibilities

Define syntax tree type that contains entities
Turn streams of tokens into abstract syntax tree entities

Organization

The entry point for all logic defined in this package is the Tree type. On this type, the Parse() method is defined. This method creates a new parser, and uses it to parse a stream of tokens into the tree. The details of the parser are hidden from other packages, and instances of it only last for the duration of Tree.Parse().

Operation

The parser holds a pointer to the Tree that created it, as well as the lexer that was passed to it. Its parse() method attempts to consume all tokens produced by the lexer, parsing them into syntax entities which it places into the tree.

parser.parse() parses top level entities, which include functions, methods, and typedefs. For each top-level entity, the parser will call a specialized parsing routine to parse that entity depending on the current token's kind and value. These routines in turn call other routines, which call other routines, etc.

All parsing routines follow this general flow:

Start with the token already present in Parser.token. Do not get the token after it.
Use Parser.expect(), Parser.expectValue(), etc. to test whether the token is a valid start for the entity
If starting by calling another parsing method, trust that method to do this instead.
When getting new tokens, use Parser.expectNext(), Parser.expectNextDesc(), etc. Only use Parser.next() when getting a token right before calling another parsing method, or at the very end of the current method.
To terminate the method, get the next token and do nothing with it.
If terminating by calling another parsing method, trust that method to do this instead.

Remember that parsing routines always start with the current token, and end by getting a trailing token for the next method to start with. This makes it possible to reliably switch between parsing methods depending on the type or value of a token.

The parser must never backtrack or look ahead, but it may revise previous data it has output upon receiving a new token that comes directly after the last token of said previous data. For example:

X in XYZ may not be converted to A once the parser has seen Z, but
X in XYZ may be converted to A once the parser has seen Y.

This disallows complex and ambiguous syntax, but should allow things such as the very occasional infix operator (like . and =)

Expression Parsing

Expression notation is the subset of FSPL that is used to describe computations and data/control flow. The expression parsing routine is the most important part of the FSPL parser, and also the most complex. For each expression, the parser follows this decision tree to determine what to parse:

| +Ident =Variable
|        | 'true'       =LiteralBoolean
|        | 'false'      =LiteralBoolean
|        | 'nil'        =LiteralNil
|        | 'if'         =IfElse
|        | 'match'      =Match
|        | 'switch'     =Switch
|        | 'loop'       =Loop
|        | +Colon       =Declaration
|        | +DoubleColon =Call
|
| +LParen X =LiteralArray
|         | +Dot =LiteralStruct
|
| +LBracket | +Ident =Call
|           |        | 'break'  =Break
|           |        | 'return' =Return
|           | 
|           | +Dot +Expression =Dereference
|           |                  | +Expression =Subscript
|           | +Star =Operation
|           |
|           | +Symbol X
|                     | '\'      =Slice
|                     | '#'      =Length
|                     | '@'      =Reference
|                     | '~'      =ValueCast
|                     | '~~'     =BitCast
|                     | OPERATOR =Operation
|
| +LBrace =Block
| +Int    =LiteralInt
| +Float  =LiteralFloat
| +String =LiteralString

Each branch of the decision tree is implemented as a routine with one or more switch statements which call other routines, and each leaf is implemented as a normal entity parsing routine.

Expressions that are only detected after more than one token has been consumed have parsing routines with "-Core" appended to them. This means that they do not begin at the first token in the expression, but instead at the point where there is no longer any ambiguity as to what they are.

Type Parsing

Type notation is the subset of FSPL that is used to describe data types. Though it is not as complex as expression notation, it still needs a decision tree to determine what type to parse:

| +Ident     =TypeNamed
| +TypeIdent =TypeNamed
| +Int       =TypeArray
|
| +LParen X
|         | +Dot =TypeStruct
|         |
|         | +Symbol X
|                   | '&' =TypeInterface
|                   | '|' =TypeSum
|
| +Star =TypePointer
        | +Colon     =TypeSlice