134 lines
4.7 KiB
Markdown
134 lines
4.7 KiB
Markdown
# parser
|
|
|
|
## Responsibilities
|
|
|
|
- Define syntax tree type that contains entities
|
|
- Turn streams of tokens into abstract syntax tree entities
|
|
|
|
## Organization
|
|
|
|
The entry point for all logic defined in this package is the Tree type. On this
|
|
type, the Parse() method is defined. This method creates a new parser, and uses
|
|
it to parse a stream of tokens into the tree. The details of the parser are
|
|
hidden from other packages, and instances of it only last for the duration of
|
|
Tree.Parse().
|
|
|
|
## Operation
|
|
|
|
The parser holds a pointer to the Tree that created it, as well as the lexer
|
|
that was passed to it. Its parse() method attempts to consume all tokens
|
|
produced by the lexer, parsing them into syntax entities which it places into
|
|
the tree.
|
|
|
|
parser.parse() parses top level entities, which include functions, methods, and
|
|
typedefs. For each top-level entity, the parser will call a specialized parsing
|
|
routine to parse that entity depending on the current token's kind and value.
|
|
These routines in turn call other routines, which call other routines, etc.
|
|
|
|
All parsing routines follow this general flow:
|
|
|
|
- Start with the token already present in Parser.token. Do not get the
|
|
token after it.
|
|
- Use Parser.expect(), Parser.expectValue(), etc. to test whether the token
|
|
is a valid start for the entity
|
|
- If starting by calling another parsing method, trust that method to do
|
|
this instead.
|
|
- When getting new tokens, use Parser.expectNext(),
|
|
Parser.expectNextDesc(), etc. Only use Parser.next() when getting a token
|
|
*right before* calling another parsing method, or at the *very end* of
|
|
the current method.
|
|
- To terminate the method, get the next token and do nothing with it.
|
|
- If terminating by calling another parsing method, trust that method to do
|
|
this instead.
|
|
|
|
Remember that parsing routines always start with the current token, and end by
|
|
getting a trailing token for the next method to start with. This makes it
|
|
possible to reliably switch between parsing methods depending on the type or
|
|
value of a token.
|
|
|
|
The parser must never backtrack or look ahead, but it may revise previous
|
|
data it has output upon receiving a new token that comes directly after the
|
|
last token of said previous data. For example:
|
|
|
|
- X in XYZ may not be converted to A once the parser has seen Z, but
|
|
- X in XYZ may be converted to A once the parser has seen Y.
|
|
|
|
This disallows complex and ambiguous syntax, but should allow things such as
|
|
the very occasional infix operator (like . and =)
|
|
|
|
### Expression Parsing
|
|
|
|
Expression notation is the subset of FSPL that is used to describe
|
|
computations and data/control flow. The expression parsing routine is the most
|
|
important part of the FSPL parser, and also the most complex. For each
|
|
expression, the parser follows this decision tree to determine what to parse:
|
|
|
|
```
|
|
| +Ident =Variable
|
|
| | 'true' =LiteralBoolean
|
|
| | 'false' =LiteralBoolean
|
|
| | 'nil' =LiteralNil
|
|
| | 'if' =IfElse
|
|
| | 'match' =Match
|
|
| | 'switch' =Switch
|
|
| | 'loop' =Loop
|
|
| | 'for' =For
|
|
| | +Colon =Declaration
|
|
| | +DoubleColon =Call
|
|
|
|
|
| +LParen X =LiteralArray
|
|
| | +Dot =LiteralStruct
|
|
|
|
|
| +LBracket | +Ident =Call
|
|
| | | 'break' =Break
|
|
| | | 'return' =Return
|
|
| |
|
|
| | +Dot +Expression =Dereference
|
|
| | | +Expression =Subscript
|
|
| | +Star =Operation
|
|
| |
|
|
| | +Symbol X
|
|
| | '\' =Slice
|
|
| | '#' =Length
|
|
| | '@' =Reference
|
|
| | '~' =ValueCast
|
|
| | '~~' =BitCast
|
|
| | OPERATOR =Operation
|
|
|
|
|
| +LBrace =Block
|
|
| +Int =LiteralInt
|
|
| +Float =LiteralFloat
|
|
| +String =LiteralString
|
|
```
|
|
|
|
Each branch of the decision tree is implemented as a routine with one or more
|
|
switch statements which call other routines, and each leaf is implemented as a
|
|
normal entity parsing routine.
|
|
|
|
Expressions that are only detected after more than one token has been
|
|
consumed have parsing routines with "-Core" appended to them. This means that
|
|
they do not begin at the first token in the expression, but instead at the point
|
|
where there is no longer any ambiguity as to what they are.
|
|
|
|
### Type Parsing
|
|
|
|
Type notation is the subset of FSPL that is used to describe data types. Though
|
|
it is not as complex as expression notation, it still needs a decision tree to
|
|
determine what type to parse:
|
|
|
|
```
|
|
| +Ident =TypeNamed
|
|
| +TypeIdent =TypeNamed
|
|
| +Int =TypeArray
|
|
|
|
|
| +LParen X
|
|
| | +Dot =TypeStruct
|
|
| |
|
|
| | +Symbol X
|
|
| | '&' =TypeInterface
|
|
| | '|' =TypeSum
|
|
|
|
|
| +Star =TypePointer
|
|
| +Colon =TypeSlice
|
|
```
|