# parser
## Responsibilities
- Define the syntax tree type that contains entities
- Turn streams of tokens into abstract syntax tree entities
## Organization
The entry point for all logic defined in this package is the Tree type, which
defines the Parse() method. This method creates a new parser and uses it to
parse a stream of tokens into the tree. The details of the parser are hidden
from other packages, and each instance of it only lasts for the duration of
Tree.Parse().
## Operation
The parser holds a pointer to the Tree that created it, as well as the lexer
that was passed to it. Its parse() method attempts to consume all tokens
produced by the lexer, parsing them into syntax entities which it places into
the tree.
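
As a rough illustration of this arrangement, the sketch below shows a Tree
whose Parse() method creates a short-lived, unexported parser. The Token and
Lexer types, the Entities field, and the sliceLexer helper are assumptions made
for the example, not the package's actual definitions.

```go
package main

import (
	"errors"
	"io"
)

// Token and Lexer are hypothetical stand-ins for whatever the lexer package
// actually provides.
type Token struct {
	Kind  string
	Value string
}

type Lexer interface {
	// Next is assumed to return io.EOF once the token stream is exhausted.
	Next() (Token, error)
}

// Tree is the public entry point. Entities is a simplified placeholder for
// the real syntax entities.
type Tree struct {
	Entities []Token
}

// parser is unexported, so other packages cannot depend on its details.
type parser struct {
	tree  *Tree
	lexer Lexer
}

// Parse creates a parser that lives only for the duration of this call.
func (tree *Tree) Parse(lexer Lexer) error {
	p := &parser{tree: tree, lexer: lexer}
	return p.parse()
}

// parse consumes every token the lexer produces. A real implementation would
// dispatch to entity-specific routines here instead of storing raw tokens.
func (p *parser) parse() error {
	for {
		token, err := p.lexer.Next()
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
		p.tree.Entities = append(p.tree.Entities, token)
	}
}

// sliceLexer is a toy Lexer backed by a slice, used only to exercise Parse.
type sliceLexer struct {
	tokens []Token
}

func (l *sliceLexer) Next() (Token, error) {
	if len(l.tokens) == 0 {
		return Token{}, io.EOF
	}
	token := l.tokens[0]
	l.tokens = l.tokens[1:]
	return token, nil
}
```

Because the parser value never escapes Parse(), callers only ever interact
with the Tree.
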

parser.parse() parses top-level entities, which include functions, methods, and
typedefs. For each top-level entity, the parser calls a specialized parsing
routine chosen according to the current token's kind and value. These routines
in turn call other routines, which call further routines, and so on.

All parsing routines follow this general flow:

- Start with the token already present in Parser.token. Do not get the
  token after it.
- Use Parser.expect(), Parser.expectValue(), etc. to test whether the token
  is a valid start for the entity.
  - If starting by calling another parsing method, trust that method to do
    this instead.
- When getting new tokens, use Parser.expectNext(),
  Parser.expectNextDesc(), etc. Only use Parser.next() when getting a token
  *right before* calling another parsing method, or at the *very end* of
  the current method.
- To terminate the method, get the next token and do nothing with it.
  - If terminating by calling another parsing method, trust that method to do
    this instead.

Remember that parsing routines always start with the current token, and end by
getting a trailing token for the next method to start with. This makes it
possible to reliably switch between parsing methods depending on the type or
value of a token.
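
The flow above can be sketched with a toy parser. Everything here is an
assumption made for illustration: the slice-backed parser, the signatures of
expect(), expectNext() and next(), and the simplified `name:Type` declaration
in which the type is a single identifier. The real package may differ.

```go
package main

import "fmt"

type Kind int

const (
	EOF Kind = iota
	Ident
	Colon
)

type Token struct {
	Kind  Kind
	Value string
}

// parser is a simplified stand-in that reads tokens from a slice.
type parser struct {
	rest  []Token // tokens not yet consumed
	token Token   // current token
}

func newParser(tokens ...Token) *parser {
	p := &parser{rest: tokens}
	p.next()
	return p
}

// next makes the following token current, or EOF when none are left.
func (p *parser) next() {
	p.token = Token{Kind: EOF}
	if len(p.rest) > 0 {
		p.token, p.rest = p.rest[0], p.rest[1:]
	}
}

// expect reports an error unless the current token has one of the given kinds.
func (p *parser) expect(kinds ...Kind) error {
	for _, kind := range kinds {
		if p.token.Kind == kind {
			return nil
		}
	}
	return fmt.Errorf("unexpected token %q", p.token.Value)
}

// expectNext gets the next token and then checks its kind.
func (p *parser) expectNext(kinds ...Kind) error {
	p.next()
	return p.expect(kinds...)
}

type Declaration struct {
	Name string
	Type string
}

func (p *parser) parseDeclaration() (Declaration, error) {
	// Start with the token already in p.token and check that it is a valid
	// start for the entity.
	if err := p.expect(Ident); err != nil {
		return Declaration{}, err
	}
	declaration := Declaration{Name: p.token.Value}

	// Get new tokens with expectNext() so each one is checked on arrival.
	if err := p.expectNext(Colon); err != nil {
		return Declaration{}, err
	}
	if err := p.expectNext(Ident); err != nil {
		return Declaration{}, err
	}
	declaration.Type = p.token.Value

	// Terminate by getting the trailing token and doing nothing with it.
	p.next()
	return declaration, nil
}
```

After parseDeclaration() returns, the current token is whatever follows the
declaration, so the caller can immediately switch on it to pick the next
routine.
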

The parser must never backtrack or look ahead, but it may revise data it has
already output upon receiving a new token that comes directly after the last
token of that data. For example:

- X in XYZ may not be converted to A once the parser has seen Z, but
- X in XYZ may be converted to A once the parser has seen Y.

This disallows complex and ambiguous syntax, but should allow things such as
the very occasional infix operator (like . and =).
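
A minimal sketch of this kind of revision, assuming a hypothetical Assignment
entity and the same toy slice-backed parser idea: the variable is parsed first,
and only the trailing token it leaves behind decides whether that variable is
re-wrapped as the left side of an assignment. No token beyond the trailing one
is inspected.

```go
package main

import "fmt"

type Kind int

const (
	EOF Kind = iota
	Ident
	Equals
)

type Token struct {
	Kind  Kind
	Value string
}

// parser is a simplified stand-in that reads tokens from a slice.
type parser struct {
	rest  []Token // tokens not yet consumed
	token Token   // current token
}

func newParser(tokens ...Token) *parser {
	p := &parser{rest: tokens}
	p.next()
	return p
}

func (p *parser) next() {
	p.token = Token{Kind: EOF}
	if len(p.rest) > 0 {
		p.token, p.rest = p.rest[0], p.rest[1:]
	}
}

type Expression interface{ String() string }

type Variable struct{ Name string }

func (v Variable) String() string { return v.Name }

// Assignment is a hypothetical entity used to illustrate an infix operator.
type Assignment struct{ Location, Value Expression }

func (a Assignment) String() string {
	return "(" + a.Location.String() + " = " + a.Value.String() + ")"
}

func (p *parser) parseVariable() (Expression, error) {
	if p.token.Kind != Ident {
		return nil, fmt.Errorf("unexpected token %q", p.token.Value)
	}
	variable := Variable{Name: p.token.Value}
	p.next() // trailing token
	return variable, nil
}

func (p *parser) parseExpression() (Expression, error) {
	left, err := p.parseVariable()
	if err != nil {
		return nil, err
	}
	// p.token is the token directly after the variable. If it is not '=',
	// the variable stands as it is and the token is left for the caller.
	if p.token.Kind != Equals {
		return left, nil
	}
	// Otherwise revise the output: the variable becomes the location of an
	// assignment.
	p.next()
	right, err := p.parseExpression()
	if err != nil {
		return nil, err
	}
	return Assignment{Location: left, Value: right}, nil
}
```
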
### Expression Parsing
Expression notation is the subset of FSPL that is used to describe
computations and data/control flow. The expression parsing routine is the most
important part of the FSPL parser, and also the most complex. For each
expression, the parser follows this decision tree to determine what to parse:
```
| +Ident                          =Variable
|        | 'true'                 =LiteralBoolean
|        | 'false'                =LiteralBoolean
|        | 'nil'                  =LiteralNil
|        | 'if'                   =IfElse
|        | 'match'                =Match
|        | 'switch'               =Switch
|        | 'loop'                 =Loop
|        | 'for'                  =For
|        | +Colon                 =Declaration
|        | +DoubleColon           =Call
|
| +LParen X                       =LiteralArray
|         | +Dot                  =LiteralStruct
|
| +LBracket | +Ident              =Call
|           |        | 'break'    =Break
|           |        | 'return'   =Return
|           |
|           | +Dot +Expression    =Dereference
|           |      | +Expression  =Subscript
|           | +Star               =Operation
|           |
|           | +Symbol X
|           |         '\'         =Slice
|           |         '#'         =Length
|           |         '@'         =Reference
|           |         '~'         =ValueCast
|           |         '~~'        =BitCast
|           |         OPERATOR    =Operation
|
| +LBrace                         =Block
| +Int                            =LiteralInt
| +Float                          =LiteralFloat
| +String                         =LiteralString
```
Each branch of the decision tree is implemented as a routine with one or more
switch statements which call other routines, and each leaf is implemented as a
normal entity parsing routine.

Expressions that are only detected after more than one token has been
consumed have parsing routines whose names end in "Core". This means that
they do not begin at the first token in the expression, but instead at the point
where there is no longer any ambiguity as to what they are.
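
The sketch below illustrates the branch, leaf, and "Core" idea on part of the
`+LBracket` branch. It is not the package's code: the routine names, the
slice-backed parser, and the use of plain identifiers as operands are
assumptions, and results are returned as descriptive strings rather than real
syntax entities.

```go
package main

import "fmt"

type Kind int

const (
	EOF Kind = iota
	Ident
	Dot
	LBracket
	RBracket
)

type Token struct {
	Kind  Kind
	Value string
}

// parser is a simplified stand-in that reads tokens from a slice.
type parser struct {
	rest  []Token // tokens not yet consumed
	token Token   // current token
}

func newParser(tokens ...Token) *parser {
	p := &parser{rest: tokens}
	p.next()
	return p
}

func (p *parser) next() {
	p.token = Token{Kind: EOF}
	if len(p.rest) > 0 {
		p.token, p.rest = p.rest[0], p.rest[1:]
	}
}

// parseOperand is a leaf routine: it starts at its own first token.
func (p *parser) parseOperand() (string, error) {
	if p.token.Kind != Ident {
		return "", fmt.Errorf("unexpected token %q", p.token.Value)
	}
	value := p.token.Value
	p.next()
	return value, nil
}

// parseBracketed is a branch: it switches on the token after '[' and hands
// off to a Core routine once the kind of expression is known.
func (p *parser) parseBracketed() (string, error) {
	if p.token.Kind != LBracket {
		return "", fmt.Errorf("unexpected token %q", p.token.Value)
	}
	p.next()
	switch p.token.Kind {
	case Ident:
		return p.parseCallCore()
	case Dot:
		return p.parseDereferenceOrSubscriptCore()
	}
	return "", fmt.Errorf("unexpected token %q", p.token.Value)
}

// parseCallCore starts at the function name, not at '['.
func (p *parser) parseCallCore() (string, error) {
	name, args := p.token.Value, 0
	p.next()
	for p.token.Kind != RBracket {
		if _, err := p.parseOperand(); err != nil {
			return "", err
		}
		args++
	}
	p.next() // trailing token for the caller
	return fmt.Sprintf("Call %s with %d argument(s)", name, args), nil
}

// parseDereferenceOrSubscriptCore starts at '.', not at '['.
func (p *parser) parseDereferenceOrSubscriptCore() (string, error) {
	p.next()
	target, err := p.parseOperand()
	if err != nil {
		return "", err
	}
	result := "Dereference " + target
	if p.token.Kind == Ident { // a second expression makes it a subscript
		result = "Subscript " + target + " at " + p.token.Value
		p.next()
	}
	if p.token.Kind != RBracket {
		return "", fmt.Errorf("unexpected token %q", p.token.Value)
	}
	p.next() // trailing token for the caller
	return result, nil
}
```

Under these assumptions, the token sequence `[ f x y ]` yields a call with two
arguments, `[ . p ]` a dereference, and `[ . a i ]` a subscript.
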
### Type Parsing
Type notation is the subset of FSPL that is used to describe data types. Though
it is not as complex as expression notation, it still needs a decision tree to
determine what type to parse:
```
| +Ident                  =TypeNamed
| +TypeIdent              =TypeNamed
| +Int                    =TypeArray
|
| +LParen X
|         | +Dot          =TypeStruct
|         |
|         | +Symbol X
|         |         '&'   =TypeInterface
|         |         '|'   =TypeSum
|
| +Star                   =TypePointer
| +Colon                  =TypeSlice
```