Merge pull request 'document-packages' (#30) from document-packages into main

Reviewed-on: sashakoshka/fspl#30
Commit 1593ecef7b

analyzer/README.md (new file, 89 lines)
@@ -0,0 +1,89 @@
# analyzer

## Responsibilities

- Define a semantic tree type that contains entities
- Check the semantic correctness of abstract syntax tree entities and arrange
  them into the semantic tree

## Organization

The entry point for all logic defined in this package is the Tree type. On this
type, the Analyze() method is defined. This method checks the semantic
correctness of an AST, fills in semantic fields within its data structures, and
arranges them into the Tree.

Tree contains a scopeContextManager. The job of scopeContextManager is to manage
a stack of scopeContexts, which are each tied to a function or method that is
currently being analyzed. In turn, each scopeContext manages stacks of
entity.Scopes and entity.Loops. This allows for greedy/recursive analysis of
functions and methods.

## Operation

When Analyze() is called, several hidden fields in the Tree are filled out.
Tree.ensure() instantiates data that can persist between analyses, which
consists of map initialization and merging the data in the builtinTypes map into
Tree.Types.

After Tree.ensure() completes, Tree.assembleRawMaps() takes top-level entities
from the AST and organizes them into rawTypes, rawFunctions, and rawMethods. It
does this so that top-level entities can be indexed by name. While doing this,
it ensures that function and type names are unique, and that method names are
unique within the type they are defined on.

Next, Tree.analyzeDeclarations() is called. This is the entry point for the
actual analysis logic. For each item in the raw top-level entity maps, it calls
a specific analysis routine, which is one of:

- Tree.analyzeTypedef()
- Tree.analyzeFunction()
- Tree.analyzeMethod()

These routines all have two crucial properties that make them very useful:

- They refer to top-level entities by name instead of by memory location
- If the entity has already been analyzed, they return that entity instead of
  analyzing it again

Because of this, they are also used as accessors for top-level entities within
more specific analysis routines. For example, the routine Tree.analyzeCall()
will call Tree.analyzeFunction() in order to get information about the function
that is being called. If the function has not yet been analyzed, it is analyzed
(making use of scopeContextManager to push a new scopeContext), and other
routines (including Tree.analyzeDeclarations()) will not have to analyze it all
over again. After a top-level entity has been analyzed, these routines will
always return the same pointer to the one instance of the analyzed entity.

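The memoized-accessor pattern these routines follow can be sketched as below. Everything here (the Function type, the map field names) is an illustrative stand-in, not the actual fspl analyzer API:

```go
package main

import "fmt"

// Function is a stand-in for an analyzed top-level entity.
type Function struct {
	Name     string
	Analyzed bool
}

// Tree is a minimal stand-in for the analyzer's Tree type.
type Tree struct {
	rawFunctions map[string]*Function // unanalyzed input, indexed by name
	Functions    map[string]*Function // analyzed output, indexed by name
}

// analyzeFunction returns the analyzed function named name, analyzing it
// first if necessary. Repeated calls return the same pointer.
func (tree *Tree) analyzeFunction(name string) *Function {
	// if the entity has already been analyzed, return that instance
	if function, ok := tree.Functions[name]; ok {
		return function
	}
	// otherwise, analyze it now and remember the result
	function := tree.rawFunctions[name]
	function.Analyzed = true
	tree.Functions[name] = function
	return function
}

func main() {
	tree := &Tree{
		rawFunctions: map[string]*Function{"main": {Name: "main"}},
		Functions:    map[string]*Function{},
	}
	first := tree.analyzeFunction("main")
	second := tree.analyzeFunction("main")
	fmt.Println(first == second, first.Analyzed)
}
```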
## Expression Analysis and Assignment

Since expressions make up the bulk of FSPL, expression analysis makes up the
bulk of the semantic analyzer. Whenever an expression needs to be analyzed,
Tree.analyzeExpression() is called. This activates a switch to call one of many
specialized analysis routines based on the expression entity's concrete type.

Much of expression analysis consists of the analyzer checking to see if the
result of one expression can be assigned to the input of another. To this end,
assignment rules are used. There are five different assignment modes:

- Strict: Structural equivalence, but named types are treated as opaque and are
  not tested. This applies to the root of the type, and to types enclosed as
  members, elements, etc. This is the assignment mode most often used.
- Weak: Like strict, but the root types specifically are compared as if they
  were not named. analyzer.ReduceToBase() is used to accomplish this.
- Structural: Full structural equivalence, and named types are always reduced.
- Coerce: Data of the source type must be convertible to the destination type.
  This is used in value casts.
- Force: All assignment rules are ignored. This is only used in bit casts.

All expression analysis routines take in as a parameter the type that the result
expression is being assigned to, and the assignment mode. To figure out whether
or not they can be assigned, they in turn (usually) call Tree.canAssign().
Tree.canAssign() is used to determine whether data of a source type can be
assigned to a destination type, given an assignment mode. However, it is not
called automatically by Tree.analyzeExpression() because:

- Determining the source type is sometimes non-trivial (see
  Tree.analyzeOperation())
- Literals have their own very weak assignment rules, and are designed to be
  assignable to a wide range of data types

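As a rough sketch, dispatch on the assignment mode inside a routine like Tree.canAssign() could look as follows. The strictness constants mirror the ones shown later in this diff, but the Type placeholder and the per-mode rule bodies are stand-ins, not the real rules:

```go
package main

import "fmt"

// strictness mirrors the assignment modes described above.
type strictness int

const (
	strict strictness = iota // named types are opaque everywhere
	weak                     // the root type is reduced to its base first
	structural               // named types are always reduced
	coerce                   // values must be convertible
	force                    // everything is accepted
)

// Type is a placeholder for the analyzer's semantic type representation.
type Type struct{ Name string }

// canAssign sketches dispatch on the assignment mode. Only the dispatch
// structure is the point here; the per-mode bodies are placeholders.
func canAssign(mode strictness, destination, source Type) bool {
	switch mode {
	case strict:
		// real code compares structure while leaving named types opaque
		return destination.Name == source.Name
	case weak, structural, coerce:
		// real code would reduce named types and/or convert values here
		return destination.Name == source.Name
	case force:
		// all assignment rules are ignored
		return true
	}
	return false
}

func main() {
	a, b := Type{"Int"}, Type{"Byte"}
	fmt.Println(canAssign(strict, a, b))
	fmt.Println(canAssign(force, a, b))
}
```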
@@ -7,16 +7,20 @@ import "git.tebibyte.media/sashakoshka/fspl/entity"
 import "git.tebibyte.media/sashakoshka/fspl/integer"
 
 type strictness int; const (
-	// name equivalence
+	// Structural equivalence, but named types are treated as opaque and are
+	// not tested. This applies to the root of the type, and to types
+	// enclosed as members, elements, etc. This is the assignment mode most
+	// often used.
 	strict strictness = iota
-	// structural equivalence up until the first base type, then name
-	// equivalence applies to the parts of the type
+	// Like strict, but the root types specifically are compared as if they
+	// were not named. analyzer.ReduceToBase() is used to accomplish this.
 	weak
-	// structural equivalence
+	// Full structural equivalence, and named types are always reduced.
 	structural
-	// allow if values can be converted
+	// Data of the source type must be convert-able to the destination type.
+	// This is used in value casts.
 	coerce
-	// assignment rules are completely ignored and everything is accepted
+	// All assignment rules are ignored. This is only used in bit casts.
 	force
 )
 
@@ -145,7 +145,7 @@ referring to usually the name of a function. The result of a call may be assigne
 any type matching the function's return type. Since it contains inherent type
 information, it may be directly assigned to an interface.
 ### Method call
-Method call calls upon the method of the variable before the dot that is
+Method call calls upon the method (of the expression before the dot) that is
 specified by the first argument, passing the rest of the arguments to the
 method. The first argument must be a method name. The result of a call may be
 assigned to any type matching the method's return type. Since it contains
@@ -258,12 +258,15 @@ does not return anything, the return statement does not accept a value. In all
 cases, return statements have no value and may not be assigned to anything.
 ### Assignment
 Assignment allows assigning the result of one expression to one or more location
-expressions. The assignment statement itself has no value and may not be
+expressions. The assignment expression itself has no value and may not be
 assigned to anything.
 
 # Syntax entities
 
-Below is a rough syntax description of the language.
+Below is a rough syntax description of the language. Note that `<assignment>`
+is right-associative, and `<memberAccess>` and `<methodCall>` are
+left-associative. I invite you to torture yourself by attempting to implement
+this without hand-writing a parser.
 
 ```
 <file> -> (<typedef> | <function> | <method>)*
@@ -281,8 +284,8 @@ Below is a rough syntax description of the language.
 <pointerType> -> "*" <type>
 <sliceType> -> "*" ":" <type>
 <arrayType> -> <intLiteral> ":" <type>
-<structType> -> "(" <declaration>* ")"
+<structType> -> "(" "." <declaration>* ")"
-<interfaceType> -> "(" <signature> ")"
+<interfaceType> -> "(" "~" <signature>* ")"
 
 <expression> -> <intLiteral>
 | <floatLiteral>
@@ -302,25 +305,26 @@ Below is a rough syntax description of the language.
 | <operation>
 | <block>
 | <memberAccess>
+| <methodCall>
 | <ifelse>
 | <loop>
 | <break>
 | <return>
-<statement> -> <expression> | <assignment>
+| <assignment>
 <variable> -> <identifier>
 <declaration> -> <identifier> ":" <type>
 <call> -> "[" <expression>+ "]"
 <subscript> -> "[" "." <expression> <expression> "]"
-<slice> -> "[" "\" <expression> <expression>? ":" <expression>? "]"
+<slice> -> "[" "\" <expression> <expression>? "/" <expression>? "]"
 <length> -> "[" "#" <expression> "]"
 <dereference> -> "[" "." <expression> "]"
 <reference> -> "[" "@" <expression> "]"
 <valueCast> -> "[" "~" <type> <expression> "]"
 <bitCast> -> "[" "~~" <type> <expression> "]"
 <operation> -> "[" <operator> <expression>* "]"
-<block> -> "{" <statement>* "}"
+<block> -> "{" <expression>* "}"
-<memberAccess> -> <variable> "." <identifier>
+<memberAccess> -> <expression> "." <identifier>
-<methodAccess> -> <variable> "." <call>
+<methodCall> -> <expression> "." <call>
 <ifelse> -> "if" <expression>
 "then" <expression>
 ["else" <expression>]
@@ -336,7 +340,7 @@ Below is a rough syntax description of the language.
 <floatLiteral> -> /-?[0-9]*\.[0-9]+/
 <stringLiteral> -> /'.*'/
 <arrayLiteral> -> "(*" <expression>* ")"
-<structLiteral> -> "(" <member>* ")"
+<structLiteral> -> "(." <member>* ")"
 <booleanLiteral> -> "true" | "false"
 
 <member> -> <identifier> ":" <expression>

generator/README.md (new file, 86 lines)
@@ -0,0 +1,86 @@
# generator

## Responsibilities

Given a compilation target, turn a well-formed FSPL semantic tree into an LLVM
IR module tree.

## Organization

Generator defines the Target type, which contains information about the system
that the program is being compiled for. The native sub-package uses Go's
conditional compilation directives to provide a default Target that matches the
system the compiler has been natively built for.

The entry point for all logic defined in this package is Target.Generate(). This
method creates a new generator, and uses it to recursively generate and return
an LLVM module. The details of the generator are hidden from other packages, and
instances of it only last for the duration of Target.Generate().

The generator contains a stack of blockManagers, which plays a similar role to
analyzer.scopeContextManager, except that the stack of blockManagers is managed
directly by the generator, which contains appropriate methods for
pushing/popping them.

Like the analyzer, the generator greedily generates code, and one function may
be generated in the middle of the generation process of another function. Thus,
each blockManager is tied to a specific LLVM function, and is in charge of
variables/stack allocations and, to a degree, control flow flattening
(specifically loops). It also embeds the current active block, allowing
generator routines to call its methods to add new instructions to the current
block, and to switch between different blocks when necessary.

## Operation

When Target.Generate() is called, a new generator is created. It is given the
semantic tree to generate, as well as a copy of the Target. All data structure
initialization within the generator happens at this point.

Then, the generate() method on the newly created generator is called. This is
the entry point for the actual generation logic. This routine comprises two
phases:

- Function generation
- Method generation

You'll notice that there is no step for type generation. This is because types
are generated on-demand in order to reduce IR clutter.

## Expression Generation

Since expressions make up the bulk of FSPL, expression generation makes up the
bulk of the code generator. The generator is able to produce expressions in one
of three modes:

- Location: The generator will return an IR register that contains a pointer to
  the result of the expression.
- Value: The generator will return an IR register that directly contains the
  result of the expression.
- Any: The generator will decide which of these two options is best for the
  specific expression, and will let the caller know which was chosen, in case
  it cares. Some expressions are better suited to returning a pointer, such as
  array subscripting or member access. Other expressions are better suited to
  returning a value, such as arithmetic operators and function calls.

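The three modes can be sketched as follows. The mode constants, register strings, and the generateAny() body here are illustrative stand-ins, not the actual generator API:

```go
package main

import "fmt"

// register stands in for an LLVM IR register name.
type register string

// mode selects how an expression result is produced. These identifiers
// are illustrative; they are not the generator's actual names.
type mode int

const (
	modeLocation mode = iota // result register holds a pointer to the value
	modeValue                // result register holds the value directly
	modeAny                  // the generator picks, and reports its choice
)

// expression is a placeholder for a semantic tree expression.
type expression struct {
	isMemberAccess bool
}

// generateAny decides which mode suits the expression best and tells the
// caller which one it chose.
func generateAny(e expression) (register, mode) {
	if e.isMemberAccess {
		// member access is better suited to returning a pointer
		return "%member.loc", modeLocation
	}
	// arithmetic and calls are better suited to returning a value
	return "%result.val", modeValue
}

func main() {
	_, chosen := generateAny(expression{isMemberAccess: true})
	fmt.Println(chosen == modeLocation)
}
```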
It is important to note that generating a Value expression may return a pointer,
because *FSPL pointers are first-class values*. The distinction between the
location and value generation modes is purely to do with LLVM. It is similar to
the concept of location expressions within the analyzer, but not 100% identical
all of the time.

Whenever an expression needs to be generated, one of the following routines is
called:

- generator.generateExpression()
- generator.generateAny()
- generator.generateVal()
- generator.generateLoc()

The generator.generateExpression() routine takes in a mode value and, depending
on it, calls one of the other more specific routines. Each of these routines,
in turn, calls a more specialized generation routine depending on the specific
expression.

If it is specifically requested to generate a value for an expression with only
its location component defined or vice versa, generator.generateVal/Loc() will
automatically perform the conversion.
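That automatic conversion can be pictured like this: a location becomes a value by loading through the pointer, and a value becomes a location by spilling it to a stack slot. The instruction text follows LLVM IR, but the helper names and the i32 element type are illustrative assumptions, not the real generator code:

```go
package main

import "fmt"

// instruction is a textual stand-in for an LLVM IR instruction.
type instruction string

// generateValFromLoc converts a location result into a value result by
// emitting a load through the pointer register.
func generateValFromLoc(loc string) (string, instruction) {
	val := loc + ".load"
	return val, instruction(fmt.Sprintf("%s = load i32, ptr %s", val, loc))
}

// generateLocFromVal converts a value result into a location result by
// allocating a stack slot and storing the value into it.
func generateLocFromVal(val string) (string, []instruction) {
	loc := val + ".addr"
	return loc, []instruction{
		instruction(fmt.Sprintf("%s = alloca i32", loc)),
		instruction(fmt.Sprintf("store i32 %s, ptr %s", val, loc)),
	}
}

func main() {
	val, inst := generateValFromLoc("%x")
	fmt.Println(val)
	fmt.Println(inst)
}
```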
lexer/README.md (new file, 42 lines)
@@ -0,0 +1,42 @@
# lexer

## Responsibilities

- Define token type, token kinds
- Turn streams of data into streams of tokens

## Organization

The lexer is split into its interface and implementation:

- Lexer: public-facing lexer interface
- fsplLexer: private implementation of Lexer, with public constructors

The lexer is bound to a data stream at the time of creation, and its Next()
method may be called to read and return the next token from the stream.

## Operation

fsplLexer carries state information about which rune from the data stream is
currently being processed. This must always be filled out as long as there is
still data in the stream to read from. All lexer routines start off by using
this rune, and end by advancing to the next rune for the next routine to use.

The lexer follows this general flow:

1. Upon creation, grab the first rune to initialize the lexer state
2. When Next() is called...
3. Create a new token
4. Set the token's position
5. Switch off of the current rune to set the token's kind and invoke specific
   lexing behavior
6. Expand the token's position to cover the full range

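The steps above can be sketched with a toy lexer. miniLexer and its token kinds are simplified stand-ins, not the real fsplLexer:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Token is a simplified stand-in for the lexer's token type.
type Token struct {
	Kind       string
	Value      string
	Start, End int
}

// miniLexer mirrors the general flow: it always holds the position of the
// current rune, and every call ends past what it consumed.
type miniLexer struct {
	input []rune
	pos   int
	eof   bool
}

// newMiniLexer binds the lexer to its data; the first rune is in place
// from the start (step 1).
func newMiniLexer(data string) *miniLexer {
	return &miniLexer{input: []rune(data)}
}

func (l *miniLexer) Next() Token {
	if l.pos >= len(l.input) {
		l.eof = true // the lexer is spent; only EOF tokens from now on
		return Token{Kind: "EOF", Start: l.pos, End: l.pos}
	}
	// create a new token and set its position (steps 3 and 4)
	token := Token{Start: l.pos}
	r := l.input[l.pos]
	// switch off of the current rune to pick the kind (step 5)
	switch {
	case unicode.IsDigit(r):
		token.Kind = "Int"
		for l.pos < len(l.input) && unicode.IsDigit(l.input[l.pos]) {
			l.pos++
		}
	case unicode.IsLetter(r):
		token.Kind = "Ident"
		for l.pos < len(l.input) && unicode.IsLetter(l.input[l.pos]) {
			l.pos++
		}
	default:
		token.Kind = "Symbol"
		l.pos++
	}
	// expand the position to cover the full range (step 6)
	token.End = l.pos
	token.Value = string(l.input[token.Start:token.End])
	return token
}

func main() {
	lexer := newMiniLexer("abc 42")
	var kinds []string
	for {
		token := lexer.Next()
		kinds = append(kinds, token.Kind)
		if token.Kind == "EOF" {
			break
		}
	}
	fmt.Println(strings.Join(kinds, " "))
}
```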
When an EOF is detected, the lexer is marked as spent (eof: true) and will only
return EOF tokens. The lexer will only return an error alongside an EOF token if
the EOF was unexpected.

The lexer also keeps track of its current position in order to embed it into
tokens, and to print errors. It is important that the lowest-level operation
used to advance the lexer's position is fsplLexer.nextRune(), as it contains
logic for keeping the position correct and maintaining the current lexer state.
parser/README.md (new file, 128 lines)
@@ -0,0 +1,128 @@
# parser

## Responsibilities

- Define syntax tree type that contains entities
- Turn streams of tokens into abstract syntax tree entities

## Organization

The entry point for all logic defined in this package is the Tree type. On this
type, the Parse() method is defined. This method creates a new parser, and uses
it to parse a stream of tokens into the tree. The details of the parser are
hidden from other packages, and instances of it only last for the duration of
Tree.Parse().

## Operation

The parser holds a pointer to the Tree that created it, as well as the lexer
that was passed to it. Its parse() method attempts to consume all tokens
produced by the lexer, parsing them into syntax entities which it places into
the tree.

parser.parse() parses top-level entities, which include functions, methods, and
typedefs. For each top-level entity, the parser will call a specialized parsing
routine to parse that entity depending on the current token's kind and value.
These routines in turn call other routines, which call other routines, etc.

All parsing routines follow this general flow:

- Start with the token already present in Parser.token. Do not get the
  token after it.
- Use Parser.expect(), Parser.expectValue(), etc. to test whether the token
  is a valid start for the entity
  - If starting by calling another parsing method, trust that method to do
    this instead.
- When getting new tokens, use Parser.expectNext(),
  Parser.expectNextDesc(), etc. Only use Parser.next() when getting a token
  *right before* calling another parsing method, or at the *very end* of
  the current method.
- To terminate the method, get the next token and do nothing with it.
  - If terminating by calling another parsing method, trust that method to do
    this instead.

Remember that parsing routines always start with the current token, and end by
getting a trailing token for the next method to start with. This makes it
possible to reliably switch between parsing methods depending on the type or
value of a token.

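This convention can be sketched with a toy parser. The helper names mimic the ones listed above, but the implementation is a simplified stand-in, not the real fspl parser:

```go
package main

import (
	"fmt"
	"strings"
)

// Parser follows the convention described above: every routine starts with
// the token already in Parser.token, and ends by fetching one trailing
// token for the next routine to use. Tokens are plain kind strings here.
type Parser struct {
	tokens []string
	index  int
	token  string
}

// next advances to the next token.
func (p *Parser) next() {
	p.index++
	if p.index < len(p.tokens) {
		p.token = p.tokens[p.index]
	} else {
		p.token = "EOF"
	}
}

// expect fails if the current token is not the expected kind.
func (p *Parser) expect(kind string) error {
	if p.token != kind {
		return fmt.Errorf("expected %s, got %s", kind, p.token)
	}
	return nil
}

// expectNext advances and then checks, combining next() and expect().
func (p *Parser) expectNext(kind string) error {
	p.next()
	return p.expect(kind)
}

// parseDeclaration parses `<identifier> ":" <type>`. It starts with the
// current token, and terminates by grabbing one trailing token.
func (p *Parser) parseDeclaration() (string, error) {
	if err := p.expect("Ident"); err != nil {
		return "", err
	}
	name := p.token
	if err := p.expectNext("Colon"); err != nil {
		return "", err
	}
	if err := p.expectNext("Ident"); err != nil {
		return "", err
	}
	declaration := name + ":" + p.token
	// terminate by getting the next token and doing nothing with it
	p.next()
	return declaration, nil
}

func main() {
	tokens := strings.Fields("Ident Colon Ident EOF")
	parser := &Parser{tokens: tokens, token: tokens[0]}
	declaration, err := parser.parseDeclaration()
	fmt.Println(declaration, err == nil, parser.token)
}
```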
The parser must never backtrack or look ahead, but it may revise previous
data it has output upon receiving a new token that comes directly after the
last token of said previous data. For example:

- X in XYZ may not be converted to A once the parser has seen Z, but
- X in XYZ may be converted to A once the parser has seen Y.

This disallows complex and ambiguous syntax, but should allow things such as
the very occasional infix operator (like . and =)

### Expression Parsing

Expression notation is the subset of FSPL that is used to describe
computations and data/control flow. The expression parsing routine is the most
important part of the FSPL parser, and also the most complex. For each
expression, the parser follows this decision tree to determine what to parse:

```
| +Ident =Variable
| | 'true' =LiteralBoolean
| | 'false' =LiteralBoolean
| | 'nil' =LiteralNil
| | 'if' =IfElse
| | 'loop' =Loop
| | +Colon =Declaration
| | +DoubleColon =Call
|
| +LParen X
| | +Star =LiteralArray
| | +Dot =LiteralStruct
|
| +LBracket | +Ident =Call
| | | 'break' =Break
| | | 'return' =Return
| |
| | +Dot +Expression =Dereference
| | | +Expression =Subscript
| | +Star =Operation
| |
| | +Symbol X
| | '\' =Slice
| | '#' =Length
| | '@' =Reference
| | '~' =ValueCast
| | '~~' =BitCast
| | OPERATOR =Operation
|
| +LBrace =Block
| +Int =LiteralInt
| +Float =LiteralFloat
| +String =LiteralString
```

Each branch of the decision tree is implemented as a routine with one or more
switch statements which call other routines, and each leaf is implemented as a
normal entity parsing routine.

Expressions that are only detected after more than one token has been
consumed have parsing routines with "-Core" appended to them. This means that
they do not begin at the first token in the expression, but instead at the point
where there is no longer any ambiguity as to what they are.

### Type Parsing

Type notation is the subset of FSPL that is used to describe data types. Though
it is not as complex as expression notation, it still needs a decision tree to
determine what type to parse:

```
| +Ident =TypeNamed
| +TypeIdent =TypeNamed
| +Int =TypeArray
|
| +LParen X
| | +Dot =TypeStruct
| | +Symbol '~' =TypeInterface
|
| +Star =TypePointer
| +Colon =TypeSlice
```
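Each branch of such a decision tree maps naturally onto a switch over the current token's kind. A sketch for the type notation above (the kind strings and return values are illustrative, not the parser's real routines):

```go
package main

import "fmt"

// parseType sketches the type-notation decision tree: the current token's
// kind selects which specialized parsing routine would run next.
func parseType(kind string) string {
	switch kind {
	case "Ident", "TypeIdent":
		return "TypeNamed"
	case "Int":
		return "TypeArray"
	case "LParen":
		// a second switch on the following token would pick between
		// TypeStruct (+Dot) and TypeInterface ('~')
		return "TypeStruct or TypeInterface"
	case "Star":
		return "TypePointer"
	case "Colon":
		return "TypeSlice"
	}
	return "error: unexpected token"
}

func main() {
	fmt.Println(parseType("Star"))
	fmt.Println(parseType("Int"))
}
```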
@@ -4,44 +4,6 @@ import "git.tebibyte.media/sashakoshka/fspl/lexer"
 import "git.tebibyte.media/sashakoshka/fspl/errors"
 import "git.tebibyte.media/sashakoshka/fspl/entity"
 
-// Expression decision tree flow:
-//
-// | +Ident =Variable
-// | | 'true' =LiteralBoolean
-// | | 'false' =LiteralBoolean
-// | | 'nil' =LiteralNil
-// | | 'if' =IfElse
-// | | 'loop' =Loop
-// | | +Colon =Declaration
-// | | +DoubleColon =Call
-// |
-// | +LParen X
-// | | +Star =LiteralArray
-// | | +Dot =LiteralStruct
-// |
-// | +LBracket | +Ident =Call
-// | | | 'break' =Break
-// | | | 'return' =Return
-// | |
-// | | +Dot +Expression =Dereference
-// | | | +Expression =Subscript
-// | | +Star =Operation
-// | |
-// | | +Symbol X
-// | | '\' =Slice
-// | | '#' =Length
-// | | '@' =Reference
-// | | '~' =ValueCast
-// | | '~~' =BitCast
-// | | OPERATOR =Operation
-// |
-// | +LBrace =Block
-// | +Int =LiteralInt
-// | +Float =LiteralFloat
-// | +String =LiteralString
-//
-// Entities with a star have yet to be implemented
-
 var descriptionExpression = "expression"
 var startTokensExpression = []lexer.TokenKind {
 	lexer.Ident,
@@ -5,36 +5,6 @@ import "fmt"
 import "git.tebibyte.media/sashakoshka/fspl/lexer"
 import "git.tebibyte.media/sashakoshka/fspl/errors"
 
-// When writing a parsing method on Parser, follow this flow:
-// - Start with the token already present in Parser.token. Do not get the
-//   token after it.
-// - Use Parser.expect(), Parser.expectValue(), etc. to test whether the token
-//   is a valid start for the entity
-//   - If starting by calling another parsing method, trust that method to do
-//     this instead.
-// - When getting new tokens, use Parser.expectNext(),
-//   Parser.expectNextDesc(), etc. Only use Parser.next() when getting a token
-//   *right before* calling another parsing method, or at the *very end* of
-//   the current method.
-// - To terminate the method, get the next token and do nothing with it.
-//   - If terminating by calling another parsing method, trust that method to do
-//     this instead.
-//
-// Remember that parsing methods always start with the current token, and end by
-// getting a trailing token for the next method to start with. This makes it
-// possible to reliably switch between parsing methods depending on the type or
-// value of a token.
-//
-// The parser must never backtrack or look ahead, but it may revise previous
-// data it has output upon receiving a new token that comes directly after the
-// last token of said previous data. For example:
-//
-// X in XYZ may not be converted to A once the parser has seen Z, but
-// X in XYZ may be converted to A once the parser has seen Y.
-//
-// This disallows complex and ambiguous syntax, but should allow things such as
-// the very occasional infix operator (like . and =)
-
 // Parser parses tokens from a lexer into syntax entities, which it places into
 // a tree.
 type Parser struct {