Ok now slices actually

Sasha Koshka 2023-09-22 00:58:06 -04:00
parent 53c0bb7171
commit 57e41e48ff
5 changed files with 493 additions and 6 deletions

design/analyzer.md Normal file

@@ -0,0 +1,80 @@
# Analyzer
Package analyzer:
- Ensures an AST is semantically correct
- Fills in semantic information within the AST
- Gives named types a pointer to the defined type they reference
- Gives variables a pointer to the declaration they reference
- Gives all expressions, including literals, a type even if it is void
- Organizes keyed entities into maps
- Checks to make sure there are no duplicate key names
- Returns the resulting AST as a Tree
## Types
### Scoped
- Variable (name string) entity.Declaration
- AddVariable (entity.Declaration)
Functions, methods,
### Tree
Tree acts as a semantic tree to contain entities from the entity package that
have semantic information filled in.
#### Data
- Raw map of names -> types
- Raw map of names -> functions
- Raw map of type.names -> methods
- Completed map of names -> types
- Completed map of names -> functions
- Scope breadcrumb trail
- Every expression is assigned a Type when its type is determined
- Methods are moved into a map within their types
#### Methods
##### Analyze(parser.Tree) error
Analyze takes in an AST and analyzes it within its own context. If this method
is called multiple times, it will analyze all of the trees as if they were one.
This method returns an error if the tree has a semantic error and cannot be
turned into a proper semantic tree.
First, add all top level AST declarations to quick access maps and make sure
their names are unique.
For each top level declaration, call one of the analysis functions with its
name.
##### pushScope (Scoped)
Pushes a scope onto the scope trail
##### popScope ()
Removes the last scope from the scope trail
##### variable (string) entity.Declaration
Returns the named variable, or nil if it doesn't exist. Searches every scope
from the top of the trail to the bottom and stops at the first match.
##### addVariable (string, entity.Declaration)
Adds a variable to the top scope.
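The scope trail described above could be sketched as a slice used as a stack. This is only an illustration: `Declaration` stands in for `entity.Declaration`, the map-backed `scope` type is hypothetical, and `addVariable` here takes just the declaration (as `Scoped.AddVariable` does) on the assumption that the declaration carries its own name.

```go
package main

import "fmt"

// Declaration stands in for entity.Declaration; only the name matters here.
type Declaration struct {
	Name string
}

// Scoped mirrors the interface listed under Types.
type Scoped interface {
	Variable(name string) *Declaration
	AddVariable(declaration *Declaration)
}

// scope is a hypothetical map-backed implementation of Scoped.
type scope struct {
	variables map[string]*Declaration
}

func (s *scope) Variable(name string) *Declaration {
	return s.variables[name] // reading a nil map yields nil
}

func (s *scope) AddVariable(declaration *Declaration) {
	if s.variables == nil {
		s.variables = make(map[string]*Declaration)
	}
	s.variables[declaration.Name] = declaration
}

// scopeTrail is the breadcrumb trail; the last element is the top scope.
type scopeTrail []Scoped

func (t *scopeTrail) pushScope(s Scoped) { *t = append(*t, s) }

func (t *scopeTrail) popScope() { *t = (*t)[:len(*t)-1] }

// variable searches from the top of the trail to the bottom and returns the
// first match, or nil if no scope declares the name.
func (t scopeTrail) variable(name string) *Declaration {
	for index := len(t) - 1; index >= 0; index-- {
		if declaration := t[index].Variable(name); declaration != nil {
			return declaration
		}
	}
	return nil
}

// addVariable adds a declaration to the top scope.
func (t scopeTrail) addVariable(declaration *Declaration) {
	t[len(t)-1].AddVariable(declaration)
}

func main() {
	var trail scopeTrail
	trail.pushScope(&scope{})
	trail.addVariable(&Declaration{Name: "x"})
	trail.pushScope(&scope{})
	fmt.Println(trail.variable("x") != nil) // x is visible from the inner scope
	trail.popScope()
	fmt.Println(trail.variable("y") == nil) // y was never declared
}
```

Searching from the end of the slice gives inner declarations priority over outer ones with the same name.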
##### analyzeTypedef
- If already analyzed return what has been analyzed
- Get typedef from raw map
- Analyze type
- Store analyzed type in completed map
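The four steps above amount to memoizing analysis results in the completed map. A minimal sketch, assuming a placeholder `Typedef` type and hypothetical field names (`rawTypes`, `types`); returning an error for an unknown name is also an assumption, since the steps above don't say what happens in that case:

```go
package main

import (
	"errors"
	"fmt"
)

// Typedef stands in for the entity package's typedef; the Analyzed flag is a
// placeholder for the semantic information the analyzer would fill in.
type Typedef struct {
	Name     string
	Analyzed bool
}

// Tree holds the raw and completed maps of names -> types.
type Tree struct {
	rawTypes map[string]*Typedef
	types    map[string]*Typedef
}

func (tree *Tree) analyzeTypedef(name string) (*Typedef, error) {
	// If already analyzed, return what has been analyzed.
	if done, ok := tree.types[name]; ok {
		return done, nil
	}
	// Get the typedef from the raw map.
	raw, ok := tree.rawTypes[name]
	if !ok {
		return nil, errors.New("no type named " + name)
	}
	// Analyze the type (placeholder for the real work).
	raw.Analyzed = true
	// Store the analyzed type in the completed map.
	tree.types[name] = raw
	return raw, nil
}

func main() {
	tree := &Tree{
		rawTypes: map[string]*Typedef{"Point": {Name: "Point"}},
		types:    map[string]*Typedef{},
	}
	first, _ := tree.analyzeTypedef("Point")
	second, _ := tree.analyzeTypedef("Point")
	fmt.Println(first == second, first.Analyzed)
	_, err := tree.analyzeTypedef("Missing")
	fmt.Println(err != nil)
}
```

Because the lookup in the completed map comes first, a type referenced from many places is analyzed only once.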
##### analyzeFunction
- If already analyzed return what has been analyzed
- Get function from raw map
- Analyze signature
- Analyze expression
- Store analyzed function in completed map
##### analyzeMethod
- Analyze type
- If already analyzed return what has been analyzed
- Get method from raw map
- Analyze signature
- Analyze expression
- Store analyzed method in type

design/compiler.md Normal file

@@ -0,0 +1,118 @@
# Main
Package main:
- Parses CLI arguments
- Displays help
- Compiles file(s)
- Displays any errors in a readable way
## Subroutines
### Compile (inputFiles ...string, outputFile string) error
- Create AST
- For each input file:
  - Feed file into AST
  - If error, print error and terminate
- Feed AST from parser into analyzer
  - If error, print error and terminate
- Open temporary IR output pipe
- Feed semantic tree from analyzer and the output pipe into generator
  - If error, print error and terminate
- Invoke LLVM on temporary IR output pipe, instruct it to output to output
  file
- Close temporary IR output pipe
# Entity
Package entity defines types to represent language entities, as well as some
convenience methods for dealing with them.
## Types
- TopLevel
- Function
- Typedef
- Method
- Type
- TypeNamed
- TypePointer
- TypeArray
- TypeStruct
- TypeInterface
- Expression
- LiteralInt
- LiteralFloat
- LiteralArray
- LiteralStruct
- Variable
- Declaration
- Call
- Subscript
- Dereference
- Reference
- ValueCast
- BitCast
- Operation
- Block
- MemberAccess
- IfElse
- Loop
- Break
- Return
- Assignment
- Member
- Signature
# Parser
Package parser parses data in an input io.Reader into an AST.
## Types
### Tree
Tree acts as an AST to contain top-level declarations from the entity package.
#### Methods
##### Parse(name string, file io.Reader) error
Parse takes in a file name and an io.Reader, and parses the reader's contents
into the tree. The name is only used for error reporting; this method does not
open any files. It returns an error if the syntax read from the input reader is
erroneous and cannot be parsed.
##### ParseFile(name string) error
ParseFile is like Parse, except it automatically opens and closes the file
specified by the given name.
# Analyzer
Package analyzer:
- Ensures an AST is semantically correct
- Fills in semantic information within the AST
- Gives named types a pointer to the defined type they reference
- Gives variables a pointer to the declaration they reference
- Gives all expressions, including literals, a type even if it is void
- Organizes keyed entities into maps
- Checks to make sure there are no duplicate key names
- Returns the resulting AST as a Tree
## Types
### Tree
Tree acts as a semantic tree to contain entities from the entity package that
have semantic information filled in.
#### Methods
##### Analyze(parser.Tree) error
Analyze takes in an AST and analyzes it within its own context. If this method
is called multiple times, it will analyze all of the trees as if they were one.
This method returns an error if the tree has a semantic error and cannot be
turned into a proper semantic tree.
# Generator
Package generator turns semantic trees into LLVM IR and outputs it to an
io.Writer.
## Subroutines
### Generate (analyzer.Tree, io.Writer) error
Generate takes in a semantic tree and writes corresponding LLVM IR to the given
io.Writer. It returns an error in case there is something wrong with the
semantic tree that prevents the code generation process from occurring.

design/spec.md Normal file

@@ -0,0 +1,274 @@
# Semantic entities
## Top level
### Type definition
Type definitions bind a type to a global identifier.
### Function
Functions bind a global identifier and argument list to an expression which is
evaluated each time the function is called. If no expression is specified, the
function is marked as external. Functions have an argument list, where each
argument is passed as a separate variable. They return one value. All of these
are typed.
### Method
A method is like a function, except localized to a defined type. Methods are
called on an instance of that type, and receive a pointer to that instance via
the "this" variable. Method names are not globally unique, but are unique within
the type they are defined on.
## Types
### Named
Named refers to a user-defined or built in named type.
### Pointer
Pointer is a pointer to another type.
### Array
Array is a group of values of a given type stored next to each other. The length
of an array is fixed and is part of its type. Arrays are passed by value unless
a pointer is used.
### Struct
Struct is a composite type that stores keyed values. The positions of the values
within the struct are decided at compile time, based on the order they are
specified in. Structs are passed by value unless a pointer is used.
### Interface
Interface is a polymorphic pointer that allows any value of any type through,
except it must have at least the methods defined within the interface.
Interfaces are always passed by reference. When assigning a value to an
interface, it will be referenced automatically. When assigning a pointer to an
interface, the pointer's reference will be used instead.
## Expressions
### Location expressions
Location expressions are special expressions that only refer to the location of
a memory address. An expression is only a location expression if its value
originates from another location expression. Such expressions are marked here
with a star (*).
### Literals
#### Integer
An integer literal specifies an integer value. It can be assigned to any type
that is derived from an integer, or a float.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
#### Float
A float literal specifies a floating point value. It can be assigned to any type
that is derived from a float.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
#### Array
Array is a composite array literal. It can contain any number of values. It can
be assigned to any array type that:
1. has an identical length, and
2. whose element type can be assigned to by all the element values in the
literal.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
#### Struct
Struct is a composite structure literal. It can contain any number of name:value
pairs. It can be assigned to any struct type that:
1. has at least the members specified in the literal, and
2. whose member types can be assigned to by the corresponding member values in
the literal.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
### Variable *
Variable specifies a named variable. It can be assigned to a type matching the
variable declaration's type. Since it contains inherent type information, it may
be directly assigned to an interface.
### Declaration *
Declaration binds a local identifier to a typed variable, but also acts as a
variable expression allowing the variable to be used the moment it is defined.
Since it contains inherent type information, it may be directly assigned to an
interface.
### Block
Block is an ordered collection of expressions that are evaluated sequentially.
It has its own scope. The last expression in the block specifies the block's
value, and any assignment rules of the block are equivalent to those of its last
expression.
### Call
Call calls upon the function specified by the first argument, and passes the
rest of the arguments to the function. The first argument must be a function
type, usually the name of a function. The result of a call may be assigned to
any type matching the function's return type. Since it contains inherent type
information, it may be directly assigned to an interface.
### Member access *
Member access allows referring to a specific member of a value with a struct
type. It accepts any struct type that contains the specified member name, and
may be assigned to any type that matches the type of the selected member. Since
it contains inherent type information, it may be directly assigned to an
interface.
### Method access
Method access allows referring to a specific method of a type, or a behavior of
an interface. It can only be assigned to the first argument of a call.
### Array subscript *
Array subscripting allows referring to a specific element of an array. It
accepts any array, and any offset of type Size. It may be assigned to any type
matching the array's element type. Since it contains inherent type information,
it may be directly assigned to an interface.
### Pointer dereference *
Pointer dereferencing allows retrieving the value of a pointer. It accepts any
pointer. It may be assigned to any type matching the pointer's pointed type.
Since it contains inherent type information, it may be directly assigned to an
interface.
### Value reference
Value referencing allows retrieving the location of a value in memory. It
accepts any location expression, and can be assigned to any type that is a
pointer to the location expression's type. Since it contains inherent type
information, it can be directly assigned to an interface, although it doesn't
make a whole lot of sense to do so because assigning a value to an interface
automatically references it anyway.
### Bit casting
Bit casting takes the raw data in memory of a certain value and re-interprets it
as a value of another type. Since it contains inherent type information, it may
be directly assigned to an interface.
### Value casting
Value casting converts a value of a certain type to another type. Since it
contains inherent type information, it may be directly assigned to an interface.
### Operations
Operations perform math, logic, or bit manipulation on values. They accept
values of the same type as the type they are being assigned to, except in
special cases. Since they contain no inherent type information, they may not be
assigned to interfaces.
#### Math
Mathematical operations perform math on numeric values.
- `+` Returns the sum of all arguments
- `++` Returns the sum of all arguments, plus 1
- `-` Returns all arguments after the first subtracted from the first
- `--` Returns all arguments after the first subtracted from the first, minus 1
- `*` Returns the product of all arguments
- `/` Returns A0 / A1 / ... / An
- `%` Returns the remainder of the first argument divided by the second.
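The variadic operators above read as left-to-right folds over their arguments. A small sketch using `int` for brevity (the spec covers numeric values in general); the function names are hypothetical, at least one argument is assumed, and division by zero is not handled:

```go
package main

import "fmt"

// fold applies a binary operation left to right across all arguments.
// It assumes at least one argument is given.
func fold(operation func(a, b int) int, arguments ...int) int {
	result := arguments[0]
	for _, argument := range arguments[1:] {
		result = operation(result, argument)
	}
	return result
}

// add models `+`: the sum of all arguments.
func add(arguments ...int) int {
	return fold(func(a, b int) int { return a + b }, arguments...)
}

// subtract models `-`: all arguments after the first subtracted from the first.
func subtract(arguments ...int) int {
	return fold(func(a, b int) int { return a - b }, arguments...)
}

// divide models `/`: A0 / A1 / ... / An.
func divide(arguments ...int) int {
	return fold(func(a, b int) int { return a / b }, arguments...)
}

// increment models `++`: the sum of all arguments, plus 1.
func increment(arguments ...int) int { return add(arguments...) + 1 }

// decrement models `--`: the `-` result, minus 1.
func decrement(arguments ...int) int { return subtract(arguments...) - 1 }

func main() {
	fmt.Println(subtract(10, 3, 2))  // [- 10 3 2]
	fmt.Println(divide(100, 5, 2))   // [/ 100 5 2]
	fmt.Println(increment(1, 2))     // [++ 1 2]
	fmt.Println(decrement(10, 3))    // [-- 10 3]
}
```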
#### Logic
Logic operations perform logic on booleans.
- `!` Returns the logical negation of the argument
- `|` Returns the logical OR of all arguments
- `&` Returns the logical AND of all arguments
- `^` Returns the logical XOR of all arguments
#### Bit manipulation
Bit manipulation allows for manipulating values at the binary level. These work
on all types except reference types.
- `!!` Returns the bitwise negation of the argument
- `||` Returns the bitwise OR of all arguments
- `&&` Returns the bitwise AND of all arguments
- `^^` Returns the bitwise XOR of all arguments
- `<<` Returns the first argument bit-shifted to the left by the second
argument. The second argument must be an integer.
- `>>` Returns the first argument bit-shifted to the right by the second
argument. The second argument must be an integer.
#### Comparison
Comparison operations compare two or more values and return a boolean.
- `<` Returns if all operands are in ascending order from left to right.
- `>` Returns if all operands are in descending order from left to right.
- `<=` Returns if all operands are in ascending order from left to right,
allowing equal operands.
- `>=` Returns if all operands are in descending order from left to right,
allowing equal operands.
- `=` Returns if all operands are equal to each other.
Comparison operations are the only constructs in FSPL which are allowed to infer
their argument types. The rules for this are as follows:
- If at least one argument has type information, that type is used for all
arguments that do not.
- Else, fail. Optionally call the user an idiot if this is because they directly
compared two literals.
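One way to read the multi-operand comparisons above is as a check over each adjacent pair of operands. The sketch below takes that interpretation as an assumption, uses `int` operands for brevity, and does not model the type inference rules; all function names are hypothetical.

```go
package main

import "fmt"

// chain reports whether the relation holds between every adjacent pair of
// operands, reading left to right.
func chain(holds func(left, right int) bool, operands ...int) bool {
	for index := 1; index < len(operands); index++ {
		if !holds(operands[index-1], operands[index]) {
			return false
		}
	}
	return true
}

// less models `<`: strictly ascending order.
func less(operands ...int) bool {
	return chain(func(l, r int) bool { return l < r }, operands...)
}

// lessOrEqual models `<=`: ascending order, allowing equal operands.
func lessOrEqual(operands ...int) bool {
	return chain(func(l, r int) bool { return l <= r }, operands...)
}

// greater models `>`: strictly descending order.
func greater(operands ...int) bool {
	return chain(func(l, r int) bool { return l > r }, operands...)
}

// equal models `=`: all operands equal to each other.
func equal(operands ...int) bool {
	return chain(func(l, r int) bool { return l == r }, operands...)
}

func main() {
	fmt.Println(less(1, 2, 3))        // [< 1 2 3]
	fmt.Println(less(1, 3, 2))        // [< 1 3 2]
	fmt.Println(lessOrEqual(1, 1, 2)) // [<= 1 1 2]
	fmt.Println(equal(4, 4, 4))       // [= 4 4 4]
}
```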
### If/else
If/else is a control flow branching expression that executes one of two
expressions depending on a boolean value. If the value of the if/else is unused,
the else expression need not be specified. It may be assigned to any type that
satisfies the assignment rules of both the true and false expressions.
### Loop
Loop is a control flow expression that repeats an expression until a break
statement is called from within it. The break statement must be given a value
if the value of the loop is used. Otherwise, it need not even have a break
statement. The result of the loop may be assigned to any type that satisfies the
assignment rules of all of its break statements. Loops may be nested, and break
statements only apply to the closest containing loop. The value of the loop's
expression is never used.
### Break
Break allows breaking out of loops. It has no value and may not be assigned to
anything.
### Return
Return allows terminating functions before they have reached their end. It
accepts values that may be assigned to the function's return type. If a function
does not return anything, the return statement does not accept a value. In all
cases, return statements have no value and may not be assigned to anything.
### Assignment
Assignment allows assigning the result of one expression to one or more location
expressions. The assignment statement itself has no value and may not be
assigned to anything.
# Syntax entities
```
<file> -> (<typedef> | <function> | <method>)*
<typedef> -> <identifier> ":" <type>
<function> -> <signature> ["=" <expression>]
<method> -> <identifier> "." <function>
<type> -> <namedType>
| <pointerType>
| <arrayType>
| <structType>
| <interfaceType>
<namedType> -> <identifier>
<pointerType> -> "*" <type>
<arrayType> -> <intLiteral> "x" <type>
<structType> -> "(" <declaration>* ")"
<interfaceType> -> "(" <signature> ")"
<expression> -> <intLiteral>
| <floatLiteral>
| <arrayLiteral>
| <structLiteral>
| <variable>
| <declaration>
| <call>
| <subscript>
| <dereference>
| <reference>
| <valueCast>
| <bitCast>
| <operation>
| <block>
| <memberAccess>
| <ifelse>
| <loop>
| <break>
| <return>
<statement> -> <expression> | <assignment>
<variable> -> <identifier>
<declaration> -> <identifier> ":" <type>
<call> -> "[" <expression>+ "]"
<subscript> -> "[" "." <expression> <expression> "]"
<dereference> -> "[" "." <expression> "]"
<reference> -> "[" "@" <expression> "]"
<valueCast> -> "[" "cast" <type> <expression> "]"
<bitCast> -> "[" "bitcast" <type> <expression> "]"
<operation> -> "[" <operator> <expression>* "]"
<block> -> "{" <statement>* "}"
<memberAccess> -> <variable> "." <identifier>
<methodAccess> -> <variable> "::" <identifier>
<ifelse> -> "if" <expression>
"then" <expression>
["else" <expression>]
<loop> -> "loop" <expression>
<break> -> "[" "break" [<expression>] "]"
<return> -> "[" "return" [<expression>] "]"
<assignment> -> <expression> "=" <expression>
<intLiteral> -> /-?[1-9][0-9]*/
| /-?0[0-7]*/
| /-?0x[0-9a-fA-F]*/
| /-?0b[0-1]*/
<floatLiteral> -> /-?[0-9]*\.[0-9]+/
<arrayLiteral> -> "(*" <expression>* ")"
<structLiteral> -> "(" <member>* ")"
<member> -> <identifier> ":" <expression>
<signature> -> "[" <identifier> <declaration>* "]" [":" <type>]
<identifier> -> /[A-Za-z]+/
<operator> -> "+" | "++" | "-" | "--" | "*" | "/" | "%"
| "!!" | "||" | "&&" | "^^"
| "!" | "|" | "&" | "^" | "<<" | ">>"
| "<" | ">" | "<=" | ">=" | "="
```
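The four `<intLiteral>` alternatives can be exercised directly with Go's `regexp` package. This is a sketch, not the project's lexer: the anchors (`^`, `$`) are added so each pattern must match the whole token, and `intLiteralBase` is a hypothetical helper. Note that the grammar as written accepts a bare `0x` or `0b` with no digits, and treats a lone `0` as octal.

```go
package main

import (
	"fmt"
	"regexp"
)

// intLiteralForms pairs each <intLiteral> alternative with its numeric base.
var intLiteralForms = []struct {
	base    int
	pattern *regexp.Regexp
}{
	{10, regexp.MustCompile(`^-?[1-9][0-9]*$`)},
	{8, regexp.MustCompile(`^-?0[0-7]*$`)},
	{16, regexp.MustCompile(`^-?0x[0-9a-fA-F]*$`)},
	{2, regexp.MustCompile(`^-?0b[0-1]*$`)},
}

// intLiteralBase returns the base of the alternative that matches the text,
// or 0 if the text is not an integer literal under this grammar.
func intLiteralBase(text string) int {
	for _, form := range intLiteralForms {
		if form.pattern.MatchString(text) {
			return form.base
		}
	}
	return 0
}

func main() {
	for _, text := range []string{"42", "0", "-0x1F", "0b101", "09", "0x"} {
		fmt.Println(text, intLiteralBase(text))
	}
}
```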


@@ -78,7 +78,7 @@ type Subscript struct {
func (*Subscript) expression(){}
func (*Subscript) statement(){}
func (this *Subscript) String () string {
-	return fmt.Sprint("[.", this.Array, this.Offset, "]")
+	return fmt.Sprint("[.", this.Array, " ", this.Offset, "]")
}
// Slice adjusts the start and end points of a slice relative to its current
@@ -87,9 +87,22 @@ func (this *Subscript) String () string {
// is never a valid location expression.
type Slice struct {
	Pos   lexer.Position
-	Slice Expression `parser:" '[' '\\' @@ "`
-	Start Expression `parser:" @@? "`
-	End   Expression `parser:" ':' @@? ']' "`
+	Slice Expression `parser:" '[' '\\\\' @@ "`
+	Start Expression `parser:" @@? "`
+	End   Expression `parser:" ':' @@? ']' "`
}
+func (*Slice) expression(){}
+func (*Slice) statement(){}
+func (this *Slice) String () string {
+	out := fmt.Sprint("[\\", this.Slice, " ")
+	if this.Start != nil {
+		out += fmt.Sprint(this.Start)
+	}
+	out += ":"
+	if this.End != nil {
+		out += fmt.Sprint(this.End)
+	}
+	return out + "]"
+}
// Pointer dereferencing allows retrieving the value of a pointer. It accepts


@@ -39,7 +39,8 @@ testString (test,
[call] = [print 3 1]
[emptyBlock] = {}
[nestedBlock] = {342 93.34 {3948 32}}
-[subscript] = [.arr 3]
+[subscript] = [.(* 3 1 2 4) 3]
+[slice] = [\(* 1 2 3 4 5 6 7 8 9) consumed:]
[dereference] = [.ptr]
[reference] = [@val]
[valueCast] = [cast F32 someValue]
@@ -65,7 +66,8 @@ testString (test,
32
}
}
-[subscript] = [. arr 3]
+[subscript] = [. (* 3 1 2 4) 3]
+[slice] = [\ (* 1 2 3 4 5 6 7 8 9) consumed:]
[dereference] = [. ptr]
[reference] = [@ val]
[valueCast] = [cast F32 someValue]