fspl/design/spec.md
2024-03-25 00:46:30 -04:00

# Semantic entities
## Top level
Top level entities are defined directly within the source file(s) of a unit, and
can be made available to other units via access control modes. The three modes
are:
- Public access: Allows other units to access an entity normally.
- Opaque access: Causes a top-level entity to appear opaque to other units.
Values of opaque types can be passed around, assigned to each other, and their
methods can be called, but the implementation of the type is entirely hidden.
This access mode cannot be applied to functions or methods.
- Private access: Disallows other units from accessing a top-level entity.
This mode is the default when one isn't supplied.
### Type definition
Type definitions bind a type to a global type identifier.
### Function
Functions bind a global identifier and argument list to an expression which is
evaluated each time the function is called. If no expression is specified, the
function is marked as external. Each argument in the argument list is passed as
a separate variable. A function returns at most one value. The arguments and the
return value are all typed.
### Method
A method is like a function, except localized to a defined type. Methods are
called on an instance of that type, and receive a pointer to that instance via
the "this" variable. Method names are not globally unique, but are unique within
the type they are defined on.
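
One plausible way to lower a method is as an ordinary function that takes the receiver pointer as an explicit `this` parameter. The C sketch below uses a hypothetical type `Counter: (. count:Int)` with a method `Counter.[add amount:Int]:Int`. The `TypeName_method` naming scheme is an assumption for illustration, not FSPL's documented symbol mangling.

```c
#include <stdint.h>

/* Hypothetical FSPL type: Counter: (. count:Int) */
typedef struct {
    int64_t count;
} Counter;

/* Hypothetical lowering of Counter.[add amount:Int]:Int.
   The receiver arrives as a pointer named `this`, so the method can
   modify the instance it was called on. */
int64_t Counter_add(Counter *this, int64_t amount) {
    this->count += amount;
    return this->count;
}

/* A method with the same name on a different type does not clash,
   because the type name is part of the (assumed) lowered symbol. */
typedef struct {
    double total;
} Meter;

double Meter_add(Meter *this, double amount) {
    this->total += amount;
    return this->total;
}
```

Here `Counter_add(&c, 5)` corresponds to the FSPL method call `c.[add 5]`.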
## Types
### Named
Named refers to a user-defined, primitive, or built-in named type.
### Pointer
Pointer is a pointer to another type.
### Array
Array is a group of values of a given type stored next to each other. The length
of an array is fixed and is part of its type. Arrays are passed by value unless
a pointer is used.
### Slice
Slice is a pointer to several values of a given type stored next to each other.
Its length is not built into its type and can be changed at runtime.
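
A slice is commonly represented as a pointer to the first element plus a length. The C sketch below assumes that representation for the FSPL slice type `*:I32`. The `SliceI32` and `slice_of` names are hypothetical, and FSPL's actual in-memory layout may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for the FSPL slice type *:I32. */
typedef struct {
    int32_t *data;  /* first element; the storage lives elsewhere */
    size_t length;  /* number of elements, changeable at runtime */
} SliceI32;

/* Builds a slice over existing storage without copying it. */
SliceI32 slice_of(int32_t *data, size_t length) {
    SliceI32 s = { data, length };
    return s;
}
```

Because the length is a runtime field rather than part of the type, the same `SliceI32` type can describe views of any size.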
### Struct
Struct is a composite type that stores keyed values. The positions of the values
within the struct are decided at compile time, based on the order they are
specified in. Structs are passed by value unless a pointer is used.
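
C arrays decay to pointers when passed to a function, so the sketch below wraps an array in a struct to mimic FSPL's pass-by-value rule for both arrays and structs. It also uses `offsetof` to show members placed in declaration order. `Vec3` (standing in for `3:F64`) and `Pair` (standing in for `(. a:U8 b:I32)`) are hypothetical, and the exact padding between members depends on the target ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the FSPL array type 3:F64. Wrapping it in a struct lets
   C copy it by value, matching FSPL's semantics. */
typedef struct {
    double items[3];
} Vec3;

/* Stand-in for the FSPL struct (. a:U8 b:I32). Members are laid out in
   the order they are declared. */
typedef struct {
    uint8_t a;
    int32_t b;
} Pair;

/* Receives a copy: the caller's value is unaffected. */
void zero_copy(Vec3 v) {
    v.items[0] = 0;
}

/* Receives a pointer: the caller's value is modified. */
void zero_through_pointer(Vec3 *v) {
    v->items[0] = 0;
}
```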
### Interface
Interface is a polymorphic pointer that accepts a value of any type, as long as
that type has at least the methods defined within the interface.
Interfaces are always passed by reference. When a value is assigned to an
interface, it is referenced automatically. When a pointer is assigned to an
interface, the pointer itself is used instead of taking a new reference.
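
Interfaces are often implemented as a pair of pointers: one to the value and one to a table of its methods. The C sketch below assumes that layout for a hypothetical interface `Shape: (& [area]:F64)`. It is one possible lowering, not FSPL's documented ABI. Note that the interface stores the address of the value, which matches the automatic referencing described above.

```c
/* Method table for the hypothetical interface Shape: (& [area]:F64). */
typedef struct {
    double (*area)(void *this);
} ShapeMethods;

/* Assumed interface layout: a pointer to the value plus its methods. */
typedef struct {
    void *value;
    const ShapeMethods *methods;
} Shape;

/* A concrete type that has the required method. */
typedef struct {
    double width;
    double height;
} Rect;

double Rect_area(void *this) {
    const Rect *rect = this;
    return rect->width * rect->height;
}

const ShapeMethods rect_methods = { Rect_area };

/* Whether the source was a Rect value (referenced automatically) or a
   *Rect (used as-is), the interface ends up holding a Rect pointer. */
Shape shape_from_rect(Rect *rect) {
    Shape shape = { rect, &rect_methods };
    return shape;
}
```

Because the interface holds a reference, later changes to the original `Rect` are visible through it.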
### Union
Union is a polymorphic type that can hold any value as long as it is one of a
list of allowed types. It is not a pointer. It holds the hash of the actual type
of the value stored within it, followed by the value. The hash field is computed
using the type's name and the UUID of the unit it was defined in. If the type is
not named, the hash is computed using the structure of the type instead. The
value field is always big enough to hold the largest type in the allowed list.
The value of a union can only be extracted using a match expression.
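
The layout above can be sketched in C as a hash field followed by storage large enough for the biggest allowed type. This sketch makes three assumptions that the spec leaves open: a 64-bit hash, FNV-1a as the hash function, and a placeholder UUID for the unit that defines `I32` and `F64`. The function `match_as_f64` stands in for the match expression `match u | i:I32 [~F64 i] | f:F64 f`, which is the only way to read the value back.

```c
#include <stdint.h>

/* FNV-1a, used here only as a stand-in for FSPL's unspecified type hash. */
uint64_t fnv1a(uint64_t hash, const char *text) {
    for (; *text != '\0'; text++) {
        hash ^= (uint8_t)*text;
        hash *= 1099511628211ULL; /* 64-bit FNV prime */
    }
    return hash;
}

/* Hash of a named type: its name, then the UUID of the unit defining it. */
uint64_t type_hash(const char *name, const char *unit_uuid) {
    uint64_t hash = fnv1a(14695981039346656037ULL, name); /* offset basis */
    return fnv1a(hash, unit_uuid);
}

/* Hypothetical placeholder UUID for the unit that defines I32 and F64. */
#define BUILTIN_UUID "00000000-0000-0000-0000-000000000000"

/* Sketch of the FSPL union (| I32 F64): the hash first, then the value. */
typedef struct {
    uint64_t hash;   /* identifies the type currently stored */
    union {          /* as large as the largest allowed type (F64) */
        int32_t i32;
        double f64;
    } value;
} IntOrFloat;

IntOrFloat from_i32(int32_t x) {
    IntOrFloat u;
    u.hash = type_hash("I32", BUILTIN_UUID);
    u.value.i32 = x;
    return u;
}

IntOrFloat from_f64(double x) {
    IntOrFloat u;
    u.hash = type_hash("F64", BUILTIN_UUID);
    u.value.f64 = x;
    return u;
}

/* Stand-in for: match u | i:I32 [~F64 i] | f:F64 f
   The stored hash decides which case runs and how the value is read. */
double match_as_f64(IntOrFloat u) {
    if (u.hash == type_hash("I32", BUILTIN_UUID)) return (double)u.value.i32;
    if (u.hash == type_hash("F64", BUILTIN_UUID)) return u.value.f64;
    return 0.0; /* unreachable while the union only holds I32 or F64 */
}
```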
## Primitive types
### Int
Int is defined as a signed system word.
### UInt
UInt is defined as an unsigned system word.
### I8, I16, I32, I64
I8-I64 are defined as signed 8, 16, 32, and 64 bit integers respectively.
### U8, U16, U32, U64
U8-U64 are defined as unsigned 8, 16, 32, and 64 bit integers respectively.
### F32, F64
F32 and F64 are defined as single-precision and double-precision floating point
types respectively.
## Built-in types
### Index
Index is defined as an unsigned system word. It is used to describe the size of
chunks of memory, and to index arrays and such.
### Byte
Byte is defined as the smallest addressable integer. It is unsigned. It is
usually equivalent to U8.
### Bool
Bool is a boolean type. It is equivalent to U1. For now, since arbitrary width
integers are not supported, it is the only way to get a U1 type.
### Rune
Rune is defined as a U32. It represents a single Unicode code point, stored as
in UTF-32.
### String
String is defined as a slice of U8s. It represents a UTF-8 string. It is not
conventionally null-terminated, but a null can be added at the end manually if
desired.
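
For readers coming from C, the primitive and built-in types above map roughly onto the following typedefs. Choosing `intptr_t`/`uintptr_t` for the system-word types, `size_t` for `Index`, and a pointer-plus-length struct for `String` are assumptions made for this sketch, not definitions taken from the FSPL compiler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Primitive types (assumed C equivalents). */
typedef intptr_t  Int;   /* signed system word */
typedef uintptr_t UInt;  /* unsigned system word */
typedef int8_t    I8;
typedef int16_t   I16;
typedef int32_t   I32;
typedef int64_t   I64;
typedef uint8_t   U8;
typedef uint16_t  U16;
typedef uint32_t  U32;
typedef uint64_t  U64;
typedef float     F32;   /* single precision */
typedef double    F64;   /* double precision */

/* Built-in types (assumed C equivalents). */
typedef size_t        Index; /* sizes of memory chunks and indices */
typedef unsigned char Byte;  /* smallest addressable unit, unsigned */
typedef bool          Bool;  /* U1 semantically; C stores it in a byte */
typedef uint32_t      Rune;  /* one Unicode code point */

/* String: a slice of U8 holding UTF-8, not null-terminated. */
typedef struct {
    U8   *data;
    Index length;
} String;
```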
## Expressions
### Location expressions
Location expressions are special expressions that refer to a location in memory
(an address) rather than only to a value. Only location expressions can be
assigned to or referenced. An expression is only a location expression if its
value originates from another location expression. Such expressions are marked
here with a star (*).
### Literals
#### Integer
An integer literal specifies an integer value. It can be assigned to any type
that is derived from an integer or a float, as long as the value of the literal
can fit within the range of the type.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
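
The "must fit within the range of the type" rule for integer types amounts to a bounds check like the one below. `literal_fits` is a hypothetical helper for illustration, not part of any FSPL API. To keep every shift well-defined in C, it only handles widths from 1 to 63 bits; checking against 64-bit types would need separate handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does an integer literal's value fit in an integer
   type that is `bits` wide? Valid for 1 <= bits <= 63. */
bool literal_fits(int64_t value, unsigned bits, bool is_signed) {
    if (is_signed) {
        int64_t min = -((int64_t)1 << (bits - 1));
        int64_t max = ((int64_t)1 << (bits - 1)) - 1;
        return value >= min && value <= max;
    }
    if (value < 0) {
        return false; /* unsigned types cannot hold negative literals */
    }
    return (uint64_t)value <= ((uint64_t)1 << bits) - 1;
}
```

For example, `127` fits an `I8` but `128` does not, and `-1` never fits a `U8`.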
#### Float
A float literal specifies a floating point value. It can be assigned to any type
that is derived from a float.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
#### String
A string literal specifies a string value. Its data representation depends on
which of the following the base type of its destination is structurally
equivalent to:
- Integer: A single Unicode code point. When assigning to an integer, the
string literal may not be longer than one code point, and that code point
must fit in the integer.
- Slice of 8 bit integers: UTF-8 string.
- Slice of 16 bit integers: UTF-16 string.
- Slice of 32 bit (or larger) integers: UTF-32 string.
- Array of integers: The same as slices of integers, but the string literal
must fit inside of the array.
- Pointer to 8 bit integer: Null-terminated UTF-8 string (AKA C-string).
A string literal cannot be directly assigned to an interface because it
contains no inherent type information. A value cast may be used for this
purpose.
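
C11's `u8`, `u`, and `U` string prefixes produce the same encodings listed above, which makes it easy to see how a single literal such as `'hi€'` might be laid out for each destination. One difference: C always appends a terminating null, so a slice built from these arrays would use a length one less than the array size. The variable names are hypothetical.

```c
#include <stdint.h>

/* The literal 'hi€' in each representation. \u20AC is U+20AC (euro sign). */

/* Slice of 8-bit integers: UTF-8. The euro sign takes three bytes. */
const unsigned char as_utf8[] = u8"hi\u20AC";

/* Slice of 16-bit integers: UTF-16. U+20AC fits in one code unit. */
const uint_least16_t as_utf16[] = u"hi\u20AC";

/* Slice of 32-bit integers: UTF-32, one element per code point. */
const uint_least32_t as_utf32[] = U"hi\u20AC";

/* Integer: a single code point. */
const uint32_t as_rune = U'\u20AC';

/* Pointer to an 8-bit integer: a null-terminated UTF-8 C string. */
const char *as_cstring = "hi\xE2\x82\xAC";
```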
#### Array
Array is a composite array literal. It can contain any number of values. It can
be assigned to any array type that:
1. has an identical length, and
2. whose element type can be assigned to by all the element values in the
literal.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
#### Struct
Struct is a composite structure literal. It can contain any number of name:value
pairs. It can be assigned to any struct type that:
1. has at least the members specified in the literal, and
2. whose member types can be assigned to by the corresponding member values in
the literal.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
#### Boolean
Boolean is a boolean literal. It may be either true or false. It can be assigned
to any type derived from a boolean. It cannot be directly assigned to an
interface because it contains no inherent type information. A value cast may be
used for this purpose.
### Variable *
Variable specifies a named variable. It can be assigned to a type matching the
variable declaration's type. Since it contains inherent type information, it may
be directly assigned to an interface.
### Declaration *
Declaration binds a local identifier to a typed variable, but also acts as a
variable expression allowing the variable to be used the moment it is defined.
Since it contains inherent type information, it may be directly assigned to an
interface.
### Block
Block is an ordered collection of expressions that are evaluated sequentially.
It has its own scope. The last expression in the block specifies the block's
value, and any assignment rules of the block are equivalent to those of its last
expression.
### Call
Call calls upon the function specified by the first argument, and passes the
rest of the arguments to the function. The first argument must be an
identifier, usually the name of a function. The result of a call may be assigned
to any type matching the function's return type. Since it contains inherent type
information, it may be directly assigned to an interface.
### Method call
Method call calls upon the method (of the expression before the dot) that is
specified by the first argument, passing the rest of the arguments to the
method. The first argument must be a method name. The result of a call may be
assigned to any type matching the method's return type. Since it contains
inherent type information, it may be directly assigned to an interface.
### Member access *
Member access allows referring to a specific member of a value with a struct
type. It accepts any struct type that contains the specified member name, and
may be assigned to any type that matches the type of the selected member. Since
it contains inherent type information, it may be directly assigned to an
interface.
### Array subscript *
Array subscripting allows referring to a specific element of an array. It
accepts any array, and any offset of type Index. It may be assigned to any type
matching the array's element type. Since it contains inherent type information,
it may be directly assigned to an interface.
### Slice
Slice adjusts the start and end points of a slice relative to its current
starting index, and returns an adjusted copy pointing to the same data. Any
assignment rules of this expression are equivalent to those of the slice it is
operating on.
### Length
Length returns the length of an array or a slice. It always returns a value
of type Index.
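
The slice and length expressions can be sketched in C over a pointer-plus-length slice. The spec does not say whether the end bound is exclusive or whether bounds are checked; this sketch assumes an exclusive end and performs no checks. The grammar also lets either bound be omitted, presumably defaulting to the start and the current length, which is not modeled here. `reslice` and `length_of` are hypothetical names for `[\ s start / end]` and `[# s]`.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed slice layout: pointer to the first element plus a length. */
typedef struct {
    int32_t *data;
    size_t length;
} SliceI32;

/* Stand-in for [\ s start / end]: a copy of `s` narrowed to
   [start, end), with both bounds relative to the slice's current start.
   The result points at the same data; nothing is copied. */
SliceI32 reslice(SliceI32 s, size_t start, size_t end) {
    SliceI32 result = { s.data + start, end - start };
    return result;
}

/* Stand-in for [# s]: always an Index (size_t here). */
size_t length_of(SliceI32 s) {
    return s.length;
}
```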
### Pointer dereference *
Pointer dereferencing allows retrieving the value of a pointer. It accepts any
pointer. It may be assigned to any type matching the pointer's pointed type.
Since it contains inherent type information, it may be directly assigned to an
interface.
### Value reference
Value referencing allows retrieving the location of a value in memory. It
accepts any location expression, and can be assigned to any type that is a
pointer to the location expression's type. Since it contains inherent type
information, it can be directly assigned to an interface, although it doesn't
make a whole lot of sense to do so because assigning a value to an interface
automatically references it anyway.
### Bit casting
Bit casting takes the raw data in memory of a certain value and re-interprets it
as a value of another type. Since it contains inherent type information, it may
be directly assigned to an interface.
### Value casting
Value casting converts a value of a certain type to another type. Since it
contains inherent type information, it may be directly assigned to an interface.
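
The difference between the two casts is easiest to see on a floating point value. In FSPL syntax these would be roughly `[~~U32 x]` and `[~I32 x]`; the C sketch below shows the equivalent operations. Whether FSPL's value cast truncates toward zero, as C's conversion does, is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Bit cast: reinterpret the raw bytes of an F32 as a U32.
   memcpy is the well-defined way to do this in C. */
uint32_t bitcast_f32_to_u32(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Value cast: convert an F32 to an I32, keeping the numeric meaning.
   C truncates toward zero. */
int32_t valuecast_f32_to_i32(float value) {
    return (int32_t)value;
}
```

A bit cast of `1.0` yields the IEEE 754 encoding `0x3F800000`, while a value cast of `1.5` yields `1`.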
### Operations
Operations perform math, logic, or bit manipulation on values. They accept
values of the same type as the type they are being assigned to, except in
special cases such as bit shifts and comparisons, which are described below.
Since they contain no inherent type information, they may not be assigned to
interfaces.
#### Math
Mathematical operations perform math on numeric values.
- `+` Returns the sum of all arguments
- `++` Returns the sum of all arguments, plus 1
- `-` Returns all arguments after the first subtracted from the first
- `--` Returns all arguments after the first subtracted from the first, minus 1
- `*` Returns the product of all arguments
- `/` Returns A0 / A1 / ... / An
- `%` Returns the remainder of the first argument divided by the second.
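
The n-ary math operators fold left to right over their arguments. The C sketch below gives hypothetical lowerings over `I64` values and assumes at least one argument. With a single argument, `++` and `--` reduce to increment and decrement: `[++ x]` is `x + 1`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical n-ary forms of the math operators. Each expects n >= 1. */

int64_t op_add(const int64_t *args, size_t n) {      /* +  */
    int64_t result = 0;
    for (size_t i = 0; i < n; i++) result += args[i];
    return result;
}

int64_t op_add_inc(const int64_t *args, size_t n) {  /* ++ */
    return op_add(args, n) + 1;
}

int64_t op_sub(const int64_t *args, size_t n) {      /* -  */
    int64_t result = args[0];
    for (size_t i = 1; i < n; i++) result -= args[i];
    return result;
}

int64_t op_sub_dec(const int64_t *args, size_t n) {  /* -- */
    return op_sub(args, n) - 1;
}

int64_t op_mul(const int64_t *args, size_t n) {      /* *  */
    int64_t result = 1;
    for (size_t i = 0; i < n; i++) result *= args[i];
    return result;
}

int64_t op_div(const int64_t *args, size_t n) {      /* /  */
    int64_t result = args[0];
    for (size_t i = 1; i < n; i++) result /= args[i];
    return result;
}

int64_t op_mod(int64_t a, int64_t b) {               /* %  */
    return a % b;
}
```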
#### Logic
Logic operations perform logic on booleans.
- `!` Returns the logical negation of the argument
- `|` Returns the logical OR of all arguments
- `&` Returns the logical AND of all arguments
- `^` Returns the logical XOR of all arguments, right to left
#### Bit manipulation
Bit manipulation allows for manipulating values at the binary level. These work
on all types except reference types.
- `!!` Returns the bitwise negation of the argument
- `||` Returns the bitwise OR of all arguments
- `&&` Returns the bitwise AND of all arguments
- `^^` Returns the bitwise XOR of all arguments, right to left
- `<<` Returns the first argument bit-shifted to the left by the second
argument. The second argument must be an integer.
- `>>` Returns the first argument bit-shifted to the right by the second
argument. The second argument must be an integer.
#### Comparison
Comparison operations compare two values and return a boolean.
- `<` Returns whether all operands are in ascending order from left to right.
- `>` Returns whether all operands are in descending order from left to right.
- `<=` Returns whether all operands are in ascending order from left to right,
allowing equal operands.
- `>=` Returns whether all operands are in descending order from left to right,
allowing equal operands.
- `=` Returns whether all operands are equal to each other.
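
Because each comparison checks an ordering across all of its operands, it reduces to checking every adjacent pair. The C sketch below shows hypothetical lowerings of `<`, `<=`, and `=` over `I64` values; `>` and `>=` are mirror images.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* [< a b c ...]: each operand is strictly less than the next. */
bool cmp_lt(const int64_t *args, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (!(args[i - 1] < args[i])) return false;
    return true;
}

/* [<= a b c ...]: each operand is less than or equal to the next. */
bool cmp_le(const int64_t *args, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (!(args[i - 1] <= args[i])) return false;
    return true;
}

/* [= a b c ...]: equality is transitive, so checking neighbours suffices. */
bool cmp_eq(const int64_t *args, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (args[i - 1] != args[i]) return false;
    return true;
}
```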
Comparison operations are the only constructs in FSPL which are allowed to infer
their argument types. The rules for this are as follows:
- If at least one argument has type information, that type is used for all
arguments that do not.
- Else, fail. Optionally call the user an idiot if this is because they directly
compared two literals.
### If/else
If/else is a control flow branching expression that executes one of two
expressions depending on a boolean value. If the value of the if/else is unused,
the else expression need not be specified. It may be assigned to any type that
satisfies the assignment rules of both the true and false expressions.
### Match
Match is a control flow branching expression that executes one of several case
expressions depending on the input. It can be used to check the type of a union
value. Each case takes the form of a declaration, and an associated expression.
If the type of the union matches the type of the declaration in the case, the
expression is executed and the value of the union is made available to it
through the declaration. If the value of the match expression is used, all
possible types in the union must be accounted for, or it must have a default
case. It may be assigned to any type that satisfies the assignment rules of its
first case.
### Switch
Switch is a control flow branching expression that executes one of several case
expressions depending on the value of the input. It accepts any pointer or
integer type. If the value of the switch expression is used, a default case must
be present. It may be assigned to any type that satisfies the assignment rules
of its first case.
### Loop
Loop is a control flow expression that repeats an expression until a break
statement is executed within it. The break statement must be given a value if
the value of the loop is used; otherwise, the loop need not have a break
statement at all. The result of the loop may be assigned to any type that
satisfies the assignment rules of all of its break statements. Loops may be
nested, and break statements only apply to the closest containing loop. The
value of the loop's expression is never used.
### For
For is a special kind of loop that evaluates an expression for each element of
an array or slice. It accepts an index declaration and an element declaration,
which are scoped to the loop's body and are set to the index of the current
element and the element itself respectively at the beginning of each iteration.
The assignment rules of a for statement are identical to those of a normal loop.
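
C has no loop expressions, but a value-producing `loop` and an index/element `for` can be approximated as below. `find_first_negative` and `weighted_sum` are hypothetical examples. The comments give only a rough FSPL equivalent, and returning `-1` for "not found" is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Roughly a loop whose breaks carry values: each [break v] becomes
   "result = v; break;", and `result` is the value of the loop. */
ptrdiff_t find_first_negative(const int32_t *data, size_t length) {
    ptrdiff_t result;
    size_t i = 0;
    for (;;) {
        if (i == length) { result = -1; break; }
        if (data[i] < 0) { result = (ptrdiff_t)i; break; }
        i++;
    }
    return result;
}

/* Roughly: for index:Index element:I32 in values <body>
   Both declarations are set at the start of each iteration and are
   only visible inside the body. */
int64_t weighted_sum(const int32_t *data, size_t length) {
    int64_t total = 0;
    for (size_t index = 0; index < length; index++) {
        int32_t element = data[index];
        total += (int64_t)index * element;
    }
    return total;
}
```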
### Break
Break allows breaking out of loops. It may be assigned to anything, but the
assignment will have no effect as execution of the code after and surrounding it
will cease.
### Return
Return allows terminating functions before they have reached their end. It
accepts values that may be assigned to the function's return type. If a function
does not return anything, the return statement does not accept a value. It may
be assigned to anything, but the assignment will have no effect as execution of
the function or method will cease.
### Assignment
Assignment allows assigning the result of one expression to one or more location
expressions. The assignment expression itself has no value and may not be
assigned to anything.
# Syntax entities
Below is a rough syntax description of the language. Note that `<assignment>`
is right-associative, and `<memberAccess>` and `<methodCall>` are
left-associative. I invite you to torture yourself by attempting to implement
this without hand-writing a parser.
```
<file> -> (<typedef> | <function> | <method>)*
<access> -> "+" | "#" | "-"
<typedef> -> [<access>] <typeIdentifier> ":" <type>
<function> -> [<access>] <signature> ["=" <expression>]
<method> -> [<access>] <typeIdentifier> "." <function>
<type> -> <namedType>
| <pointerType>
| <sliceType>
| <arrayType>
| <structType>
| <interfaceType>
| <unionType>
<namedType> -> <typeIdentifier>
<pointerType> -> "*" <type>
<sliceType> -> "*" ":" <type>
<arrayType> -> <intLiteral> ":" <type>
<structType> -> "(" "." <declaration>* ")"
<interfaceType> -> "(" "&" <signature>* ")"
<unionType> -> "(" "|" <type>* ")"
<expression> -> <intLiteral>
| <floatLiteral>
| <stringLiteral>
| <arrayLiteral>
| <structLiteral>
| <booleanLiteral>
| <variable>
| <declaration>
| <call>
| <subscript>
| <slice>
| <length>
| <dereference>
| <reference>
| <valueCast>
| <bitCast>
| <operation>
| <block>
| <memberAccess>
| <methodCall>
| <ifelse>
| <match>
| <switch>
| <loop>
| <for>
| <break>
| <return>
| <assignment>
<variable> -> <identifier>
<declaration> -> <identifier> ":" <type>
<call> -> "[" <expression>+ "]"
<subscript> -> "[" "." <expression> <expression> "]"
<slice> -> "[" "\" <expression> <expression>? "/" <expression>? "]"
<length> -> "[" "#" <expression> "]"
<dereference> -> "[" "." <expression> "]"
<reference> -> "[" "@" <expression> "]"
<valueCast> -> "[" "~" <type> <expression> "]"
<bitCast> -> "[" "~~" <type> <expression> "]"
<operation> -> "[" <operator> <expression>* "]"
<block> -> "{" <expression>* "}"
<memberAccess> -> <expression> "." <identifier>
<methodCall> -> <expression> "." <call>
<ifelse> -> "if" <expression>
"then" <expression>
["else" <expression>]
<match> -> "match" <expression> <matchCase>* [<defaultCase>]
<switch> -> "switch" <expression> <switchCase>* [<defaultCase>]
<loop> -> "loop" <expression>
<for> -> "for" <expression> [<expression>] "in" <expression> <expression>
<break> -> "[" "break" [<expression>] "]"
<return> -> "[" "return" [<expression>] "]"
<assignment> -> <expression> "=" <expression>
<intLiteral> -> /-?[1-9][0-9]*/
| /-?0[0-7]*/
| /-?0x[0-9a-fA-F]+/
| /-?0b[0-1]+/
<floatLiteral> -> /-?[0-9]*\.[0-9]+/
<stringLiteral> -> /'.*'/
<arrayLiteral> -> "(*" <expression>* ")"
<structLiteral> -> "(." <member>* ")"
<booleanLiteral> -> "true" | "false"
<member> -> <identifier> ":" <expression>
<matchCase> -> "|" <declaration> <expression>
<switchCase> -> "|" <expression> <expression>
<defaultCase> -> "*" <expression>
<signature> -> "[" <identifier> <declaration>* "]" [":" <type>]
<identifier> -> /[a-z][A-Za-z]*/
<typeIdentifier> -> /[A-Z][A-Za-z]*/
<operator> -> "+" | "++" | "-" | "--" | "*" | "/" | "%"
| "!!" | "||" | "&&" | "^^"
| "!" | "|" | "&" | "^" | "<<" | ">>"
| "<" | ">" | "<=" | ">=" | "="
```