# Semantic entities

## Top level

Top level entities are defined directly within the source file(s) of a unit, and
can be made available to other units via access control modes. The three modes
are:

- Public access: Allows other units to access an entity normally.
- Opaque access: Causes a top-level entity to appear opaque to other units.
  Values of opaque types can be passed around, assigned to each other, and their
  methods can be called, but the implementation of the type is entirely hidden.
  This access mode cannot be applied to functions or methods.
- Private access: Disallows other units from accessing a top-level entity.
  This mode is the default when one isn't supplied.

### Type definition

Type definitions bind a type to a global type identifier.

### Function

Functions bind a global identifier and argument list to an expression, which is
evaluated each time the function is called. If no expression is specified, the
function is marked as external. Each argument in the argument list is passed as
a separate variable, and a function returns at most one value. The arguments and
the return value are all typed.

### Method

A method is like a function, except localized to a defined type. Methods are
called on an instance of that type, and receive a pointer to that instance via
the `this` variable. Method names are not globally unique, but are unique within
the type they are defined on.

## Types

### Named

Named refers to a user-defined, primitive, or built-in named type.

### Pointer

Pointer is a pointer to a value of another type.

### Array

Array is a group of values of a given type stored next to each other. The length
of an array is fixed and is part of its type. Arrays are passed by value unless
a pointer is used.

### Slice

Slice is a pointer to several values of a given type stored next to each other.
Its length is not built into its type and can be changed at runtime.

### Struct

Struct is a composite type that stores keyed values. The positions of the values
within the struct are decided at compile time, based on the order they are
specified in. Structs are passed by value unless a pointer is used.

### Interface

Interface is a polymorphic pointer that accepts a value of any type, as long as
that type has at least the methods defined within the interface.

Interfaces are always passed by reference. When a value is assigned to an
interface, it is referenced automatically. When a pointer is assigned to an
interface, the pointer itself is used instead of a reference to it.

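The method-set rule is a structural check, which can be modeled in a few lines. The Python sketch below is illustrative only. It represents a signature as a tuple of argument types plus a return type, and it assumes an interface method is satisfied only by a method with the same name and an identical signature. The `satisfies` helper and the example types are hypothetical.

```python
from typing import Dict, Tuple

# (argument types, return type): a hypothetical representation of a signature
Signature = Tuple[Tuple[str, ...], str]

def satisfies(type_methods: Dict[str, Signature], interface: Dict[str, Signature]) -> bool:
    """True if the type has at least every method the interface defines.

    Extra methods on the type are allowed. Assumption: a method matches only
    if its name and full signature are identical.
    """
    return all(type_methods.get(name) == sig for name, sig in interface.items())

shape = {"area": ((), "F64")}
square = {"area": ((), "F64"), "scale": (("F64",), "Square")}
label = {"text": ((), "String")}

print(satisfies(square, shape))  # True: the extra `scale` method is ignored
print(satisfies(label, shape))   # False: `area` is missing
```
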
### Union

Union is a polymorphic type that can hold any value, as long as it is one of a
list of allowed types. It is not a pointer. It holds the hash of the actual type
of the value stored within it, followed by the value. For a named type, the hash
field is computed using the type's name and the UUID of the unit it was defined
in. If the type is not named, the hash is computed using the structure of the
type. The value field is always big enough to hold the largest type in the
allowed list. The value of a union can only be extracted using a match
expression.

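The following Python sketch computes the two fields of a union. The document does not specify the hash function or the width of the hash field. The sketch therefore assumes a 64-bit field holding the first 8 bytes of a SHA-256 digest over the defining unit's UUID followed by the type name. `type_hash`, `union_layout`, and the example UUID are hypothetical, and alignment padding is ignored.

```python
import hashlib
import uuid
from typing import Dict, Tuple

HASH_FIELD_SIZE = 8  # assumption: the type hash is stored in 64 bits

def type_hash(name: str, unit: uuid.UUID) -> int:
    """Hypothetical hash for a named type: SHA-256 over the unit UUID and the name."""
    digest = hashlib.sha256(unit.bytes + name.encode("utf-8")).digest()
    return int.from_bytes(digest[:HASH_FIELD_SIZE], "little")

def union_layout(allowed: Dict[str, int]) -> Tuple[int, int]:
    """Return (hash field size, value field size) in bytes.

    `allowed` maps each allowed type name to its size in bytes; the value
    field must be large enough for the largest of them.
    """
    return HASH_FIELD_SIZE, max(allowed.values())

unit = uuid.UUID("12345678-1234-5678-1234-567812345678")  # hypothetical unit UUID
print(union_layout({"I8": 1, "U32": 4, "F64": 8}))  # (8, 8)
print(type_hash("Point", unit) == type_hash("Point", unit))  # True: deterministic
```

Because the UUID is part of the hashed input, two types that share a name but come from different units get different hashes.
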
## Primitive types

### Int

Int is defined as a signed system word.

### UInt

UInt is defined as an unsigned system word.

### I8, I16, I32, I64

I8-I64 are defined as signed 8, 16, 32, and 64 bit integers respectively.

### U8, U16, U32, U64

U8-U64 are defined as unsigned 8, 16, 32, and 64 bit integers respectively.

### F32, F64

F32 and F64 are defined as single-precision and double-precision floating point
types respectively.

## Built-in types

### Index

Index is defined as an unsigned system word. It is used to describe the size of
chunks of memory, and to index arrays and such.

### Byte

Byte is defined as the smallest addressable integer. It is unsigned. It is
usually equivalent to U8.

### Bool

Bool is a boolean type. It is equivalent to U1. For now, since arbitrary width
integers are not supported, it is the only way to get a U1 type.

### Rune

Rune is defined as a U32. It represents a single Unicode code point, encoded as
UTF-32.

### String

String is defined as a slice of U8s. It represents a UTF-8 string. It is not
conventionally null-terminated, but a null can be added at the end manually if
desired.

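A String is a slice of U8s, so its length counts UTF-8 bytes rather than code points. A Rune, by contrast, holds one whole code point. The short Python illustration below uses Python's `str` to stand in for decoded text; nothing in it is FSPL API.

```python
text = "h\u00e9llo"  # "héllo"

utf8 = text.encode("utf-8")       # what a String's backing U8s would contain
runes = [ord(ch) for ch in text]  # one Rune (U32) per code point

print(len(utf8))      # 6: "é" takes two bytes in UTF-8
print(len(runes))     # 5
print(hex(runes[1]))  # 0xe9

# A String is not null-terminated by convention; a terminator is added manually.
c_compatible = utf8 + b"\x00"
print(len(c_compatible))  # 7
```
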
## Expressions

### Location expressions

Location expressions are special expressions that refer only to a location in
memory. An expression is only a location expression if its value originates
from another location expression. Such expressions are marked here with a
star (*).

### Literals

#### Integer

An integer literal specifies an integer value. It can be assigned to any type
that is derived from an integer or a float, as long as the value of the literal
can fit within the range of the type.

It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.

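The range rule can be sketched in Python for the primitive numeric types. The sketch makes three assumptions. Signed integers use two's complement. For F32 and F64 only the range is checked, and rounding to the nearest representable float is ignored. Derived types are assumed to have been resolved to their base type first. `literal_fits` is a hypothetical helper.

```python
INTEGER_TYPES = {
    "I8": (8, True), "I16": (16, True), "I32": (32, True), "I64": (64, True),
    "U8": (8, False), "U16": (16, False), "U32": (32, False), "U64": (64, False),
}
FLOAT_MAX = {
    "F32": (2**24 - 1) * 2**104,  # largest finite single-precision value
    "F64": (2**53 - 1) * 2**971,  # largest finite double-precision value
}

def literal_fits(value: int, type_name: str) -> bool:
    """Can an integer literal with this value be assigned to the given type?"""
    if type_name in INTEGER_TYPES:
        bits, signed = INTEGER_TYPES[type_name]
        lo = -(1 << (bits - 1)) if signed else 0
        hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
        return lo <= value <= hi
    if type_name in FLOAT_MAX:
        # Range only: precision loss for large values is not modeled.
        return abs(value) <= FLOAT_MAX[type_name]
    raise ValueError(f"not a numeric primitive type: {type_name}")

print(literal_fits(255, "U8"))      # True
print(literal_fits(256, "U8"))      # False
print(literal_fits(-1, "U32"))      # False
print(literal_fits(2**100, "F32"))  # True
```
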
#### Float

A float literal specifies a floating point value. It can be assigned to any type
that is derived from a float.

It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.

#### String

A string literal specifies a string value. Its data representation depends on
which of the following the base type of its destination is structurally
equivalent to:

- Integer: A single Unicode code point. When assigning to an integer, the
  string literal may not be longer than one code point, and that code point
  must fit in the integer.
- Slice of 8 bit integers: UTF-8 string.
- Slice of 16 bit integers: UTF-16 string.
- Slice of 32 bit (or larger) integers: UTF-32 string.
- Array of integers: The same as slices of integers, but the string literal
  must fit inside of the array.
- Pointer to 8 bit integer: Null-terminated UTF-8 string (AKA C-string).

A string literal cannot be directly assigned to an interface because it
contains no inherent type information. A value cast may be used for this
purpose.

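The following Python sketch shows the code units a string literal would produce for each destination shape. The `target` labels and both helper names are hypothetical. The "must fit in the integer" range check for the integer case is not repeated here. `fits_array` assumes a literal shorter than the array is allowed, which the document does not state.

```python
from typing import List

def string_literal_units(text: str, target: str) -> List[int]:
    """Return the integer values a string literal is stored as.

    `target` is a hypothetical label for the destination's structural shape:
    "int", "slice8", "slice16", "slice32", or "cstring".
    """
    if target == "int":
        if len(text) != 1:  # len() counts code points in Python
            raise ValueError("an integer can hold only one code point")
        return [ord(text)]
    if target == "slice8":
        return list(text.encode("utf-8"))
    if target == "slice16":
        raw = text.encode("utf-16-le")  # no byte order mark
        return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    if target == "slice32":
        return [ord(ch) for ch in text]
    if target == "cstring":
        return list(text.encode("utf-8")) + [0]  # null terminator
    raise ValueError(f"unsupported target: {target}")

def fits_array(units: List[int], length: int) -> bool:
    """An array destination must have room for every code unit."""
    return len(units) <= length

print(string_literal_units("A", "int"))                # [65]
print(string_literal_units("\u00e9", "slice8"))        # [195, 169]
print(string_literal_units("\U0001F600", "slice16"))   # [55357, 56832]
print(string_literal_units("\U0001F600", "slice32"))   # [128512]
print(string_literal_units("hi", "cstring"))           # [104, 105, 0]
```

The UTF-16 case shows why the destination matters: a code point outside the Basic Multilingual Plane becomes a surrogate pair of two 16-bit units.
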
#### Array

Array is a composite array literal. It can contain any number of values. It can
be assigned to any array type that:

1. has the same length as the literal, and
2. has an element type that can be assigned to by all the element values in the
   literal.

It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.

#### Struct

Struct is a composite structure literal. It can contain any number of name:value
pairs. It can be assigned to any struct type that:

1. has at least the members specified in the literal, and
2. has member types that can be assigned to by the corresponding member values
   in the literal.

It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.

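The array and struct literal rules above reduce to simple checks. In this Python sketch, literal values are plain integers, and per-type assignability is passed in as a predicate. The helper names are hypothetical. The document does not say what happens to struct members a literal omits, so the sketch checks only the rules as stated.

```python
from typing import Callable, Dict, List

Check = Callable[[int], bool]  # hypothetical: can this value be assigned?

def array_literal_assignable(elements: List[int], length: int, element_ok: Check) -> bool:
    # Rule 1: lengths must be identical. Rule 2: every element must be assignable.
    return len(elements) == length and all(element_ok(e) for e in elements)

def struct_literal_assignable(literal: Dict[str, int], members: Dict[str, Check]) -> bool:
    # Rule 1: the struct has at least the literal's members (it may have more).
    # Rule 2: each literal value is assignable to the corresponding member.
    return all(name in members and members[name](value) for name, value in literal.items())

def is_u8(v: int) -> bool:
    return 0 <= v <= 0xFF

def is_i32(v: int) -> bool:
    return -2**31 <= v < 2**31

print(array_literal_assignable([1, 2, 3], 3, is_u8))    # True
print(array_literal_assignable([1, 2], 3, is_u8))       # False: length differs
print(array_literal_assignable([1, 2, 300], 3, is_u8))  # False: 300 is not a U8

point = {"x": is_i32, "y": is_i32, "z": is_i32}
print(struct_literal_assignable({"x": 1, "y": 2}, point))  # True: `z` may be omitted
print(struct_literal_assignable({"w": 1}, point))          # False: no member `w`
```
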
#### Boolean

Boolean is a boolean literal. It may be either true or false. It can be assigned
to any type derived from a boolean. It cannot be directly assigned to an
interface because it contains no inherent type information. A value cast may be
used for this purpose.

### Variable *

Variable specifies a named variable. It can be assigned to a type matching the
variable declaration's type. Since it contains inherent type information, it may
be directly assigned to an interface.

### Declaration *

Declaration binds a local identifier to a typed variable, but also acts as a
variable expression allowing the variable to be used the moment it is defined.
Since it contains inherent type information, it may be directly assigned to an
interface.

### Block

Block is an ordered collection of expressions that are evaluated sequentially.
It has its own scope. The last expression in the block specifies the block's
value, and any assignment rules of the block are equivalent to those of its last
expression.

### Call

Call calls upon the function specified by the first argument, and passes the
rest of the arguments to the function. The first argument must be an identifier
naming a function. The result of a call may be assigned to any type matching the
function's return type. Since it contains inherent type information, it may be
directly assigned to an interface.

### Method call

Method call calls upon the method (of the expression before the dot) that is
specified by the first argument, passing the rest of the arguments to the
method. The first argument must be a method name. The result of a call may be
assigned to any type matching the method's return type. Since it contains
inherent type information, it may be directly assigned to an interface.

### Member access *

Member access allows referring to a specific member of a value with a struct
type. It accepts any struct type that contains the specified member name, and
may be assigned to any type that matches the type of the selected member. Since
it contains inherent type information, it may be directly assigned to an
interface.

### Array subscript *

Array subscripting allows referring to a specific element of an array. It
accepts any array, and any offset of type Index. It may be assigned to any type
matching the array's element type. Since it contains inherent type information,
it may be directly assigned to an interface.

### Slice

Slice adjusts the start and end points of a slice relative to its current
starting index, and returns an adjusted copy pointing to the same data. Any
assignment rules of this expression are equivalent to those of the slice it is
operating on.

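The following Python sketch models a slice as a shared backing list plus a start offset and a length. `Slice` and `reslice` are hypothetical names. The syntax allows either bound to be omitted, so the sketch assumes an omitted start means 0 and an omitted end means the current length. It also treats out-of-range bounds as an error, which the document does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Slice:
    data: List[int]  # shared backing storage; stands in for the slice's pointer
    start: int       # index of the slice's first element within `data`
    length: int

    def elements(self) -> List[int]:
        return self.data[self.start:self.start + self.length]

def reslice(s: Slice, begin: Optional[int] = None, end: Optional[int] = None) -> Slice:
    """Return an adjusted copy of `s`; bounds are relative to s's current start."""
    lo = 0 if begin is None else begin
    hi = s.length if end is None else end
    if not 0 <= lo <= hi <= s.length:
        raise IndexError("slice bounds out of range")
    return Slice(s.data, s.start + lo, hi - lo)  # same data, new bounds

whole = Slice([10, 20, 30, 40, 50], 0, 5)
middle = reslice(whole, 1, 4)
tail = reslice(middle, 1)  # relative to `middle`, not to `whole`
print(middle.elements())   # [20, 30, 40]
print(tail.elements())     # [30, 40]
whole.data[2] = 99         # all three slices share the same backing data
print(tail.elements())     # [99, 40]
```
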
### Length

Length returns the length of an array or a slice. It always returns a value
of type Index.

### Pointer dereference *

Pointer dereferencing allows retrieving the value of a pointer. It accepts any
pointer. It may be assigned to any type matching the pointer's pointed type.
Since it contains inherent type information, it may be directly assigned to an
interface.

### Value reference

Value referencing allows retrieving the location of a value in memory. It
accepts any location expression, and can be assigned to any type that is a
pointer to the location expression's type. Since it contains inherent type
information, it can be directly assigned to an interface, although it doesn't
make a whole lot of sense to do so because assigning a value to an interface
automatically references it anyway.

### Bit casting

Bit casting takes the raw data in memory of a certain value and re-interprets it
as a value of another type. Since it contains inherent type information, it may
be directly assigned to an interface.

### Value casting

Value casting converts a value of a certain type to another type. Since it
contains inherent type information, it may be directly assigned to an interface.

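The difference between bit casting and value casting is easiest to see on a float. This Python sketch uses the standard `struct` module to reinterpret bits. The helper names are hypothetical. Truncation toward zero in the value cast is an assumption, since the document does not specify how fractions are rounded, and out-of-range values are not modeled.

```python
import struct

def bit_cast_f32_to_u32(value: float) -> int:
    """Reinterpret the 4 bytes of an IEEE 754 single-precision float as a U32."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def value_cast_f32_to_u32(value: float) -> int:
    """Convert the numeric value. Assumption: the fraction is truncated toward zero."""
    return int(value)

print(hex(bit_cast_f32_to_u32(1.0)))  # 0x3f800000
print(value_cast_f32_to_u32(1.0))     # 1
print(value_cast_f32_to_u32(3.75))    # 3
```
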
### Operations

Operations perform math, logic, or bit manipulation on values. They accept
values of the same type as the type they are being assigned to, except in
special cases such as the shift amount of `<<` and `>>` and the operands of
comparisons. Since they contain no inherent type information, they may not be
assigned to interfaces.

#### Math

Mathematical operations perform math on numeric values.

- `+` Returns the sum of all arguments
- `++` Returns the sum of all arguments, plus 1
- `-` Returns all arguments after the first subtracted from the first
- `--` Returns all arguments after the first subtracted from the first, minus 1
- `*` Returns the product of all arguments
- `/` Returns A0 / A1 / ... / An
- `%` Returns the remainder of the first argument divided by the second.

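These variadic operators are folds over their arguments, which this Python sketch spells out. The function names are hypothetical. The sketch uses Python's floor division, which agrees with truncating division only when the operands are non-negative. The document does not say how negative quotients or remainders round.

```python
from functools import reduce
import operator

def add(*args: int) -> int:                        # `+`
    return sum(args)

def add_plus_one(*args: int) -> int:               # `++`
    return sum(args) + 1

def sub(first: int, *rest: int) -> int:            # `-`
    return first - sum(rest)

def sub_minus_one(first: int, *rest: int) -> int:  # `--`
    return first - sum(rest) - 1

def mul(*args: int) -> int:                        # `*`
    return reduce(operator.mul, args)

def div(*args: int) -> int:                        # `/`: A0 / A1 / ... / An, left to right
    return reduce(operator.floordiv, args)

def rem(a: int, b: int) -> int:                    # `%`
    return a % b

print(add_plus_one(41))   # 42: `[++ x]` computes x + 1
print(sub_minus_one(10))  # 9: `[-- x]` computes x - 1
print(sub(10, 3, 2))      # 5
print(div(100, 5, 2))     # 10
print(rem(17, 5))         # 2
```
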
#### Logic

Logic operations perform logic on booleans.

- `!` Returns the logical negation of the argument
- `|` Returns the logical OR of all arguments
- `&` Returns the logical AND of all arguments
- `^` Returns the logical XOR of all arguments, right to left

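The right-to-left XOR fold is the only one of these with an explicit evaluation order, so this Python sketch writes it out. XOR is associative, so the direction does not change the result: with any number of arguments, it is true exactly when an odd number of them are true. The function names are hypothetical. The sketch receives already-evaluated arguments, so it says nothing about whether FSPL short-circuits `|` or `&`.

```python
from functools import reduce

def logical_not(a: bool) -> bool:      # `!`
    return not a

def logical_or(*args: bool) -> bool:   # `|`
    return any(args)

def logical_and(*args: bool) -> bool:  # `&`
    return all(args)

def logical_xor(*args: bool) -> bool:  # `^`, folded right to left
    # a0 ^ (a1 ^ (... ^ an)); for booleans, `!=` is XOR
    return reduce(lambda acc, a: a != acc, reversed(args[:-1]), args[-1])

print(logical_xor(True, True, True))   # True: three (an odd number) are true
print(logical_xor(True, False, True))  # False
```
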
#### Bit manipulation

Bit manipulation allows for manipulating values at the binary level. These work
on all types except reference types.

- `!!` Returns the bitwise negation of the argument
- `||` Returns the bitwise OR of all arguments
- `&&` Returns the bitwise AND of all arguments
- `^^` Returns the bitwise XOR of all arguments, right to left
- `<<` Returns the first argument bit-shifted to the left by the second
  argument. The second argument must be an integer.
- `>>` Returns the first argument bit-shifted to the right by the second
  argument. The second argument must be an integer.

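This Python sketch models the bit manipulation operators on an unsigned 8-bit value. The fixed width is an assumption of the sketch: Python integers are unbounded, so results are masked. Right shifts of signed values, which could be arithmetic, are not modeled. The function names are hypothetical.

```python
from functools import reduce

BITS = 8
MASK = (1 << BITS) - 1  # model U8 values; other unsigned widths only change BITS

def bit_not(a: int) -> int:              # `!!`
    return ~a & MASK

def bit_or(*args: int) -> int:           # `||`
    return reduce(lambda x, y: x | y, args)

def bit_and(*args: int) -> int:          # `&&`
    return reduce(lambda x, y: x & y, args)

def bit_xor(*args: int) -> int:          # `^^`, folded right to left
    return reduce(lambda acc, a: a ^ acc, reversed(args[:-1]), args[-1])

def shift_left(a: int, n: int) -> int:   # `<<`: bits shifted past the top are lost
    return (a << n) & MASK

def shift_right(a: int, n: int) -> int:  # `>>`: logical shift for unsigned values
    return a >> n

print(bin(bit_not(0b00001111)))        # 0b11110000
print(bin(shift_left(0b10000001, 1)))  # 0b10
print(bit_xor(0b1100, 0b1010, 0b0110)) # 0
```
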
#### Comparison

Comparison operations compare two or more values and return a boolean.

- `<` Returns whether all operands are in ascending order from left to right.
- `>` Returns whether all operands are in descending order from left to right.
- `<=` Returns whether all operands are in ascending order from left to right,
  allowing equal operands.
- `>=` Returns whether all operands are in descending order from left to right,
  allowing equal operands.
- `=` Returns whether all operands are equal to each other.

Comparison operations are the only constructs in FSPL which are allowed to infer
their argument types. The rules for this are as follows:

- If at least one argument has type information, that type is used for all
  arguments that do not.
- Else, fail. Optionally call the user an idiot if this is because they directly
  compared two literals.

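Chained comparisons check each adjacent pair of operands, and the inference rule picks the type of any typed operand. This Python sketch models both. In `infer_operand_type`, `None` stands for an untyped operand such as a literal. The sketch assumes that if several operands are typed, their types must agree; the document does not cover that case. All names are hypothetical.

```python
from typing import List, Optional, Sequence

def _pairs(args: Sequence):
    return zip(args, args[1:])

def less(*args) -> bool:           # `<`
    return all(a < b for a, b in _pairs(args))

def greater(*args) -> bool:        # `>`
    return all(a > b for a, b in _pairs(args))

def less_equal(*args) -> bool:     # `<=`
    return all(a <= b for a, b in _pairs(args))

def greater_equal(*args) -> bool:  # `>=`
    return all(a >= b for a, b in _pairs(args))

def equal(*args) -> bool:          # `=`
    return all(a == b for a, b in _pairs(args))

def infer_operand_type(operand_types: List[Optional[str]]) -> str:
    """Pick the type for untyped operands (None) from the typed ones."""
    known = {t for t in operand_types if t is not None}
    if not known:
        raise TypeError("cannot infer operand types: no operand has type information")
    if len(known) > 1:  # assumption: typed operands must agree
        raise TypeError(f"conflicting operand types: {sorted(known)}")
    return known.pop()

print(less(1, 2, 3), less(1, 2, 2), less_equal(1, 2, 2))  # True False True
print(infer_operand_type([None, "I32", None]))            # I32
```
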
### If/else

If/else is a control flow branching expression that executes one of two
expressions depending on a boolean value. If the value of the if/else is unused,
the else expression need not be specified. It may be assigned to any type that
satisfies the assignment rules of both the true and false expressions.

### Match

Match is a control flow branching expression that executes one of several case
expressions depending on the input. It can be used to check the type of a union
value. Each case takes the form of a declaration and an associated expression.
If the type of the value held by the union matches the type of the declaration
in a case, that case's expression is executed, and the value of the union is
made available to it through the declaration. If the value of the match
expression is used, all possible types in the union must be accounted for, or it
must have a default case. It may be assigned to any type that satisfies the
assignment rules of its first case.

### Switch

Switch is a control flow branching expression that executes one of several case
expressions depending on the value of the input. It accepts any pointer or
integer type. If the value of the switch expression is used, a default case must
be present. It may be assigned to any type that satisfies the assignment rules
of its first case.

### Loop

Loop is a control flow expression that repeats an expression until a break
statement is executed from within it. The break statement must be given a value
if the value of the loop is used. Otherwise, the loop need not even have a break
statement. The result of the loop may be assigned to any type that satisfies the
assignment rules of all of its break statements. Loops may be nested, and break
statements only apply to the closest containing loop. The value of the loop's
expression is never used.

### For

For is a special kind of loop that evaluates an expression for each element of
an array or slice. It accepts an index declaration and an element declaration,
which are scoped to the loop's body and are set to the index of the current
element and the element itself respectively at the beginning of each iteration.
The assignment rules of a for statement are identical to those of a normal loop.

### Break

Break allows breaking out of loops. It may be assigned to anything, but the
assignment will have no effect as execution of the code after and surrounding it
will cease.

### Return

Return allows terminating functions before they have reached their end. It
accepts values that may be assigned to the function's return type. If a function
does not return anything, the return statement does not accept a value. It may
be assigned to anything, but the assignment will have no effect as execution of
the function or method will cease.

### Assignment

Assignment allows assigning the result of one expression to one or more location
expressions. The assignment expression itself has no value and may not be
assigned to anything.

# Syntax entities

Below is a rough syntax description of the language. Note that `<assignment>`
is right-associative, and `<memberAccess>` and `<methodCall>` are
left-associative. I invite you to torture yourself by attempting to implement
this without hand-writing a parser.

```
<file>           -> (<typedef> | <function> | <method>)*
<access>         -> "+" | "#" | "-"
<typedef>        -> [<access>] <typeIdentifier> ":" <type>
<function>       -> [<access>] <signature> ["=" <expression>]
<method>         -> [<access>] <typeIdentifier> "." <function>

<type>           -> <namedType>
                  | <pointerType>
                  | <sliceType>
                  | <arrayType>
                  | <structType>
                  | <interfaceType>
                  | <unionType>
<namedType>      -> <typeIdentifier>
<pointerType>    -> "*" <type>
<sliceType>      -> "*" ":" <type>
<arrayType>      -> <intLiteral> ":" <type>
<structType>     -> "(" "." <declaration>* ")"
<interfaceType>  -> "(" "&" <signature>* ")"
<unionType>      -> "(" "|" <type>* ")"

<expression>     -> <intLiteral>
                  | <floatLiteral>
                  | <stringLiteral>
                  | <arrayLiteral>
                  | <structLiteral>
                  | <booleanLiteral>
                  | <variable>
                  | <declaration>
                  | <call>
                  | <subscript>
                  | <slice>
                  | <length>
                  | <dereference>
                  | <reference>
                  | <valueCast>
                  | <bitCast>
                  | <operation>
                  | <block>
                  | <memberAccess>
                  | <methodCall>
                  | <ifelse>
                  | <match>
                  | <switch>
                  | <loop>
                  | <for>
                  | <break>
                  | <return>
                  | <assignment>
<variable>       -> <identifier>
<declaration>    -> <identifier> ":" <type>
<call>           -> "[" <expression>+ "]"
<subscript>      -> "[" "." <expression> <expression> "]"
<slice>          -> "[" "\" <expression> <expression>? "/" <expression>? "]"
<length>         -> "[" "#" <expression> "]"
<dereference>    -> "[" "." <expression> "]"
<reference>      -> "[" "@" <expression> "]"
<valueCast>      -> "[" "~" <type> <expression> "]"
<bitCast>        -> "[" "~~" <type> <expression> "]"
<operation>      -> "[" <operator> <expression>* "]"
<block>          -> "{" <expression>* "}"
<memberAccess>   -> <expression> "." <identifier>
<methodCall>     -> <expression> "." <call>
<ifelse>         -> "if" <expression>
                    "then" <expression>
                    ["else" <expression>]
<match>          -> "match" <expression> <matchCase>* [<defaultCase>]
<switch>         -> "switch" <expression> <switchCase>* [<defaultCase>]
<loop>           -> "loop" <expression>
<for>            -> "for" <expression> [<expression>] "in" <expression> <expression>
<break>          -> "[" "break" [<expression>] "]"
<return>         -> "[" "return" [<expression>] "]"
<assignment>     -> <expression> "=" <expression>

<intLiteral>     -> /-?[1-9][0-9]*/
                  | /-?0[0-7]*/
                  | /-?0x[0-9a-fA-F]*/
                  | /-?0b[0-1]*/
<floatLiteral>   -> /-?[0-9]*\.[0-9]+/
<stringLiteral>  -> /'.*'/
<arrayLiteral>   -> "(*" <expression>* ")"
<structLiteral>  -> "(." <member>* ")"
<booleanLiteral> -> "true" | "false"

<member>         -> <identifier> ":" <expression>
<matchCase>      -> "|" <declaration> <expression>
<switchCase>     -> "|" <expression> <expression>
<defaultCase>    -> "*" <expression>
<signature>      -> "[" <identifier> <declaration>* "]" [":" <type>]
<identifier>     -> /[a-z][A-Za-z]*/
<typeIdentifier> -> /[A-Z][A-Za-z]*/
<operator>       -> "+" | "++" | "-" | "--" | "*" | "/" | "%"
                  | "!!" | "||" | "&&" | "^^"
                  | "!" | "|" | "&" | "^" | "<<" | ">>"
                  | "<" | ">" | "<=" | ">=" | "="
```

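The following Python function recognizes each `<intLiteral>` form and converts it to a value. It assumes hex and binary literals need at least one digit. (The grammar writes `*`, which would also accept a bare `0x`.) It also assumes that a leading `0` followed only by octal digits is octal, as the second pattern suggests. `parse_int_literal` is a hypothetical helper, not part of any FSPL tooling.

```python
import re

# Adapted from <intLiteral>; `+` replaces `*` for hex and binary (assumption).
INT_FORMS = [
    (re.compile(r"-?0x[0-9a-fA-F]+"), 16, 2),  # (pattern, base, prefix length)
    (re.compile(r"-?0b[01]+"), 2, 2),
    (re.compile(r"-?0[0-7]*"), 8, 0),
    (re.compile(r"-?[1-9][0-9]*"), 10, 0),
]

def parse_int_literal(text: str) -> int:
    for pattern, base, prefix in INT_FORMS:
        if pattern.fullmatch(text):
            negative = text.startswith("-")
            digits = text.lstrip("-")[prefix:]  # drop the sign and any 0x / 0b prefix
            value = int(digits, base)
            return -value if negative else value
    raise ValueError(f"not an integer literal: {text!r}")

for literal in ["42", "-17", "017", "0", "0x1F", "-0b101"]:
    print(literal, "->", parse_int_literal(literal))
```

Under these patterns, a literal such as `09` matches no form. It is not octal because of the `9`, and a decimal literal cannot start with `0`, so it is rejected with a `ValueError`.
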