# Semantic entities
## Top level
### Type definition
Type definitions bind a type to a global type identifier.
### Function
Functions bind a global identifier and argument list to an expression which is
evaluated each time the function is called. If no expression is specified, the
function is marked as external. Functions have an argument list, where each
argument is passed as a separate variable. They return one value. All of these
are typed.
### Method
A method is like a function, except localized to a defined type. Methods are
called on an instance of that type, and receive a pointer to that instance via
the "this" variable. Method names are not globally unique, but are unique within
the type they are defined on.
## Types
### Named
Named refers to a user-defined, primitive, or built-in named type.
### Pointer
Pointer is a pointer to another type.
### Array
Array is a group of values of a given type stored next to each other. The length
of an array is fixed and is part of its type. Arrays are passed by value unless
a pointer is used.
### Slice
Slice is a pointer to several values of a given type stored next to each other.
Its length is not built into its type and can be changed at runtime.
### Struct
Struct is a composite type that stores keyed values. The positions of the values
within the struct are decided at compile time, based on the order they are
specified in. Structs are passed by value unless a pointer is used.
### Interface
Interface is a polymorphic pointer that accepts a value of any type, as long as
that type has at least the methods defined within the interface. Interfaces are
always passed by reference. When a value is assigned to an interface, it is
referenced automatically. When a pointer is assigned to an interface, the
pointer itself is used as the reference instead.
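
The method requirement can be pictured as a subset check. The following is a rough Python model, not FSPL and not the compiler's implementation. The name `satisfies`, the representation of a method set as a dictionary from method name to signature string, and the requirement that signatures match exactly are all assumptions made for illustration.

```python
def satisfies(type_methods: dict, interface_methods: dict) -> bool:
    """Return True if the type has at least the methods the interface requires."""
    return all(
        type_methods.get(name) == signature
        for name, signature in interface_methods.items()
    )


# Hypothetical method sets, written loosely in the spec's signature syntax.
file_methods = {"read": "[read buf:*:Byte]:Index", "close": "[close]"}
reader = {"read": "[read buf:*:Byte]:Index"}
writer = {"write": "[write buf:*:Byte]:Index"}

print(satisfies(file_methods, reader))  # True: extra methods on the type are fine
print(satisfies(file_methods, writer))  # False: a required method is missing
```
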
## Primitive types
### Int
Int is defined as a signed system word.
### UInt
UInt is defined as an unsigned system word.
### I8, I16, I32, I64
I8-I64 are defined as signed 8, 16, 32, and 64 bit integers respectively.
### U8, U16, U32, U64
U8-U64 are defined as unsigned 8, 16, 32, and 64 bit integers respectively.
### F32, F64
F32 and F64 are defined as single-precision and double-precision floating point
types respectively.
## Built-in types
### Index
Index is defined as an unsigned system word. It is used to describe the size of
chunks of memory, and to index arrays and slices.
### Byte
Byte is defined as the smallest addressable integer. It is unsigned. It is
usually equivalent to U8.
### Bool
Bool is a boolean type. It is equivalent to Byte.
### Rune
Rune is defined as a U32. It represents a single Unicode code point, encoded as
UTF-32.
### String
String is defined as a slice of U8s. It represents a UTF-8 string. By
convention it is not null-terminated, but a null can be added at the end
manually if desired.
## Expressions
### Location expressions
Location expressions are special expressions that refer only to a location in
memory. An expression is only a location expression if its value
originates from another location expression. Such expressions are marked here
with a star (*).
### Literals
#### Integer
An integer literal specifies an integer value. It can be assigned to any type
that is derived from an integer or a float, as long as the value of the literal
can fit within the range of the type.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
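
The range rule for integer destinations can be sketched as follows. This is a Python model rather than FSPL, and it assumes two's complement ranges for the signed types; the helper names `int_range` and `literal_fits` are made up for this example. Float destinations are not modeled.

```python
def int_range(bits: int, signed: bool) -> tuple:
    """Inclusive (min, max) of an integer type with the given bit width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def literal_fits(value: int, bits: int, signed: bool) -> bool:
    """Return True if an integer literal is within range of the target type."""
    low, high = int_range(bits, signed)
    return low <= value <= high


print(literal_fits(255, 8, signed=False))  # True: U8 holds 0..255
print(literal_fits(256, 8, signed=False))  # False: one past the top of U8
print(literal_fits(-128, 8, signed=True))  # True: I8 holds -128..127
```
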
#### Float
A float literal specifies a floating point value. It can be assigned to any type
that is derived from a float.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
#### String
A string literal specifies a string value. Its data representation depends on
what the base type of its destination is structurally equivalent to:
- Integer: Single Unicode code point. When assigning to an integer, the
string literal may not be longer than one code point, and that code point
must fit in the integer.
- Slice of 8-bit integers: UTF-8 string.
- Slice of 16-bit integers: UTF-16 string.
- Slice of 32-bit (or larger) integers: UTF-32 string.
- Array of integers: The same as slices of integers, but the string literal
must fit inside of the array.
- Pointer to 8-bit integer: Null-terminated UTF-8 string (AKA C-string).
A string literal cannot be directly assigned to an interface because it
contains no inherent type information. A value cast may be used for this
purpose.
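
To make the representations above concrete, here is a small Python sketch that produces the code units each destination would hold. It is an illustration only: `encode_literal` and `c_string` are hypothetical helpers, and the UTF-16 path simply groups Python's little-endian encoding into 16-bit units.

```python
def encode_literal(text: str, unit_bits: int) -> list:
    """Encode text as a list of integer code units of the given width."""
    if unit_bits == 8:
        return list(text.encode("utf-8"))
    if unit_bits == 16:
        raw = text.encode("utf-16-le")  # little-endian, no byte order mark
        return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    # 32 bits or wider: one unit per code point (UTF-32).
    return [ord(ch) for ch in text]


def c_string(text: str) -> list:
    """UTF-8 bytes followed by a terminating null, as a pointer destination gets."""
    return list(text.encode("utf-8")) + [0]


print(encode_literal("é", 8))   # [195, 169]: two UTF-8 bytes
print(encode_literal("é", 16))  # [233]: one UTF-16 unit
print(encode_literal("é", 32))  # [233]: one code point
print(c_string("hi"))           # [104, 105, 0]
```
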
#### Array
Array is a composite array literal. It can contain any number of values. It can
be assigned to any array type that:
1. has an identical length, and
2. has an element type that all the element values in the literal can be
assigned to.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
#### Struct
Struct is a composite structure literal. It can contain any number of name:value
pairs. It can be assigned to any struct type that:
1. has at least the members specified in the literal, and
2. has member types that the corresponding member values in the literal can be
assigned to.
It cannot be directly assigned to an interface because it contains no inherent
type information. A value cast may be used for this purpose.
#### Boolean
Boolean is a boolean literal. It may be either true or false. It can be assigned
to any type derived from a boolean. It cannot be directly assigned to an
interface because it contains no inherent type information. A value cast may be
used for this purpose.
### Variable *
Variable specifies a named variable. It can be assigned to a type matching the
variable declaration's type. Since it contains inherent type information, it may
be directly assigned to an interface.
### Declaration *
Declaration binds a local identifier to a typed variable, but also acts as a
variable expression allowing the variable to be used the moment it is defined.
Since it contains inherent type information, it may be directly assigned to an
interface.
### Block
Block is an ordered collection of expressions that are evaluated sequentially.
It has its own scope. The last expression in the block specifies the block's
value, and any assignment rules of the block are equivalent to those of its last
expression.
### Call
Call calls upon the function specified by the first argument, and passes the
rest of the arguments to the function. The first argument must be an
identifier, usually the name of a function. The result of a call may be
assigned to any type matching the function's return type. Since it contains
inherent type information, it may be directly assigned to an interface.
### Method call
Method call calls upon the method of the variable before the dot that is
specified by the first argument, passing the rest of the arguments to the
method. The first argument must be a method name. The result of a call may be
assigned to any type matching the method's return type. Since it contains
inherent type information, it may be directly assigned to an interface.
### Member access *
Member access allows referring to a specific member of a value with a struct
type. It accepts any struct type that contains the specified member name, and
may be assigned to any type that matches the type of the selected member. Since
it contains inherent type information, it may be directly assigned to an
interface.
### Array subscript *
Array subscripting allows referring to a specific element of an array. It
accepts any array, and any offset of type Index. It may be assigned to any type
matching the array's element type. Since it contains inherent type information,
it may be directly assigned to an interface.
### Slice
Slice adjusts the start and end points of a slice relative to its current
starting index, and returns an adjusted copy pointing to the same data. Any
assignment rules of this expression are equivalent to those of the slice it is
operating on.
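
The relative adjustment described above can be modeled with a small view type. This Python sketch is not how the compiler represents slices; `SliceView` and `reslice` are invented names. It also assumes that the end point is exclusive and that an omitted bound keeps the current start or end, neither of which is stated above. Bounds checking is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SliceView:
    """A window [start, end) onto a shared backing list."""
    data: list
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start

    def reslice(self, low=None, high=None) -> "SliceView":
        """Return an adjusted copy; bounds are relative to the current start."""
        low = 0 if low is None else low
        high = self.length() if high is None else high
        # The copy shares self.data, so no elements are duplicated.
        return SliceView(self.data, self.start + low, self.start + high)


backing = list(range(10))
whole = SliceView(backing, 0, len(backing))
middle = whole.reslice(2, 8)   # covers backing[2:8]
inner = middle.reslice(1, 4)   # relative to middle's start: backing[3:6]
inner.data[inner.start] = 99   # writes through to the shared data
print(backing[3])              # 99
```
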
### Length
Length returns the length of an array or a slice. It always returns a value
of type Index.
### Pointer dereference *
Pointer dereferencing allows retrieving the value of a pointer. It accepts any
pointer. It may be assigned to any type matching the pointer's pointed type.
Since it contains inherent type information, it may be directly assigned to an
interface.
### Value reference
Value referencing allows retrieving the location of a value in memory. It
accepts any location expression, and can be assigned to any type that is a
pointer to the location expression's type. Since it contains inherent type
information, it can be directly assigned to an interface, although it doesn't
make a whole lot of sense to do so because assigning a value to an interface
automatically references it anyway.
### Bit casting
Bit casting takes the raw data in memory of a certain value and re-interprets it
as a value of another type. Since it contains inherent type information, it may
be directly assigned to an interface.
### Value casting
Value casting converts a value of a certain type to another type. Since it
contains inherent type information, it may be directly assigned to an interface.
### Operations
Operations perform math, logic, or bit manipulation on values. They accept
values of the same type as the type they are being assigned to, except in
special cases. Since they contain no inherent type information, they may not be
assigned to interfaces.
#### Math
Mathematical operations perform math on numeric values.
- `+` Returns the sum of all arguments
- `++` Returns the sum of all arguments, plus 1
- `-` Returns all arguments after the first subtracted from the first
- `--` Returns all arguments after the first subtracted from the first, minus 1
- `*` Returns the product of all arguments
- `/` Returns A0 / A1 / ... / An
- `%` Returns the remainder of the first argument divided by the second.
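
Because most of these operators accept any number of arguments, it can help to see them written as folds. The Python functions below are a sketch with invented names. Division is modeled as floor division on non-negative integers, which is an assumption; rounding and the behavior of `/` and `%` on negative operands are not specified above.

```python
from functools import reduce
import operator


def add(*args):                 # +
    return sum(args)


def increment(*args):           # ++
    return sum(args) + 1


def subtract(first, *rest):     # -
    return first - sum(rest)


def decrement(first, *rest):    # --
    return first - sum(rest) - 1


def multiply(*args):            # *
    return reduce(operator.mul, args, 1)


def divide(first, *rest):       # /  evaluated left to right: A0 / A1 / ... / An
    return reduce(operator.floordiv, rest, first)


def remainder(first, second):   # %  takes exactly two arguments
    return first % second


print(subtract(10, 1, 2))  # 7
print(divide(100, 5, 2))   # 10
print(increment(1, 2))     # 4
```
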
#### Logic
Logic operations perform logic on booleans.
- `!` Returns the logical negation of the argument
- `|` Returns the logical OR of all arguments
- `&` Returns the logical AND of all arguments
- `^` Returns the logical XOR of all arguments, right to left
#### Bit manipulation
Bit manipulation allows for manipulating values at the binary level. These work
on all types except reference types.
- `!!` Returns the bitwise negation of the argument
- `||` Returns the bitwise OR of all arguments
- `&&` Returns the bitwise AND of all arguments
- `^^` Returns the bitwise XOR of all arguments, right to left
- `<<` Returns the first argument bit-shifted to the left by the second
argument. The second argument must be an integer.
- `>>` Returns the first argument bit-shifted to the right by the second
argument. The second argument must be an integer.
#### Comparison
Comparison operations compare two or more values and return a boolean.
- `<` Returns whether all operands are in ascending order from left to right.
- `>` Returns whether all operands are in descending order from left to right.
- `<=` Returns whether all operands are in ascending order from left to right,
allowing equal operands.
- `>=` Returns whether all operands are in descending order from left to right,
allowing equal operands.
- `=` Returns whether all operands are equal to each other.
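
One way to read the ordering rules above is as a check over every adjacent pair of operands. The Python helper below, with the invented name `chain`, sketches that reading; it is an interpretation, not the compiler's code.

```python
import operator


def chain(op, *operands) -> bool:
    """True if op holds between every adjacent pair of operands."""
    return all(op(a, b) for a, b in zip(operands, operands[1:]))


print(chain(operator.lt, 1, 2, 3))  # True: strictly ascending
print(chain(operator.lt, 1, 1, 2))  # False: equal neighbours fail `<`
print(chain(operator.le, 1, 1, 2))  # True: `<=` allows equal operands
print(chain(operator.eq, 4, 4, 4))  # True: all operands are equal
```
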
Comparison operations are the only constructs in FSPL which are allowed to infer
their argument types. The rules for this are as follows:
- If at least one argument has type information, that type is used for all
arguments that do not.
- Else, fail. Optionally call the user an idiot if this is because they directly
compared two literals.
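
The inference rule can be sketched in a few lines of Python. Here each argument is represented only by its type name, with `None` standing for an argument that carries no type information, such as a literal. The function name is hypothetical, and picking the first typed argument is an assumption: the rule above does not say what happens when typed arguments disagree.

```python
def infer_argument_types(arg_types: list) -> list:
    """Fill in missing argument types from an argument that has one."""
    known = [t for t in arg_types if t is not None]
    if not known:
        # No argument carries type information, e.g. two literals were compared.
        raise TypeError("cannot infer a type for the comparison")
    return [known[0] if t is None else t for t in arg_types]


print(infer_argument_types(["Int", None]))      # ['Int', 'Int']
print(infer_argument_types([None, "U8", None])) # ['U8', 'U8', 'U8']
```
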
### If/else
If/else is a control flow branching expression that executes one of two
expressions depending on a boolean value. If the value of the if/else is unused,
the else expression need not be specified. It may be assigned to any type that
satisfies the assignment rules of both the true and false expressions.
### Loop
Loop is a control flow expression that repeats an expression until a break
statement is called from within it. The break statement must be given a value
if the value of the loop is used. Otherwise, it need not even have a break
statement. The result of the loop may be assigned to any type that satisfies the
assignment rules of all of its break statements. Loops may be nested, and break
statements only apply to the closest containing loop. The value of the loop's
expression is never used.
### Break
Break allows breaking out of loops. It has no value and may not be assigned to
anything.
### Return
Return allows terminating functions before they have reached their end. It
accepts values that may be assigned to the function's return type. If a function
does not return anything, the return statement does not accept a value. In all
cases, return statements have no value and may not be assigned to anything.
### Assignment
Assignment allows assigning the result of one expression to one or more location
expressions. The assignment statement itself has no value and may not be
assigned to anything.
# Syntax entities
Below is a rough syntax description of the language.
```
<file> -> (<typedef> | <function> | <method>)*
<typedef> -> <typeIdentifier> ":" <type>
<function> -> <signature> ["=" <expression>]
<method> -> <typeIdentifier> "." <function>
<type> -> <namedType>
| <pointerType>
| <sliceType>
| <arrayType>
| <structType>
| <interfaceType>
<namedType> -> <typeIdentifier>
<pointerType> -> "*" <type>
<sliceType> -> "*" ":" <type>
<arrayType> -> <intLiteral> ":" <type>
<structType> -> "(" <declaration>* ")"
<interfaceType> -> "(" <signature> ")"
<expression> -> <intLiteral>
| <floatLiteral>
| <stringLiteral>
| <arrayLiteral>
| <structLiteral>
| <booleanLiteral>
| <variable>
| <declaration>
| <call>
| <subscript>
| <length>
| <dereference>
| <reference>
| <valueCast>
| <bitCast>
| <operation>
| <block>
| <memberAccess>
| <ifelse>
| <loop>
| <break>
| <return>
<statement> -> <expression> | <assignment>
<variable> -> <identifier>
<declaration> -> <identifier> ":" <type>
<call> -> "[" <expression>+ "]"
<subscript> -> "[" "." <expression> <expression> "]"
<slice> -> "[" "\" <expression> <expression>? ":" <expression>? "]"
<length> -> "[" "#" <expression> "]"
<dereference> -> "[" "." <expression> "]"
<reference> -> "[" "@" <expression> "]"
<valueCast> -> "[" "~" <type> <expression> "]"
<bitCast> -> "[" "~~" <type> <expression> "]"
<operation> -> "[" <operator> <expression>* "]"
<block> -> "{" <statement>* "}"
<memberAccess> -> <variable> "." <identifier>
<methodAccess> -> <variable> "." <call>
<ifelse> -> "if" <expression>
"then" <expression>
["else" <expression>]
<loop> -> "loop" <expression>
<break> -> "[" "break" [<expression>] "]"
<return> -> "[" "return" [<expression>] "]"
<assignment> -> <expression> "=" <expression>
<intLiteral> -> /-?[1-9][0-9]*/
| /-?0[0-7]*/
| /-?0x[0-9a-fA-F]*/
| /-?0b[0-1]*/
<floatLiteral> -> /-?[0-9]*\.[0-9]+/
<stringLiteral> -> /'.*'/
<arrayLiteral> -> "(*" <expression>* ")"
<structLiteral> -> "(" <member>* ")"
<booleanLiteral> -> "true" | "false"
<member> -> <identifier> ":" <expression>
<signature> -> "[" <identifier> <declaration>* "]" [":" <type>]
<identifier> -> /[a-z][A-Za-z]*/
<typeIdentifier> -> /[A-Z][A-Za-z0-9]*/
<operator> -> "+" | "++" | "-" | "--" | "*" | "/" | "%"
| "!!" | "||" | "&&" | "^^"
| "!" | "|" | "&" | "^" | "<<" | ">>"
| "<" | ">" | "<=" | ">=" | "="
```
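
As a quick way to experiment with the `<intLiteral>` rule, the four patterns can be tried against sample text. The Python snippet below copies the regular expressions from the grammar as written and tries them in the order listed; the function name `classify_int_literal` and the form labels are made up for this example.

```python
import re

# Patterns copied from the <intLiteral> rule, tried in the order listed.
INT_LITERAL_FORMS = [
    ("decimal", re.compile(r"-?[1-9][0-9]*")),
    ("octal", re.compile(r"-?0[0-7]*")),
    ("hexadecimal", re.compile(r"-?0x[0-9a-fA-F]*")),
    ("binary", re.compile(r"-?0b[0-1]*")),
]


def classify_int_literal(text: str):
    """Return the name of the first form that matches the whole text, or None."""
    for name, pattern in INT_LITERAL_FORMS:
        if pattern.fullmatch(text):
            return name
    return None


for sample in ["42", "-7", "0", "017", "0x1F", "0b101", "09"]:
    print(sample, classify_int_literal(sample))
```

Under these patterns a lone `0` falls under the octal form, and `09` matches none of them.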