fspl/design/spec.md

Semantic entities

Top level

Top level entities are defined directly within the source file(s) of a unit, and can be made available to other units via access control modes. The three modes are:

  • Public access: Allows other units to access an entity normally.
  • Opaque access: Causes a top-level entity to appear opaque to other units. Values of opaque types can be passed around, assigned to each other, and their methods can be called, but the implementation of the type is entirely hidden. This access mode cannot be applied to functions or methods.
  • Private access: Disallows other units from accessing a top-level entity. This mode is the default when one isn't supplied.

Type definition

Type definitions bind a type to a global type identifier.

Function

Functions bind a global identifier and argument list to an expression which is evaluated each time the function is called. If no expression is specified, the function is marked as external. Functions have an argument list, where each argument is passed as a separate variable. They return one value. All of these are typed.

Method

A method is like a function, except localized to a defined type. Methods are called on an instance of that type, and receive a pointer to that instance via the "this" variable. Method names are not globally unique, but are unique within the type they are defined on.

Types

Named

Named refers to a user-defined, primitive, or built-in named type.

Pointer

Pointer is a pointer to another type.

Array

Array is a group of values of a given type stored next to each other. The length of an array is fixed and is part of its type. Arrays are passed by value unless a pointer is used.

Slice

Slice is a pointer to several values of a given type stored next to each other. Its length is not built into its type and can be changed at runtime.

Struct

Struct is a composite type that stores keyed values. The positions of the values within the struct are decided at compile time, based on the order they are specified in. Structs are passed by value unless a pointer is used.

Interface

Interface is a polymorphic pointer that allows any value of any type through, except it must have at least the methods defined within the interface. Interfaces are always passed by reference. When assigning a value to an interface, it will be referenced automatically. When assigning a pointer to an interface, the pointer's reference will be used instead.

Union

Union is a polymorphic type that can hold any value as long as it is one of a list of allowed types. It is not a pointer. It holds the hash of the actual type of the value stored within it, followed by the value. The hash field is computed using the type's name and the UUID of the unit that it was defined in. If the type is not named, the hash is computed using the structure of the type. The value field is always big enough to hold the largest type in the allowed list. The value of a union can only be extracted using a match expression.
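
The layout described above can be sketched in Python. This is only an illustration under several assumptions that the text does not state: the hash field is taken to be 8 bytes wide, SHA-256 stands in for the unspecified hash function, the UUID is read as identifying the unit that defines the type, alignment padding is ignored, and the member names and sizes are hypothetical.

```python
import hashlib

# Hypothetical byte sizes for the allowed member types of one union.
MEMBER_SIZES = {"Apple": 4, "Pear": 16, "Plum": 8}

HASH_BYTES = 8  # assumed width of the hash field; the text does not give one


def type_hash(type_name: str, unit_uuid: str) -> int:
    """Derive a tag from a named type and the UUID of its defining unit.

    SHA-256 truncated to HASH_BYTES is a stand-in; the real hash function
    is not specified in the text.
    """
    digest = hashlib.sha256(f"{unit_uuid}.{type_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:HASH_BYTES], "big")


def union_layout(member_sizes: dict) -> dict:
    """Lay out a union as a hash field followed by a value field."""
    value_bytes = max(member_sizes.values())  # room for the largest member
    return {
        "hash_offset": 0,
        "value_offset": HASH_BYTES,
        "value_bytes": value_bytes,
        "total_bytes": HASH_BYTES + value_bytes,  # alignment padding ignored
    }
```

Whatever member is currently stored, the union occupies the same number of bytes, and the hash field alone identifies which member the value field holds.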

Primitive types

Int

Int is defined as a signed system word.

UInt

UInt is defined as an unsigned system word.

I8, I16, I32, I64

I8-I64 are defined as signed 8, 16, 32, and 64 bit integers respectively.

U8, U16, U32, U64

U8-U64 are defined as unsigned 8, 16, 32, and 64 bit integers respectively.

F32, F64

F32 and F64 are defined as single-precision and double-precision floating point types respectively.

Built-in types

Index

Index is defined as an unsigned system word. It is used to describe the size of chunks of memory, and to index arrays and such.

Byte

Byte is defined as the smallest addressable integer. It is unsigned. It is usually equivalent to U8.

Bool

Bool is a boolean type. It is equivalent to U1. For now, since arbitrary width integers are not supported, it is the only way to get a U1 type.

Rune

Rune is defined as a U32. It represents a single UTF-32 code point.

String

String is defined as a slice of U8s. It represents a UTF-8 string. It is not conventionally null-terminated, but a null can be added at the end manually if desired.

Expressions

Location expressions

Location expressions are special expressions that only refer to the location of a memory address. An expression is only a location expression if its value originates from another location expression. Such expressions are marked here with a star (*).

Literals

Integer

An integer literal specifies an integer value. It can be assigned to any type that is derived from an integer or a float, as long as the value of the literal can fit within the range of the type. It cannot be directly assigned to an interface because it contains no inherent type information. A value cast may be used for this purpose.
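
The range rule for integer destinations can be sketched in Python as follows. The helper names `int_range` and `fits` are illustrative and not part of FSPL, and two's complement ranges are assumed for signed types.

```python
def int_range(bits: int, signed: bool):
    """Return the inclusive (low, high) bounds of an integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits(value: int, bits: int, signed: bool) -> bool:
    """Check whether an integer literal's value is representable in the type."""
    low, high = int_range(bits, signed)
    return low <= value <= high
```

Under these assumptions the literal 255 would be assignable to a U8 (`fits(255, 8, False)`) but not to an I8 (`fits(255, 8, True)`).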

Float

A float literal specifies a floating point value. It can be assigned to any type that is derived from a float. It cannot be directly assigned to an interface because it contains no inherent type information. A value cast may be used for this purpose.

String

A string literal specifies a string value. Its data representation depends on what the base type of the assignment destination is structurally equivalent to:

  • Integer: Single unicode code point. When assigning to an integer, the string literal may not be longer than one code point, and that code point must fit in the integer.
  • Slice of 8 bit integers: UTF-8 string.
  • Slice of 16 bit integers: UTF-16 string.
  • Slice of 32 bit (or larger) integers: UTF-32 string.
  • Array of integers: The same as slices of integers, but the string literal must fit inside of the array.
  • Pointer to 8 bit integer: Null-terminated UTF-8 string (AKA C-string).

A string literal cannot be directly assigned to an interface because it contains no inherent type information. A value cast may be used for this purpose.
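
To make the representations concrete, here is a small Python sketch that encodes the same text for each kind of destination. The function names are hypothetical, and the little-endian codec is used only to extract 16 bit code unit values, not to claim anything about how FSPL stores them.

```python
def code_units(text: str, unit_bits: int) -> list:
    """Encode text as the code units a slice of unit_bits-wide integers would hold."""
    if unit_bits == 8:
        return list(text.encode("utf-8"))
    if unit_bits == 16:
        raw = text.encode("utf-16-le")
        return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    if unit_bits >= 32:
        return [ord(ch) for ch in text]  # UTF-32: one element per code point
    raise ValueError("unsupported element width")


def c_string(text: str) -> list:
    """UTF-8 bytes followed by a terminating null, as for a pointer destination."""
    return list(text.encode("utf-8")) + [0]


def code_point(text: str) -> int:
    """Single code point, as for an integer destination."""
    if len(text) != 1:
        raise ValueError("literal must be exactly one code point")
    return ord(text)
```

For example, a literal containing U+1F600 becomes four elements in a slice of 8 bit integers, two (a surrogate pair) in a slice of 16 bit integers, and one in a slice of 32 bit integers.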

Array

Array is a composite array literal. It can contain any number of values. It can be assigned to any array type that:

  1. has an identical length, and
  2. whose element type can be assigned to by all the element values in the literal.

It cannot be directly assigned to an interface because it contains no inherent type information. A value cast may be used for this purpose.

Struct

Struct is a composite structure literal. It can contain any number of name:value pairs. It can be assigned to any struct type that:

  1. has at least the members specified in the literal, and
  2. whose member types can be assigned to by the corresponding member values in the literal.

It cannot be directly assigned to an interface because it contains no inherent type information. A value cast may be used for this purpose.

Boolean

Boolean is a boolean literal. It may be either true or false. It can be assigned to any type derived from a boolean. It cannot be directly assigned to an interface because it contains no inherent type information. A value cast may be used for this purpose.

Variable *

Variable specifies a named variable. It can be assigned to a type matching the variable declaration's type. Since it contains inherent type information, it may be directly assigned to an interface.

Declaration *

Declaration binds a local identifier to a typed variable, but also acts as a variable expression allowing the variable to be used the moment it is defined. Since it contains inherent type information, it may be directly assigned to an interface.

Block

Block is an ordered collection of expressions that are evaluated sequentially. It has its own scope. The last expression in the block specifies the block's value, and any assignment rules of the block are equivalent to those of its last expression.

Call

Call calls upon the function specified by the first argument, and passes the rest of the arguments to the function. The first argument must be an identifier, usually the name of a function. The result of a call may be assigned to any type matching the function's return type. Since it contains inherent type information, it may be directly assigned to an interface.

Method call

Method call calls upon the method (of the expression before the dot) that is specified by the first argument, passing the rest of the arguments to the method. The first argument must be a method name. The result of a call may be assigned to any type matching the method's return type. Since it contains inherent type information, it may be directly assigned to an interface.

Member access *

Member access allows referring to a specific member of a value with a struct type. It accepts any struct type that contains the specified member name, and may be assigned to any type that matches the type of the selected member. Since it contains inherent type information, it may be directly assigned to an interface.

Array subscript *

Array subscripting allows referring to a specific element of an array. It accepts any array, and any offset of type Index. It may be assigned to any type matching the array's element type. Since it contains inherent type information, it may be directly assigned to an interface.

Slice

Slice adjusts the start and end points of a slice relative to its current starting index, and returns an adjusted copy pointing to the same data. Any assignment rules of this expression are equivalent to those of the slice it is operating on.
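
A rough Python model of this behaviour is shown below. The `Slice` class and its `reslice` method are hypothetical names; the sketch assumes half-open bounds, that an omitted bound keeps the current one, and that bounds are checked against the backing storage, none of which the text states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slice:
    data: list  # backing storage, shared between all copies
    start: int  # index of the first element
    end: int    # index one past the last element (assumed half-open)

    def reslice(self, lo: Optional[int] = None, hi: Optional[int] = None) -> "Slice":
        """Return an adjusted copy; both bounds are relative to self.start."""
        new_start = self.start + (0 if lo is None else lo)
        new_end = self.end if hi is None else self.start + hi
        if not (0 <= new_start <= new_end <= len(self.data)):
            raise IndexError("slice bounds out of range")
        return Slice(self.data, new_start, new_end)

    def __len__(self) -> int:
        return self.end - self.start

    def values(self) -> list:
        return self.data[self.start:self.end]
```

Because the copy shares `data` with the original, a write through either one is visible through the other; only the start and end points differ.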

Length

Length returns the length of an array or a slice. It always returns a value of type Index.

Pointer dereference *

Pointer dereferencing allows retrieving the value of a pointer. It accepts any pointer. It may be assigned to any type matching the pointer's pointed type. Since it contains inherent type information, it may be directly assigned to an interface.

Value reference

Value referencing allows retrieving the location of a value in memory. It accepts any location expression, and can be assigned to any type that is a pointer to the location expression's type. Since it contains inherent type information, it can be directly assigned to an interface, although it doesn't make a whole lot of sense to do so because assigning a value to an interface automatically references it anyway.

Bit casting

Bit casting takes the raw data in memory of a certain value and re-interprets it as a value of another type. Since it contains inherent type information, it may be directly assigned to an interface.

Value casting

Value casting converts a value of a certain type to another type. Since it contains inherent type information, it may be directly assigned to an interface.

Operations

Operations perform math, logic, or bit manipulation on values. They accept values of the same type as the type they are being assigned to, except in special cases. Since they contain no inherent type information, they may not be assigned to interfaces.

Math

Mathematical operations perform math on numeric values.

  • + Returns the sum of all arguments
  • ++ Returns the sum of all arguments, plus 1
  • - Returns all arguments after the first subtracted from the first
  • -- Returns all arguments after the first subtracted from the first, minus 1
  • * Returns the product of all arguments
  • / Returns A0 / A1 / ... / An
  • % Returns the remainder of the first argument divided by the second.
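
The variadic behaviour of these operators can be modelled in Python as below. The function names are illustrative only. True division is used to sidestep integer rounding, which the list above does not specify, and the remainder example assumes non-negative operands because sign conventions differ between languages.

```python
from functools import reduce
import operator


def add(*args):          # +
    return sum(args)


def inc(*args):          # ++
    return sum(args) + 1


def sub(first, *rest):   # -
    return first - sum(rest)


def dec(first, *rest):   # --
    return first - sum(rest) - 1


def mul(*args):          # *
    return reduce(operator.mul, args, 1)


def div(first, *rest):   # /  evaluated left to right: A0 / A1 / ... / An
    return reduce(operator.truediv, rest, first)


def mod(a, b):           # %  takes exactly two arguments
    return a % b
```

With a single argument, `inc` and `dec` act as plain increment and decrement, so `inc(5)` gives 6 and `dec(5)` gives 4.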

Logic

Logic operations perform logic on booleans.

  • ! Returns the logical negation of the argument
  • | Returns the logical OR of all arguments
  • & Returns the logical AND of all arguments
  • ^ Returns the logical XOR of all arguments, right to left

Bit manipulation

Bit manipulation allows for manipulating values at the binary level. These work on all types except reference types.

  • !! Returns the bitwise negation of the argument
  • || Returns the bitwise OR of all arguments
  • && Returns the bitwise AND of all arguments
  • ^^ Returns the bitwise XOR of all arguments, right to left
  • << Returns the first argument bit-shifted to the left by the second argument. The second argument must be an integer.
  • >> Returns the first argument bit-shifted to the right by the second argument. The second argument must be an integer.

Comparison

Comparison operations compare two or more values and return a boolean.

  • < Returns whether all operands are in ascending order from left to right.
  • > Returns whether all operands are in descending order from left to right.
  • <= Returns whether all operands are in ascending order from left to right, allowing equal operands.
  • >= Returns whether all operands are in descending order from left to right, allowing equal operands.
  • = Returns whether all operands are equal to each other.

Comparison operations are the only constructs in FSPL which are allowed to infer their argument types. The rules for this are as follows:

  • If at least one argument has type information, that type is used for all arguments that do not.
  • Else, fail. Optionally call the user an idiot if this is because they directly compared two literals.
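
A compact Python sketch of both the chained comparisons and the inference rule follows. All names are hypothetical. Types are modelled as strings, with `None` standing for an operand that carries no type information; picking the first typed operand is an assumption, since the rules above do not say what happens when typed operands disagree.

```python
def ascending(*xs, allow_equal=False):
    """Model < (or <= when allow_equal is set) over any number of operands."""
    pairs = list(zip(xs, xs[1:]))
    if allow_equal:
        return all(a <= b for a, b in pairs)
    return all(a < b for a, b in pairs)


def descending(*xs, allow_equal=False):
    """Model > (or >= when allow_equal is set) over any number of operands."""
    return ascending(*reversed(xs), allow_equal=allow_equal)


def all_equal(*xs):
    """Model = over any number of operands."""
    return all(a == b for a, b in zip(xs, xs[1:]))


def infer_operand_type(operand_types):
    """Pick the type to apply to operands that have no type information."""
    known = [t for t in operand_types if t is not None]
    if not known:
        raise TypeError("cannot infer a type: no operand has type information")
    return known[0]
```

Comparing a typed variable against an integer literal therefore succeeds, while comparing two bare literals raises the error.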

If/else

If/else is a control flow branching expression that executes one of two expressions depending on a boolean value. If the value of the if/else is unused, the else expression need not be specified. It may be assigned to any type that satisfies the assignment rules of both the true and false expressions.

Match

Match is a control flow branching expression that executes one of several case expressions depending on the input. It can be used to check the type of a union value. Each case takes the form of a declaration, and an associated expression. If the type of the union matches the type of the declaration in the case, the expression is executed and the value of the union is made available to it through the declaration. If the value of the match expression is used, all possible types in the union must be accounted for, or it must have a default case. It may be assigned to any type that satisfies the assignment rules of its first case.

Switch

Switch is a control flow branching expression that executes one of several case expressions depending on the value of the input. It accepts any pointer or integer type. If the value of the switch expression is used, a default case must be present. It may be assigned to any type that satisfies the assignment rules of its first case.

Loop

Loop is a control flow expression that repeats an expression until a break statement is called from within it. The break statement must be given a value if the value of the loop is used. Otherwise, it need not even have a break statement. The result of the loop may be assigned to any type that satisfies the assignment rules of all of its break statements. Loops may be nested, and break statements only apply to the closest containing loop. The value of the loop's expression is never used.

For

For is a special kind of loop that evaluates an expression for each element of an array or slice. It accepts an index declaration and an element declaration, which are scoped to the loop's body and are set to the index of the current element and the element itself respectively at the beginning of each iteration. The assignment rules of a for statement are identical to those of a normal loop.

Break

Break allows breaking out of loops. It may be assigned to anything, but the assignment will have no effect as execution of the code after and surrounding it will cease.

Return

Return allows terminating functions before they have reached their end. It accepts values that may be assigned to the function's return type. If a function does not return anything, the return statement does not accept a value. It may be assigned to anything, but the assignment will have no effect as execution of the function or method will cease.

Assignment

Assignment allows assigning the result of one expression to one or more location expressions. The assignment expression itself has no value and may not be assigned to anything.

Syntax entities

Below is a rough syntax description of the language. Note that <assignment> is right-associative, and <memberAccess> and <methodCall> are left-associative. I invite you to torture yourself by attempting to implement this without hand-writing a parser.

<file>     -> (<typedef> | <function> | <method>)*
<access>   -> "+" | "#" | "-"
<typedef>  -> [<access>] <typeIdentifier> ":" <type>
<function> -> [<access>] <signature> ["=" <expression>]
<method>   -> [<access>] <typeIdentifier> "." <function>

<type> -> <namedType>
        | <pointerType>
        | <sliceType>
        | <arrayType>
        | <structType>
        | <interfaceType>
        | <unionType>
<namedType>     -> <typeIdentifier>
<pointerType>   -> "*" <type>
<sliceType>     -> "*" ":" <type>
<arrayType>     -> <intLiteral> ":" <type>
<structType>    -> "(" "." <declaration>* ")"
<interfaceType> -> "(" "&" <signature>* ")"
<unionType>     -> "(" "|" <type>* ")"

<expression> -> <intLiteral>
              | <floatLiteral>
              | <stringLiteral>
              | <arrayLiteral>
              | <structLiteral>
              | <booleanLiteral>
              | <variable>
              | <declaration>
              | <call>
              | <subscript>
              | <slice>
              | <length>
              | <dereference>
              | <reference>
              | <valueCast>
              | <bitCast>
              | <operation>
              | <block>
              | <memberAccess>
              | <methodCall>
              | <ifelse>
              | <match>
              | <switch>
              | <loop>
              | <for>
              | <break>
              | <return>
              | <assignment>
<variable>     -> <identifier>
<declaration>  -> <identifier> ":" <type>
<call>         -> "[" <expression>+ "]"
<subscript>    -> "[" "." <expression> <expression> "]"
<slice>        -> "[" "\" <expression> <expression>? "/" <expression>? "]"
<length>       -> "[" "#" <expression> "]"
<dereference>  -> "[" "." <expression> "]"
<reference>    -> "[" "@" <expression> "]"
<valueCast>    -> "[" "~"  <type> <expression> "]"
<bitCast>      -> "[" "~~" <type> <expression> "]"
<operation>    -> "[" <operator> <expression>* "]"
<block>        -> "{" <expression>* "}"
<memberAccess> -> <expression> "." <identifier>
<methodCall>   -> <expression> "." <call>
<ifelse>       -> "if"   <expression>
                  "then" <expression>
                 ["else" <expression>]
<match>        -> "match"  <expression> <matchCase>*  [<defaultCase>]
<switch>       -> "switch" <expression> <switchCase>* [<defaultCase>]
<loop>         -> "loop" <expression>
<for>          -> "for" <expression> [<expression>] "in" <expression> <expression>
<break>        -> "[" "break" [<expression>] "]"
<return>       -> "[" "return" [<expression>] "]"
<assignment>   -> <expression> "=" <expression>

<intLiteral> -> /-?[1-9][0-9]*/
              | /-?0[0-7]*/
              | /-?0x[0-9a-fA-F]*/
              | /-?0b[0-1]*/
<floatLiteral>   -> /-?[0-9]*\.[0-9]+/
<stringLiteral>  -> /'.*'/
<arrayLiteral>   -> "(*" <expression>* ")"
<structLiteral>  -> "(." <member>* ")"
<booleanLiteral> -> "true" | "false"

<member>         -> <identifier> ":" <expression>
<matchCase>      -> "|" <declaration> <expression>
<switchCase>     -> "|" <expression> <expression>
<defaultCase>    -> "*" <expression>
<signature>      -> "[" <identifier> <declaration>* "]" [":" <type>]
<identifier>     -> /[a-z][A-Za-z]*/
<typeIdentifier> -> /[A-Z][A-Za-z]*/
<operator>       -> "+"  | "++" | "-"  | "--" | "*"  | "/" | "%"
                  | "!!" | "||" | "&&" | "^^"
                  | "!"  | "|"  | "&"  | "^"  | "<<" | ">>"
                  | "<"  | ">"  | "<=" | ">=" | "="
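
As a quick way to experiment with the lexical rules above, the regular expressions for literals and identifiers can be transcribed into Python. This is a sketch, not a lexer: `classify` is a hypothetical helper that only checks whether a whole token matches one of the patterns, and it ignores keywords, operators, and string literals.

```python
import re

# Patterns copied from the grammar; the four <intLiteral> alternatives are
# joined into one expression.
INT_LITERAL = re.compile(r"-?[1-9][0-9]*|-?0[0-7]*|-?0x[0-9a-fA-F]*|-?0b[0-1]*")
FLOAT_LITERAL = re.compile(r"-?[0-9]*\.[0-9]+")
IDENTIFIER = re.compile(r"[a-z][A-Za-z]*")
TYPE_IDENTIFIER = re.compile(r"[A-Z][A-Za-z]*")

PATTERNS = (
    ("intLiteral", INT_LITERAL),
    ("floatLiteral", FLOAT_LITERAL),
    ("identifier", IDENTIFIER),
    ("typeIdentifier", TYPE_IDENTIFIER),
)


def classify(token: str):
    """Return the name of the first rule that matches the whole token, or None."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return None
```

Note that a leading zero selects the octal alternative, so a token such as 09 matches none of the integer forms, and identifiers as written admit letters only.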