# Units
## Modules - Concept
- Equivalent to a package in Go
- Contains one or more FSPL source files
- Uniqued by a UUIDv4
- Depends on zero or more other units
- Source files in a module can access functionality of dependencies
## Addressing
When compiling source files, depending on a module, and so on, an *address* is
used to refer to a *unit*, which is either a module or a file. An *addresser* is
anything that addresses a unit, and the unit it addresses is the *addressee*. An
addresser can be a module, a user invoking the compiler, or something else. An
address is represented by a string. If the string ends in `.fspl`, the address
refers to an FSPL source file. If not, the address refers to a module.
If the address begins with `/`, `./` or `../`, it is interpreted as a path
starting from the filesystem root, the current directory of the addresser, or
the directory above that, respectively. Otherwise, the unit is searched for
within a set of standard or configured paths.
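
The classification rules above can be sketched in a few lines of Python. This is only an illustration, not the compiler's implementation: the function name `classify_address` and its return shape are hypothetical, and the sketch assumes POSIX-style paths and that relative addresses are resolved against the directory of whatever is doing the addressing.

```python
import posixpath


def classify_address(address, addresser_dir):
    """Return (kind, path) for an address.

    kind is "file" or "module"; path is None when the unit has to be
    looked up in the search paths instead.
    """
    # Addresses ending in .fspl refer to source files, everything else
    # refers to a module.
    kind = "file" if address.endswith(".fspl") else "module"

    if address.startswith("/"):
        # Starts at the filesystem root.
        return kind, posixpath.normpath(address)
    if address.startswith(("./", "../")):
        # Resolved against the addresser's directory (assumption).
        return kind, posixpath.normpath(posixpath.join(addresser_dir, address))
    # Anything else is searched for in the standard or configured paths.
    return kind, None
```

Under these assumptions, `classify_address("../io", "/home/me/app")` gives `("module", "/home/me/io")`, while `classify_address("foo/bar", "/home/me/app")` gives `("module", None)` and is left to the search-path lookup.
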
For example, if the search path is `/usr/include/fspl`, and the address is
`foo`, then the unit will be located at `/usr/include/fspl/foo`. If the address
is `foo/bar`, then the unit will be located at `/usr/include/fspl/foo/bar`. If
there is an additional directory in the search path, such as
`/usr/local/include/fspl`, then the unit will be searched for in each one (in
order) until it is found.
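
The ordered lookup described above might look roughly like the following. The helper name `find_unit` is hypothetical, and the sketch assumes that a unit counts as found as soon as a file or directory exists at the candidate location.

```python
import os


def find_unit(address, search_paths):
    """Return the first existing candidate for address, or None."""
    for root in search_paths:
        candidate = os.path.join(root, address)
        # Earlier search paths take precedence over later ones.
        if os.path.exists(candidate):
            return candidate
    return None
```

With `search_paths` set to `["/usr/include/fspl", "/usr/local/include/fspl"]`, the address `foo/bar` is first tried as `/usr/include/fspl/foo/bar` and only then as `/usr/local/include/fspl/foo/bar`.
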
There are standard paths that the compiler will search for units in. These are,
in order of preference:

- `$HOME/.local/src/fspl`
- `$HOME/.local/include/fspl`
- `/usr/local/src/fspl`
- `/usr/local/include/fspl`
- `/usr/src/fspl`
- `/usr/include/fspl`

Files in `include` directories should *not* include program code, and should
only define types and external functions and methods, similar to header files in
C. They may have a corresponding shared object file that programs can
dynamically link against.
Files in `src` directories *may* contain program code, and may be compiled into
an object file if the user wishes to link them statically. Because of FSPL's
ability to "skim" units (discussed later in this document), files in `src` may
be used in the same way that files in `include` are. Thus, `src` files are
effectively more "complete" versions of `include` files with extended
capability, and that is why they are searched first.
## Uniqueness
Each unit is uniqued by a UUID. Most users will never directly use UUIDs, but
they are essential in order to prevent name collisions within the compiler or
linker. For modules, the UUID is specified in the metadata file. For other
units, the UUID is a UUIDv3 (md5) generated using the zero-UUID as a namespace
and the basename of the file (with the extension) as the data.
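
As a rough illustration of the rule for non-module units, Python's standard `uuid` module can derive a name-based UUIDv3 from the zero-UUID namespace and a file's basename. The function name `file_unit_uuid` is hypothetical, and the sketch assumes the basename is hashed as UTF-8 text.

```python
import posixpath
import uuid

# The zero-UUID used as the namespace (all 128 bits are zero).
ZERO_UUID = uuid.UUID(int=0)


def file_unit_uuid(path):
    """Derive the UUID of a lone source file from its basename."""
    basename = posixpath.basename(path)  # keeps the .fspl extension
    # uuid3 hashes the namespace bytes followed by the name with MD5.
    return uuid.uuid3(ZERO_UUID, basename)
```

One consequence of hashing only the basename is that two source files with the same name in different directories receive the same UUID.
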
When creating a module, a UUID should be randomly generated for it. Keep in mind
that altering the UUID of a library will cause programs that used to dynamically
load it to no longer function until they are re-compiled. Therefore, UUIDs
should only be altered if you are introducing breaking ABI changes to your
library. If you are forking an existing module and making changes to it, a
similar rule applies: only keep the same UUID if you intend on keeping an
entirely backwards compatible ABI.
Built-in entities that can be accessed globally from any module (such as the
`String` type) are given a zero-UUID, which represents the "global" unit.
Anything that is a part of this unit is accessible from any other unit, without
having to use a nickname to refer to it.
When generating code, top-level entities must be named like this if their link
name was not specified manually:

`<uuid>::<name>`

Where `<uuid>` is the base64 encoding of the UUID. For example, the built-in
String type would be assigned the following link name:

`AAAAAAAAAAAAAAAAAAAAAA==::String`

And a type `Bird` in a lone source file with the name `bird.fspl` would be:

`eT/CnSopFDlFwpDCnSEAThjDsBw=::Bird`

Methods are named as follows:

`<uuid>::<name>.<method>`

Where `<uuid>` and `<name>` correspond to the base64 UUID of the unit and the
name of the method's owner type respectively, and `<method>` corresponds to the
method name.
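
The naming scheme can be sketched as follows. The helper `link_name` is hypothetical, and the sketch assumes standard, padded base64 over the 16 raw UUID bytes, which is what the `String` example suggests.

```python
import base64
import uuid


def link_name(unit_uuid, name, method=None):
    """Build the default link name for a top-level entity or a method."""
    # Standard base64 with padding over the raw UUID bytes (assumption
    # based on the String example).
    encoded = base64.b64encode(unit_uuid.bytes).decode("ascii")
    if method is None:
        return f"{encoded}::{name}"
    return f"{encoded}::{name}.{method}"
```

Here `link_name(uuid.UUID(int=0), "String")` yields `AAAAAAAAAAAAAAAAAAAAAA==::String`, and a made-up method `length` on that type would be named `AAAAAAAAAAAAAAAAAAAAAA==::String.length`.
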
## Module Structure
Each module is represented by a directory, which contains source files along
with a metadata file called `fspl.mod`. The metadata file is of the form:
```
<file> -> <UUIDv4> <directive>*
<directive> -> <dependency>
<dependency> -> "+" <stringLiteral> [<ident>]
<UUIDv4> -> <stringLiteral>
```
Metadata files only make use of tokens defined in the FSPL lexer, and are
designed to make use of the same parsing and lexing infrastructure used to parse
and tokenize source files. A sample metadata file might look like:
```
'5a8353f8-cad8-4604-be60-29a2575996bc'
+ 'io'
+ '../io' customIo
```
The UUID is represented as a string, and so are addresses. When depending on a
unit, it may be "nicknamed" by supplying an identifier after the address. This
changes how the unit is referred to within the module.
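
To make the grammar concrete, here is a small stand-alone parser for it. It does not use the FSPL lexer; it substitutes a simple regular-expression tokenizer and assumes single-quoted string literals without escape sequences and ASCII identifiers. The names `tokenize` and `parse_metadata` are hypothetical.

```python
import re
import uuid

# One token: a single-quoted string, a "+", or an identifier.
TOKEN = re.compile(r"\s*(?:'([^']*)'|(\+)|([A-Za-z_][A-Za-z0-9_]*))")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        string, plus, ident = match.groups()
        if string is not None:
            tokens.append(("string", string))
        elif plus is not None:
            tokens.append(("plus", plus))
        else:
            tokens.append(("ident", ident))
        pos = match.end()
    return tokens


def parse_metadata(text):
    """Parse <UUIDv4> <directive>* into (uuid, [(address, nickname)])."""
    tokens = tokenize(text)
    if not tokens or tokens[0][0] != "string":
        raise ValueError("metadata must start with a UUID string")
    module_uuid = uuid.UUID(tokens[0][1])
    if module_uuid.version != 4:
        raise ValueError("module UUID must be a UUIDv4")

    dependencies, i = [], 1
    while i < len(tokens):
        # <dependency> -> "+" <stringLiteral> [<ident>]
        if tokens[i][0] != "plus":
            raise ValueError("expected '+'")
        if i + 1 >= len(tokens) or tokens[i + 1][0] != "string":
            raise ValueError("expected an address string after '+'")
        address, nickname = tokens[i + 1][1], None
        i += 2
        if i < len(tokens) and tokens[i][0] == "ident":
            nickname = tokens[i][1]
            i += 1
        dependencies.append((address, nickname))
    return module_uuid, dependencies
```

Fed the sample metadata file, this returns the module UUID together with the dependency list `[("io", None), ("../io", "customIo")]`.
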
## Referencing Units
Compiled by itself, an FSPL source file has no access to other units. However,
when compiling a module as a whole, all source files within the module have
access to units depended on by the module's metadata file. Note that no actual
data or code is imported into the module from the units it depends on, because
all methods and functions defined within them are automatically turned into
prototypes. The module must be linked either statically or dynamically to the
unit's object code after compilation. This is why the FSPL compiler outputs
object files by default.
FSPL source files may reference functions or types from dependencies by
prefixing them with a unit name and a double colon (`::`), like this:
```
reader: io::Reader = x
data: *:Byte = io::[readAll reader]
```
The name of a unit depends on the associated dependency directive used in the
module metadata file. If a nickname is listed, then that is used as the unit
name. Otherwise, the unit name is the basename of the address, which is
normalized and formatted into a valid identifier by the following rules:

- If the name contains at least one dot, the last dot and everything after it
  are removed
- All non-alphabetical and non-numeric characters are removed, and any
  alphabetical characters that were directly after them are converted to
  uppercase
- All numeric digits at the start of the string are removed
- The first character is converted to lowercase

For example:

- `100-bottles-of-glue_test`
- `Picture.jpg`
- `Just a straight up sentence`

Would become:

- `bottlesOfGlueTest`
- `picture`
- `justAStraightUpSentence`

If the unit name is still not a valid identifier or is empty, the compiler will
refuse to process the module and it is up to the user to either nickname the
unit, or change the unit's basename to something workable.
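
The normalization rules can be expressed as a short Python function. This is a sketch under a few assumptions: the name `unit_name_from_address` is hypothetical, "alphabetical" and "numeric" are taken to mean ASCII letters and digits, and a valid identifier is taken to be a letter followed by letters or digits.

```python
import posixpath
import re


def unit_name_from_address(address):
    # Start from the basename of the address.
    name = posixpath.basename(address.rstrip("/"))
    # 1. Drop the last dot and everything after it.
    if "." in name:
        name = name[: name.rindex(".")]
    # 2. Remove non-alphanumeric characters, uppercasing a letter that
    #    directly follows a removed character.
    kept, after_removed = [], False
    for ch in name:
        if ch.isascii() and ch.isalnum():
            kept.append(ch.upper() if after_removed else ch)
            after_removed = False
        else:
            after_removed = True
    name = "".join(kept)
    # 3. Remove leading digits.
    name = name.lstrip("0123456789")
    # 4. Lowercase the first character.
    name = name[:1].lower() + name[1:]
    # Refuse names that are empty or otherwise unusable (assumed rule).
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", name):
        raise ValueError(f"cannot derive a unit name from {address!r}")
    return name
```

Applied to the three example basenames, this produces `bottlesOfGlueTest`, `picture` and `justAStraightUpSentence`; an address such as `123` normalizes to an empty string and is rejected.
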
The compiler will also refuse to process the module if two or more units end up
with the same unit name. However, a function or a variable may have the same
name as a unit because units are only ever used within the context of their own
special syntax (`::`).
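
A duplicate check of this kind could be sketched as below. The function `assign_unit_names` is hypothetical and expects the unit names to have been resolved already (the nickname if one was given, otherwise the normalized basename).

```python
def assign_unit_names(dependencies):
    """Map unit names to addresses, rejecting duplicates.

    dependencies is a list of (address, unit_name) pairs whose names have
    already been resolved.
    """
    names = {}
    for address, unit_name in dependencies:
        if unit_name in names:
            raise ValueError(
                f"units {names[unit_name]!r} and {address!r} "
                f"both resolve to the name {unit_name!r}"
            )
        names[unit_name] = address
    return names
```

In the sample metadata file, both `'io'` and `'../io'` have the basename `io`, so without the `customIo` nickname the two dependencies would collide and the module would be rejected.
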
## Future Work
Addresses do not necessarily have to refer to units. They could also refer to
arbitrary blobs to embed into a compiled program, similarly to how Go's embed
system works. There of course would need to be a distinction between depending
on units and embedding data, because someone might want to embed an FSPL source
file. Thus, there would need to be a separate metadata file directive, possibly
starting with an `!` or something like that.