Units

Modules - Concept

  • Equivalent to a package in Go
  • Contains one or more FSPL source files
  • Uniqued by a UUIDv4
  • Depends on zero or more other units
  • Source files in a module can access functionality of dependencies

Addressing

An address is used to refer to a unit whenever one is needed, for example when compiling source files or depending on a module. A unit is either a module or a file. An addresser is anything that addresses a unit, and the unit being addressed is the addressee. An addresser can be a module, a user invoking the compiler, or something else. An address is represented by a string. If the string ends in .fspl, the address refers to an FSPL source file. If not, the address refers to a module.

If the address begins with /, ./ or ../, it is interpreted as a path starting from the filesystem root, the current directory of the addresser, or the directory above that, respectively. Otherwise, the unit is searched for within a set of standard or configured paths.

For example, if the search path is /usr/include/fspl, and the address is foo, then the unit will be located at /usr/include/fspl/foo. If the address is foo/bar, then the unit will be located at /usr/include/fspl/foo/bar. If there is an additional directory in the search path, such as /usr/local/include/fspl, then the unit will be searched for in each one (in order) until it is found.
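The lookup described above can be sketched roughly as follows. This is an illustration only, not the compiler's actual code: the function names `unit_kind` and `resolve`, the `exists` parameter, and the choice to return `None` when nothing is found are all assumptions made for the example. It also assumes that ./ and ../ are taken relative to the directory of whoever is doing the addressing, and it uses POSIX-style paths throughout.

```python
import os.path
import posixpath


def unit_kind(address):
    """Addresses ending in .fspl name a source file; all others name a module."""
    return "file" if address.endswith(".fspl") else "module"


def resolve(address, addresser_dir, search_paths, exists=os.path.exists):
    """Return the path a unit address refers to, or None if it cannot be found.

    addresser_dir is the directory that ./ and ../ are taken relative to
    (hypothetical parameter). exists is injectable so the lookup can be
    exercised without touching the real filesystem.
    """
    # Rooted addresses are used as they are.
    if address.startswith("/"):
        return posixpath.normpath(address)
    # Relative addresses are joined onto the addresser's directory.
    if address.startswith(("./", "../")):
        return posixpath.normpath(posixpath.join(addresser_dir, address))
    # Anything else is looked up in each search path, in order.
    for root in search_paths:
        candidate = posixpath.join(root, address)
        if exists(candidate):
            return candidate
    return None
```

With search paths /usr/local/include/fspl and /usr/include/fspl, and a unit present only under the second, `resolve("foo", ...)` would try /usr/local/include/fspl/foo first and then settle on /usr/include/fspl/foo.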

The compiler searches a set of standard paths for units. In order of preference, these are:

  • $HOME/.local/src/fspl
  • $HOME/.local/include/fspl
  • /usr/local/src/fspl
  • /usr/local/include/fspl
  • /usr/src/fspl
  • /usr/include/fspl

On Windows, these are used instead:

  • %LOCALAPPDATA%\fspl\src
  • %LOCALAPPDATA%\fspl\include
  • %ALLUSERSPROFILE%\fspl\src
  • %ALLUSERSPROFILE%\fspl\include
  • %ProgramFiles%\fspl\src
  • %ProgramFiles%\fspl\include

Files in include directories should not include program code, and should only define types and external functions and methods, similar to header files in C. They may have a corresponding shared object file that programs can dynamically link against.

Files in src directories may contain program code, and may be compiled into an object file if the user wishes to link them statically. Because of FSPL's ability to "skim" units (discussed later in this document), files in src may be used in the same way that files in include are. Thus, src files are effectively more "complete" versions of include files with extended capability, and that is why they are searched first.
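As a rough illustration of the ordering above, the standard search paths could be assembled like this. The function name `standard_search_paths` and its parameters are hypothetical. The environment is passed in as a plain mapping so the sketch does not depend on the machine it runs on, and configured (non-standard) paths are left out entirely.

```python
def standard_search_paths(env, windows=False):
    """Build the standard search path list, src before include at each level.

    env is a mapping of environment variables (for example os.environ).
    """
    if windows:
        bases = [env["LOCALAPPDATA"], env["ALLUSERSPROFILE"], env["ProgramFiles"]]
        return [f"{base}\\fspl\\{kind}" for base in bases for kind in ("src", "include")]
    bases = [env["HOME"] + "/.local", "/usr/local", "/usr"]
    return [f"{base}/{kind}/fspl" for base in bases for kind in ("src", "include")]
```

Listing src ahead of include at every level reflects the reasoning above: a src unit can stand in for an include unit, so it is preferred when both exist.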

Uniqueness

Each unit is uniquely identified by a UUID. Most users will never directly use UUIDs, but they are essential in order to prevent name collisions within the compiler or linker. For modules, the UUID is specified in the metadata file. For other units, the UUID is a UUIDv3 (MD5-based) generated using the zero-UUID as the namespace and the basename of the file (including the extension) as the data.
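For illustration, the UUID of a lone source file can be derived with Python's standard uuid module as sketched below. The helper name `file_unit_uuid` is made up for this example, and the sketch assumes the basename is fed to the hash as UTF-8 text.

```python
import posixpath
import uuid

# The all-zero UUID, used here as the UUIDv3 namespace.
ZERO_UUID = uuid.UUID(int=0)


def file_unit_uuid(path):
    """Derive a UUIDv3 from the file's basename, extension included."""
    return uuid.uuid3(ZERO_UUID, posixpath.basename(path))
```

Because only the basename is hashed, two files named bird.fspl in different directories would receive the same UUID under this scheme.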

When creating a module, a UUID should be randomly generated for it. Keep in mind that altering the UUID of a library will cause programs that used to dynamically load it to no longer function until they are re-compiled. Therefore, UUIDs should only be altered if you are introducing breaking ABI changes to your library. If you are forking an existing module and making changes to it, a similar rule applies: only keep the same UUID if you intend to keep an entirely backwards-compatible ABI.

Built-in entities that can be accessed globally from any module (such as the String type) are given a zero-UUID, which represents the "global" unit. Anything that is a part of this unit is accessible from any other unit, without having to use a nickname to refer to it.

When generating code, top-level entities must be named like this if their link name was not specified manually:

<uuid>::<name>

Where <uuid> is the base64 encoding of the UUID. For example, the built-in String type would be assigned the following link name:

AAAAAAAAAAAAAAAAAAAAAA==::String

And a type Bird in a lone source file with the name bird.fspl would be:

eT/CnSopFDlFwpDCnSEAThjDsBw=::Bird

Methods are named as follows:

<uuid>::<name>.<method>

Where <uuid> and <name> correspond to the base64 UUID of the unit and the name of the method's owner type respectively, and <method> corresponds to the method name.
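A minimal sketch of this naming scheme follows. The function `link_name` is hypothetical, and the sketch assumes the standard base64 alphabet with padding applied to the UUID's 16 raw bytes. It reproduces the String example above, but makes no claim to reproduce the exact Bird link name, since the precise encoding steps used by the compiler are not spelled out here.

```python
import base64
import uuid


def link_name(unit, name, method=None):
    """Build <uuid>::<name>, or <uuid>::<name>.<method> for a method."""
    prefix = base64.b64encode(unit.bytes).decode("ascii")
    full = f"{prefix}::{name}"
    if method is not None:
        full += f".{method}"
    return full
```

For instance, `link_name(uuid.UUID(int=0), "String")` gives the link name of the built-in String type, and passing a third argument appends the method name after a dot.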

Module Structure

Each module is represented by a directory, which contains source files along with a metadata file called fspl.mod. The metadata file is of the form:

<file>       -> <UUIDv4> <directive>*
<directive>  -> <dependency>
<dependency> -> "+" <stringLiteral> [<ident>]
<UUIDv4>     -> <stringLiteral>

Metadata files only make use of tokens defined in the FSPL lexer, and are designed to make use of the same parsing and lexing infrastructure used to parse and tokenize source files. A sample metadata file might look like:

'5a8353f8-cad8-4604-be60-29a2575996bc'
+ 'io'
+ '../io' customIo

The UUID is represented as a string, and so are addresses. When depending on a unit, it may be "nicknamed" by supplying an identifier after the address. This changes how the unit is referred to within the module.
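To make the grammar concrete, here is a small stand-alone parser for it. It is only a sketch: the real compiler reuses the FSPL lexer, whereas this example uses a simplified regular expression that assumes single-quoted strings without escape sequences and ASCII identifiers. The names `tokenize` and `parse_metadata` are invented for the example.

```python
import re
import uuid

# Simplified tokens: a quoted string, a plus sign, or an identifier.
TOKEN = re.compile(r"\s*(?:'(?P<string>[^']*)'|(?P<plus>\+)|(?P<ident>[A-Za-z_]\w*))")


def tokenize(text):
    """Split metadata text into (kind, value) pairs."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse_metadata(text):
    """Return (module UUID, [(address, nickname or None), ...])."""
    tokens = tokenize(text)
    if not tokens or tokens[0][0] != "string":
        raise ValueError("metadata must start with a UUID string")
    module_uuid = uuid.UUID(tokens[0][1])
    dependencies = []
    i = 1
    while i < len(tokens):
        # Every directive is a dependency: "+" <stringLiteral> [<ident>]
        if tokens[i][0] != "plus":
            raise ValueError("expected '+' to begin a dependency")
        if i + 1 >= len(tokens) or tokens[i + 1][0] != "string":
            raise ValueError("expected an address string after '+'")
        address = tokens[i + 1][1]
        i += 2
        nickname = None
        if i < len(tokens) and tokens[i][0] == "ident":
            nickname = tokens[i][1]
            i += 1
        dependencies.append((address, nickname))
    return module_uuid, dependencies
```

Run on the sample file above, this would yield the module UUID together with two dependencies: io with no nickname, and ../io nicknamed customIo.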

Referencing Units

Compiled by itself, an FSPL source file has no access to other units. However, when compiling a module as a whole, all source files within the module have access to units depended on by the module's metadata file. Note that no actual data or code is imported into the module from the units it depends on, because all methods and functions defined within them are automatically turned into prototypes. The module must be linked either statically or dynamically to the unit's object code after compilation. This is why the FSPL compiler outputs object files by default.

FSPL source files may reference functions or types from dependencies by prefixing them with a unit name and a double colon (::), like this:

reader: io::Reader = x
data:   *:Byte     = io::[readAll reader]

The name of a unit depends on the associated dependency directive used in the module metadata file. If a nickname is listed, then that is used as the unit name. Otherwise, the unit name is the basename of the address, which is normalized and formatted into a valid identifier by the following rules:

  • If the name contains at least one dot, the last dot and everything after it are removed
  • All non-alphabetical and non-numeric characters are removed, and any alphabetical characters that were directly after them are converted to uppercase
  • All numeric digits at the start of the string are removed
  • The first character is converted to lowercase

For example:

  • 100-bottles-of-glue_test
  • Picture.jpg
  • Just a straight up sentence

Would become:

  • bottlesOfGlueTest
  • picture
  • justAStraightUpSentence

If the unit name is still not a valid identifier or is empty, the compiler will refuse to process the module and it is up to the user to either nickname the unit, or change the unit's basename to something workable.
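The normalization rules can be sketched as follows. The function name `unit_name_from_address` is hypothetical, and the sketch assumes that "alphabetical" and "numeric" mean ASCII letters and digits. FSPL's full identifier rules are not restated here, so the only failure this sketch detects is an empty result.

```python
def unit_name_from_address(address):
    """Turn the basename of an address into a unit name, per the rules above."""
    base = address.rstrip("/").rsplit("/", 1)[-1]

    # Rule 1: drop the last dot and everything after it.
    if "." in base:
        base = base[: base.rindex(".")]

    # Rule 2: remove non-alphanumeric characters and uppercase any letter
    # that directly followed one of them.
    kept = []
    upper_next = False
    for ch in base:
        if ch.isascii() and ch.isalnum():
            kept.append(ch.upper() if upper_next and ch.isalpha() else ch)
            upper_next = False
        else:
            upper_next = True
    name = "".join(kept)

    # Rule 3: strip leading digits.
    name = name.lstrip("0123456789")

    # Rule 4: lowercase the first character.
    name = name[:1].lower() + name[1:]

    if not name:
        raise ValueError(f"cannot derive a unit name from {address!r}; nickname it instead")
    return name
```

Applied to the three examples above, this produces bottlesOfGlueTest, picture, and justAStraightUpSentence. An address such as 123 leaves nothing behind and is rejected, which is the case where a nickname would be needed.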

The compiler will also refuse to process the module if two or more units end up with the same unit name. However, a function or a variable may have the same name as a unit because units are only ever used within the context of their own special syntax (::).

Future Work

Addresses do not necessarily have to refer to units. They could also refer to arbitrary blobs to embed into a compiled program, similar to how Go's embed system works. There would of course need to be a distinction between depending on units and embedding data, because someone might want to embed an FSPL source file. Thus, there would need to be a separate metadata file directive, possibly starting with a ! or something similar.