# PDL Language Definition

PDL allows defining a protocol using HOPP and TAPE.

## Data Types

| Syntax     | TN      | CN | Description
| ---------- | ------- | -: | -----------
| I5         | SI      |    |
| I8         | LSI     |  0 |
| I16        | LSI     |  1 |
| I32        | LSI     |  3 |
| I64        | LSI     |  7 |
| I128[^2]   | LSI     | 15 |
| I256[^2]   | LSI     | 31 |
| U5         | SI      |    |
| U8         | LI      |  0 |
| U16        | LI      |  1 |
| U32        | LI      |  3 |
| U64        | LI      |  7 |
| U128[^2]   | LI      | 15 |
| U256[^2]   | LI      | 31 |
| F16        | FP      |  1 |
| F32        | FP      |  3 |
| F64        | FP      |  7 |
| F128[^2]   | FP      | 15 |
| F256[^2]   | FP      | 31 |
| String     | SBA/LBA |  * | UTF-8 string
| Buffer     | SBA/LBA |  * | Byte array
| []\<TYPE\> | OTA     |  * | Array of any type[^1]
| Table      | KTV     |  * | Table with undefined schema
| {...}      | KTV     |  * | Table with defined schema
| Any        | *       |  * | Value of an undefined type

[^1]: Excluding SI and SBA: I5 and U5 cannot be used in an array, while String
and Buffer are simply forced to use their long (LBA) variant.

[^2]: Some systems may lack support for this.
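For the fixed-width numeric types, the CN column appears to equal the value's size in bytes minus one (I8 is 0, U32 is 3, F64 is 7). The Go sketch below encodes that observation. `cnForType` is a hypothetical helper, and the rule is inferred from the table rather than stated by it.

```go
package main

import (
	"fmt"
	"strconv"
)

// cnForType derives the CN column for a fixed-width numeric PDL type from its
// bit width, assuming the pattern visible in the data types table: CN is the
// value's size in bytes minus one. I5 and U5 have no CN entry and are rejected.
func cnForType(name string) (int, bool) {
	if len(name) < 2 || (name[0] != 'I' && name[0] != 'U' && name[0] != 'F') {
		return 0, false
	}
	bits, err := strconv.Atoi(name[1:])
	if err != nil {
		return 0, false
	}
	switch {
	case name[0] == 'F' && bits == 8:
		return 0, false // the table has no F8
	case bits == 8, bits == 16, bits == 32, bits == 64, bits == 128, bits == 256:
		return bits/8 - 1, true
	}
	return 0, false
}

func main() {
	for _, name := range []string{"I8", "U32", "F64", "I256", "U5"} {
		cn, ok := cnForType(name)
		fmt.Println(name, cn, ok)
	}
}
```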

## Tokens

PDL files are divided into tokens, which combine into larger language
structures. Tokens are separated by whitespace, though none is needed around
punctuation: `[]User,` is four tokens.

| Name     | Syntax              | Description
| -------- | ------------------- | -----------
| Method   | `M[0-9A-Fa-f]{4}`   | A 16-bit hexadecimal method code.
| Key      | `[0-9A-Fa-f]{4}`    | A 16-bit hexadecimal table key.
| Ident    | `[A-Z][A-Za-z0-9]*` | An identifier, starting with a capital letter.
| Comma    | `,`                 | A comma separator.
| LBrace   | `{`                 | A left curly brace.
| RBrace   | `}`                 | A right curly brace.
| LBracket | `[`                 | A left square bracket.
| RBracket | `]`                 | A right square bracket.
| Comment  | `\/\/.*$`           | A doc comment, from a double slash to the end of the line.
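Identifiers are assumed here to be a capital letter followed by any number of letters and digits. Under that reading the patterns overlap: `M0000` also matches Ident, and a four-letter word made only of hex digits, such as `Face`, matches both Key and Ident. A lexer therefore has to choose a priority. The Go sketch below assumes maximal munch (read a whole run of letters and digits, then classify it) and tries Method, then Key, then Ident. `Lex`, `Token`, `Kind`, and that priority order are illustrative assumptions, not the reference implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Kind names a row of the token table.
type Kind string

// Token is one lexed token together with its source text.
type Token struct {
	Kind Kind
	Text string
}

var (
	methodRe = regexp.MustCompile(`^M[0-9A-Fa-f]{4}$`)
	keyRe    = regexp.MustCompile(`^[0-9A-Fa-f]{4}$`)
	identRe  = regexp.MustCompile(`^[A-Z][A-Za-z0-9]*$`)
	punct    = map[byte]Kind{',': "Comma", '{': "LBrace", '}': "RBrace", '[': "LBracket", ']': "RBracket"}
)

func isWordByte(b byte) bool {
	return b >= '0' && b <= '9' || b >= 'A' && b <= 'Z' || b >= 'a' && b <= 'z'
}

// Lex splits PDL source into tokens. Whitespace separates tokens but is not
// required next to punctuation, so "[]User," yields four tokens.
func Lex(src string) ([]Token, error) {
	var toks []Token
	for i := 0; i < len(src); {
		c := src[i]
		if kind, ok := punct[c]; ok {
			toks = append(toks, Token{kind, string(c)})
			i++
			continue
		}
		switch {
		case c == ' ' || c == '\t' || c == '\n' || c == '\r':
			i++
		case strings.HasPrefix(src[i:], "//"):
			// A comment runs to the end of the line.
			end := strings.IndexByte(src[i:], '\n')
			if end < 0 {
				end = len(src) - i
			}
			toks = append(toks, Token{"Comment", strings.TrimSuffix(src[i:i+end], "\r")})
			i += end
		case isWordByte(c):
			// Maximal munch: take the whole alphanumeric run, then classify it,
			// trying Method, then Key, then Ident.
			j := i
			for j < len(src) && isWordByte(src[j]) {
				j++
			}
			word := src[i:j]
			var kind Kind
			switch {
			case methodRe.MatchString(word):
				kind = "Method"
			case keyRe.MatchString(word):
				kind = "Key"
			case identRe.MatchString(word):
				kind = "Ident"
			default:
				return nil, fmt.Errorf("invalid token %q at offset %d", word, i)
			}
			toks = append(toks, Token{kind, word})
			i = j
		default:
			return nil, fmt.Errorf("unexpected character %q at offset %d", c, i)
		}
	}
	return toks, nil
}

func main() {
	toks, err := Lex("M0000 Connect {\n\t0000 Name String,\n}")
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("%-8s %q\n", t.Kind, t.Text)
	}
}
```

Classifying by position in the grammar, rather than in the lexer, is an alternative way to resolve the Key/Ident overlap.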

## Syntax

A type is written as an Ident, such as `String`, `U32`, or the name of a defined
type. A table can be written either as the type name `Table`, which leaves its
schema undefined, or as a schema in curly braces. An array is written as a pair
of square brackets followed by its element type, as in `[]User`.

A table schema contains comma-separated fields between its braces, and a
trailing comma after the last field is allowed. Each field has three parts: the
key (Key), the field name (Ident), and the field type. Tables and arrays can be
nested.
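Because the three kinds of type expression (a named type, an array, and a table schema) nest recursively, they map naturally onto a small tree. The Go sketch below is one hypothetical way to model it; `Type`, `Named`, `Array`, `Schema`, and `Field` are illustrative names, and each `String` method renders its node back into PDL syntax on one line.

```go
package main

import (
	"fmt"
	"strings"
)

// Type is any PDL type expression.
type Type interface {
	String() string
}

// Named refers to a type by its Ident, such as String, U32, Table, or User.
type Named struct{ Name string }

// Array is a pair of square brackets followed by an element type.
type Array struct{ Elem Type }

// Field is one entry of a table schema: key, field name, and field type.
type Field struct {
	Key  uint16
	Name string
	Type Type
}

// Schema is a table with a defined schema, written in curly braces.
type Schema struct{ Fields []Field }

func (n Named) String() string { return n.Name }

func (a Array) String() string { return "[]" + a.Elem.String() }

// String renders the schema on one line; keys become four uppercase hex digits.
func (s Schema) String() string {
	parts := make([]string, len(s.Fields))
	for i, f := range s.Fields {
		parts[i] = fmt.Sprintf("%04X %s %s", f.Key, f.Name, f.Type)
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

func main() {
	// An array of tables, one of whose fields is itself an array of strings.
	t := Array{Schema{[]Field{
		{0x0000, "Name", Named{"String"}},
		{0x0001, "Tags", Array{Named{"String"}}},
	}}}
	fmt.Println(t)
}
```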

At the top level, a file contains messages and type definitions, which start
with a Method token and an Ident token respectively. A message consists of the
method code (Method), the message name (Ident), and the message's root type,
which is usually a table but can be any type. A type definition consists of the
type name (Ident) followed by the type it names.

Messages, types, and table fields can all have doc comments preceding them,
which are used to generate documentation for the protocol. The syntax is the
same as Go's (for now). Comments aren't allowed anywhere else.
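Since the comment syntax follows Go's, a Go implementation could plausibly reuse the standard library instead of writing its own comment stripper. The sketch below wraps the comment lines in a `go/ast` `CommentGroup` and calls its `Text` method, which removes the `//` markers and the first space of each line and returns newline-terminated text. `docText` is a hypothetical helper, not part of any PDL tooling.

```go
package main

import (
	"fmt"
	"go/ast"
)

// docText turns the consecutive "//" lines that precede a PDL message, type,
// or field into documentation text. It delegates to go/ast, whose
// CommentGroup.Text strips the comment markers and the first space of each line.
func docText(lines []string) string {
	group := &ast.CommentGroup{}
	for _, line := range lines {
		group.List = append(group.List, &ast.Comment{Text: line})
	}
	return group.Text()
}

func main() {
	fmt.Print(docText([]string{
		"// Connect is sent from the client to the server as the first message of an",
		"// authenticated transaction.",
	}))
}
```

One caveat: `Text` also drops Go directive lines such as `//go:noinline`, which PDL may or may not want.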

Here is an example that puts all of this together:

```
// Connect is sent from the client to the server as the first message of an
// authenticated transaction.
M0000 Connect {
	0000 Name String,
	0001 Password String,
}

// UserList is sent from the server to the client in response to a Connect
// message.
M0001 UserList {
	0000 Users []User,
}

// User holds profile information about a single user.
User {
	0000 Name      String,
	0001 Bio       String,
	0002 Followers U32,
}
```

## EBNF Description

Below is an EBNF-style description of the language. Terminals between slashes
are regular expressions, and quoted strings are literal tokens.

```
<file>    -> (<message> | <typedef>)*
<method>  -> /M[0-9A-Fa-f]{4}/
<key>     -> /[0-9A-Fa-f]{4}/
<ident>   -> /[A-Z][A-Za-z0-9]*/
<comment> -> /\/\/.*$/
<field>   -> <key> <ident> <type>
<type>    -> <ident>
           | "[" "]" <type>
           | "{" (<comment>* <field> ",")* [<comment>* <field>] "}"
<message> -> <comment>* <method> <ident> <type>
<typedef> -> <comment>* <ident> <type>
```
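The grammar is small enough for a hand-written recursive-descent parser with one function per rule. The Go sketch below is a validator only: `Parse` reports the first syntax error and builds no tree. Its one-regexp tokenizer and the `Parse`/`parser` names are illustrative assumptions. To stay short, it skips comments wherever they appear instead of enforcing the placement rules in the grammar, and it treats any leading word that matches the Method pattern as the start of a message. Deciding between Key and Ident by position also sidesteps the overlap between those two token patterns.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	tokenRe  = regexp.MustCompile(`//[^\n]*|[A-Za-z0-9]+|[{}\[\],]|\S`) // comment, word, punctuation, or stray char
	methodRe = regexp.MustCompile(`^M[0-9A-Fa-f]{4}$`)
	keyRe    = regexp.MustCompile(`^[0-9A-Fa-f]{4}$`)
	identRe  = regexp.MustCompile(`^[A-Z][A-Za-z0-9]*$`)
)

// parser walks tokens ending in an "" sentinel and keeps the first error it finds.
type parser struct {
	toks []string
	pos  int
	err  error
}

// peek returns the next token, skipping comments, which a validator can ignore.
func (p *parser) peek() string {
	for strings.HasPrefix(p.toks[p.pos], "//") {
		p.pos++
	}
	return p.toks[p.pos]
}

// fail records a syntax error unless an earlier one was already recorded.
func (p *parser) fail(want string) {
	if p.err == nil {
		p.err = fmt.Errorf("expected %s, got %q", want, p.peek())
	}
}

// accept consumes the next token if it equals s.
func (p *parser) accept(s string) bool {
	if p.peek() != s {
		return false
	}
	p.pos++
	return true
}

// expect consumes the next token if it matches re, and fails otherwise.
func (p *parser) expect(re *regexp.Regexp, want string) {
	if re.MatchString(p.peek()) {
		p.pos++
	} else {
		p.fail(want)
	}
}

// typ implements <type>: a named type, an array, or a table schema.
func (p *parser) typ() {
	switch {
	case p.accept("["):
		if !p.accept("]") {
			p.fail("]")
		}
		p.typ()
	case p.accept("{"):
		// A schema may be empty, and its last field may have a trailing comma.
		for p.err == nil && !p.accept("}") {
			p.expect(keyRe, "key") // <field> -> <key> <ident> <type>
			p.expect(identRe, "field name")
			p.typ()
			if p.accept("}") {
				return
			}
			if !p.accept(",") {
				p.fail(", or }")
			}
		}
	default:
		p.expect(identRe, "type name")
	}
}

// Parse implements <file> and returns the first syntax error, if any.
func Parse(src string) error {
	p := &parser{toks: append(tokenRe.FindAllString(src, -1), "")}
	for p.err == nil && p.peek() != "" {
		// A Method token starts a <message>; without one this is a <typedef>.
		if methodRe.MatchString(p.peek()) {
			p.pos++
		}
		p.expect(identRe, "name")
		p.typ()
	}
	return p.err
}

func main() {
	fmt.Println(Parse("// User holds a name.\nUser {0000 Name String}\nM0001 UserList {0000 Users []User,}"))
	fmt.Println(Parse("M0001 UserList {0000 Users [User}"))
}
```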