hopp/design/protocol.md

# HOPP Protocol Definition

## Connections
A connection refers to a network connection between a client and server, or two
networked parties in general. Connections allow for the creation of
transactions. When the connection is closed by one party, it is closed for the
other party and all active transactions are closed as well.

## Transactions
A transaction refers to a sequence of messages within a connection. Transactions
may be closed independently of the connections they are a part of. Transactions
provide multiplexing capability, and are useful for request/response sequences
and event subscriptions. Each transaction carries a transaction ID, which is
represented as a signed 64-bit integer. The value of the transaction ID is
dependent on which transport is being used.

## Messages
A message refers to a block of octets sent within a transaction, paired with an
unsigned 16-bit method code. The order of messages within a given transaction is
preserved, but the order of messages across the entire connection is not
guaranteed.

The message payload must be 65,535 (unsigned 16-bit integer limit) octets or
smaller in length. This does not include the method code. Applications are free
to send whatever data they wish as the payload, but TAPE is recommended for
encoding it.

Method codes should be written in upper-case base 16 with the prefix "M" in
logs, error messages, documentation, etc. For example, the method code 62,670 in
decimal would be written as MF4CE. The application may choose any method codes,
but groups of similar methods should be placed at consecutive intervals of
M0100. Method codes MFF00-MFFFF are reserved for use by HOPP and its constituent
protocols. Individuals or entities with the SWAG (secret wheel access group)
pass are also permitted to define their own methods within this range. I'm just
fucking with you.
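
The notation above can be sketched as a small helper. This is only an
illustration: the names `format_method` and `is_reserved` are made up for this
example, and zero-padding to four digits is an assumption based on how M0100 is
written.

```python
def format_method(code: int) -> str:
    """Render a method code as "M" plus upper-case base 16.

    Padding to four digits is an assumption inferred from "M0100".
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("method code must fit in an unsigned 16-bit integer")
    return f"M{code:04X}"


def is_reserved(code: int) -> bool:
    """Report whether a method code falls in the reserved MFF00-MFFFF range."""
    return 0xFF00 <= code <= 0xFFFF
```

For instance, `format_method(0xF4CE)` gives `"MF4CE"`, and `is_reserved(0xFFFF)`
is true.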

## Table Pair Encoding (TAPE)
The Table Pair Encoding (TAPE) scheme is a method for encoding structured data
within HOPP messages. It defines standard binary encoding methods for common
data types, as well as a corruption-resistant table structure that maps numeric
IDs to values. It is designed to allow applications to be presented with data
they are not equipped to handle while continuing to function normally. This
enables backwards-compatible application protocol changes.

### Table Structure
A table is divided into two sections: the header, and the values. The header
begins with the number (U16) of pairs in the table, which is then followed by
that many tag-offset pairs. A tag-offset pair consists of a numerical (U16) tag,
followed by the position (U16) of the value relative to the start of the values
section. The values section contains the value data for each pair, where the
start of each value is determined by its offset, and the end is determined by
the offset of the next value, or the end of the message if there is no value
after it.

Both sections must be in the same order, and because of this, each value offset
must be greater than or equal to the one before it. If a message has erratic
structure (such as unordered or out-of-bounds offsets), implementations may opt
to discard only the erratic pairs, as well as the pairs directly before those,
since the end of each of those values is determined by an erratic offset.
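
As a rough sketch of the layout described above, the following encodes and
decodes a table. The function names are hypothetical, values are treated as
opaque octets, and the decoder takes the strict route of rejecting the whole
table on erratic offsets rather than discarding individual pairs.

```python
import struct


def encode_table(pairs):
    """Encode an ordered list of (tag, value_bytes) pairs as a TAPE table."""
    header = struct.pack(">H", len(pairs))  # number of pairs (U16)
    values = b""
    for tag, value in pairs:
        # Each offset is relative to the start of the values section.
        header += struct.pack(">HH", tag, len(values))
        values += value
    return header + values


def decode_table(message):
    """Decode a TAPE table into a dict mapping tag -> value bytes."""
    if len(message) < 2:
        raise ValueError("message too short for a pair count")
    (count,) = struct.unpack_from(">H", message, 0)
    header_size = 2 + 4 * count
    if header_size > len(message):
        raise ValueError("header runs past the end of the message")
    values = message[header_size:]
    entries = [struct.unpack_from(">HH", message, 2 + 4 * i) for i in range(count)]
    table = {}
    for i, (tag, offset) in enumerate(entries):
        # A value ends at the next offset, or at the end of the message.
        end = entries[i + 1][1] if i + 1 < count else len(values)
        if offset > end or end > len(values):
            raise ValueError("erratic offsets")
        table[tag] = values[offset:end]
    return table
```

A receiver that does not recognize a tag can simply ignore that entry of the
returned dict, which is what lets older applications keep working when new
tags are introduced.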

### Data Value Types
The table below lists all data value types supported by TAPE.

| Name        | Size            | Description                 | Encoding Method
| ----------- | --------------: | --------------------------- | ---------------
| I8          |               1 | A signed 8-bit integer      | BETC
| I16         |               2 | A signed 16-bit integer     | BETC
| I32         |               4 | A signed 32-bit integer     | BETC
| I64         |               8 | A signed 64-bit integer     | BETC
| U8          |               1 | An unsigned 8-bit integer   | BEU
| U16         |               2 | An unsigned 16-bit integer  | BEU
| U32         |               4 | An unsigned 32-bit integer  | BEU
| U64         |               8 | An unsigned 64-bit integer  | BEU
| Array[^1]   |         SOP[^2] | An array of any above type  | PASTA
| String      |             N/A | A UTF-8 string              | UTF-8
| StringArray | n * 2 + SOP[^2] | An array of the String type | VILA

[^1]: Array types are written as <E>Array, where <E> is the element type. For
example, an array of I32 would be written as I32Array. StringArray still follows
this rule, even though it is encoded differently from other arrays. Nesting
arrays inside of arrays is prohibited. This problem can be avoided in most cases
by effectively utilizing the table structure, or by improving the design of
your protocol.

[^2]: SOP (sum of parts) refers to the sum of the size of every item in a data
structure.

### Encoding Methods
Below are all encoding methods supported by TAPE.

#### BETC
Big-Endian, Two's Complement signed integer. The size is defined as the smallest
number of whole octets which can fit all bits in the integer, regardless of
whether the bits are on or off. Therefore, the size cannot change at runtime.

#### BEU
Big-Endian, Unsigned integer. The size is defined as the smallest number of
whole octets which can fit all bits in the integer, regardless of whether the
bits are on or off. Therefore, the size cannot change at runtime.
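
Both integer encodings (BETC and BEU) can be illustrated with Python's built-in
integer conversion. The helper names here are hypothetical and only meant as a
sketch.

```python
def encode_int(value: int, size: int, signed: bool) -> bytes:
    """Encode an integer as BETC (signed=True) or BEU (signed=False).

    size is the fixed width in octets: 1, 2, 4, or 8 for the TAPE types.
    Raises OverflowError if the value does not fit.
    """
    return value.to_bytes(size, "big", signed=signed)


def decode_int(data: bytes, signed: bool) -> int:
    """Decode a BETC (signed=True) or BEU (signed=False) integer."""
    return int.from_bytes(data, "big", signed=signed)
```

The same two octets decode differently depending on the type: `b"\xff\xfe"` is
-2 as an I16 and 65534 as a U16.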

#### PASTA
Packed Single-Type Array. The size is defined as the size of an individual item
times the number of items. Items are placed one after the other with no gaps
in-between them, except as required to align the start of each item to the
nearest whole octet. Items should be of the same type and must be of the same
size.
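
A minimal sketch of PASTA for integer element types, assuming the element size
and signedness are known from the array type (for example 4 and signed for an
I32Array). The function names are made up for illustration.

```python
def encode_pasta(items, size: int, signed: bool) -> bytes:
    """Pack fixed-size integers back to back with no gaps."""
    return b"".join(item.to_bytes(size, "big", signed=signed) for item in items)


def decode_pasta(data: bytes, size: int, signed: bool):
    """Split the data into size-octet items and decode each one."""
    if len(data) % size != 0:
        raise ValueError("array length is not a multiple of the item size")
    return [
        int.from_bytes(data[i:i + size], "big", signed=signed)
        for i in range(0, len(data), size)
    ]
```

Because every item has the same size, the number of items is simply the length
of the value divided by the item size, so no count needs to be stored.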

#### UTF-8
UTF-8 string. The size is defined as the smallest number of whole octets which
can fit all bits in the string, regardless of whether the bits are on or off.
The size of this type is not fixed and may change at runtime, so this needs to
be accounted for during use.

#### VILA
Variable Item Length Array. The size is defined as the smallest number of whole
octets which can fit each item plus one U16 per item. The size of this type is
not fixed and may change at runtime, so this needs to be accounted for during
use. The number of items must be greater than zero. Items are each prefixed by
their size (in octets) encoded as a U16, and they are placed one after the other
with no gaps in-between them, except as required to align the start of each item
to the nearest whole octet. Items should be of the same type but do not need to
be of the same size.
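
The following sketch shows how a StringArray might be encoded with VILA. The
function names are hypothetical, and error handling is kept deliberately simple.

```python
import struct


def encode_vila(strings) -> bytes:
    """Encode a non-empty list of strings, each prefixed by its U16 size."""
    if not strings:
        raise ValueError("VILA requires at least one item")
    out = bytearray()
    for s in strings:
        raw = s.encode("utf-8")
        if len(raw) > 0xFFFF:
            raise ValueError("item does not fit a U16 size prefix")
        out += struct.pack(">H", len(raw)) + raw
    return bytes(out)


def decode_vila(data: bytes):
    """Walk the size prefixes and decode each item as UTF-8."""
    items = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated size prefix")
        (size,) = struct.unpack_from(">H", data, pos)
        pos += 2
        if pos + size > len(data):
            raise ValueError("item runs past the end of the array")
        items.append(data[pos:pos + size].decode("utf-8"))
        pos += size
    return items
```

Encoding `["hi", "é"]` yields 8 octets: two 2-octet size prefixes plus 4 octets
of UTF-8 data, which matches the n * 2 + SOP size given in the type table.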

## Transports
A transport is a protocol that HOPP connections can run on top of. HOPP
currently supports the QUIC transport protocol for communicating between
machines, and UNIX domain sockets for quicker communication among applications
on the same machine. Both protocols are supported through METADAPT.

## Message and Transaction Demarcation Protocol (METADAPT)
The Message and Transaction Demarcation Protocol is used to break one or more
reliable data streams into transactions, which are broken down further into
messages. A message together with its associated metadata (length, transaction,
method, etc.) is referred to as a METADAPT Message Block (MMB).

For transports that offer multiple multiplexed data streams that can be created
and destroyed on-demand (such as QUIC), each stream is used as a transaction. If
METADAPT is both multiplexing transactions and demarcating messages, it is
referred to as METADAPT-A. If it is only demarcating messages, it is referred to
as METADAPT-B. METADAPT-A is used over UNIX domain sockets for IPC while
METADAPT-B is used over QUIC for communication over networks such as the
Internet.

### METADAPT-A
METADAPT-A requires a transport which offers a single full-duplex data stream
that persists for the duration of the connection. All transactions are
multiplexed onto this single stream. Each MMB contains a 12-octet header with
the transaction ID, then the method, and then the payload size (in octets). The
transaction ID is encoded as an I64, and the method and payload size are both
encoded as U16s. The remainder of the message is the payload. Since MMBs are
self-describing, they are sent sequentially with no gaps in-between them.

Transactions "open" when the first message with a given transaction ID is sent.
They "close" when a closing message is sent by either side. A closing message
has method MFFFF and should not have a payload.
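
A METADAPT-A MMB could be packed and unpacked along these lines. This is a
sketch rather than a reference implementation, and the names `HEADER_A`,
`encode_mmb_a`, and `decode_mmb_a` are invented for the example.

```python
import struct

# I64 transaction ID, U16 method, U16 payload size: 12 octets, big-endian.
HEADER_A = struct.Struct(">qHH")
CLOSE_METHOD = 0xFFFF  # MFFFF, the closing message


def encode_mmb_a(transaction_id: int, method: int, payload: bytes = b"") -> bytes:
    """Build one MMB: header followed directly by the payload."""
    return HEADER_A.pack(transaction_id, method, len(payload)) + payload


def decode_mmb_a(stream: bytes):
    """Split the first MMB off a buffer.

    Returns (transaction_id, method, payload, remaining_bytes).
    """
    if len(stream) < HEADER_A.size:
        raise ValueError("incomplete header")
    transaction_id, method, size = HEADER_A.unpack_from(stream)
    end = HEADER_A.size + size
    if len(stream) < end:
        raise ValueError("incomplete payload")
    return transaction_id, method, stream[HEADER_A.size:end], stream[end:]
```

Closing transaction 1 would then be `encode_mmb_a(1, CLOSE_METHOD)`, a bare
12-octet header with a payload size of zero.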

The ID of a given transaction is chosen differently depending on which end of
the connection initiated the transaction. The client (the party which initiated
the connection) uses positive transaction IDs, while the server (the party which
accepted the connection) uses negative transaction IDs.
Transaction IDs must be unique within the connection, and if all IDs have been
used up, the connection must fail. Don't worry about this though, because the
sun will have expanded to swallow earth by then. Your connection will not last
that long.
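
One way to hand out IDs under these rules is a simple counter per side. The
sketch below assumes counting starts at 1 for the client and -1 for the server
and that 0 is never used; the text above does not specify either detail, and
the name `transaction_ids` is made up.

```python
I64_MAX = 2**63 - 1
I64_MIN = -(2**63)


def transaction_ids(is_client: bool):
    """Yield fresh transaction IDs: 1, 2, 3... for the client,
    -1, -2, -3... for the server (starting points are assumptions)."""
    step = 1 if is_client else -1
    next_id = step
    while I64_MIN <= next_id <= I64_MAX:
        yield next_id
        next_id += step
    # Every ID on this side has been used, so the connection must fail.
    raise RuntimeError("transaction IDs exhausted")
```

Since the two sides draw from disjoint ranges, they can open transactions at
the same time without coordinating.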

### METADAPT-B
METADAPT-B requires a transport which offers multiple multiplexed full-duplex
data streams per connection that can be created and destroyed on-demand. Each
data stream is used as an individual transaction. Each MMB contains a 4-octet
header with the method and then the payload size (in octets), both encoded as
U16s. The remainder of the message is the payload. Since MMBs are
self-describing, they are sent sequentially with no gaps in-between them.
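
Reading MMBs off a single stream might look like the sketch below. It uses an
in-memory `io.BytesIO` as a stand-in for a real transport stream, and the names
`HEADER_B`, `encode_mmb_b`, and `read_mmb_b` are hypothetical.

```python
import io
import struct

# U16 method, U16 payload size: 4 octets, big-endian.
HEADER_B = struct.Struct(">HH")


def encode_mmb_b(method: int, payload: bytes = b"") -> bytes:
    """Build one MMB: header followed directly by the payload."""
    return HEADER_B.pack(method, len(payload)) + payload


def read_mmb_b(stream):
    """Read one MMB from a file-like stream.

    Returns (method, payload), or None once the stream has ended cleanly.
    Assumes read(n) only returns fewer than n bytes at end of stream, which
    holds for BytesIO but not necessarily for raw sockets.
    """
    header = stream.read(HEADER_B.size)
    if not header:
        return None
    if len(header) < HEADER_B.size:
        raise EOFError("stream ended inside a header")
    method, size = HEADER_B.unpack(header)
    payload = stream.read(size)
    if len(payload) < size:
        raise EOFError("stream ended inside a payload")
    return method, payload


stream = io.BytesIO(encode_mmb_b(0x0200, b"ping") + encode_mmb_b(0x0201))
first = read_mmb_b(stream)   # method M0200 with payload b"ping"
second = read_mmb_b(stream)  # method M0201 with an empty payload
```

No transaction ID appears in the header because the stream itself identifies
the transaction.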

The ID of any transaction will reflect the ID of its corresponding stream. The
lifetime of the transaction is tied to the lifetime of the stream, that is to
say the transaction "opens" when the stream opens and "closes" when the stream
closes.