# HOPP Protocol Definition

## Connections
A connection refers to a network connection between a client and server, or two
networked parties in general. Connections allow for the creation of
transactions. When the connection is closed by one party, it is closed for the
other party and all active transactions are closed as well.

## Transactions
A transaction refers to a sequence of messages within a connection. Transactions
may be closed independently of the connections they are a part of. Transactions
provide multiplexing capability, and are useful for request/response sequences
and event subscriptions. Each transaction carries a transaction ID, which is
represented as a signed 64-bit integer. The value of the transaction ID is
dependent on which transport is being used.

## Messages
A message refers to a block of octets sent within a transaction, paired with an
unsigned 16-bit method code. The order of messages within a given transaction is
preserved, but the order of messages across the entire connection is not
guaranteed. HOPP itself places no limit on the size of a message payload, but
the [METADAPT sub-protocol](#message-and-transaction-demarcation-protocol-metadapt)
in use may impose one.

Method codes should be written in upper-case base 16 with the prefix "M" in
logs, error messages, documentation, etc. For example, the method code 62,670 in
decimal would be written as MF4CE. The application may choose any method codes,
but groups of similar methods should be placed at consecutive intervals of
M0100. Method codes MFF00-MFFFF are reserved for use by HOPP and its constituent
protocols. Individuals or entities with the SWAG (secret wheel access group)
pass are also permitted to define their own methods within this range. I'm just
fucking with you.
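As an illustration, the naming convention and the reserved range fit in a few lines of Python. This is only a sketch: the helper names are hypothetical, and the four-digit zero padding is inferred from examples such as M0100 and MFF00.

```python
RESERVED_FIRST = 0xFF00  # MFF00: start of the range reserved for HOPP
RESERVED_LAST = 0xFFFF   # MFFFF: end of the reserved range


def format_method_code(code: int) -> str:
    """Render a 16-bit method code as upper-case hex with an "M" prefix."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError(f"method code out of range: {code}")
    return f"M{code:04X}"


def is_reserved(code: int) -> bool:
    """Report whether a method code falls in the range reserved for HOPP."""
    return RESERVED_FIRST <= code <= RESERVED_LAST


print(format_method_code(62670))  # MF4CE
print(is_reserved(0xFFFF))        # True
```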

## Table Pair Encoding (TAPE)
The Table Pair Encoding (TAPE) scheme is a method for encoding structured data
within HOPP messages. It defines standard binary encoding methods for common
data types, as well as aggregate data types such as tables and arrays. It is
designed to allow applications to be presented with data they are not equipped
to handle while continuing to function normally. This enables
backwards-compatible changes to application protocols.

TAPE expresses types using tags. A tag is 8 bits in size, and is divided into
two parts: the Type Number (TN), and the Configuration Number (CN). The TN is 3
bits, and the CN is 5 bits. Both are interpreted as unsigned integers. Both
sides of the connection must agree on the semantic meaning of the values and
their arrangement.
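A minimal sketch of packing and unpacking a tag follows. This section does not say where the TN and CN sit within the byte; the sketch assumes the TN occupies the three most significant bits, and the helper names are hypothetical.

```python
def make_tag(tn: int, cn: int) -> int:
    """Pack a 3-bit TN and a 5-bit CN into one tag byte (assumed: TN in high bits)."""
    if not 0 <= tn <= 0b111:
        raise ValueError(f"TN out of range: {tn}")
    if not 0 <= cn <= 0b11111:
        raise ValueError(f"CN out of range: {cn}")
    return (tn << 5) | cn


def split_tag(tag: int) -> tuple[int, int]:
    """Split a tag byte back into its (TN, CN) parts."""
    if not 0 <= tag <= 0xFF:
        raise ValueError(f"tag out of range: {tag}")
    return tag >> 5, tag & 0b11111


tag = make_tag(6, 2)   # TN 6 (KTV) with CN 2
print(hex(tag))        # 0xc2
print(split_tag(tag))  # (6, 2)
```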

A TAPE structure begins with one root, which consists of a tag followed by a
payload. The root is usually an aggregate type such as KTV, which allows a
single structure to carry several different values.

TAPE is based on an encoding method previously developed by silt.

### Data Value Types
The table below lists all data value types supported by TAPE. They are discussed
in detail in the following sections.

| TN | Bits | Name | Description
| -: | ---: | ---- | -----------
|  0 |  000 | SI   | Small integer
|  1 |  001 | LI   | Large integer
|  2 |  010 | FP   | Floating point
|  3 |  011 | SBA  | Small byte array
|  4 |  100 | LBA  | Large byte array
|  5 |  101 | OTA  | One-tag array
|  6 |  110 | KTV  | Key-tag-value table
|  7 |  111 | N/A  | Reserved

#### Small Integer (SI)
SI encodes an integer of up to 5 bits, which are stored in the CN. It has no
payload. Whether the bits are interpreted as unsigned or as signed two's
complement is semantic information and must be agreed upon by both sides of the
connection. Thus, the value may range from 0 to 31 if unsigned, and from -16 to
15 if signed.
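Decoding an SI value only requires the CN. The sketch below (the function name is hypothetical) shows both interpretations; which one applies is agreed upon by the two sides, not signalled in the data.

```python
def decode_si(cn: int, signed: bool) -> int:
    """Interpret the 5-bit CN of an SI tag as an unsigned or signed integer."""
    if not 0 <= cn <= 0b11111:
        raise ValueError(f"CN out of range: {cn}")
    if signed and cn & 0b10000:  # sign bit of a 5-bit two's complement value
        return cn - 32
    return cn


print(decode_si(0b11111, signed=False))  # 31
print(decode_si(0b11111, signed=True))   # -1
print(decode_si(0b10000, signed=True))   # -16
```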

#### Large Integer (LI)
LI encodes an integer of up to 256 bits, which are stored in the payload. The CN
determines the length of the payload in bytes. The integer is big-endian. Whether
the payload is interpreted as unsigned or as signed two's complement is semantic
information and must be agreed upon by both sides of the connection. Thus, the
value may range from 0 to 2^256 - 1 if unsigned, and from -2^255 to 2^255 - 1 if
signed.
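Because the payload is a plain big-endian integer, Python's `int.from_bytes` and `int.to_bytes` map onto it directly. The sketch works on an already extracted payload and leaves out how the CN translates into a byte count; the function names are hypothetical.

```python
def decode_li(payload: bytes, signed: bool) -> int:
    """Interpret an LI payload as a big-endian integer of up to 256 bits."""
    if len(payload) > 32:
        raise ValueError("LI payloads carry at most 256 bits (32 bytes)")
    return int.from_bytes(payload, "big", signed=signed)


def encode_li(value: int, size: int, signed: bool) -> bytes:
    """Encode an integer as a big-endian LI payload of `size` bytes.

    Raises OverflowError if the value does not fit in `size` bytes.
    """
    if not 1 <= size <= 32:
        raise ValueError("LI payloads are 1 to 32 bytes long")
    return value.to_bytes(size, "big", signed=signed)


print(decode_li(b"\xff\xfe", signed=True))  # -2
print(encode_li(256, 2, signed=False))      # b'\x01\x00'
```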

#### Floating Point (FP)
FP encodes an IEEE 754 floating point number of up to 256 bits, which are stored
in the payload. The CN determines the length of the payload, which may only be
16, 32, 64, 128, or 256 bits (2, 4, 8, 16, or 32 bytes).
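A sketch of FP decoding using Python's `struct` module, which natively handles the 16-, 32-, and 64-bit IEEE 754 formats (`e`, `f`, and `d`). This section does not state a byte order for FP payloads; the sketch assumes big-endian to match LI, and it rejects 128- and 256-bit values, which Python has no built-in type for. `decode_fp` is a hypothetical name.

```python
import struct

# struct format codes for the IEEE 754 widths Python supports natively.
# '>' selects big-endian byte order, which is an assumption of this sketch.
_FP_FORMATS = {2: ">e", 4: ">f", 8: ">d"}


def decode_fp(payload: bytes) -> float:
    """Decode a 16-, 32-, or 64-bit big-endian IEEE 754 FP payload."""
    fmt = _FP_FORMATS.get(len(payload))
    if fmt is None:
        # 128- and 256-bit floats have no built-in Python equivalent.
        raise ValueError(f"unsupported FP payload length: {len(payload)} bytes")
    (value,) = struct.unpack(fmt, payload)
    return value


print(decode_fp(bytes.fromhex("3c00")))      # 1.0 (binary16)
print(decode_fp(bytes.fromhex("3fc00000")))  # 1.5 (binary32)
```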

#### Small Byte Array (SBA)
SBA encodes an array of up to 32 bytes, which are stored in the payload. The
CN determines the length of the payload in bytes.

#### Large Byte Array (LBA)
LBA encodes an array of up to 2^256 bytes. The payload consists of a length
field followed directly by the data. The CN determines the size of the length
field in bytes.
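The following sketch splits an LBA payload into its data and whatever bytes follow it. It assumes the length field is a big-endian unsigned integer (the byte order is not stated here) and takes the length-field size as a parameter rather than deriving it from the CN; `parse_lba` is a hypothetical name.

```python
def parse_lba(payload: bytes, length_size: int) -> tuple[bytes, bytes]:
    """Split an LBA payload into (data, remaining bytes).

    `length_size` is the size of the length field in bytes (from the CN).
    Assumption: the length field is a big-endian unsigned integer.
    """
    if len(payload) < length_size:
        raise ValueError("payload too short for the length field")
    length = int.from_bytes(payload[:length_size], "big")
    end = length_size + length
    if len(payload) < end:
        raise ValueError("payload too short for the declared data length")
    return payload[length_size:end], payload[end:]


data, rest = parse_lba(b"\x00\x05hello!", length_size=2)
print(data, rest)  # b'hello' b'!'
```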

#### One-Tag Array (OTA)
OTA encodes an array of up to 2^256 items. The payload consists of a length
field, then a single item tag, then the items themselves. Since all items share
the same tag, each item must be the same length. The CN determines the size of
the length field in bytes.
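A similar sketch for OTA is below. Besides the LBA-style assumptions (big-endian length field, size passed in by the caller), it assumes the length field counts items rather than bytes, and that the caller already knows the per-item payload size implied by the shared tag. `parse_ota` is hypothetical.

```python
def parse_ota(payload: bytes, length_size: int, item_size: int) -> tuple[int, list[bytes]]:
    """Split an OTA payload into (item tag, list of raw item payloads).

    Assumptions: the length field is a big-endian unsigned item count that is
    `length_size` bytes long, and `item_size` is the per-item payload size
    implied by the shared tag, supplied by the caller.
    """
    header = length_size + 1  # length field plus the single shared item tag
    if len(payload) < header:
        raise ValueError("payload too short for the OTA header")
    count = int.from_bytes(payload[:length_size], "big")
    item_tag = payload[length_size]
    body = payload[header:]
    if len(body) < count * item_size:
        raise ValueError("payload too short for the declared item count")
    items = [body[i * item_size:(i + 1) * item_size] for i in range(count)]
    return item_tag, items


tag, items = parse_ota(b"\x03\x62\x00\x01\x00\x02\x00\x03", length_size=1, item_size=2)
print(hex(tag), len(items))  # 0x62 3
```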

#### Key-Tag-Value Table (KTV)
KTV encodes a table of up to 2^256 key/value pairs, which are stored in the
payload after the length field. The pairs themselves consist of a 16-bit
unsigned big-endian key followed by a tag and then the payload. Pair values can
be of different types and sizes. The order of the pairs is not significant and
should never be treated as such.
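Encoding the pair list itself is straightforward with `struct`. The sketch below (hypothetical `encode_ktv_pairs`) emits only the pairs; it leaves out the preceding length field, whose layout this section does not specify. Because pair order is not significant, the sketch takes a dictionary keyed by the 16-bit key, and a decoder would naturally produce one as well.

```python
import struct


def encode_ktv_pairs(pairs: dict[int, tuple[int, bytes]]) -> bytes:
    """Serialize KTV pairs: a U16 big-endian key, a tag byte, then the payload.

    `pairs` maps each key to a (tag, payload) tuple. Only the pair list is
    produced; the length field that precedes it is left to the caller.
    """
    out = bytearray()
    for key, (tag, payload) in pairs.items():
        out += struct.pack(">HB", key, tag)  # raises struct.error if out of range
        out += payload
    return bytes(out)


print(encode_ktv_pairs({0x0102: (0x62, b"hi")}).hex())  # 0102626869
```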

## Transports
A transport is a protocol that HOPP connections can run on top of. HOPP
currently supports the QUIC transport protocol for communicating between
machines, TCP/TLS for legacy systems that do not support QUIC, and UNIX domain
sockets for faster communication among applications on the same machine. All
three transports are supported through METADAPT.

## Message and Transaction Demarcation Protocol (METADAPT)
The Message and Transaction Demarcation Protocol is used to break one or more
reliable data streams into transactions, which are broken down further into
messages. The representation of a message (or a part thereof) on the protocol,
including its associated metadata (length, transaction, method, etc.), is
referred to as a METADAPT Message Block (MMB).

For transports that offer multiple multiplexed data streams that can be created
and destroyed on demand (such as QUIC), each stream is used as a transaction. If
METADAPT is both multiplexing transactions and demarcating messages, it is
referred to as METADAPT-A. If it is only demarcating messages, it is referred to
as METADAPT-B. METADAPT-A is used over UNIX domain sockets for IPC while
METADAPT-B is used over QUIC for communication over networks such as the
Internet.

### METADAPT-A
METADAPT-A requires a transport which offers a single full-duplex data stream
that persists for the duration of the connection. All transactions are
multiplexed onto this single stream. Each MMB contains an 18-octet header
consisting of the transaction ID, then the method, and then the payload size (in
octets). The transaction ID is encoded as an I64, the method is encoded as a
U16, and the payload size is encoded as a U64. Only the 63 least significant
bits of the payload size describe the actual size, while the most significant
bit controls chunking. See the section on [message chunking](#message-chunking)
for more information.
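The header layout maps onto a single `struct` format: `q` for the I64 transaction ID, `H` for the U16 method, and `Q` for the U64 size field, 18 octets in total. This section does not state a byte order, so the sketch assumes big-endian (network order); the names are hypothetical.

```python
import struct

# Transaction ID (I64), method (U16), payload size (U64): 18 octets in total.
# Big-endian byte order is an assumption of this sketch.
MMB_A_HEADER = struct.Struct(">qHQ")
CCB = 1 << 63  # chunk control bit: most significant bit of the size field


def encode_mmb_a(tx_id: int, method: int, payload: bytes, chunked: bool = False) -> bytes:
    """Build one METADAPT-A MMB (header followed by payload)."""
    if len(payload) >= CCB:
        raise ValueError("payload size must fit in 63 bits")
    size = len(payload) | (CCB if chunked else 0)
    return MMB_A_HEADER.pack(tx_id, method, size) + payload


def decode_mmb_a_header(data: bytes) -> tuple[int, int, int, bool]:
    """Return (transaction ID, method, payload size, CCB set) from a header."""
    tx_id, method, size = MMB_A_HEADER.unpack_from(data)
    return tx_id, method, size & (CCB - 1), bool(size & CCB)


print(len(encode_mmb_a(1, 0xFFFF, b"")))  # 18
print(decode_mmb_a_header(encode_mmb_a(-3, 0x0100, b"hi", chunked=True)))  # (-3, 256, 2, True)
```

Under these assumptions, a closing message for a transaction is simply `encode_mmb_a(tx_id, 0xFFFF, b"")`.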

The remainder of the MMB is the payload. Since each MMB is self-describing,
MMBs are sent back to back with no gaps between them.

Transactions "open" when the first message with a given transaction ID is sent.
They "close" when a closing message is sent by either side. A closing message
has method MFFFF and should not have a payload.

The ID of a given transaction is counted differently depending on which end of
the connection initiated the transaction. The client (the party which initiated
the connection) uses positive transaction IDs, while the server (the party which
accepted the connection) uses negative transaction IDs. Transaction IDs must be
unique within the connection, and if all IDs have been used up, the connection
must fail. Don't worry about this, though, because the sun will have expanded to
swallow Earth by then. Your connection will not last that long.
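The rules for opening, closing, and numbering transactions can be tracked with a small table per connection. This is a sketch under assumptions: the section fixes only the sign of each side's IDs, so starting at +1 or -1 and counting away from zero is an assumption, as is the hypothetical `TransactionTable` class. Reusing a closed ID is rejected because IDs must be unique within the connection.

```python
import itertools

CLOSE_METHOD = 0xFFFF  # MFFFF: the closing message, sent without a payload


class TransactionTable:
    """Track METADAPT-A transactions for one end of a connection (hypothetical)."""

    def __init__(self, is_client: bool):
        # Assumption: the client counts 1, 2, 3, ... and the server -1, -2, -3, ...
        step = 1 if is_client else -1
        self._ids = itertools.count(step, step)
        self.open = set()
        self.closed = set()

    def new_id(self) -> int:
        """Allocate an ID for a transaction initiated by this end."""
        tx_id = next(self._ids)
        if not -(2**63) <= tx_id < 2**63:  # outside the I64 range
            raise RuntimeError("transaction IDs exhausted; the connection must fail")
        return tx_id

    def on_message(self, tx_id: int, method: int) -> None:
        """Update state for a message sent or received on `tx_id`."""
        if tx_id in self.closed:
            raise ValueError(f"transaction {tx_id} is closed; IDs must be unique")
        if method == CLOSE_METHOD:
            self.open.discard(tx_id)
            self.closed.add(tx_id)
        else:
            self.open.add(tx_id)  # the first message opens the transaction


client = TransactionTable(is_client=True)
tx = client.new_id()             # 1
client.on_message(tx, 0x0100)    # opens transaction 1
client.on_message(tx, CLOSE_METHOD)
print(client.open, client.closed)  # set() {1}
```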

#### Message Chunking
The most significant bit of the payload size field of an MMB is called the Chunk
Control Bit (CCB). If the CCB of a given MMB is zero, the represented message is
interpreted as being self-contained and the data is processed immediately. If
the CCB is one, the message is interpreted as being chunked, with the data of
the current MMB being the first chunk. The data of further MMBs sent along the
transaction will be appended to the message until an MMB with a zero CCB is
read. That MMB is the last chunk, and any MMBs after it are interpreted as
normal.
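Reassembly can be done with one buffer per transaction. The hypothetical `ChunkAssembler` below takes already decoded MMB fields and returns a complete message once an MMB with a zero CCB arrives. The section does not say which method a chunked message carries, so the sketch assumes the method of the first chunk.

```python
class ChunkAssembler:
    """Reassemble chunked METADAPT-A messages (hypothetical helper)."""

    def __init__(self):
        # transaction ID -> (method of the first chunk, data collected so far)
        self._pending = {}

    def feed(self, tx_id, method, data, ccb):
        """Process one MMB's fields.

        Returns (method, message) once a message is complete, or None while a
        chunked message is still being collected.
        """
        if tx_id in self._pending:
            first_method, buffer = self._pending[tx_id]
            buffer += data  # bytearray: extends the stored buffer in place
            if ccb:
                return None  # more chunks follow
            del self._pending[tx_id]
            return first_method, bytes(buffer)  # zero CCB: this was the last chunk
        if ccb:
            self._pending[tx_id] = (method, bytearray(data))
            return None  # first chunk of a chunked message
        return method, bytes(data)  # self-contained message


asm = ChunkAssembler()
asm.feed(7, 0x0100, b"hel", ccb=True)
asm.feed(7, 0x0100, b"lo", ccb=True)
print(asm.feed(7, 0x0100, b"!", ccb=False))  # (256, b'hello!')
```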

### METADAPT-B
METADAPT-B requires a transport which offers multiple multiplexed full-duplex
data streams per connection that can be created and destroyed on-demand. Each
data stream is used as an individual transaction. Each MMB contains a 10-octet
header consisting of the method and then the payload size (in octets), encoded
as a U16 and a U64 respectively. The remainder of the MMB is the payload. Since
each MMB is self-describing, MMBs are sent back to back with no gaps between
them.
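The METADAPT-B header is the METADAPT-A header without the transaction ID, so a sketch looks much the same: `H` and `Q` in a `struct` format, 10 octets in total, with big-endian byte order again assumed. The section does not mention chunking for METADAPT-B, so the sketch treats the whole U64 as the payload size. The names are hypothetical.

```python
import struct

# Method (U16) then payload size (U64): 10 octets in total.
# Big-endian byte order is an assumption of this sketch.
MMB_B_HEADER = struct.Struct(">HQ")


def encode_mmb_b(method: int, payload: bytes) -> bytes:
    """Build one METADAPT-B MMB; the stream itself identifies the transaction."""
    return MMB_B_HEADER.pack(method, len(payload)) + payload


def read_mmb_b(stream: bytes, offset: int = 0) -> tuple[int, bytes, int]:
    """Read the MMB at `offset`; return (method, payload, offset of next MMB)."""
    method, size = MMB_B_HEADER.unpack_from(stream, offset)
    start = offset + MMB_B_HEADER.size
    end = start + size
    if len(stream) < end:
        raise ValueError("stream ends in the middle of a payload")
    return method, stream[start:end], end


stream = encode_mmb_b(0x0100, b"abc") + encode_mmb_b(0x0101, b"")
method, payload, offset = read_mmb_b(stream)
print(hex(method), payload, offset)  # 0x100 b'abc' 13
```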

The ID of any transaction will reflect the ID of its corresponding stream. The
lifetime of the transaction is tied to the lifetime of the stream, that is to
say the transaction "opens" when the stream opens and "closes" when the stream
closes.