HOPP Protocol Definition

Connections

A connection refers to a network connection between a client and a server, or more generally between any two networked parties. Connections allow transactions to be created. When one party closes the connection, it is closed for the other party as well, and all active transactions are closed.

Transactions

A transaction refers to a sequence of messages within a connection. Transactions may be closed independently of the connection they belong to. Transactions provide multiplexing and are useful for request/response sequences and event subscriptions. Each transaction carries a transaction ID, which is represented as a signed 64-bit integer. How the value of the transaction ID is determined depends on which transport is in use.

Messages

A message refers to a block of octets sent within a transaction, paired with an unsigned 16-bit method code. The order of messages within a given transaction is preserved, but the order of messages across the entire connection is not guaranteed. HOPP itself places no limit on the size of a message payload, although the METADAPT sub-protocol in use may impose one.

Method codes should be written in uppercase hexadecimal with the prefix "M" in logs, error messages, documentation, etc. For example, the method code 62,670 in decimal would be written as MF4CE. The application may choose any method codes, but groups of similar methods should be placed at consecutive intervals of M0100. Method codes MFF00-MFFFF are reserved for use by HOPP and its constituent protocols. Individuals or entities with the SWAG (secret wheel access group) pass are also permitted to define their own methods within this range. I'm just fucking with you.
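The Python sketch below is a minimal illustration of the naming convention. It renders and parses method codes and checks the reserved range. The helper names are hypothetical and not part of any HOPP library. The four-digit zero padding is an assumption drawn from examples such as M0100 and MFFFF.

```python
# Hypothetical helpers for rendering and checking HOPP method codes.
RESERVED_FIRST = 0xFF00  # MFF00, first code reserved for HOPP
RESERVED_LAST = 0xFFFF   # MFFFF, last code reserved for HOPP
HEX_DIGITS = "0123456789ABCDEF"


def format_method(code: int) -> str:
    """Render a 16-bit method code as "M" plus four uppercase hex digits."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError(f"method code out of range: {code}")
    return f"M{code:04X}"


def parse_method(text: str) -> int:
    """Parse a string such as "MF4CE" back into its numeric method code."""
    digits = text[1:]
    well_formed = (
        text.startswith("M")
        and len(digits) == 4
        and all(c in HEX_DIGITS for c in digits)
    )
    if not well_formed:
        raise ValueError(f"malformed method code: {text!r}")
    return int(digits, 16)


def is_reserved(code: int) -> bool:
    """Report whether a code falls in the range reserved for HOPP."""
    return RESERVED_FIRST <= code <= RESERVED_LAST


print(format_method(62670))  # MF4CE
print(is_reserved(0xFFFF))   # True
```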

Table Pair Encoding (TAPE)

The Table Pair Encoding (TAPE) scheme is a method for encoding structured data within HOPP messages. It defines standard binary encodings for common data types, as well as for aggregate data types such as tables and arrays. It is designed so that applications can be presented with data they are not equipped to handle and still function normally, which enables backwards-compatible changes to application protocols.

The length of a TAPE structure is assumed to be given by the surrounding protocol, which is usually METADAPT-A or METADAPT-B. The root of a TAPE structure can be any data value, but it is usually a table, which can contain several values that each have a numeric key. Values can also be nested. Both sides of the connection must agree on the data type of the root value, the data type of each known table value, and so on.

Data Value Types

The table below lists all data value types supported by TAPE.

| Name | Size (octets) | Description | Encoding Method |
|---|---|---|---|
| I8 | 1 | A signed 8-bit integer | BETC |
| I16 | 2 | A signed 16-bit integer | BETC |
| I32 | 4 | A signed 32-bit integer | BETC |
| I64 | 8 | A signed 64-bit integer | BETC |
| U8 | 1 | An unsigned 8-bit integer | BEU |
| U16 | 2 | An unsigned 16-bit integer | BEU |
| U32 | 4 | An unsigned 32-bit integer | BEU |
| U64 | 8 | An unsigned 64-bit integer | BEU |
| Array¹ | Variable | An array of any of the above types | PASTA |
| String | Variable | A UTF-8 string | UTF-8 |
| StringArray | Variable | An array of the String type | VILA |
| Table | Variable | A table of values of any type | TTLV |

Encoding Methods

Below are all encoding methods supported by TAPE.

BETC

Big-Endian, Two's Complement signed integer. The size is the smallest number of whole octets that can hold every bit of the integer type, whether or not those bits are set. Therefore, the size cannot change at runtime.

BEU

Big-Endian, Unsigned integer. The size is the smallest number of whole octets that can hold every bit of the integer type, whether or not those bits are set. Therefore, the size cannot change at runtime.
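BETC and BEU values are ordinary fixed-width big-endian integers. Python's standard `struct` module can therefore encode them with its big-endian (`>`) format codes. This is a sketch, not a reference implementation. `INT_FORMATS`, `encode_int` and `decode_int` are illustrative names. The output size depends only on the type, never on the value.

```python
import struct

# Big-endian struct format codes for each fixed-size TAPE integer type.
INT_FORMATS = {
    "I8": ">b", "I16": ">h", "I32": ">i", "I64": ">q",  # BETC (signed)
    "U8": ">B", "U16": ">H", "U32": ">I", "U64": ">Q",  # BEU (unsigned)
}


def encode_int(type_name: str, value: int) -> bytes:
    """Encode value using the fixed size of type_name, regardless of magnitude."""
    return struct.pack(INT_FORMATS[type_name], value)


def decode_int(type_name: str, data: bytes) -> int:
    """Decode exactly one integer; data must be exactly the type's size."""
    (value,) = struct.unpack(INT_FORMATS[type_name], data)
    return value


print(encode_int("I16", -2).hex())  # fffe (two's complement)
print(encode_int("U32", 1).hex())   # 00000001 (always four octets)
```

`struct.pack` raises `struct.error` for values outside a type's range. This guards against silently truncating a value.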

GBEU

Growing Big-Endian, Unsigned integer. The integer is broken up into 8-bit chunks. The first (most significant) bit of each chunk is a Chunk Control Bit (CCB), and the remaining 7 bits hold part of the value. Every chunk except the last has its CCB set to one; the last chunk in the integer has its CCB set to zero. Chunks are ordered from most significant to least significant (big-endian). The size is the smallest number of whole octets that can fit all chunks of the integer. The size of this type is not fixed and may change at runtime, so this must be accounted for during use.
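A minimal Python sketch of GBEU encoding and decoding follows. It assumes the CCB is the most significant bit of each octet and the other seven bits carry the value. The function names are hypothetical. For example, 300 splits into the 7-bit groups 2 and 44, so it encodes as the octets 82 2C.

```python
def encode_gbeu(value: int) -> bytes:
    """Encode a non-negative integer as a GBEU."""
    if value < 0:
        raise ValueError("GBEU encodes unsigned integers only")
    # Split into 7-bit groups, least significant first.
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()  # big-endian: most significant chunk first
    # Set the CCB (0x80) on every chunk except the last.
    return bytes(group | 0x80 for group in groups[:-1]) + bytes(groups[-1:])


def decode_gbeu(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a GBEU starting at offset; return (value, offset after it)."""
    value = 0
    for index in range(offset, len(data)):
        chunk = data[index]
        value = (value << 7) | (chunk & 0x7F)
        if not chunk & 0x80:  # a clear CCB marks the last chunk
            return value, index + 1
    raise ValueError("truncated GBEU")


print(encode_gbeu(300).hex())    # 822c
print(decode_gbeu(b"\x82\x2c"))  # (300, 2)
```

Python integers are unbounded. A real decoder would also reject encodings whose value overflows the integer type it decodes into.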

PASTA

Packed Single-Type Array. The size is the size of an individual item times the number of items. Items are placed one after the other with no gaps between them, except as required to align the start of each item to the nearest whole octet. Items should be of the same type and must be of the same size.
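The PASTA layout maps directly onto `struct`. The description above defines no item count field. This sketch therefore assumes the count is the length supplied by the surrounding structure divided by the item size. It handles only whole-octet item types, so no alignment padding arises. The function names are illustrative.

```python
import struct


def encode_pasta(fmt: str, items: list[int]) -> bytes:
    """Pack items back to back; fmt is a big-endian struct code such as ">i" for I32."""
    return b"".join(struct.pack(fmt, item) for item in items)


def decode_pasta(fmt: str, data: bytes) -> list[int]:
    """Unpack items, inferring the count from the total length of data."""
    item_size = struct.calcsize(fmt)
    if len(data) % item_size:
        raise ValueError("data length is not a multiple of the item size")
    return [value for (value,) in struct.iter_unpack(fmt, data)]


# An I32Array holding 1 and -1.
print(encode_pasta(">i", [1, -1]).hex())  # 00000001ffffffff
```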

UTF-8

UTF-8 string. The size is the number of whole octets needed to hold the UTF-8 encoding of the string, which exceeds its character count whenever the string contains non-ASCII characters. The size of this type is not fixed and may change at runtime, so this must be accounted for during use.

VILA

Variable Item Length Array. The size is the smallest number of whole octets that can fit every item plus one GBEU per item describing that item's size. The size of this type is not fixed and may change at runtime, so this must be accounted for during use. The number of items must be greater than zero. Each item is prefixed by its size (in octets) encoded as a GBEU. Items are placed one after the other with no gaps between them, except as required to align the start of each item to the nearest whole octet. Items should be of the same type but need not be of the same size.
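The sketch below encodes and decodes a VILA in Python, using a StringArray as the example. It is self-contained, so it repeats a minimal GBEU helper. As with other TAPE structures, it assumes the total length comes from the surrounding structure and reads items until that length is used up. All names are hypothetical.

```python
def encode_gbeu(value: int) -> bytes:
    # Minimal GBEU: 7 value bits per octet, big-endian, CCB set on all but the last.
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()
    return bytes(g | 0x80 for g in groups[:-1]) + bytes(groups[-1:])


def decode_gbeu(data: bytes, offset: int) -> tuple[int, int]:
    # Return (value, offset just past the GBEU).
    value = 0
    for index in range(offset, len(data)):
        value = (value << 7) | (data[index] & 0x7F)
        if not data[index] & 0x80:
            return value, index + 1
    raise ValueError("truncated GBEU")


def encode_vila(items: list[bytes]) -> bytes:
    """Prefix each item with its size in octets (as a GBEU) and concatenate."""
    if not items:
        raise ValueError("a VILA must contain at least one item")
    return b"".join(encode_gbeu(len(item)) + item for item in items)


def decode_vila(data: bytes) -> list[bytes]:
    """Split a VILA whose total length is known from the surrounding structure."""
    items, offset = [], 0
    while offset < len(data):
        size, offset = decode_gbeu(data, offset)
        end = offset + size
        if end > len(data):
            raise ValueError("item runs past the end of the VILA")
        items.append(data[offset:end])
        offset = end
    if not items:
        raise ValueError("a VILA must contain at least one item")
    return items


# A StringArray is a VILA of UTF-8 strings; sizes count octets, not characters.
encoded = encode_vila([s.encode("utf-8") for s in ["hi", "héllo"]])
print(encoded.hex())  # 0268690668c3a96c6c6f
```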

TTLV

TAPE Tag Length Value. The size is the smallest number of whole octets that can fit every item plus one U16 and one GBEU per item. The size of this type is not fixed and may change at runtime, so this must be accounted for during use. Each item is prefixed by its numeric tag encoded as a U16, followed by its size (in octets) encoded as a GBEU. Items are placed one after the other with no gaps between them, except as required to align the start of each item to the nearest whole octet. Items need not be of the same type or the same size.
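The following Python sketch encodes and decodes a TTLV table. The decoder returns each entry as raw bytes and leaves interpretation to the application, so an older peer can skip tags it does not recognize. This is the mechanism behind TAPE's backwards-compatible protocol changes. The helper names are hypothetical. The text does not say how duplicate tags are handled, so this sketch simply keeps the last one.

```python
import struct


def encode_gbeu(value: int) -> bytes:
    # Minimal GBEU: 7 value bits per octet, big-endian, CCB set on all but the last.
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()
    return bytes(g | 0x80 for g in groups[:-1]) + bytes(groups[-1:])


def decode_gbeu(data: bytes, offset: int) -> tuple[int, int]:
    # Return (value, offset just past the GBEU).
    value = 0
    for index in range(offset, len(data)):
        value = (value << 7) | (data[index] & 0x7F)
        if not data[index] & 0x80:
            return value, index + 1
    raise ValueError("truncated GBEU")


def encode_ttlv(table: dict[int, bytes]) -> bytes:
    """Encode each entry as a U16 tag, a GBEU size, then the already-encoded value."""
    out = bytearray()
    for tag, value in table.items():
        out += struct.pack(">H", tag)
        out += encode_gbeu(len(value))
        out += value
    return bytes(out)


def decode_ttlv(data: bytes) -> dict[int, bytes]:
    """Split a TTLV into {tag: raw value bytes}; interpreting values is up to the caller."""
    table, offset = {}, 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated tag")
        (tag,) = struct.unpack_from(">H", data, offset)
        size, offset = decode_gbeu(data, offset + 2)
        end = offset + size
        if end > len(data):
            raise ValueError("value runs past the end of the table")
        table[tag] = data[offset:end]
        offset = end
    return table


# A newer peer also sends tag 2; an older peer reads tags 0 and 1 and ignores it.
message = encode_ttlv({0: struct.pack(">H", 7), 1: b"hi", 2: b"\x01"})
known = decode_ttlv(message)
print(struct.unpack(">H", known[0])[0], known[1])  # 7 b'hi'
```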

Transports

A transport is a protocol that HOPP connections can run on top of. HOPP currently supports the QUIC transport protocol for communication between machines, TCP/TLS for legacy systems that do not support QUIC, and UNIX domain sockets for faster communication among applications on the same machine. All three transports are supported through METADAPT.

Message and Transaction Demarcation Protocol (METADAPT)

The Message and Transaction Demarcation Protocol is used to break one or more reliable data streams into transactions, which are broken down further into messages. The representation of a message (or a part thereof) on the protocol, including its associated metadata (length, transaction, method, etc.), is referred to as a METADAPT Message Block (MMB).

For transports that offer multiple multiplexed data streams that can be created and destroyed on demand (such as QUIC), each stream is used as a transaction. When METADAPT both multiplexes transactions and demarcates messages, it is referred to as METADAPT-A. When it only demarcates messages, it is referred to as METADAPT-B. METADAPT-A is used over UNIX domain sockets for IPC, while METADAPT-B is used over QUIC for communication over networks such as the Internet.

METADAPT-A

METADAPT-A requires a transport which offers a single full-duplex data stream that persists for the duration of the connection. All transactions are multiplexed onto this single stream. Each MMB begins with an 18-octet header containing the transaction ID, then the method, and then the payload size (in octets). The transaction ID is encoded as an I64, the method as a U16, and the payload size as a U64. Only the 63 least significant bits of the payload size describe the actual size; the most significant bit controls chunking. See the section on chunking for more information.

The remainder of the MMB is the payload. Since each MMB is self-describing, MMBs are sent sequentially with no gaps between them.

Transactions "open" when the first message with a given transaction ID is sent. They "close" when a closing message is sent by either side. A closing message has method MFFFF and should not have a payload.
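The METADAPT-A framing can be sketched with a single `struct.Struct`. The header size follows from the field types: 8 octets for the I64 transaction ID, 2 for the U16 method, and 8 for the U64 payload size, or 18 in total. The helper names and the tuple returned by `decode_mmb_a` are assumptions made for illustration.

```python
import struct

# I64 transaction ID, U16 method, U64 payload size (big-endian, no padding).
HEADER_A = struct.Struct(">qHQ")
CCB = 1 << 63          # most significant bit of the payload size field
CLOSE_METHOD = 0xFFFF  # MFFFF closes a transaction


def encode_mmb_a(trans_id: int, method: int, payload: bytes, chunked: bool = False) -> bytes:
    """Build one METADAPT-A MMB; chunked=True sets the CCB."""
    size = len(payload)
    if size >= CCB:
        raise ValueError("payload too large for a 63-bit size field")
    if chunked:
        size |= CCB
    return HEADER_A.pack(trans_id, method, size) + payload


def decode_mmb_a(data: bytes, offset: int = 0):
    """Return (trans_id, method, chunked, payload, next_offset)."""
    if offset + HEADER_A.size > len(data):
        raise ValueError("truncated MMB header")
    trans_id, method, raw_size = HEADER_A.unpack_from(data, offset)
    start = offset + HEADER_A.size
    end = start + (raw_size & (CCB - 1))  # low 63 bits are the size
    if end > len(data):
        raise ValueError("truncated MMB payload")
    return trans_id, method, bool(raw_size & CCB), data[start:end], end


def encode_close_a(trans_id: int) -> bytes:
    """A closing message: method MFFFF with no payload."""
    return encode_mmb_a(trans_id, CLOSE_METHOD, b"")


print(HEADER_A.size)  # 18
```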

The ID of a given transaction is counted differently depending on which end of the connection initiated the transaction. The client (the party which initiated the connection) uses positive transaction IDs, while the server (the party which accepted the connection) uses negative transaction IDs. Transaction IDs must be unique within the connection, and if all IDs have been used up, the connection must fail. Don't worry about this though, because the sun will have expanded to swallow Earth by then. Your connection will not last that long.
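The text fixes only the sign of each side's IDs and requires that IDs be unique; it does not say where counting starts. The sketch below makes a hypothetical assumption: each side counts away from zero, starting at 1 or -1. This keeps IDs unique without any coordination between the two ends. The class name is illustrative.

```python
I64_MAX = 2**63 - 1
I64_MIN = -(2**63)


class TransactionIDAllocator:
    """Hypothetical allocator: the client counts up from 1, the server down from -1."""

    def __init__(self, is_client: bool) -> None:
        self._step = 1 if is_client else -1
        self._next = self._step

    def allocate(self) -> int:
        """Return a fresh ID, or fail once the I64 range is exhausted."""
        if not I64_MIN <= self._next <= I64_MAX:
            raise RuntimeError("transaction IDs exhausted; the connection must fail")
        trans_id = self._next
        self._next += self._step
        return trans_id


client, server = TransactionIDAllocator(True), TransactionIDAllocator(False)
print(client.allocate(), client.allocate(), server.allocate())  # 1 2 -1
```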

Message Chunking

The most significant bit of the payload size field of an MMB is called the Chunk Control Bit (CCB). If the CCB of a given MMB is zero, the message it represents is self-contained and its data is processed immediately. If the CCB is one, the message is chunked, and the data of the current MMB is the first chunk. The data of subsequent MMBs sent along the same transaction is appended to the message until an MMB with a zero CCB is read. That MMB is the last chunk, and any further MMBs are interpreted normally.
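A receiver can reassemble chunked messages by buffering per transaction, since MMBs from other transactions may be interleaved on the same stream. The sketch below takes decoded MMB fields as input. The text does not say which method applies to a reassembled message, so this sketch assumes the method of the first chunk. The class and method names are hypothetical.

```python
class Reassembler:
    """Collects chunked METADAPT-A messages, tracked separately per transaction."""

    def __init__(self) -> None:
        # trans_id -> (method of the first chunk, data received so far)
        self._pending: dict[int, tuple[int, bytearray]] = {}

    def feed(self, trans_id: int, method: int, chunked: bool, data: bytes):
        """Process one MMB; return (method, payload) once a message is complete, else None."""
        if trans_id in self._pending:
            first_method, buffer = self._pending[trans_id]
            buffer.extend(data)
            if chunked:  # CCB still set: more chunks follow
                return None
            del self._pending[trans_id]  # CCB clear: this was the last chunk
            return first_method, bytes(buffer)
        if chunked:  # CCB set on a new message: start buffering
            self._pending[trans_id] = (method, bytearray(data))
            return None
        return method, bytes(data)  # CCB clear: self-contained message


r = Reassembler()
print(r.feed(1, 0x0100, True, b"he"))    # None
print(r.feed(2, 0x0200, False, b"x"))    # (512, b'x')
print(r.feed(1, 0x0100, False, b"llo"))  # (256, b'hello')
```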

METADAPT-B

METADAPT-B requires a transport which offers multiple multiplexed full-duplex data streams per connection that can be created and destroyed on demand. Each data stream is used as an individual transaction. Each MMB begins with a 10-octet header containing the method and then the payload size (in octets), encoded as a U16 and a U64 respectively. The remainder of the MMB is the payload. Since each MMB is self-describing, MMBs are sent sequentially with no gaps between them.
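Each METADAPT-B stream carries exactly one transaction, so the header omits the transaction ID and is 2 + 8 = 10 octets. The sketch below frames and splits MMBs in Python. For simplicity it parses a stream's bytes from a single buffer, whereas a real implementation would read incrementally from the QUIC stream. It also treats the whole U64 as the payload size, since chunking is only described for METADAPT-A. Names are illustrative.

```python
import struct

# U16 method, U64 payload size (big-endian, no padding).
HEADER_B = struct.Struct(">HQ")


def encode_mmb_b(method: int, payload: bytes) -> bytes:
    """Build one METADAPT-B MMB for a single stream (transaction)."""
    return HEADER_B.pack(method, len(payload)) + payload


def iter_mmbs_b(stream_data: bytes):
    """Yield (method, payload) for each MMB found back to back in a stream's bytes."""
    offset = 0
    while offset < len(stream_data):
        if offset + HEADER_B.size > len(stream_data):
            raise ValueError("truncated MMB header")
        method, size = HEADER_B.unpack_from(stream_data, offset)
        start = offset + HEADER_B.size
        end = start + size
        if end > len(stream_data):
            raise ValueError("truncated MMB payload")
        yield method, stream_data[start:end]
        offset = end


stream = encode_mmb_b(0x0100, b"ping") + encode_mmb_b(0x0101, b"")
print(list(iter_mmbs_b(stream)))  # [(256, b'ping'), (257, b'')]
```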

The ID of each transaction reflects the ID of its corresponding stream. The lifetime of the transaction is tied to the lifetime of the stream: the transaction "opens" when the stream opens and "closes" when the stream closes.


  1. Array types are named by appending "Array" to the name of the element type. For example, an array of I32 would be written as I32Array. StringArray still follows this rule, even though it is encoded differently from other arrays. Nesting arrays inside of arrays is prohibited. In most cases this restriction can be worked around by making effective use of the table structure, or by improving the design of your protocol.