# HOPP Protocol Definition

## Connections
A connection refers to a network connection between a client and server, or two
networked parties in general. Connections allow for the creation of
transactions. When the connection is closed by one party, it is closed for the
other party and all active transactions are closed as well.

## Transactions
A transaction refers to a sequence of messages within a connection. Transactions
may be closed independently of the connection they are a part of. Transactions
provide multiplexing capability, and are useful for request/response sequences
and event subscriptions. Each transaction carries a transaction ID, which is
represented as a signed 64-bit integer. The value of the transaction ID is
dependent on which transport is being used.

## Messages
A message refers to a block of octets sent within a transaction, paired with an
unsigned 16-bit method code. The order of messages within a given transaction is
preserved, but the order of messages across the entire connection is not
guaranteed.

The message payload must be 65,535 (unsigned 16-bit integer limit) octets or
smaller in length. This does not include the method code. Applications are free
to send whatever data they wish as the payload, but TAPE is recommended for
encoding it.

Method codes should be written in upper-case base 16 with the prefix "M" in
logs, error messages, documentation, etc. For example, the method code 62,670 in
decimal would be written as MF4CE. The application may choose any method codes,
but groups of similar methods should be placed at consecutive intervals of
M0100. Method codes MFF00-MFFFF are reserved for use by HOPP and its constituent
protocols. Individuals or entities with the SWAG (secret wheel access group)
pass are also permitted to define their own methods within this range. I'm just
fucking with you.

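As a minimal sketch of the notation described above, the following Python
helper formats a method code for logs. The names `format_method_code` and
`is_reserved` are hypothetical and not part of HOPP, and padding to four hex
digits is an assumption inferred from the M0100 example.

```python
def format_method_code(code: int) -> str:
    """Render a 16-bit method code as 'M' plus four upper-case hex digits."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError(f"method code out of U16 range: {code}")
    # Assumption: codes are zero-padded to four digits, as in "M0100".
    return f"M{code:04X}"


def is_reserved(code: int) -> bool:
    """Return True for codes in the range reserved for HOPP (MFF00-MFFFF)."""
    return 0xFF00 <= code <= 0xFFFF


print(format_method_code(0xF4CE))  # MF4CE
print(format_method_code(0x0100))  # M0100
print(is_reserved(0xFFFF))         # True
```
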
## Table Pair Encoding (TAPE)
The Table Pair Encoding (TAPE) scheme is a method for encoding structured data
within HOPP messages. It defines standard binary encoding methods for common
data types, as well as a corruption-resistant table structure that maps numeric
IDs to values. It is designed to allow applications to be presented with data
they are not equipped to handle while continuing to function normally. This
enables backwards-compatible application protocol changes.

### Table Structure
A table is divided into two sections: the header, and the values. The header
begins with the number (U16) of pairs in the table, which is then followed by
that many tag-offset pairs. A tag-offset pair consists of a numerical (U16) tag,
followed by the position (U16) of the value relative to the start of the values
section. The values section contains the value data for each pair, where the
start of each value is determined by its offset, and the end is determined by
the offset of the next value, or the end of the message if there is no value
after it.

Both sections must be in the same order, and because of this, each value offset
must be greater than or equal to the one before it. If a message has erratic
structure (such as unordered or out-of-bounds offsets), implementations may opt
to discard only the erratic pairs, as well as the pairs directly before those.

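The layout above can be sketched in Python as follows. This is an illustration
under stated assumptions rather than a reference implementation: the function
names `encode_table` and `decode_table` are hypothetical, and the decoder
rejects an erratic table outright instead of discarding individual pairs, which
is the stricter of the behaviours the text allows.

```python
import struct


def encode_table(pairs: list[tuple[int, bytes]]) -> bytes:
    """Encode (tag, value) pairs as a header followed by a values section."""
    header = struct.pack(">H", len(pairs))
    values = b""
    for tag, value in pairs:
        # Each offset is relative to the start of the values section.
        header += struct.pack(">HH", tag, len(values))
        values += value
    return header + values


def decode_table(data: bytes) -> dict[int, bytes]:
    """Decode a table, rejecting unordered or out-of-bounds offsets."""
    (count,) = struct.unpack_from(">H", data, 0)
    values_start = 2 + 4 * count
    if values_start > len(data):
        raise ValueError("header is longer than the message")
    entries = [struct.unpack_from(">HH", data, 2 + 4 * i) for i in range(count)]
    values = data[values_start:]
    table = {}
    for i, (tag, offset) in enumerate(entries):
        # A value ends where the next one starts, or at the end of the message.
        end = entries[i + 1][1] if i + 1 < count else len(values)
        if offset > end or end > len(values):
            raise ValueError(f"erratic offset for tag {tag}")
        table[tag] = values[offset:end]
    return table


encoded = encode_table([(1, b"hi"), (2, b"\x00\x2a")])
print(encoded.hex())          # 000200010000000200026869002a
print(decode_table(encoded))  # {1: b'hi', 2: b'\x00*'}
```

A receiver that does not recognise a tag can simply ignore that entry in the
returned dictionary, which is what makes the format tolerant of protocol
additions.
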
### Data Value Types
The table below lists all data value types supported by TAPE.

| Name        | Size            | Description                 | Encoding Method
| ----------- | --------------: | --------------------------- | ---------------
| I8          |               1 | A signed 8-bit integer      | BETC
| I16         |               2 | A signed 16-bit integer     | BETC
| I32         |               4 | A signed 32-bit integer     | BETC
| I64         |               8 | A signed 64-bit integer     | BETC
| U8          |               1 | An unsigned 8-bit integer   | BEU
| U16         |               2 | An unsigned 16-bit integer  | BEU
| U32         |               4 | An unsigned 32-bit integer  | BEU
| U64         |               8 | An unsigned 64-bit integer  | BEU
| Array[^1]   |         SOP[^2] | An array of any above type  | PASTA
| String      |             N/A | A UTF-8 string              | UTF-8
| StringArray | n * 2 + SOP[^2] | An array of the String type | VILA

[^1]: Array types are written as `<E>Array`, where `<E>` is the element type.
For example, an array of I32 would be written as I32Array. StringArray still
follows this rule, even though it is encoded differently from other arrays.
Nesting arrays inside of arrays is prohibited. This problem can be avoided in
most cases by effectively utilizing the table structure, or by improving the
design of your protocol.

[^2]: SOP (sum of parts) refers to the sum of the size of every item in a data
structure.

### Encoding Methods
Below are all encoding methods supported by TAPE.

#### BETC
Big-Endian, Two's Complement signed integer. The size is defined as the
smallest number of whole octets which can fit all bits in the integer,
regardless of whether the bits are on or off. Therefore, the size cannot change
at runtime.

#### BEU
Big-Endian, Unsigned integer. The size is defined as the smallest number of
whole octets which can fit all bits in the integer, regardless of whether the
bits are on or off. Therefore, the size cannot change at runtime.

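Both integer encodings map directly onto Python's built-in `int.to_bytes` and
`int.from_bytes`. The wrapper names below are hypothetical and exist only to
make the mapping explicit; this is a sketch, not part of HOPP.

```python
def encode_betc(value: int, size: int) -> bytes:
    """Big-endian two's complement; raises OverflowError if it does not fit."""
    return value.to_bytes(size, "big", signed=True)


def decode_betc(data: bytes) -> int:
    return int.from_bytes(data, "big", signed=True)


def encode_beu(value: int, size: int) -> bytes:
    """Big-endian unsigned; raises OverflowError if negative or too large."""
    return value.to_bytes(size, "big", signed=False)


def decode_beu(data: bytes) -> int:
    return int.from_bytes(data, "big", signed=False)


print(encode_betc(-2, 2).hex())  # fffe      (an I16)
print(encode_beu(258, 4).hex())  # 00000102  (a U32)
```
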
#### PASTA
Packed Single-Type Array. The size is defined as the size of an individual item
times the number of items. Items are placed one after the other with no gaps
between them, except as required to align the start of each item to the
nearest whole octet. Items should be of the same type and must be of the same
size.

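A packed array of integers can be sketched as below. The function names are
hypothetical, and the sketch assumes the item count is implied by the length of
the value (total size divided by item size), since the description above
mentions no separate count field.

```python
def encode_pasta(items: list[int], item_size: int, signed: bool) -> bytes:
    """Concatenate fixed-size big-endian integers with no gaps."""
    return b"".join(i.to_bytes(item_size, "big", signed=signed) for i in items)


def decode_pasta(data: bytes, item_size: int, signed: bool) -> list[int]:
    """Split the value into item_size slices and decode each one."""
    if len(data) % item_size != 0:
        raise ValueError("value length is not a multiple of the item size")
    return [
        int.from_bytes(data[i:i + item_size], "big", signed=signed)
        for i in range(0, len(data), item_size)
    ]


packed = encode_pasta([1, -1], 4, signed=True)  # an I32Array
print(packed.hex())                             # 00000001ffffffff
print(decode_pasta(packed, 4, signed=True))     # [1, -1]
```
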
#### UTF-8
UTF-8 string. The size is defined as the smallest number of whole octets which
can fit all bits in the string, regardless of whether the bits are on or off.
The size of this type is not fixed and may change at runtime, so this needs to
be accounted for during use.

#### VILA
Variable Item Length Array. The size is defined as the smallest number of whole
octets which can fit each item plus one U16 per item. The size of this type is
not fixed and may change at runtime, so this needs to be accounted for during
use. The number of items must be greater than zero. Items are each prefixed by
their size (in octets) encoded as a U16, and they are placed one after the other
with no gaps between them, except as required to align the start of each item
to the nearest whole octet. Items should be of the same type but do not need to
be of the same size.

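The length-prefixed layout above can be sketched for a StringArray as follows.
The names `encode_vila` and `decode_vila` are hypothetical, and the decoder
assumes it is handed exactly the octets of one value (for example, a slice
taken from a table).

```python
import struct


def encode_vila(items: list[str]) -> bytes:
    """Prefix each UTF-8 item with its size in octets as a U16."""
    if not items:
        raise ValueError("VILA requires at least one item")
    out = bytearray()
    for item in items:
        raw = item.encode("utf-8")
        # struct.pack raises struct.error if an item exceeds 65,535 octets.
        out += struct.pack(">H", len(raw)) + raw
    return bytes(out)


def decode_vila(data: bytes) -> list[str]:
    """Walk the value, reading one size prefix and one item at a time."""
    items = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated size prefix")
        (size,) = struct.unpack_from(">H", data, pos)
        pos += 2
        if pos + size > len(data):
            raise ValueError("truncated item")
        items.append(data[pos:pos + size].decode("utf-8"))
        pos += size
    return items


packed = encode_vila(["hi", "héllo"])
print(packed.hex())         # 00026869000668c3a96c6c6f
print(decode_vila(packed))  # ['hi', 'héllo']
```

The encoded length here is 12 octets, matching the `n * 2 + SOP` size from the
data value type table: two prefixes of 2 octets each plus 2 + 6 octets of item
data.
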
## Transports
A transport is a protocol that HOPP connections can run on top of. HOPP
currently supports the QUIC transport protocol for communicating between
machines, and UNIX domain sockets for quicker communication among applications
on the same machine. Both protocols are supported through METADAPT.

## Message and Transaction Demarcation Protocol (METADAPT)
The Message and Transaction Demarcation Protocol is used to break one or more
reliable data streams into transactions, which are broken down further into
messages. The representation of a message (or a part thereof) on the protocol,
including its associated metadata (length, transaction, method, etc.) is
referred to as a METADAPT Message Block (MMB).

For transports that offer multiple multiplexed data streams that can be created
and destroyed on-demand (such as QUIC), each stream is used as a transaction. If
METADAPT is both multiplexing transactions and demarcating messages, it is
referred to as METADAPT-A. If it is only demarcating messages, it is referred to
as METADAPT-B. METADAPT-A is used over UNIX domain sockets for IPC while
METADAPT-B is used over QUIC for communication over networks such as the
Internet.

### METADAPT-A
METADAPT-A requires a transport which offers a single full-duplex data stream
that persists for the duration of the connection. All transactions are
multiplexed onto this single stream. Each MMB contains an 18-octet long header,
with the transaction ID, then the method, and then the payload size (in octets).
The transaction ID is encoded as an I64, the method is encoded as a U16 and the
payload size is encoded as a U64. Only the 63 least significant bits of the
payload size describe the actual size; the most significant bit controls
chunking. See the section on chunking for more information.

The remainder of the message is the payload. Since each MMB is self-describing,
they are sent sequentially with no gaps between them.

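As an illustration of the MMB layout, the sketch below packs and unpacks one
METADAPT-A block with Python's `struct` module. It assumes the header is simply
the three fields in the order and types given above (I64, U16, U64), which
packs to 18 octets. The names `encode_mmb_a` and `decode_mmb_a` are
hypothetical.

```python
import struct

# ">" selects big-endian with no padding: q = I64, H = U16, Q = U64.
HEADER_A = struct.Struct(">qHQ")
CCB = 1 << 63        # most significant bit of the payload size field
SIZE_MASK = CCB - 1  # the 63 least significant bits


def encode_mmb_a(transaction_id: int, method: int, payload: bytes,
                 more_chunks: bool = False) -> bytes:
    """Build one MMB: header followed immediately by the payload."""
    size_field = len(payload) | (CCB if more_chunks else 0)
    return HEADER_A.pack(transaction_id, method, size_field) + payload


def decode_mmb_a(data: bytes):
    """Return (transaction_id, method, more_chunks, payload, rest)."""
    transaction_id, method, size_field = HEADER_A.unpack_from(data, 0)
    end = HEADER_A.size + (size_field & SIZE_MASK)
    if end > len(data):
        raise ValueError("incomplete MMB")
    payload = data[HEADER_A.size:end]
    return transaction_id, method, bool(size_field & CCB), payload, data[end:]


block = encode_mmb_a(5, 0x0100, b"abc")
print(HEADER_A.size)        # 18
print(decode_mmb_a(block))  # (5, 256, False, b'abc', b'')
```
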
Transactions "open" when the first message with a given transaction ID is sent.
They "close" when a closing message is sent by either side. A closing message
has method MFFFF and should not have a payload.

The ID of a given transaction is counted differently depending on which end of
the connection initiated the transaction. The client (the party which initiated
the connection) uses positive transaction IDs, while the server (the party
which accepted the connection) uses negative transaction IDs. Transaction IDs
must be unique within the connection, and if all IDs have been used up, the
connection must fail. Don't worry about this though, because the sun will have
expanded to swallow Earth by then. Your connection will not last that long.

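One possible way to hand out IDs under the rule above is a simple counter per
side. This is only a sketch: the class name `TransactionIDs` is hypothetical,
and starting at 1 (client) or -1 (server), counting sequentially, and leaving
zero unused are all assumptions that the text does not spell out.

```python
I64_MAX = 2**63 - 1
I64_MIN = -(2**63)


class TransactionIDs:
    """Allocate unique transaction IDs for one side of a connection."""

    def __init__(self, is_client: bool):
        # Assumption: the client counts up from 1, the server down from -1.
        self._step = 1 if is_client else -1
        self._last = 0

    def next(self) -> int:
        candidate = self._last + self._step
        if not I64_MIN <= candidate <= I64_MAX:
            # All IDs for this side are used up, so the connection must fail.
            raise ConnectionError("transaction IDs exhausted")
        self._last = candidate
        return candidate


client = TransactionIDs(is_client=True)
server = TransactionIDs(is_client=False)
print(client.next(), client.next())  # 1 2
print(server.next(), server.next())  # -1 -2
```

Because the two sides draw from disjoint ranges, neither needs to coordinate
with the other before opening a transaction.
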
#### Message Chunking

The most significant bit of the payload size field of an MMB is called the Chunk
Control Bit (CCB). If the CCB of a given MMB is zero, the represented message is
interpreted as being self-contained and the data is processed immediately. If
the CCB is one, the message is interpreted as being chunked, with the data of
the current MMB being the first chunk. The data of further MMBs sent along the
transaction will be appended to the message until an MMB is read with a zero
CCB, at which point that MMB is the last chunk and any further MMBs are
interpreted as normal.

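The reassembly rule above can be sketched as a small per-transaction buffer.
The class name `ChunkAssembler` and its `feed` method are hypothetical, and the
sketch only tracks payload data; how the method code of a chunked message is
treated is not covered here.

```python
from typing import Optional


class ChunkAssembler:
    """Collect chunked MMB payloads until a zero CCB completes the message."""

    def __init__(self):
        self._partial = {}  # transaction ID -> bytearray of chunks so far

    def feed(self, transaction_id: int, ccb: bool,
             data: bytes) -> Optional[bytes]:
        """Return the complete message, or None if more chunks are expected."""
        buffer = self._partial.setdefault(transaction_id, bytearray())
        buffer += data
        if ccb:
            # CCB is one: this is not the last chunk, keep waiting.
            return None
        # CCB is zero: this MMB is self-contained or the final chunk.
        del self._partial[transaction_id]
        return bytes(buffer)


assembler = ChunkAssembler()
print(assembler.feed(1, True, b"he"))    # None
print(assembler.feed(2, False, b"x"))    # b'x'
print(assembler.feed(1, False, b"llo"))  # b'hello'
```

Keeping one buffer per transaction ID lets chunks from different transactions
interleave on the shared stream without corrupting each other.
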
### METADAPT-B
METADAPT-B requires a transport which offers multiple multiplexed full-duplex
data streams per connection that can be created and destroyed on-demand. Each
data stream is used as an individual transaction. Each MMB contains a 10-octet
long header with the method and then the payload size (in octets) encoded as a
U16 and U64 respectively. The remainder of the message is the payload. Since
each MMB is self-describing, they are sent sequentially with no gaps between
them.

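A minimal sketch of writing and reading METADAPT-B blocks on a single stream is
shown below, using `io.BytesIO` as a stand-in for a transport stream. The
function names are hypothetical. The sketch assumes the header is the U16
method followed by the U64 payload size (10 octets when packed) and treats the
whole size field as a length, since this section does not say whether the
Chunk Control Bit applies here. A real stream may return short reads, which
this sketch does not handle.

```python
import io
import struct
from typing import BinaryIO, Optional, Tuple

# ">" selects big-endian with no padding: H = U16 method, Q = U64 size.
HEADER_B = struct.Struct(">HQ")


def write_mmb_b(stream: BinaryIO, method: int, payload: bytes) -> None:
    """Write one MMB: header followed immediately by the payload."""
    stream.write(HEADER_B.pack(method, len(payload)) + payload)


def read_mmb_b(stream: BinaryIO) -> Optional[Tuple[int, bytes]]:
    """Read one MMB, or return None if the stream has ended cleanly."""
    header = stream.read(HEADER_B.size)
    if not header:
        return None  # stream closed, so the transaction is closed too
    if len(header) < HEADER_B.size:
        raise EOFError("stream ended inside an MMB header")
    method, size = HEADER_B.unpack(header)
    payload = stream.read(size)
    if len(payload) < size:
        raise EOFError("stream ended inside an MMB payload")
    return method, payload


stream = io.BytesIO()
write_mmb_b(stream, 0x0100, b"ping")
write_mmb_b(stream, 0x0101, b"")
stream.seek(0)
print(read_mmb_b(stream))  # (256, b'ping')
print(read_mmb_b(stream))  # (257, b'')
print(read_mmb_b(stream))  # None
```
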
The ID of any transaction will reflect the ID of its corresponding stream. The
lifetime of the transaction is tied to the lifetime of the stream, that is to
say the transaction "opens" when the stream opens and "closes" when the stream
closes.