Need to support larger messages, arrays #2
16 KiB is a lot of data sometimes, but other times it really isn't. It's in the territory where some data could very well go over the limit and cause random issues. The reason it's like this is that large messages could clog the line when using METADAPT-A, blocking other transactions while a gigabyte or more of data is transferred, and it could enable DoS attacks on applications that face the internet (or ingest a lot of data from the user without checking for an upper limit).
For METADAPT-B, this could be solved by simply increasing the message size field to a U64 (like in websockets), because with QUIC each stream is flow-controlled individually. For METADAPT-A, we would need to have the first bit of the message length be a "chunk" bit, meaning the data of the next message in the same transaction should be appended onto the data in this message, and the "chain" of chunks would end when a message doesn't have that bit set. Multiplexing with other transactions would be unaffected.
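To make the idea concrete, here's a rough sketch of the sending side of that scheme. Everything here is hypothetical: `frame_chunks`, `CHUNK_FLAG`, and `MAX_CHUNK` are made-up names, and the 16-bit length field with the top bit repurposed as the chunk flag is only an assumption for illustration; the real field width would be whatever METADAPT-A actually uses.

```rust
/// Hypothetical 16-bit length field where the top bit is the proposed
/// "chunk" (continuation) flag, leaving 15 bits for the chunk's own length.
const CHUNK_FLAG: u16 = 0x8000;
const MAX_CHUNK: usize = 0x7FFF; // most data a single message could then carry

/// Split a payload into per-message frames. Every frame except the last has
/// the chunk flag set, telling the receiver to append the next message's
/// data in the same transaction onto this one.
fn frame_chunks(payload: &[u8]) -> Vec<(u16, &[u8])> {
    let mut frames = Vec::new();
    let mut rest = payload;
    loop {
        let take = rest.len().min(MAX_CHUNK);
        let (chunk, tail) = rest.split_at(take);
        let more = !tail.is_empty();
        let header = chunk.len() as u16 | (if more { CHUNK_FLAG } else { 0 });
        frames.push((header, chunk));
        if !more {
            return frames;
        }
        rest = tail;
    }
}
```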
For DOS mitigation, a good solution would be to allow the protocol to impose a size limit on messages (including chunked ones) that would default to maybe like 1 megabyte. An application still might want to send unbounded streams of data though—so there could be an alternate way of reading the next message in a transaction that would return a reader, which would be fed chunks and closed on end. This would work, because imagine this scenario:
A protocol defines a file transfer transaction that starts with a Get message from the client and ends with a Return message from the server. The Return message is not TAPE-encoded; it contains the raw file contents.
A client initiates the transaction by sending the Get message, and then requests the next message from the API as a reader.
The server sends the Return message using an alternate method where it receives a writer from the API. It pipes the file data into the message and then closes the writer, which flushes the buffer and sends the final message.
The client is returned the method code, alongside a reader which is fed new chunks as they arrive.
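Here's roughly what that streaming API could look like. All of the names (`Transaction`, `next_message_reader`, `send_message_writer`) are hypothetical and only meant to pin down the shape described above; error handling and the exact close semantics would need more thought.

```rust
use std::io::{self, Read, Write};

/// Hypothetical transaction handle, sketching the proposed additions.
trait Transaction {
    /// Existing style of use: read the next message fully into memory,
    /// subject to the protocol's configured size limit.
    fn next_message(&mut self) -> io::Result<Vec<u8>>;

    /// Proposed alternative for the receiver: get the next message as a
    /// reader that is fed chunks as they arrive and reports EOF once the
    /// final (non-chunked) message of the chain has been consumed.
    fn next_message_reader(&mut self) -> io::Result<Box<dyn Read + '_>>;

    /// Proposed alternative for the sender: get a writer; data piped into it
    /// is split into chunked messages, and closing it flushes the buffer and
    /// sends the final message of the chain.
    fn send_message_writer(&mut self, code: u16) -> io::Result<Box<dyn Write + '_>>;
}

/// Example server side of the scenario: stream a file into the Return
/// message without ever buffering the whole thing.
fn send_file(tx: &mut dyn Transaction, code: u16, file: &mut std::fs::File) -> io::Result<()> {
    let mut writer = tx.send_message_writer(code)?;
    io::copy(file, &mut writer)?; // pipe the file contents through in chunks
    Ok(()) // dropping the writer stands in for "close" in this sketch
}
```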
Additionally, TAPE could use a redesign. It does not need to be corruption-resistant, because a slightly corrupt message should just be rejected outright, and it needs to handle bigger data anyway if these changes are to mean anything. Plain TLV with a U16 tag and a U32 length would work well; the reason the length isn't a U64 is that accepting a buffer of that size would open the floodgates to DoS attacks, whereas a U32 is at least reasonable.
However, this would end up wasting a lot of data. Most fields aren't going to be anywhere close to a U32 in size, so on average 3 or 4 of those length bytes would be zeros, which means the TL would in many cases be longer than the V. To solve this, there could be some sort of expanding integer format that works a bit like the METADAPT-A chunks: the first bit of every byte would be a chunk bit, and all byte chunks would be big-endian'd together (not counting the chunk bit). Because the vast majority of data is likely going to be 127 bytes or shorter, this would allow the vast majority of Ls to be only one byte long, while arbitrarily large fields would still be possible.
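A small sketch of that expanding length format, just to pin the idea down: 7-bit groups written big-endian, with the top bit of each byte as the chunk bit (1 = another byte follows). The function names are made up.

```rust
/// Encode a length as big-endian 7-bit chunks with a continuation bit.
fn encode_chunked_len(value: u64) -> Vec<u8> {
    // Number of 7-bit groups needed (at least one, so 0 still emits a byte).
    let groups = ((64 - value.leading_zeros()).max(1) + 6) / 7;
    let mut out = Vec::with_capacity(groups as usize);
    for i in (0..groups).rev() {
        let group = ((value >> (i * 7)) & 0x7F) as u8;
        // Set the chunk bit on every byte except the last one.
        out.push(group | (if i > 0 { 0x80 } else { 0 }));
    }
    out
}

/// Decode a chunked length, returning the value and how many bytes it used.
/// A real decoder would also cap the chunk count to avoid overflow.
fn decode_chunked_len(input: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in input.iter().enumerate() {
        value = (value << 7) | (byte & 0x7F) as u64; // big-endian accumulation
        if byte & 0x80 == 0 {
            return Some((value, i + 1)); // chunk bit clear: chain ends here
        }
    }
    None // ran out of bytes before the chain ended
}
```

With this, anything up to 127 bytes long costs a single length byte, so the 6-byte TL of the plain U16+U32 scheme shrinks to 3 bytes in the common case.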
As an example of what this costs compared to a fixed-size length field, here's how many length chunks it would take to describe 1 megabyte of data:
Binary form:
100000000000000000000
Chunking:
1000000 0000000 0000000
Adding chunk bits:
11000000 10000000 00000000
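For reference, the encoder sketched earlier produces exactly those three bytes:

```rust
// 1 MiB = 2^20; encodes to 0b11000000 0b10000000 0b00000000 (3 bytes).
assert_eq!(encode_chunked_len(1 << 20), [0b1100_0000, 0b1000_0000, 0b0000_0000]);
```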
As can be seen, chunked encoding always uses either the fewest bytes or ties for the fewest. I'd call that an epic win.
TAPE could have a DoS mitigation measure similar to METADAPT's, where the protocol can define a maximum field length that defaults to around 1 megabyte. The limit would apply to all fields, because to enable backwards-compatible protocols, field lengths must be self-describing and not dependent on their type.
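A possible shape for that limit, with `TapeLimits` and `check_field_len` being made-up names and the 1 MiB default taken from the suggestion above:

```rust
/// Hypothetical per-protocol limits for TAPE decoding.
struct TapeLimits {
    max_field_len: u64,
}

impl Default for TapeLimits {
    fn default() -> Self {
        TapeLimits { max_field_len: 1 << 20 } // roughly 1 megabyte by default
    }
}

/// Applied right after a field's chunked length is decoded and before any
/// buffer is allocated, so a hostile peer can't make the parser reserve huge
/// amounts of memory just by sending a large length prefix.
fn check_field_len(limits: &TapeLimits, decoded_len: u64) -> Result<usize, String> {
    if decoded_len > limits.max_field_len {
        Err(format!(
            "field of {decoded_len} bytes exceeds the {} byte limit",
            limits.max_field_len
        ))
    } else {
        Ok(decoded_len as usize)
    }
}
```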
VILA also needs to be altered to support chunked length encoding, and should probably draw its maximum element length from TAPE.