message-size-increase #3
123
design/branched-generated-encoder.md
Normal file
123
design/branched-generated-encoder.md
Normal file
@ -0,0 +1,123 @@
|
||||
# Branched Generated Decoder
|
||||
|
||||
Pasted here because Tebitea is down
|
||||
|
||||
## The problem
|
||||
|
||||
TAPE is designed so that the decoder can gloss over data it does not understand.
|
||||
Technically the protocol allows for this, but I completely forgot to implement
|
||||
this in the generated decoder, oops. This would be trivial if TAPE messages were
|
||||
still flat tables, but they aren't, because those aren't useful enough. So,
|
||||
let's analyze the problem.
|
||||
|
||||
## When it happens
|
||||
|
||||
There are two reasons something might not match up with the expected data:
|
||||
|
||||
The first and most obvious is unrecognized keys. If the key is not in the set of
|
||||
recognized keys for a KTV, it should leave the corresponding struct field blank.
|
||||
Once #6 has been implemented, throw an error if the data was not optional.
|
||||
|
||||
The second is wrong types. If we are expecting KTV and get SBA, we should leave
|
||||
the data as empty. The aforementioned concern about #6 also applies here. We
|
||||
don't need to worry about special cases at the structure root, because it would
|
||||
be technically possible to make the structure root an option, so it really is
|
||||
just a normal value. Until #6, we will leave that blank too.
|
||||
|
||||
## Preliminary ideas
|
||||
|
||||
The first is going to be pretty simple. All we need to do is have a skimmer
|
||||
function that skims over TAPE data very, and then call that on the KTV value
|
||||
each time we run into a mystery key. It should only return an error if the
|
||||
structure of the data is malformed in such a way that it cannot continue to the
|
||||
next one. This should be stored in the tape package alongside the dynamic
|
||||
decoding functions, because they will essentially function the same way and
|
||||
could probably share lots of code.
|
||||
|
||||
The second is a bit more complicated because of the existence of KTV and OTA
|
||||
because they are aggregate types. Go types work a bit differently, as if you
|
||||
have an array of an array of an array of ints, that information is represented
|
||||
in one place, whereas TAPE doesn't really do that. All of that information is
|
||||
sort of buried within the data structure, so we don't know what we will be
|
||||
decoding before we actually do it. Whenever we encounter a type we don't expect,
|
||||
we would need to abort decoding of the entire data structure, and then skim over
|
||||
whatever detritus is left, which would literally be in a half-decoded state. The
|
||||
fact that the code is generated flat and thus cannot use return or defer
|
||||
statements contributes to the complexity of this problem. We need to go up, but
|
||||
we can't. There is no up, only forward.
|
||||
|
||||
Of course, the dynamic decoder does not have this problem in the first place
|
||||
because it doesn't expect anything, and constructs the destination to fit
|
||||
whatever it sees in the TAPE structure as it is decoding it. KTVs are completely
|
||||
dynamic because they are implemented as maps, so the only time it needs to
|
||||
completely comprehend a type is with OTAs. There is a function called typeOf
|
||||
that gets the type of the current tag and returns it as a reflect.Type, which
|
||||
necessitates recursion and peeking at OTAs and their elements.
|
||||
|
||||
We could try to do the same thing in the generated decoder, comparing the
|
||||
determined type against the expected type to try to figure out whether we should
|
||||
decode an array or a table, etc. This is immediately problematic as it requires
|
||||
memory to be allocated, both for the peek buffer and the resulting tree of type
|
||||
information. If we end up with some crazy way to keep track of the types, that's
|
||||
only one half of the allocation problem and we would still be spending extra
|
||||
cycles going over all of that twice.
|
||||
|
||||
## Performance constraints
|
||||
|
||||
The generated decoder is supposed to blaze through data, and it can't do that if
|
||||
it does all the singing and dancing that the dynamic decoder does. It's time for
|
||||
some performance constraints:
|
||||
|
||||
- No allocations, except as required to build the destination for the data
|
||||
- No redundant work
|
||||
- So, no freaking peeking
|
||||
- It should take well under 500 lines of generated code to decode one message of
|
||||
reasonable size (i.e. be careful not to bloat the binary)
|
||||
|
||||
I'm not really going to do my usual thing here of making a slow version and
|
||||
speeding it up over time based on evidence and experimentation because these
|
||||
constraints inform the design so much it would be impossible to continue without
|
||||
them. I am 99% confident that these constraints will allow for an acceptable
|
||||
baseline of performance (for generated code) and we can still profile and
|
||||
micro-optimize later. This is good enough for me.
|
||||
Heavy solution
|
||||
|
||||
There is a solution that might work very well which involves completely redoing
|
||||
the generated decoding code. We could create a function for every source type to
|
||||
destination type mapping that exists in protocol, and then compose them all
|
||||
together. The decoding methods for each message or type would be wrappers around
|
||||
the correct function for their root TAPE -> Go type mapping. The main benefit of
|
||||
this is it would make this problem a lot more manageable because the interface
|
||||
points between the data would be represented by function boundaries. This would
|
||||
allow the use of return and defer statements, and would allow more code sharing,
|
||||
producing a smaller binary. Go would probably inline these where needed.
|
||||
|
||||
Would this work? Probably. More investigation is required to make sure. I want
|
||||
to stop re-writing things I don't need to. On the other hand, it is just the
|
||||
decoder.
|
||||
|
||||
## Light solution
|
||||
|
||||
TODO: find a solution that satisfies the performance constraints, keeps the same
|
||||
identical interface, and works off the same code. I am convinced this is doable,
|
||||
and it might even allow us to extract more data from an unexpected structure.
|
||||
However, continuing this way might introduce unmanageable complexity. It is
|
||||
already a little unmanageable and I am just one pony (kind of).
|
||||
|
||||
## Implementation
|
||||
|
||||
Heavy solution is going to work here, applied to only the points of
|
||||
`Generator.generateDecodeValue` where it decodes an aggregate data structure.
|
||||
That way, only minimal amounts of code need to be redone.
|
||||
|
||||
Whenever a branch needs to happen, a call shall be generated, a deferred
|
||||
implementation request shall be added to a special FIFO queue within the
|
||||
generator. After generating data structures and their root decoding functions,
|
||||
the generator shall pick away at this queue until no requests remain. The
|
||||
generator shall accept new items during this process, so that recursion is
|
||||
possible. This is all to ensure it is only ever writing one function at a time
|
||||
|
||||
The functions shall take a pointer to a type that accepts any type like (~) the
|
||||
destination's base type. We should also probably just call
|
||||
`Generator.generateDecodeValue` directly on user defined types this way, keeping
|
||||
their public `Decode` methods just for convenience.
|
Loading…
x
Reference in New Issue
Block a user