diff --git a/design/branched-generated-encoder.md b/design/branched-generated-encoder.md
new file mode 100644
index 0000000..9360b6f
--- /dev/null
+++ b/design/branched-generated-encoder.md
@@ -0,0 +1,123 @@
# Branched Generated Decoder

Pasted here because Tebitea is down

## The problem

TAPE is designed so that the decoder can gloss over data it does not understand.
The protocol allows for this, but I completely forgot to implement it in the
generated decoder, oops. This would be trivial if TAPE messages were still flat
tables, but they aren't, because those aren't useful enough. So, let's analyze
the problem.

## When it happens

There are two reasons something might not match up with the expected data:

The first and most obvious is unrecognized keys. If a key is not in the set of
recognized keys for a KTV, the decoder should leave the corresponding struct
field blank. Once #6 has been implemented, it should throw an error if the data
was not optional.

The second is wrong types. If we are expecting KTV and get SBA, we should leave
the data empty. The aforementioned concern about #6 also applies here. We don't
need to worry about special cases at the structure root, because it would
technically be possible to make the structure root optional, so it really is
just a normal value. Until #6, we will leave that blank too.

## Preliminary ideas

The first case is going to be pretty simple. All we need to do is have a skimmer
function that skims over TAPE data, and then call that on the KTV value each
time we run into a mystery key. It should only return an error if the structure
of the data is malformed in such a way that it cannot continue to the next item.
This should live in the `tape` package alongside the dynamic decoding functions,
because they will essentially work the same way and could probably share lots of
code.

The second case is a bit more complicated, because KTV and OTA are aggregate
types. Go types work a bit differently: if you have an array of an array of an
array of ints, that information is all represented in one place, whereas TAPE
doesn't really do that. All of that information is buried within the data
itself, so we don't know what we will be decoding before we actually do it.
Whenever we encounter a type we don't expect, we would need to abort decoding of
the entire data structure and then skim over whatever detritus is left, which
would be in a half-decoded state. The fact that the code is generated flat, and
thus cannot use return or defer statements, contributes to the complexity of
this problem. We need to go up, but we can't. There is no up, only forward.

Of course, the dynamic decoder does not have this problem in the first place,
because it doesn't expect anything: it constructs the destination to fit
whatever it sees in the TAPE structure as it is decoding it. KTVs are completely
dynamic because they are implemented as maps, so the only time it needs to fully
comprehend a type up front is with OTAs. There is a function called `typeOf`
that gets the type of the current tag and returns it as a `reflect.Type`, which
necessitates recursion and peeking at OTAs and their elements.

We could try to do the same thing in the generated decoder, comparing the
determined type against the expected type to figure out whether we should decode
an array or a table, and so on.
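To make the cost of that visible, here is a rough sketch of what a
peek-then-compare step could look like. Every name in it is made up for
illustration; this is not how the real `tape` package or its `typeOf` works.

```go
package sketch

import (
	"fmt"
	"reflect"
)

// Tag stands in for however the tape package represents wire tags; the
// constants below are placeholders, not real TAPE tag values.
type Tag int

const (
	TagSBA Tag = iota
	TagOTA
	TagKTV
)

// peeker stands in for the lookahead this approach would need: seeing the next
// tag (and, for an OTA, its element) without consuming it, which means
// buffering bytes that will be read again later.
type peeker interface {
	PeekTag() (Tag, error)
	PeekOTAElement() (peeker, error)
}

// peekType builds a reflect.Type for whatever comes next so it can be compared
// against the expected Go type. Every OTA level is another recursive call, and
// the resulting tree of type information is itself an allocation.
func peekType(p peeker) (reflect.Type, error) {
	tag, err := p.PeekTag()
	if err != nil {
		return nil, err
	}
	switch tag {
	case TagKTV:
		// KTVs are dynamic anyway; the key/value types here are placeholders.
		var table map[uint16]any
		return reflect.TypeOf(table), nil
	case TagOTA:
		elem, err := p.PeekOTAElement()
		if err != nil {
			return nil, err
		}
		elemType, err := peekType(elem)
		if err != nil {
			return nil, err
		}
		return reflect.SliceOf(elemType), nil
	case TagSBA:
		// Scalars and SBAs would map to fixed Go types; []byte is a placeholder.
		return reflect.TypeOf([]byte(nil)), nil
	default:
		return nil, fmt.Errorf("malformed data: unknown tag %d", tag)
	}
}
```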
That approach is immediately problematic: it requires memory to be allocated,
both for the peek buffer and the resulting tree of type information. Even if we
came up with some crazy way to keep track of the types, that only solves half of
the allocation problem, and we would still be spending extra cycles going over
all of that twice.

## Performance constraints

The generated decoder is supposed to blaze through data, and it can't do that if
it does all the singing and dancing that the dynamic decoder does. It's time for
some performance constraints:

- No allocations, except as required to build the destination for the data
- No redundant work
- So, no freaking peeking
- It should take well under 500 lines of generated code to decode one message of
  reasonable size (i.e. be careful not to bloat the binary)

I'm not really going to do my usual thing here of making a slow version and
speeding it up over time based on evidence and experimentation, because these
constraints inform the design so much that it would be impossible to continue
without them. I am 99% confident that these constraints will allow for an
acceptable baseline of performance (for generated code), and we can still
profile and micro-optimize later. This is good enough for me.

## Heavy solution

There is a solution that might work very well, which involves completely redoing
the generated decoding code. We could create a function for every source type to
destination type mapping that exists in the protocol, and then compose them all
together. The decoding methods for each message or type would be wrappers around
the correct function for their root TAPE -> Go type mapping. The main benefit is
that it would make the problem a lot more manageable, because the interface
points between the data would be represented by function boundaries. This would
allow the use of return and defer statements, and would allow more code sharing,
producing a smaller binary. Go would probably inline these where needed.

Would this work? Probably. More investigation is required to make sure. I want
to stop re-writing things I don't need to. On the other hand, it is just the
decoder.

## Light solution

TODO: find a solution that satisfies the performance constraints, keeps the
exact same interface, and works off the same code. I am convinced this is
doable, and it might even allow us to extract more data from an unexpected
structure. However, continuing this way might introduce unmanageable complexity.
It is already a little unmanageable and I am just one pony (kind of).

## Implementation

The heavy solution is going to work here, applied only to the points in
`Generator.generateDecodeValue` where it decodes an aggregate data structure.
That way, only a minimal amount of code needs to be redone.

Whenever a branch needs to happen, a call shall be generated and a deferred
implementation request shall be added to a special FIFO queue within the
generator. After generating the data structures and their root decoding
functions, the generator shall pick away at this queue until no requests remain.
The generator shall accept new items during this process, so that recursion is
possible. This is all to ensure it is only ever writing one function at a time.

The branch functions shall take a pointer to a type parameter that accepts any
type whose underlying type (~) is the destination's base type. We should also
probably just call `Generator.generateDecodeValue` directly on user-defined
types this way, keeping their public `Decode` methods just for convenience.
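To make the queue mechanics a little more concrete, here is a rough sketch of
the bookkeeping described above. The type, field, and method names are made up;
the real `Generator` already exists and looks different, and `decodeFoo_bar`,
`tape.Decoder`, and `~[]int32` in the comment are only stand-ins for whatever
`generateDecodeValue` actually emits.

```go
package sketch

// funcRequest is a deferred request for one branch function. In practice it
// would carry whatever type information generateDecodeValue needs to emit the
// function body later.
type funcRequest struct {
	name string
}

// generator stands in for the real Generator; only the queue bookkeeping is
// sketched here.
type generator struct {
	queue []funcRequest // FIFO of branch functions that still need to be written
}

// request is called at each point where generateDecodeValue hits an aggregate
// data structure: it records the request and leaves only a call to the named
// function in the code currently being generated.
func (g *generator) request(req funcRequest) {
	g.queue = append(g.queue, req)
}

// drain runs after the data structures and their root decoding functions have
// been generated. Writing a queued function may enqueue further requests, so
// the loop re-checks the queue each time; the generator is still only ever
// writing one function at a time.
func (g *generator) drain() {
	for len(g.queue) > 0 {
		req := g.queue[0]
		g.queue = g.queue[1:]
		g.writeBranchFunc(req) // may call g.request again
	}
}

// writeBranchFunc would call into generateDecodeValue to emit something like:
//
//	func decodeFoo_bar[T ~[]int32](dec *tape.Decoder, dest *T) error { ... }
//
// that is, a function taking a pointer to any type whose underlying type is
// the destination's base type (hence the ~ constraint).
func (g *generator) writeBranchFunc(req funcRequest) {
	// body omitted in this sketch
}
```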