hopp/design/branched-generated-encoder.md

# Branched Generated Decoder
Pasted here because Tebitea is down
## The problem
TAPE is designed so that the decoder can gloss over data it does not understand.
Technically the protocol allows for this, but I completely forgot to implement
this in the generated decoder, oops. This would be trivial if TAPE messages were
still flat tables, but they aren't, because those aren't useful enough. So,
let's analyze the problem.
## When it happens
There are two reasons something might not match up with the expected data:

The first and most obvious is unrecognized keys. If a key is not in the set of
recognized keys for a KTV, the decoder should leave the corresponding struct
field blank. Once #6 has been implemented, it should throw an error if the data
is not optional.

The second is wrong types. If we are expecting KTV and get SBA, we should leave
the data empty. The aforementioned concern about #6 also applies here. We don't
need to worry about special cases at the structure root, because it would be
technically possible to make the structure root optional, so it really is just
a normal value. Until #6, we will leave that blank too.
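
To make that concrete, here is a loose sketch of how a generated KTV decoding
loop could behave once branching is in place. It is written as an ordinary
function rather than the flat generated form, and every identifier (the
Decoder methods, `tape.Skim`, the key constant, the message type) is an
assumption for illustration, not the real tape API:

```go
// Hypothetical sketch only: decoder methods, tag helpers, and tape.Skim are
// assumed names, and real generated code is flat rather than a function.
func decodeExampleKTV(dec *tape.Decoder, dest *Example) error {
	pairs, err := dec.ReadPairCount()
	if err != nil {
		return err
	}
	for i := 0; i < pairs; i++ {
		key, tag, err := dec.ReadKeyTag()
		if err != nil {
			return err
		}
		switch key {
		case 0x0001: // recognized key for dest.Data, expected to be SBA
			if tag.Is(tape.SBA) {
				dest.Data, err = dec.ReadBytes(tag)
			} else {
				// Wrong type: leave the field blank and skim past the value.
				err = tape.Skim(dec, tag)
			}
		default:
			// Unrecognized key: skim over the value and move on.
			err = tape.Skim(dec, tag)
		}
		if err != nil {
			return err
		}
	}
	return nil
}
```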
## Preliminary ideas
The first is going to be pretty simple. All we need to do is have a skimmer
function that skims over TAPE data very quickly, and then call it on the KTV
value each time we run into a mystery key. It should only return an error if
the structure of the data is malformed in such a way that it cannot continue to
the next value. This should be stored in the tape package alongside the dynamic
decoding functions, because they will essentially function the same way and
could probably share lots of code.
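
A minimal sketch of what that skimmer might look like, assuming it lives in
the tape package and assuming hypothetical Decoder methods (`ReadPairCount`,
`ReadKeyTag`, `ReadArrayHeader`, `DiscardValue`) and tag helpers that are not
the real tape API:

```go
// Skim consumes exactly one value described by tag, recursing into
// aggregates. It only errors when the data is malformed badly enough that
// the end of the value cannot be found. All names here are assumptions.
func Skim(dec *Decoder, tag Tag) error {
	switch {
	case tag.Is(KTV):
		pairs, err := dec.ReadPairCount()
		if err != nil {
			return err
		}
		for i := 0; i < pairs; i++ {
			_, elemTag, err := dec.ReadKeyTag()
			if err != nil {
				return err
			}
			if err := Skim(dec, elemTag); err != nil {
				return err
			}
		}
		return nil
	case tag.Is(OTA):
		count, elemTag, err := dec.ReadArrayHeader()
		if err != nil {
			return err
		}
		for i := 0; i < count; i++ {
			if err := Skim(dec, elemTag); err != nil {
				return err
			}
		}
		return nil
	default:
		// Scalars and byte arrays presumably carry their size in the tag or
		// a length prefix, so they can be discarded without being inspected.
		return dec.DiscardValue(tag)
	}
}
```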
The second is a bit more complicated because of KTV and OTA, which are
aggregate types. Go types work a bit differently: if you have an array of an
array of an array of ints, all of that nesting is represented in one place,
whereas TAPE doesn't really do that. That information is sort of buried within
the data structure, so we don't know what we will be decoding before we
actually do it. Whenever we encounter a type we don't expect, we would need to
abort decoding of the entire data structure, and then skim over whatever
detritus is left, which would literally be in a half-decoded state. The fact
that the code is generated flat, and thus cannot use return or defer
statements, contributes to the complexity of this problem. We need to go up,
but we can't. There is no up, only forward.

Of course, the dynamic decoder does not have this problem in the first place
because it doesn't expect anything, and constructs the destination to fit
whatever it sees in the TAPE structure as it is decoding it. KTVs are completely
dynamic because they are implemented as maps, so the only time it needs to
completely comprehend a type is with OTAs. There is a function called typeOf
that gets the type of the current tag and returns it as a reflect.Type, which
necessitates recursion and peeking at OTAs and their elements.
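
Very roughly, and purely to illustrate why that forces recursion and peeking
(this is not the actual typeOf implementation, and the Decoder/Tag API and the
map key type shown here are made up):

```go
// Illustrative only: a typeOf-style helper has to peek at an OTA's element
// tag, and possibly recurse further, before it can name a Go type.
func typeOf(dec *Decoder, tag Tag) (reflect.Type, error) {
	switch {
	case tag.Is(KTV):
		// KTVs decode into maps, so they can stay dynamic.
		return reflect.TypeOf(map[uint16]any(nil)), nil
	case tag.Is(OTA):
		elemTag, err := dec.PeekElementTag() // requires a peek buffer
		if err != nil {
			return nil, err
		}
		elem, err := typeOf(dec, elemTag)
		if err != nil {
			return nil, err
		}
		return reflect.SliceOf(elem), nil
	default:
		return reflect.TypeOf(int64(0)), nil // scalar placeholder
	}
}
```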
We could try to do the same thing in the generated decoder, comparing the
determined type against the expected type to figure out whether we should
decode an array or a table, etc. This is immediately problematic, as it
requires memory to be allocated both for the peek buffer and for the resulting
tree of type information. Even if we end up with some crazy way to keep track
of the types, that only addresses one half of the allocation problem, and we
would still be spending extra cycles going over all of that data twice.
## Performance constraints
The generated decoder is supposed to blaze through data, and it can't do that if
it does all the singing and dancing that the dynamic decoder does. It's time for
some performance constraints:
- No allocations, except as required to build the destination for the data
- No redundant work
- So, no freaking peeking
- It should take well under 500 lines of generated code to decode one message of
reasonable size (i.e. be careful not to bloat the binary)

I'm not really going to do my usual thing here of making a slow version and
speeding it up over time based on evidence and experimentation, because these
constraints inform the design so heavily that it would be impossible to
continue without them. I am 99% confident that these constraints will allow for
an acceptable baseline of performance (for generated code), and we can still
profile and micro-optimize later. This is good enough for me.
## Heavy solution
There is a solution that might work very well, which involves completely
redoing the generated decoding code. We could create a function for every
source type to destination type mapping that exists in the protocol, and then
compose them all together. The decoding methods for each message or type would
be wrappers around the correct function for their root TAPE -> Go type mapping.
The main benefit is that it would make this problem a lot more manageable,
because the interface points between pieces of data would be represented by
function boundaries. This would allow the use of return and defer statements,
and would allow more code sharing, producing a smaller binary. Go would
probably inline these where needed.
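
As a rough sketch of the shape this could take (the message, keys, helper
names, and tape API below are all made up for illustration):

```go
// Hypothetical generated output under the heavy solution: one function per
// TAPE -> Go mapping, with the message's Decode method as a thin wrapper.
func (msg *Connect) Decode(dec *tape.Decoder, tag tape.Tag) error {
	return decodeKTVToConnect(dec, tag, msg)
}

// KTV -> Connect struct. Being a real function, it can use return (and defer
// if needed), and it calls other generated per-mapping functions.
func decodeKTVToConnect(dec *tape.Decoder, tag tape.Tag, dest *Connect) error {
	if !tag.Is(tape.KTV) {
		return tape.Skim(dec, tag) // wrong root type: leave dest blank
	}
	pairs, err := dec.ReadPairCount()
	if err != nil {
		return err
	}
	for i := 0; i < pairs; i++ {
		key, elemTag, err := dec.ReadKeyTag()
		if err != nil {
			return err
		}
		switch key {
		case 0x0000: // Name, expected SBA -> string
			err = decodeSBAToString(dec, elemTag, &dest.Name)
		case 0x0001: // Routes, expected OTA -> []string
			err = decodeOTAToStringSlice(dec, elemTag, &dest.Routes)
		default:
			err = tape.Skim(dec, elemTag)
		}
		if err != nil {
			return err
		}
	}
	return nil
}
```

The per-mapping helpers (`decodeSBAToString`, `decodeOTAToStringSlice` here)
would themselves be generated once and shared between every message that needs
them, which is where the code sharing and smaller binary would come from.
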
Would this work? Probably. More investigation is required to make sure. I want
to stop re-writing things I don't need to. On the other hand, it is just the
decoder.
## Light solution
TODO: find a solution that satisfies the performance constraints, keeps the
exact same interface, and works off the same code. I am convinced this is
doable, and it might even allow us to extract more data from an unexpected
structure. However, continuing this way might introduce unmanageable
complexity. It is already a little unmanageable and I am just one pony (kind
of).
## Implementation
Heavy solution is going to work here, applied to only the points of
`Generator.generateDecodeValue` where it decodes an aggregate data structure.
That way, only minimal amounts of code need to be redone.
Whenever a branch needs to happen, a call shall be generated and a deferred
implementation request shall be added to a special FIFO queue within the
generator. After generating data structures and their root decoding functions,
the generator shall pick away at this queue until no requests remain. The
generator shall accept new items during this process, so that recursion is
possible. This is all to ensure it is only ever writing one function at a time.
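
A minimal sketch of how that queue could look inside the generator (the
request fields, the Generator fields, and `generateDecodeFunc` are invented
for illustration, not the existing generator API):

```go
// funcRequest records a decoding function that still needs to be written:
// one TAPE source type to Go destination type mapping.
type funcRequest struct {
	name         string // name of the generated function
	source, dest string // TAPE source type and Go destination type
}

// requestFunc is called wherever generateDecodeValue would otherwise emit an
// aggregate's decoding inline: it emits a call to req.name instead and
// defers the body.
func (gen *Generator) requestFunc(req funcRequest) {
	gen.queue = append(gen.queue, req)
}

// generateDeferredFuncs drains the queue after the data structures and root
// decoding functions are written. New requests pushed while a body is being
// generated land at the back of the queue, so only one function is ever
// being written at a time and recursive types still resolve.
func (gen *Generator) generateDeferredFuncs() error {
	for len(gen.queue) > 0 {
		req := gen.queue[0]
		gen.queue = gen.queue[1:]
		if gen.written[req.name] {
			continue // assumed de-duplication so recursive types terminate
		}
		gen.written[req.name] = true
		if err := gen.generateDecodeFunc(req); err != nil {
			return err
		}
	}
	return nil
}
```
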
The functions shall take a pointer to a type parameter that accepts any type
whose underlying type (~) is the destination's base type. We should also
probably just call `Generator.generateDecodeValue` directly on user-defined
types this way, keeping their public `Decode` methods just for convenience.
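
For example, a generated mapping function for OTA -> []string could be
declared like this (the function name and tape API are hypothetical):

```go
// Hypothetical signature: the ~ constraint accepts any type whose underlying
// type is []string, so plain []string fields and named types like
// `type Routes []string` can share one generated function.
func decodeOTAToStringSlice[T ~[]string](dec *tape.Decoder, tag tape.Tag, dest *T) error {
	// ... generated decoding body ...
	return nil
}
```

A named field type such as `type Routes []string` satisfies `~[]string`, so a
call like `decodeOTAToStringSlice(dec, tag, &msg.Routes)` and a plain
`[]string` destination would both go through the same generated function.
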
The tape package shall contain a skimming function that takes a decoder and a
tag, and recursively consumes the decoder given the context of the tag. This
shall be utilized by the decoder functions to skip over values if their tags
or keys do not match up with what is expected.