diff --git a/design/branched-generated-encoder.md b/design/branched-generated-encoder.md
new file mode 100644
index 0000000..9360b6f
--- /dev/null
+++ b/design/branched-generated-encoder.md
@@ -0,0 +1,123 @@
# Branched Generated Decoder

Pasted here because Tebitea is down

## The problem

TAPE is designed so that the decoder can gloss over data it does not understand.
The protocol allows for this, but I completely forgot to implement it in the
generated decoder, oops. This would be trivial if TAPE messages were still flat
tables, but they aren't, because those aren't useful enough. So, let's analyze
the problem.

## When it happens

There are two reasons something might not match up with the expected data:

The first and most obvious is unrecognized keys. If a key is not in the set of
recognized keys for a KTV, the decoder should leave the corresponding struct
field blank. Once #6 has been implemented, it should throw an error if the data
was not optional.

The second is wrong types. If we are expecting KTV and get SBA, we should leave
the data empty. The aforementioned concern about #6 also applies here. We don't
need to worry about special cases at the structure root, because it would
technically be possible to make the structure root optional, so it really is
just a normal value. Until #6, we will leave that blank too.

## Preliminary ideas

The first case is going to be pretty simple. All we need to do is have a skimmer
function that skims over TAPE data, and then call that on the KTV value each
time we run into a mystery key. It should only return an error if the structure
of the data is malformed in such a way that it cannot continue to the next item.
This should live in the `tape` package alongside the dynamic decoding functions,
because they will essentially work the same way and could probably share lots of
code.

The second case is a bit more complicated, because KTV and OTA are aggregate
types. Go types work a bit differently: if you have an array of an array of an
array of ints, that information is all represented in one place, whereas TAPE
doesn't really do that. All of that information is buried within the data
itself, so we don't know what we will be decoding before we actually do it.
Whenever we encounter a type we don't expect, we would need to abort decoding of
the entire data structure and then skim over whatever detritus is left, which
would be in a half-decoded state. The fact that the code is generated flat, and
thus cannot use return or defer statements, contributes to the complexity of
this problem. We need to go up, but we can't. There is no up, only forward.

Of course, the dynamic decoder does not have this problem in the first place,
because it doesn't expect anything: it constructs the destination to fit
whatever it sees in the TAPE structure as it is decoding it. KTVs are completely
dynamic because they are implemented as maps, so the only time it needs to fully
comprehend a type up front is with OTAs. There is a function called `typeOf`
that gets the type of the current tag and returns it as a `reflect.Type`, which
necessitates recursion and peeking at OTAs and their elements.

We could try to do the same thing in the generated decoder, comparing the
determined type against the expected type to figure out whether we should decode
an array or a table, and so on.
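To make the cost of that visible, here is a rough sketch of what a
peek-then-compare step could look like. Every name in it is made up for
illustration; this is not how the real `tape` package or its `typeOf` works.

```go
package sketch

import (
	"fmt"
	"reflect"
)

// Tag stands in for however the tape package represents wire tags; the
// constants below are placeholders, not real TAPE tag values.
type Tag int

const (
	TagSBA Tag = iota
	TagOTA
	TagKTV
)

// peeker stands in for the lookahead this approach would need: seeing the next
// tag (and, for an OTA, its element) without consuming it, which means
// buffering bytes that will be read again later.
type peeker interface {
	PeekTag() (Tag, error)
	PeekOTAElement() (peeker, error)
}

// peekType builds a reflect.Type for whatever comes next so it can be compared
// against the expected Go type. Every OTA level is another recursive call, and
// the resulting tree of type information is itself an allocation.
func peekType(p peeker) (reflect.Type, error) {
	tag, err := p.PeekTag()
	if err != nil {
		return nil, err
	}
	switch tag {
	case TagKTV:
		// KTVs are dynamic anyway; the key/value types here are placeholders.
		var table map[uint16]any
		return reflect.TypeOf(table), nil
	case TagOTA:
		elem, err := p.PeekOTAElement()
		if err != nil {
			return nil, err
		}
		elemType, err := peekType(elem)
		if err != nil {
			return nil, err
		}
		return reflect.SliceOf(elemType), nil
	case TagSBA:
		// Scalars and SBAs would map to fixed Go types; []byte is a placeholder.
		return reflect.TypeOf([]byte(nil)), nil
	default:
		return nil, fmt.Errorf("malformed data: unknown tag %d", tag)
	}
}
```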
That approach is immediately problematic: it requires memory to be allocated,
both for the peek buffer and the resulting tree of type information. Even if we
came up with some crazy way to keep track of the types, that only solves half of
the allocation problem, and we would still be spending extra cycles going over
all of that twice.

## Performance constraints

The generated decoder is supposed to blaze through data, and it can't do that if
it does all the singing and dancing that the dynamic decoder does. It's time for
some performance constraints:

- No allocations, except as required to build the destination for the data
- No redundant work
- So, no freaking peeking
- It should take well under 500 lines of generated code to decode one message of
  reasonable size (i.e. be careful not to bloat the binary)

I'm not really going to do my usual thing here of making a slow version and
speeding it up over time based on evidence and experimentation, because these
constraints inform the design so much that it would be impossible to continue
without them. I am 99% confident that these constraints will allow for an
acceptable baseline of performance (for generated code), and we can still
profile and micro-optimize later. This is good enough for me.

## Heavy solution

There is a solution that might work very well, which involves completely redoing
the generated decoding code. We could create a function for every source type to
destination type mapping that exists in the protocol, and then compose them all
together. The decoding methods for each message or type would be wrappers around
the correct function for their root TAPE -> Go type mapping. The main benefit is
that it would make the problem a lot more manageable, because the interface
points between the data would be represented by function boundaries. This would
allow the use of return and defer statements, and would allow more code sharing,
producing a smaller binary. Go would probably inline these where needed.

Would this work? Probably. More investigation is required to make sure. I want
to stop re-writing things I don't need to. On the other hand, it is just the
decoder.

## Light solution

TODO: find a solution that satisfies the performance constraints, keeps the
exact same interface, and works off the same code. I am convinced this is
doable, and it might even allow us to extract more data from an unexpected
structure. However, continuing this way might introduce unmanageable complexity.
It is already a little unmanageable and I am just one pony (kind of).

## Implementation

The heavy solution is going to work here, applied only to the points in
`Generator.generateDecodeValue` where it decodes an aggregate data structure.
That way, only a minimal amount of code needs to be redone.

Whenever a branch needs to happen, a call shall be generated and a deferred
implementation request shall be added to a special FIFO queue within the
generator. After generating the data structures and their root decoding
functions, the generator shall pick away at this queue until no requests remain.
The generator shall accept new items during this process, so that recursion is
possible. This is all to ensure it is only ever writing one function at a time.

The branch functions shall take a pointer to a type parameter that accepts any
type whose underlying type (~) is the destination's base type. We should also
probably just call `Generator.generateDecodeValue` directly on user-defined
types this way, keeping their public `Decode` methods just for convenience.
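To make the queue mechanics a little more concrete, here is a rough sketch of
the bookkeeping described above. The type, field, and method names are made up;
the real `Generator` already exists and looks different, and `decodeFoo_bar`,
`tape.Decoder`, and `~[]int32` in the comment are only stand-ins for whatever
`generateDecodeValue` actually emits.

```go
package sketch

// funcRequest is a deferred request for one branch function. In practice it
// would carry whatever type information generateDecodeValue needs to emit the
// function body later.
type funcRequest struct {
	name string
}

// generator stands in for the real Generator; only the queue bookkeeping is
// sketched here.
type generator struct {
	queue []funcRequest // FIFO of branch functions that still need to be written
}

// request is called at each point where generateDecodeValue hits an aggregate
// data structure: it records the request and leaves only a call to the named
// function in the code currently being generated.
func (g *generator) request(req funcRequest) {
	g.queue = append(g.queue, req)
}

// drain runs after the data structures and their root decoding functions have
// been generated. Writing a queued function may enqueue further requests, so
// the loop re-checks the queue each time; the generator is still only ever
// writing one function at a time.
func (g *generator) drain() {
	for len(g.queue) > 0 {
		req := g.queue[0]
		g.queue = g.queue[1:]
		g.writeBranchFunc(req) // may call g.request again
	}
}

// writeBranchFunc would call into generateDecodeValue to emit something like:
//
//	func decodeFoo_bar[T ~[]int32](dec *tape.Decoder, dest *T) error { ... }
//
// that is, a function taking a pointer to any type whose underlying type is
// the destination's base type (hence the ~ constraint).
func (g *generator) writeBranchFunc(req funcRequest) {
	// body omitted in this sketch
}
```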