On a bytecode representation #9

Open
opened 2022-04-02 21:19:43 +00:00 by mars · 1 comment
Owner

My original intention with Sprite was to make the human-readable, C-like code that the current examples are written in the ONLY way of writing Sprite. However, the more I think about this, the less it makes sense. Here are some reasons I wanted to do this:

  1. it encourages users of the runtime to peer into the code that the runtime is executing and learn more about what the script authors were intending with their code
  2. having a single, standard way of authoring Sprite reduces the complexity of writing it, and provides a common language for all Sprite code

Let's look at some obvious consequences of the runtime having to execute only the "human-readable" representation:

  1. Larger script sizes. Scripts will contain a large amount of whitespace and comments. This is a pretty trivial point of resource consumption considering that it IS only a text format, and the parsing code is really fucking fast. However, script size also costs bandwidth, and bandwidth is an important resource where connectivity is limited, such as on mobile networks.
  2. Semantic analysis. Type deduction takes a relatively long time to compute. If the same script is run on ten thousand different computers, and each one needs to solve for the exact same variable types on its own, then that's ten thousand wasted computations, each one costing CPU time and electricity. Sprite is not a cryptocurrency. A bytecode format would explicitly type every value, moving the computational load of determining specific types for variables off the runtime backend and onto the code author, who pays that cost once at compile time.
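
To make the second point concrete, here is a rough sketch of what "explicitly typed" bytecode could buy the runtime. None of this is Sprite's actual format: `Ty`, `Op`, and `verify` are hypothetical names made up for illustration. The idea is that every instruction carries its type, so the runtime only has to check types in one linear pass instead of solving for them.

```rust
// Hypothetical, illustration-only bytecode; not Sprite's real instruction set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    I32,
    F32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    ConstI32(i32),
    ConstF32(f32),
    /// Pops two values of the declared type and pushes their sum.
    Add(Ty),
}

/// Checks the declared types against a simulated operand stack.
/// This is one pass over the code with no inference: each instruction
/// already says which type it operates on.
/// Returns the types left on the stack, or an error message.
pub fn verify(code: &[Op]) -> Result<Vec<Ty>, String> {
    let mut stack: Vec<Ty> = Vec::new();
    for (pc, op) in code.iter().enumerate() {
        match *op {
            Op::ConstI32(_) => stack.push(Ty::I32),
            Op::ConstF32(_) => stack.push(Ty::F32),
            Op::Add(ty) => {
                let rhs = stack.pop();
                let lhs = stack.pop();
                if lhs != Some(ty) || rhs != Some(ty) {
                    return Err(format!("type mismatch at instruction {pc}"));
                }
                stack.push(ty);
            }
        }
    }
    Ok(stack)
}
```

A frontend would do the expensive type deduction once and emit `Add(Ty::I32)`; every machine that later loads the script only runs something like `verify`.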

JavaScript, like Sprite, was also intended to be human-readable, and it was intended to be a common language for both website developers and website users to understand what the code is doing. However, JavaScript is in its mid-20s now and has demonstrated some major pitfalls in these design goals:

  1. JavaScript code is heavily obfuscated. JavaScript is dynamically-typed and is much "looser" syntactically (to be polite) than any other scripting language, enabling it to be heavily obfuscated. Most non-hand-authored JavaScript code on the internet now is, in effect, bloated bytecode. Sprite, despite being a statically-typed language, is of course vulnerable to the same kind of obfuscation, and past a certain point, obfuscated Sprite code will become equally unreadable. Anyone wishing to make their Sprite code unreadable could easily accomplish it.
  2. JavaScript is now commonly used as a compilation target by TypeScript and not hand-authored, and that TypeScript code is not immediately available to the user. This is largely a failure of JavaScript's loose typing to adequately support larger web applications, development teams, and greater system complexity. This is less of a concern for Sprite because it allows for very little ambiguity by design.

The biggest reason to use bytecode is that the runtime can be made simpler. The compilation pipeline can be split into two distinct parts: a frontend (parsing/semantics), and a backend (monomorphization/JIT/execution), with the bytecode as an intermediate format. With this split model, parts of the compiler frontend could be modified, as long as it still emits valid bytecode. This could be used to make new authoring languages (like what Kotlin is to the JVM), or to rewrite compilation stages in other languages (like to replace the "canonical" frontend in Rust with a frontend in C, Python, Lua, JavaScript, whatever). The backend could also be modified in a similar way, for example to provide alternate compilation backends, like LLVM, specific ISAs, or an interpreter.
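
As a minimal sketch of that split, assuming a toy stack bytecode: the `Frontend` and `Backend` traits, the `Op` enum, and the reverse-Polish "authoring language" below are all invented for this example and say nothing about how sprite-rs is actually structured. The only contract between the two halves is the `Vec<Op>`.

```rust
/// Stand-in for Sprite bytecode; the real format is not defined in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
}

/// Anything that can turn some authoring language into bytecode.
pub trait Frontend {
    fn compile(&self, source: &str) -> Result<Vec<Op>, String>;
}

/// Anything that can execute bytecode (interpreter, JIT, ...).
pub trait Backend {
    fn run(&self, code: &[Op]) -> Result<i64, String>;
}

/// Toy frontend: reverse-Polish text such as "2 3 + 4 *".
pub struct RpnFrontend;

impl Frontend for RpnFrontend {
    fn compile(&self, source: &str) -> Result<Vec<Op>, String> {
        source
            .split_whitespace()
            .map(|tok| match tok {
                "+" => Ok(Op::Add),
                "*" => Ok(Op::Mul),
                n => n
                    .parse()
                    .map(Op::Push)
                    .map_err(|e| format!("bad token {n:?}: {e}")),
            })
            .collect()
    }
}

/// Toy backend: a stack interpreter.
pub struct Interpreter;

impl Backend for Interpreter {
    fn run(&self, code: &[Op]) -> Result<i64, String> {
        let mut stack: Vec<i64> = Vec::new();
        for op in code {
            match *op {
                Op::Push(n) => stack.push(n),
                Op::Add | Op::Mul => {
                    // The top of the stack is the right-hand operand.
                    let (b, a) = (stack.pop(), stack.pop());
                    match (a, b) {
                        (Some(a), Some(b)) if *op == Op::Add => stack.push(a.wrapping_add(b)),
                        (Some(a), Some(b)) => stack.push(a.wrapping_mul(b)),
                        _ => return Err("stack underflow".to_string()),
                    }
                }
            }
        }
        stack.pop().ok_or_else(|| "empty stack".to_string())
    }
}

/// Glue: any frontend can be paired with any backend.
pub fn execute(front: &dyn Frontend, back: &dyn Backend, source: &str) -> Result<i64, String> {
    back.run(&front.compile(source)?)
}
```

Swapping `RpnFrontend` for a different authoring language, or `Interpreter` for a JIT, would not touch the other half as long as both agree on `Op`.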

Source code availability must be encouraged and supported by the authors of source code, not the source code itself. Sprite is only a computer program, and is not capable of overthrowing proprietary software on its own. "Human-readability" and software freedom in Sprite scripts cannot be enforced, and should be a non-goal if it means that the runtime needs to sacrifice some of its own readability and flexibility (à la V8, Google's JavaScript runtime).

Ironically, supporting a "human-unreadable" Sprite bytecode enables the runtime itself to become much more accessible to a variety of audiences.


An immediate advantage of bytecode being the representation is that it would make it easier to deal with code generation; I can easily imagine doing stuff like having a node-based visual editor that outputs Sprite bytecode (that’d be possible with human-readable code as well, but the process would be simpler to develop tooling for).
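
Something like this is what I have in mind for the node editor case, as a sketch only: `Node`, `Op`, and `emit` are hypothetical, and the bytecode is a made-up stack format rather than anything Sprite defines. The editor's graph is walked in post-order and instructions are appended directly, with no text formatting or parsing step in between.

```rust
/// Made-up stack bytecode for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
}

/// A node in a hypothetical visual editor's expression graph.
pub enum Node {
    Const(i64),
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

/// Post-order walk: emit both inputs, then the operation that consumes them.
pub fn emit(node: &Node, out: &mut Vec<Op>) {
    match node {
        Node::Const(n) => out.push(Op::Push(*n)),
        Node::Add(lhs, rhs) => {
            emit(lhs, out);
            emit(rhs, out);
            out.push(Op::Add);
        }
        Node::Mul(lhs, rhs) => {
            emit(lhs, out);
            emit(rhs, out);
            out.push(Op::Mul);
        }
    }
}
```

Emitting human-readable source instead would also need precedence and parenthesization handling on the way out and a full parse on the way back in.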

Another advantage of bytecode's portability besides simple storage space / bandwidth / etc. is that it'd be easier to implement memory budgets / etc. (depending on implementation ofc); for instance, the host app could simply be like "oh, I see you're giving me a certain amount of Sprite bytecode, I can work with that" with far less analysis involved.
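
A very rough sketch of the host side, assuming the budget is just a cap on code size (the `Budget` struct, `LoadError`, and `admit` are names I made up, and a real budget would likely cover more than byte count):

```rust
/// Hypothetical host-side limit; the field is an assumption for illustration.
pub struct Budget {
    pub max_code_bytes: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LoadError {
    OverBudget { size: usize, limit: usize },
}

/// Accepts or rejects a script by looking only at the size of its bytecode.
/// No parsing or semantic analysis is needed to make this decision.
pub fn admit<'a>(bytecode: &'a [u8], budget: &Budget) -> Result<&'a [u8], LoadError> {
    if bytecode.len() > budget.max_code_bytes {
        return Err(LoadError::OverBudget {
            size: bytecode.len(),
            limit: budget.max_code_bytes,
        });
    }
    Ok(bytecode)
}
```

With text source the host would have to parse and analyze the script before it knew how much code it was really being handed.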

Reference: mars/sprite-rs#9