How to Parse
Sasha Koshka edited this page 2022-10-12 07:04:38 +00:00

The processes of parsing and lexing are implemented in the compiler in similar ways. This page details the specifics of this general process and how to extend it with new features. Note that the compiler is not perfectly implemented: there are places in which these practices are not followed exactly.

Operation Structs

Operation structs (namely ParsingOperation and LexingOperation) contain information about a process that is being carried out. Operation structs store information about:

  • The raw source material being operated on
  • State information, such as the operation's position in that raw source material
  • The results of the operation, which are continuously appended to as the operation continues

These structs are not used outside of their respective modules. They are created internally by a public-facing function that takes in a path to the source material and returns the operation's result. This public-facing function calls a member function of the operation struct, which then carries out the operation.
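
The arrangement described above can be sketched roughly as follows. This is a minimal illustration and not the compiler's actual code: the Token type, the field names, and the whitespace-splitting logic are all hypothetical, and Tokenize takes the source text directly instead of a path so that the example runs without any files.

```go
package main

import "fmt"

// Token is a hypothetical, simplified token type used only in this sketch.
type Token struct {
	Value    string
	Position int // offset of the token's first rune in the source
}

// LexingOperation holds everything about one lexing run. Its fields and
// methods are unexported, so other packages cannot drive it directly.
type LexingOperation struct {
	source   []rune  // the raw source material being operated on
	position int     // state: the current offset into source
	tokens   []Token // the result, appended to as the operation continues
}

// Tokenize is the public-facing function. It creates the operation struct
// internally, runs it, and returns only the operation's result.
// Assumption: it receives the source text itself rather than a path.
func Tokenize(source string) ([]Token, error) {
	operation := LexingOperation{source: []rune(source)}
	err := operation.tokenize()
	return operation.tokens, err
}

// tokenize is the private method that carries out the operation. In this
// sketch it only splits the source on spaces.
func (operation *LexingOperation) tokenize() error {
	for operation.position < len(operation.source) {
		if operation.source[operation.position] == ' ' {
			operation.position++
			continue
		}
		start := operation.position
		for operation.position < len(operation.source) &&
			operation.source[operation.position] != ' ' {
			operation.position++
		}
		operation.tokens = append(operation.tokens, Token{
			Value:    string(operation.source[start:operation.position]),
			Position: start,
		})
	}
	return nil
}

func main() {
	tokens, err := Tokenize("set x 5")
	fmt.Println(tokens, err)
}
```

Callers only ever see Tokenize and the returned tokens. The position field and other bookkeeping stay inside the operation struct.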

Operation Methods

All methods of an operation struct are private, since they are only ever called by way of the aforementioned public-facing function. There is one main method, named after the kind of operation being done (e.g. lexingOperation.tokenize(), parsingOperation.parse()). This method runs through the source material and should generally call a dedicated method when it encounters something that needs to be parsed or lexed. An example of one of these dedicated methods is lexingOperation.tokenizeNumber().
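
As a rough sketch of this dispatch pattern, the main method below inspects the current rune and hands off to a dedicated method. The struct layout, the string-based tokens, and the tokenizeName and tokenizeWhile helpers are hypothetical and only serve the illustration. The real lexer recognizes far more than numbers and names.

```go
package main

import (
	"fmt"
	"unicode"
)

// LexingOperation is a simplified, hypothetical operation struct.
type LexingOperation struct {
	source   []rune
	position int
	tokens   []string
}

// tokenize is the main method. It runs through the source and calls a
// dedicated method whenever it sees the start of something to lex.
func (operation *LexingOperation) tokenize() error {
	for operation.position < len(operation.source) {
		char := operation.source[operation.position]
		switch {
		case unicode.IsSpace(char):
			operation.position++ // nothing to lex, skip whitespace
		case unicode.IsDigit(char):
			operation.tokenizeNumber()
		case unicode.IsLetter(char):
			operation.tokenizeName()
		default:
			return fmt.Errorf(
				"unexpected %q at position %d",
				char, operation.position)
		}
	}
	return nil
}

// tokenizeWhile gathers runes for as long as accept returns true, and
// leaves the operation on the first rune that was not accepted.
func (operation *LexingOperation) tokenizeWhile(accept func(rune) bool) {
	start := operation.position
	for operation.position < len(operation.source) &&
		accept(operation.source[operation.position]) {
		operation.position++
	}
	operation.tokens = append(
		operation.tokens,
		string(operation.source[start:operation.position]))
}

// Dedicated methods, one per kind of thing that can be lexed.
func (operation *LexingOperation) tokenizeNumber() {
	operation.tokenizeWhile(unicode.IsDigit)
}

func (operation *LexingOperation) tokenizeName() {
	operation.tokenizeWhile(unicode.IsLetter)
}

func main() {
	operation := LexingOperation{source: []rune("width 640")}
	err := operation.tokenize()
	fmt.Println(operation.tokens, err)
}
```

Keeping the main loop down to a switch means that supporting a new kind of token mostly comes down to adding one case and one dedicated method.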

Each of these methods must:

  1. Start on the first element (whether that be a rune or a token) of what they need to read
  2. Ensure that the first element is a valid start for the expected result of this method
     • For example, a method meant to tokenize a number should fail if the first thing it encounters is a symbol. This check is redundant in most cases and does not affect output, but it should reduce the likelihood of bugs.
  3. Create (if need be) and initialize the variable containing the result with information about the operation's location
  4. Run through until the end, gathering the result
  5. Advance the operation struct to the element after the end, if need be
     • For example, a method meant to tokenize a string should advance the lexer to the rune after the closing quotation. This ensures that methods can be properly called one after another.
     • If the method consumes an entire line up until a newline, it should advance to the beginning of the next line before exiting.
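
The requirements listed above can be illustrated with a sketch of a method that tokenizes a double-quoted string. The Token type, the field names, and the error messages are hypothetical, and escape sequences are ignored to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// Token and LexingOperation are simplified, hypothetical types.
type Token struct {
	Value    string
	Position int // where in the source the token started
}

type LexingOperation struct {
	source   []rune
	position int
}

// tokenizeString reads one double-quoted string (escapes not handled).
func (operation *LexingOperation) tokenizeString() (Token, error) {
	// The operation must already be on the first element, and that
	// element must be a valid start for a string.
	if operation.position >= len(operation.source) ||
		operation.source[operation.position] != '"' {
		return Token{}, errors.New("string must start with a quotation mark")
	}

	// Create the result and initialize it with the operation's location.
	token := Token{Position: operation.position}
	operation.position++

	// Run through until the end, gathering the result.
	for operation.position < len(operation.source) {
		char := operation.source[operation.position]
		if char == '"' {
			// Advance to the rune after the closing quotation so that
			// the next method starts in the right place.
			operation.position++
			return token, nil
		}
		token.Value += string(char)
		operation.position++
	}
	return Token{}, errors.New("string is missing its closing quotation mark")
}

func main() {
	operation := LexingOperation{source: []rune(`"hi" "there"`)}
	first, _ := operation.tokenizeString()
	operation.position++ // skip the space between the two strings
	second, err := operation.tokenizeString()
	fmt.Println(first, second, err)
}
```

Because the method leaves the operation on the rune after the closing quotation, the second call in main can start immediately once the separating space is skipped.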

Methods can return their results in one of two ways. The first, most robust, and most obvious way is for them to simply return the result. Sometimes, however, this option is not preferable. In this case, the method should take in a parameter called into, which should be a pointer to the "container" in which to put the result. For example, the method parsingOperation.parseEnumMembers() takes in a pointer to an enum struct where the enum members need to go. As it parses each enum member, it adds that member to the enum section's member slice. This approach is primarily for methods that parse multiple things. To ensure good error handling, it is advisable to handle the parsing of each individual item in a separate method.
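
A rough sketch of the two return styles side by side is shown below. Everything here is hypothetical apart from the method name parseEnumMembers and the parameter name into: tokens are plain strings, each enum member is assumed to be a name followed by a value, and parseEnumMember is an illustrative helper for parsing a single item.

```go
package main

import "fmt"

// Simplified, hypothetical types for this sketch.
type EnumMember struct {
	Name  string
	Value string
}

type EnumSection struct {
	Name    string
	Members []EnumMember
}

type ParsingOperation struct {
	tokens   []string
	position int
}

// parseEnumMember parses a single member and simply returns the result.
func (operation *ParsingOperation) parseEnumMember() (EnumMember, error) {
	if operation.position+1 >= len(operation.tokens) {
		return EnumMember{}, fmt.Errorf(
			"incomplete enum member at token %d", operation.position)
	}
	member := EnumMember{
		Name:  operation.tokens[operation.position],
		Value: operation.tokens[operation.position+1],
	}
	operation.position += 2
	return member, nil
}

// parseEnumMembers parses multiple members, so it places its results in
// the container that into points to instead of returning them.
func (operation *ParsingOperation) parseEnumMembers(into *EnumSection) error {
	for operation.position < len(operation.tokens) {
		member, err := operation.parseEnumMember()
		if err != nil {
			return err
		}
		into.Members = append(into.Members, member)
	}
	return nil
}

func main() {
	operation := ParsingOperation{
		tokens: []string{"red", "0", "green", "1"},
	}
	section := EnumSection{Name: "Color"}
	err := operation.parseEnumMembers(&section)
	fmt.Println(section.Members, err)
}
```

Since each item is parsed by its own method, an error in one member is reported from a single place, and the members parsed before it remain in the section.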