Add analyzer README.md

2024-02-10 14:47:47 -05:00 · 2024-02-10 14:47:47 -05:00 · eacde3a4f9
commit eacde3a4f9
parent 21e6fb94a1
1 changed files with 89 additions and 0 deletions
--- a/analyzer/README.md
+++ b/analyzer/README.md
@@ -0,0 +1,89 @@
+# analyzer
+
+## Responsibilities
+
+- Define syntax tree type that contains entities
+- Turn streams of tokens into abstract syntax tree entities
+
+## Organization
+
+The entry point for all logic defined in this package is the Tree type. On this
+type, the Analyze() method is defined. This method checks the semantic
+correctness of an AST, fills in semantic fields within its data structures, and
+arranges them into the Tree.
+
+Tree contains a scopeContextManager. The job of scopeContextManager is to manage
+a stack of scopeContexts, which are each tied to a function or method that is
+currently being analyzed. In turn, each scopeContext manages stacks of
+entity.Scopes and entity.Loops. This allows for greedy/recursive analysis of
+functions and methods.
+
+## Operation
+
+When the Analyze() method is called, several hidden fields in the Tree are filled
+out. Tree.ensure() instantiates data that can persist between analyses, which
+consists of map initialization and merging the data in the builtinTypes map into
+Tree.Types.
+
+After Tree.ensure completes, Tree.assembleRawMaps() takes top-level entities
+from the AST and organizes them into rawTypes, rawFunctions, and rawMethods. It
+does this so that top-level entities can be indexed by name. While doing this, it
+ensures that function and type names are unique, and method names are unique
+within the type they are defined on.
+
+Next, Tree.analyzeDeclarations() is called. This is the entry point for the
+actual analysis logic. For each item in the raw top-level entity maps, it calls
+a specific analysis routine, which is one of:
+
+- Tree.analyzeTypedef()
+- Tree.analyzeFunction()
+- Tree.analyzeMethod()
+
+These routines all have two crucial properties that make them very useful:
+
+- They refer to top-level entities by name instead of by memory location
+- If the entity has already been analyzed, they return that entity instead of
+  analyzing it again
+
+Because of this, they are also used as accessors for top-level entities within
+more specific analysis routines. For example, the routine Tree.analyzeCall()
+will call Tree.analyzeFunction() in order to get information about the function
+that is being called. If the function has not yet been analyzed, it is analyzed
+(making use of scopeContextManager to push a new scopeContext), and other
+routines (including Tree.analyzeDeclarations()) will not have to analyze it all
+over again. After a top-level entity has been analyzed, these routines will
+always return the same pointer to the one instance of the analyzed entity.
+
+## Expression Analysis and Assignment
+
+Since expressions make up the bulk of FSPL, expression analysis makes up the
+bulk of the semantic analyzer. Whenever an expression needs to be analyzed,
+Tree.analyzeExpression() is called. This activates a switch to call one of many
+specialized analysis routines based on the expression entity's concrete type.
+
+Much of expression analysis consists of the analyzer checking to see if the
+result of one expression can be assigned to the input of another. To this end,
+assignment rules are used. There are five different assignment modes:
+
+- Strict: Structural equivalence, but named types are treated as opaque and are
+  not tested. This applies to the root of the type, and to types enclosed as
+  members, elements, etc. This is the assignment mode most often used.
+- Weak: Like strict, but the root types specifically are compared as if they
+  were not named. analyzer.ReduceToBase() is used to accomplish this.
+- Structural: Full structural equivalence, and named types are always reduced.
+- Coerce: Data of the source type must be convertible to the destination type.
+  This is used in value casts.
+- Force: All assignment rules are ignored. This is only used in bit casts.
+
+
+All expression analysis routines take as parameters the type that the result of
+the expression is being assigned to, and the assignment mode. To figure out
+whether the assignment is allowed, they in turn (usually) call Tree.canAssign().
+Tree.canAssign() is used to determine whether data of a source type can be
+assigned to a destination type, given an assignment mode. However, it is not
+called automatically by Tree.analyzeExpression() because:
+
+- Determining the source type is sometimes non-trivial (see
+  Tree.analyzeOperation())
+- Literals have their own very weak assignment rules, and are designed to be
+  assignable to a wide range of data types
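
The five assignment modes described in the README can be illustrated with a small, self-contained sketch. This is not the analyzer's actual code: the `Type` struct, the `Mode` constants, the free function `canAssign`, the local `reduceToBase` helper, and the numeric-only conversion rule used for Coerce are all hypothetical stand-ins chosen to keep the example short. The real `Tree.canAssign()` and `analyzer.ReduceToBase()` operate on the package's own entity types and may differ in signature and behavior.

```go
package main

import "fmt"

// Mode is a hypothetical enumeration of the five assignment modes.
type Mode int

const (
	Strict Mode = iota
	Weak
	Structural
	Coerce
	Force
)

// Type is a toy type model (an assumption, not the analyzer's entity types):
//   - primitive: Name set, Base and Elem nil
//   - named:     Name set, Base points at the underlying type
//   - pointer:   Elem points at the pointed-to type
type Type struct {
	Name string
	Base *Type
	Elem *Type
}

// reduceToBase strips names from the root of a type. It stands in for the
// role the README gives analyzer.ReduceToBase().
func reduceToBase(t *Type) *Type {
	for t.Base != nil {
		t = t.Base
	}
	return t
}

// equal compares two types structurally. When reduce is false, named types
// are opaque and only match by name. When reduce is true, named types are
// reduced at every level before comparing.
func equal(a, b *Type, reduce bool) bool {
	if reduce {
		a, b = reduceToBase(a), reduceToBase(b)
	}
	switch {
	case a.Base != nil || b.Base != nil:
		return a.Base != nil && b.Base != nil && a.Name == b.Name
	case a.Elem != nil || b.Elem != nil:
		return a.Elem != nil && b.Elem != nil && equal(a.Elem, b.Elem, reduce)
	default:
		return a.Name == b.Name
	}
}

// isNumeric is a toy convertibility rule used only for the Coerce mode.
func isNumeric(t *Type) bool {
	t = reduceToBase(t)
	return t.Elem == nil && (t.Name == "Int" || t.Name == "Float")
}

// canAssign reports whether src may be assigned to dst under the given mode.
func canAssign(mode Mode, dst, src *Type) bool {
	switch mode {
	case Force:
		return true
	case Coerce:
		return equal(dst, src, true) || (isNumeric(dst) && isNumeric(src))
	case Structural:
		return equal(dst, src, true)
	case Weak:
		// Only the root types are reduced; nested named types stay opaque.
		return equal(reduceToBase(dst), reduceToBase(src), false)
	default: // Strict
		return equal(dst, src, false)
	}
}

// Example types, all hypothetical.
var (
	intT       = &Type{Name: "Int"}
	floatT     = &Type{Name: "Float"}
	celsius    = &Type{Name: "Celsius", Base: intT}
	ptrInt     = &Type{Elem: intT}
	ptrCelsius = &Type{Elem: celsius}
)

func main() {
	fmt.Println("strict     Celsius <- Int:  ", canAssign(Strict, celsius, intT))
	fmt.Println("weak       Celsius <- Int:  ", canAssign(Weak, celsius, intT))
	fmt.Println("weak       *Celsius <- *Int:", canAssign(Weak, ptrCelsius, ptrInt))
	fmt.Println("structural *Celsius <- *Int:", canAssign(Structural, ptrCelsius, ptrInt))
	fmt.Println("coerce     Int <- Float:    ", canAssign(Coerce, intT, floatT))
	fmt.Println("force      *Int <- Float:   ", canAssign(Force, ptrInt, floatT))
}
```

In this toy model the difference between Weak and Structural shows up on the pointer case: Weak reduces only the root types, so `*Celsius` and `*Int` remain distinct, while Structural reduces named types at every level and accepts the pair.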