From de3e0d28361d278d2120d45b0b70b95abdf96714 Mon Sep 17 00:00:00 2001 From: Vern Paxson Date: Sat, 4 Jan 2025 14:10:35 -0800 Subject: [PATCH] first draft of ZAM internals documentation --- src/script_opt/ZAM/INTERNALS.md | 1603 +++++++++++++++++++++++++++++++ 1 file changed, 1603 insertions(+) create mode 100644 src/script_opt/ZAM/INTERNALS.md diff --git a/src/script_opt/ZAM/INTERNALS.md b/src/script_opt/ZAM/INTERNALS.md new file mode 100644 index 0000000000..79f3c392df --- /dev/null +++ b/src/script_opt/ZAM/INTERNALS.md @@ -0,0 +1,1603 @@ +

+ +ZAM Optimization: Internals + +

+ +[_Overview_](#overview) - +[_Source Code_](#layout) - +[_High Level_](#high-level) - +[_Inlining_](#inlining) - +[_AST Profiling_](#ast-profiling) - +[_AST Reduction_](#reduction) - +[_AST Analysis_](#ast-analysis) - +[_AST Optimization_](#ast-opt) - +[_Properties of BiFs_](#bif-properties) - +[_Zeek Abstract Machine_](#zeek-abstract-machine) - +[_Compiling to ZAM_](#compile-to-zam) - +[_Finalizing ZAM Compilation_](#finalizing) - +[_ZAM Execution_](#execution) - +[_Troubleshooting_](#trouble) - +[_BTests_](#btests) - + + +

+ +
+
+
+## Overview
+
+### How Things Work Without ZAM
+During initialization, Zeek parses the provided scripts to translate them
+into Abstract Syntax Trees (ASTs), with one AST associated with each
+*function* (a term we use to also include single instances of an event handler
+or hook body). The ASTs consist of two basic types of nodes, "statements"
+(derived from the `Stmt` class) and "expressions" (derived from `Expr`). The
+root of a function's AST is a `Stmt` node, very often a `StmtList`
+node, i.e., a statement comprising a list of child `Stmt` nodes.
+
+`Stmt` nodes have `Execute` methods that, when called, perform the operation
+corresponding to a script statement. Often, this execution in turn evaluates
+`Expr` nodes, which is done via `Eval` methods. The `Eval` methods return `Val`
+objects, often newly constructed, which are used to hold the full range
+of Zeek data type values (e.g., `interval` or `table`). Global script
+variables have a single `Val` associated with their _identifier_
+(an `ID` object). Local variables reside at a specific position in a `Frame`
+object associated with an instance of the function's execution.
+
+In Zeek's default operation, every time a function is called - either due
+to an event being generated, another function calling the function, or
+in some situations the event engine calling a function directly - the
+arguments, which are (pointers to) `Val` objects, are copied into the `Frame` at specific
+locations, and then Zeek's *script interpreter* evaluates the AST by
+calling its top-level node's `Execute` method. That call in turn recursively
+calls the `Execute` methods of the node's children, some of which will then
+call expression `Eval` methods; those will often recurse, and can also lead
+to further function calls.
+
+### ZAM's General Approach
+
+This approach of using an interpreter to recursively execute the AST has
+the benefit of being easy to implement and extend. However, it is very
+heavy in terms of C++ method calls and `Val` object creation (with its attendant dynamic memory allocation) and manipulation,
+which leads to poor performance.
+
+**ZAM** (Zeek Abstract Machine) script optimization aims for higher script
+performance by switching execution to a model that, while still interpreted,
+is much lower level, and as a consequence can avoid nearly all function
+calls (other than initial ones) and greatly reduce the creation and manipulation
+of `Val` objects. It does so by translating the ASTs into operations in a
+custom byte-code-style instruction set. The function body's AST is replaced
+by an array of these instructions. When executing the function, a program
+counter (PC) repeatedly indexes the array to locate the next
+instruction to evaluate. This approach replaces recursive
+method calls with a loop that fetches an instruction to execute,
+dispatches it via a huge C++ `switch` statement, then
+typically increments the PC (but may instead assign the PC a new value
+to effect a branch), and continues until a `return` instruction executes,
+or the PC reaches the end of the instruction array.
+
+The user activates ZAM optimization by specifying `-O ZAM` on the command
+line (or setting the `$ZEEK_ZAM` environment variable to a non-zero value).
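To make this execution model concrete, the following is a deliberately tiny, self-contained
sketch of a byte-code dispatch loop in the spirit just described. All of the names in it
(`MiniOp`, `MiniInst`, `run`) are invented for illustration; they are not ZAM's actual types,
which are described later in this document.

```
// Illustrative only: a pared-down model of a fetch/dispatch/execute loop.
#include <cstdio>
#include <vector>

enum class MiniOp { LOAD_CONST, ADD, JUMP_IF_ZERO, RETURN };

struct MiniInst {
    MiniOp op;
    int v1, v2, v3; // frame slots, or for some operations immediate values
};

// "frame" holds the function's local-variable slots; "prog" is the instruction array.
int run(const std::vector<MiniInst>& prog, std::vector<int>& frame) {
    int pc = 0;
    while ( pc < static_cast<int>(prog.size()) ) {
        const auto& z = prog[pc];
        switch ( z.op ) {
            case MiniOp::LOAD_CONST: frame[z.v1] = z.v2; break;
            case MiniOp::ADD: frame[z.v1] = frame[z.v2] + frame[z.v3]; break;
            case MiniOp::JUMP_IF_ZERO:
                if ( frame[z.v1] == 0 ) {
                    pc = z.v2; // a branch: assign the PC rather than incrementing it
                    continue;
                }
                break;
            case MiniOp::RETURN:
                return frame[z.v1];
        }
        ++pc; // the common case: fall through to the next instruction
    }
    return 0;
}

int main() {
    std::vector<int> frame(3, 0); // slots: 0 = result, 1 = a, 2 = b
    std::vector<MiniInst> prog = {
        { MiniOp::LOAD_CONST, 1, 2, 0 }, // a = 2
        { MiniOp::LOAD_CONST, 2, 3, 0 }, // b = 3
        { MiniOp::ADD, 0, 1, 2 },        // result = a + b
        { MiniOp::RETURN, 0, 0, 0 },
    };
    std::printf("%d\n", run(prog, frame)); // prints 5
}
```

The real loop operates on `ZInst` objects and `ZVal` frame slots and dispatches over a vastly
larger instruction set, but it follows the same fetch/switch/increment pattern.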
+ +Overall, ZAM's design has been governed by the goals of (1) simple +workflow for the user (just adding an argument to the command line, without +any need for separate builds) and (2) full interoperability with +non-optimized scripts and event engine internals. + +An alternative form of script optimization, `-O gen-C++`, translates +the original ASTs into C++ code, which then is compiled directly +into Zeek. (Thus, this form has a more complex workflow than the simplicity +of simply specifying `-O ZAM` on the command line.) The implementation of +this form of script optimization shares some functionality with that for +ZAM - option processing, AST profiling, and some properties computed +for statement bodies and globals - but for the most part the two +implementations are separate. + + + +## Source Code Layout + +The elements of ZAM's implementation appear in several places: + +* `src/` + The most significant script optimization elements at the top + level are extensions to the `Expr` and `Stmt` classes to add + methods for analyzing the properties of AST nodes and for + transforming their representation in various ways for [AST + reduction](#reduction) (see below). There are also minor top-level changes + to Zeek's option processing to control the usage of script + optimization. + +* `src/script_opt/` + The overall drivers for script optimization (both ZAM and + compiling-to-C++), including option processing, and classes for + analyzing and manipulating ASTs (see [AST Reduction](#reduction) below). Not + much of this is ZAM-specific, although the AST manipulations are + currently only used by ZAM. + +* `src/script_opt/ZAM/` + The ZAM compiler and associated ZAM-specific code. This processing + is done once ASTs are fully reduced, per [AST Reduction](#reduction). Each + instance of the compiler takes a function and its associated AST + body and replaces the AST body with a `ZBody` object that interprets + the resulting byte code. For event handlers and hooks with multiple + bodies, each body is compiled separately (however, see [event + handler coalescence](#coal) below). + +* `src/script_opt/ZAM/OPs/` + The collection of templates describing low-level (byte code) ZAM + operations. These are processed by `gen-zam` to create various C++ + include files used for ZAM compilation and execution. There's a + lengthy [`README.txt`](https://github.com/zeek/zeek/blob/master/src/script_opt/ZAM/OPs/README.txt) describing the templating language. + +* `src/script_opt/ZAM/maint/` + Scripts for ZAM maintenance. Currently, there's just one, for + finding Zeek functions that are known to the event engine. These + are always treated as potentially recursive (a distinction used + by ZAM to determine whether it needs to avoid certain optimizations). + See the associated [`README`](https://github.com/zeek/zeek/blob/master/src/script_opt/ZAM/maint/README) for more information. + +* `auxil/gen-zam/` + The `gen-zam` utility reads in ZAM operation templates and generates + the C++ files needed to implement the instructions. + + + +## High-level compilation process + +ZAM compilation proceeds along the following steps: + +1. Zeek parses all of the scripts and translates them to ASTs, as normal. +Upon finishing each function, it's noted for potential script optimization +analysis (via a call to `analyze_func()`); for event handlers and hooks +with multiple bodies, this is done per-body. 
Similarly, lambda expressions
+are remembered (via a call to `analyze_lambda()`), including the internal
+lambdas used by `when` expressions (`analyze_when_lambda()`). If the scripts
+include any global statements, those are treated as an additional function
+(`analyze_global_stmts()`).
+
+1. Zeek's setup calls `analyze_scripts()` to collectively analyze/compile
+the various functions/events/hooks.
+
+1. Non-recursive function calls are recursively [_inlined_](#inlining) (as long as they are not too
+large), to eliminate function call overhead and facilitate optimization
+across function calls.
+
+1. Each function body's AST is [_profiled_](#ast-profiling) to determine a large number
+of properties (such as which locals and globals it uses, a list of all
+of its `Expr` and `Stmt` AST nodes, constants, type aliases, etc.).
+
+1. For each function being optimized, its AST is [_reduced_](#reduction) to a simplified
+form conducive to analysis and directly translatable into ZAM
+operations.
+
+1. The reduced AST is [_analyzed_](#ast-analysis) to enable [_optimizations_](#ast-opt) such as (among other things) propagating
+constants, eliminating recomputation of common subexpressions, and removing
+assignments that won't ultimately be used.
+
+1. The optimized AST is [compiled](#compile-to-zam) into an initial ZAM program. Most
+AST `Expr` nodes translate into a single ZAM operation, while `Stmt`
+nodes might result in a few (but not a lot of) operations. Control flow
+at this point is in terms of low-level _gotos_.
+
+1. ZAM's [low-level optimizer](#finalizing) processes the initial program for further
+improvements. These include collapsing chains of branches, eliminating
+unused assignments and other dead code, and coalescing variables whose
+usage does not overlap into a shared `ZVal` frame element.
+
+1. At this point, the function's body (which is a single `Stmt` node, though
+often that `Stmt` is a `StmtList` that includes multiple `Stmt`s) is
+replaced by a `ZBody` node. `ZBody`s are a subclass of `Stmt`, so the change
+works in a manner fully compatible with non-optimized scripts.
+
+1. After all scripts have been compiled and the global statements executed,
+`clear_script_analysis()` recovers the considerable additional memory
+allocated during script optimization.
+
+Below there's a section for each of these steps other than the first two
+and the last (given their simplicity), along with a section for [how `ZBody`
+execution works](#execution).
+
+
+## Inlining
+
+Zeek scripts represent function calls with `CallExpr` AST nodes.
+To (greatly) reduce the number of calls, the [_inliner_](https://github.com/zeek/zeek/blob/master/src/script_opt/Inline.cc) replaces most instances of such nodes
+with a special `InlineExpr` node. The exceptions are:
+
+* _Built-in functions_ (BiFs). A possible future optimization would
+be to extend the `bifcl` BiF compiler to generate versions of BiFs
+that ZAM could call directly, rather than needing to go through the
+fairly heavyweight `Func::Invoke()` interface.
+
+* _Recursive functions_. Inlining these could of course lead to infinite
+expansion. (In principle they could be inlined up to a point, but this
+doesn't seem worth the effort.) The inliner computes the transitive closure
+of which-function-calls-which in order to identify both direct and indirect
+recursion. It turns out that recursive Zeek functions are quite rare,
+though there are a few.
+ +* _Functions that make indirect calls_ (i.e., via a `function` variable or +value) are presumed to be potentially recursive and are skipped. These +are likewise rare. + +* _Functions called by the event engine_ are likewise skipped, +as these might be recursive. They could in fact be inlined (though it's +unlikely they will be also called by script functions) since doing so won't +blow them up, but again it doesn't appear to be worth the effort, and +it's useful for other optimizations to flag them as potentially recursive. + +* _Event handlers_ are irrelevant for inlining since they cannot be directly +called by a script. + +* _Hooks_ in principle _could_ be inlined, but since they contain multiple +bodies this is a bit tricky, so presently they aren't. (However, see the following discussion.) + + +Before proceeding, the inliner identifies event handlers with multiple +bodies and _coalesces_ them into a single body (subject to some sizing +constraints), essentially replacing the collection with a single body +that "calls" each body in turn. + +Inlining the calls of each candidate function proceeds recursively. `Stmt` +AST nodes support a (new) `Inline` method that traverses the node's children +to identify any `CallExpr` nodes. These are replaced with `InlineExpr` +nodes and the result recursively expanded to continue inlining any +callees-of-the-callees. When a call is inlined, the AST body of the inlined +function is _duplicated_ (rather than referred to directly), using the (new) `Duplicate()` +method that AST nodes now support. Doing so is important, as otherwise +optimizations that are valid for one call to the inlined function can alter +other instances of calls to that function where the optimization is not +in fact valid. + +When inlining an AST, variables in the inlined function's body have _`.n`_ appended to their names, where `n` reflects the inlining depth. This is done in order to avoid name collisions with outer variables that use the same name. + +The inliner tracks the complexity (in terms of number of AST nodes) +of a function body as it's progressively inlined, and stops if the +size exceeds a threshold. Doing so helps reduce the time needed for +ZAM compilation, as without this cap, some functions can get quite huge +and subsequent analyses running on them will take a very long time. + +The inliner also tracks those functions that have had all of their +instances (locations where they are called) inlined. There's no need to +compile these functions further, since they won't ever be called directly, +and as this includes a large number of functions, doing so is a +significant gain for ZAM compilation time. + +Note that the result of inlining a function continues to be an AST that +Zeek's interpreter can execute. Because of this you might think that +we can speed up interpreted script execution by just using the inliner, +but it turns out to not yield much direct gain by itself, only when +fully compiled to ZAM. + +For debugging and maintenance purposes, you can explicitly enable inlining +using `-O inline`. If that's the only script-optimization-related +option given on the command line then script bodies will be inlined but +otherwise unmodified. You can also use `-O no-inline` to turn inlining _off_ +but otherwise perform script optimization, which is helpful for determining +whether or not a problem is due to inlining. + + +## AST Profiling + +Each function body is profiled by recursively traversing its AST to inspect +all of its elements. 
The full collection of function bodies is then +_globally_ profiled (such as what are all the globals and types used). +See the extensive comment at the beginning of [`src/script_opt/ProfileFunc.h`](https://github.com/zeek/zeek/blob/master/src/script_opt/ProfileFunc.h) +for a discussion of various considerations for constructing these profiles. + + +## AST Reduction + +The goal of AST _reduction_ is to transform an AST into a form where each +element is as simple as it can be. For expressions, this means that either +it's simply a constant or a variable (referred to as _singletons_), +or it's an assignment to an operation all of whose operands are singletons. +Statements are in reduced form if any expressions they refer to are either +just singletons, or in some cases very simple expressions (such as reduced +conditional expressions for `if` statements). Note that the end result of +AST reduction is still an AST executable by the interpreter; however, it generally runs inefficiently +since it has many more assignments and local (temporary) variables than the original. + +Reduced ASTs are much closer in structure to ZAM's low-level instructions - +which for the most part are in terms of a few variables and possibly a +single constant - and thus conducive to translation into the byte code. +In addition, reduced ASTs are much easier to reason about (such as whether +an occurrence of given expression can be replaced by a variable previously +computed to hold that same expression). + +To transform expressions into a form that can be represented by singletons requires creating +numerous temporary variables to hold intermediary values. At this stage, +temporaries are treated like other local variables (and allocated their +own slot in the `Frame` object used when the interpreter executes a +function body). They have one important property, which is they are only +assigned a value once, which makes it easier to reason about the safety +of their subsequent usage. Later optimization has stages to determine when +temporaries can be _doubled up_ and, essentially, reused. + +Reduction also provides an opportunity to *fold* expressions with constant +operands into a single constant value, and to apply simplifications when +some-but-not-all of an expression's operands are constant, such as reducing +`x + 0` to `x`. + +Reduction occurs recursively: AST node classes provide `Reduce()` methods +that generally first reduce their operands and then their own representation. +The process is done in the context of a `Reducer` object. It manages +temporary variables (including additional ones used to support inlining) +and a number of AST optimizations. Reduction proceeds with an initial recursive +pass over a function body's AST to transform it into reduced form, and then +a second optimization pass that leverages analysis of the reduced form to +identify opportunities for constant propagation, assignment elimination, and +the like ([see below](#ast-opt)). + +Along with the `Reduce()` method, AST node classes provide additional +reduction-related methods. `IsReduced()` returns true if the given node +is in a reduced form. If the node isn't, it calls `NonReduced()`, which +remembers the node in a global variable that can be inspected when debugging +problems where ASTs fail to fully reduce. 
`HasReducedOps()` returns true +if all of the operands of a node are reduced; this is potentially different +than `IsReduced()` because the latter has the form of either a singleton or +an assignment to an expression with singleton operands, whereas +`HasReducedOps()` only refers to the operands, without having an +assignment. So, for example, `tmp = a + b` is in reduced form, while +`a + b` has reduced operands. + +Some AST nodes transform when reduced into fully different representations. +For example, `++x` reduces to `x = x + 1` - changing from an Increment +node to an assignment to an Add node. The predicate `WillTransform()` +identifies these. In addition, some nodes transform when present in +conditionals, but not otherwise. For example, in isolation `x && y` +will reduce to `tmp = x && y`, but in a conditional such as `if ( x && y ) S` +it will transform into `if ( x ) { if ( y ) S }` (in order to preserve the +property of not even evaluating `y` if `x` is false). These sorts +of constructs are flagged using the `WillTransformInConditional()` predicate. + +Note that the process of reduction generally only creates new nodes when a +node changes its type (such as the `++x` example above), or for assigning +an expression's value to a temporary. Otherwise, nodes are altered in place, +in order to diminish the memory allocation/deallocation churn that would +otherwise result. + +When new nodes are created, however, it's important that they have the +same _location_ information associated with them as that for the original +nodes from which they're derived. This aids both pinpointing warnings and +error messages, and enables script-line-level ZAM profiling (not yet documented). +To make this easy to implement, a helper function `with_location_of()` +sets the location of one (new) node to that of an existing node. + +Along with adding new AST methods to support reduction, script optimization +also introduces new types of AST nodes that are not needed by regular +interpreted execution: + +* `InlineExpr` nodes replace `CallExpr` nodes during inlining, and these transform into `CatchReturnStmt` nodes during reduction. ([See below](#catchreturn).) + +* `CheckAnyLenStmt` nodes provide a run-time check when assigning multiple variables to a single `any` value (a non-obvious language feature!). It + also has a counterpart, `AnyIndexExpr`, to capture the assignment + itself. + +* `AppendTo` nodes replace `x += y` expressions when `x` is a `vector` + and therefore the expression is an append operation rather than + an add-to operation. + +* `IndexAssignExpr` nodes replace assignments of the form `x[y] = z` + with a three-operand expression. + +* `FieldLHSAssignExpr` similarly replace assignments of the form + `x$field = y`. + +* `AssignRecordFieldsExpr` nodes are an internal optimization used + when a series of assignments updates a number of fields in one + record with values taken from another record. + +* `AddRecordFieldsExpr` nodes are an internal optimization used + when a series of `+=` assignments updates a number of fields in one + record with values taken from another record. + +* `ConstructFromRecordExpr` nodes are an internal optimization when + a record constructor directly uses a number of fields from the + same type of record. + +* `CoerceFromAnyVecExpr` nodes are added when a `vector of any` + vector is used in a context where a _concrete_ vector is expected. 
+ +* `ScriptOptBuiltinExpr` nodes replace certain scripting idioms + with a custom instruction (that can then be translated into a + single ZAM operation). See the discussion of [optimization](#ast-opt) + below for further details. + +* `NopExpr` is a dummy expression used as a placeholder for expressions + that have been elided as unnecessary. + + +* `CatchReturnStmt` is used for inlining. It executes a block of code until it executes a `return` statement (or comes to the end of the block), and, if appropriate, assigns to a special return-value variable if the `return` includes an expression. + +For maintenance, you can run with _only_ transform-to-reduced-form turned on by +specifying `-O xform`. (You can combine this with other options, too, like +`-O inline`.) + + +## AST Analysis + +After transforming the AST to reduced form comes the AST-level optimization +phase. This begins with a traversal of the reduced AST to compute information +about the usage of the variables (_identifiers_, corresponding to `ID` objects) +present in the AST, including newly introduced temporaries. + +Two key properties for AST optimization are analyzing a variable's [_reaching +definitions_](#reaching) and a statement's [_use definitions_](#use-def) +(or "use-defs"). +Note that these are standard notions for compiler optimization, as is the +related notion of _confluence_ discussed below - they're not home-grown +concepts. The same is true for the notion of converting the AST to _reduced_ +form. + +We now discuss each notion in turn, and then how they're leveraged for optimization. + + +### Reaching Definitions + +A variable's reaching definitions reflect where in the AST the variable's +value is one of: (1) unassigned, (2) definitely assigned to a given value, (3) +definitely assigned but to an uncertain value, (4) _maybe_ assigned but +also maybe not. These are computed in terms of their state _prior_ to +execution of a statement. For example, in this code block: + +``` + 1. local a: int; + 2. local b: int; + 3. a = x + y; + 4. if ( foo_condition ) + 5. b = a * a; + 6. a = x + z; + 7. print a, b; + 8. print x + y; +``` + +Before the execution of statements 1 and 2, the values of `a` and `b` are +unassigned, and the same holds before statement 3. Before the execution +of statement 4 `a` is _definitely assigned_ (and `b` is unassigned), and +`a`s associated expression is `x + y`. The same holds at the beginning +of statement 5. (In this and later examples for simplicity we confine our +discussion to `a` and `b`, and don't expand the discussion to other variables +like `x`.) + +At the beginning of statement 6, `a` and `b` are both _definitely assigned_, +with `b` having an associated expression of `a * a` (note that here it +matters _which_ reaching-definition of `a` is reflected in the expression +`a * a`). At the beginning of statement 7, `a` is +definitely-assigned-but-to-an-uncertain-value, and `b` is maybe-assigned +(so we can flag a potential used-but-not-set usage error in the print +statement). The same holds at the beginning of statement 8. Because of +that, the optimizer can tell that it's not safe to replace `x + y` with +`a`, even though `a` was previously assigned to that expression. If we +removed statement 6, however, then `a`s reaching definition at statement +8 would be definitely-assigned with an associated expression of `x + y`, +and we _could_ transform statement 8 into `print a`. 
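As a rough way to picture this bookkeeping, the following self-contained sketch models the four
per-variable states and how the states arriving along two control-flow paths combine; the merge
rule corresponds to the confluence rules discussed below. All names here are invented for
illustration, and this is not the actual implementation.

```
// Illustrative only: per-variable reaching-definition state and its merge.
#include <cassert>

enum class DefState {
    Unassigned,        // (1) no assignment yet
    Definite,          // (2) definitely assigned, associated value/expression known
    DefiniteUncertain, // (3) definitely assigned, but to an uncertain value
    Maybe              // (4) maybe assigned, maybe not
};

// "same_value" indicates whether both paths agree on the associated expression.
DefState merge(DefState a, DefState b, bool same_value) {
    if ( a == DefState::Unassigned && b == DefState::Unassigned )
        return DefState::Unassigned;
    if ( a == DefState::Unassigned || b == DefState::Unassigned ||
         a == DefState::Maybe || b == DefState::Maybe )
        return DefState::Maybe;
    if ( a == DefState::Definite && b == DefState::Definite && same_value )
        return DefState::Definite;
    return DefState::DefiniteUncertain;
}

int main() {
    // A path that assigns the variable joins a path that never does:
    assert(merge(DefState::Definite, DefState::Unassigned, false) == DefState::Maybe);
    // Both paths assign, but to different expressions:
    assert(merge(DefState::Definite, DefState::Definite, false) == DefState::DefiniteUncertain);
}
```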
+
+While in the above example it's straightforward how to compute reaching
+definitions for `a` and `b`, this becomes trickier in the face of loops.
+For example, if the above were instead:
+
+```
+ 1. local a: int;
+ 2. local b: int;
+ 3. a = x + y;
+ 4. for ( [x2, y2] in my_tbl )
+ 5. b = a * a;
+ 6. a = x2 + y2;
+ 7. print a, b;
+ 8. print x + y;
+```
+
+then the reaching definition of `a` before statement 4 is
+definitely-assigned-but-to-an-uncertain-value, and for `b` it is
+maybe-assigned. This is because we may be arriving at the beginning of
+the `for` loop at statement 4 for a second time, having already gone through
+the loop once. Hence, `a` at the beginning of statement 4 might have the
+assignment value of `x + y`, or it might have the value of `x2 + y2`. `b`
+might be unassigned, or it might be assigned to `a * a` (and that expression
+itself reflects an uncertain value of `a`).
+
+Computing reaching definitions in the presence of loops, switches, and
+if-else structures is done by introducing the notion of control flow
+_confluence_. Confluence refers to control flow either splitting from one
+point into multiple paths (such as due to if-else or switch structures),
+or multiple paths re-converging at a single point (such as due to loops,
+`break`, `next`, `fallthrough`, and also at the end of if-else and switch
+structures). The rules for propagating reaching definitions through
+confluence regions are fairly simple: on a split, the reaching definitions
+of a variable travel down both paths. At a merge, if all paths coming into
+the merge have the same reaching definition for a variable, it retains its
+definitely-defined status (though perhaps with a different associated
+expression, as long as all of the inbound paths agree on that expression).
+If all paths have _some_ reaching definition, but some of these differ,
+then the new reaching definition is
+definitely-assigned-but-to-an-uncertain-value. If some of the paths have
+_no_ reaching definition while others do have a definition, then the
+variable is maybe-defined. If none of them do, then it's unassigned.
+
+The one trickiness here is that loops need to be visited multiple times:
+first to propagate the reaching definitions upon the initial arrival at
+the start of the loop, and then again to merge into those the reaching
+definitions active at the end of the loop, and then to re-propagate
+those updated values through the loop's body.
+
+The reaching-definitions analysis also provides the basis for Zeek's `-u`
+_used without definition_ and _possibly used without definition_ warnings. (Another version of the former is generated by ZAM's low-level instruction analyzer.)
+
+
+
+### Use-Defs
+
+A statement's use-defs refer to all of the variable definitions that are
+used _after_ execution of that statement, where "after" means in a
+control-flow sense. Use-defs are computed by conducting a _reverse_
+traversal of the AST, which we will illustrate here using the previous example:
+
+```
+ 1. local a: int;
+ 2. local b: int;
+ 3. a = x + y;
+ 4. for ( [x2, y2] in my_tbl )
+ 5. b = a * a;
+ 6. a = x2 + y2;
+ 7. print a, b;
+ 8. print x + y;
+```
+
+After statements 8 and 7, neither `a` nor `b` is used.
+We then back up to statement 4, since that's the previous statement
+in the outer block. After 4, `a` and `b` are both used. We then propagate
+that information to the analysis of statement 6: after it, both `a` and
+`b` are used. However, after statement 5, only `b` is used, since `a`
+is reassigned at statement 6.
_Before_ statement 5, `a` is used (as
+the RHS of the assignment), but `b` is not.
+
+We then need to revisit the beginning of the loop (statement 4), computing
+_confluence_ of `a`-is-used with `a`-and-`b`-are-used, which here is done
+by union operations, so we keep `a`-and-`b`-are-used for statement 4.
+This means that after statement 3, `a`-and-`b` are used, but after statement
+2 (and statement 1), only `b` is used - which looks weird, because it's
+unassigned at that point, but the notion is that its current value can be
+subsequently used, whether assigned-or-not, while the current value of `a`
+after statement 2 will _not_ be used, since it's reassigned right after
+statement 2.
+
+The use-def analysis also provides the basis for Zeek's `-u`
+_assignment unused_ warning.
+
+
+
+## AST Optimization Elements
+
+Once the above properties are computed, the `Reducer` object makes a
+second reduction pass over the AST. This has several elements.
+
+### Constant propagation
+
+For an assignment of the form `x = c`, where `c` is a constant,
+the optimizer replaces any subsequent use of exactly that instance
+of `x` (and not any other possible instances of `x`) with `c`. (Mechanistically,
+this is done by AST nodes in their `Reduce()` methods calling
+the `Reducer` object's `UpdateExpr()` method.) This change
+may then provide opportunities for either _folding_ (evaluating
+at compile time any expression for which all of the operands
+are constants, and replacing the expression's AST node with
+a new one that just holds the resulting constant) or further
+constant propagation, for example if later the assignment
+`y = x` appears. That would be transformed into `y = c` and then
+subsequent uses of `y` also are potential opportunities for
+constant propagation.
+
+Note that a `const` global can also be treated as a constant
+in this context, since there's no way to alter its value during
+script execution.
+
+
+
+### Removal of unused assignments
+
+If there's an assignment of the form `x = y` (where `y` might be
+a constant or an expression rather than a simple variable), and
+the use-defs reveal that there's no subsequent use of `x`,
+then the optimizer removes the assignment. (Note that at this point
+perhaps there's no use of `y` after its earlier assignment, and if not then it
+can be removed, too.) You might think that such assignments
+are rare, but these situations (as well as the next one) actually
+can be quite common for the temporary variables introduced during
+AST reduction.
+
+### Assignment cascades
+
+A sequence like `x = a; y = x` can be replaced by `y = a`.
+This might render `x` removable per the above discussion.
+
+### Dead code removal
+
+If a statement cannot be reached, it's safe to remove it.
+The main benefit of doing so is that it can unearth what are
+now [_unused assignments_](#unused-assignments) (see above).
+
+### Common subexpression elimination (CSE)
+
+CSE is a powerful optimization that identifies expressions
+that do not require computing because their value is already
+available in a variable. It is especially apt for optimizing
+Zeek scripts because script writers often repeat expressions
+rather than factor their code, for example in sequences like:
+
+```
+ c$conn$x = ...
+ c$conn$y = ...
+```
+
+The interpreter will compute `c$conn` twice, but the optimizer
+will transform this to
+
+```
+ tmp = c$conn;
+ tmp$x = ...
+ tmp$y = ...
+```
+
+and avoid the second computation.
+ +Given Zeek's semantics, determining that it is _safe_ to substitute +a previous computation of an expression turns out to be quite +complex. To illustrate, consider the following code fragment: + +``` + x = r1$f1 + r2$f2; + ... + y = r1$f1 + r2$f2; +``` + +For it to be safe to transform the final statement into `y = x` +first requires that the _reaching definitions_ of `r1` and `r2` +are identical at the final statement. In addition, the use-def +of `x` at the statement needs to come (only) from the first +statement. + +Even that is not enough, however. Consider the following larger +sequence: + +``` + x = r1$f1 + r2$f2; + z = r1; + z$f1 = something_else; + ... + y = r1$f1 + r2$f2; +``` + +Because `z` is a _shallow_ copy of `r1`, changing `z`s `$f1` field +also changes `r1`'s. Note that it's not enough to look for assignments +to `r1` like at the 2nd line in the example; suppose `z` is a record of the same +type as `r1` that's passed in as a parameter (or retrieved from a +table, or as a subrecord of another record, or ...). Then it could +_still_ be an alias for `r1`. In general, if the expression of +interest (`r1$f1 + r2$f2`) contains any _aggregates_ (tables, sets, +record, vectors), then any comparable modification to any +aggregate of the same type renders the CSE replacement potentially +unsafe. + +In addition, consider: + +``` + x = r1$f1 + r2$f2; + noninlined_func(); + ... + y = r1$f1 + r2$f2; +``` + +Perhaps function `noninlined_func` modifies a global record that +happens to be aliased to `r1` - then the CSE is invalid. (Note that +`noninlined_func` might be a BiF. Most of these have no side effects +along these lines, but a few of them might - or might call script-level +functions that do. We discuss BiF properties further [below](#bif-properties).) + +In fact, it's worse than that. Suppose we have: + +``` + x = r1$f1 + r2$f2; + tbl[foo] = something_else; + ... + y = r1$f1 + r2$f2; +``` + +If the table `tbl` has an `&default` attribute, and the value of +`foo` is not in the table, _and_ the `&default` calls a function +rather than simply supplying a constant value, then perhaps that +function modifies a global record that happens to be aliased to +`r1`! (Even worse: maybe that function inserts something into a +_different_ table and _its_ `&default` function modifies such a +global!) + +You might think that given this wide-ranging collection of headaches, +we'd just skip CSE, or only do it in very straightforward situations. +However, it is such a large gain - much of which occurs in +non-straightforward situations - that the optimization framework +does indeed track all of these sorts of considerations. This is +done by a combination of AST profiling (which includes figuring +out things like identifying which aggregate types, if any, `tbl`s +`&default` attribute could alter, either directly or indirectly) +plus a bunch of careful analysis in `CSE.cc` and +`Reducer::ExprValid()` (which [documents](https://github.com/zeek/zeek/blob/9e85a0d27da6b65bb50470cc83df2d4b9bf207eb/src/script_opt/Reduce.cc#L504) a number of considerations). + +### AST Idioms + +The optimizer recognizes several scripting idioms and standard +script-level functions: + +* Conditionals of the form `x > y ? x : y` or `x < y ? x : y` for _maximum_ +or _minimum_. + +* Relationals of the form `|aggr| > 0` (and variants) for testing whether +an aggregate has any elements. + +* Calls to `id_string()` to convert a connection's 4-tuple to a string of +the form `%s:%d > %s:%d`. 
(These turn out to be an inefficient hot-spot.)
+
+These are all converted to `ScriptOptBuiltinExpr` AST nodes to enable the
+ZAM compiler to compile them to specialized instructions.
+
+
+## BiF Properties
+
+As mentioned above, in some contexts the optimizer needs to know various
+attributes of a given BiF in order to assess any considerations involved
+when a script function calls it. The interfaces for inquiring about different
+attributes are in `src/script_opt/FuncInfo.h`, with the corresponding code
+in `src/script_opt/FuncInfo.cc`.
+
+The information is in fact slightly broader than just covering BiFs: it
+also includes certain standard functions defined as Zeek scripts that
+optimization knows how to optimize.
+
+The various attributes of interest are:
+
+* Completely free of side effects. Does not alter any script-level state.
+(Not a concern if it modifies internal state.)
+
+* The same, but also doesn't modify any other Zeek state, though might
+have side effects such as writing to a disk file.
+
+* Calls with the same arguments always yield the same results (_idempotency_),
+if the call is made after Zeek initialization.
+
+* The same, but holds even if the call is made prior to Zeek initialization,
+along with a promise that no errors/warnings can be generated. The
+distinction here from the previous item is important because calls to these
+sorts of functions with constant arguments can be _folded_, whereas the
+previous sorts of functions cannot.
+
+* A script function known to the event engine and potentially called during
+its processing.
+
+* A script function known to ZAM and replaceable by specialized instructions.
+
+As noted previously, these properties are maintained via a combination of
+a script in `src/script_opt/ZAM/maint/` and a special BTest
+(`testing/btest/opt/ZAM-bif-tracking.zeek`).
+
+
+## Zeek Abstract Machine
+
+Before describing how the ZAM compiler transforms reduced, optimized
+ASTs into low-level byte code, it's helpful to sketch the execution
+target, i.e., just what is ultimately going to be executed.
+
+As noted above, ZAM-optimized function bodies are represented
+by `ZBody` objects, a subclass of `Stmt` AST node. The main parts of
+ZAM execution concern the _frame_ that holds the values of local
+variables and the _operations_ (instructions) to execute, which we
+cover here in turn.
+
+
+### ZAM Values and Their Management
+
+When executing a function body without script optimization,
+the Zeek interpreter uses a `Frame` object to hold a collection
+of `ValPtr` smart pointers to dynamically allocated `Val` objects.
+Each local variable (including function parameters) is allocated
+an _offset_ in the `Frame` object, with lambda _captures_ being
+dealt with via special-casing when evaluating the value associated
+with a `NameExpr` AST node (each of which corresponds to either a local
+variable, a capture, or a global).
+
+The ZAM _frame_ is analogous but more streamlined: it is a low-level
+C++ array (not a `std::array`) of `ZVal`s. Each `ZVal` is a union of
+all of the possible low-level Zeek value types (e.g., `count`, `table` ...).
+`ZVal`s differ from `ValPtr`s in three important ways: (1) they do not
+require memory allocation/deallocation for simple types like `count`;
+(2) access to their values does not incorporate type-checking, thus it's
+faster (but also more dangerous); (3) by itself a given `ZVal`
+is _ambiguous_ - it does not carry around any type information, and
+thus that must be explicitly incorporated.
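As a rough illustration of these differences, here is a sketch of a ZVal-style union; it is not
the actual definition in the Zeek source. The key point is that the union carries no type tag,
so whatever uses it must already know the type, and managed (reference-counted) values require
explicit handling.

```
// Illustrative only: a ZVal-style union with no embedded type information.
#include <cstdint>
#include <cstdio>

class FakeTable; // stand-in for a managed (reference-counted) aggregate

union SketchZVal {
    int64_t int_val;      // Zeek "int"
    uint64_t uint_val;    // Zeek "count"
    double double_val;    // Zeek "double", "time", "interval"
    FakeTable* table_val; // a managed value: requires explicit ref/unref
};

int main() {
    SketchZVal v;
    v.uint_val = 42; // only valid to read back as uint_val - nothing enforces this
    std::printf("%llu\n", static_cast<unsigned long long>(v.uint_val));
}
```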
The most significant implication
+of this last difference (the lack of type information) is that ZAM execution needs to include explicit
+memory management, rather than getting it for free as is the case
+with `ValPtr`s (however, `ValPtr`s can also incur overhead due to unnecessary
+memory management).
+
+In addition, internally Zeek `VectorVal`s hold their values using a
+`std::vector`. (This is the case regardless of whether script
+optimization is in use.) In some scripting contexts such vectors can be
+typed as `vector of any` whereas at run-time the vector _does_ have a
+concrete type. This in particular happens when a `vector` value is
+initialized using an empty `vector()` constructor. Because of this
+discrepancy, ZAM needs to explicitly transform such `vector of any`s
+into `vector of T` where `T` is the actual type. The script interpreter
+doesn't need to make this change because it always deals with the elements
+of the vector as `Val` objects (which require dynamic construction from
+the underlying `ZVal` union).
+
+ZAM also needs to deal with function _parameters_ (passed into a ZAM `ZBody`
+using a script-level `Frame` object holding `ValPtr`s), _globals_,
+and lambda _captures_. Parameters are loaded from the frame
+into a corresponding ZAM frame slot, translating the `ValPtr` to a
+`ZVal`. In interpreted execution, globals and captures have their
+own `ValPtr`s separate from the `Frame` object. ZAM loads these into
+corresponding `ZVal` locals, and writes their values back to their
+separate storage whenever the global or capture is assigned. (This
+turns out not to be common. ZAM used to have a more complex mechanism
+for tracking which globals were _dirty_ and needed updating, but it
+could lead to errors in complex situations where globals were separately
+altered - such as via `&default` functions - so now a simple write-through
+cache is used.)
+
+If a ZAM function is potentially recursive (as revealed by AST profiling),
+then its _frame_ is dynamically allocated upon execution. However, if it's
+not potentially recursive (far and away the most common case), then the
+frame is created _statically_, which avoids allocation/deallocation overhead.
+
+
+### ZAM Instructions
+
+Each function body is compiled into a ZAM program made up of an array
+of pointers to ZAM instructions (`ZInst` objects). Instructions have
+a standardized format:
+
+* `op`
+    The associated op-code. Used as the index for a
+    C++ switch statement to access the specific C++ code
+    associated with executing the instruction.
+
+* `v1`, `v2`, `v3`, `v4`
+    Four integer values, often referred to as _slots_.
+    These are often used as indices into the function's _frame_
+    in order to access or assign to variables, but they
+    can also be used as integer constants, or indices into
+    auxiliary data structures. By convention, if an instruction
+    assigns to a variable (very common), the variable is
+    designated by the value in `v1`. In any case, slots
+    are always used left-to-right, so for example if an
+    instruction needs 3 integer values, `v4` will not be used
+    and the others will be. In addition, for instructions
+    that use slots both as frame indices and as constants,
+    the frame indices are always the lefthand values, and
+    the constants the later (rightward) values.
+
+* `c`
+    An associated script-level constant, expressed as a `ZVal`.
+    This value is optional (since many instructions don't need
+    a script-level constant); if not present, `c` is set to
+    an empty `ZVal`.
+ +* `op_type` + Encodes how the instruction uses the various elements. + Expressed as a `ZAMOpType` constant. For example, an + `op_type` of `OP_VVC_I2` means that the instruction + uses two of the slots (so `v1` and `v2`) and also the + constant in `c`, and the second of the slots (`v2`) is + interpreted as an integer rather than as a frame index. + See `src/script_opt/ZAM/ZOp.h` for further description + and a list of all of the types. +`op_type`s are used for printing out ZAM instructions +in a meaningful fashion (hugely helpful for debugging); +they are not used during normal execution. + +* `t` + An optional `TypePtr` value giving a Zeek scripting type + associated with the instruction. This is often required + because `ZVal` values do not include their associated + `Type`. + +* `t2` + A second such `TypePtr` value - only rarely needed. + +* `is_managed` + An optional boolean that indicates whether the `v1` slot + (always an assignment target, if the boolean is present) + requires memory management. + +* `aux` + An optional pointer to _auxiliary_ information needed + by the instruction. See the [discussion](#inst-aux) below. + +* `loc` + The source code location associated with the instruction, + expressed as a `ZAMLocInfo` object. These values are + actually more complex than the `Location` objects associated + with AST nodes, because they also include the _call graph_ + of locations leading up to the given point in the source + code due to inlining of function calls, which allows + disambiguating different instantiations of the same + source code line. + +Some instructions need to deal with an extensive collection of operands, +beyond what fits with the above static framework. For example, a function +call with 5 arguments cannot be expressed using just 4 slots. For these, +the `aux` field points to a `ZInstAux` object. The main use of these objects is +to hold an array of `AuxElem` objects, each of which specifies either an +integer value (usually meant to be used as a frame slot) or a constant, which +might be a low-level `ZVal` or a higher-level `ValPtr`, along with an +associated type and an indication of whether use of the element requires +memory management. + + +Auxiliary `ZInstAux` objects also provide various grab-bag fields used by +oddball instructions here-and-there. These include information for +constructing lambdas and `when` triggers, attributes, event handler names, +mappings of record fields and other information for efficient record +creation and updating, elements used for looping over tables, and the +control flow information associated with the instruction (such as whether +it's the start of a conditional or the end of a block), useful for debugging. + +ZAM instructions come in two forms: `ZInst` objects, corresponding to what's +described above, and _intermediary_ `ZInstI` objects, a subclass of `ZInst`, +which use abstract branches (pointers to other `ZInstI` objects rather +than integer program-counter values) and track analysis information +such as whether the instruction is _live_, its loop depth, and how +many branches come into it. These attributes are used for [low-level +optimization](#finalizing), as described below. + + +## Generating ZAM Instructions + +The general philosophy guiding the design of the ZAM instruction set +is to factor out as much static information as possible, meaning that +individual instructions should avoid conditional tests when the outcome +of the condition can be determined at compile time. 
For example,
+in the interpreter the `TimesExpr` AST node for handling the "`*`"
+multiplication operator checks at run-time, using an _if-else_ sequence,
+whether the operands are `int`, `count`, or `double`. ZAM instead
+has 3 different instructions, one for each type of operand, so
+no run-time test is needed.
+
+Actually, it has 21 different instructions for multiplication. The extra
+dimensions come from (1) which of the operands is a constant (for which
+there are three possibilities - none, the lefthand one, the righthand one - note that if they are _both_ constants then instead script optimization
+will fold the multiplication at compile-time), (2) whether the LHS of the
+operation is a direct assignment to a variable or instead an assignment
+to a record field, and (3) whether it's a vector multiplication (in which
+case the operands are always variables).
+
+For multiplication, there's actually no need to support both
+first-operand-is-a-constant and second-operand-is-a-constant as
+discussed above, since the operation is commutative and therefore could
+always be written to only require one of these. However - as we'll
+discuss shortly - because ZAM instructions are automatically generated
+from templates, we get both flavors of one-operand-is-constant for "free".
+
+Clearly it would be very tedious, and bug-prone, to code up all 21 possibilities by hand. Instead, ZAM uses a _templating language_ where all
+that's required is to specify the particulars of a given family of
+instructions, allowing the auxiliary program `gen-zam` to produce all of
+the family members. For example, for multiplication the template is:
+
+```
+ binary-expr-op Times
+ op-type I U D
+ vector
+ eval $1 * $2
+```
+
+and That's It. Here the template specifies the `Times` family of instructions.
+They correspond to a binary (two-operand) expression, so `gen-zam` knows
+that each instruction will take two operands and assign a result to the LHS
+(always specified by the instruction's `v1` slot, as discussed above).
+The next line instructs `gen-zam` to produce versions for (signed) integers,
+unsigned integers, and double-precision values. The third line specifies that
+it should also produce versions for vectorized operation. Finally,
+the fourth line gives abstract C++ code for evaluating the expression.
+Here, `$1` refers to the first operand (which might be a variable given
+by a frame slot, or might be a constant) and likewise `$2` the second
+operand. (This expression is also used as the "kernel" for vector operations.)
+
+From this, `gen-zam` produces instructions with the following op-codes:
+
+```
+ OP_TIMES_VCV_D
+ OP_TIMES_VCV_U
+ OP_TIMES_VCV_I
+ OP_TIMES_VCVi_field_D
+ OP_TIMES_VCVi_field_U
+ OP_TIMES_VCVi_field_I
+ OP_TIMES_VVC_D
+ OP_TIMES_VVC_U
+ OP_TIMES_VVC_I
+ OP_TIMES_VVCi_field_D
+ OP_TIMES_VVCi_field_U
+ OP_TIMES_VVCi_field_I
+ OP_TIMES_VVV_D
+ OP_TIMES_VVV_U
+ OP_TIMES_VVV_I
+ OP_TIMES_VVVi_field_D
+ OP_TIMES_VVVi_field_U
+ OP_TIMES_VVVi_field_I
+ OP_TIMES_VVV_vec_D
+ OP_TIMES_VVV_vec_U
+ OP_TIMES_VVV_vec_I
+```
+
+It also produces C++ that knows, upon encountering a `TimesExpr` AST node
+(in reduced form), how to map it to the appropriate one of these instructions.
+
+The templates processed by `gen-zam` reside in `src/script_opt/ZAM/OPs/`.
+There's a [README.txt](https://github.com/zeek/zeek/blob/9e85a0d27da6b65bb50470cc83df2d4b9bf207eb/src/script_opt/ZAM/OPs/README.txt) file in that directory that describes each of
+the templating language components.
We'll now walk through one more +example to convey the flavor of some of the other considerations. + +Here is the template used for some forms of `add` statements (add an +element to a set), where there's only a single index (so for example `add +my_set[my_ind]`, but not for `add my_set[my_ind1, my_ind2]`): + +``` + op AddStmt1 + op1-read + set-type $1 + classes VV VC + eval EvalAddStmt($1, $2.ToVal(Z_TYPE)) +``` + +It defines the `AddStmt1` family of instructions, and tells `gen-zam` that +these are simply operations (i.e., they don't have any additional information +about layout etc. such as conveyed by `binary-expr-op`). The second +line informs the ZAM low-level optimizer that these instructions do _not_ +assign to their first slot (`v1`), contrary to how most instructions +work. (Note that they do _alter_ the aggregate specified by `v1`, but +that is different from assigning a new aggregate to that variable, so +the operation is "read-only" regarding the value of the aggregate.) + +The next line specifies that the type that should be associated with the +instructions comes from the _second_ operand. (This looks weird, but it's +because `$$` is used for the first operand since for most instructions +that's the assignment target, so in these templates, the first _RHS_ +operand is in fact the second instruction operand.) + +The following line instructs `gen-zam` to produce two flavors of the +instruction, one of type `OP_VV` (two variable operands, specified by the +`v1` and `v2` slots), and one of type `OP_VC` (one variable operand, found +in `v1`, and one constant operand). + +The `eval` specifies a C++ code template for which `$1` will be replaced +with the first operand, and `$2` with the second (whether it is a variable +or a constant). `Z_TYPE` is a C++ `#define` definition expanding to +the Zeek `TypePtr` corresponding to the second operand (per the `set-type` +specification). `ToVal()` is a `ZVal` method for converting a `ZVal` to +a `ValPtr`. `EvalAddStmt` is a templating _macro_ (which is implemented +by turning it into a C++ `#define` macro). Here's its definition: + +``` +macro EvalAddStmt(lhs, ind) + auto index = ind; + bool iterators_invalidated = false; + lhs.AsTable()->Assign(std::move(index), nullptr, true, &iterators_invalidated); + if ( iterators_invalidated ) + WARN("possible loop/iterator invalidation"); +``` + +`gen-zam` generates in `build/src/` a large number of C++ files, each +included for use in a different context. For the above example of `AddStmt1`, +these are: + +### ZAM-OpDesc.h + +This contains a description of each ZAM instruction, allowing + reflection for identifying some forms of errors when writing + ZAM instruction templates. Here the entries are: +``` +{ OP_ADDSTMT1_VV, + { + "VV", + "", + "EvalAddStmt(frame[z.v1], frame[z.v2].ToVal(Z_TYPE))\n" } +}, +{ OP_ADDSTMT1_VC, + { + "VC", + "", + "EvalAddStmt(frame[z.v1], z.c.ToVal(Z_TYPE))\n" } +}, +``` + +These are included as part of the definition of a large _map_ + in `src/script_opt/ZAM/Validate.cc` (not yet documented). + +### ZAM-MacroDesc.h + +A companion file with descriptions of all of the macros used + to define ZAM instructions. 
For our example, what's germane is: +``` +{ "EvalAddStmt", + "#define EvalAddStmt(lhs, ind) \\\n" + " auto index = ind; \\\n" + " bool iterators_invalidated = false; \\\n" + " lhs.AsTable()->Assign(std::move(index), nullptr, true, &iterators_invalidated); \\\n" + " if ( iterators_invalidated ) \\\n" + " WARN(\"possible loop/iterator invalidation\");" +}, +``` + +### ZAM-EvalDefs.h + +The code used to execute the instruction. Incorporated into a + (very) large switch statement in `ZBody::Exec()`. Quite straightforward + here (and usually): + +``` +case OP_ADDSTMT1_VV: + { + EvalAddStmt(frame[z.v1], frame[z.v2].ToVal(Z_TYPE)) + } + break; + +case OP_ADDSTMT1_VC: + { + EvalAddStmt(frame[z.v1], z.c.ToVal(Z_TYPE)) + } + break; +``` + +### ZAM-MethodDefs.h + +C++ methods that the ZAM compiler calls upon encountering an appropriate +`add` statement AST node in order to convert it to ZAM code. + +``` +const ZAMStmt ZAMCompiler::AddStmt1VV(const NameExpr* n1, const NameExpr* n2) + { + ZInstI z; + z = GenInst(OP_ADDSTMT1_VV, n1, n2); + z.SetType(n2->GetType()); + return AddInst(z); + } + +const ZAMStmt ZAMCompiler::AddStmt1VC(const NameExpr* n, const ConstExpr* c) + { + ZInstI z; + z = GenInst(OP_ADDSTMT1_VC, n, c); + z.SetType(c->GetType()); +return AddInst(z); +} +``` + +(These methods are declared in a separate file, `ZAM-MethodDecls.h`.) + +### ZAM-Op1FlavorsDefs.h + +Captures the _flavor_ of how the LHS operand is treated. Here: + +``` + OP1_READ, // OP_ADDSTMT1_VV + OP1_READ, // OP_ADDSTMT1_VC +``` + +Other values are `OP1_WRITE` (the most common), `OP1_READ_WRITE`, +and `OP1_INTERNAL`. + +### ZAM-OpSideEffects.h + +Tracks whether a given instruction has side effects (meaning, +even if its assignment target turns out to not be needed, +the low-level optimizer should not delete the instruction). +This is `false` for most instructions, including our example ones: + +``` + false, // OP_ADDSTMT1_VV + false, // OP_ADDSTMT1_VC +``` + +### ZAM-OpsDefs.h + +Defines each op-code as part of a large C++ `enum`. Here our example +instructions are simply listed, along with all the others: + +``` + OP_ADDSTMT1_VV, + OP_ADDSTMT1_VC, +``` + +### ZAM-OpsNamesDefs.h + +Translates op-codes to human-readable strings for debugging. + +``` + case OP_ADDSTMT1_VV: return "addstmt1-VV"; + case OP_ADDSTMT1_VC: return "addstmt1-VC"; +``` + +For the above example, `gen-zam` will also create an entry in +`ZAM-EvalMacros.h`: + +``` +#define EvalAddStmt(lhs, ind) \ + auto index = ind; \ + bool iterators_invalidated = false; \ + lhs.AsTable()->Assign(std::move(index), nullptr, true, &iterators_invalidated); \ + if ( iterators_invalidated ) \ + WARN("possible loop/iterator invalidation"); +``` + +### Other Generated Files + +While not germane for our `AddStmt1` family of instructions, `gen-zam` +also generates a large number of additional files, which we briefly describe +here: + +* `ZAM-GenExprsDefsC1.h` \ + `ZAM-GenExprsDefsC2.h` \ + `ZAM-GenExprsDefsC3.h` \ + `ZAM-GenExprsDefsV.h` \ +These all hold switch statements for converting `Expr` AST nodes into calls +to methods defined in `ZAM-MethodDefs.h`. The first three apply when one +of the 1st, 2nd or 3rd expression operands (on the RHS) is a constant. The +last applies when all of the operands are variables. + +* `ZAM-GenFieldsDefsC1.h` \ + `ZAM-GenFieldsDefsC2.h` \ + `ZAM-GenFieldsDefsV.h` \ +Similar, but when the LHS is an assignment to a record field rather than +to a simple variable. 
+
+* `ZAM-DirectDefs.h` \
+Used to override the mappings provided by the above when a special method
+needs to be used rather than the default ones produced by `gen-zam`.
+
+* `ZAM-Vec1EvalDefs.h` \
+`ZAM-Vec2EvalDefs.h` \
+Switch statements for evaluating 1-operand and 2-operand vector operations.
+
+* `ZAM-Conds.h` \
+Switch statements for converting relationals used in `if` statements into
+their corresponding ZAM instructions.
+
+### ZAM-AssignFlavorsDefs.h
+
+This one is fairly involved, so we give it its own section.
+
+It provides up to three types of information associated with an instruction
+or an instruction family. For example
+```
+assignment_flavor[OP_CALL2_V][TYPE_DOUBLE] = OP_CALL2_V_D;
+```
+is used to determine, for a given function call with two arguments
+(`OP_CALL2_V`), returning a type `double`, that the corresponding op-code
+is `OP_CALL2_V_D`.
+
+```
+assignmentless_op[OP_CALL2_V_D] = OP_CALL2_X;
+```
+is used to determine that if, for such a call, the low-level optimizer
+determines that the return value won't be used, then the instruction can
+be translated into an `OP_CALL2_X` instruction ...
+
+```
+assignmentless_op_class[OP_CALL2_V_D] = OP_X;
+```
+... and if such a translation is done, then this entry indicates that the resulting
+instruction has an `op_type` of `OP_X` (meaning, no variables or constants,
+because all of the call information is in the `aux` field).
+
+
+
+## Compiling To ZAM
+
+The ZAM compiler walks the optimized, reduced AST and for each node
+generates a small number (very often one) of ZAM instructions to implement
+it. This process is greatly eased by the numerous _glue_ files generated
+by `gen-zam`; these nearly automate compiling `Expr` nodes, and for
+`Stmt` nodes often provide readily available methods.
+
+Still, some compilation instead requires lower-level generation of individual
+`ZInstI` instructions. Much of this is eased by a set of `GenInst()` methods
+defined in `src/script_opt/ZAM/Gen-Inst.h`. These take `NameExpr` and
+`ConstExpr` nodes and automatically map them to instruction slots corresponding
+to ZAM frame positions, or the `c` constant associated with ZAM instructions.
+
+The largest structural changes concern control flow: ASTs have notions of
+inner and outer statement blocks, and abstract constructs such as `for`
+loops and `switch` statements, which don't readily fit into low-level byte
+code execution. The compiler turns such constructs into abstract `go-to`
+statements that work by adjusting the program counter used during ZAM
+execution.
+
+To help with debugging these transformations, the compilation process
+associates _control flow types_ with ZAM instructions. These include:
+
+* `CFT_IF`
+    The start of an _if_ statement (i.e., just before computing
+    whether to branch).
+
+* `CFT_ELSE`
+    The start of the _else_ branch of an _if_ statement.
+
+* `CFT_BLOCK_END`
+    The given instruction marks the end of a block of statements.
+
+* `CFT_LOOP`
+    The start of a `for` loop. This generally comes _earlier_ than the
+    loop's condition, because some instructions must be executed to
+    initialize the loop.
+
+* `CFT_LOOP_COND`
+    The instruction that tests the condition of whether to continue
+    the loop.
+
+* `CFT_LOOP_END`
+    The end of the loop's body.
+
+* `CFT_BREAK`
+    The given branch corresponds to a `break` statement.
+
+* `CFT_NEXT`
+    The given branch corresponds to a `next` statement.
+
+* `CFT_DEFAULT`
+    The given statement starts the `default` case of a switch statement.
This stage of compilation introduces one significant optimization: the
replacement of calls to certain Zeek BiFs with specialized instructions.
Doing so can be a significant performance win for BiFs that either are
called frequently but don't take long to execute (in which case the win
comes from eliminating function call and value conversion overhead), or
that have internal overhead due to needing to dynamically assess the types
of their arguments, whereas for a given call ZAM already knows these
statically.

When encountering a call, the `IsZAM_BuiltIn()` predicate returns true if
it corresponds to a BiF that ZAM can replace (and, if so, it goes ahead
with the replacement). An analogous predicate, `IsZAM_BuiltInCond()`, is
used for calls that are part of conditional tests (`if` statements or
`while` tests).

At this point in ZAM's evolution there are enough such BiFs (about three
dozen) that it helps to both isolate the code for dealing with them (see
`src/script_opt/ZAM/BuiltIn.cc`) and define a number of abstractions to
make adding new ones easier. For example, using these, the entire code
necessary to replace the `Files::__analyzer_enabled` BiF is:

```
SimpleZBI ae_ZBI{"Files::__analyzer_enabled", OP_ANALYZER_ENABLED_VC, OP_ANALYZER_ENABLED_VV};
```

plus the following ZAM instruction template:
```
internal-op Analyzer-Enabled
classes VV VC
op-types I X
eval $$ = ZAM::file_mgr_analyzer_enabled($1.ToVal(Z_TYPE)->AsEnumVal());
```

For the first part of this, a `SimpleZBI` is a _ZAM BuiltIn_ (ZBI) that
takes either no arguments or a single argument. The initializer above uses
a throw-away name (`ae_ZBI`) to associate the name of the BiF with two ZAM
instructions, one taking a constant argument and one taking a variable
argument; both return a value (the initial `V` of the `classes`).

Additional subclasses support BiFs taking more arguments, along with two
more complex BiF substitutions. One is a replacement for `sort()` that
knows to skip the replacement if the arguments are invalid (something that
the BiF only determines at run-time) and that streamlines the very common
case of the argument being a vector with only 0 or 1 elements. The other
is a replacement for `cat()` that removes the need for that function to
analyze the types of its arguments at run-time. Regarding the latter, a
good target for the future would be to replace calls to the `fmt()` BiF,
as it has similar run-time dispatch. `fmt()` has the added complexity of
needing to support different format strings; perhaps the replacement could
initially restrict its usage to simpler formats.


## Finalizing the ZAM Program

Once the compiler has converted the AST to a set of intermediary ZAM
instructions, two more steps remain: (1) _concretizing_ the instructions
into a final ZAM program, and (2) _optimizing_ the resulting low-level code.
Concretization is straightforward: branches that were initially represented
in abstract terms (pointers to instructions), including those for loops and
for `next` and `break` statements, are changed into branches to absolute
locations in the program (i.e., integer offsets into the array of
instructions). This includes the tables used for `switch` statements.

In addition, when doing so the compiler also computes _loop levels_, i.e.,
the degree of loop nesting present at each instruction. These are used by
the low-level optimizer to track the [usage range of frame
variables](#frame-lifetime) (see below).

The low-level optimization involves a number of steps:

- ZAM computes how many branches target each instruction (used for
deciding whether code is "dead").

- All "NOP" instructions are "killed" (marked as not [_live_](#live-inst);
see below). These can be introduced during compilation for uses such as
placeholders for loop branches.

- The optimizer then repeatedly makes passes over the ZAM program
until a pass no longer introduces any change. Each pass covers the
following:

  - For any _live_ instruction, ZAM checks whether it branches to the next
live instruction, and if so kills the branch, unless the instruction has
side effects (as loop iteration instructions do).

  - If an instruction does not continue on to its successor (e.g., it is an
unconditional branch elsewhere), and the successor doesn't have any labels
(it's not a branch target), then the successor is killed.

  - Any branch (including a conditional one) that targets an unconditional
branch can be (repeatedly) altered to instead branch to the target's own
target.

  - For each frame variable, ZAM computes its _lifetime_: when it is
assigned and when it is last used. This computation incorporates the
_loop levels_ mentioned above in order to correctly recognize that a frame
variable's lifetime might begin _before_ its first assignment if it's inside
a loop. The logic regarding assignment is somewhat involved because it needs
to track instructions that don't follow the common assign-to-slot style, and
similarly for usage, since for some instructions the frame variables they
use appear in the _auxiliary_ information rather than being expressed
directly in the instruction's slots.

  - ZAM kills any variable assignment for which the variable isn't
subsequently used (unless the instruction has side effects, in which case
we transform it into its complementary form that doesn't do an assignment).

  - ZAM removes loads of globals or captures in straight-line code (no
inbound branches) for which the same load has already occurred earlier.

- Once a pass over the above finally does not result in any changes, the
optimizer makes a pass over the program to determine which frame variables
have lifetimes that do not overlap. These are then candidates for
_doubling up_, i.e., sharing a frame slot (see the sketch following this
list). This step is a vital optimization, as without it, due to aggressive
inlining, a ZAM program's frame can grow very large, requiring much more
time to initialize and manage.

- The one nuance with doubling up is that sharing is not done across a mix
of frame variables that are _managed_ (use reference counting, e.g., `table`
values) and _unmanaged_ (e.g., `count` values), to ensure consistent memory
management for doubled-up slots across their different uses.

- This pass also identifies Zeek scripting globals (which temporarily reside
in frame variables) that aren't used, and removes them.

- At this point, the instructions all need to have their slots updated to
reflect the frame sharing. As with the earlier identification of frame
variable lifetimes, this needs to be done with sensitivity to atypical
instructions that don't assign to the first slot and/or have frame variable
offsets in their _auxiliary_ information.

- As a final step, the `Frame` size associated with the function is reduced
to just the number of parameters used when calling it. This makes the
invocation of compiled functions more efficient because they don't need
to allocate large `Frame` object arrays that would for the most part go
unused.
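To make the doubling-up criterion concrete, here is a minimal C++ sketch of
the lifetime-overlap test it relies on. The type and function names
(`SlotLifetime`, `CanShareSlot`) are illustrative assumptions, not ZAM's
actual data structures:

```
// Hypothetical representation of a frame variable's lifetime: the range of
// instruction locations over which it is live, plus whether its values
// need memory management (reference counting).
struct SlotLifetime {
    int first;     // instruction location where the variable becomes live
    int last;      // location of its last use (extended to enclosing loop ends)
    bool managed;  // true for values such as tables, false for count/bool
};

// Two frame variables can share ("double up") a slot only if their lifetimes
// never overlap and they agree on whether the slot holds managed values, so
// that slot cleanup remains uniform across the slot's different uses.
bool CanShareSlot(const SlotLifetime& a, const SlotLifetime& b) {
    bool disjoint = a.last < b.first || b.last < a.first;
    return disjoint && a.managed == b.managed;
}
```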
Whether or not optimization is run, the last compilation step is to translate
the (live) abstract `ZInstI` instructions into concrete `ZInst` instructions
to form the final program, which is used to construct a `ZBody` object
that replaces the `Stmt` AST object that originally represented the
function's body.

Once compilation is done, the memory associated with the ASTs is reclaimed,
to help reduce the memory footprint of ZAM-compiled scripts. (An exception
is when using ZAM's fine-grained profiling, where some of the original
state is kept in order to loop over functions to generate their profiles.)


## Execution

Functions with `ZBody` objects for their bodies look the same, externally,
as those with `Stmt` object bodies. This means that ZAM-optimized scripts
are fully interoperable with non-ZAM-optimized scripts (and with calls made
from the event engine).

Interpreted scripts have a `Frame` object that holds a number of `ValPtr`
objects pointing to the values of the script's local variables, including
its function parameters. For ZAM-optimized scripts, the local variables
are managed separately in an array of `ZVal` objects, so their `Frame`
objects just have slots for the function parameters (necessary for
interoperability). The first instructions of a ZAM program then load the
parameters from the `Frame` and store their `ZVal` representations as
local variables.

Prior to executing the ZAM program, `ZBody::Exec` creates a `ZBodyStateManager`
state management object. This object provides memory management for all
of the `ZVal` locals that require it; because it does so via its destructor,
memory management occurs even if execution is aborted via a C++ exception.
(It also manages state for some objects used during execution - in particular,
tracking the values used for `for` loop iterations.)

Execution proceeds using a program counter (PC) that indexes into the array
of ZAM instructions comprising the program. Executing a single instruction
involves switching on its op-code into a large collection of cases, one
per distinct operation. After most instructions the PC is incremented to
fetch the next instruction, but branches instead assign new values to it.
Execution finishes when the PC moves past the last instruction. At that
point, all that remains is to construct the return value (as a `ValPtr`),
if any.
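As a rough illustration of this execution model, the main loop has the
familiar shape of a switch-dispatched byte-code interpreter. The following
is only a minimal sketch with made-up op-codes and fields; the real
`ZBody::Exec()` loop has hundreds of cases generated by `gen-zam` and
operates on `ZVal` frames:

```
#include <cstddef>
#include <vector>

// Made-up instruction layout, for illustration only.
struct Instr {
    int op;      // op-code selecting which case to execute
    int v1, v2;  // frame slots, or a branch target for OP_GOTO
};

enum OpCode { OP_ASSIGN, OP_GOTO, OP_RETURN };

void Exec(const std::vector<Instr>& prog, std::vector<int>& frame) {
    std::size_t pc = 0;

    while ( pc < prog.size() ) {
        const Instr& z = prog[pc];

        switch ( z.op ) {
            case OP_ASSIGN:
                frame[z.v1] = frame[z.v2];  // simple slot-to-slot copy
                ++pc;                       // most instructions just advance the PC
                break;

            case OP_GOTO:
                pc = z.v1;                  // branches assign the PC directly
                break;

            case OP_RETURN:
                return;                     // ends execution before the end of the array
        }
    }
}
```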
## BTests

The BTest `-a zam` alternative runs the CI test suite using ZAM optimization.
Most of the tests produce results identical to those of ordinary interpreted
execution. Those that vary, and thus require their own `Baseline.zam/`
baseline, usually differ simply in terms of error messages, or execution
information such as backtraces. In addition, the `testing/btest/opt/`
directory has a number of tests specifically designed for script
optimization, including a few for catching regressions of previously
fixed bugs.


## Troubleshooting

To diagnose a ZAM problem, the first step is to construct as simple a
reproducer as possible. There are several tools for doing so:

* Using `--optimize-file=xyz` will confine optimization to just those scripts
whose filename matches the regular expression _xyz_. (The regular expression
is unanchored by default, but you can use `^` and `$` to anchor it if
need be.) Multiple instances of this argument expand the optimization
to scripts matching any of the corresponding regular expressions.

* Using `--optimize-func=xyz` will confine optimization to functions
(or events or hooks) whose name matches _xyz_. (Here the regular expression
_is_ anchored, though you can override that by adding `.*` to either end.
For example, you can use `--optimize-func=Module::.*` to match every
function in the `Module` namespace.)

* True functions (i.e., not events or hooks) are often _fully optimized away_
by the inliner, which can make pinpointing issues with them difficult.
You can turn off the inliner using `-O no-inline`.

* In addition, as noted above, ZAM can _coalesce_ multiple event handlers
for the same event into a single body, which can make it difficult to
isolate issues in just one particular handler. You can turn this off using
`-O no-event-handler-coalescence`.

* You can run the interpreter on the reduced AST by using just `-O xform`
(no `-O ZAM`). If the problem already manifests at that point, then it
relates to the reduction process.

* You can inspect the AST-level transformations ZAM does using `-O dump-xform`.

* If `-O xform` doesn't cause the problem to manifest, you can try
`-O optimize-AST` instead, which also employs the AST-level optimizer
while still executing the result using the interpreter.

* Bugs arising from the AST-level optimizer are sometimes due to mistakes
in how ZAM computes which variables are defined and used at which points
in a function. You can dump information about this using `-O dump-uds`.

* If the optimized AST doesn't manifest the problem, you can use
`-O gen-ZAM-code` to generate ZAM code without using either the AST
optimizer or ZAM's low-level optimizer.

* You can generate ZAM code that uses the AST optimizer but not ZAM's
low-level optimizer via `-O no-ZAM-opt`.

* The analog to `-O dump-xform` for ZAM code (rather than for AST
transformations) is `-O dump-ZAM`. This includes a number of intermediary
stages. If you want to inspect just the final code, use `-O dump-final-ZAM`.

* When using the debugger to analyze ZAM compilation, or the resulting
execution, one very handy function is `obj_desc()`. It takes a pointer to
a Zeek `Obj` and returns its string description. (Note that it requires
bare pointers, so if you have an `IntrusivePtr` - as is often the case -
you can invoke it using `print obj_desc(my_intrusive_ptr.ptr_)`.)

* When rebuilding Zeek while trying to fix ZAM issues, note that if you
change any ZAM instructions, you sometimes need to compile twice.
Ninja (and maybe Make) misses some of the dependencies, which are subtle
since they come from the use of dynamically generated files.