ZAM documentation updated to reflect finer-grained profiling

This commit is contained in:
Vern Paxson 2024-03-10 17:13:41 -04:00 committed by Tim Wojtulewicz
parent 0538c68221
commit 593355cf2c


# ZAM Optimization: User's Guide

<h4>
[_Overview_](#overview) -
[_Known Issues_](#known-issues) -
[_Optimization Options_](#script-optimization-options) -
[_ZAM Profiling_](#ZAM-profiling) -
</h4>
<br>
<a name="overview"></a>
## Overview
Zeek's _ZAM optimization_ is an experimental feature that changes the
that are then executed by recursively interpreting each node in a given
tree. With script optimization, Zeek compiles the trees into a low-level
form that can generally be executed more efficiently.
You specify use of this feature by including `-O ZAM` on the command line.
(Note that this option takes a few seconds to generate the ZAM code, unless
you're using `-b` _bare mode_.)
How much faster will your scripts run? There's no simple answer to that.
It depends heavily on several factors:
rather than something a lot more dramatic.
<br>
<a name="known-issues"></a>
## Known Issues
of script interpretation).
### Deficiencies:
* Run-time error messages in compiled scripts have diminished identifying
information.
* The optimizer assumes you have ensured initialization of your variables.
If your script uses a variable that hasn't been set, the compiled code may
crash or behave aberrantly. You can use the `-u` command-line flag to find such potential usage issues.
different.
<br>
<a name="script-optimization-options"></a>
## Script Optimization Options
Users will generally just use `-O ZAM` to invoke the script optimizer.
issues:
|`dump-uds` | Dump use-defs to _stdout_.|
|`dump-xform` | Dump transformed scripts to _stdout_.|
|`dump-ZAM` | Dump generated ZAM code to _stdout_.|
|`gen-ZAM-code` | Generate ZAM without additional optimizations.|
|`help` | Print this list.|
|`inline` | Inline function calls.|
|`no-inline` | Suppress inlining even if another option implies it.|
|`no-ZAM-opt` | Turn off low-level ZAM optimization.|
|`optimize-all` | Optimize all scripts, even inlined ones. You need to separately specify which optimizations you want to apply, e.g., `-O inline -O xform`.|
|`optimize-AST` | Optimize the (transform) AST; implies `xform`.|
|`profile-ZAM` | Generate to "zprof.out" a ZAM execution profile. (Requires configuring with `--enable-ZAM-profiling` or `--enable-debug`.)|
|`report-recursive` | Report on recursive functions and exit.|
|`report-uncompilable` | Report on uncompilable functions and exit. For ZAM, all functions should be compilable.|
|`xform` | Transform scripts to "reduced" form.|
<br>
<a name="ZAM-profiling"></a>
## ZAM Profiling
ZAM supports detailed script execution profiling, activated using
`-O profile-ZAM`. (This option implies `-O ZAM` unless you've already
specified some ZAM optimization options.) Profiles are written to
`zprof.out`. These profiles have a number of components and are intended
to be subsetted (such as by using `grep`) and then further processed, for
example by using `sort` to pick out instances with large values.
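
For example, the sampled-instruction lines (whose format is described below)
carry the sampled CPU time in their fourth column, so a pipeline like the
following surfaces the costliest instructions. The two input lines here are
fabricated stand-ins; with a real profile you would start from
`grep // zprof.out`:

```shell
# Rank sampled ZAM instructions by CPU time (4th field), costliest first.
# The printf lines are fabricated examples; with a real profile, replace
# them with:  grep // zprof.out
printf '%s\n' \
  'Foo::f 4 2 0.000009 load-val-VV-S 2 (x), frame[2] // main;Foo::f' \
  'Bar::g 7 5 0.000120 call1-V 1 (y), frame[1] // main;Bar::g' |
sort -k4 -rn | head
```
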
When profiling, ZAM gathers for each function body its total number of
calls, total CPU time (including time spent in any script functions or
BiFs it calls), an estimate of total memory allocations, and the number
of sampled ZAM instructions (see below). Memory impact is estimated by
comparing Zeek's total memory usage when a function body starts with its
total usage when the body finishes. Any increase is charged against the
function body; decreases, however, are not (because it's not as clear
whether they are meaningful). This approach only approximates actual memory
usage because
often allocations can be met by memory currently available at user level,
not requiring any kernel allocations; and if they _do_ require kernel
allocations, those can be significantly larger than the immediate need.
That said, experience finds that the resulting values (reported in bytes)
do generally reflect the execution's state-holding impact.
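
The charging rule can be sketched as follows. This is a minimal illustration
of the accounting just described, not Zeek's actual implementation, and the
byte counts are made up:

```shell
# For each "start end" pair of total-memory readings (bytes), charge only
# the increase; a decrease counts as zero. Values are fabricated.
printf '1000 1500\n2000 1800\n' |
awk '{ delta = $2 - $1; charged += (delta > 0 ? delta : 0) }
     END { print charged }'
```

This prints `500`: the first call's 500-byte increase is charged, while the
second call's 200-byte decrease is ignored.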
Often the report will state that a function body "did not execute" when
in fact you're sure it did. This arises due to ZAM's heavy use of _inlining_:
while the code of the function body did indeed execute, it did so only in
the context of _other_ function bodies, and that's where the CPU & memory
impact are charged. (You can suppress this by using `-O no-inline` to turn
off ZAM's inlining, although with a significant performance impact.)
In addition to per-function profiling, ZAM also profiles individual ZAM
instructions. Because fine-grained profiling of every instruction execution
imposes a significant performance penalty,
ZAM does instruction-level profiling using _sampling_. The default sampling
rate is 1-in-100. (You can control it via the `ZEEK_ZAM_PROF_SAMPLING_RATE`
environment variable, and setting that variable to `1` effectively turns
off sampling.) More frequent sampling rates slow down execution further but
provide more accurate information. With the default rate, the slowdown is
about 2x, so not something to use in production in its present form.
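
For example, to profile with sampling effectively disabled (every instruction
measured; `myscript.zeek` is a placeholder for your own script):

```shell
# Profile every instruction rather than sampling; slowest but most
# accurate. "myscript.zeek" is a placeholder.
ZEEK_ZAM_PROF_SAMPLING_RATE=1 zeek -O profile-ZAM myscript.zeek
```
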
At the top of `zprof.out`, ZAM reports the sampling rate along with its
estimate of the cost of profiling a single ZAM instruction and of assessing
memory impact. When reporting CPU times, ZAM subtracts off these costs.
(If the resulting value is negative, it's rounded up to 0.) Reported CPU
times do _not_ factor in the sampling rate, so for example if you want to
estimate the full impact of executing an instruction, when using the default
sampling rate you would multiply the reported value by 100.
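
For instance, the following sketch scales the fourth column (the sampled CPU
time) of an instruction line by the default rate of 100; the input line is a
truncated copy of the example shown below:

```shell
# Estimate an instruction's full CPU cost: sampled time (field 4) times
# the sampling rate. The input line is abbreviated from the example below.
printf '%s\n' 'Config::config_option_changed 4 2 0.000009 load-val-VV-S' |
awk -v rate=100 '{ printf "%s %.6f\n", $1, $4 * rate }'
```
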
For each profiled instruction, ZAM associates a _call tree_ reflecting
each function body and "statement block" leading up to the specific point
in the script at which the instruction executes. The call tree appears
after a `//` delimiter, such as in this line:
```
Config::config_option_changed 4 2 0.000009 load-val-VV-S 2 (Config::location), interpreter frame[2] // Site::zeek_init;Site::zeek_init:315-336;Site::zeek_init:326-329;Site::zeek_init:329;Site::update_private_address_space;Site::update_private_address_space:283-312;Site::update_private_address_space:310;Config::config_option_changed;Config::config_option_changed:142-151
```
This line is reporting a single ZAM `load-val-VV-S` instruction corresponding
to the `Config::config_option_changed` function body. The value of `4`
reflects the position of this instruction in the fully compiled body (the
instruction right above it will be numbered `3`). The `2` and `0.000009`
indicate that the instruction's execution was sampled two times, for a
total of 9 microseconds of CPU time. The text after `load-val-VV-S` up
through the `//` delimiter gives some particulars of the instruction's
operands. What follows is an accounting of the script execution leading to
the instruction's execution: it's coming from a `zeek_init` handler,
including a number of statement blocks (such as lines `315-336`), with the
final `zeek_init` statement being line `329`. At that point,
`Site::update_private_address_space` was called, and it in turn called
`Config::config_option_changed`. (The `Config::config_option_changed` call
was not inlined; if it had been, then the label in the first column would
have been either `zeek_init` or `Site::update_private_address_space`.)
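
Because the call tree is the line's final whitespace-free field, you can, for
example, list the distinct call trees reaching a given function. The input
lines below are fabricated; with a real profile, start from
`grep // zprof.out`:

```shell
# List distinct call trees (last field) leading to one function.
# The printf lines are fabricated; for a real run use: grep // zprof.out
printf '%s\n' \
  'Config::config_option_changed 4 2 0.000009 op // Site::zeek_init;Config::config_option_changed' \
  'Other::f 3 1 0.000001 op // Other::f' |
awk '$1 == "Config::config_option_changed" { print $NF }' | sort -u
```
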
A key point about the format used here is that it's the same as used by
`flamegraph.pl` to generate _Flame Graphs_. Thus, you can generate a flame
graph of ZAM-compiled script execution using the following:
```
grep // zprof.out | awk '{ printf "%s %d\n", $NF, $4 * 1e8 }' | flamegraph.pl >my-flame-graph.svg
```
Here the `awk` invocation is printing out the call tree (`$NF`) followed
by the sampled CPU time multiplied by 100,000,000 to convert it to
microseconds and to expand it by the default sampling rate of 100.
The profile also computes per-module sampling statistics, which you can
examine using `grep ^module zprof.out`. These include lines like:
```
module Weird sampled CPU time 0.095283, 157523 sampled instructions
```
This summary is derived from all sampled instructions whose call tree
included some function from the `Weird` module. As usual, you would
multiply the sampled values by the sampling rate to get the full estimated
values.
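
For example, to rank modules by sampled CPU time (the sixth field of the
`module` summary lines; the second input line below is fabricated):

```shell
# Rank per-module summaries by sampled CPU time (field 6); sort's numeric
# parsing ignores the trailing comma. The second line is fabricated.
printf '%s\n' \
  'module Weird sampled CPU time 0.095283, 157523 sampled instructions' \
  'module Notice sampled CPU time 0.012345, 20100 sampled instructions' |
sort -k6 -rn
```

With a real profile you would instead run `grep '^module' zprof.out | sort -k6 -rn`.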
Finally, note that using ZAM profiling with its default sampling rate slows
down execution by 30-50%.
<br>
<br>