diff --git a/src/script_opt/ZAM/README.md b/src/script_opt/ZAM/README.md index a1313a6dd3..dc9c62da68 100644 --- a/src/script_opt/ZAM/README.md +++ b/src/script_opt/ZAM/README.md @@ -7,12 +7,14 @@ ZAM Optimization: User's Guide [_Overview_](#overview) - [_Known Issues_](#known-issues) - [_Optimization Options_](#script-optimization-options) - +[_ZAM Profiling_](#ZAM-profiling) -
+ ## Overview Zeek's _ZAM optimization_ is an experimental feature that changes the @@ -22,8 +24,9 @@ that are then executed by recursively interpreting each node in a given tree. With script optimization, Zeek compiles the trees into a low-level form that can generally be executed more efficiently. -You specify use of this feature by including `-O ZAM` on the command -line. (Note that this option takes a few seconds to generate the ZAM code, unless you're using `-b` _bare mode_.) +You specify use of this feature by including `-O ZAM` on the command line. +(Note that this option takes a few seconds to generate the ZAM code, unless +you're using `-b` _bare mode_.) How much faster will your scripts run? There's no simple answer to that. It depends heavily on several factors: @@ -43,6 +46,7 @@ rather than something a lot more dramatic.
+ ## Known Issues @@ -54,9 +58,6 @@ of script interpretation). ### Deficiencies: -* Run-time error messages in compiled scripts have diminished identifying -information. - * The optimizer assumes you have ensured initialization of your variables. If your script uses a variable that hasn't been set, the compiled code may crash or behave aberrantly. You can use the `-u` command-line flag to find such potential usage issues. @@ -76,6 +77,7 @@ different.
+ ## Script Optimization Options Users will generally simply use `-O ZAM` to invoke the script optimizer. @@ -88,16 +90,119 @@ issues: |`dump-uds` | Dump use-defs to _stdout_.| |`dump-xform` | Dump transformed scripts to _stdout_.| |`dump-ZAM` | Dump generated ZAM code to _stdout_.| +|`gen-ZAM-code` | Generate ZAM without additional optimizations.| |`help` | Print this list.| |`inline` | Inline function calls.| +|`no-inline` | Suppress inlining even if another option implies it.| |`no-ZAM-opt` | Turn off low-level ZAM optimization.| |`optimize-all` | Optimize all scripts, even inlined ones. You need to separately specify which optimizations you want to apply, e.g., `-O inline -O xform`.| |`optimize-AST` | Optimize the (transform) AST; implies `xform`.| -|`profile-ZAM` | Generate to _stdout_ a ZAM execution profile. (Requires configuring with `--enable-debug`.)| +|`profile-ZAM` | Generate to "zprof.out" a ZAM execution profile. (Requires configuring with `--enable-ZAM-profiling` or `--enable-debug`.)| |`report-recursive` | Report on recursive functions and exit.| |`report-uncompilable` | Report on uncompilable functions and exit. For ZAM, all functions should be compilable.| |`xform` | Transform scripts to "reduced" form.| + +## ZAM Profiling + +ZAM supports detailed script execution profiling, activated using `-O +profile-ZAM`. (This option implies `-O ZAM` unless you've already specified +some ZAM optimization options.) Profiles are written to `zprof.out`. These +profiles have a number of components, and are intended to be subsetted +(such as by using `grep`) and then further processed, such as by using +`sort` to pick out instances with large values. + +When profiling, ZAM gathers for each function body its total number of +calls, total CPU time (including time spent in any script functions or +BiFs it calls), an estimate of total memory allocations, and the number +of sampled ZAM instructions (see below). Memory impact is done by comparing +Zeek's total memory usage when a function body starts with its total usage +when the body finishes. Any increase is charged against the function body; +decreases however are not (because it's not as clear whether they are +meaningful). This approach only approximates actual memory usage because +often allocations can be met by memory currently available at user level, +not requiring any kernel allocations; and if they _do_ require kernel +allocations, those can be significantly larger than the immediate need. +That said, experience finds that the resulting values (reported in bytes) +do generally reflect the execution's state-holding impact. + +Often the report will state that a function body "did not execute" when +in fact you're sure it did. This arises due to ZAM's heavy use of _inlining_: +while the code of the function body did indeed execute, it did so only in +the context of _other_ function bodies, and that's where the CPU & memory +impact are charged. (You can suppress this by using `-O no-inline` to turn +off ZAM's inlining, although with a significant performance impact.) + +In addition to per-function profiling, ZAM also profiles individual ZAM +instructions. Because fine-grained profiling of every instruction execution +imposes a significant performance penalty, +ZAM does instruction-level profiling using _sampling_. The default sampling +rate is 1-in-100. (You can control it via the `ZEEK_ZAM_PROF_SAMPLING_RATE` +environment variable, and setting that variable to `1` effectively turns +off sampling). More frequent sampling rates slow down execution further but +provide more accurate information. With the default rate, the slowdown is +about 2x, so not something to use in production in its present form. + +At the top of `zprof.out`, ZAM reports the sampling rate and also its +estimate of the cost to profiling a single ZAM instruction, and of assessing +memory impact. When reporting CPU times, ZAM subtracts off these costs. +(If the resulting value is negative, it's rounded up to 0.) Reported CPU +times do _not_ factor in the sampling rate, so for example if you want to +estimate the full impact of executing an instruction, when using the default +sampling rate you would multiple the reported value by 100. + +For each profiled instruction, ZAM associates a _call tree_ reflecting +each function body and "statement block" leading up to the specific point +in the scripts for which the instruction executes. The call tree appears +after a `//` delimiter, such as this: + +` +Config::config_option_changed 4 2 0.000009 load-val-VV-S 2 (Config::location), interpreter frame[2] // Site::zeek_init;Site::zeek_init:315-336;Site::zeek_init:326-329;Site::zeek_init:329;Site::update_private_address_space;Site::update_private_address_space:283-312;Site::update_private_address_space:310;Config::config_option_changed;Config::config_option_changed:142-151 +` + +This line is reporting a single ZAM `load-val-VV-S` instruction corresponding +to the `Config::config_option_changed` function body. The value of `4` +reflects the position of this instruction in the fully compiled body (the +instruction right above it will be numbered `3`). The `2` and `0.000009` +indicate that the instruction's execution was sampled two times, for a +total of 9 microseconds of CPU time. The text after `load-val-VV-S` up +through the `//` delimiter give some particulars of the instruction's +operands. What follows is then an accounting for the scripting leading to +the instruction's execution: it's coming from a `zeek_init` handler, +including a number of statement blocks (such as lines `315-336`), with the +final `zeek_init` statement being line `329`. At that point, +`Site::update_private_address_space` was called, and it in turn called +`Config::config_option_changed`. (The `Config::config_option_change` call +was not inlined; if it had been, then the label in the first column would +have been either `zeek_init` or `Site::update_private_address_space`.) + +A key point about the format used here is that it's the same as used by +`flamegraph.pl` to generate _Flame Graphs_. Thus, you can generate a flame +graph of ZAM-compiled script execution using the following: + +` +grep // zprof.out | awk '{ printf "%s %d\n", $NF, $4 * 1e8 }' | flamegraph.pl >my-flame-graph.svg +` + +Here the `awk` invocation is printing out the call tree (`$NF`) followed +by the sampled CPU time multiplied by 100,000,000 to convert it to +microseconds and to expand it by the default sampling rate of 100. + +The profile also computes per-module sampling statistics, which you can +examine using `grep ^module zprof.out`. These include lines like: + +` +module Weird sampled CPU time 0.095283, 157523 sampled instructions +` + +This summary is derived from all sampled instructions whose call tree +included some function from the `Weird` module. As usual, you would +multiply the sampled values by the sampling rate to get the full estimated +values. + +Finally, note that using ZAM profiling with its default sampling rate slows +down execution by 30-50%. +