ZAM documentation updated to reflect finer-grained profiling

This commit is contained in:
Vern Paxson 2024-03-10 17:13:41 -04:00 committed by Tim Wojtulewicz
parent 0538c68221
commit 593355cf2c


# ZAM Optimization: User's Guide

<h4>
[_Overview_](#overview) -
[_Known Issues_](#known-issues) -
[_Optimization Options_](#script-optimization-options) -
[_ZAM Profiling_](#ZAM-profiling) -
</h4>
<br>
<a name="overview"></a>
## Overview
Zeek's _ZAM optimization_ is an experimental feature that changes the
that are then executed by recursively interpreting each node in a given
tree. With script optimization, Zeek compiles the trees into a low-level
form that can generally be executed more efficiently.
You specify use of this feature by including `-O ZAM` on the command line.
(Note that this option takes a few seconds to generate the ZAM code, unless
you're using `-b` _bare mode_.)
How much faster will your scripts run? There's no simple answer to that.
It depends heavily on several factors:
rather than something a lot more dramatic.
<br>
<a name="known-issues"></a>
## Known Issues
of script interpretation).
### Deficiencies:
* Run-time error messages in compiled scripts have diminished identifying
information.
* The optimizer assumes you have ensured initialization of your variables.
If your script uses a variable that hasn't been set, the compiled code may
crash or behave aberrantly. You can use the `-u` command-line flag to find such potential usage issues.
different.
<br>
<a name="script-optimization-options"></a>
## Script Optimization Options
Users will generally just use `-O ZAM` to invoke the script optimizer.
issues:
|`dump-uds` | Dump use-defs to _stdout_.|
|`dump-xform` | Dump transformed scripts to _stdout_.|
|`dump-ZAM` | Dump generated ZAM code to _stdout_.|
|`gen-ZAM-code` | Generate ZAM without additional optimizations.|
|`help` | Print this list.|
|`inline` | Inline function calls.|
|`no-inline` | Suppress inlining even if another option implies it.|
|`no-ZAM-opt` | Turn off low-level ZAM optimization.|
|`optimize-all` | Optimize all scripts, even inlined ones. You need to separately specify which optimizations you want to apply, e.g., `-O inline -O xform`.|
|`optimize-AST` | Optimize the (transform) AST; implies `xform`.|
|`profile-ZAM` | Generate to "zprof.out" a ZAM execution profile. (Requires configuring with `--enable-ZAM-profiling` or `--enable-debug`.)|
|`report-recursive` | Report on recursive functions and exit.|
|`report-uncompilable` | Report on uncompilable functions and exit. For ZAM, all functions should be compilable.|
|`xform` | Transform scripts to "reduced" form.|
<br>
<a name="ZAM-profiling"></a>
## ZAM Profiling
ZAM supports detailed script execution profiling, activated using
`-O profile-ZAM`. (This option implies `-O ZAM` unless you've already
specified some ZAM optimization options.) Profiles are written to
`zprof.out`. These profiles have a number of components and are intended
to be subsetted (such as by using `grep`) and then further processed, for
example by using `sort` to pick out instances with large values.
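
For example, the sampled-instruction lines (whose format is described below)
carry the sampled CPU time in their fourth column, so a pipeline like the
following surfaces the costliest instructions. The two input lines here are
fabricated stand-ins; with a real profile you would start from
`grep // zprof.out`:

```shell
# Rank sampled ZAM instructions by CPU time (4th field), costliest first.
# The printf lines are fabricated examples; with a real profile, replace
# them with:  grep // zprof.out
printf '%s\n' \
  'Foo::f 4 2 0.000009 load-val-VV-S 2 (x), frame[2] // main;Foo::f' \
  'Bar::g 7 5 0.000120 call1-V 1 (y), frame[1] // main;Bar::g' |
sort -k4 -rn | head
```
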
When profiling, ZAM gathers for each function body its total number of
calls, total CPU time (including time spent in any script functions or
BiFs it calls), an estimate of total memory allocations, and the number
of sampled ZAM instructions (see below). Memory impact is estimated by
comparing Zeek's total memory usage when a function body starts with its
total usage when the body finishes. Any increase is charged against the
function body; decreases, however, are not (because it's not as clear
whether they are meaningful). This approach only approximates actual memory
usage because
often allocations can be met by memory currently available at user level,
not requiring any kernel allocations; and if they _do_ require kernel
allocations, those can be significantly larger than the immediate need.
That said, experience finds that the resulting values (reported in bytes)
do generally reflect the execution's state-holding impact.
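
The charging rule can be sketched as follows. This is a minimal illustration
of the accounting just described, not Zeek's actual implementation, and the
byte counts are made up:

```shell
# For each "start end" pair of total-memory readings (bytes), charge only
# the increase; a decrease counts as zero. Values are fabricated.
printf '1000 1500\n2000 1800\n' |
awk '{ delta = $2 - $1; charged += (delta > 0 ? delta : 0) }
     END { print charged }'
```

This prints `500`: the first call's 500-byte increase is charged, while the
second call's 200-byte decrease is ignored.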
Often the report will state that a function body "did not execute" when
in fact you're sure it did. This arises due to ZAM's heavy use of _inlining_:
while the code of the function body did indeed execute, it did so only in
the context of _other_ function bodies, and that's where the CPU & memory
impact are charged. (You can suppress this by using `-O no-inline` to turn
off ZAM's inlining, although with a significant performance impact.)
In addition to per-function profiling, ZAM also profiles individual ZAM
instructions. Because fine-grained profiling of every instruction execution
imposes a significant performance penalty,
ZAM does instruction-level profiling using _sampling_. The default sampling
rate is 1-in-100. (You can control it via the `ZEEK_ZAM_PROF_SAMPLING_RATE`
environment variable, and setting that variable to `1` effectively turns
off sampling.) More frequent sampling rates slow down execution further but
provide more accurate information. With the default rate, the slowdown is
about 2x, so not something to use in production in its present form.
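
For example, to profile with sampling effectively disabled (every instruction
measured; `myscript.zeek` is a placeholder for your own script):

```shell
# Profile every instruction rather than sampling; slowest but most
# accurate. "myscript.zeek" is a placeholder.
ZEEK_ZAM_PROF_SAMPLING_RATE=1 zeek -O profile-ZAM myscript.zeek
```
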
At the top of `zprof.out`, ZAM reports the sampling rate along with its
estimate of the cost of profiling a single ZAM instruction and of assessing
memory impact. When reporting CPU times, ZAM subtracts off these costs.
(If the resulting value is negative, it's rounded up to 0.) Reported CPU
times do _not_ factor in the sampling rate, so for example if you want to
estimate the full impact of executing an instruction, when using the default
sampling rate you would multiply the reported value by 100.
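
For instance, the following sketch scales the fourth column (the sampled CPU
time) of an instruction line by the default rate of 100; the input line is a
truncated copy of the example shown below:

```shell
# Estimate an instruction's full CPU cost: sampled time (field 4) times
# the sampling rate. The input line is abbreviated from the example below.
printf '%s\n' 'Config::config_option_changed 4 2 0.000009 load-val-VV-S' |
awk -v rate=100 '{ printf "%s %.6f\n", $1, $4 * rate }'
```
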
For each profiled instruction, ZAM associates a _call tree_ reflecting
each function body and "statement block" leading up to the specific point
in the script at which the instruction executes. The call tree appears
after a `//` delimiter, such as in this line:
```
Config::config_option_changed 4 2 0.000009 load-val-VV-S 2 (Config::location), interpreter frame[2] // Site::zeek_init;Site::zeek_init:315-336;Site::zeek_init:326-329;Site::zeek_init:329;Site::update_private_address_space;Site::update_private_address_space:283-312;Site::update_private_address_space:310;Config::config_option_changed;Config::config_option_changed:142-151
```
This line is reporting a single ZAM `load-val-VV-S` instruction corresponding
to the `Config::config_option_changed` function body. The value of `4`
reflects the position of this instruction in the fully compiled body (the
instruction right above it will be numbered `3`). The `2` and `0.000009`
indicate that the instruction's execution was sampled two times, for a
total of 9 microseconds of CPU time. The text after `load-val-VV-S` up
through the `//` delimiter gives some particulars of the instruction's
operands. What follows is an accounting of the script execution leading to
the instruction's execution: it's coming from a `zeek_init` handler,
including a number of statement blocks (such as lines `315-336`), with the
final `zeek_init` statement being line `329`. At that point,
`Site::update_private_address_space` was called, and it in turn called
`Config::config_option_changed`. (The `Config::config_option_changed` call
was not inlined; if it had been, then the label in the first column would
have been either `zeek_init` or `Site::update_private_address_space`.)
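
Because the call tree is the line's final whitespace-free field, you can, for
example, list the distinct call trees reaching a given function. The input
lines below are fabricated; with a real profile, start from
`grep // zprof.out`:

```shell
# List distinct call trees (last field) leading to one function.
# The printf lines are fabricated; for a real run use: grep // zprof.out
printf '%s\n' \
  'Config::config_option_changed 4 2 0.000009 op // Site::zeek_init;Config::config_option_changed' \
  'Other::f 3 1 0.000001 op // Other::f' |
awk '$1 == "Config::config_option_changed" { print $NF }' | sort -u
```
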
A key point about the format used here is that it's the same as used by
`flamegraph.pl` to generate _Flame Graphs_. Thus, you can generate a flame
graph of ZAM-compiled script execution using the following:
```
grep // zprof.out | awk '{ printf "%s %d\n", $NF, $4 * 1e8 }' | flamegraph.pl >my-flame-graph.svg
```
Here the `awk` invocation is printing out the call tree (`$NF`) followed
by the sampled CPU time multiplied by 100,000,000 to convert it to
microseconds and to expand it by the default sampling rate of 100.
The profile also computes per-module sampling statistics, which you can
examine using `grep ^module zprof.out`. These include lines like:
```
module Weird sampled CPU time 0.095283, 157523 sampled instructions
```
This summary is derived from all sampled instructions whose call tree
included some function from the `Weird` module. As usual, you would
multiply the sampled values by the sampling rate to get the full estimated
values.
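
For example, to rank modules by sampled CPU time (the sixth field of the
`module` summary lines; the second input line below is fabricated):

```shell
# Rank per-module summaries by sampled CPU time (field 6); sort's numeric
# parsing ignores the trailing comma. The second line is fabricated.
printf '%s\n' \
  'module Weird sampled CPU time 0.095283, 157523 sampled instructions' \
  'module Notice sampled CPU time 0.012345, 20100 sampled instructions' |
sort -k6 -rn
```

With a real profile you would instead run `grep '^module' zprof.out | sort -k6 -rn`.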
Finally, note that using ZAM profiling with its default sampling rate slows
down execution by 30-50%.
<br>
<br>