mirror of
https://github.com/zeek/zeek.git
synced 2025-10-08 17:48:21 +00:00
<h1 align="center">

Compiling Zeek Scripts To C++: User's Guide

</h1><h4 align="center">

[_Overview_](#overview) -
[_Workflows_](#workflows) -
[_Known Issues_](#known-issues)

</h4>


<br>

Overview
--------

Zeek's _script compiler_ is an experimental feature that translates Zeek
scripts into C++, which is then compiled directly into the `zeek` binary.
This gains performance by removing the need for Zeek to use an
interpreter to execute the scripts. Using this feature requires a
somewhat complex [workflow](#workflows).

How much faster will your scripts run? There's no simple answer to that.
It depends heavily on two factors:

* What proportion of the processing during execution is spent in Zeek's
_Event Engine_ rather than executing scripts.

* What proportion of the script's processing is spent executing built-in
functions (BiFs).
It might well be that most of your script processing actually occurs inside
the _Logging Framework_, for example, and thus you won't see much improvement.

Those two factors add up to gains often on the order of only 10-15%,
rather than something a lot more dramatic. On the other hand, with
this feature you can afford to put significantly more functionality in
Zeek scripts without worrying as much about introducing performance
bottlenecks.

That said, I'm very interested in situations where the performance
gains appear unsatisfying. Also note that when using the compiler, you
can analyze the performance of your scripts using C++-oriented tools,
since the translated C++ code generally bears a clear relationship
to the original Zeek script.

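As a rough way to reason about the modest gains described above (an illustration, not a measurement), suppose a fraction $f$ of total run time is spent interpreting script code, excluding time in the Event Engine and in BiFs, and that compiling to C++ makes that portion $s$ times faster. Both $f$ and $s$ are hypothetical symbols introduced here, not quantities the compiler reports. Amdahl's law then gives the overall speedup $S$:

```math
S = \frac{1}{(1 - f) + \dfrac{f}{s}}
```

With assumed values $f = 0.2$ and $s = 3$, this gives $S = 1 / (0.8 + 0.0667) \approx 1.15$. Even if the compiled code took no time at all ($s \to \infty$), $S$ could not exceed $1/(1 - f) = 1.25$ for that value of $f$.
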
If you want to know how the compiler itself works, see the sketch
at the beginning of `Compile.h`.

<br>

Workflows
---------

_Before building Zeek_, see the first of the [_Known Issues_](#known-issues)
below regarding compilation times. If your aim is exploration of the
functionality rather than production use, you might want to build Zeek
using `./configure --enable-debug`, which can reduce compilation times by
50x (!). Once you've built it, the following sketches how to create
and use compiled scripts.

The Zeek build takes the main code generated by the compiler from
`build/CPP-gen.cc`. An empty version of this file is generated when
first building Zeek.

As a user, the most common workflow is to build a version of Zeek that
has a given target script (`target.zeek`) compiled into it. This means
_all of the code pulled in by `target.zeek`_, including the base scripts
(or the "bare" subset if you invoke the compiler when running `zeek -b`).
The following workflow assumes you are in the `build/` subdirectory:

1. `./src/zeek -O gen-C++ target.zeek`
The generated code is written to
`CPP-gen.cc`.
2. `ninja` or `make` to recompile Zeek
3. `./src/zeek -O use-C++ target.zeek`
Executes with each function/hook/event
handler pulled in by `target.zeek` replaced with its compiled version.

Instead of the last step above, you can use the following variant:

3. `./src/zeek -O report-C++ target.zeek`
For each function body in
`target.zeek`, reports whether a compiled-to-C++ body is available,
and also lists any compiled-to-C++ bodies present in the `zeek` binary that
`target.zeek` does not use. Useful for debugging.

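The three steps above can be wrapped in a small shell helper. This is only a sketch: `ZEEK`, `BUILD_CMD`, `DRY_RUN`, `run` and `compile_and_run` are hypothetical names introduced for this example and are not part of Zeek. It assumes you are in `build/` and that `ninja` (or `make`) is the build tool you configured.

```shell
#!/bin/sh
# Sketch of the gen-C++ / rebuild / use-C++ cycle; run from build/.
# ZEEK, BUILD_CMD, DRY_RUN, run() and compile_and_run() are hypothetical
# names introduced for this example; they are not part of Zeek.
ZEEK="${ZEEK:-./src/zeek}"
BUILD_CMD="${BUILD_CMD:-ninja}"    # or: make

# Echo a command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# compile_and_run TARGET [MODE]; MODE is use-C++ (default) or report-C++.
compile_and_run() {
    target="$1"
    mode="${2:-use-C++}"
    run "$ZEEK" -O gen-C++ "$target" &&    # step 1: writes CPP-gen.cc
        run $BUILD_CMD &&                  # step 2: recompile zeek
        run "$ZEEK" -O "$mode" "$target"   # step 3: run (or report)
}
```

For instance, `DRY_RUN=1; compile_and_run target.zeek report-C++` only prints the three commands, which is a cheap way to check the sequence before committing to a long C++ build.
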
The above workflows require the subsequent `zeek` execution to include
the `target.zeek` script. You can avoid this by replacing the first step with:

1. `./src/zeek -O gen-standalone-C++ target.zeek >target-stand-in.zeek`

(and then building as in the 2nd step above).
This option prints to _stdout_ a
(very short) "stand-in" Zeek script that you can load using
`target-stand-in.zeek` to activate the compiled `target.zeek`
without needing to include `target.zeek` in the invocation (nor
the `-O use-C++` option). After loading the stand-in script,
you can still access types and functions declared in `target.zeek`.

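A minimal sketch of the standalone variant follows, under the same assumptions as the commands above (run from `build/`, rebuild with `ninja` or `make`). The names `ZEEK`, `BUILD_CMD`, `DRY_RUN` and `build_standalone` are hypothetical helpers for illustration only.

```shell
#!/bin/sh
# Sketch of the standalone workflow; run from build/.
# ZEEK, BUILD_CMD, DRY_RUN and build_standalone() are hypothetical names
# introduced for this example; they are not part of Zeek.
ZEEK="${ZEEK:-./src/zeek}"
BUILD_CMD="${BUILD_CMD:-ninja}"    # or: make

# build_standalone TARGET STAND_IN
build_standalone() {
    target="$1"
    stand_in="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run.
        echo "+ $ZEEK -O gen-standalone-C++ $target >$stand_in"
        echo "+ $BUILD_CMD"
        return 0
    fi
    # Step 1: generate C++ and capture the stand-in script from stdout.
    "$ZEEK" -O gen-standalone-C++ "$target" >"$stand_in" || return 1
    # Step 2: recompile zeek with the generated code.
    $BUILD_CMD
}
```

After `build_standalone target.zeek target-stand-in.zeek` succeeds, loading the stand-in (for example by naming `target-stand-in.zeek` on the `zeek` command line) should activate the compiled code without `target.zeek` or `-O use-C++`.
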
Note: the implementation differences between `gen-C++` and `gen-standalone-C++`
wound up being modest enough that it might make sense to just always provide
the latter functionality, which it turns out does not introduce any
additional constraints compared to the current `gen-C++` functionality.
On the other hand, it's possible (not yet established) that code created
using `gen-C++` can be made to compile significantly faster than
standalone code.

Another option, `-O add-C++`, instead _appends_ the generated code to existing C++ in `CPP-gen.cc`.
You can use this option repeatedly for different scripts and then
compile the collection _en masse_.

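The repeated use of `-O add-C++` described above might look like the following sketch, which appends code for several scripts and then rebuilds once. `ZEEK`, `BUILD_CMD`, `DRY_RUN`, `run` and `add_scripts` are hypothetical names for this example, and it assumes `CPP-gen.cc` already exists (for instance the empty version created by the initial build).

```shell
#!/bin/sh
# Sketch: append generated C++ for several scripts, then compile once.
# ZEEK, BUILD_CMD, DRY_RUN, run() and add_scripts() are hypothetical
# names introduced for this example; they are not part of Zeek.
ZEEK="${ZEEK:-./src/zeek}"
BUILD_CMD="${BUILD_CMD:-ninja}"    # or: make

# Echo a command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# add_scripts SCRIPT...
add_scripts() {
    for script in "$@"; do
        # Each call appends to the existing CPP-gen.cc.
        run "$ZEEK" -O add-C++ "$script" || return 1
    done
    # A single rebuild picks up everything appended above.
    run $BUILD_CMD
}
```

Rebuilding once at the end, rather than after every script, matters because of the slow C++ compilation noted under Known Issues.
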
There are additional workflows relating to running the test suite, which
we document only briefly here, as they're likely going to change or go away
since it's not clear they're actually needed.

* `non-embedded-build`
Builds `zeek` without any embedded compiled-to-C++ scripts.
* `bare-embedded-build`
Builds `zeek` with the `-b` "bare-mode" scripts compiled in.
* `full-embedded-build`
Builds `zeek` with the default scripts compiled in.

<br>

* `eval-test-suite`
Runs the test suite using the `cpp` alternative over the given set of tests.
* `test-suite-build`
Incrementally compiles to `CPP-gen-addl.h` code for the given test suite elements.

<br>

* `single-test.sh`
Builds the given btest test as a single `add-C++` add-on and then runs it.
* `single-full-test.sh`
Builds the given btest test from scratch as a self-contained `zeek`, and runs it.
* `update-single-test.sh`
Given an already-compiled `zeek` for the given test, updates its `cpp` test suite alternative.

Some of these scripts could be made less messy if `btest` supported
a "dry run" option that reported the executions it would do for a given
test without actually undertaking them.

<br>

Known Issues
------------

Here we list various known issues with using the compiler:
<br>

* C++ compilation of the generated code can be quite slow when it
includes optimization,
taking many minutes on a beefy laptop. This slowness complicates
CI/CD approaches for always running compiled code against the test suite
when merging changes.

* Run-time error messages generally lack location information and information
about associated expressions/statements, making them hard to puzzle out.
This could be fixed, but would add execution overhead in passing around
the necessary strings / `Location` objects.

* To avoid subtle bugs, the compiler will refrain from compiling script elements (functions, hooks, event handlers) that include conditional code. In addition, when using `--optimize-files` it will not compile any functions appearing in a source file that includes conditional code (even if it's not in a function body).

* Code compiled with `-O gen-standalone-C++` will not execute any global
statements when invoked using the "stand-in" script. The right fix for
this is to shift from encapsulating global statements in a pseudo-function,
as currently done, to encapsulating them in a pseudo-event handler.

* Code compiled with `-O gen-standalone-C++` likely has bugs if that
code requires initializing a global variable that specifies extension fields in
an extensible record (i.e., fields added using `redef`).

* The compiler will not compile bodies that include "when" statements.
This is fairly involved to fix.

* The compiler will not compile bodies that include "type" switches.
This is not hard to fix.

* If a lambda generates an event that is not otherwise referred to, that
event will not be registered upon instantiating the lambda. This is not
particularly difficult to fix.

* A number of steps could be taken to increase the performance of
the optimized code. These include:
    1. Switching the generated code to use the new ZVal-related interfaces.
    2. Directly calling BiFs rather than using the `Invoke()` method to do so. This relates to the broader question of switching BiFs to be based on a notion of "inlined C++" code in Zeek functions, rather than using the standalone `bifcl` BiF compiler.
    3. Switching the Event Engine over to queuing events with `ZVal` arguments rather than `ValPtr` arguments.
    4. Making the compiler aware of certain BiFs that can be directly inlined (e.g., `network_time()`), a technique employed effectively by the ZAM compiler.
    5. Inspecting the generated code for inefficiencies that the compiler could avoid.