zeek/src/script_opt/CPP
Tim Wojtulewicz ef9ffda2ef Merge remote-tracking branch 'origin/topic/vern/standalone-event-groups'
* origin/topic/vern/standalone-event-groups:
  tracking of event groups for compilation to standalone-C++
2025-09-17 14:28:44 -07:00
maint fix for more robustly finding BTests to assess for -O gen-C++ 2025-05-31 12:50:14 -07:00
AttrExprType.h Fix clang-tidy performance-enum-size warnings in headers 2025-06-23 08:35:24 -07:00
Attrs.cc regularized (some) types of pointers used in script optimization 2023-12-12 09:45:19 +01:00
Attrs.h factored CPP source's main header into collection of per-source-file headers 2024-10-18 17:37:33 -07:00
Compile.h Use .contains() instead of .find() or .count() 2025-09-02 16:42:52 +00:00
Consts.cc fix for tracking identifiers and aggregates when compiling to standalone-C++ 2025-09-15 13:57:35 -07:00
Consts.h factored CPP source's main header into collection of per-source-file headers 2024-10-18 17:37:33 -07:00
CPP-load.bif script optimization fixes: 2022-11-20 12:16:25 -08:00
DeclFunc.cc tracking of event groups for compilation to standalone-C++ 2025-09-17 14:28:13 -07:00
DeclFunc.h tracking of event groups for compilation to standalone-C++ 2025-09-17 14:28:13 -07:00
Driver.cc tracking of event groups for compilation to standalone-C++ 2025-09-17 14:28:13 -07:00
Driver.h for -O gen-standalone-C++, make the presence of uncompilable functions fatal unless -O allow-cond is used 2025-09-11 13:30:40 -06:00
Emit.cc Reformat Zeek in Spicy style 2023-10-30 09:40:55 +01:00
Emit.h tracking of event groups for compilation to standalone-C++ 2025-09-17 14:28:13 -07:00
Exprs.cc fix for '?' operator precedence when compiling scripts to C++ 2025-09-15 14:18:16 -07:00
Exprs.h Fix clang-tidy performance-enum-size warnings in headers 2025-06-23 08:35:24 -07:00
Func.cc remove non-functional column information from Location objects 2025-07-08 10:39:53 +02:00
Func.h tracking of event groups for compilation to standalone-C++ 2025-09-17 14:28:13 -07:00
GenFunc.cc tracking of event groups for compilation to standalone-C++ 2025-09-17 14:28:13 -07:00
GenFunc.h tracking of event groups for compilation to standalone-C++ 2025-09-17 14:28:13 -07:00
Inits.cc Merge remote-tracking branch 'origin/topic/vern/standalone-event-groups' 2025-09-17 14:28:44 -07:00
Inits.h factored CPP source's main header into collection of per-source-file headers 2024-10-18 17:37:33 -07:00
InitsInfo.cc full tracking of the characteristics of globals when compiling scripts to C++ 2025-09-15 14:21:32 -07:00
InitsInfo.h full tracking of the characteristics of globals when compiling scripts to C++ 2025-09-15 14:21:32 -07:00
ISSUES updates to notes for compile-to-C++ maintenance 2022-09-16 16:53:42 -07:00
README.md introduced simplified initialization for non-standalone -O gen-C++ code 2024-12-06 16:25:22 -08:00
Runtime.h Reformat Zeek in Spicy style 2023-10-30 09:40:55 +01:00
RuntimeInits.cc full tracking of the characteristics of globals when compiling scripts to C++ 2025-09-15 14:21:32 -07:00
RuntimeInits.h Merge remote-tracking branch 'origin/topic/vern/standalone-event-groups' 2025-09-17 14:28:44 -07:00
RuntimeInitSupport.cc Merge remote-tracking branch 'origin/topic/vern/standalone-event-groups' 2025-09-17 14:28:44 -07:00
RuntimeInitSupport.h Merge remote-tracking branch 'origin/topic/vern/standalone-event-groups' 2025-09-17 14:28:44 -07:00
RuntimeOps.cc fix for associating attributes with globals for -O gen-standalone-C++ 2025-09-15 14:28:07 -07:00
RuntimeOps.h fix for associating attributes with globals for -O gen-standalone-C++ 2025-09-15 14:28:07 -07:00
RuntimeVec.cc Fix clang-tidy cppcoreguidelines-macro-usage findings (macro functions) 2025-06-04 09:24:05 -07:00
RuntimeVec.h -O gen-C++ support for pattern vector comparisons 2025-03-07 09:55:15 -08:00
Stmts.cc Clang-tidy fixes for recent IDPtr changes 2025-09-03 15:34:29 -07:00
Stmts.h shift much of the internal use of ID* identifier pointers over to IDPtr objects 2025-09-03 11:19:31 -07:00
Tracker.cc Use .contains() instead of .find() or .count() 2025-09-02 16:42:52 +00:00
Tracker.h Reformat Zeek in Spicy style 2023-10-30 09:40:55 +01:00
Types.cc Use .contains() instead of .find() or .count() 2025-09-02 16:42:52 +00:00
Types.h factored CPP source's main header into collection of per-source-file headers 2024-10-18 17:37:33 -07:00
Util.cc skip optimization of functions with AST nodes unknown to script optimization 2024-11-29 16:12:05 -08:00
Util.h Reformat Zeek in Spicy style 2023-10-30 09:40:55 +01:00
Vars.cc shift much of the internal use of ID* identifier pointers over to IDPtr objects 2025-09-03 11:19:31 -07:00
Vars.h shift much of the internal use of ID* identifier pointers over to IDPtr objects 2025-09-03 11:19:31 -07:00

Compiling Zeek Scripts To C++: User's Guide

Overview - Workflows - Known Issues


Overview

Zeek's script compiler is an experimental feature that translates Zeek scripts into C++, which is then compiled directly into the zeek binary. This improves performance by removing the need for Zeek to interpret the scripts. Using this feature requires a somewhat complex workflow.

How much faster will your scripts run? There's no simple answer to that. It depends heavily on two factors:

  • What proportion of the processing during execution is spent in Zeek's Event Engine rather than executing scripts.

  • What proportion of the script's processing is spent executing built-in functions (BiFs). It might well be that most of your script processing actually occurs inside the Logging Framework, for example, and thus you won't see much improvement.

Those two factors add up to gains that are often on the order of only 10-15%, rather than something a lot more dramatic. On the other hand, using this feature you can afford to put significantly more functionality in Zeek scripts without worrying as much about introducing performance bottlenecks.

That said, I'm very interested in situations where the performance gains appear unsatisfying. Also note that when using the compiler, you can analyze the performance of your scripts using C++-oriented tools - the translated C++ code generally bears a clear relationship with the original Zeek script.
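To see why the proportions discussed above cap the achievable gain, here is a back-of-the-envelope estimate based on Amdahl's law. It is only a sketch: the function name estimate_gain is hypothetical, and both inputs (the fraction of run time spent interpreting scripts, and how much faster the compiled code runs that portion) are numbers you would have to measure or guess for your own workload.

```shell
#!/bin/sh
# estimate_gain SCRIPT_FRACTION SPEEDUP
#
# Prints the percentage reduction in total run time when only the
# interpreted-script share of run time (SCRIPT_FRACTION, between 0 and 1)
# becomes SPEEDUP times faster. Time spent in the Event Engine and in BiFs
# is assumed to be unchanged (Amdahl's law).
estimate_gain() {
    LC_ALL=C awk -v f="$1" -v s="$2" \
        'BEGIN { printf "%.1f\n", 100 * f * (1 - 1 / s) }'
}
```

For example, estimate_gain 0.2 3 prints 13.3: if interpreting scripts accounted for 20% of run time and the compiled code ran that portion three times faster, total run time would drop by about 13%. Both inputs in this example are made-up values, chosen only to show how a modest script share leads to a modest overall gain.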

If you want to know how the compiler itself works, see the sketch at the beginning of Compile.h.


Workflows

Before building Zeek, see the first of the Known Issues below regarding compilation times. If your aim is exploration of the functionality rather than production use, you might want to build Zeek using ./configure --enable-debug, which can reduce compilation times by 50x (!). Once you've built it, the following sketches how to create and use compiled scripts.

The main code generated by the compiler is taken from build/CPP-gen.cc. An empty version of this is generated when first building Zeek.

As a user, the most common workflow is to build a version of Zeek that has a given target script (target.zeek) compiled into it. The compiled code covers everything pulled in by target.zeek, including the base scripts (or only the "bare" subset if you invoke the compiler when running zeek -b). The following workflow assumes you are in the build/ subdirectory:

  1. ./src/zeek -O gen-C++ target.zeek
    The generated code is written to CPP-gen.cc.
  2. ninja or make to recompile Zeek
  3. ./src/zeek -O use-C++ target.zeek
    Executes with each function/hook/event handler pulled in by target.zeek replaced with its compiled version.

Instead of the last step above, you can use the following variant:

  1. ./src/zeek -O report-C++ target.zeek
    For each function body in target.zeek, reports which ones have compiled-to-C++ bodies available, and also any compiled-to-C++ bodies present in the zeek binary that target.zeek does not use. Useful for debugging.
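The steps above can be strung together in a small shell function. This is a sketch rather than a supported tool: run, compile_and_run, and the DRY_RUN switch are hypothetical names introduced here for illustration, and the function assumes it is invoked from the build/ subdirectory.

```shell
#!/bin/sh
# Sketch of the gen-C++ / rebuild / use-C++ cycle, run from build/.
# The names run, compile_and_run, and DRY_RUN are illustrative only.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# compile_and_run TARGET [BUILD_TOOL]
compile_and_run() {
    target=$1
    build_tool=${2:-ninja}   # pass "make" if that is how you build Zeek

    # Step 1: generate C++ for the target; the output goes to CPP-gen.cc.
    run ./src/zeek -O gen-C++ "$target" || return 1
    # Step 2: rebuild Zeek so the generated code is compiled into the binary.
    run "$build_tool" || return 1
    # Optional: report which function bodies have compiled versions.
    run ./src/zeek -O report-C++ "$target" || return 1
    # Step 3: execute using the compiled function bodies.
    run ./src/zeek -O use-C++ "$target"
}
```

Setting DRY_RUN=1 before calling compile_and_run target.zeek prints the four commands without executing them, which is a cheap way to check the sequence before committing to a long rebuild.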

The above workflows require the subsequent zeek execution to include the target.zeek script. You can avoid this by replacing the first step with:

  1. ./src/zeek -O gen-standalone-C++ --optimize-files=target.zeek target.zeek >target-stand-in.zeek

(and then building as in the second step above). This option prints to stdout a (very short) "stand-in" Zeek script, captured here in target-stand-in.zeek. Loading that script activates the compiled target.zeek without needing to include target.zeek in the invocation or to specify the -O use-C++ option. After loading the stand-in script, you can still access types and functions declared in target.zeek.

Note: gen-standalone-C++ must be used with --optimize-files, as the compiler needs the latter to determine which global declarations the standalone code needs to initialize.
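The standalone variant can be sketched the same way. As before, run, build_standalone, and DRY_RUN are hypothetical names used only for illustration, the function assumes it is invoked from build/, and the final step assumes that naming the stand-in script on the zeek command line is how you choose to load it.

```shell
#!/bin/sh
# Sketch of the gen-standalone-C++ workflow, run from build/.
# The names run, build_standalone, and DRY_RUN are illustrative only.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_standalone TARGET [STAND_IN] [BUILD_TOOL]
build_standalone() {
    target=$1
    stand_in=${2:-target-stand-in.zeek}
    build_tool=${3:-ninja}   # pass "make" if that is how you build Zeek

    # Step 1: generate standalone C++ and capture the stand-in script,
    # which zeek prints to stdout. --optimize-files is required here.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "./src/zeek -O gen-standalone-C++ --optimize-files=$target $target >$stand_in"
    else
        ./src/zeek -O gen-standalone-C++ "--optimize-files=$target" "$target" \
            >"$stand_in" || return 1
    fi
    # Step 2: rebuild Zeek so the generated code is compiled into the binary.
    run "$build_tool" || return 1
    # Step 3: load the stand-in script; neither the target script nor
    # -O use-C++ appears in this invocation.
    run ./src/zeek "$stand_in"
}
```

Setting DRY_RUN=1 and calling build_standalone target.zeek prints the three commands, which makes it easy to confirm that the target script itself is absent from the final invocation.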

There are additional workflows relating to running the test suite: see src/script_opt/CPP/maint/README.


Known Issues

Here we list various known issues with using the compiler:

  • C++ compilation of the generated code can be quite slow when it includes optimization, taking many minutes on a beefy laptop. This slowness complicates CI/CD approaches for always running compiled code against the test suite when merging changes.

  • Run-time error messages generally lack location information and information about associated expressions/statements, making them hard to puzzle out. This could be fixed, but would add execution overhead in passing around the necessary strings / Location objects.

  • To avoid subtle bugs, the compiler will refrain from compiling script elements (functions, hooks, event handlers) that include conditional code. In addition, when using --optimize-files it will not compile any functions appearing in a source file that includes conditional code (even if it's not in a function body). You can override this refusal with -O allow-cond.

  • Code compiled with -O gen-standalone-C++ will not execute any global statements when invoked using the "stand-in" script. The right fix for this is to shift from encapsulating global statements in a pseudo-function, as currently done, to encapsulating them in a pseudo-event handler.

  • Code compiled with -O gen-standalone-C++ likely has bugs if that code requires initializing a global variable that specifies extending fields in an extensible record (i.e., fields added using redef).

  • If a lambda generates an event that is not otherwise referred to, that event will not be registered upon instantiating the lambda. This is not particularly difficult to fix.

  • A number of steps could be taken to increase the performance of the optimized code. These include:

    1. Switching the generated code to use the new ZVal-related interfaces, including for vector operations.
    2. Directly calling BiFs rather than using the Invoke() method to do so. This relates to the broader question of switching BiFs to be based on a notion of "inlined C++" code in Zeek functions, rather than using the standalone bifcl BiF compiler.
    3. Switching the Event Engine over to queuing events with ZVal arguments rather than ValPtr arguments.
    4. Making the compiler aware of certain BiFs that can be directly inlined (e.g., network_time()), a technique employed effectively by the ZAM compiler.
    5. Inspecting the generated code for inefficiencies that the compiler could avoid.