updates for documentation of functionality for compiling scripts to C++

Vern Paxson 2021-06-04 17:15:15 -07:00
parent 725aa558a7
commit 4ecf70f515

@ -54,6 +54,13 @@ at the beginning of `Compile.h`.
Workflows
---------
_Before building Zeek_, see the first of the [_Known Issues_](#known-issues)
below regarding compilation times. If your aim is exploration of the
functionality rather than production use, you might want to build Zeek
using `./configure --enable-debug`, which can reduce compilation times by
50x (!). Once you've built it, the following sketches how to create
and use compiled scripts.
The main code generated by the compiler is taken from
`build/CPP-gen.cc`. An empty version of this is generated when
first building Zeek.
@ -66,21 +73,17 @@ The following workflow assumes you are in the `build/` subdirectory:
1. `./src/zeek -O gen-C++ target.zeek`
The generated code is written to
`CPP-gen-addl.h`. (This name is a reflection of some more complicated
features and probably should be changed.) The compiler will also produce
a file `CPP-hashes.dat`, for use by an advanced feature.
2. `mv CPP-gen-addl.h CPP-gen.cc`
3. `touch CPP-gen-addl.h`
(Needed because `CPP-gen.cc`
expects the file to exist, again in support of more complicated features.)
4. `ninja` or `make` to recompile Zeek
5. `./src/zeek -O use-C++ target.zeek`
`CPP-gen.cc`. The compiler will also produce
a file `CPP-hashes.dat`, for use by an advanced feature, and an
empty `CPP-gen-addl.h` file (same).
2. `ninja` or `make` to recompile Zeek
3. `./src/zeek -O use-C++ target.zeek`
Executes with each function/hook/
event handler pulled in by `target.zeek` replaced with its compiled version.
Instead of the last line above, you can use the following variants:
5. `./src/zeek -O report-C++ target.zeek`
3. `./src/zeek -O report-C++ target.zeek`
For each function body in
`target.zeek`, reports which ones have compiled-to-C++ bodies available,
and also any compiled-to-C++ bodies present in the `zeek` binary that
@ -91,15 +94,21 @@ the `target.zeek` script. You can avoid this by replacing the first step with:
1. `./src/zeek -O gen-standalone-C++ target.zeek >target-stand-in.zeek`
and then continuing the next three steps. This option prints to _stdout_ a
(and then building as in the 2nd step above).
This option prints to _stdout_ a
(very short) "stand-in" Zeek script that you can load using
`-O use-C++ target-stand-in.zeek` to activate the compiled `target.zeek`
without needing to include `target.zeek` in the invocation.
`target-stand-in.zeek` to activate the compiled `target.zeek`
without needing to include `target.zeek` in the invocation (nor
the `-O use-C++` option). After loading the stand-in script,
you can still access types and functions declared in `target.zeek`.
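The standalone workflow described above can be sketched as a dry-run script that only prints the commands rather than running them. This is a sketch under stated assumptions: the function name `standalone_workflow` is hypothetical, the script assumes you are in the `build/` subdirectory, and the final invocation (loading only the stand-in script, with no `-O use-C++`) is inferred from the description above rather than taken verbatim from it.

```shell
#!/bin/sh
# Dry-run sketch: prints the standalone workflow commands instead of running them.
# Assumes the current directory is Zeek's build/ subdirectory.
# standalone_workflow is a hypothetical helper, not part of Zeek.

standalone_workflow() {
    target="$1"                                # e.g. target.zeek
    stand_in="${target%.zeek}-stand-in.zeek"   # e.g. target-stand-in.zeek

    # Step 1: generate C++ and capture the stand-in script from stdout.
    echo "./src/zeek -O gen-standalone-C++ $target >$stand_in"
    # Step 2: rebuild Zeek so the generated code is compiled in (or use make).
    echo "ninja"
    # Step 3 (inferred): load only the stand-in script.
    echo "./src/zeek $stand_in"
}

standalone_workflow target.zeek
```

Once the printed commands look right for your setup, they can be run by hand in order.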
Note: the implementation differences between `gen-C++` and `gen-standalone-C++`
wound up being modest enough that it might make sense to just always provide
the latter functionality, which it turns out does not introduce any
additional constraints compared to the current `gen-C++` functionality.
On the other hand, it's possible (not yet established) that code created
using `gen-C++` can be made to compile significantly faster than
standalone code.
There are additional workflows relating to running the test suite, which
we document only briefly here as they're likely going to change or go away
@ -128,7 +137,7 @@ Both of these _append_ to any existing `CPP-gen-addl.h` file, providing
a means for building it up to reflect a number of compilations.
The `update-C++` and `add-C++` options help support different
ways of building the `btest` test suie. They were meant to enable doing so
ways of building the `btest` test suite. They were meant to enable doing so
without requiring per-test-suite-element recompilations. However, experiences
to date have found that trying to avoid pointwise compilations incurs
additional headaches, so it's better to just bite off the cost of a large
@ -174,11 +183,6 @@ Known Issues
Here we list various known issues with using the compiler:
<br>
* Run-time error messages generally lack location information and information
about associated expressions/statements, making them hard to puzzle out.
This could be fixed, but would add execution overhead in passing around
the necessary strings / `Location` objects.
* Compilation of compiled code can be noticeably slow (if built using
`./configure --enable-debug`) or hugely slow (if not), with the latter
taking on the order of an hour on a beefy laptop. This slowness complicates
@ -186,6 +190,11 @@ CI/CD approaches for always running compiled code against the test suite
when merging changes. It's not presently clear how feasible it is to
speed this up.
* Run-time error messages generally lack location information and information
about associated expressions/statements, making them hard to puzzle out.
This could be fixed, but would add execution overhead in passing around
the necessary strings / `Location` objects.
* Subtle bugs can arise when compiling code that uses `@if` conditional
compilation. The compiled code will not directly use the wrong instance
of a script body (one that differs due to the `@if` conditional having a