This eliminates one place in which we currently need to mirror changes to the
script-land Cluster::Node record. Instead of keeping an exact in-core equivalent, the
Supervisor now treats the data structure as opaque, and stores the whole cluster
table as a JSON string.
We may replace the script-layer Supervisor::ClusterEndpoint in the future, using
Cluster::Node directly. But that's a more invasive change that will affect how
people invoke Supervisor::create() and similars.
Relying on JSON for serialization has the side-effect of removing the
Supervisor's earlier quirk of using 0/tcp, not 0/unknown, to indicate unused
ports in the Supervisor::ClusterEndpoint record.
If the script layer is able to access the current node's config via
Supervisor::node(), it can handle populating Cluster::nodes. That code
is much more straightforward than an equivalent in-core implementation
(especially with the upcoming change to the cluster table's implementation).
This introduces base/frameworks/cluster/supervisor.zeek and
Cluster::Supervisor::__init_cluster_nodes() for that purpose.
The @load of the Supervisor API in cluster/main.zeek isn't technically
necessary since we already load it explicitly even in init-bare.zeek,
but being explicit seems better.
* topic/christian/json-improvements:
Update NEWS file to cover JSON enhancements
Support JSON roundtripping via to_json()/from_json() for patterns
Support table deserialization in from_json()
Support map-based definition of ports in from_json()
Document the field_escape_pattern in the to_json() BiF
This needed a small tweak in the deserialization, since each roundtrip
would otherwise pad the prior pattern with an extra /^?(...)$?/.
This expands the language.set test to also verify serializing/unserializing for
sets, similarly to tables in the previous commit.
This allows additional data roundtripping through JSON since to_json() already
supports tables. There are some subtleties around the formatting of strings in
JSON object keys, for which this adds a bit of helper infrastructure.
This also expands the language.table test to verify the roundtrips, and adapts
bif.from_json to include a table in the test record.
The from_json() BiF and its underlying code in Val.cc currently expect ports
expressed as a string ('80/tcp' etc). Zeek's own serialization via ToJSON()
renders them as an object ('{"port":80, "proto":"tcp"}'). This adds support
for the latter format to from_json(), so serialized values can be read back.
* origin/topic/awelzel/3682-bad-pipe-op-3:
threading/Manager: Warn if threads are added after termination
iosource/Manager: Reap dry sources while computing timeout
threading/MsgThread: Decouple IO source and thread lifetimes
iosource/Manager: Do not manage lifetime of pkt_src
iosource/Manager: Honor manage_lifetime and dont_count for short-lived IO sources
The core.file-analyzer-violation test showed that it's possible to
create new threads (log writers) when Zeek is in the process of
terminating. This can result in the IO manager's deconstructor
deleting IO sources for threads that are still running.
This is sort of a scripting issue, so for now log a reporter warning
when it happens to have a bit of a bread-crumb what might be
going on. In the future it might make sense to plug APIs with
zeek_is_terminating().
MsgThread acting as an IO source can result in the situation where the
threading manager's heartbeat timer deletes a finished MsgThread instance,
but at the same time this thread is in the list of ready IO sources the
main loop is currently processing.
Fix this by decoupling the lifetime of the IO source part and properly
registering as lifetime managed IO sources with the IO manager.
Fixes#3682
Now that dry sources are properly reaped and freed, an offline packet
source would be deleted once dry, resulting in GetPktSrc() returning
a wild pointer. Don't manage the packet source lifetime and instead
free it during Manager destruction.
If an IO source is registered and becomes dry at runtime, the IO
manager would not honor its manage_lifetime or dont_count attribute
during collection, resulting in memory leaks.
This probably hasn't mattered so far as there's no IO sources registered
in-tree at runtime using manage_lifetime=true.
This is a fixup for 0cd023b839 which
currently causes ASAN coverage builds to fail for non-master branches
when due to a missing COVERALLS_REPO_TOKEN.
Instead of bailing out for non-master branches, pass `--dry-run` to the
coveralls-lcov invocation to test more of the script.
* origin/topic/robin/gh-3521-zeek-val:
Bump Spicy and documentation submodules.
Spicy: Provide runtime API to access Zeek-side globals.
Spicy: Reformat `zeek.spicy` with `spicy-format`.
Spicy: Extend exception hierarchy.
This allows to read Zeek global variables from inside Spicy code. The
main challenge here is supporting all of Zeek's data type in a
type-safe manner.
The most straight-forward API is a set of functions
`get_<type>(<id>)`, where `<type>` is the Zeek-side type
name (e.g., `count`, `string`, `bool`) and `<id>` is the fully scoped
name of the Zeek-side global (e.g., `MyModule::Boolean`). These
functions then return the corresponding Zeek value, converted in an
appropriate Spicy type. Example:
Zeek:
module Foo;
const x: count = 42;
const y: string = "xxx";
Spicy:
import zeek;
assert zeek::get_count("Foo::x") == 42;
assert zeek::get_string("Foo::y") == b"xxx"; # returns bytes(!)
For container types, the `get_*` function returns an opaque types that
can be used to access the containers' values. An additional set of
functions `as_<type>` allows converting opaque values of atomic
types to Spicy equivalents. Example:
Zeek:
module Foo;
const s: set[count] = { 1, 2 };
const t: table[count] of string = { [1] = "One", [2] = "Two" }
Spicy:
# Check set membership.
local set_ = zeek::get_set("Foo::s");
assert zeek::set_contains(set_, 1) == True
# Look up table element.
local table_ = zeek::get_table("Foo::t");
local value = zeek::table_lookup(t, 1);
assert zeek::as_string(value) == b"One"
There are also functions for accessing elements of Zeek-side vectors
and records.
If any of these `zeek::*` conversion functions fails (e.g., due to a
global of that name not existing), it will throw an exception.
Design considerations:
- We support only reading Zeek variables, not writing. This is
both to simplify the API, and also conceptually to avoid
offering backdoors into Zeek state that could end up with a very
tight coupling of Spicy and Zeek code.
- We accept that a single access might be relatively slow due to
name lookup and data conversion. This is primarily meant for
configuration-style data, not for transferring lots of dynamic
state over.
- In that spirit, we don't support deep-copying complex data types
from Zeek over to Spicy. This is (1) to avoid performance
problems when accidentally copying large containers over,
potentially even at every access; and (2) to avoid the two sides
getting out of sync if one ends up modifying a container without
the other being able to see it.
This reverts part of commit a0888b7e36 due
to inhibiting analyzer violations when parsing non SSH traffic when
the &restofdata path is entered.
@J-Gras reported the analyzer not being disabled when sending HTTP
traffic on port 22.
This adds the verbose analyzer.log baselines such that future improvements
of these scenarios become visible.
We move the current `TypeMismatch` into a new `ParameterMismatch`
exception that's derived from a more general `TypeMismatch` now that
can also be used for other, non-parameter mismatches.
* origin/topic/christian/ci-updates:
CMakeLists: Disable -Werror for 3rdparty/sqlite3.c
Bump zeek-3rdparty to pull in sqlite move to 3.46
CI: drop Fedora 38, add 40
We package vanilla sqlite from upstream and on Fedora 40 with sqlite 3.46
there's the following compiler warning:
In function 'sqlite3Strlen30',
inlined from 'sqlite3ColumnSetColl' at
../../src/3rdparty/sqlite3.c:122105:10:
../../src/3rdparty/sqlite3.c:35003:28: error: 'strlen' reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
35003 | return 0x3fffffff & (int)strlen(z);
| ^~~~~~~~~
In function 'sqlite3ColumnSetColl':
Disabling -Werror on sqlite3.c seems sensible given we have little
control over that code.
We now reject EVT files that attempt to replace the same built-in
analyzer multiple times as doing so would be ill-defined and not very
intuitive in what exactly it means.
Closes#3783.
When CCACHE_BASEDIR is set, ccache will rewrite absolute paths to
relative paths in order to allow compilation in different source
directories. We do not need this feature on Cirrus (the checkout
is always in /zeek) and using absolute paths avoids
confusion/normalization needs for the gcov -p results.
We could consider removing the global CCACHE_BASEDIR, but it'd
bust the ccache of every other task, too.