* topic/christian/telemetry-make-bifs-primary:
Telemetry framework: move BIFs to the primary-bif stage
Minor comment tweaks for init-frameworks-and-bifs.zeek
This stops invoking Telemetry::sync() via a scheduled event and instead
only invokes it on-demand. This makes metric collection network time
independent and lazier, too.
With Prometheus scrape requests being processed on Zeek's main thread
now, we can safely invoke the script layer Telemetry::sync() hook.
Closes#3947
This moves the Telemetry framework's BIF-defined functionalit from the
secondary-BIFs stage to the primary one. That is, this functionality is now
available from the end of init-bare.zeek, not only after the end of
init-frameworks-and-bifs.zeek.
This allows us to use script-layer telemetry in our Zeek's own code that get
pulled in during init-frameworks-and-bifs.
This change splits up the BIF features into functions, constants, and types,
because that's the granularity most workable in Func.cc and NetVar. It also now
defines the Telemetry::MetricsType enum once, not redundantly in BIFs and script
layer.
Due to subtle load ordering issues between the telemetry and cluster frameworks
this pushes the redef stage of Telemetry::metrics_port and address into
base/frameworks/telemetry/options.zeek, which is loaded sufficiently late in
init-frameworks-and-bifs.zeek to sidestep those issues. (When not doing this,
the effect is that the redef in telemetry/main.zeek doesn't yet find the
cluster-provided values, and Zeek does not end up listening on these ports.)
The need to add basic Zeek headers in script_opt/ZAM/ZBody.cc as a side-effect
of this is curious, but looks harmless.
Also includes baseline updates for the usual btests and adds a few doc strings.
This avoids the earlier problem of not tracking ports correctly in
scriptland, while still supporting `port` in EVT files and `%port` in
Spicy files.
As it turns out we are already following the same approach for file
analyzers' MIME types, so I'm applying the same pattern: it's one
event per port, without further customization points. That leaves the
patch pretty small after all while fixing the original issue.
With Cluster::Node$metrics_port being optional, there's not really
a need for the extra script. New rule, if a metrics_port is set, the
node will attempt to listen on it.
Users can still redef Telemetry::metrics_port *after*
base/frameworks/telemetry was loaded to change the port defined
in cluster-layout.zeek.
This eliminates one place in which we currently need to mirror changes to the
script-land Cluster::Node record. Instead of keeping an exact in-core equivalent, the
Supervisor now treats the data structure as opaque, and stores the whole cluster
table as a JSON string.
We may replace the script-layer Supervisor::ClusterEndpoint in the future, using
Cluster::Node directly. But that's a more invasive change that will affect how
people invoke Supervisor::create() and similars.
Relying on JSON for serialization has the side-effect of removing the
Supervisor's earlier quirk of using 0/tcp, not 0/unknown, to indicate unused
ports in the Supervisor::ClusterEndpoint record.
If the script layer is able to access the current node's config via
Supervisor::node(), it can handle populating Cluster::nodes. That code
is much more straightforward than an equivalent in-core implementation
(especially with the upcoming change to the cluster table's implementation).
This introduces base/frameworks/cluster/supervisor.zeek and
Cluster::Supervisor::__init_cluster_nodes() for that purpose.
The @load of the Supervisor API in cluster/main.zeek isn't technically
necessary since we already load it explicitly even in init-bare.zeek,
but being explicit seems better.
The previous "fix" caused significant performance degradation without
the signature ever having a chance to trigger. Moving it to policy
seems the best compromise, the alternative being outright removing it.
* origin/topic/johanna/netcontrol-updates:
Netcontrol: add rule_added_policy
Netcontrol: more logging in catch-and-release
Netcontrol: allow supplying explicit name to Debug plugin
This requires pool creation to spell out a spec explicitly, which the only code
using these types already does. There's no reason for pools to automatically
refer to proxies.
rule_added_policy allows the modification of rules just after they have
been added. This allows the implementation of some more complex features
- like changing rule states depending on insertion in other plugins.
Catch-and-release logs now include the plugin that is responsible for an
action. Furthermore, the catch-and-release log also includes instances
where a rule already existed, and where an error occurred during an
operation.
This change extends the arguments of NetControl::create_debug, and
allows the specification of an optional name argument, which can be used
instead of the default-generated name.
This is helpful when one wants to attach several plugins to verify
behavior in those cases.
This introduces a new hook into the Intel::seen() function that allows
users to directly interact with the result of a find() call via external
scripts.
This should solve the use-case brought up by @chrisanag1985 in
discussion #3256: Recording and acting on "no intel match found".
@Canon88 was recently asking on Slack about enabling HTTP logging for a
given connection only when an Intel match occurred and found that the
Intel::match() event would only occur on the manager. The
Intel::match_remote() event might be a workaround, but possibly running a
bit too late and also it's just an internal "detail" event that might not
be stable.
Another internal use case revolved around enabling packet recording
based on Intel matches which necessarily needs to happen on the worker
where the match happened. The proposed workaround is similar to the above
using Intel::match_remote().
This hook also provides an opportunity to rate-limit heavy hitter intel
items locally on the worker nodes, or even replacing the event approach
currently used with a customized approach.
With a bit of tweaking in the JavaScript plugin to support opaque types, this
will allow the delay functionality to work there, too.
Making the LogDelayToken an actual opaque seems reasonable, too. It's not
supposed to be user inspected.
This is a verbose, opinionated and fairly restrictive version of the log delay idea.
Main drivers are explicitly, foot-gun-avoidance and implementation simplicity.
Calling the new Log::delay() function is only allowed within the execution
of a Log::log_stream_policy() hook for the currently active log write.
Conceptually, the delay is placed between the execution of the global stream
policy hook and the individual filter policy hooks. A post delay callback
can be registered with every Log::delay() invocation. Post delay callbacks
can (1) modify a log record as they see fit, (2) veto the forwarding of the
log record to the log filters and (3) extend the delay duration by calling
Log::delay() again. The last point allows to delay a record by an indefinite
amount of time, rather than a fixed maximum amount. This should be rare and
is therefore explicit.
Log::delay() increases an internal reference count and returns an opaque
token value to be passed to Log::delay_finish() to release a delay reference.
Once all references are released, the record is forwarded to all filters
attached to a stream when the delay completes.
This functionality separates Log::log_stream_policy() and individual filter
policy hooks. One consequence is that a common use-case of filter policy hooks,
removing unproductive log records, may run after a record was delayed. Users
can lift their filtering logic to the stream level (or replicate the condition
before the delay decision). The main motivation here is that deciding on a
stream-level delay in per-filter hooks is too late. Attaching multiple filters
to a stream can additionally result in hard to understand behavior.
On the flip side, filter policy hooks are guaranteed to run after the delay
and can be used for further mangling or filtering of a delayed record.
There was some confusion around which value was used subsequent to a strip(),
but sub not respecting anchors make it appear to work. Also seems that the
`\(?` part seems redundant.