When a node restarts, or a peering between two nodes starts over for other
reasons, the Broker manager's internal tracking resets its state (since it's
per-peering), and with it the message overflow counter. The script layer was
unaware of this and threw errors when trying to reset the corresponding counter
metric back down to zero at sync time.
We now track past buffer overflows via a separate epoch table, using Broker peer
ID comparisons to identify new peerings, and set the counter to the sum of past
and current overflows.
I considered just making this a gauge, but it seems more helpful to be able to
look at a counter to see whether any messages have ever been dropped over the
lifetime of the node process.
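In rough terms, the scheme looks like the following sketch; all names here are
illustrative, not the actual implementation:

global past_overflows: table[string] of count &default=0;  # node -> overflows from earlier peerings
global epoch_peer_id: table[string] of string;             # node -> Broker peer ID of the current peering
global epoch_overflows: table[string] of count &default=0; # node -> overflows seen in the current peering

function overflow_total(node: string, peer_id: string, cur: count): count
    {
    # A changed Broker peer ID means the peering started over: fold the
    # finished epoch's overflows into the running total.
    if ( node in epoch_peer_id && epoch_peer_id[node] != peer_id )
        past_overflows[node] += epoch_overflows[node];

    epoch_peer_id[node] = peer_id;
    epoch_overflows[node] = cur;

    # The counter metric gets set to past + current overflows.
    return past_overflows[node] + cur;
    }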
As an aside, this now also avoids repeatedly creating the labels vector,
re-using the same one for each metric.
Thanks to @pbcullen for identifying this one!
This avoids pointless reconnection attempts to ephemeral client-side TCP
ports, which have been cluttering up the Broker logs since 7.1.
(cherry picked from commit 549e678dff)
This provides ways to figure out, for a given peer or a given address/port
pair, whether the local node originally established the peering.
(cherry picked from commit b430d5235c)
The former defaults (30sec, 1min) can slow down cluster startup and recovery
considerably, and other systems use still more aggressive intervals.
(cherry picked from commit 68fadd0464)
At every site where we've dug into backpressure disconnects, the default
buffer size turned out to be too small. A size of 8192, 4x the old default,
has sufficed at every such site to eliminate premature disconnects.
With metrics now available for the send buffers regardless of backpressure
overflow policy, this also switches the default from "disconnect" to
"drop_oldest" (for both peers and websockets), meaning that peerings remain
untouched but the oldest queued message simply gets dropped when a new message
is enqueued. With this policy, the number of backpressure overflows is then
simply the count of discarded messages, a figure that users can tune until it
drops to zero in everyday use. Another benefit is that marginal overflows cause
less message loss than when an entire buffer's worth (plus potentially more
in-flight messages) gets thrown out with a disconnect.
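Sites that want to deviate from the new defaults can still adjust the tunables
via redef; a sketch, assuming the option names are the existing Broker
send-buffer tunables (peer and WebSocket variants):

# Option names assumed; adjust to the local Zeek version.
redef Broker::peer_buffer_size = 16384;
redef Broker::peer_overflow_policy = "drop_oldest";
redef Broker::web_socket_buffer_size = 4096;
redef Broker::web_socket_overflow_policy = "drop_oldest";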
(cherry picked from commit 841a40ff88)
This hooks into Telemetry::sync() to update Broker-level metrics tracking the
peerings' send buffer state. We do this in the cluster framework so we can label
the resulting metrics with Zeek cluster node names, not Broker's endpoint IDs.
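In spirit the update looks like the following sketch; the metric name and the
stats accessor are placeholders, and the real script differs in structure:

# Placeholder sketch: register a gauge once, refresh per-peer instances in
# Telemetry::sync(), labeled by cluster node name.
global peer_fill_gf = Telemetry::register_gauge_family([
    $prefix="zeek",
    $name="broker_peer_buffer_fill",
    $unit="",
    $help_text="Fill level of a Broker peering's send buffer",
    $label_names=vector("peer")]);

function get_peer_fill_levels(): table[string] of double
    {
    # Placeholder: in the real code these numbers come from the Broker
    # manager's per-peering statistics.
    return table();
    }

hook Telemetry::sync()
    {
    for ( node_name, fill in get_peer_fill_levels() )
        Telemetry::gauge_set(Telemetry::gauge_with(peer_fill_gf, vector(node_name)), fill);
    }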
(cherry picked from commit 88a0cda8ca)
This implements basic tracking of each peering's current fill level, the maximum
level over a recent time interval (via a new Broker::buffer_stats_reset_interval
tunable, defaulting to 1min), and the number of times a buffer overflows. For
the disconnect policy this is the number of depeerings, but for drop_newest and
drop_oldest it implies the number of messages lost.
This doesn't use "proper" telemetry metrics for a few reasons: the tracking is
Broker-specific, so each peering needs to be tracked via its endpoint_id, while
the metrics should carry Cluster node-name labels, and those names live in the
script layer. Using broker::endpoint_id values directly as keys also means we
can rely on their ability to hash in STL containers, which should be fast.
This does not track the buffer levels for Broker "clients" (as opposed to
"peers"), i.e. WebSockets, since we currently don't have a way to name these,
and we don't want to use ephemeral Broker IDs in their telemetry.
To make the stats accessible to the script layer, the Broker manager (via a new
helper class that lives in the event_observer) maintains a TableVal mapping
Broker IDs to a new BrokerPeeringStats record. The table's members get updated
every time that table is requested. This minimizes new val instantiation and
allows the script layer to customize the BrokerPeeringStats record by redef'ing,
updating fields, etc. Since we can't use Zeek vals outside the main thread, this
requires some care so all table updates happen only in the Zeek-side table
updater, PeerBufferState::GetPeeringStatsTable().
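Script-layer usage might then look roughly like the following; the accessor
name and record fields shown are assumptions for illustration, not the actual
API:

# Assumed names: a Broker::peering_stats() accessor returning a
# table[string] of Broker::BrokerPeeringStats, with fields for the current
# fill level and the overflow count.
global report_peering_stats: event();

event report_peering_stats()
    {
    for ( peer_id, ps in Broker::peering_stats() )
        print fmt("peer %s: fill=%s, overflows=%s", peer_id, ps$fill, ps$overflows);

    schedule 1min { report_peering_stats() };
    }

event zeek_init()
    {
    schedule 1min { report_peering_stats() };
    }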
(cherry picked from commit f5fbad23ff)
This adds a Broker-specific script to the cluster framework, loaded only when
Zeek is running in cluster mode. It adds logging in cluster.log as well as
telemetry via a metrics counter for Broker-observed backpressure disconnects.
The new zeek_broker_backpressure_disconnects counter, labeled by the neighboring
peer that the reporting node has determined to be unresponsive, counts the
number of unpeerings for this reason.
Here the node "worker" has observed node "proxy" falling behind once:
# HELP zeek_broker_backpressure_disconnects_total Number of Broker peering drops due to a neighbor falling too far behind in message I/O
# TYPE zeek_broker_backpressure_disconnects_total counter
zeek_broker_backpressure_disconnects_total{endpoint="worker",peer="proxy"} 1
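The counter itself is ordinary script-layer telemetry; conceptually the
registration and increment look like this sketch (trigger detection elided,
and the actual script's structure differs):

global bp_disconnects_cf = Telemetry::register_counter_family([
    $prefix="zeek",
    $name="broker_backpressure_disconnects",
    $unit="",
    $help_text="Number of Broker peering drops due to a neighbor falling too far behind in message I/O",
    $label_names=vector("peer")]);

function note_backpressure_disconnect(peer_node: string)
    {
    # Bump the per-peer counter when this node observes a
    # backpressure-induced unpeering of the given neighbor.
    Telemetry::counter_inc(Telemetry::counter_with(bp_disconnects_cf, vector(peer_node)));
    }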
Includes small btest baseline update to reflect @load of a new script.
(cherry picked from commit ead6134501)
This translates backend-specific node identifiers (like Broker IDs) to
cluster nodes and their names, if available.
(cherry picked from commit 46a11ec37d)
This adds re-peering at the Broker level for peers that Broker decided to
unpeer. We keep this at the Broker level since this behavior is specific to
it (as opposed to other cluster backends).
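Conceptually the behavior boils down to something like this sketch; the actual
script tracks more state and only re-peers where appropriate:

# Simplified sketch: when Broker drops a peering on its own, try to
# re-establish it. The real script distinguishes Broker-initiated unpeerings
# from intentional ones.
event Broker::peer_removed(ep: Broker::EndpointInfo, msg: string)
    {
    if ( ! ep?$network )
        return;

    Broker::peer(ep$network$address, ep$network$bound_port, Broker::default_connect_retry);
    }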
Includes baseline updates for btests that pick up on the new script's @load.
(cherry picked from commit 0010e65f6d)
This moves the Telemetry framework's BIF-defined functionality from the
secondary-BIFs stage to the primary one. That is, this functionality is now
available from the end of init-bare.zeek, not only after the end of
init-frameworks-and-bifs.zeek.
This allows us to use script-layer telemetry in Zeek's own scripts that get
pulled in during init-frameworks-and-bifs.
This change splits up the BIF features into functions, constants, and types,
because that's the granularity most workable in Func.cc and NetVar. It also now
defines the Telemetry::MetricsType enum once, rather than redundantly in both
the BIFs and the script layer.
Due to subtle load ordering issues between the telemetry and cluster frameworks
this pushes the redef stage of Telemetry::metrics_port and address into
base/frameworks/telemetry/options.zeek, which is loaded sufficiently late in
init-frameworks-and-bifs.zeek to sidestep those issues. (When not doing this,
the effect is that the redef in telemetry/main.zeek doesn't yet find the
cluster-provided values, and Zeek does not end up listening on these ports.)
The need to add basic Zeek headers in script_opt/ZAM/ZBody.cc as a side-effect
of this is curious, but looks harmless.
Also includes baseline updates for the usual btests and adds a few doc strings.
(cherry picked from commit 71f7e89974)
With Cluster::Node$metrics_port being optional, there's not really
a need for the extra script. The new rule: if a metrics_port is set, the
node will attempt to listen on it.
Users can still redef Telemetry::metrics_port *after*
base/frameworks/telemetry was loaded to change the port defined
in cluster-layout.zeek.
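For example, in local.zeek:

# Overrides the metrics port assigned in cluster-layout.zeek for this node.
redef Telemetry::metrics_port = 9993/tcp;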
(cherry picked from commit bf9704f339)
This eliminates one place in which we currently need to mirror changes to the
script-land Cluster::Node record. Instead of keeping an exact in-core equivalent, the
Supervisor now treats the data structure as opaque, and stores the whole cluster
table as a JSON string.
We may replace the script-layer Supervisor::ClusterEndpoint in the future, using
Cluster::Node directly. But that's a more invasive change that will affect how
people invoke Supervisor::create() and similar functions.
Relying on JSON for serialization has the side-effect of removing the
Supervisor's earlier quirk of using 0/tcp, not 0/unknown, to indicate unused
ports in the Supervisor::ClusterEndpoint record.
If the script layer is able to access the current node's config via
Supervisor::node(), it can handle populating Cluster::nodes. That code
is much more straightforward than an equivalent in-core implementation
(especially with the upcoming change to the cluster table's implementation).
This introduces base/frameworks/cluster/supervisor.zeek and
Cluster::Supervisor::__init_cluster_nodes() for that purpose.
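Very roughly, the approach resembles this sketch (field handling simplified;
names are illustrative beyond Supervisor::node() and Cluster::nodes):

function init_cluster_nodes_sketch(): bool
    {
    # Map Supervisor roles onto cluster node types.
    local role_map: table[Supervisor::ClusterRole] of Cluster::NodeType = table(
        [Supervisor::LOGGER] = Cluster::LOGGER,
        [Supervisor::MANAGER] = Cluster::MANAGER,
        [Supervisor::PROXY] = Cluster::PROXY,
        [Supervisor::WORKER] = Cluster::WORKER);

    # The current node's own configuration, as handed out by the Supervisor.
    local cfg = Supervisor::node();

    if ( |cfg$cluster| == 0 )
        return F;

    for ( name, ep in cfg$cluster )
        Cluster::nodes[name] = Cluster::Node($node_type=role_map[ep$role],
                                             $ip=ep$host, $p=ep$p);

    return T;
    }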
The @load of the Supervisor API in cluster/main.zeek isn't technically
necessary since we already load it explicitly even in init-bare.zeek,
but being explicit seems better.
The previous "fix" caused significant performance degradation without
the signature ever having a chance to trigger. Moving it to policy
seems the best compromise, the alternative being outright removing it.
* origin/topic/johanna/netcontrol-updates:
Netcontrol: add rule_added_policy
Netcontrol: more logging in catch-and-release
Netcontrol: allow supplying explicit name to Debug plugin
This requires pool creation to spell out a spec explicitly, which the only code
using these types already does. There's no reason for pools to automatically
refer to proxies.
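For illustration, an explicit spec at pool-creation time might look like this
(hedged example; see the pool types in the cluster framework for the
authoritative definitions):

# A pool spec that spells out its node type explicitly, rather than relying
# on a proxy default.
global example_pool_spec: Cluster::PoolSpec =
    Cluster::PoolSpec($topic="zeek/cluster/pool/example",
                      $node_type=Cluster::PROXY);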
rule_added_policy allows rules to be modified just after they have been added.
This enables more complex features, like changing rule states depending on
insertion in other plugins.
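A hedged sketch of hook usage; the parameter list here is assumed from the
description, not quoted from the implementation:

# Assumed hook signature; check the NetControl framework for the
# authoritative definition.
hook NetControl::rule_added_policy(r: NetControl::Rule, p: NetControl::PluginState,
                                   exists: bool, msg: string)
    {
    # Example: inspect the rule right after a plugin reported it as added.
    print fmt("rule %s added (already existed: %s): %s", r$id, exists, msg);
    }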
Catch-and-release logs now include the plugin that is responsible for an
action. Furthermore, the catch-and-release log also includes instances
where a rule already existed, and where an error occurred during an
operation.
This change extends the arguments of NetControl::create_debug, allowing an
optional name argument that can be used instead of the default-generated name.
This is helpful when one wants to attach several plugins to verify
behavior in those cases.
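For example, attaching two differently named Debug plugin instances (assuming
the name is passed as an extra argument to NetControl::create_debug):

event NetControl::init()
    {
    # Two Debug plugin instances with distinct names, activated at
    # different priorities to compare their behavior.
    local p1 = NetControl::create_debug(T, "debug-primary");
    local p2 = NetControl::create_debug(T, "debug-secondary");
    NetControl::activate(p1, 0);
    NetControl::activate(p2, 10);
    }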
This introduces a new hook into the Intel::seen() function that allows
users to directly interact with the result of a find() call via external
scripts.
This should solve the use-case brought up by @chrisanag1985 in
discussion #3256: Recording and acting on "no intel match found".
@Canon88 was recently asking on Slack about enabling HTTP logging for a given
connection only when an Intel match occurred, and found that the Intel::match()
event only occurs on the manager. The Intel::match_remote() event might serve
as a workaround, but it possibly runs a bit too late, and it's also just an
internal "detail" event that might not be stable.
Another internal use case revolved around enabling packet recording based on
Intel matches, which necessarily needs to happen on the worker where the match
happened. The proposed workaround was similar to the above, using
Intel::match_remote().
This hook also provides an opportunity to rate-limit heavy-hitter intel items
locally on the worker nodes, or even to replace the currently used event
approach with a customized one.
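For example, assuming the new hook is exposed as Intel::seen_policy(s:
Intel::Seen, found: bool), a worker-local reaction to a match could look like:

# Hook name and signature assumed; react locally on the worker where the
# match happened.
hook Intel::seen_policy(s: Intel::Seen, found: bool)
    {
    if ( ! found )
        return;

    if ( s?$conn )
        # E.g. kick off packet recording or extra logging for this
        # connection right here, instead of waiting for Intel::match()
        # on the manager.
        print fmt("local intel hit on connection %s", s$conn$uid);
    }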