This provides ways to figure out for a given peer, or a given address/port pair,
whether the local node originally established the peering.
(cherry picked from commit b430d5235c)
The former defaults (30sec, 1min) can slow down cluster startup and recovery
considerably, and other systems have more aggressive intervals still.
(cherry picked from commit 68fadd0464)
This differs in the upstream version in that it explicitly invokes
Telemetry::sync(), since 7.0.x doesn't have the on-demand invocation
of the hook at scrape & collection time.
(cherry picked from commit 35ab9d5c80)
This hooks into Telemetry::sync() to update Broker-level metrics tracking the
peerings' send buffer state. We do this in the cluster framework so we can label
the resulting metrics with Zeek cluster node names, not Broker's endpoint IDs.
(cherry picked from commit 88a0cda8ca)
This implements basic tracking of each peering's current fill level, the maximum
level over a recent time interval (via a new Broker::buffer_stats_reset_interval
tunable, defaulting to 1min), and the number of times a buffer overflows. For
the disconnect policy this is the number of depeerings, but for drop_newest and
drop_oldest it implies the number of messages lost.
This doesn't use "proper" telemetry metrics for a few reasons: this tracking is
Broker-specific, so we need to track each peering via endpoint_ids, while we
want the metrics to use Cluster node name labels, and the latter live in the
script layer. Using broker::endpoint_id directly as keys also means we rely on
their ability to hash in STL containers, which should be fast.
This does not track the buffer levels for Broker "clients" (as opposed to
"peers"), i.e. WebSockets, since we currently don't have a way to name these,
and we don't want to use ephemeral Broker IDs in their telemetry.
To make the stats accessible to the script layer the Broker manager (via a new
helper class that lives in the event_observer) maintains a TableVal mapping
Broker IDs to a new BrokerPeeringStats record. The table's members get updated
every time that table is requested. This minimizes new val instantiation and
allows the script layer to customize the BrokerPeeringStats record by redefing,
updating fields, etc. Since we can't use Zeek vals outside the main thread, this
requires some care so all table updates happen only in the Zeek-side table
updater, PeerBufferState::GetPeeringStatsTable().
(cherry picked from commit f5fbad23ff)
This adds a Broker-specific script to the cluster framework, loaded only when
Zeek is running in cluster mode. It adds logging in cluster.log as well as
telemetry via a metrics counter for Broker-observed backpressure disconnects.
The new zeek_broker_backpressure_disconnects counter, labeled by the neighboring
peer that the reporting node has determined to be unresponsive, counts the
number of unpeerings for this reason.
Here the node "worker" has observed node "proxy" falling behind once:
# HELP zeek_broker_backpressure_disconnects_total Number of Broker peering drops due to a neighbor falling too far behind in message I/O
# TYPE zeek_broker_backpressure_disconnects_total counter
zeek_broker_backpressure_disconnects_total{endpoint="worker",peer="proxy"} 1
Includes small btest baseline update to reflect @load of a new script.
(cherry picked from commit ead6134501)
This module is loaded by the telemetry framework, which we're now loading via
the cluster framework, i.e. also in bare mode. The resulting additional
thread (for creating reporter.log) trips up a number of btest baselines.
version.zeek doesn't use any of the string helper functions.
(cherry picked from commit d260a5b7a9)
This adds re-peering at the Broker level for peers that Broker decided to
unpeer. We keep this at the Broker level since this behavior is specific to
it (as opposed to other cluster backends).
Includes baseline updates for btests that pick up on the new script's @load.
(cherry picked from commit 0010e65f6d)
This moves the Telemetry framework's BIF-defined functionalit from the
secondary-BIFs stage to the primary one. That is, this functionality is now
available from the end of init-bare.zeek, not only after the end of
init-frameworks-and-bifs.zeek.
This allows us to use script-layer telemetry in our Zeek's own code that get
pulled in during init-frameworks-and-bifs.
This change splits up the BIF features into functions, constants, and types,
because that's the granularity most workable in Func.cc and NetVar. It also now
defines the Telemetry::MetricsType enum once, not redundantly in BIFs and script
layer.
Due to subtle load ordering issues between the telemetry and cluster frameworks
this pushes the redef stage of Telemetry::metrics_port and address into
base/frameworks/telemetry/options.zeek, which is loaded sufficiently late in
init-frameworks-and-bifs.zeek to sidestep those issues. (When not doing this,
the effect is that the redef in telemetry/main.zeek doesn't yet find the
cluster-provided values, and Zeek does not end up listening on these ports.)
The need to add basic Zeek headers in script_opt/ZAM/ZBody.cc as a side-effect
of this is curious, but looks harmless.
Also includes baseline updates for the usual btests and adds a few doc strings.
(cherry picked from commit 71f7e89974)
* origin/topic/awelzel/4198-4201-quic-maintenance:
QUIC/decrypt_crypto: Rename all_data to data
QUIC: Confirm before forwarding data to SSL
QUIC: Parse all QUIC packets in a UDP datagram
QUIC: Only slurp till packet end, not till &eod
(cherry picked from commit 44304973fb)
* origin/topic/vern/ZAM-field-assign-in-op:
pre-commit: Bump spicy-format to 0.23
fix for ZAM optimization of assigning a record field to result of "in" operation
(cherry picked from commit 991bc9644d)
With Cluster::Node$metrics_port being optional, there's not really
a need for the extra script. New rule, if a metrics_port is set, the
node will attempt to listen on it.
Users can still redef Telemetry::metrics_port *after*
base/frameworks/telemetry was loaded to change the port defined
in cluster-layout.zeek.
(cherry picked from commit bf9704f339)
* topic/christian/broker-prometheus-cpp:
Update the scripts.base.frameworks.telemetry.internal-metrics test
Revert "Temporarily disable the scripts/base/frameworks/telemetry/internal-metrics btest"
Bump Broker to pull in new Prometheus support and pass in Zeek's registry
This now uses different record fields, and for now we no longer have CAF
telemetry. We indicate we're running under test to get reliable ordering in the
baselined output.
This is a rework of b59bed9d06 moving
HILTI_JIT_PARALLELISM=1 into btest.cfg to make it default applicable to
btest -j users (and CI).
The background for this change is that spicyz may spawn up to nproc compiler
instances by default. Combined with btest -j, this may be nproc x nproc
instances worst case. Particularly with gcc, this easily overloads CI or
local systems, putting them into hard-to-recover-from thrashing/OOM states.
Exporting HILTI_JIT_PARALLELISM in the shell allows overriding.
When Zeek flips roles of a HTTP connection subsequent to the HTTP analyzer
being attached, that analyzer would not update its own ContentLine analyzer
state, resulting in the wrong ContentLine analyzer being switched into
plain delivery mode.
In debug builds, this would result in assertion failures, in production
builds, the HTTP analyzer would receive HTTP bodies as individual header
lines, or conversely, individual header lines would be delivered as a
large chunk from the ContentLine analyzer.
PCAPs were generated locally using tcprewrite to select well-known-http ports
for both endpoints, then editcap to drop the first SYN packet.
Kudos to @JordanBarnartt for keeping at it.
Closes#3789