Mostly due to spending too much time wondering why nodes didn't connect
when there was a mismatch between "manager" and "manager-1" in the
cluster layout. Remove manager from test-all-policy-cluster test to
avoid connection attempts in this test.
This adds a Broker-specific script to the cluster framework, loaded only when
Zeek is running in cluster mode. It adds logging in cluster.log as well as
telemetry via a metrics counter for Broker-observed backpressure disconnects.
The new zeek_broker_backpressure_disconnects counter, labeled by the neighboring
peer that the reporting node has determined to be unresponsive, counts the
number of unpeerings for this reason.
Here the node "worker" has observed node "proxy" falling behind once:
# HELP zeek_broker_backpressure_disconnects_total Number of Broker peering drops due to a neighbor falling too far behind in message I/O
# TYPE zeek_broker_backpressure_disconnects_total counter
zeek_broker_backpressure_disconnects_total{endpoint="worker",peer="proxy"} 1
Includes small btest baseline update to reflect @load of a new script.
This eliminates one place in which we currently need to mirror changes to the
script-land Cluster::Node record. Instead of keeping an exact in-core equivalent, the
Supervisor now treats the data structure as opaque, and stores the whole cluster
table as a JSON string.
We may replace the script-layer Supervisor::ClusterEndpoint in the future, using
Cluster::Node directly. But that's a more invasive change that will affect how
people invoke Supervisor::create() and similars.
Relying on JSON for serialization has the side-effect of removing the
Supervisor's earlier quirk of using 0/tcp, not 0/unknown, to indicate unused
ports in the Supervisor::ClusterEndpoint record.
If the script layer is able to access the current node's config via
Supervisor::node(), it can handle populating Cluster::nodes. That code
is much more straightforward than an equivalent in-core implementation
(especially with the upcoming change to the cluster table's implementation).
This introduces base/frameworks/cluster/supervisor.zeek and
Cluster::Supervisor::__init_cluster_nodes() for that purpose.
The @load of the Supervisor API in cluster/main.zeek isn't technically
necessary since we already load it explicitly even in init-bare.zeek,
but being explicit seems better.
This requires pool creation to spell out a spec explicitly, which the only code
using these types already does. There's no reason for pools to automatically
refer to proxies.
This field isn't required by a worker and it's certainly not used by a
worker to listen on that specific interface. It also isn't required to
be set consistently and its use in-tree limited to the old load-balancing
script.
There's a bif called packet_source() which on a worker will provide
information about the actually used packet source.
Relates to zeek/zeek#2877.
* topic/awelzel/3112-log-suffix-left-over-log-rotation:
cluster/logger: Fix leftover-log-rotation in multi-logger setups
cluster/logger: Fix global var reference
Populating log_metadata during zeek_init() is too late for the
leftover-log-rotation functionality, so do it at script parse time.
Also, prepend archiver_ to the log_metadata table and encoding function
due to being in the global namespace and to align with the
archiver_rotation_format_func. This hasn't been in a released
version yet, so fine to rename still.
Closes#3112
These have been discussed in the context of "@if &analyze" [1] and
am much in favor for not disabling/removing ~100 lines (more than
fits on a single terminal) out from the middle of a file. There's no
performance impact for having these handlers enabled unconditionally.
Also, any future work on "@if &analyze" will look at them again which
we could also skip.
This also reverts back to the behavior where the Cluster::LOG stream
is created even in non cluster setups like in previous Zeek versions.
As long as no one writes to it there's essentially no difference. If
someone does write to Cluster::LOG, I'd argue not black holing these
messages is better. Schema generators using Log::active_streams will
continue to discover Cluster::LOG even if they run in non-cluster
mode.
https://github.com/zeek/zeek/pull/3062#discussion_r1200498905
* jgras/topic/jgras/cluster-active-node-count-fix:
Fix get_active_node_count for node types not present.
Changed over to explicit existence check instead to avoid the set()
creation upon missed lookups.
When multiple loggers are configured in a Supervisor controlled cluster
configuration, encode extra information into the rotated filename to
identify which logger produced the log.
This is similar to the approach taken for ZeekControl, re-using the
log_suffix terminology, but as there's only a single zeek-archiver
process and no postprocessors and no other side-channel for additional
information, we encode extra metadata into the filename. zeek-archiver
is extended to recognize the special metadata part of the filename.
This also solves the issue that multiple loggers in a supervisor setup
overwrite each others log files within a single log-queue directory.
This set contains the topics to reach all cluster nodes. Due to broker's
forwarding mechanism, we cannot define a single broadcast topic, as it
would create routing loops.
- Restored a deprecated version of 'supervisor_rotation_format_func'
during merge.
* origin/topic/vlad/expose_supervisor_rotation_func:
Rename supervisor_rotation_format_func to archiver_rotation_format_func, and expose it for non-supervised setups
(Adding a NEWS entry.)
* origin/topic/christian/364-logfilter-hooks:
Update testing/btest/scripts/base/frameworks/logging/hooks.zeek
Btests for log filter policy hooks
Btest baseline updates to reflect new logging policy hooks
Migrate existing use of filter predicates to policy hooks
Support for log filter policy hooks