Commit graph

125 commits

Author SHA1 Message Date
Christian Kreibich
ace5c11048 Bugfix: accurately track Broker buffer overflows w/ multiple peerings
When a node restarts or a peering between two nodes starts over for other
reasons, the internal tracking in the Broker manager resets its state (since
it's per-peering), and thus the message overflow counter. The script layer was
unaware of this, and threw errors when trying to reset the corresponding counter
metric down to zero at sync time.

We now track past buffer overflows via a separate epoch table, using Broker peer
ID comparisons to identify new peerings, and set the counter to the sum of past
and current overflows.

I considered just making this a gauge, but it seems more helpful to be able to
look at a counter to see whether any messages have ever been dropped over the
lifetime of the node process.

As an aside, this now also avoids repeatedly creating the labels vector,
re-using the same one for each metric.

Thanks to @pbcullen for identifying this one!
2025-05-07 17:30:45 -07:00
Christian Kreibich
458b887df1 Lower listen/connect retry intervals in Broker and the cluster framework to 1sec
The former defaults (30sec, 1min) can slow down cluster startup and recovery
considerably, and other systems have more aggressive intervals still.

(cherry picked from commit 68fadd0464)
2025-04-29 16:47:13 -07:00
Christian Kreibich
8b9b16d7a8 Add cluster framework telemetry for Broker's send-buffer use
This hooks into Telemetry::sync() to update Broker-level metrics tracking the
peerings' send buffer state. We do this in the cluster framework so we can label
the resulting metrics with Zeek cluster node names, not Broker's endpoint IDs.

(cherry picked from commit 88a0cda8ca)
2025-04-29 15:19:38 -07:00
Christian Kreibich
90ecf7ff0d Add backpressure disconnect notification to cluster.log and via telemetry
This adds a Broker-specific script to the cluster framework, loaded only when
Zeek is running in cluster mode. It adds logging in cluster.log as well as
telemetry via a metrics counter for Broker-observed backpressure disconnects.

The new zeek_broker_backpressure_disconnects counter, labeled by the neighboring
peer that the reporting node has determined to be unresponsive, counts the
number of unpeerings for this reason.

Here the node "worker" has observed node "proxy" falling behind once:

# HELP zeek_broker_backpressure_disconnects_total Number of Broker peering drops due to a neighbor falling too far behind in message I/O
# TYPE zeek_broker_backpressure_disconnects_total counter
zeek_broker_backpressure_disconnects_total{endpoint="worker",peer="proxy"} 1

Includes small btest baseline update to reflect @load of a new script.

(cherry picked from commit ead6134501)
2025-04-08 15:09:44 -07:00
Christian Kreibich
06fa47e21d Add Cluster::nodeid_to_node() helper function
This translates backend-specific node identifiers (like Broker IDs) to
cluster nodes and their names, if available.

(cherry picked from commit 46a11ec37d)
2025-04-08 15:09:44 -07:00
Christian Kreibich
11701d4734 No need to namespace Cluster:: functions in their own namespace
(cherry picked from e81856a4af)
2025-04-08 14:50:50 -07:00
Christian Kreibich
fa6361af56 Management framework: propagate metrics port from agent
This propagates the metrics port from the node config passed through the
supervisor all the way into the script layer.
2024-07-08 23:05:24 -07:00
Christian Kreibich
737b1a2013 Remove the Supervisor's internal ClusterEndpoint struct.
This eliminates one place in which we currently need to mirror changes to the
script-land Cluster::Node record. Instead of keeping an exact in-core equivalent, the
Supervisor now treats the data structure as opaque, and stores the whole cluster
table as a JSON string.

We may replace the script-layer Supervisor::ClusterEndpoint in the future, using
Cluster::Node directly. But that's a more invasive change that will affect how
people invoke Supervisor::create() and similars.

Relying on JSON for serialization has the side-effect of removing the
Supervisor's earlier quirk of using 0/tcp, not 0/unknown, to indicate unused
ports in the Supervisor::ClusterEndpoint record.
2024-07-02 14:52:17 -07:00
Christian Kreibich
a98ec6b08b Provide a script-layer equivalent to Supervisor::__init_cluster().
If the script layer is able to access the current node's config via
Supervisor::node(), it can handle populating Cluster::nodes. That code
is much more straightforward than an equivalent in-core implementation
(especially with the upcoming change to the cluster table's implementation).
This introduces base/frameworks/cluster/supervisor.zeek and
Cluster::Supervisor::__init_cluster_nodes() for that purpose.

The @load of the Supervisor API in cluster/main.zeek isn't technically
necessary since we already load it explicitly even in init-bare.zeek,
but being explicit seems better.
2024-07-02 14:52:13 -07:00
Tim Wojtulewicz
e93e4cc26d Add a services.json endpoint for Prometheus service discovery 2024-05-31 13:30:31 -07:00
Christian Kreibich
873d734c79 Do not default PoolSpec topics to the empty string.
Similar to `node_topic`, we already spell out a topic in the existing use and
there's no obviously meaningful default value.
2024-02-05 18:03:08 -08:00
Christian Kreibich
8437012346 Do not default to proxy nodes in Broker::PoolSpec
This requires pool creation to spell out a spec explicitly, which the only code
using these types already does. There's no reason for pools to automatically
refer to proxies.
2024-02-05 17:51:11 -08:00
Arne Welzel
cd24acdfc8 time machine: Mark leftovers for removal in v7.1
I suspect we could just drop these directly, but lets follow the
deprecation cycle.
2023-11-07 16:06:16 +01:00
Arne Welzel
d88b147ac9 cluster: Deprecate the Cluster::Node$interface field
This field isn't required by a worker and it's certainly not used by a
worker to listen on that specific interface. It also isn't required to
be set consistently and its use in-tree limited to the old load-balancing
script.

There's a bif called packet_source() which on a worker will provide
information about the actually used packet source.

Relates to zeek/zeek#2877.
2023-11-07 16:06:16 +01:00
Tim Wojtulewicz
4229af6820 Remove deprecations tagged for v6.1 2023-06-14 10:07:22 -07:00
Arne Welzel
7a043e5e8f all: Fix typos identified by typos pre-commit hook 2023-06-13 17:57:32 +02:00
Arne Welzel
f53aefdd5b Merge branch 'topic/awelzel/3112-log-suffix-left-over-log-rotation'
* topic/awelzel/3112-log-suffix-left-over-log-rotation:
  cluster/logger: Fix leftover-log-rotation in multi-logger setups
  cluster/logger: Fix global var reference
2023-06-13 17:33:56 +02:00
Arne Welzel
6d1991fb6a cluster/logger: Fix leftover-log-rotation in multi-logger setups
Populating log_metadata during zeek_init() is too late for the
leftover-log-rotation functionality, so do it at script parse time.

Also, prepend archiver_ to the log_metadata table and encoding function
due to being in the global namespace and to align with the
archiver_rotation_format_func. This hasn't been in a released
version yet, so fine to rename still.

Closes #3112
2023-06-13 10:47:20 +02:00
Arne Welzel
27432c457c cluster/logger: Fix global var reference 2023-06-13 10:47:20 +02:00
Arne Welzel
eef7acc1e9 cluster/main: Remove extra @if ( Cluster::is_enabled() )
These have been discussed in the context of "@if &analyze" [1] and
am much in favor for not disabling/removing ~100 lines (more than
fits on a single terminal) out from the middle of a file. There's no
performance impact for having these handlers enabled unconditionally.
Also, any future work on "@if &analyze" will look at them again which
we could also skip.

This also reverts back to the behavior where the Cluster::LOG stream
is created even in non cluster setups like in previous Zeek versions.
As long as no one writes to it there's essentially no difference. If
someone does write to Cluster::LOG, I'd argue not black holing these
messages is better. Schema generators using Log::active_streams will
continue to discover Cluster::LOG even if they run in non-cluster
mode.

https://github.com/zeek/zeek/pull/3062#discussion_r1200498905
2023-06-06 15:20:10 +02:00
Tim Wojtulewicz
5a3abbe364 Revert "Merge remote-tracking branch 'origin/topic/vern/at-if-analyze'"
This reverts commit 4e797ddbbc, reversing
changes made to 3ac28ba5a2.
2023-05-31 09:20:33 +02:00
Vern Paxson
890010915a change base scripts to use run-time if's or @if ... &analyze 2023-05-19 13:26:27 -07:00
Arne Welzel
c2a07476cc Merge remote-tracking branch 'jgras/topic/jgras/cluster-active-node-count-fix'
* jgras/topic/jgras/cluster-active-node-count-fix:
  Fix get_active_node_count for node types not present.

Changed over to explicit existence check instead to avoid the set()
creation upon missed lookups.
2023-05-17 10:37:00 +02:00
Jan Grashoefer
e4f654c14c Fix get_active_node_count for node types not present. 2023-05-16 17:47:50 +02:00
Arne Welzel
9330a74fe1 Merge remote-tracking branch 'origin/topic/awelzel/zeek-archiver-multiple-loggers'
* origin/topic/awelzel/zeek-archiver-multiple-loggers:
  cluster/supervisor: Multi-logger awareness
  Bump zeek-archiver submodule
2023-05-09 15:20:53 +02:00
Arne Welzel
c813872915 cluster/supervisor: Multi-logger awareness
When multiple loggers are configured in a Supervisor controlled cluster
configuration, encode extra information into the rotated filename to
identify which logger produced the log.

This is similar to the approach taken for ZeekControl, re-using the
log_suffix terminology, but as there's only a single zeek-archiver
process and no postprocessors and no other side-channel for additional
information, we encode extra metadata into the filename. zeek-archiver
is extended to recognize the special metadata part of the filename.

This also solves the issue that multiple loggers in a supervisor setup
overwrite each others log files within a single log-queue directory.
2023-05-05 12:27:25 +02:00
Jan Grashoefer
88c86cc7d4 Add hook into cluster connection setup. 2023-04-21 19:04:52 +02:00
Jan Grashoefer
c7626d797f Add broadcast_topics set.
This set contains the topics to reach all cluster nodes. Due to broker's
forwarding mechanism, we cannot define a single broadcast topic, as it
would create routing loops.
2023-04-21 19:04:52 +02:00
Jan Grashoefer
3db8bb4a44 Generalize Cluster::worker_count. 2023-04-21 19:04:39 +02:00
Robin Sommer
3a9320dab3
Merge remote-tracking branch 'origin/topic/awelzel/2528-cluster-layout-content-warning'
* origin/topic/awelzel/2528-cluster-layout-content-warning:
  cluster: Add warning about cluster-layout.zeek content
2022-11-07 11:28:57 +01:00
Arne Welzel
28336709b8 cluster: Add warning about cluster-layout.zeek content
Relates to #2528, #991.
2022-11-03 14:02:43 +01:00
Josh Soref
21e0d777b3 Spelling fixes: scripts
* accessing
* across
* adding
* additional
* addresses
* afterwards
* analyzer
* ancillary
* answer
* associated
* attempts
* because
* belonging
* buffer
* cleanup
* committed
* connects
* database
* destination
* destroy
* distinguished
* encoded
* entries
* entry
* hopefully
* image
* include
* incorrect
* information
* initial
* initiate
* interval
* into
* java
* negotiation
* nodes
* nonexistent
* ntlm
* occasional
* omitted
* otherwise
* ourselves
* paragraphs
* particular
* perform
* received
* receiver
* referring
* release
* repetitions
* request
* responded
* retrieval
* running
* search
* separate
* separator
* should
* synchronization
* target
* that
* the
* threshold
* timeout
* transaction
* transferred
* transmission
* triggered
* vetoes
* virtual

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2022-11-02 17:36:39 -04:00
Arne Welzel
8c5896a74d scripts: Migrate table iteration to blank identifiers
No obvious hot-cases. Maybe the describe_file() ones or the intel ones
if/when there are hot intel hits.
2022-10-24 10:36:09 +02:00
Tim Wojtulewicz
fb16ce3711 Remove other general deprecations 2022-06-30 19:17:13 +00:00
Tim Wojtulewicz
a6378531db Remove trailing whitespace from script files 2021-10-20 09:57:09 -07:00
Jon Siwek
7bf885b0b8 Merge remote-tracking branch 'origin/topic/vlad/expose_supervisor_rotation_func'
- Restored a deprecated version of 'supervisor_rotation_format_func'
  during merge.

* origin/topic/vlad/expose_supervisor_rotation_func:
  Rename supervisor_rotation_format_func to archiver_rotation_format_func, and expose it for non-supervised setups
2021-03-26 17:18:52 -07:00
Vlad Grigorescu
acfb21c5a6 Rename supervisor_rotation_format_func to archiver_rotation_format_func, and expose it for non-supervised setups
Closes #1463
2021-03-26 15:26:48 -05:00
Jon Siwek
e82824b638 Fix various broken links in script documentation 2021-01-28 17:46:58 -08:00
Jon Siwek
321a027d07 Remove unusable/broken RocksDB code and options
The Broker RockSDB data store backend was previously unusable
and broken, so all code and options related to it are now removed.
2021-01-11 11:12:59 -08:00
Jon Siwek
7cf08d4e58 Merge remote-tracking branch 'origin/topic/neverlord/1336'
* origin/topic/neverlord/1336:
  Fix subtle race on data store initialization
2020-12-23 10:36:09 -08:00
Dominik Charousset
8d726ed07a Fix subtle race on data store initialization 2020-12-22 21:15:17 +01:00
Seth Hall
cd330c801d
Apply suggestions from code review
Co-authored-by: Jon Siwek <jsiwek@corelight.com>
2020-10-13 16:48:15 -04:00
Seth Hall
e78386d6e5
Update scripts/base/frameworks/cluster/main.zeek
Co-authored-by: Jon Siwek <jsiwek@corelight.com>
2020-10-13 16:46:26 -04:00
Seth Hall
cf8671d078 Make defining a port number for hosts in a cluster that only connect outbound optional 2020-10-12 10:46:28 -04:00
Robin Sommer
b0bf9f02c8 Merge remote-tracking branch 'origin/topic/christian/364-logfilter-hooks' into master
(Adding a NEWS entry.)

* origin/topic/christian/364-logfilter-hooks:
  Update testing/btest/scripts/base/frameworks/logging/hooks.zeek
  Btests for log filter policy hooks
  Btest baseline updates to reflect new logging policy hooks
  Migrate existing use of filter predicates to policy hooks
  Support for log filter policy hooks
2020-10-07 08:44:50 +00:00
Arne Welzel
1f5ab4878b logging/ascii: Support leftover log rotation in non-supervisor setups
We have a use case to rotate leftover log files in a non-supervisor
setup. There doesn't seem to be a strict requirement on supervisor
functionality. Allow enabling leftover log rotation through
LogAscii::enable_leftover_log_rotation and redef this for the
logger node in a supervisor setup individually.
2020-10-02 20:38:48 +02:00
Christian Kreibich
1bd658da8f Support for log filter policy hooks
This adds a "policy" hook into the logging framework's streams and
filters to replace the existing log filter predicates. The hook
signature is as follows:

    hook(rec: any, id: Log::ID, filter: Log::Filter);

The logging manager invokes hooks on each log record. Hooks can veto
log records via a break, and modify them if necessary. Log filters
inherit the stream-level hook, but can override or remove the hook as
needed.

The distribution's existing log streams now come with pre-defined
hooks that users can add handlers to. Their name is standardized as
"log_policy" by convention, with additional suffixes when a module
provides multiple streams. The following adds a handler to the Conn
module's default log policy hook:

    hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
            {
            if ( some_veto_reason(rec) )
                break;
            }

By default, this handler will get invoked for any log filter
associated with the Conn::LOG stream.

The existing predicates are deprecated for removal in 4.1 but continue
to work.
2020-09-30 12:32:45 -07:00
Jon Siwek
99d9a3a48c Fix closing timestamp of rotated log files in supervised-cluster mode 2020-08-25 17:06:10 -07:00
Robin Sommer
c3f4971eb2 Merge remote-tracking branch 'origin/topic/johanna/table-changes'
* origin/topic/johanna/table-changes: (26 commits)
  TableSync: try to make test more robust & add debug output
  Increase timeouts to see if FreeBSD will be happy with this.
  Try to make FreeBSD test happy with larger timeout.
  TableSync: refactor common functionality into function
  TableSync: don't raise &on_change, smaller fixes
  TableSync: rename auto_store -> table_store
  SyncTables: address feedback part 1 - naming (broker and zeek)
  BrokerStore <-> Zeek Tables: cleanup and bug workaround
  Zeek Table<->Brokerstore: cleanup, documentation, small fixes
  BrokerStore<->Zeek table: adopt to recent Zeek API changes
  BrokerStore<->Zeek Tables Fix a few small test failures.
  BrokerStore<->Zeek tables: allow setting storage location & tests
  BrokerStore<->Zeek tables: &backend works for in-memory stores.
  BrokerStore<->Zeek table - introdude &backend attribute
  BrokerStore<->Zeek tables: test for clones synchronizing to a master
  BrokerStore<->Zeek tables: load persistent tables on startup.
  Brokerstore<->Tables: attribute conflicts
  Zeek/Brokerstore updates: expiration
  Zeek/Brokerstore updates: add test that includes updates from clones
  Zeek/Brokerstore updates: first working end-to-end test
  ...
2020-07-21 15:39:39 +00:00
Johanna Amann
930a5c8ebd TableSync: rename auto_store -> table_store 2020-07-17 11:40:59 -07:00