Commit graph

1358 commits

Author SHA1 Message Date
Tim Wojtulewicz
f1a7376e0a Return generic result for get operations that includes error messages 2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
4695060d75 Allow opening and closing backends to be async 2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
7ad6a05f5b Add infrastructure for asynchronous storage operations 2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
d07d27453a Add infrastructure for automated expiration of storage entries
This is used for backends that don't support expiration natively.
2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
8dee733a7d Change args to Storage::put to be a record
The number of args being passed to the put() methods was getting to be
fairly long, with more on the horizon. Changing to a record means simplifying
things a little bit.
2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
69d940533d Pass key/value types for validation when opening backends 2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
2ea0f3e70a Lay out initial parts for the Storage framework
This includes a manager, component manager, BIF and script code, and
parts to support new storage backend plugins.
2025-03-18 09:32:34 -07:00
Arne Welzel
6032741868 cluster/websocket: Implement WebSocket server 2025-03-10 17:07:30 +01:00
Johanna Amann
b8c135d7cb Remove violating analyzer from services field again
This reverts some of the recent DPD changes; specifically violations
trigger removal from the services field, again, by default.

Discussion in GH-4521
2025-03-04 15:10:49 +00:00
Dominik Charousset
20b3eca257 Integrate review feedback 2025-02-15 16:37:24 +01:00
Dominik Charousset
30615f425e Hook into Broker logs via its new API
The new Broker API allows us to provide a custom logger to Broker that
pulls previously unattainable context information out of Broker to put
them into broker.log for users of Zeek.

Since Broker log events happen asynchronously, we cache them in a queue
and use a flare to notify Zeek of activity. Furthermore, the Broker
manager now implements the `ProcessFd` function to avoid unnecessary
polling of the new log queue. As a side effect, data stores are polled
less as well.
2025-02-08 16:28:02 +01:00
Johanna Amann
fc233fd8d0 Merge remote-tracking branch 'origin/topic/johanna/dpd-changes'
* origin/topic/johanna/dpd-changes:
  DPD: failed services logging alignment
  DPD: update test baselines; change options for external tests.
  DPD: change policy script for service violation logging; add NEWS
  DPD changes - small script fixes and renames.
  Update public and private test suite for DPD changes.
  Allow to track service violations in conn.log.
  Make conn.log service field ordered
  DPD: change handling of pre-confirmation violations, remove max_violations
  DPD: log analyzers that have confirmed
  IRC analyzer - make protocol confirmation more robust.
2025-02-07 07:35:30 +00:00
Johanna Amann
e3493bc110 DPD changes - small script fixes and renames.
This addresses review feedback of GH-4200. No functional changes.
2025-02-05 13:55:43 +00:00
Arne Welzel
769044e8e1 cluster/Backend: Pass node_id via Init() 2025-02-05 10:39:56 +01:00
Johanna Amann
6324445d62 Merge remote-tracking branch 'origin/master' into topic/johanna/dpd-changes
This also includes some test baseline updates, due to recent QUIC
changes.

* origin/master: (39 commits)
  Update doc submodule [nomail] [skip ci]
  Bump cluster testsuite to pull in resilience to agent connection timing [skip ci]
  IPv6 support for detect-external-names and testcase
  Add  `skip_resp_host_port_pairs` option.
  util/init_random_seed: write_file implies deterministic
  external/subdir-btest.cfg: Set OPENSSL_ENABLE_SHA1_SIGNATURES=1
  btest/x509_verify: Drop OpenSSL 1.0 hack
  testing/btest: Use OPENSSL_ENABLE_SHA1_SIGNATURES
  Add ZAM baseline for new scripts.base.protocols.quic.analyzer-confirmations btest
  QUIC/decrypt_crypto: Rename all_data to data
  QUIC: Confirm before forwarding data to SSL
  QUIC: Parse all QUIC packets in a UDP datagram
  QUIC: Only slurp till packet end, not till &eod
  Remove unused SupervisedNode::InitCluster declaration
  Update doc submodule [nomail] [skip ci]
  Bump cluster testsuite to pull in updated Prometheus tests
  Make enc_part value from kerberos response available to scripts
  Management framework: move up addition of agent IPs into deployable cluster configs
  Support multiple instances per host addr in auto metrics generation
  When auto-generating metrics ports for worker nodes, get them more uniform across instances.
  ...
2025-02-05 09:31:16 +00:00
Johanna Amann
2f712c3c24 Allow to track service violations in conn.log.
This introduces ian options, DPD::track_removed_services_in_connection.
It adds failed services to the services column, prefixed with a
"-".

Alternatively, this commit also adds
policy/protocols/conn/failed-services.zeek, which provides the same
information in a new column in conn.log.
2025-01-30 16:59:44 +00:00
Johanna Amann
c72c1cba6f DPD: change handling of pre-confirmation violations, remove max_violations
This commit revamps the handling of analyzer violations that happen
before an analyzer confirms the protocol.

The current state is that an analyzer is disabled after 5 violations, if
it has not been confirmed. If it has been confirmed, it is disabled
after a single violation.

The reason for this is a historic mistake. In Zeek up to versions 1.5,
analyzers were unconditianally removed when they raised the first
protocol violation.

When this script was ported to the new layout for Zeek 2.0 in
b4b990cfb5, a logic error was introduced
that caused analyzers to no longer be disabled if they were not
confirmed.

This was the state for ~8 years, till the DPD::max_violations options
was added, which instates the current approach of disabling unconfirmed
analyzers after 5 violations. Sadly, there is not much discussion about
this change - from my hazy memory, I think this was discovered during
performance tests and the new behavior was added without checking into
the history of previous changes.

This commit reinstates the originally intended behavior of DPD. When an
analyzer that has not been confirmed raises a protocol violation, it is
immediately removed from the connection. This also makes a lot of sense
- this allows the analyzer to be in a "tasting" phase at the beginning
of the connection, and to error out quickly once it realizes that it was
attached to a connection not containing the desired protocol.

This change also removes the DPD::max_violations option, as it no longer
serves any purpose after this change. (In practice, the option remains
with an &deprecated warning, but it is no longer used for anything).

There are relatively minimal test-baseline changes due to this; they are
mostly triggered by the removal of the data structure and by less
analyzer errors being thrown, as unconfirmed analyzers are disabled
after the first error.
2025-01-30 16:59:44 +00:00
Johanna Amann
e6ed61c47a DPD: log analyzers that have confirmed
This switches the DPD logic to always log analyzers that raised a
protocol confirmation.

The logic is that, once a protocol has been confirmed - and thus there
probably is log output - it does not make sense to later remove it from
the log. It does make sense to somehow flag it as failed - but that
seems like a secondary step.
2025-01-30 16:59:44 +00:00
Tim Wojtulewicz
c1a8f8b763 Fix errors from rst linting on the generated docs 2025-01-24 11:41:36 -07:00
Benjamin Bannier
e8960e0efc Fix incorrect uses of zeek:see
This fixes instances where `zeek:see` was used incorrectly so it was not
rendered correctly. All these instances have been found by looking for
`zeek:see` in the generated HTML where it should not be visible anymore.

I also removed a doc reference to `paraglob_add` which never existed.
2025-01-01 15:35:59 +01:00
Evan Typanski
77273a676d Document get_tag to ensure that name exists
This caused confusion and I don't think it's very intuitive. If called
with a name that does not exist, this returns without a value, not even
an error value. Changing that seems like it could be more deprecation
work.
2024-12-18 16:13:13 -05:00
Tim Wojtulewicz
1158757b2b Merge remote-tracking branch 'origin/topic/awelzel/move-broker-to-cluster-publish'
* origin/topic/awelzel/move-broker-to-cluster-publish:
  netcontrol: Move to Cluster::publish()
  openflow: Move to Cluster::publish()
  netcontrol/catch-and-release: Move to Cluster::publish()
  config: Move to Cluster::publish()
  ssl/validate-certs: Move to Cluster::publish()
  irc: Move to Cluster::publish()
  ftp: Move to Cluster::publish()
  dhcp: Move to cluster publish
  notice: Move to Cluster::publish()
  intel: Move to Cluster::publish()
  sumstats: Move to Cluster::publish()
2024-12-12 13:18:21 -07:00
Tim Wojtulewicz
25554fa668 Merge remote-tracking branch 'origin/topic/awelzel/fix-cluster-publish-any'
* origin/topic/awelzel/fix-cluster-publish-any:
  cluster/Backend: Handle unspecified table/set
  cluster: Fix Cluster::publish() of Broker::Data
  cluster: Be noisy when attempting to connect to an unknown node
2024-12-12 13:17:08 -07:00
Arne Welzel
3d55341690 netcontrol: Move to Cluster::publish() 2024-12-12 17:54:42 +01:00
Arne Welzel
b2df78c0bb openflow: Move to Cluster::publish() 2024-12-12 17:54:42 +01:00
Arne Welzel
66f6149662 config: Move to Cluster::publish() 2024-12-12 17:54:42 +01:00
Arne Welzel
a9243bafcc notice: Move to Cluster::publish() 2024-12-12 17:54:42 +01:00
Arne Welzel
347faf5e86 intel: Move to Cluster::publish() 2024-12-12 17:54:42 +01:00
Arne Welzel
f58a2c2ca8 sumstats: Move to Cluster::publish() 2024-12-12 17:54:42 +01:00
Arne Welzel
271fc15041 cluster: Be noisy when attempting to connect to an unknown node
Mostly due to spending too much time wondering why nodes didn't connect
when there was a mismatch between "manager" and "manager-1" in the
cluster layout. Remove manager from test-all-policy-cluster test to
avoid connection attempts in this test.
2024-12-12 13:01:04 +01:00
Justin Azoff
10438408a5 Pre-compute the node topics for all pool entries.
A zeek script profile showed a small percentage of time spent in
Cluster::node_topic, but this never changes and can be cached.
2024-12-11 15:57:01 -05:00
Arne Welzel
a2249f7ecb cluster: Add Cluster::node_id(), allow redef of node_topic(), nodeid_topic()
This provides a way for non-broker cluster backends to override a
node's identifier and its own topics that it listens on by default.
2024-12-10 20:33:02 +01:00
Christian Kreibich
1c42bfc715 Merge branch 'topic/christian/disconnect-slow-peers'
* topic/christian/disconnect-slow-peers:
  Bump cluster testsuite to pull in Broker backpressure tests
  Expand documentation of Broker events.
  Add sleep() BiF.
  Add backpressure disconnect notification to cluster.log and via telemetry
  Remove unneeded @loads from base/misc/version.zeek
  Add Cluster::nodeid_to_node() helper function
  Support re-peering with Broker peers that fall behind
  Add Zeek-level configurability of Broker slow-peer disconnects
  Bump Broker to pull in disconnect feature and infinite-loop fix
  No need to namespace Cluster:: functions in their own namespace
2024-12-09 23:33:35 -08:00
Tim Wojtulewicz
ccefd66d37 Move python signatures to a separate file 2024-12-09 11:08:30 -07:00
Christian Kreibich
ead6134501 Add backpressure disconnect notification to cluster.log and via telemetry
This adds a Broker-specific script to the cluster framework, loaded only when
Zeek is running in cluster mode. It adds logging in cluster.log as well as
telemetry via a metrics counter for Broker-observed backpressure disconnects.

The new zeek_broker_backpressure_disconnects counter, labeled by the neighboring
peer that the reporting node has determined to be unresponsive, counts the
number of unpeerings for this reason.

Here the node "worker" has observed node "proxy" falling behind once:

# HELP zeek_broker_backpressure_disconnects_total Number of Broker peering drops due to a neighbor falling too far behind in message I/O
# TYPE zeek_broker_backpressure_disconnects_total counter
zeek_broker_backpressure_disconnects_total{endpoint="worker",peer="proxy"} 1

Includes small btest baseline update to reflect @load of a new script.
2024-12-06 15:18:05 -08:00
Christian Kreibich
46a11ec37d Add Cluster::nodeid_to_node() helper function
This translates backend-specific node identifiers (like Broker IDs) to
cluster nodes and their names, if available.
2024-12-06 15:18:05 -08:00
Christian Kreibich
0010e65f6d Support re-peering with Broker peers that fall behind
This adds re-peering at the Broker level for peers that Broker decided to
unpeer. We keep this at the Broker level since this behavior is specific to
it (as opposed to other cluster backends).

Includes baseline updates for btests that pick up on the new script's @load.
2024-12-06 15:18:05 -08:00
Dominik Charousset
4c4eb4b8e2 Add Zeek-level configurability of Broker slow-peer disconnects 2024-12-06 15:18:05 -08:00
Christian Kreibich
e81856a4af No need to namespace Cluster:: functions in their own namespace 2024-12-06 15:18:05 -08:00
Tim Wojtulewicz
bbd7f56dcc Add signatures for Python bytecode for 3.8-3.14 2024-12-06 13:45:46 -07:00
Johanna Amann
7b582bc345 Merge remote-tracking branch 'origin/topic/johanna/sqlite-pragmas'
* origin/topic/johanna/sqlite-pragmas:
  Options for SQLite log writer, eliminate duplicate definitions
  Test synchronous/journal mode options for SQLite log writer
  Added default options for synchronous and journal mode
  Support for synchronous and journal_mode
2024-11-27 08:32:08 +00:00
Johanna Amann
d592942ccb Test synchronous/journal mode options for SQLite log writer
Also adds some small tweaks and adds the new feature to NEWS.
2024-11-26 12:26:38 +00:00
Arne Welzel
fc12be1f17 cluster/setup-connections: Switch to Cluster::subscribe(), short-circuit broker
For the time being, this is easiest, otherwise we'd need to
conditionally load a broker-specific policy script based on
Cluster::backend being set.
2024-11-26 12:58:23 +01:00
Arne Welzel
ef04a199c8 cluster: Add Cluster scoped bifs
... and a broker based test using Cluster::publish() and
Cluster::subscribe().
2024-11-26 12:58:23 +01:00
Mymaqn
3ca56f7e0f Added default options for synchronous and journal mode
Added enum options SQLITE_SYNCHRONOUS_DEFAULT and SQLITE_JOURNAL_MODE_DEFAULT
and changed the default to be these instead.
2024-11-26 11:08:30 +00:00
Mymaqn
6e026ba313 Support for synchronous and journal_mode 2024-11-26 11:08:18 +00:00
Arne Welzel
97f05b2f8c Merge remote-tracking branch 'origin/topic/awelzel/pluggable-cluster-backends-part1'
* origin/topic/awelzel/pluggable-cluster-backends-part1:
  btest: Test Broker::make_event() together with Cluster::publish_hrw()
  btest: Add cluster dir, minimal test for enum value
  broker: Add shim plugin adding a backend component
  zeek-setup: Instantiate backend::manager
  cluster: Add to src/CMakeLists.txt
  cluster: Add Components and ComponentManager for new components
  cluster/Backend: Interface for cluster backends
  cluster/Serializer: Interface for event and log serializers
  logging: Introduce logging/Types.h
  SerialTypes/Field: Allow default construction and add move constructor
  DebugLogger: Add cluster debugging stream
  plugin: Add component enums for pluggable cluster backends
  broker: Pass frame to MakeEvent()
2024-11-22 12:53:23 +01:00
Arne Welzel
fb23a06f6f cluster/Backend: Interface for cluster backends 2024-11-22 10:43:50 +01:00
Arne Welzel
91f5945f92 sumstat/non-cluster: Move last epoch processing to zeek_done()
@Sheco reported that standalone epoch processing may exclude scheduled
events when the final sumstat epoch runs before. For example, this easily
happens when attempting to do sumstat observations within connection_state_remove().

Delay final epoch processing to zeek_done() instead.

This doesn't deal with the clustered version - this would need something
more elaborate and potentially a mechanism to delay the shutdown of
other cluster nodes until/after sumstat processing completed.
2024-11-18 15:58:01 +01:00
Arne Welzel
aabc4a4114 sumstats: Remove copy() for Broker::publish() calls
Serialization happens immediately at Broker::publish() time, there
should be no caching issues.
2024-11-14 12:59:22 +01:00