Mirror/zeek - git.uphillsecurity.com: We code.

mirror of https://github.com/zeek/zeek.git synced 2025-10-02 14:48:21 +00:00

Author	SHA1	Message	Date
Christian Kreibich	1dcd13a019	Fix a typo.	2025-06-05 17:51:54 -07:00
Johanna Amann	58613f0313	Introduce new c$failed_analyzers field This field is used internally to trace which analyzers already had a violation. This is mostly used to prevent duplicate logging. In the past, c$service_violation was used for a similar purpose - however it has slightly different semantics. Where c$failed_analyzers tracks analyzers that were removed due to a violation, c$service_violation tracks violations - and doesn't care if an analyzer was actually removed due to it.	2025-06-04 12:07:13 +01:00
Johanna Amann	42ba2fcca0	Settle on analyzer.log for the dpd.log replacement This commit renames analyzer-failed.log to analyzer.log, and updates the respective news entry.	2025-06-03 17:33:36 +01:00
Johanna Amann	130c89a0a7	dpd->analyzer.log change - rename files To address review feedback in GH-4362: rename analyzer-failed-log.zeek to logging.zeek, analyzer-debug-log.zeek to debug-logging.zeek and dpd-log.zeek to deprecated-dpd-log.zeek. Includes respective test, NEWS, etc updates.	2025-06-03 16:32:52 +01:00
Johanna Amann	af77a7a83b	Analyzer failure logging: tweaks and test fixes The main part of this commit are changes in tests. A lot of the tests that previously relied on analyzer.log or dpd.log now use the new analyzer-failed.log. I verified all the changes and, as far as I can tell, everything behaves as it should. This includes the external test baselines. This change also enables logging of file and packet analyzer to analyzer_failed.log and fixes some small behavior issues. The analyzer_failed event is no longer raised when the removal of an analyzer is vetoed. If an analyzer is no longer active when an analyzer violation is raised, currently the analyzer_failed event is raised. This can, e.g., happen when an analyzer error happens at the very end of the connection. This makes the behavior more similar to what happened in the past, and also intuitively seems to make sense. A bug introduced in the failed service logging was fixed.	2025-06-03 15:56:42 +01:00
Johanna Amann	8c814fa88c	Introduce analyzer-failed.log, as a replacement for dpd.log Analyzer-failed.log is, essentially, the replacement for dpd.log. The name should make more sense, as it does now log analyzer failures. For protocol analyzers specifically, these are failures that lead to the analyzer being disabled.	2025-06-03 15:17:26 +01:00
Johanna Amann	c55e21da71	Rename analyzer.log to analyzer-debug.log; move to policy The current analyzer.log is more useful for debugging than for operational purposes. Hence this is disabled by default, moved to a policy script, and the log is renamed to analyzer-debug.log. Furthermore, logging of analyzer confirmations and disabling analyzers are now enabled by default.	2025-06-03 15:17:26 +01:00
Johanna Amann	6183c5086b	Move dpd.log to policy script This is the first phase of moving from the current dpd log to a more modern logfile, without some of the weirdnesses that the current dpd log contains. Tests will not pass in the current state; this is just splitting out functionality.	2025-06-03 15:17:26 +01:00
Arne Welzel	7eb849ddf4	intel: Add indicator_inserted and indicator_removed hooks This change adds two new hooks to the Intel framework that can be used to intercept added and removed indicators and their type. These hooks are fairly low-level. One immediate use-case is to count the number of indicators loaded per Intel::Type and enable and disable the corresponding event groups of the intel/seen scripts. I attempted to gauge the overhead and while it's definitely there, loading a file with ~500k DOMAIN entries takes somewhere around ~0.5 seconds with the hooks when populated via the min_data_store store mechanism. While that doesn't sound great, it actually takes the manager on my system 2.5 seconds to serialize and Cluster::publish() the min_data_store alone and it's doing that serially for every active worker. Mostly to say that the bigger overhead in that area is on the manager doing redundant work per worker. Co-authored-by: Mohan Dhawan <mohan@corelight.com>	2025-06-02 09:50:48 +02:00
Arne Welzel	544d571089	cluster/websocket: Deprecate $listen_host, introduce $listen_addr This only changes the script-layer API, but keeps the std::string host in the C++ layer's ServerOptions. Mostly because the ixwebsocket library takes host as std::string. Also, maybe at some point we'd want to support something scheme-based like unix:///var/run/zeek.sock and placing that in a string could not be totally wrong. Add tests for IPV6, too.	2025-05-30 11:02:41 +02:00
Arne Welzel	a61aff010f	cluster/websocket: Propagate code and reason to websocket_client_lost() This provides visibility into the reason why ixwebsocket or the client decided to disconnect. Closed #4440	2025-05-13 18:26:03 +02:00
Arne Welzel	aaddeb19ad	cluster/websocket: Support configurable ping interval Primarily for testing purposes. The hard-coded 5 seconds may also be too aggressive for some deployments, so it makes sense for it to be configurable.	2025-05-13 18:26:03 +02:00
Christian Kreibich	738ce1c235	Bugfix: accurately track Broker buffer overflows w/ multiple peerings When a node restarts or a peering between two nodes starts over for other reasons, the internal tracking in the Broker manager resets its state (since it's per-peering), and thus the message overflow counter. The script layer was unaware of this, and threw errors when trying to reset the corresponding counter metric down to zero at sync time. We now track past buffer overflows via a separate epoch table, using Broker peer ID comparisons to identify new peerings, and set the counter to the sum of past and current overflows. I considered just making this a gauge, but it seems more helpful to be able to look at a counter to see whether any messages have ever been dropped over the lifetime of the node process. As an aside, this now also avoids repeatedly creating the labels vector, re-using the same one for each metric. Thanks to @pbcullen for identifying this one!	2025-05-07 17:27:38 -07:00
Christian Kreibich	68fadd0464	Lower listen/connect retry intervals in Broker and the cluster framework to 1sec The former defaults (30sec, 1min) can slow down cluster startup and recovery considerably, and other systems have more aggressive intervals still.	2025-04-25 10:22:35 -07:00
Christian Kreibich	841a40ff88	Switch Broker's default backpressure policy to drop_oldest, bump buffer sizes At every site where we've dug into backpressure disconnect findings, it has been the case that the default values were too small. 8192, so 4x the old default, suffices at every site to drown out premature disconnects. With metrics now available for the send buffers regardless of backpressure overflow policy, this also switches the default from "disconnect" to "drop_oldest" (for both peers and websockets), meaning that peerings remain untouched but the oldest queued message simply gets dropped when a new message is enqueued. With this policy, the number of backpressure overflows is then simply the count of discarded messages, something that users can tune to see drop to zero in everyday use. Another benefit is that marginal overflows cause less message loss than when an entire buffer's worth (plus potentially more in-flight messages) gets thrown out with a disconnect.	2025-04-25 10:22:35 -07:00
Christian Kreibich	5008f586ea	Deprecate Broker::congestion_queue_size and stop using it internally Since a reorg in the Broker library (commit b04195183) that revamped flow control and that we pulled in with Zeek 5.0, this setting hasn't done anything. Broker's endpoint::make_subscriber() and endpoint::make_status_subscriber() take a queue size argument (with a default value) that simply gets dropped in the eventual subscriber::make() call. See: `b041951835 (diff-5c0d2baa7981caeb6a4080708ddca6ad929746d10c73d66598e46d7c2c03c8deL34-R178)`	2025-04-25 10:22:35 -07:00
Christian Kreibich	88a0cda8ca	Add cluster framework telemetry for Broker's send-buffer use This hooks into Telemetry::sync() to update Broker-level metrics tracking the peerings' send buffer state. We do this in the cluster framework so we can label the resulting metrics with Zeek cluster node names, not Broker's endpoint IDs.	2025-04-25 09:14:33 -07:00
Christian Kreibich	f5fbad23ff	Add peer buffer update tracking to the Broker manager's event_observer This implements basic tracking of each peering's current fill level, the maximum level over a recent time interval (via a new Broker::buffer_stats_reset_interval tunable, defaulting to 1min), and the number of times a buffer overflows. For the disconnect policy this is the number of depeerings, but for drop_newest and drop_oldest it implies the number of messages lost. This doesn't use "proper" telemetry metrics for a few reasons: this tracking is Broker-specific, so we need to track each peering via endpoint_ids, while we want the metrics to use Cluster node name labels, and the latter live in the script layer. Using broker::endpoint_id directly as keys also means we rely on their ability to hash in STL containers, which should be fast. This does not track the buffer levels for Broker "clients" (as opposed to "peers"), i.e. WebSockets, since we currently don't have a way to name these, and we don't want to use ephemeral Broker IDs in their telemetry. To make the stats accessible to the script layer the Broker manager (via a new helper class that lives in the event_observer) maintains a TableVal mapping Broker IDs to a new BrokerPeeringStats record. The table's members get updated every time that table is requested. This minimizes new val instantiation and allows the script layer to customize the BrokerPeeringStats record by redef'ing, updating fields, etc. Since we can't use Zeek vals outside the main thread, this requires some care so all table updates happen only in the Zeek-side table updater, PeerBufferState::GetPeeringStatsTable().	2025-04-24 22:47:18 -07:00
Arne Welzel	011029addc	cluster/websocket: Make websocket dispatcher queue size configurable Limit the number of WebSocket events queued from external clients to dispatcher instances to produce back pressure to the clients if Zeek's IO loop is overloaded.	2025-04-23 14:27:43 +02:00
Arne Welzel	ab25e5d24b	broker/main: Reference Cluster::publish() for auto_publish() deprecation In hindsight, this is the better thing to do and with Zeek 7.2 we should be confident enough that it'll work.	2025-04-23 14:27:43 +02:00
Arne Welzel	a7423104e1	broker/main: Deprecate Broker::listen_websocket() Optimistically deprecate Broker::listen_websocket() and promote Cluster::listen_websocket() instead.	2025-04-23 14:27:43 +02:00
Arne Welzel	3d3b7a0759	cluster/Backend: Add ProcessError() Allow backends to pass errors to a strategy. Locally, these raise Cluster::Backend::error() events that are logged to the reporter as errors.	2025-04-23 14:19:08 +02:00
Christian Kreibich	549e678dff	Use Broker peering directionality when re-peering after backpressure overflows This avoids creating pointless connection reattempts to ephemeral TCP client-side ports, which have been cluttering up the Broker logs since 7.1.	2025-04-21 14:08:42 -07:00
Christian Kreibich	b430d5235c	Expand Broker APIs to allow tracking directionality of peering establishment This provides ways to figure out for a given peer, or a given address/port pair, whether the local node originally established the peering.	2025-04-21 14:08:42 -07:00
Tim Wojtulewicz	cb1ef47a31	Add STORAGE_ prefixes for backends and serializers	2025-04-14 10:11:13 -07:00
Tim Wojtulewicz	e545fe8256	Ground work for pluggable storage serializers	2025-04-14 10:02:35 -07:00
Arne Welzel	6bc36e8cf8	broker/main: Adapt enum values to agree with comm.bif Logic to detect this error already existed, but due to enum identifiers not having a value set, it never triggered before. Should probably backport this one.	2025-04-04 15:36:42 +02:00
Arne Welzel	14697ea6ba	Merge remote-tracking branch 'origin/topic/neverlord/broker-logging' * origin/topic/neverlord/broker-logging: Integrate review feedback Hook into Broker logs via its new API	2025-03-31 18:53:43 +02:00
Tim Wojtulewicz	c7015e8250	Split storage.bif file into events/sync/async, add more comments	2025-03-18 10:20:34 -07:00
Tim Wojtulewicz	f40947f6ac	Update comments in script files, run zeek-format on all of them	2025-03-18 10:20:34 -07:00
Tim Wojtulewicz	9ed3e33f97	Completely rework return values from storage operations	2025-03-18 10:20:33 -07:00
Tim Wojtulewicz	a485b1d237	Make backend options a record, move actual options to be sub-records	2025-03-18 10:20:33 -07:00
Tim Wojtulewicz	28951dccf1	Split sync and async into separate script-land namespaces	2025-03-18 10:20:33 -07:00
Tim Wojtulewicz	f1a7376e0a	Return generic result for get operations that includes error messages	2025-03-18 09:32:34 -07:00
Tim Wojtulewicz	4695060d75	Allow opening and closing backends to be async	2025-03-18 09:32:34 -07:00
Tim Wojtulewicz	7ad6a05f5b	Add infrastructure for asynchronous storage operations	2025-03-18 09:32:34 -07:00
Tim Wojtulewicz	d07d27453a	Add infrastructure for automated expiration of storage entries This is used for backends that don't support expiration natively.	2025-03-18 09:32:34 -07:00
Tim Wojtulewicz	8dee733a7d	Change args to Storage::put to be a record The number of args being passed to the put() methods was getting to be fairly long, with more on the horizon. Changing to a record means simplifying things a little bit.	2025-03-18 09:32:34 -07:00
Tim Wojtulewicz	69d940533d	Pass key/value types for validation when opening backends	2025-03-18 09:32:34 -07:00
Tim Wojtulewicz	2ea0f3e70a	Lay out initial parts for the Storage framework This includes a manager, component manager, BIF and script code, and parts to support new storage backend plugins.	2025-03-18 09:32:34 -07:00
Arne Welzel	6032741868	cluster/websocket: Implement WebSocket server	2025-03-10 17:07:30 +01:00
Johanna Amann	b8c135d7cb	Remove violating analyzer from services field again This reverts some of the recent DPD changes; specifically violations trigger removal from the services field, again, by default. Discussion in GH-4521	2025-03-04 15:10:49 +00:00
Dominik Charousset	20b3eca257	Integrate review feedback	2025-02-15 16:37:24 +01:00
Dominik Charousset	30615f425e	Hook into Broker logs via its new API The new Broker API allows us to provide a custom logger to Broker that pulls previously unattainable context information out of Broker to put them into broker.log for users of Zeek. Since Broker log events happen asynchronously, we cache them in a queue and use a flare to notify Zeek of activity. Furthermore, the Broker manager now implements the `ProcessFd` function to avoid unnecessary polling of the new log queue. As a side effect, data stores are polled less as well.	2025-02-08 16:28:02 +01:00
Johanna Amann	fc233fd8d0	Merge remote-tracking branch 'origin/topic/johanna/dpd-changes' * origin/topic/johanna/dpd-changes: DPD: failed services logging alignment DPD: update test baselines; change options for external tests. DPD: change policy script for service violation logging; add NEWS DPD changes - small script fixes and renames. Update public and private test suite for DPD changes. Allow to track service violations in conn.log. Make conn.log service field ordered DPD: change handling of pre-confirmation violations, remove max_violations DPD: log analyzers that have confirmed IRC analyzer - make protocol confirmation more robust.	2025-02-07 07:35:30 +00:00
Johanna Amann	e3493bc110	DPD changes - small script fixes and renames. This addresses review feedback of GH-4200. No functional changes.	2025-02-05 13:55:43 +00:00
Arne Welzel	769044e8e1	cluster/Backend: Pass node_id via Init()	2025-02-05 10:39:56 +01:00
Johanna Amann	6324445d62	Merge remote-tracking branch 'origin/master' into topic/johanna/dpd-changes This also includes some test baseline updates, due to recent QUIC changes. * origin/master: (39 commits) Update doc submodule [nomail] [skip ci] Bump cluster testsuite to pull in resilience to agent connection timing [skip ci] IPv6 support for detect-external-names and testcase Add `skip_resp_host_port_pairs` option. util/init_random_seed: write_file implies deterministic external/subdir-btest.cfg: Set OPENSSL_ENABLE_SHA1_SIGNATURES=1 btest/x509_verify: Drop OpenSSL 1.0 hack testing/btest: Use OPENSSL_ENABLE_SHA1_SIGNATURES Add ZAM baseline for new scripts.base.protocols.quic.analyzer-confirmations btest QUIC/decrypt_crypto: Rename all_data to data QUIC: Confirm before forwarding data to SSL QUIC: Parse all QUIC packets in a UDP datagram QUIC: Only slurp till packet end, not till &eod Remove unused SupervisedNode::InitCluster declaration Update doc submodule [nomail] [skip ci] Bump cluster testsuite to pull in updated Prometheus tests Make enc_part value from kerberos response available to scripts Management framework: move up addition of agent IPs into deployable cluster configs Support multiple instances per host addr in auto metrics generation When auto-generating metrics ports for worker nodes, get them more uniform across instances. ...	2025-02-05 09:31:16 +00:00
Johanna Amann	2f712c3c24	Allow to track service violations in conn.log. This introduces an option, DPD::track_removed_services_in_connection. It adds failed services to the services column, prefixed with a "-". Alternatively, this commit also adds policy/protocols/conn/failed-services.zeek, which provides the same information in a new column in conn.log.	2025-01-30 16:59:44 +00:00
Johanna Amann	c72c1cba6f	DPD: change handling of pre-confirmation violations, remove max_violations This commit revamps the handling of analyzer violations that happen before an analyzer confirms the protocol. The current state is that an analyzer is disabled after 5 violations, if it has not been confirmed. If it has been confirmed, it is disabled after a single violation. The reason for this is a historic mistake. In Zeek up to version 1.5, analyzers were unconditionally removed when they raised the first protocol violation. When this script was ported to the new layout for Zeek 2.0 in `b4b990cfb5`, a logic error was introduced that caused analyzers to no longer be disabled if they were not confirmed. This was the state for ~8 years, till the DPD::max_violations option was added, which instates the current approach of disabling unconfirmed analyzers after 5 violations. Sadly, there is not much discussion about this change - from my hazy memory, I think this was discovered during performance tests and the new behavior was added without checking into the history of previous changes. This commit reinstates the originally intended behavior of DPD. When an analyzer that has not been confirmed raises a protocol violation, it is immediately removed from the connection. This also makes a lot of sense - this allows the analyzer to be in a "tasting" phase at the beginning of the connection, and to error out quickly once it realizes that it was attached to a connection not containing the desired protocol. This change also removes the DPD::max_violations option, as it no longer serves any purpose after this change. (In practice, the option remains with an &deprecated warning, but it is no longer used for anything). There are relatively minimal test-baseline changes due to this; they are mostly triggered by the removal of the data structure and by fewer analyzer errors being thrown, as unconfirmed analyzers are disabled after the first error.	2025-01-30 16:59:44 +00:00
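
The backpressure policies described in commits 841a40ff88 and f5fbad23ff can be illustrated with a small sketch. This is not Broker's implementation: the `SendBuffer` class, its fields, and the `enqueue` method are hypothetical names chosen for illustration, and only the policy names ("disconnect", "drop_newest", "drop_oldest") come from the commit messages. It shows why, under drop_oldest, the overflow count equals the number of discarded messages, while under disconnect a single overflow discards the whole buffer.

```python
from collections import deque


class SendBuffer:
    """Hypothetical bounded send buffer; an illustration, not Broker's code."""

    def __init__(self, capacity, policy="drop_oldest"):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if policy not in ("disconnect", "drop_newest", "drop_oldest"):
            raise ValueError("unknown policy: " + policy)
        self.capacity = capacity
        self.policy = policy
        self.queue = deque()
        self.overflows = 0      # number of times the buffer overflowed
        self.connected = True   # False once the "disconnect" policy fires

    def enqueue(self, msg):
        """Queue a message; return True if it ends up in the buffer."""
        if not self.connected:
            return False
        if len(self.queue) < self.capacity:
            self.queue.append(msg)
            return True
        # Buffer is full: count the overflow, then apply the policy.
        self.overflows += 1
        if self.policy == "drop_oldest":
            self.queue.popleft()    # discard the oldest queued message
            self.queue.append(msg)
            return True
        if self.policy == "drop_newest":
            return False            # discard the incoming message
        # "disconnect": the peering goes away along with everything queued.
        self.queue.clear()
        self.connected = False
        return False


# Five messages into a three-slot buffer under each policy.
oldest = SendBuffer(3, "drop_oldest")
dropper = SendBuffer(3, "disconnect")
for i in range(5):
    oldest.enqueue(i)
    dropper.enqueue(i)
```

In this sketch the drop_oldest buffer keeps the three most recent messages and records two overflows, one per lost message, whereas the disconnect buffer records one overflow and loses everything it held.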


1341 commits