Commit graph

1352 commits

Author SHA1 Message Date
Arne Welzel
473723cc47 Attr: Deprecate using &default and &optional together on record fields
Since &default implies re-initialization of the field, using it together
with &optional doesn't make much sense.
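Schematically, the deprecated combination looks like this (record and
field names made up for illustration):

    type Info: record {
        # Deprecated: &default already guarantees the field a value,
        # so &optional adds nothing here.
        duration: interval &default=0sec &optional;

        # Use one or the other instead:
        timeout: interval &default=30sec;
        note: string &optional;
    };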
2025-07-30 10:26:06 +02:00
Tim Wojtulewicz
7e3ed2010d Add flag to force synchronous mode when calling storage script-land functions 2025-07-23 13:14:34 -07:00
Tim Wojtulewicz
a0ffe7f748 Add storage metrics for operations, expirations, data transferred 2025-07-18 14:28:04 -07:00
Benjamin Bannier
d5fd29edcd Prefer explicit construction to coercion in record initialization
While we support initializing records via coercion from an expression
list, e.g.,

    local x: X = [$x1=1, $x2=2];

this can sometimes obscure the code to readers, e.g., when assigning to a
value declared and typed elsewhere. The language runtime has similar
overhead, since instead of just constructing a known type it needs to
check at runtime that the coercion from the expression list is valid;
this can be slower than just writing the more readable code in the first
place, see #4559.

With this patch we use explicit construction, e.g.,

    local x = X($x1=1, $x2=2);
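Both snippets assume a record type along these lines (definition made up
for illustration):

    type X: record {
        x1: count;
        x2: count;
    };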
2025-07-11 16:28:37 -07:00
Arne Welzel
df581c59b4 scripts: Use tpe instead of type_, again
The .rst generation doesn't escape the trailing `_`, and the docs build
then gets upset because `type` is treated as a reference target.

For better or worse, revert to using tpe, though I acknowledge this
means we need to be careful with trailing underscores because our docs
build is so fragile.

Partly reverts b9eabbabba.
2025-07-03 20:25:34 +02:00
Benjamin Bannier
b9eabbabba Bump pre-commit hooks 2025-07-01 10:39:47 +02:00
Arne Welzel
1d931b5a2f cluster/WebSocket: Include X-Application-Name in cluster.log
The formatting for the log is a bit ad hoc, but that's mostly because cluster.log
only has a message field and I don't think a dedicated application_name
column is worth it. One could also be added by custom scripts if it's really
wanted for a given deployment.
2025-06-30 17:55:24 +02:00
Arne Welzel
26f5166d7a cluster/telemetry: Move topic_normalization redef to zeromq 2025-06-26 15:22:11 +02:00
Arne Welzel
4c34274a6c cluster: Introduce telemetry component 2025-06-25 16:59:49 +02:00
Arne Welzel
4b472f2771 Merge remote-tracking branch 'origin/topic/awelzel/telemetry-endpoint-to-node-rename'
* origin/topic/awelzel/telemetry-endpoint-to-node-rename:
  telemetry: Rename endpoint label to node label
2025-06-25 09:33:55 +02:00
Arne Welzel
eea194ddd8 telemetry: Rename endpoint label to node label
Using a label named "endpoint" is not intuitive and requires explaining to
users that it's really just the Cluster::node value. Change the label to
"node", so that we don't need to do the explaining.

This probably breaks some existing users of the Prometheus metrics, but after
looking more at metrics recently, "endpoint" really is a thorn in my side.
2025-06-25 09:33:01 +02:00
Christian Kreibich
1dcd13a019 Fix a typo. 2025-06-05 17:51:54 -07:00
Johanna Amann
58613f0313 Introduce new c$failed_analyzers field
This field is used internally to track which analyzers have already had a
violation, mostly to prevent duplicate logging.

In the past, c$service_violation was used for a similar purpose -
however it has slightly different semantics. Where c$failed_analyzers
tracks analyzers that were removed due to a violation,
c$service_violation tracks violations - and doesn't care if an analyzer
was actually removed due to it.
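Schematically, attaching such a field looks like this (the exact
attributes are my assumption):

    redef record connection += {
        # Analyzers removed due to a violation, tracked to avoid
        # duplicate logging.
        failed_analyzers: set[string] &default=set();
    };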
2025-06-04 12:07:13 +01:00
Johanna Amann
42ba2fcca0 Settle on analyzer.log for the dpd.log replacement
This commit renames analyzer-failed.log to analyzer.log and updates the
respective NEWS entry.
2025-06-03 17:33:36 +01:00
Johanna Amann
130c89a0a7 dpd->analyzer.log change - rename files
To address review feedback in GH-4362: rename analyzer-failed-log.zeek
to logging.zeek, analyzer-debug-log.zeek to debug-logging.zeek, and
dpd-log.zeek to deprecated-dpd-log.zeek.

Includes the respective test, NEWS, etc. updates.
2025-06-03 16:32:52 +01:00
Johanna Amann
af77a7a83b Analyzer failure logging: tweaks and test fixes
The main part of this commit is changes to tests. A lot of the tests
that previously relied on analyzer.log or dpd.log now use the new
analyzer-failed.log.

I verified all the changes and, as far as I can tell, everything
behaves as it should. This includes the external test baselines.

This change also enables logging of file and packet analyzers to
analyzer-failed.log and fixes some small behavior issues.

The analyzer_failed event is no longer raised when the removal of an
analyzer is vetoed.

If an analyzer is no longer active when an analyzer violation is raised,
the analyzer_failed event is still raised. This can happen, e.g.,
when an analyzer error occurs at the very end of a connection. This
makes the behavior more similar to what happened in the past, and also
seems intuitively sensible.

A bug introduced in the failed service logging was fixed.
2025-06-03 15:56:42 +01:00
Johanna Amann
8c814fa88c Introduce analyzer-failed.log, as a replacement for dpd.log
Analyzer-failed.log is, essentially, the replacement for dpd.log. The
name should make more sense, since it now logs analyzer failures. For
protocol analyzers specifically, these are failures that lead to the
analyzer being disabled.
2025-06-03 15:17:26 +01:00
Johanna Amann
c55e21da71 Rename analyzer.log to analyzer-debug.log; move to policy
The current analyzer.log is more useful for debugging than for
operational purposes. Hence it is now disabled by default, moved to a
policy script, and renamed to analyzer-debug.log.

Furthermore, logging of analyzer confirmations and of disabled analyzers
is now enabled by default.
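Assuming the final script name from the rename commit above, getting the
debug log back should be a one-liner:

    @load frameworks/analyzer/debug-logging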
2025-06-03 15:17:26 +01:00
Johanna Amann
6183c5086b Move dpd.log to policy script
This is the first phase of moving from the current dpd log to a more
modern logfile, without some of the weirdnesses that the current dpd log
contains.

Tests will not pass in the current state; this is just splitting out
functionality.
2025-06-03 15:17:26 +01:00
Arne Welzel
7eb849ddf4 intel: Add indicator_inserted and indicator_removed hooks
This change adds two new hooks to the Intel framework that can be used
to intercept added and removed indicators and their type.

These hooks are fairly low-level. One immediate use case is to count the
number of indicators loaded per Intel::Type and enable or disable the
corresponding event groups of the intel/seen scripts.

I attempted to gauge the overhead, and while it's definitely there, the
hooks add somewhere around ~0.5 seconds when loading a file with ~500k
DOMAIN entries populated via the min_data_store mechanism. While that
doesn't sound great, it actually takes the manager on my system 2.5
seconds to serialize and Cluster::publish() the min_data_store alone,
and it does that serially for every active worker. Mostly to say that
the bigger overhead in that area is the manager doing redundant work
per worker.
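A sketch of the counting use case (hook parameter names assumed from the
description above):

    global indicator_counts: table[Intel::Type] of count &default=0;

    hook Intel::indicator_inserted(indicator: string, indicator_type: Intel::Type)
        {
        ++indicator_counts[indicator_type];
        }

    hook Intel::indicator_removed(indicator: string, indicator_type: Intel::Type)
        {
        --indicator_counts[indicator_type];
        }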

Co-authored-by: Mohan Dhawan <mohan@corelight.com>
2025-06-02 09:50:48 +02:00
Arne Welzel
544d571089 cluster/websocket: Deprecate $listen_host, introduce $listen_addr
This only changes the script-layer API, but keeps the std::string host
in the C++ layer's ServerOptions, mostly because the ixwebsocket library
takes the host as std::string. Also, maybe at some point we'd want to
support something scheme-based like unix:///var/run/zeek.sock, and placing
that in a string wouldn't be totally wrong.

Add tests for IPv6, too.
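A sketch of the new spelling (the options record and field names are
assumptions here):

    event zeek_init()
        {
        local opts = Cluster::WebSocketServerOptions($listen_addr=127.0.0.1,
                                                     $listen_port=8080/tcp);
        Cluster::listen_websocket(opts);
        }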
2025-05-30 11:02:41 +02:00
Arne Welzel
a61aff010f cluster/websocket: Propagate code and reason to websocket_client_lost()
This gives visibility into why ixwebsocket or the
client decided to disconnect.
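A handler might now look like this (parameter names and types assumed):

    event Cluster::websocket_client_lost(endpoint: string, code: count, reason: string)
        {
        print fmt("WebSocket client %s gone: code=%d, reason=%s", endpoint, code, reason);
        }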

Closes #4440
2025-05-13 18:26:03 +02:00
Arne Welzel
aaddeb19ad cluster/websocket: Support configurable ping interval
Primarily for testing purposes; also, the hard-coded 5 seconds may be too
aggressive for some deployments, so it makes sense for this to be
configurable.
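Hypothetically, assuming the interval ends up as a field on the same
server options record:

    event zeek_init()
        {
        local opts = Cluster::WebSocketServerOptions($ping_interval=30sec);
        Cluster::listen_websocket(opts);
        }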
2025-05-13 18:26:03 +02:00
Christian Kreibich
738ce1c235 Bugfix: accurately track Broker buffer overflows w/ multiple peerings
When a node restarts or a peering between two nodes starts over for other
reasons, the internal tracking in the Broker manager resets its state (since
it's per-peering), and thus the message overflow counter. The script layer was
unaware of this, and threw errors when trying to reset the corresponding counter
metric down to zero at sync time.

We now track past buffer overflows via a separate epoch table, using Broker peer
ID comparisons to identify new peerings, and set the counter to the sum of past
and current overflows.

I considered just making this a gauge, but it seems more helpful to be able to
look at a counter to see whether any messages have ever been dropped over the
lifetime of the node process.

As an aside, this now also avoids repeatedly creating the labels vector,
re-using the same one for each metric.

Thanks to @pbcullen for identifying this one!
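A schematic of the epoch bookkeeping in script terms (all names made up;
the real logic lives in the cluster framework):

    # Overflows banked from finished peering epochs, per node name.
    global past_overflows: table[string] of count &default=0;
    # Broker peer ID last seen per node; a new ID means a new peering.
    global epoch_ids: table[string] of string;
    # Last per-epoch overflow count seen, so it can be banked on reset.
    global last_seen: table[string] of count &default=0;

    function total_overflows(node: string, peer_id: string, cur: count): count
        {
        if ( node in epoch_ids && epoch_ids[node] != peer_id )
            past_overflows[node] += last_seen[node];

        epoch_ids[node] = peer_id;
        last_seen[node] = cur;
        return past_overflows[node] + cur;
        }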
2025-05-07 17:27:38 -07:00
Christian Kreibich
68fadd0464 Lower listen/connect retry intervals in Broker and the cluster framework to 1sec
The former defaults (30sec, 1min) can slow down cluster startup and recovery
considerably, and other systems have more aggressive intervals still.
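Expressed as redefs, the new defaults amount to (option names from the
Broker framework, to the best of my knowledge):

    redef Broker::default_listen_retry = 1sec;
    redef Broker::default_connect_retry = 1sec;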
2025-04-25 10:22:35 -07:00
Christian Kreibich
841a40ff88 Switch Broker's default backpressure policy to drop_oldest, bump buffer sizes
At every site where we've dug into backpressure disconnect findings, it has been
the case that the default values were too small. A buffer size of 8192, 4x the
old default of 2048, suffices at every site to eliminate premature disconnects.

With metrics now available for the send buffers regardless of backpressure
overflow policy, this also switches the default from "disconnect" to
"drop_oldest" (for both peers and websockets), meaning that peerings remain
untouched but the oldest queued message simply gets dropped when a new message
is enqueued. With this policy, the number of backpressure overflows is then
simply the count of discarded messages, something that users can tune to see
drop to zero in everyday use.  Another benefit is that marginal overflows cause
less message loss than when an entire buffer's worth (plus potentially more
in-flight messages) gets thrown out with a disconnect.
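Deployments that prefer the old behavior should be able to redef the
options back (option names assumed):

    redef Broker::peer_buffer_size = 2048;
    redef Broker::peer_overflow_policy = "disconnect";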
2025-04-25 10:22:35 -07:00
Christian Kreibich
5008f586ea Deprecate Broker::congestion_queue_size and stop using it internally
Since a reorg in the Broker library (commit b04195183) that revamped flow
control and that we pulled in with Zeek 5.0, this setting hasn't done
anything. Broker's endpoint::make_subscriber() and
endpoint::make_status_subscriber() take a queue size argument (with a default
value) that simply gets dropped in the eventual subscriber::make() call. See:

b041951835 (diff-5c0d2baa7981caeb6a4080708ddca6ad929746d10c73d66598e46d7c2c03c8deL34-R178)
2025-04-25 10:22:35 -07:00
Christian Kreibich
88a0cda8ca Add cluster framework telemetry for Broker's send-buffer use
This hooks into Telemetry::sync() to update Broker-level metrics tracking the
peerings' send buffer state. We do this in the cluster framework so we can label
the resulting metrics with Zeek cluster node names, not Broker's endpoint IDs.
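Schematically (the actual metric updates live in the cluster framework
scripts):

    hook Telemetry::sync()
        {
        # Refresh per-peering send-buffer metrics, labeled by Zeek
        # cluster node name, before they get scraped.
        }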
2025-04-25 09:14:33 -07:00
Christian Kreibich
f5fbad23ff Add peer buffer update tracking to the Broker manager's event_observer
This implements basic tracking of each peering's current fill level, the maximum
level over a recent time interval (via a new Broker::buffer_stats_reset_interval
tunable, defaulting to 1min), and the number of times a buffer overflows. For
the disconnect policy this is the number of depeerings, but for drop_newest and
drop_oldest it implies the number of messages lost.

This doesn't use "proper" telemetry metrics for a few reasons: this tracking is
Broker-specific, so we need to track each peering via endpoint_ids, while we
want the metrics to use Cluster node name labels, and the latter live in the
script layer. Using broker::endpoint_id directly as keys also means we rely on
their ability to hash in STL containers, which should be fast.

This does not track the buffer levels for Broker "clients" (as opposed to
"peers"), i.e. WebSockets, since we currently don't have a way to name these,
and we don't want to use ephemeral Broker IDs in their telemetry.

To make the stats accessible to the script layer the Broker manager (via a new
helper class that lives in the event_observer) maintains a TableVal mapping
Broker IDs to a new BrokerPeeringStats record. The table's members get updated
every time that table is requested. This minimizes new val instantiation and
allows the script layer to customize the BrokerPeeringStats record via redef,
updating fields, etc. Since we can't use Zeek vals outside the main thread, this
requires some care so all table updates happen only in the Zeek-side table
updater, PeerBufferState::GetPeeringStatsTable().
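Since the record is script-extensible, a deployment could add its own
fields, e.g. (namespace and field name made up):

    redef record Broker::BrokerPeeringStats += {
        note: string &optional;
    };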
2025-04-24 22:47:18 -07:00
Arne Welzel
011029addc cluster/websocket: Make websocket dispatcher queue size configurable
Limit the number of WebSocket events queued from external clients to
dispatcher instances, producing back pressure on the clients if
Zeek's IO loop is overloaded.
2025-04-23 14:27:43 +02:00
Arne Welzel
ab25e5d24b broker/main: Reference Cluster::publish() for auto_publish() deprecation
In hindsight, this is the better thing to do and with Zeek 7.2 we should
be confident enough that it'll work.
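A migration sketch (event name made up; the topic constant comes from
the cluster framework):

    global my_event: event(n: count);

    event zeek_init()
        {
        # Previously: Broker::auto_publish(Cluster::manager_topic, my_event);
        # Now, publish explicitly at the point the event is generated:
        Cluster::publish(Cluster::manager_topic, my_event, 42);
        }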
2025-04-23 14:27:43 +02:00
Arne Welzel
a7423104e1 broker/main: Deprecate Broker::listen_websocket()
Optimistically deprecate Broker::listen_websocket() and promote
Cluster::listen_websocket() instead.
2025-04-23 14:27:43 +02:00
Arne Welzel
3d3b7a0759 cluster/Backend: Add ProcessError()
Allow backends to pass errors to a strategy. Locally, these raise
Cluster::Backend::error() events that are logged to the reporter
as errors.
2025-04-23 14:19:08 +02:00
Christian Kreibich
549e678dff Use Broker peering directionality when re-peering after backpressure overflows
This avoids creating pointless connection reattempts to ephemeral TCP
client-side ports, which have been cluttering up the Broker logs since 7.1.
2025-04-21 14:08:42 -07:00
Christian Kreibich
b430d5235c Expand Broker APIs to allow tracking directionality of peering establishment
This provides ways to figure out for a given peer, or a given address/port pair,
whether the local node originally established the peering.
2025-04-21 14:08:42 -07:00
Tim Wojtulewicz
cb1ef47a31 Add STORAGE_ prefixes for backends and serializers 2025-04-14 10:11:13 -07:00
Tim Wojtulewicz
e545fe8256 Ground work for pluggable storage serializers 2025-04-14 10:02:35 -07:00
Arne Welzel
6bc36e8cf8 broker/main: Adapt enum values to agree with comm.bif
Logic to detect this error already existed, but due to enum identifiers
not having a value set, it never triggered before.

Should probably backport this one.
2025-04-04 15:36:42 +02:00
Arne Welzel
14697ea6ba Merge remote-tracking branch 'origin/topic/neverlord/broker-logging'
* origin/topic/neverlord/broker-logging:
  Integrate review feedback
  Hook into Broker logs via its new API
2025-03-31 18:53:43 +02:00
Tim Wojtulewicz
c7015e8250 Split storage.bif file into events/sync/async, add more comments 2025-03-18 10:20:34 -07:00
Tim Wojtulewicz
f40947f6ac Update comments in script files, run zeek-format on all of them 2025-03-18 10:20:34 -07:00
Tim Wojtulewicz
9ed3e33f97 Completely rework return values from storage operations 2025-03-18 10:20:33 -07:00
Tim Wojtulewicz
a485b1d237 Make backend options a record, move actual options to be sub-records 2025-03-18 10:20:33 -07:00
Tim Wojtulewicz
28951dccf1 Split sync and async into separate script-land namespaces 2025-03-18 10:20:33 -07:00
Tim Wojtulewicz
f1a7376e0a Return generic result for get operations that includes error messages 2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
4695060d75 Allow opening and closing backends to be async 2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
7ad6a05f5b Add infrastructure for asynchronous storage operations 2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
d07d27453a Add infrastructure for automated expiration of storage entries
This is used for backends that don't support expiration natively.
2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
8dee733a7d Change args to Storage::put to be a record
The number of args being passed to the put() methods was getting
fairly long, with more on the horizon. Changing to a record simplifies
things a bit.
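Roughly, the call shape becomes (record and field names here are my
assumption, not the final API):

    event zeek_init()
        {
        local args = Storage::PutArgs($key="key1", $value="value1",
                                      $overwrite=T, $expire_time=30sec);
        # ... pass args to the put() call for an open backend.
        }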
2025-03-18 09:32:34 -07:00
Tim Wojtulewicz
69d940533d Pass key/value types for validation when opening backends 2025-03-18 09:32:34 -07:00