Commit graph

3004 commits

Author SHA1 Message Date
Arne Welzel
1d931b5a2f cluster/WebSocket: Include X-Application-Name in cluster.log
A bit ad-hoc formatting for the log, but that's mostly because cluster.log
only has message field and I don't think having a dedicated application_name
column is worth it. That could also be added by custom scripts if it's really
wanted for a given deployment.
2025-06-30 17:55:24 +02:00
Arne Welzel
26f5166d7a cluster/telemetry: Move topic_normalization redef to zeromq 2025-06-26 15:22:11 +02:00
Arne Welzel
22958f7cdf Merge remote-tracking branch 'origin/topic/awelzel/1474-cluster-telemetry'
* origin/topic/awelzel/1474-cluster-telemetry:
  btest/cluster/telemetry: Add smoke testing for telemetry
  cluster/WebSocket: Fetch X-Application-Name header as app label
  cluster/WebSocket: Pass X-Application-Name to dispatcher
  broker/WebSocketShim: Add calls to Telemetry hooks
  cluster/WebSocket: Configure telemetry for WebSocket backends
  broker: Hook up generic cluster telemetry
  cluster: Introduce telemetry component

One bug fix removing static from a variable that shouldn't be static.
2025-06-26 14:54:01 +02:00
bhaskarbhar
722381366b
Update init-bare.zeek 2025-06-25 22:51:43 +05:30
root
da89e7ee6e Renamed 2025-06-25 21:10:08 +05:30
Arne Welzel
4c34274a6c cluster: Introduce telemetry component 2025-06-25 16:59:49 +02:00
Christian Kreibich
0c64f6a7b9 Establish plugin infrastructure for ConnKey factories.
ConnKey factories are intermediaries that encapsulate the details of how to
instantiate ConnKeys, which codify the hash input for connection lookups.
2025-06-25 13:18:07 +02:00
Arne Welzel
4b472f2771 Merge remote-tracking branch 'origin/topic/awelzel/telemetry-endpoint-to-node-rename'
* origin/topic/awelzel/telemetry-endpoint-to-node-rename:
  telemetry: Rename endpoint label to node label
2025-06-25 09:33:55 +02:00
Arne Welzel
eea194ddd8 telemetry: Rename endpoint label to node label
Using a label named "endpoint" is not intuitive and requires explaining to
users that it's really just the Cluster::node value. Change the label to
"node", so that we don't need to do the explaining.

This probably breaks some existing users of the Prometheus metrics, but after
looking more at metrics recently, "endpoint" really is a thorn in my eye.
2025-06-25 09:33:01 +02:00
bhaskarbhar
04d6fa3cb7 Add get_tags_by_category BIF method 2025-06-24 13:47:49 -07:00
Arne Welzel
4f1fc296b6 DNS: Implement NAPTR RR support
My phone is sending NAPTR queries and we reported an unknown RR type 35
in weird.log for the response, so figured I'd just add it.
2025-06-24 17:43:27 +02:00
Arne Welzel
f5063bfcd4 Merge remote-tracking branch 'origin/topic/awelzel/4522-bdat-last-reply-fix'
* origin/topic/awelzel/4522-bdat-last-reply-fix:
  smtp: Fix last_reply column in smtp.log for BDAT LAST
2025-06-11 17:25:21 +02:00
Tim Wojtulewicz
ed51738668 Move netbios_ssn_session_timeout to a script-level constant 2025-06-10 11:58:20 -07:00
Arne Welzel
d650589ad4 smtp: Fix last_reply column in smtp.log for BDAT LAST
The response to BDAT LAST was never recognized, resulting in the
BDAT LAST commands not being logged in a timely fashion and receiving
the wrong status.

This likely doesn't handle complex pipeline scenarios, but it fixes
the wrong behavior for smtp_reply() not handling simple BDAT commands
responses.

Thanks @cccs-jsjm for the report!

Closes #4522
2025-06-06 10:40:49 +02:00
Christian Kreibich
1dcd13a019 Fix a typo. 2025-06-05 17:51:54 -07:00
Johanna Amann
58613f0313 Introduce new c$failed_analyzers field
This field is used internally to trace which analyzers already had a
violation. This is mostly used to prevent duplicate logging.

In the past, c$service_violation was used for a similar purpose -
however it has slightly different semantics. Where c$failed_analyzers
tracks analyzers that were removed due to a violation,
c$service_violation tracks violations - and doesn't care if an analyzer
was actually removed due to it.
2025-06-04 12:07:13 +01:00
Johanna Amann
42ba2fcca0 Settle on analyzer.log for the dpd.log replacement
This commit renames analyzer-failed.log to analyzer.log, and updates the
respective news entry.
2025-06-03 17:33:36 +01:00
Johanna Amann
130c89a0a7 dpd->analyzer.log change - rename files
To address review feedback in GH-4362: rename analyzer-failed-log.zeek
to loggig.zeek, analyzer-debug-log.zeek to debug-logging.zeek and
dpd-log.zeek to deprecated-dpd-log.zeek.

Includes respective test, NEWS, etc updates.
2025-06-03 16:32:52 +01:00
Johanna Amann
af77a7a83b Analyzer failure logging: tweaks and test fixes
The main part of this commit are changes in tests. A lot of the tests
that previously relied on analyzer.log or dpd.log now use the new
analyzer-failed.log.

I verified all the changes and, as far as I can tell, everything
behaves as it should. This includes the external test baselines.

This change also enables logging of file and packet analyzer to
analyzer_failed.log and fixes some small behavior issues.

The analyzer_failed event is no longer raised when the removal of an
analyzer is vetoed.

If an analyzer is no longer active when an analyzer violation is raised,
currently the analyzer_failed event is raised. This can, e.g., happen
when an analyzer error happens at the very end of the connection. This
makes the behavior more similar to what happened in the past, and also
intuitively seems to make sense.

A bug introduced in the failed service logging was fixed.
2025-06-03 15:56:42 +01:00
Johanna Amann
8c814fa88c Introduce analyzer-failed.log, as a replacement for dpd.log
Analyzer-failed.log is, essentially, the replacement for dpd.log. The
name should make more sense, as it does now log analyzer failures. For
protocol analyzers specifically, these are failures that lead to the
analyzer being disabled.
2025-06-03 15:17:26 +01:00
Johanna Amann
c55e21da71 Rename analyzer.log to analyzer.debug log; move to policy
The current analyzer.log is more useful for debugging than for
operational purposes. Hence this is disabled by default, moved to a
policy script, and the log is renamed to analyzer-debug.log.

Furthermore, logging of analyzer confirmations and disabling analyzers
are now enabled by default.
2025-06-03 15:17:26 +01:00
Johanna Amann
6183c5086b Move dpd.log to policy script
This is the first phase of moving from the current dpd log to a more
modern logfile, without some of the weirdnesses that the current dpd log
contains.

Tests will not pass in the current state; this is just splitting out
functionality.
2025-06-03 15:17:26 +01:00
Arne Welzel
0a34b39e7a Merge remote-tracking branch 'origin/topic/awelzel/4177-4178-custom-event-metadata-part-2'
* origin/topic/awelzel/4177-4178-custom-event-metadata-part-2:
  Event: Bail on add_missing_remote_network_timestamp without add_network_timestamp
  btest/plugin: Test custom metadata publish
  NEWS: Add note about generic event metadata
  cluster: Remove deprecated Event constructor
  cluster: Remove some explicit timestamp handling
  broker/Manager: Fetch and forward all metadata from events
  Event/init-bare: Add add_missing_remote_network_timestamp logic
  cluster/Backend/DoProcessEvent: Use generic metadata, not just timestamps
  cluster/Event: Support moving args and metadata from event
  cluster/serializer/broker: Support generic metadata
  cluster/Event: Generic metadata support
  Event: Use -1.0 for undefined/unset timestamps
  cluster: Use shorter obj_desc versions
  Desc: Add obj_desc() / obj_desc_short() overloads for IntrusivePtr
2025-06-02 17:33:22 +02:00
Arne Welzel
96f2d5d369 Event/init-bare: Add add_missing_remote_network_timestamp logic
Make defaulting to the local network timestamp for remote events opt-in.
2025-06-02 17:31:36 +02:00
Arne Welzel
7eb849ddf4 intel: Add indicator_inserted and indicator_removed hooks
This change adds two new hooks to the Intel framework that can be used
to intercept added and removed indicators and their type.

These hooks are fairly low-level. One immediate use-case is to count the
number of indicators loaded per Intel::Type and enable and disable the
corresponding event groups of the intel/seen scripts.

I attempted to gauge the overhead and while it's definitely there, loading
a file with ~500k DOMAIN entries takes somewhere around ~0.5 seconds hooks
when populated via the min_data_store store mechanism. While that
doesn't sound great, it actually takes the manager on my system 2.5
seconds to serialize and Cluster::publish() the min_data_store alone
and its doing that serially for every active worker. Mostly to say that
the bigger overhead in that area on the manager doing redundant work
per worker.

Co-authored-by: Mohan Dhawan <mohan@corelight.com>
2025-06-02 09:50:48 +02:00
Arne Welzel
93813a5079 logging/ascii/json: Make TS_MILLIS signed, add TS_MILLIS_UNSIGNED
It seems TS_MILLIS is specifically for Elasticsearch and starting with
Elasticsearch 8.2 epoch_millis does (again?) support negative epoch_millis,
so make Zeek produce that by default.

If this breaks a given deployment, they can switch Zeek back to TS_MILLIS_UNSIGNED.

https://discuss.elastic.co/t/migration-from-es-6-8-to-7-17-issues-with-negative-date-epoch-timestamp/335259
https://github.com/elastic/elasticsearch/pull/80208

Thanks for @timo-mue for reporting!

Closes #4494
2025-05-30 17:23:29 +02:00
Arne Welzel
544d571089 cluster/websocket: Deprecate $listen_host, introduce $listen_addr
This only changes the script-layer API, but keeps the std::string host
in the C++ layer's ServerOptions. Mostly because the ixwebsocket library
takes host as std::string. Also, maybe at  some point we'd want to
support something scheme-based like unix:///var/run/zeek.sock and placing
that in a string could not be totally wrong.

Add tests for IPV6, too.
2025-05-30 11:02:41 +02:00
Evan Typanski
b4429a995a spicy-redis: Separate error replies from success 2025-05-27 09:31:25 -04:00
Evan Typanski
d5b121db14 spicy-redis: Cleanup scripts and tests
- Recomputes checksums for pcaps to keep clean
- Removes some tests that had big pcaps or weren't necessary
- Cleans up scripting names and minor points
- Comments out Spicy code that causes a build failure now with a TODO to
  uncomment it
2025-05-27 09:29:13 -04:00
Evan Typanski
11777bd6d5 spciy-redis: Bring Redis analyzer into Zeek proper 2025-05-27 09:28:12 -04:00
Evan Typanski
7f28ec8bc5 spicy-redis: Add dpd signature and clean pcaps 2025-05-27 09:28:12 -04:00
Evan Typanski
f0e9f46c7c spicy-redis: Add some commands and touch up parsing 2025-05-27 09:28:12 -04:00
Evan Typanski
22bda56af3 spicy-redis: Add some script logic for logging
Also "rebrands" from RESP to Redis.
2025-05-27 09:28:12 -04:00
Evan Typanski
757cbbf902 spicy-redis: Separate client/server
This makes the parser more official and splits the client/server out
from each other. Apparently they're different enough to be separate.
2025-05-27 09:28:12 -04:00
Evan Typanski
f0f2969a66 spicy-redis: Touchup logging and Spicy issues 2025-05-27 09:28:12 -04:00
Evan Typanski
97d26a689d spicy-redis: Add synchronization and pipeline support
Also adds some command support
2025-05-27 09:28:12 -04:00
Evan Typanski
4210e62e57 spicy-redis: Begin Spicy Redis analyzer 2025-05-27 09:28:12 -04:00
Arne Welzel
53b0f0ad64 Event: Deprecate default network timestamp metadata
This deprecates the Event constructor and the ``ts`` parameter of Enqueue()
Instead, versions are introduced that take a detail::MetadataVectorPtr which
can hold the network timestamp metadata and is meant to be allocated by the
caller instead of automatically during Enqueue() or within the Event
constructor.

This also introduces a BifConst ``EventMetadata::add_network_timestamp`` to
opt-in adding network timestamps to events globally. It's disabled by
default as there are not a lot of known use cases that need this.
2025-05-23 19:32:23 +02:00
Arne Welzel
cc7dc60c1e EventRegistry/zeek.bif/init-bare: Add event metadata infrastructure
Introduce a new EventMetadata module and members on EventMgr to register
event metadata types.
2025-05-23 19:31:58 +02:00
Christian Kreibich
fdecfba6b4 Merge branch 'smoot-improve-from_json' of github.com:/stevesmoot/zeek
* 'smoot-improve-from_json' of github.com:/stevesmoot/zeek:
  update baseline for zam
  Update src/zeek.bif
  Change from_json to return an error rather than print it.
2025-05-19 11:06:29 -07:00
Jan Grashoefer
84cc4b890d Add STLS command to POP3 DPD signature 2025-05-14 16:37:25 +02:00
Arne Welzel
a61aff010f cluster/websocket: Propagate code and reason to websocket_client_lost()
This allows to get visibility into the reason why ixwebsocket or the
client decided to disconnect.

Closed #4440
2025-05-13 18:26:03 +02:00
Arne Welzel
aaddeb19ad cluster/websocket: Support configurable ping interval
Primarily for testing purposes and maybe the hard-coded 5 seconds is too
aggressive for some deployments, so makes sense for it to be
configurable.
2025-05-13 18:26:03 +02:00
Christian Kreibich
738ce1c235 Bugfix: accurately track Broker buffer overflows w/ multiple peerings
When a node restarts or a peering between two nodes starts over for other
reasons, the internal tracking in the Broker manager resets its state (since
it's per-peering), and thus the message overflow counter. The script layer was
unaware of this, and threw errors when trying to reset the corresponding counter
metric down to zero at sync time.

We now track past buffer overflows via a separate epoch table, using Broker peer
ID comparisons to identify new peerings, and set the counter to the sum of past
and current overflows.

I considered just making this a gauge, but it seems more helpful to be able to
look at a counter to see whether any messages have ever been dropped over the
lifetime of the node process.

As an aside, this now also avoids repeatedly creating the labels vector,
re-using the same one for each metric.

Thanks to @pbcullen for identifying this one!
2025-05-07 17:27:38 -07:00
Tim Wojtulewicz
2cf8497bf7 Merge remote-tracking branch 'origin/topic/timw/update-ct-ca-lists'
* origin/topic/timw/update-ct-ca-lists:
  External tests: add removed logs to CT list to prevent baseline changes
  Update Mozilla CA list and CT list to NSS 3.110
2025-04-29 08:53:04 -07:00
Kshitiz Bartariya
40935c31b1 Ignore case when matching prefix in http analyzer 2025-04-25 10:33:11 -07:00
Christian Kreibich
68fadd0464 Lower listen/connect retry intervals in Broker and the cluster framework to 1sec
The former defaults (30sec, 1min) can slow down cluster startup and recovery
considerably, and other systems have more aggressive intervals still.
2025-04-25 10:22:35 -07:00
Christian Kreibich
841a40ff88 Switch Broker's default backpressure policy to drop_oldest, bump buffer sizes
At every site where we've dug into backpressure disconnect findings, it has been
the case that the default values were too small. 8192, so 4x the old default,
suffices at every site to drown out premature disconnects.

With metrics now available for the send buffers regardless of backpressure
overflow policy, this also switches the default from "disconnect" to
"drop_oldest" (for both peers and websockets), meaning that peerings remain
untouched but the oldest queued message simply gets dropped when a new message
is enqueued. With this policy, the number of backpressure overflows is then
simply the count of discarded messages, something that users can tune to see
drop to zero in everyday use.  Another benefit is that marginal overflows cause
less message loss than when an entire buffer's worth (plus potentially more
in-flight messages) gets thrown out with a disconnect.
2025-04-25 10:22:35 -07:00
Christian Kreibich
5008f586ea Deprecate Broker::congestion_queue_size and stop using it internally
Since a reorg in the Broker library (commit b04195183) that revamped flow
control and that we pulled in with Zeek 5.0, this setting hasn't done
anything. Broker's endpoint::make_subscriber() and
endpoint::make_status_subscriber() take a queue size argument (with a default
value) that simply gets dropped in the eventual subscriber::make() call. See:

b041951835 (diff-5c0d2baa7981caeb6a4080708ddca6ad929746d10c73d66598e46d7c2c03c8deL34-R178)
2025-04-25 10:22:35 -07:00
Christian Kreibich
c1a5f70df8 Merge branch 'topic/christian/broker-backpressure-metrics'
* topic/christian/broker-backpressure-metrics:
  Add basic btest to verify that Broker peering telemetry is available.
  Add cluster framework telemetry for Broker's send-buffer use
  Add peer buffer update tracking to the Broker manager's event_observer
  Rename the Broker manager's LoggerAdapter
  Avoid race in the cluster/broker/publish-any btest
2025-04-25 10:04:09 -07:00