Commit graph

48 commits

Author SHA1 Message Date
Arne Welzel
1a87ebab72 cluster: Add on_subscribe() and on_unsubscribe() hooks
Closes #4176
2025-08-01 14:06:19 +02:00
Arne Welzel
c8307487d1 btest/cluster/zeromq: Add tests for overload behavior
The overload-drop.zeek and overload-no-drop.zeek tests have proxy,
worker-1 and worker-2 publish to the manager topic. For the drop
case, we verify that both the senders and the manager drop
events. For the no-drop test, the HWMs are set such that all events
are buffered.

The overload-worker-proxy-topic*.zeek tests are similar, but instead
of publishing to the manager topic, proxy, worker-1 and worker-2 publish
to the proxy and worker topics to overload each other. This had
previously resulted in lockups and these tests verify that this doesn't
happen anymore.
2025-07-29 11:23:53 +02:00
Arne Welzel
1d931b5a2f cluster/WebSocket: Include X-Application-Name in cluster.log
A bit ad-hoc formatting for the log, but that's mostly because cluster.log
only has a message field and I don't think having a dedicated application_name
column is worth it. That could also be added by custom scripts if it's really
wanted for a given deployment.
2025-06-30 17:55:24 +02:00
Arne Welzel
5c6a6d9427 cluster/websocket: Fix and test for invalid X-Application-Name 2025-06-30 13:22:31 +02:00
Arne Welzel
0e1431eef4 btest/cluster/telemetry: Add smoke testing for telemetry 2025-06-25 17:13:01 +02:00
Arne Welzel
89c0b0faf3 cluster/zeromq: Hook up and enable IPV6 by default
ZeroMQ's IPv6 support isn't enabled by default, resulting in
"No such device" errors when attempting to listen on an IPv6
address. This change adds an ipv6 option to the ZeroMQ module
and enables it by default. Further, adds a test configuring
everything to listen on IPv6 ::1 as well, and one test to provoke
the original error. This also regularizes some error messages.

The addr_to_uri() calls weren't actually needed, but they apparently do
not hurt and the result is easier on the eyes, so use them :-)
2025-06-24 17:12:45 +02:00
Arne Welzel
77f1337b4c btest/cluster/websocket: Add cert-less test 2025-06-16 13:47:33 +02:00
Arne Welzel
544d571089 cluster/websocket: Deprecate $listen_host, introduce $listen_addr
This only changes the script-layer API, but keeps the std::string host
in the C++ layer's ServerOptions. Mostly because the ixwebsocket library
takes host as std::string. Also, maybe at some point we'd want to
support something scheme-based like unix:///var/run/zeek.sock and placing
that in a string could not be totally wrong.

Add tests for IPv6, too.
2025-05-30 11:02:41 +02:00
Arne Welzel
277c3f5245 btest: Add test for Cluster::hello zero-timestamp 2025-05-26 16:08:27 +02:00
Arne Welzel
a15df5fc11 btest/cluster: Use generic cluster-layout.zeek 2025-05-20 20:30:01 +02:00
Arne Welzel
6d2bd93f1f btest/cluster/websocket: Update tests for new event signature 2025-05-13 18:26:03 +02:00
Arne Welzel
a61aff010f cluster/websocket: Propagate code and reason to websocket_client_lost()
This provides visibility into the reason why ixwebsocket or the
client decided to disconnect.

Closed #4440
2025-05-13 18:26:03 +02:00
Arne Welzel
3ec3205074 btest/cluster/generic/publish-any: Apply Christian's fix from broker/publish-any 2025-05-07 17:18:01 +02:00
Arne Welzel
82731992d9 wstest/terminate-while-queueing: Patch close_socket()
I believe there's a bug/usage issue in the websockets library
where during send(), EOF is detected and stored, but the receiving
thread is then discarding the last received frame. Avoid the bug
by replacing the close_socket() implementation of the websockets
library just for that test and leave detecting the EOF condition
to the receiving thread.
2025-05-07 16:33:54 +02:00
Arne Welzel
ca02316671 cluster/websocket: Stop and wait for reply thread during Terminate()
The terminate-while-queueing test added for #4428 failed spuriously
indicating that sometimes WebSocket clients receive code 1000 instead of 1001.
This happens if the ixwebsocket server is shut down before the reply thread had a
chance to process queued close messages.

Fix by signaling and waiting for the dispatcher's reply thread to terminate
before returning from Terminate().
2025-05-07 12:45:01 +02:00
Arne Welzel
3be7a9ce91 Merge remote-tracking branch 'origin/topic/awelzel/double-commented-btest-lines'
* origin/topic/awelzel/double-commented-btest-lines:
  testing/btest: Fix double commented @TEST- lines
2025-05-06 14:21:03 +02:00
Arne Welzel
bb06af601f Websocket: Close onloop during Terminate()
Terminate() is called when Zeek shuts down. If WebSocket client threads
were blocked in QueueForProcessing() due to reaching queue limits, these
previously would not exit QueueForProcessing() and instead block
indefinitely, resulting in the ixwebsocket library blocking and its
garbage collection thread running at 100%. Not great.

Closing the onloop instance will unblock the WebSocket client threads
for a timely shutdown.

Closes #4420
2025-05-06 14:19:08 +02:00
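
The shutdown behavior described in bb06af601f can be illustrated with a small sketch. This is not Zeek's C++ code: ClosableQueue, put() and close() are hypothetical names standing in for the bounded queue behind QueueForProcessing() and for closing the onloop instance. The point is only that closing must wake producers that are blocked on a full queue.

```python
import threading
from collections import deque


class ClosableQueue:
    """Bounded queue whose close() wakes blocked producers (hypothetical model)."""

    def __init__(self, maxsize: int) -> None:
        self._items = deque()
        self._maxsize = maxsize
        self._closed = False
        self._cond = threading.Condition()

    def put(self, item) -> bool:
        """Block while the queue is full; return False once the queue is closed."""
        with self._cond:
            while len(self._items) >= self._maxsize and not self._closed:
                self._cond.wait()
            if self._closed:
                return False
            self._items.append(item)
            return True

    def close(self) -> None:
        """Mark the queue closed and wake every waiting producer."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()


def demo() -> bool:
    q = ClosableQueue(maxsize=1)
    q.put("first")  # fills the queue
    result = []
    # The producer either blocks in put() until close() runs, or sees the
    # queue already closed; in both cases put() returns False.
    producer = threading.Thread(target=lambda: result.append(q.put("second")))
    producer.start()
    q.close()
    producer.join(timeout=5)
    return result[0]
```

Without the notify_all() in close(), a producer waiting on a full queue would never return, which mirrors the indefinite blocking the commit message describes.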
Arne Welzel
0e327a0c12 testing/btest: Fix double commented @TEST- lines
sed -i 's/^# # @/# @/g'
2025-05-06 14:06:29 +02:00
Arne Welzel
7092db6318 broker/Data/data_to_val: Fail on vectors/lists with holes
Instead of simply removing holes from vectors or lists when converting
from Val to Broker format, error out as the receiver has no chance to
reconstruct where the hole might have been.

We could encode holes with broker::none, but this will put unnecessary
burden on language bindings and users due to the potential optionality.
Think a std::vector<uint64_t> that technically needs to be a
std::vector<std::optional<uint64_t>> to represent optional elements
properly.

Closes #3045
2025-04-28 18:23:37 +02:00
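
A minimal sketch of the trade-off in 7092db6318, assuming a hole is modeled as None in a Python list. The function names and the HoleError type are hypothetical and are not part of Broker or Zeek.

```python
from typing import Any, List, Optional


class HoleError(ValueError):
    """Raised when a vector contains an unset element (a hole)."""


def strip_holes(values: List[Optional[Any]]) -> List[Any]:
    # Previous behavior as described: holes are dropped silently, so the
    # receiver sees shifted indices and cannot tell where a hole was.
    return [v for v in values if v is not None]


def convert_strict(values: List[Optional[Any]]) -> List[Any]:
    # New behavior as described: refuse to convert a vector with holes.
    for index, value in enumerate(values):
        if value is None:
            raise HoleError(f"cannot convert vector: hole at index {index}")
    return list(values)


print(strip_holes([10, None, 30]))  # [10, 30]: 30 moved from index 2 to 1
try:
    convert_strict([10, None, 30])
except HoleError as err:
    print(err)  # cannot convert vector: hole at index 1
```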
Christian Kreibich
c1a5f70df8 Merge branch 'topic/christian/broker-backpressure-metrics'
* topic/christian/broker-backpressure-metrics:
  Add basic btest to verify that Broker peering telemetry is available.
  Add cluster framework telemetry for Broker's send-buffer use
  Add peer buffer update tracking to the Broker manager's event_observer
  Rename the Broker manager's LoggerAdapter
  Avoid race in the cluster/broker/publish-any btest
2025-04-25 10:04:09 -07:00
Arne Welzel
43a1bab960 btest/cluster/websocket: Move no-subscriptions test
...and also add one for broker.
2025-04-25 10:01:23 +00:00
Christian Kreibich
89780514fa Avoid race in the cluster/broker/publish-any btest
On very busy machines the hardwired scheduling of the ping batches could move
around among the arriving pongs, causing baseline deviations. We now wait for
each batch to complete before triggering the next one.
2025-04-24 13:09:10 -07:00
Arne Welzel
2a6beae50b btest/cluster: Testing cleanup 2025-04-24 09:35:53 +02:00
Arne Welzel
23f0370e91 cluster/websocket: Short-circuit clients without subscriptions 2025-04-24 08:14:56 +02:00
Arne Welzel
f2e60fdaff btest/cluster: Add broker logging test for sanity
Not very related to the PR, but created to help provoke an issue
with the broker changes.
2025-04-23 14:27:43 +02:00
Arne Welzel
011029addc cluster/websocket: Make websocket dispatcher queue size configurable
Limit the number of WebSocket events queued from external clients to
dispatcher instances to produce back pressure to the clients if
Zeek's IO loop is overloaded.
2025-04-23 14:27:43 +02:00
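
The back pressure idea in 011029addc can be sketched with Python's standard bounded queue. This only illustrates the principle; try_enqueue() and the queue size of 2 are assumptions for the example and do not reflect Zeek's actual option names or defaults.

```python
import queue


def try_enqueue(q: "queue.Queue", event, timeout: float = 0.05) -> bool:
    """Try to queue an event; report False if the queue stayed full."""
    try:
        q.put(event, timeout=timeout)
        return True
    except queue.Full:
        return False


# A bounded queue: once it is full, producers have to wait for the consumer.
events = queue.Queue(maxsize=2)
print([try_enqueue(events, i) for i in range(3)])  # [True, True, False]

# After the consumer (standing in for the IO loop) takes one event,
# there is room for the producer again.
events.get()
print(try_enqueue(events, 3))  # True
```

In the sketch the producer gives up after a timeout so the example terminates; a client thread that simply blocks in put() would be slowed to the consumer's pace instead.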
Arne Welzel
6bd624d9b2 cluster/zeromq: Attempt publish during termination
Explicitly notify the internal thread about the shutdown via the
inproc socket pair. This ensures that the internal thread processes
all previous messages on the inproc socket before terminating.

This fixes the scenario where a backend is created, a few messages published
and then immediately terminated as can be done with WebSocket clients.
Previously, some of the messages published might have still been in the
inproc socket's queue and were simply discarded.

Adds the same test for Broker and ZeroMQ backends.
2025-04-23 14:27:43 +02:00
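
The ordering argument in 6bd624d9b2 (the shutdown notification travels through the same channel as the messages, so everything queued before it gets processed first) can be sketched with a FIFO queue and a sentinel. The names below are hypothetical; the real implementation uses a ZeroMQ inproc socket pair.

```python
import queue
import threading

_SHUTDOWN = object()  # sentinel standing in for the shutdown notification


def internal_thread(inbox: "queue.Queue", delivered: list) -> None:
    # Process messages in FIFO order until the shutdown sentinel arrives.
    while True:
        msg = inbox.get()
        if msg is _SHUTDOWN:
            break
        delivered.append(msg)


def publish_then_terminate(messages: list) -> list:
    inbox = queue.Queue()
    delivered = []
    thread = threading.Thread(target=internal_thread, args=(inbox, delivered))
    thread.start()
    for msg in messages:
        inbox.put(msg)
    # The sentinel is queued behind all published messages, so none of them
    # can be skipped before the thread exits.
    inbox.put(_SHUTDOWN)
    thread.join()
    return delivered


print(publish_then_terminate(["a", "b", "c"]))  # ['a', 'b', 'c']
```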
Arne Welzel
0c8f52664d btest/cluster/websocket: Add tests using broker
Add tests to verify Cluster::listen_websocket() with the Broker backend
is functional.
2025-04-23 14:27:43 +02:00
Arne Welzel
3319615c65 btest/cluster/websocket: Move ZeroMQ test and use wstest.py
Adapt the test to be the same as Broker, to have "expected" behavior.
2025-04-23 14:27:43 +02:00
Arne Welzel
85b8c8866b testing/btest/*zeek: Comment all @TEST lines 2025-04-17 16:30:23 +02:00
Arne Welzel
50b26fcea8 btest/cluster/websocket: ZeroMQ backend test
This test ensures that WebSocket clients connected to the same node see
each other's messages.
2025-03-24 18:36:52 +01:00
Arne Welzel
2963c49f27 cluster/zeromq: Fix node_topic() and nodeid_topic()
Due to prefix matching, worker-1's node_topic() also matched worker-10,
worker-11, etc. Suffix the node topic with a `.`. The original implementation
came from NATS, where subjects are separated by `.`.

Adapt nodeid_topic() for consistency.
2025-03-24 18:36:26 +01:00
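
The prefix-matching problem fixed in 2963c49f27 can be reproduced in a few lines. The topic prefix "zeek.cluster.node." and the helper names are assumptions for illustration; only the trailing `.` comes from the commit message.

```python
def matches(subscription: str, topic: str) -> bool:
    """Prefix matching: a subscription matches every topic that starts with it."""
    return topic.startswith(subscription)


def node_topic_old(name: str) -> str:
    # Hypothetical pre-fix form: nothing terminates the node name.
    return f"zeek.cluster.node.{name}"


def node_topic_new(name: str) -> str:
    # Post-fix form: a trailing "." terminates the node name.
    return f"zeek.cluster.node.{name}."


# Before: worker-1's subscription also catches messages for worker-10.
print(matches(node_topic_old("worker-1"), node_topic_old("worker-10")))  # True
# After: the "." terminator makes the two topics distinct prefixes.
print(matches(node_topic_new("worker-1"), node_topic_new("worker-10")))  # False
print(matches(node_topic_new("worker-1"), node_topic_new("worker-1")))   # True
```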
Arne Welzel
888af244b2 btest/cluster/websocket: Harden multi-client tests
These tests were very sensitive to the speed at which ZeroMQ distributes
subscriptions in the cluster and proved unreliable when testing with
zeek/btest#113.

The main fix here is to have individual WebSocket clients subscribe to unique
topics, e.g /test/client-0 and /test/client-1, instead of just a shared topic.

This ensures the WebSocket handshake completes only when they observed their
own subscriptions and not prematurely when observing the shared topic.

This seems mainly relevant for tests: In the real world one shouldn't
rely on subscription visibility - you miss messages if you're too late
to the party.
2025-03-24 18:36:26 +01:00
Arne Welzel
3885871e7d cluster/zeromq: Fix unsubscription visibility
When two workers connect to zeek.cluster.worker, the central ZeroMQ
proxy would not propagate unsubscription information to other nodes
once they both left. Set ZMQ_XPUB_VERBOSER on the proxy's XPUB socket
for visibility.
2025-03-24 18:36:16 +01:00
Arne Welzel
6032741868 cluster/websocket: Implement WebSocket server 2025-03-10 17:07:30 +01:00
Arne Welzel
9c5c0f40e1 cluster/zeromq: Fix Unsubscribe() bug caused by \x00 prefix 2025-02-05 10:39:56 +01:00
Arne Welzel
6d1259423e cluster/serializer/broker: Fix handler lookup
Handler overloads operator bool, so we need to explicitly test for nullptr
rather than not having any handlers defined.
2025-02-05 10:39:56 +01:00
Arne Welzel
21e33fdcd9 btest/cluster: Bump timeouts to 30 seconds
ZAM startup may take a long time, particularly in CI environments, so
bump it up from 10 to 30 seconds.
2024-12-13 18:28:43 +01:00
Arne Welzel
fdf783df65 cluster/Backend: Handle unspecified table/set
Same as what we do in Broker. Use the expected type if publishing
a table() or set() parameter.

This fixes issues when switching sumstats to Cluster::publish()
2024-12-12 17:54:42 +01:00
Arne Welzel
d9a74cf32d cluster: Fix Cluster::publish() of Broker::Data
The broker serializer leverages the existing data_to_val() function.
During unserialization, if the destination type is any, the logic
simply wraps the broker::data value into a Broker::Data record.
Therefore, events with any parameters are currently exposed to
the Broker::Data type.

There is a bigger issue in that re-publishing such Broker::Data
instances would encode them as a normal record. Explicitly prevent
this by serializing the contained data value directly instead, similar
to what Broker already did when publishing a record.
2024-12-12 17:54:37 +01:00
Arne Welzel
0ad3210177 Broker::publish: Warn on using Broker::publish() when inactive
This is mostly for transitioning base scripts to Cluster::publish() and
to avoid silent surprises about why certain things don't work when using ZeroMQ.
2024-12-11 17:20:42 +01:00
Arne Welzel
d816bfb249 btest/generic: Add publish_hrw(), publish_rr() and logging tests
They currently use zeromq, but technically they should be valid for
any other backend, too, even broker.
2024-12-10 20:33:02 +01:00
Arne Welzel
35c79ab2e3 cluster/backend/zeromq: Add ZeroMQ based cluster backend
This is a cluster backend implementation using a central XPUB/XSUB proxy
that by default runs on the manager node. Logging is implemented leveraging
PUSH/PULL sockets between logger and other nodes, rather than going
through XPUB/XSUB.

The test-all-policy-cluster baseline changed: Previously, Broker::peer()
would be called from setup-connections.zeek, causing the IO loop to be
alive. With the ZeroMQ backend, the IO loop is only alive when
Cluster::init() is called, but that doesn't happen anymore.
2024-12-10 20:33:02 +01:00
Arne Welzel
210b54799e cluster: Move publish_hrw() and publish_rr() to cluster.bif
From this point on, Cluster::publish_hrw() and Cluster::publish_rr()
go through cluster/Backend.cc code.
2024-12-10 20:33:02 +01:00
Arne Welzel
fdde1e9841 cluster/serializer: Add binary-serialization-format
This is a serializer for log records that is using SerialTypes
for serializing and un-serializing. Essentially, this is
similar to what broker does except for the envelope.
2024-12-04 12:40:35 +01:00
Arne Welzel
9ec872d161 cluster/serializer: Add Broker based event serializers
This adds the first event serializers that use
broker functionality. Binary and JSON formats.
2024-11-26 12:58:23 +01:00
Arne Welzel
ef04a199c8 cluster: Add Cluster scoped bifs
... and a broker based test using Cluster::publish() and
Cluster::subscribe().
2024-11-26 12:58:23 +01:00
Arne Welzel
de9d39cd01 btest: Add cluster dir, minimal test for enum value 2024-11-22 10:43:55 +01:00