Commit graph

9 commits

Author SHA1 Message Date
Arne Welzel
01666df3d7 cluster/zeromq: Fix Cluster::subscribe() block if not initialized
If Cluster::init() hasn't been invoked yet, Cluster::subscribe() with the
ZeroMQ backend would block because the main_inproc socket didn't
yet have a connection from the child thread. Prevent this by connecting
the main and child socket pair at construction time.

This will queue the subscriptions and start processing them once the
child thread has started.
2025-09-25 18:58:35 +02:00
Arne Welzel
a318463c1c cluster/zeromq: Improve EINTR handling
When using ZeroMQ also within the Supervisor process, zmq::poll() and
recv() were observed to return EINTR, handle these.
2025-09-25 13:52:12 +02:00
Arne Welzel
c8307487d1 btest/cluster/zeromq: Add tests for overload behavior
The overload-drop.zeek and overload-no-drop.zeek tests have proxy,
worker-1 and worker-2 publish to the manager topic. For the drop
case, we verify that both, the senders, but also the manager drops
events. For the no-drop test, the HWMs are set such that all events
are buffered.

The overload-worker-proxy-topic*.zeek tests are similar, but instead
of publishing to the manager topic, proxy, worker-1 and worker-2 publish
to the proxy and worker topics to overload each other. This had
previously resulted in lockups and these tests verify that this doesn't
happen anymore.
2025-07-29 11:23:53 +02:00
Arne Welzel
89c0b0faf3 cluster/zeromq: Hook up and enable IPV6 by default
ZeroMQ's IPv6 support isn't enabled by default, resulting in
"No such device" errors when attempting to listen on an IPv6
address. This change adds a ipv6 option to the ZeroMQ module
and enables it by default. Further, adds a test configuring
everything to listen on IPv6 ::1 as well, and one test to provoke
the original error. This also regularizes some error messages.

The addr_to_uri() calls weren't actually needed, but they apparently do
not hurt and the result is easier on the eyes, so use them :-)
2025-06-24 17:12:45 +02:00
Arne Welzel
2963c49f27 cluster/zeromq: Fix node_topic() and nodeid_topic()
Due to prefix matching, worker-1's node_topic() also matched worker-10,
worker-11, etc. Suffix the node topic with a `.`. The original implementation
came from NATS, where subjects are separated by `.`.

Adapt nodeid_topic() for consistency.
2025-03-24 18:36:26 +01:00
Arne Welzel
3885871e7d cluster/zeromq: Fix unsubscription visibility
When two workers connect to zeek.cluster.worker, the central ZeroMQ
proxy would not propagate unsubscription information to other nodes
once they both left. Set ZMQ_XPUB_VERBOSER on the proxies XPUB socket
for visibility.
2025-03-24 18:36:16 +01:00
Arne Welzel
9c5c0f40e1 cluster/zeromq: Fix Unsubscribe() bug caused by \x00 prefix 2025-02-05 10:39:56 +01:00
Arne Welzel
21e33fdcd9 btest/cluster: Bump timeouts to 30 seconds
ZAM startup may take a long time, particularly in CI environments, so
bump it up from 10 to 30 seconds.
2024-12-13 18:28:43 +01:00
Arne Welzel
35c79ab2e3 cluster/backend/zeromq: Add ZeroMQ based cluster backend
This is a cluster backend implementation using a central XPUB/XSUB proxy
that by default runs on the manager node. Logging is implemented leveraging
PUSH/PULL sockets between logger and other nodes, rather than going
through XPUB/XSUB.

The test-all-policy-cluster baseline changed: Previously, Broker::peer()
would be called from setup-connections.zeek, causing the IO loop to be
alive. With the ZeroMQ backend, the IO loop is only alive when
Cluster::init() is called, but that doesn't happen anymore.
2024-12-10 20:33:02 +01:00