38 commits

Author SHA1 Message Date
Arne Welzel
01666df3d7 cluster/zeromq: Fix Cluster::subscribe() block if not initialized
If Cluster::init() hasn't been invoked yet, Cluster::subscribe() with the
ZeroMQ backend would block because the main_inproc socket didn't
yet have a connection from the child thread. Prevent this by connecting
the main and child socket pair at construction time.

This will queue the subscriptions and start processing them once the
child thread has started.
2025-09-25 18:58:35 +02:00
Arne Welzel
a318463c1c cluster/zeromq: Improve EINTR handling
When ZeroMQ is also used within the Supervisor process, zmq::poll() and
recv() were observed to return EINTR; handle these.
2025-09-25 13:52:12 +02:00
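The EINTR handling in a318463c1c follows a common pattern: a blocking call interrupted by a signal is simply retried. The sketch below shows that pattern for a C-style call that returns -1 and sets errno. retry_on_eintr() is a hypothetical name; cppzmq code, which reports failures by throwing zmq::error_t, would catch the exception and check its error number instead. At shutdown (540d9da5ef) the backend reads ::signal_val to bail out rather than retry.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical helper, not Zeek code: re-issue a call that reports failure as
// -1 with errno set, for as long as the failure is EINTR (the call was
// interrupted by a signal before it could complete). Success and any other
// error are returned to the caller unchanged.
inline int retry_on_eintr(const std::function<int()>& op) {
    int rc;
    do {
        rc = op();
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```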
Tim Wojtulewicz
b592b6c998 Use .contains() instead of .find() or .count() 2025-09-02 16:42:52 +00:00
Arne Welzel
d2bb86f8b4 cluster/zeromq: Metric for msg errors 2025-07-29 11:23:53 +02:00
Arne Welzel
073de9f5fd cluster/zeromq: Drop events when overloaded
When either the XPUB socket's high-water mark (HWM) is reached or the
onloop queue is full, drop the events. Users can set xpub_sndhwm and
onloop_queue_hwm to 0 to avoid these drops at the risk of unbounded
memory growth.
2025-07-29 11:23:53 +02:00
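The drop policy from 073de9f5fd can be sketched as a bounded queue: once hwm items are pending, new items are dropped and counted, and an hwm of 0 disables the limit. This mirrors ZeroMQ, where a high-water mark of 0 means "no limit". DroppingQueue is a hypothetical class, not the backend's actual onloop queue.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical queue illustrating the drop policy: once `hwm` items are
// pending, new items are dropped and counted. An hwm of 0 disables the
// limit, trading drops for potentially unbounded memory growth.
class DroppingQueue {
public:
    explicit DroppingQueue(std::size_t hwm) : hwm_(hwm) {}

    // Returns true if the item was queued, false if it was dropped.
    bool Push(std::string item) {
        if (hwm_ != 0 && items_.size() >= hwm_) {
            ++dropped_;
            return false;
        }
        items_.push_back(std::move(item));
        return true;
    }

    std::size_t Size() const { return items_.size(); }
    std::size_t Dropped() const { return dropped_; }

private:
    std::size_t hwm_;
    std::size_t dropped_ = 0;
    std::deque<std::string> items_;
};
```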
Arne Welzel
5de9296c77 cluster/zeromq: Comments and move lookups to InitPostScript() 2025-07-29 11:23:53 +02:00
Arne Welzel
85d5dda028 cluster/zeromq: Rework lambdas to member functions 2025-07-29 11:23:53 +02:00
Arne Welzel
5dc4586b70 cluster/zeromq: Support local XPUB/XSUB hwm and buf configurability 2025-07-29 11:23:53 +02:00
Tim Wojtulewicz
1f87382302 Fix some missing #includes resulting from removal of ghc::filesystem 2025-07-14 11:23:54 -07:00
Arne Welzel
1afd497c0c cluster/zeromq: Short-circuit DoPublishLogWrite() when not initialized
After moving the log_push initialization from the constructor to the
DoInit() method, it's now possible that DoPublishLogWrites() is invoked
even if DoInit() was never called. Handle this by short-circuiting. Strictly
speaking this is an error, but it can happen during tests if scripts are
loaded somewhat arbitrarily.
2025-06-24 17:12:45 +02:00
Arne Welzel
89c0b0faf3 cluster/zeromq: Hook up and enable IPV6 by default
ZeroMQ's IPv6 support isn't enabled by default, resulting in
"No such device" errors when attempting to listen on an IPv6
address. This change adds an ipv6 option to the ZeroMQ module
and enables it by default. It further adds a test configuring
everything to listen on IPv6 ::1 as well, and one test to provoke
the original error. This also regularizes some error messages.

The addr_to_uri() calls weren't actually needed, but they apparently do
not hurt and the result is easier on the eyes, so use them :-)
2025-06-24 17:12:45 +02:00
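Two details behind 89c0b0faf3: ZeroMQ sockets only use IPv6 when the ZMQ_IPV6 socket option is enabled (it is off by default), and an IPv6 literal typically has to be wrapped in square brackets before a port is appended to the endpoint. make_tcp_endpoint() is a hypothetical helper in the spirit of the addr_to_uri() calls mentioned in the commit; the port number is arbitrary.

```cpp
#include <string>

// Hypothetical helper: IPv6 literals contain colons, so they are wrapped in
// square brackets before the port is appended, giving tcp://[::1]:5556
// rather than the ambiguous tcp://::1:5556. IPv4 addresses pass through.
inline std::string make_tcp_endpoint(const std::string& addr, int port) {
    bool is_v6 = addr.find(':') != std::string::npos;
    std::string host = is_v6 ? "[" + addr + "]" : addr;
    return "tcp://" + host + ":" + std::to_string(port);
}
```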
Arne Welzel
a20a2fe6e0 cluster/zeromq: Move log_push creation to DoInit()
The log_push socket should be affected by the IPv6 option, so its
creation needs to be delayed until DoInit().
2025-06-20 11:17:49 +02:00
Tim Wojtulewicz
850b20e12b Move Deferred class from ZeroMQ to util 2025-06-05 10:21:50 -07:00
Tim Wojtulewicz
460fe24a9a Fix clang-tidy cppcoreguidelines-macro-usage findings (macro functions) 2025-06-04 09:24:05 -07:00
Tim Wojtulewicz
f4c47d0357 Fix clang-tidy performance-enum-size warnings 2025-05-30 08:12:29 -07:00
Arne Welzel
643b926625 cluster/zeromq: Implement DoReadyToPublishCallback()
The ZeroMQ heuristic for "ready to publish" is to create a unique,
ephemeral subscription using the XSUB socket and observe it arrive on the
XPUB socket. At that point, visibility into other nodes' subscriptions
is provided.
2025-04-25 09:57:06 +00:00
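A minimal sketch of the ready-to-publish heuristic from 643b926625, assuming the standard XPUB convention that subscription messages carry a leading byte of 1 (subscribe) or 0 (unsubscribe) followed by the topic. ReadyTracker and the topic string are hypothetical. The guard against firing twice matters because ZMQ_XPUB_VERBOSE, enabled in e8f87019c6, passes repeated subscriptions through as well.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical tracker: watches subscription messages seen on the XPUB side
// and fires the callback once, when our own unique topic shows up as a
// subscribe (leading byte 1). Unsubscribes (leading byte 0), other topics,
// and repeated notifications are ignored.
class ReadyTracker {
public:
    ReadyTracker(std::string topic, std::function<void()> on_ready)
        : topic_(std::move(topic)), on_ready_(std::move(on_ready)) {}

    void OnXpubMessage(const std::string& msg) {
        if (fired_ || msg.empty() || msg[0] != '\x01')
            return;
        // Compare everything after the leading byte with our topic.
        if (msg.compare(1, std::string::npos, topic_) == 0) {
            fired_ = true;
            on_ready_();
        }
    }

    bool Fired() const { return fired_; }

private:
    std::string topic_;
    std::function<void()> on_ready_;
    bool fired_ = false;
};
```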
Arne Welzel
63d31d7d9f zeromq: Call super class DoTerminate() after stopping thread
The internal ZeroMQ thread would call QueueForProcessing(), thereby
accessing the onloop member. As ThreadedBackend::DoTerminate() unsets it,
this was a) reported as a data race by TSAN and b) potentially caused
missed events that were still to be queued.
2025-04-24 09:35:20 +02:00
Arne Welzel
6bd624d9b2 cluster/zeromq: Attempt publish during termination
Explicitly notify the internal thread about the shutdown via the
inproc socket pair. This ensures that the internal thread processes
all previous messages on the inproc socket before terminating.

This fixes the scenario where a backend is created, a few messages are
published, and the backend is then immediately terminated, as can happen
with WebSocket clients. Previously, some of the published messages might
still have been in the inproc socket's queue and were simply discarded.

Adds the same test for Broker and ZeroMQ backends.
2025-04-23 14:27:43 +02:00
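The ordering guarantee in 6bd624d9b2 comes from sending the shutdown notification through the same FIFO channel as the publish requests, so the internal thread must drain everything queued before it. The sketch models the inproc PAIR socket with a mutex-protected std::queue and uses an empty std::optional as the shutdown marker; Worker is a hypothetical class, not the backend's code.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the main/child inproc socket pair: a FIFO shared
// by the main thread and a worker. The shutdown marker travels through the
// same FIFO, so every message enqueued before it is handled first.
class Worker {
public:
    Worker() : thread_([this] { Run(); }) {}

    void Publish(std::string msg) { Enqueue(std::move(msg)); }

    // Enqueue the shutdown marker and wait for the worker to drain the FIFO.
    void Terminate() {
        Enqueue(std::nullopt);
        if (thread_.joinable())
            thread_.join();
    }

    ~Worker() {
        if (thread_.joinable())
            Terminate();
    }

    // Only safe to read after Terminate() has joined the worker.
    const std::vector<std::string>& Sent() const { return sent_; }

private:
    void Enqueue(std::optional<std::string> item) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            queue_.push(std::move(item));
        }
        cv_.notify_one();
    }

    void Run() {
        for (;;) {
            std::optional<std::string> item;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                item = std::move(queue_.front());
                queue_.pop();
            }
            if (!item)
                return;  // shutdown marker: everything before it was handled
            sent_.push_back(std::move(*item));  // stands in for the real send
        }
    }

    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::optional<std::string>> queue_;
    std::vector<std::string> sent_;
    std::thread thread_;  // declared last so the members above exist before Run()
};
```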
Tim Wojtulewicz
1169fcf2a2 Move byte_buffer types from cluster and storage into util 2025-04-14 10:11:13 -07:00
Arne Welzel
3946856f06 cluster/Backend: Add name and lookup component tag
This adds two new accessors on Backend, Name() and Tag() that can
be used for introspection of a Backend instance.
2025-04-11 10:01:30 +02:00
Arne Welzel
bfffc8dac8 cluster/zeromq: Improve XPUB stall behavior, add a metric
Instead of fprintf, track the number of occurrences via a metric and
change the sleep loop to a blocking send instead.
2025-03-26 14:23:09 +01:00
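One reading of bfffc8dac8, sketched under assumptions: attempt a non-blocking send, and only when it would block, count the stall and fall back to a blocking send instead of sleeping and retrying. try_send and blocking_send are hypothetical callables, not ZeroMQ or Zeek APIs, and stalls stands in for the metric.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical sender: the callables abstract over the XPUB socket.
struct StallAwareSender {
    std::function<bool(const std::string&)> try_send;       // false: would block
    std::function<void(const std::string&)> blocking_send;  // waits until sent
    std::uint64_t stalls = 0;  // stand-in for the stall metric

    void Send(const std::string& msg) {
        if (try_send(msg))
            return;
        ++stalls;            // record the stall instead of printing it
        blocking_send(msg);  // then block rather than sleeping in a retry loop
    }
};
```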
Arne Welzel
eb1f9f9a42 cluster/zeromq: Catch log_push.send() exception 2025-03-10 17:07:30 +01:00
Arne Welzel
b82dcfafa4 cluster/zeromq: Catch exceptions as const zmq::error_t& 2025-03-10 17:07:30 +01:00
Arne Welzel
8a1abfa8ef cluster/zeromq: No assert on inproc handling
This might happen if we didn't succeed in completely sending a multipart
message and stopped early.
2025-03-10 17:07:30 +01:00
Arne Welzel
aad512c616 cluster/zeromq: Support configuring IO threads for proxy thread 2025-03-10 17:07:30 +01:00
Arne Welzel
ba7b605a97 cluster/zeromq: Move variable lookups from DoInit() to DoInitPostScript() 2025-03-10 17:07:30 +01:00
Arne Welzel
540d9da5ef cluster/zeromq: Handle EINTR at shutdown
Read ::signal_val and exit DoPublish() early if termination
happened while blocked in inproc.send().
2025-03-10 17:07:30 +01:00
Arne Welzel
94ec3af2b0 cluster/zeromq: Queue one message at a time
Queueing multiple messages can easily overload the IO loop without
creating any backpressure.
2025-03-10 17:07:30 +01:00
Arne Welzel
827eccb732 cluster/zeromq: Adapt for OnLoopProcess changes 2025-03-10 17:07:30 +01:00
Arne Welzel
6008e67008 cluster/zeromq: Call DoTerminate() in destructor
In the normal life cycle, Terminate() / DoTerminate() is called
by zeek-setup code. If that doesn't happen, shut down and join
threads in the destructor.

try { } catch (...) suggested by Benjamin.
2025-02-05 16:39:44 +01:00
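Why the destructor in 6008e67008 wraps cleanup in try { } catch (...): destroying a still-joinable std::thread calls std::terminate(), and so does an exception escaping a destructor, which is implicitly noexcept. BackendSketch is hypothetical and its thread body is left empty.

```cpp
#include <thread>

// Hypothetical backend owning its thread. Terminate() is the normal,
// explicit shutdown path; the destructor is the fallback if it never ran.
class BackendSketch {
public:
    BackendSketch() : thread_([] { /* work loop elided */ }) {}

    ~BackendSketch() {
        try {
            Terminate();
        } catch (...) {
            // Nothing sensible to do; letting it escape would abort the process.
        }
    }

    void Terminate() {
        if (thread_.joinable())
            thread_.join();  // may throw std::system_error
    }

    bool Running() const { return thread_.joinable(); }

private:
    std::thread thread_;
};
```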
Arne Welzel
2c6d934ef4 cluster/zeromq: Use lambda for thread trampoline 2025-02-05 16:38:24 +01:00
Arne Welzel
16c745cee4 cluster/zeromq: Do not call util::fmt() from thread
...util::fmt() uses a static buffer, so this is problematic.

I've dabbled a bit with replacing std::thread with threading::BasicThread,
which would offer Fmt(), but this makes things more complicated, primarily
because BasicThread is registered with the thread manager and the shutdown
interactions become entangled. The thread might be terminated before the
backend, or vice-versa. Seems nicer for the thread to be owned by the backend.
2025-02-05 16:38:24 +01:00
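A small illustration of the util::fmt() hazard described in 16c745cee4. fmt_static() is a hypothetical formatter with one static buffer: two calls return the same pointer, so the second call overwrites the first result, and concurrent calls from different threads would race on the buffer. fmt_owned(), also hypothetical, returns a fresh std::string per call and is safe to use from any thread.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical static-buffer formatter: every caller shares `buf`.
inline const char* fmt_static(const char* format, ...) {
    static char buf[128];
    va_list ap;
    va_start(ap, format);
    std::vsnprintf(buf, sizeof(buf), format, ap);
    va_end(ap);
    return buf;
}

// Hypothetical thread-safe alternative: each call owns its result.
inline std::string fmt_owned(const char* format, ...) {
    va_list ap;
    va_start(ap, format);
    va_list ap2;
    va_copy(ap2, ap);
    int n = std::vsnprintf(nullptr, 0, format, ap);  // measure first
    va_end(ap);
    std::string out;
    if (n > 0) {
        out.resize(static_cast<std::size_t>(n) + 1);
        std::vsnprintf(&out[0], out.size(), format, ap2);
        out.resize(static_cast<std::size_t>(n));  // drop the trailing '\0'
    }
    va_end(ap2);
    return out;
}
```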
Arne Welzel
9c5c0f40e1 cluster/zeromq: Fix Unsubscribe() bug caused by \x00 prefix 2025-02-05 10:39:56 +01:00
Arne Welzel
e8f87019c6 cluster: Add SubscribeCallback support
This allows callers of Subscribe() to pass in a callback that will be invoked
once the subscription has been established or has failed to be established.
It is the backend's responsibility to execute the callback on the main thread,
either synchronously or, preferably, asynchronously at a later point by
scheduling a task on the IO main loop.

This turns on ZMQ_XPUB_VERBOSE for ZeroMQ so that notifications about
subscriptions are raised even if the subscription has previously been
observed.
2025-02-05 10:39:56 +01:00
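The SubscribeCallback contract from e8f87019c6, sketched with a hypothetical main-loop task queue: Subscribe() schedules the callback rather than invoking it inline, so it runs later on the main thread. MainLoop and SketchBackend are assumptions, and the sketch reports success immediately where a real backend would wait for confirmation.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>

using Task = std::function<void()>;
using SubscribeCallback = std::function<void(const std::string& topic, bool ok)>;

// Hypothetical main-loop task queue, drained on the main thread.
struct MainLoop {
    std::deque<Task> tasks;
    void Schedule(Task t) { tasks.push_back(std::move(t)); }
    void RunPending() {
        while (!tasks.empty()) {
            Task t = std::move(tasks.front());
            tasks.pop_front();
            t();
        }
    }
};

// Hypothetical backend: never runs the callback synchronously.
class SketchBackend {
public:
    explicit SketchBackend(MainLoop& loop) : loop_(loop) {}

    void Subscribe(std::string topic, SubscribeCallback cb) {
        // A real backend would wait for confirmation (e.g. the XPUB
        // notification); here the subscription is assumed to succeed.
        loop_.Schedule([topic = std::move(topic), cb = std::move(cb)] { cb(topic, true); });
    }

private:
    MainLoop& loop_;
};
```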
Arne Welzel
fa22f91ca4 cluster/zeromq: Fix XSUB threading issues
It is not safe to use the same socket from different threads, but the
current code used the xsub socket directly from the main thread (to set up
subscriptions) and from the internal thread for polling and reading.

Leverage the PAIR socket already in use for forwarding publish operations
to the internal thread also for subscribe and unsubscribe.

The failure mode is/was a bit annoying. Essentially, closing of the
context would hang indefinitely in zmq_ctx_term().
2025-02-05 10:39:56 +01:00
Arne Welzel
df78a94c76 cluster/zeromq: Use NodeId(), drop my_node_id 2025-02-05 10:39:56 +01:00
Arne Welzel
0b7a660a34 cluster/Backend: Make backend event processing customizable
This allows configurability at the code level to decide what to do with
received remote events and events produced by a backend. For now, events
are only enqueued into the process's script layer, but for the WebSocket
interface, the action would be to send out the event on a WebSocket
connection instead.
2025-02-05 10:39:56 +01:00
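A sketch of the customization point from 0b7a660a34 as a strategy interface: the backend hands each event to a handler chosen at construction, which either enqueues it for the local script layer or forwards it over a WebSocket connection. All class names are hypothetical, and the vectors stand in for the real script layer and connection.

```cpp
#include <string>
#include <vector>

struct Event { std::string name; };

// Hypothetical interface deciding what happens to an event.
class EventHandlingStrategy {
public:
    virtual ~EventHandlingStrategy() = default;
    virtual void Handle(const Event& ev) = 0;
};

class LocalEnqueueStrategy : public EventHandlingStrategy {
public:
    void Handle(const Event& ev) override { script_queue.push_back(ev.name); }
    std::vector<std::string> script_queue;  // stands in for the script layer
};

class WebSocketForwardStrategy : public EventHandlingStrategy {
public:
    void Handle(const Event& ev) override { sent_frames.push_back("ws:" + ev.name); }
    std::vector<std::string> sent_frames;  // stands in for a WebSocket connection
};

// The backend only sees the interface; behaviour is chosen in code.
class StrategyBackend {
public:
    explicit StrategyBackend(EventHandlingStrategy& s) : strategy_(s) {}
    void OnRemoteEvent(const Event& ev) { strategy_.Handle(ev); }

private:
    EventHandlingStrategy& strategy_;
};
```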
Arne Welzel
35c79ab2e3 cluster/backend/zeromq: Add ZeroMQ based cluster backend
This is a cluster backend implementation using a central XPUB/XSUB proxy
that by default runs on the manager node. Logging is implemented leveraging
PUSH/PULL sockets between logger and other nodes, rather than going
through XPUB/XSUB.

The test-all-policy-cluster baseline changed: Previously, Broker::peer()
would be called from setup-connections.zeek, causing the IO loop to be
alive. With the ZeroMQ backend, the IO loop is only alive when
Cluster::init() is called, but that doesn't happen anymore.
2024-12-10 20:33:02 +01:00