The overload-drop.zeek and overload-no-drop.zeek tests have proxy,
worker-1 and worker-2 publish to the manager topic. For the drop
case, we verify that both the senders and the manager drop events. For
the no-drop test, the high-water marks (HWMs) are set such that all
events are buffered.
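For reference, a minimal libzmq sketch of the HWM behavior these tests
exercise; the socket type, endpoint and limits are illustrative only, not the
backend's actual wiring:

    #include <zmq.h>

    int main() {
        void* ctx = zmq_ctx_new();
        void* pub = zmq_socket(ctx, ZMQ_PUB);

        // With a small send HWM, a PUB socket silently drops messages for a
        // subscriber whose queue is full instead of blocking (the drop case).
        // A HWM of 0 means "unlimited" and buffers everything (the no-drop case).
        int hwm = 10;
        zmq_setsockopt(pub, ZMQ_SNDHWM, &hwm, sizeof(hwm));

        zmq_bind(pub, "tcp://127.0.0.1:5555");

        for ( int i = 0; i < 1000; ++i )
            zmq_send(pub, "event", 5, ZMQ_DONTWAIT);  // beyond the HWM: dropped

        zmq_close(pub);
        zmq_ctx_term(ctx);
        return 0;
    }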
The overload-worker-proxy-topic*.zeek tests are similar, but instead
of publishing to the manager topic, proxy, worker-1 and worker-2 publish
to the proxy and worker topics to overload each other. This previously
resulted in lockups; these tests verify that this no longer happens.
The log formatting is a bit ad hoc, but that's mostly because cluster.log
only has a message field and I don't think a dedicated application_name
column is worth it. That could also be added by custom scripts if it's really
wanted for a given deployment.
ZeroMQ's IPv6 support isn't enabled by default, resulting in
"No such device" errors when attempting to listen on an IPv6
address. This change adds an ipv6 option to the ZeroMQ module
and enables it by default. Further, it adds a test configuring
everything to listen on IPv6 ::1, as well as one test to provoke
the original error. This also regularizes some error messages.
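At the libzmq level this boils down to setting ZMQ_IPV6 on the socket before
binding. A minimal sketch (socket type and endpoint illustrative):

    #include <zmq.h>
    #include <cstdio>

    int main() {
        void* ctx = zmq_ctx_new();
        void* sock = zmq_socket(ctx, ZMQ_XPUB);

        // Without this, binding to an IPv6 address fails with "No such device".
        int ipv6 = 1;
        zmq_setsockopt(sock, ZMQ_IPV6, &ipv6, sizeof(ipv6));

        if ( zmq_bind(sock, "tcp://[::1]:5555") != 0 )
            std::perror("zmq_bind");

        zmq_close(sock);
        zmq_ctx_term(ctx);
        return 0;
    }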
The addr_to_uri() calls weren't actually needed, but they apparently do
not hurt and the result is easier on the eyes, so use them :-)
This only changes the script-layer API, but keeps the std::string host
in the C++ layer's ServerOptions. Mostly because the ixwebsocket library
takes host as std::string. Also, maybe at some point we'd want to
support something scheme-based like unix:///var/run/zeek.sock and placing
that in a string wouldn't be totally wrong.
Add tests for IPv6, too.
I believe there's a bug/usage issue in the websockets library
where during send(), EOF is detected and stored, but the receiving
thread then discards the last received frame. Avoid the bug
by replacing the close_socket() implementation of the websockets
library just for that test and leave detecting the EOF condition
to the receiving thread.
The terminate-while-queueing test added for #4428 failed spuriously,
indicating that WebSocket clients sometimes receive code 1000 instead of 1001.
This happens if the ixwebsocket server is shut down before the reply thread had a
chance to process queued close messages.
Fix by signaling and waiting for the dispatcher's reply thread to terminate
before returning from Terminate().
Terminate() is called when Zeek shuts down. If WebSocket client threads
were blocked in QueueForProcessing() due to reaching queue limits, they
previously would not exit QueueForProcessing() and instead blocked
indefinitely, resulting in the ixwebsocket library blocking and its
garbage collection thread running at 100%. Not great.
Closing the onloop instance will unblock the WebSocket client threads
for a timely shutdown.
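As a rough illustration of this unblock-on-close pattern; the names are made
up and this is not the actual OnLoop/QueueForProcessing API:

    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <mutex>
    #include <string>

    class BoundedQueue {
    public:
        explicit BoundedQueue(std::size_t limit) : limit_(limit) {}

        // Client threads block here once the queue is full (backpressure).
        // Returns false if the queue was closed while waiting so callers can exit.
        bool Push(std::string msg) {
            std::unique_lock<std::mutex> lock(mtx_);
            cv_.wait(lock, [this] { return closed_ || items_.size() < limit_; });
            if ( closed_ )
                return false;
            items_.push_back(std::move(msg));
            return true;
        }

        // Called at shutdown: wakes all blocked producers so their threads can
        // terminate instead of hanging in Push() forever.
        void Close() {
            std::lock_guard<std::mutex> lock(mtx_);
            closed_ = true;
            cv_.notify_all();
        }

    private:
        std::size_t limit_;
        bool closed_ = false;
        std::deque<std::string> items_;
        std::mutex mtx_;
        std::condition_variable cv_;
    };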
Closes #4420
Instead of simply removing holes from vectors or lists when converting
from Val to Broker format, error out as the receiver has no chance to
reconstruct where the hole might have been.
We could encode holes with broker::none, but this will put unnecessary
burden on language bindings and users due to the potential optionality.
Think of a std::vector<uint64_t> that would technically need to be a
std::vector<std::optional<uint64_t>> to represent optional elements
properly.
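To make that concrete (illustration only, not Broker code):

    #include <cstdint>
    #include <optional>
    #include <vector>

    // Without holes, a Zeek "vector of count" maps naturally onto this:
    using dense_vec = std::vector<uint64_t>;

    // If holes were encoded (e.g. as broker::none), every binding and user
    // would have to deal with element-wise optionality instead:
    using sparse_vec = std::vector<std::optional<uint64_t>>;

    uint64_t sum(const sparse_vec& v) {
        uint64_t total = 0;
        for ( const auto& elem : v )
            if ( elem )  // every access needs a has-value check
                total += *elem;
        return total;
    }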
Closes #3045
* topic/christian/broker-backpressure-metrics:
Add basic btest to verify that Broker peering telemetry is available.
Add cluster framework telemetry for Broker's send-buffer use
Add peer buffer update tracking to the Broker manager's event_observer
Rename the Broker manager's LoggerAdapter
Avoid race in the cluster/broker/publish-any btest
On very busy machines the hardwired scheduling of the ping batches could shift
relative to the arriving pongs, causing baseline deviations. We now wait for
each batch to complete before triggering the next one.
Limit the number of WebSocket events queued from external clients to
dispatcher instances to produce back pressure to the clients if
Zeek's IO loop is overloaded.
Explicitly notify the internal thread about the shutdown via the
inproc socket pair. This ensures that the internal thread processes
all previous messages on the inproc socket before terminating.
This fixes the scenario where a backend is created, a few messages are
published, and it is then immediately terminated, as can be done with
WebSocket clients.
Previously, some of the messages published might have still been in the
inproc socket's queue and were simply discarded.
Adds the same test for Broker and ZeroMQ backends.
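For illustration, a minimal libzmq sketch of the ordering guarantee this
relies on; the framing and names are made up, not the backend's actual
protocol:

    #include <zmq.h>
    #include <cstring>
    #include <thread>

    int main() {
        void* ctx = zmq_ctx_new();

        // inproc PAIR sockets deliver messages in order, so a shutdown marker
        // sent after the published messages is only seen once those have been
        // drained by the internal thread.
        void* main_side = zmq_socket(ctx, ZMQ_PAIR);
        zmq_bind(main_side, "inproc://backend-events");

        std::thread internal([&] {
            void* thread_side = zmq_socket(ctx, ZMQ_PAIR);
            zmq_connect(thread_side, "inproc://backend-events");
            char buf[256];
            while ( true ) {
                int n = zmq_recv(thread_side, buf, sizeof(buf), 0);
                if ( n == 8 && std::memcmp(buf, "SHUTDOWN", 8) == 0 )
                    break;  // every earlier message was processed before this
                // ... otherwise forward the message to the cluster ...
            }
            zmq_close(thread_side);
        });

        zmq_send(main_side, "event-1", 7, 0);
        zmq_send(main_side, "event-2", 7, 0);
        zmq_send(main_side, "SHUTDOWN", 8, 0);  // explicit shutdown notification

        internal.join();
        zmq_close(main_side);
        zmq_ctx_term(ctx);
        return 0;
    }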
Due to prefix matching, worker-1's node_topic() also matched worker-10,
worker-11, etc. Suffix the node topic with a `.`. The original implementation
came from NATS, where subjects are separated by `.`.
Adapt nodeid_topic() for consistency.
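A small sketch of the prefix-matching pitfall; the topic strings illustrate
the scheme and aren't necessarily the exact ones used:

    #include <cassert>
    #include <string>

    static bool matches(const std::string& subscription, const std::string& topic) {
        // This is effectively what a ZeroMQ subscription does: prefix matching.
        return topic.compare(0, subscription.size(), subscription) == 0;
    }

    int main() {
        // Without the trailing '.', worker-1's subscription also catches worker-10.
        assert(matches("zeek.cluster.node.worker-1", "zeek.cluster.node.worker-10.msg"));

        // With the trailing '.', only worker-1's own messages match.
        assert(! matches("zeek.cluster.node.worker-1.", "zeek.cluster.node.worker-10.msg"));
        assert(matches("zeek.cluster.node.worker-1.", "zeek.cluster.node.worker-1.msg"));

        return 0;
    }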
These tests were very sensitive to the speed at which ZeroMQ distributes
subscriptions in the cluster and proved unreliable when testing with
zeek/btest#113.
The main fix here is to have individual WebSocket clients subscribe to unique
topics, e.g. /test/client-0 and /test/client-1, instead of just a shared topic.
This ensures the WebSocket handshake completes only once each client has
observed its own subscription, and not prematurely upon observing the shared topic.
This seems mainly relevant for tests: In the real world one shouldn't
rely on subscription visibility - you miss messages if you're too late
to the party.
When two workers connect to zeek.cluster.worker, the central ZeroMQ
proxy would not propagate unsubscription information to other nodes
once they both left. Set ZMQ_XPUB_VERBOSER on the proxy's XPUB socket
for visibility.
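For reference, a sketch of applying the option at the libzmq level; the
endpoint and function name are illustrative:

    #include <zmq.h>

    void* make_proxy_xpub(void* ctx) {
        void* xpub = zmq_socket(ctx, ZMQ_XPUB);

        // By default an XPUB socket filters out duplicate subscription and
        // unsubscription messages; ZMQ_XPUB_VERBOSER passes all of them
        // upstream, keeping unsubscription information flowing when multiple
        // workers share a topic.
        int verboser = 1;
        zmq_setsockopt(xpub, ZMQ_XPUB_VERBOSER, &verboser, sizeof(verboser));

        zmq_bind(xpub, "tcp://127.0.0.1:5556");  // endpoint illustrative
        return xpub;
    }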
Same as what we do in Broker. Use the expected type if publishing
a table() or set() parameter.
This fixes issues when switching sumstats to Cluster::publish().
The broker serializer leverages the existing data_to_val() function.
During unserialization, if the destination type is any, the logic
simply wraps the broker::data value into a Broker::Data record.
Therefore, events with parameters of type any are currently exposed
to the Broker::Data type.
There is a bigger issue in that re-publishing such Broker::Data
instances would encode them as a normal record. Explicitly prevent
this by serializing the contained data value directly instead, similar
to what Broker already did when publishing a record.
This is a cluster backend implementation using a central XPUB/XSUB proxy
that by default runs on the manager node. Logging is implemented leveraging
PUSH/PULL sockets between logger and other nodes, rather than going
through XPUB/XSUB.
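For orientation, a minimal sketch of such a central XPUB/XSUB proxy using
zmq_proxy(); endpoints are illustrative, not the backend's actual defaults:

    #include <zmq.h>

    int main() {
        void* ctx = zmq_ctx_new();

        // Cluster nodes connect the publishing side of their sockets here ...
        void* xsub = zmq_socket(ctx, ZMQ_XSUB);
        zmq_bind(xsub, "tcp://127.0.0.1:5556");

        // ... and their subscribing side here.
        void* xpub = zmq_socket(ctx, ZMQ_XPUB);
        zmq_bind(xpub, "tcp://127.0.0.1:5557");

        // Forward published messages from the XSUB side to the XPUB side and
        // subscription messages in the opposite direction; blocks until the
        // context is terminated.
        zmq_proxy(xsub, xpub, nullptr);

        zmq_close(xsub);
        zmq_close(xpub);
        zmq_ctx_term(ctx);
        return 0;
    }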
The test-all-policy-cluster baseline changed: Previously, Broker::peer()
would be called from setup-connections.zeek, keeping the IO loop
alive. With the ZeroMQ backend, the IO loop is only kept alive when
Cluster::init() is called, and that doesn't happen here anymore.
This is a serializer for log records that uses SerialTypes
for serialization and unserialization. Essentially, this is
similar to what Broker does, except for the envelope.