Provide a mechanism that allows a cluster backend to report when it is
ready for publish operations. This is primarily useful for ZeroMQ, which
does sender-side filtering and is only truly ready for publishing once
it has learned about subscriptions from other nodes.
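
A minimal sketch of such a readiness hook (the Backend class and the
SignalReady()/ReadyToPublish() names are hypothetical, not Zeek's actual
API):

    #include <atomic>

    class Backend {
    public:
        virtual ~Backend() = default;

        // Called by the backend implementation once publishes will
        // actually be delivered, e.g. after ZeroMQ's sender-side
        // filtering has seen subscriptions from remote nodes.
        void SignalReady() { ready.store(true, std::memory_order_release); }

        // Polled before relying on publish operations.
        bool ReadyToPublish() const { return ready.load(std::memory_order_acquire); }

    private:
        std::atomic<bool> ready{false};
    };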
* origin/topic/awelzel/comment-out-broker-websocket-shim-two-endpoint-tests:
broker/WebSocketShim/tests: Comment out two endpoint tests
broker/WebSocketShim/tests: Replace hard-coded timeout values with vars
This implements basic tracking of each peering's current fill level, the maximum
level over a recent time interval (via a new Broker::buffer_stats_reset_interval
tunable, defaulting to 1 min), and the number of times a buffer overflows. For
the disconnect policy this is the number of depeerings, while for drop_newest
and drop_oldest it corresponds to the number of messages lost.
This doesn't use "proper" telemetry metrics for a few reasons: the tracking is
Broker-specific, so each peering needs to be tracked via endpoint_ids, whereas
the metrics should carry Cluster node name labels, which live in the script
layer. Using broker::endpoint_id values directly as keys also means we rely on
their ability to hash in STL containers, which should be fast.
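
A rough sketch of this bookkeeping (the struct layout is illustrative;
the EndpointId alias stands in for broker::endpoint_id):

    #include <cstdint>
    #include <string>
    #include <unordered_map>

    using EndpointId = std::string;  // stand-in for broker::endpoint_id

    struct PeeringStats {
        uint64_t fill = 0;        // current buffer fill level
        uint64_t recent_max = 0;  // maximum fill since the last reset
        uint64_t overflows = 0;   // depeerings, or messages lost under
                                  // the drop_newest/drop_oldest policies

        void Update(uint64_t new_fill) {
            fill = new_fill;
            if ( fill > recent_max )
                recent_max = fill;
        }

        // Invoked every Broker::buffer_stats_reset_interval (default 1 min).
        void ResetWindow() { recent_max = fill; }
    };

    // One entry per peering, keyed by the remote endpoint's ID.
    using PeeringStatsMap = std::unordered_map<EndpointId, PeeringStats>;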
This does not track the buffer levels for Broker "clients" (as opposed to
"peers"), i.e. WebSockets, since we currently don't have a way to name these,
and we don't want to use ephemeral Broker IDs in their telemetry.
To make the stats accessible to the script layer, the Broker manager (via a new
helper class that lives in the event_observer) maintains a TableVal mapping
Broker IDs to a new BrokerPeeringStats record. The table's members get updated
every time the table is requested. This minimizes new val instantiation and
allows the script layer to customize the BrokerPeeringStats record by redef'ing
it, updating fields, etc. Since Zeek vals can't be used outside the main thread,
some care is required so that all table updates happen only in the Zeek-side
table updater, PeerBufferState::GetPeeringStatsTable().
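
A condensed sketch of that update pattern, with plain structs standing
in for the TableVal/RecordVal machinery (all names except
GetPeeringStatsTable() are made up):

    #include <cstdint>
    #include <mutex>
    #include <string>
    #include <unordered_map>

    struct PeeringStatsRecord {  // stand-in for BrokerPeeringStats
        uint64_t fill = 0;
    };

    class PeerBufferState {
    public:
        // Called from Broker-side threads: only record raw numbers here.
        void Observe(const std::string& peer, uint64_t fill) {
            std::lock_guard lock{mtx};
            raw[peer] = fill;
        }

        // Called only on Zeek's main thread: refresh cached entries in
        // place instead of instantiating new ones on every request.
        const std::unordered_map<std::string, PeeringStatsRecord>&
        GetPeeringStatsTable() {
            std::lock_guard lock{mtx};
            for ( const auto& [peer, fill] : raw )
                table[peer].fill = fill;  // reuses an existing entry
            return table;
        }

    private:
        std::mutex mtx;
        std::unordered_map<std::string, uint64_t> raw;
        std::unordered_map<std::string, PeeringStatsRecord> table;
    };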
On very busy machines the hardwired scheduling of the ping batches could
interleave unpredictably with the arriving pongs, causing baseline
deviations. We now wait for each batch to complete before triggering the
next one.
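
In isolation, the gating amounts to something like this (a generic
sketch, not the benchmark's actual code):

    #include <condition_variable>
    #include <mutex>

    class BatchGate {
    public:
        // Arm the gate before sending a batch of pings.
        void StartBatch(int expected_pongs) {
            std::lock_guard lock{mtx};
            pending = expected_pongs;
        }

        // Invoked for every arriving pong.
        void OnPong() {
            std::lock_guard lock{mtx};
            if ( --pending == 0 )
                done.notify_one();
        }

        // Block until the current batch finished, then send the next.
        void WaitForBatch() {
            std::unique_lock lock{mtx};
            done.wait(lock, [&]{ return pending == 0; });
        }

    private:
        std::mutex mtx;
        std::condition_variable done;
        int pending = 0;
    };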
The internal ZeroMQ thread would call QueueForProcessing(), thereby
accessing the onloop member. As ThreadedBackend::DoTerminate() unsets
that member, this a) was reported as a data race by TSAN and b) could
cause events that were still to be queued to be missed.
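
One generic way to close such a race is to guard the shared member on
both sides (a sketch of the pattern, not necessarily the actual fix):

    #include <mutex>

    struct OnLoop {};  // stand-in for the main-loop queue target

    class ThreadedBackend {
    public:
        // Runs on the internal ZeroMQ thread.
        void QueueForProcessing(/* ... messages ... */) {
            std::lock_guard lock{onloop_mtx};
            if ( ! onloop )
                return;  // already terminating; nothing to hand over
            // ... enqueue messages onto onloop ...
        }

        // Runs on the main thread during shutdown.
        void DoTerminate() {
            std::lock_guard lock{onloop_mtx};
            onloop = nullptr;
        }

    private:
        std::mutex onloop_mtx;
        OnLoop* onloop = nullptr;
    };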
* origin/topic/awelzel/websocket-empty-subscriptions:
cluster/websocket: Short-circuit clients without subscriptions
cluster/websocket: Factor out active subscription handling
* origin/topic/timw/no-islower-before-toupper:
Statically lookup field offsets for connection values in UDP and ICMP analyzers
Skip calling islower before toupper
Ubuntu 20.04's default Python doesn't deal well with the type annotations
used in btest/Files/wstest.py. Given that Ubuntu 20.04 is about to go EOL,
just drop support for it.
Limit the number of WebSocket events queued from external clients to
dispatcher instances, producing back pressure on the clients when
Zeek's IO loop is overloaded.
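
The back-pressure idea in isolation (a generic bounded-queue sketch;
the real dispatcher interface differs):

    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <mutex>

    template <typename Event>
    class BoundedQueue {
    public:
        explicit BoundedQueue(size_t max) : max_size(max) {}

        // Producer (WebSocket client side): blocks while the queue is
        // full, which propagates back pressure to the client.
        void Push(Event ev) {
            std::unique_lock lock{mtx};
            not_full.wait(lock, [&]{ return q.size() < max_size; });
            q.push_back(std::move(ev));
            not_empty.notify_one();
        }

        // Consumer (Zeek's IO loop side).
        Event Pop() {
            std::unique_lock lock{mtx};
            not_empty.wait(lock, [&]{ return ! q.empty(); });
            Event ev = std::move(q.front());
            q.pop_front();
            not_full.notify_one();
            return ev;
        }

    private:
        size_t max_size;
        std::mutex mtx;
        std::condition_variable not_full, not_empty;
        std::deque<Event> q;
    };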
Explicitly notify the internal thread about the shutdown via the
inproc socket pair. This ensures that the internal thread processes
all previous messages on the inproc socket before terminating.
This fixes the scenario where a backend is created, a few messages are
published, and the backend is then immediately terminated, as can happen
with WebSocket clients. Previously, some of the published messages might
still have been sitting in the inproc socket's queue and were simply
discarded.
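
The shutdown handshake in plain libzmq terms (a standalone sketch; the
sentinel framing and function names are made up for illustration):

    #include <cstring>
    #include <zmq.h>

    // Main thread: after queuing all pending publishes on the inproc
    // socket, send an explicit shutdown message last. Since inproc
    // preserves ordering, the worker sees every earlier message first.
    void NotifyShutdown(void* inproc_main) {
        zmq_send(inproc_main, "SHUTDOWN", 8, 0);
    }

    // Internal thread: drain messages until the sentinel arrives, so
    // publishes queued just before termination are not discarded.
    void WorkerLoop(void* inproc_worker) {
        char buf[256];
        while ( true ) {
            int n = zmq_recv(inproc_worker, buf, sizeof(buf), 0);
            if ( n == 8 && std::memcmp(buf, "SHUTDOWN", 8) == 0 )
                break;  // everything queued before the sentinel ran
            // ... forward the message, e.g. to the publishing socket ...
        }
    }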
Adds the same test for Broker and ZeroMQ backends.