Vern didn't have ZeroMQ installed and the test was skipped for him.
We'd generally recommend that anyone working on core Zeek install
libzmq-dev or the equivalent for their environment, but until it is a
real required dependency, loosen the requirements on the test.
* origin/topic/timw/update-ct-ca-lists:
External tests: add removed logs to CT list to prevent baseline changes
Update Mozilla CA list and CT list to NSS 3.110
Instead of simply removing holes from vectors or lists when converting
from Val to Broker format, error out, as the receiver has no chance to
reconstruct where the hole might have been.
We could encode holes with broker::none, but this would put an unnecessary
burden on language bindings and users due to the resulting optionality.
Think of a std::vector<uint64_t> that technically needs to be a
std::vector<std::optional<uint64_t>> to represent optional elements
properly.
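For illustration, a minimal Zeek sketch (the event and topic names here
are made up): assigning past the end of a vector leaves a hole, and
publishing such a value now reports an error instead of silently shipping
a shorter vector.

    global ping: event(vals: vector of count);

    event zeek_init()
        {
        local v: vector of count;
        v[0] = 1;
        v[2] = 3;  # nothing at index 1: the vector now has a hole

        # Converting v to Broker format now errors out rather than
        # sending a two-element vector whose hole position is lost.
        Broker::publish("zeek/example/topic", ping, v);
        }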
Closes #3045
Coverity complains that proc is set under a lock, but accessed in Process()
without a lock. Fix this by setting it in Close() also without locking. The
proc member should only ever be accessed by the main thread.
* origin/topic/timw/seven-two-news:
Updates for the various Broker changes
Add versions of bundled dependencies
Fix a few typos.
Additional user contributions for NEWS
NEWS addition for cluster backends
NEWS additions for 7.2
Reformat 7.2 NEWS entries for consistent line lengths
* topic/christian/broker-tuning:
Lower listen/connect retry intervals in Broker and the cluster framework to 1sec
Bump cluster testsuite
Switch Broker's default backpressure policy to drop_oldest, bump buffer sizes
Deprecate Broker::congestion_queue_size and stop using it internally
At every site where we've dug into backpressure disconnect findings, the
default values turned out to be too small. A buffer size of 8192, i.e. 4x
the old default, suffices at every site to avoid premature disconnects.
With metrics now available for the send buffers regardless of backpressure
overflow policy, this also switches the default from "disconnect" to
"drop_oldest" (for both peers and websockets), meaning that peerings remain
untouched but the oldest queued message simply gets dropped when a new message
is enqueued. With this policy, the number of backpressure overflows is
simply the count of discarded messages, a number that users can tune to
drive to zero in everyday use. Another benefit is that marginal overflows
cause less message loss than a disconnect, which throws out an entire
buffer's worth of messages (plus potentially more in flight).
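For sites that prefer the previous behavior, a minimal sketch, assuming
the Broker::peer_buffer_size and Broker::peer_overflow_policy options of
recent Zeek releases (check your version's docs for the exact names):

    # Restore the old defaults: smaller buffer, disconnect on overflow.
    redef Broker::peer_buffer_size = 2048;
    redef Broker::peer_overflow_policy = "disconnect";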
Since a reorg in the Broker library (commit b04195183) that revamped flow
control and that we pulled in with Zeek 5.0, this setting hasn't done
anything. Broker's endpoint::make_subscriber() and
endpoint::make_status_subscriber() take a queue size argument (with a default
value) that simply gets dropped in the eventual subscriber::make() call. See:
b041951835 (diff-5c0d2baa7981caeb6a4080708ddca6ad929746d10c73d66598e46d7c2c03c8deL34-R178)
* topic/christian/broker-backpressure-metrics:
Add basic btest to verify that Broker peering telemetry is available.
Add cluster framework telemetry for Broker's send-buffer use
Add peer buffer update tracking to the Broker manager's event_observer
Rename the Broker manager's LoggerAdapter
Avoid race in the cluster/broker/publish-any btest
This hooks into Telemetry::sync() to update Broker-level metrics tracking the
peerings' send buffer state. We do this in the cluster framework so we can label
the resulting metrics with Zeek cluster node names, not Broker's endpoint IDs.
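The pattern looks roughly like the following sketch; the metric and label
names are illustrative, not the ones this change adds:

    global fill_gf = Telemetry::register_gauge_family([
        $prefix="zeek",
        $name="broker_peer_buffer_fill",
        $help_text="Send-buffer fill level per peering",
        $label_names=vector("node")]);

    hook Telemetry::sync()
        {
        # Refresh gauges whenever metrics get collected; the real code
        # pulls its values from the Broker manager's peering stats.
        Telemetry::gauge_family_set(fill_gf, vector("worker-1"), 42.0);
        }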
The ZeroMQ heuristic for "ready to publish" is to create a unique,
ephemeral subscription using the XSUB socket and observe it arrive on the
XPUB socket. At that point, visibility into other nodes' subscriptions
is provided.
Provide a mechanism to allow a cluster backend to report when it is ready
for publish operations. This is primarily useful for ZeroMQ, which has
sender-side filtering and is only really ready for publishing when it
has learned about subscriptions from other nodes.
* origin/topic/awelzel/comment-out-broker-websocket-shim-two-endpoint-tests:
broker/WebSocketShim/tests: Comment out two endpoint tests
broker/WebSocketShim/tests: Replace hard-coded timeout values with vars
This implements basic tracking of each peering's current fill level, the maximum
level over a recent time interval (via a new Broker::buffer_stats_reset_interval
tunable, defaulting to 1min), and the number of times a buffer overflows. For
the disconnect policy this is the number of depeerings, but for drop_newest and
drop_oldest it implies the number of messages lost.
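Assuming the tunable is an ordinary redef-able option (as "tunable"
suggests), shortening the window for the tracked maxima would look like:

    redef Broker::buffer_stats_reset_interval = 30sec;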
This doesn't use "proper" telemetry metrics for a few reasons: the tracking
is Broker-specific, so we need to key each peering on its endpoint_id,
while we want the metrics labeled with Cluster node names, which live only
in the script layer. Using broker::endpoint_id directly as keys also means
we rely on their ability to hash in STL containers, which should be fast.
This does not track the buffer levels for Broker "clients" (as opposed to
"peers"), i.e. WebSockets, since we currently don't have a way to name these,
and we don't want to use ephemeral Broker IDs in their telemetry.
To make the stats accessible to the script layer the Broker manager (via a new
helper class that lives in the event_observer) maintains a TableVal mapping
Broker IDs to a new BrokerPeeringStats record. The table's members get updated
every time that table is requested. This minimizes new val instantiation and
allows the script layer to customize the BrokerPeeringStats record by redefining it,
updating fields, etc. Since we can't use Zeek vals outside the main thread, this
requires some care so all table updates happen only in the Zeek-side table
updater, PeerBufferState::GetPeeringStatsTable().
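Because BrokerPeeringStats is an ordinary script-layer record, sites can
extend it as usual; a minimal sketch with a hypothetical extra field:

    redef record BrokerPeeringStats += {
        # Hypothetical site-local annotation, filled in by custom scripts.
        note: string &optional;
    };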
On very busy machines, the hardwired scheduling of the ping batches could
interleave unpredictably with the arriving pongs, causing baseline
deviations. We now wait for each batch to complete before triggering the
next one.