I believe there's a bug/usage issue in the websockets library:
during send(), EOF is detected and stored, but the receiving
thread then discards the last received frame. Work around the bug
by replacing the close_socket() implementation of the websockets
library just for that test, leaving detection of the EOF condition
to the receiving thread.
The terminate-while-queueing test added for #4428 failed spuriously,
indicating that WebSocket clients sometimes receive code 1000 instead of 1001.
This happens if the ixwebsocket server is shut down before the reply thread
has had a chance to process queued close messages.
Fix this by signaling the dispatcher's reply thread and waiting for it to
terminate before returning from Terminate().
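A minimal sketch of that shutdown ordering, using hypothetical member names
(reply_thread_, stop_, cv_); the actual dispatcher code differs:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

class Dispatcher {
public:
    void Terminate() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            stop_ = true;          // signal the reply thread to finish up
        }
        cv_.notify_all();          // wake it if it is waiting for work
        if ( reply_thread_.joinable() )
            reply_thread_.join();  // wait until queued close messages are processed
    }

private:
    std::thread reply_thread_;
    std::mutex mtx_;
    std::condition_variable cv_;
    bool stop_ = false;
};
```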
In GH-4422 it was pointed out that the protocols/conn/failed-service-logging.zeek
policy script only works when
`DPD::track_removed_services_in_connection=T` is set.
This was caused by a logic error in the script. This commit fixes the
error and adds an additional test that checks that failed-service-logging
works even when the option is not set to true.
Terminate() is called when Zeek shuts down. If WebSocket client threads
were blocked in QueueForProcessing() due to reaching queue limits, they
previously would not exit QueueForProcessing() and instead blocked
indefinitely, resulting in the ixwebsocket library hanging and its
garbage collection thread spinning at 100% CPU. Not great.
Closing the onloop instance unblocks the WebSocket client threads
for a timely shutdown.
Closes #4420
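A sketch of the idea, not the actual OnLoop API: a bounded queue whose
blocking enqueue also wakes up when the queue is closed, so producer
(WebSocket client) threads can exit during shutdown. All names here are
hypothetical.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical bounded queue illustrating the unblock-on-close behavior.
template <class T>
class BoundedQueue {
public:
    explicit BoundedQueue(size_t limit) : limit_(limit) {}

    // Returns false if the queue was closed while waiting for space.
    bool QueueForProcessing(T item) {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_space_.wait(lk, [&] { return closed_ || items_.size() < limit_; });
        if ( closed_ )
            return false; // shutdown in progress; caller should exit
        items_.push_back(std::move(item));
        return true;
    }

    void Close() {
        { std::lock_guard<std::mutex> lk(mtx_); closed_ = true; }
        cv_space_.notify_all(); // wake all blocked producer threads
    }

private:
    std::deque<T> items_;
    size_t limit_;
    bool closed_ = false;
    std::mutex mtx_;
    std::condition_variable cv_space_;
};
```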
Ensure the client side also uses the initial destination connection ID
for decryption instead of the one from the current long header
packet. PCAP captured on a local WiFi hotspot.
Instead of sending the accumulated CRYPTO frames after processing an
INITIAL packet, add logic to determine the total length of the TLS
Client or Server Hello (by peeking at the first 4 bytes). Once all
CRYPTO frames have arrived, flush the reassembled data to the TLS
analyzer at once.
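For illustration only, a sketch of the length computation under the
assumption that the reassembled CRYPTO data starts with a TLS handshake
header (1 byte type, 3 bytes big-endian length); function names are
hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// Total length of a TLS ClientHello/ServerHello: 4-byte handshake header
// plus the 24-bit body length encoded in bytes 1..3.
inline size_t TlsHelloTotalLength(const uint8_t* data) {
    uint32_t body_len =
        (uint32_t(data[1]) << 16) | (uint32_t(data[2]) << 8) | uint32_t(data[3]);
    return 4 + body_len;
}

// Flushing condition (sketch): once the reassembled CRYPTO bytes cover
// TlsHelloTotalLength() of the first 4 bytes, forward everything to the
// TLS analyzer in one piece.
```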
The payload is already consumed within the InitialPacket unit. Consuming
it again caused the remaining packets in UDP datagrams containing
multiple QUIC packets to be ignored. The baseline changes, showing an I
followed by a new H, indicate that the INITIAL packet was followed by a
HANDSHAKE packet, but previously Zeek discarded these.
* origin/topic/awelzel/event-publish-hook:
NEWS: Add HookPublishEvent() note
btest/plugin: Test for PublishEventHook()
broker and cluster: Wire up HookPublishEvent
plugin: Add HookPublishEvent hook
Vern didn't have ZeroMQ installed, so the test was skipped for him.
I'd generally recommend that anyone working on core Zeek install
libzmq-dev or the equivalent for their environment, but until it is a
real required dependency, loosen the test's requirements.
Instead of simply removing holes from vectors or lists when converting
from Val to Broker format, error out, as the receiver has no chance to
reconstruct where the hole might have been.
We could encode holes with broker::none, but this would put an unnecessary
burden on language bindings and users due to the resulting optionality.
Think of a std::vector<uint64_t> that technically needs to be a
std::vector<std::optional<uint64_t>> to represent optional elements
properly.
Closes #3045
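To illustrate the binding burden mentioned above (hypothetical types, not
Broker's actual API): encoding holes as broker::none would force every
consumer to model elements as optional.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Without holes, a Zeek vector of count maps naturally to:
using DenseVec = std::vector<uint64_t>;

// If holes were encoded as broker::none, every binding and user would have
// to handle possibly-absent elements, even though holes are rare:
using SparseVec = std::vector<std::optional<uint64_t>>;

uint64_t Sum(const SparseVec& v) {
    uint64_t total = 0;
    for ( const auto& e : v )
        if ( e )               // every access now needs an emptiness check
            total += *e;
    return total;
}
```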
* topic/christian/broker-backpressure-metrics:
Add basic btest to verify that Broker peering telemetry is available.
Add cluster framework telemetry for Broker's send-buffer use
Add peer buffer update tracking to the Broker manager's event_observer
Rename the Broker manager's LoggerAdapter
Avoid race in the cluster/broker/publish-any btest
This hooks into Telemetry::sync() to update Broker-level metrics tracking the
peerings' send buffer state. We do this in the cluster framework so we can label
the resulting metrics with Zeek cluster node names, not Broker's endpoint IDs.
This implements basic tracking of each peering's current fill level, the maximum
level over a recent time interval (via a new Broker::buffer_stats_reset_interval
tunable, defaulting to 1min), and the number of times a buffer overflows. For
the disconnect policy this is the number of depeerings, but for drop_newest and
drop_oldest it implies the number of messages lost.
This doesn't use "proper" telemetry metrics for a few reasons: the tracking is
Broker-specific, so we need to track each peering via its endpoint_id, while we
want the metrics to use Cluster node name labels, and those live in the
script layer. Using broker::endpoint_id directly as keys also means we rely on
their ability to hash in STL containers, which should be fast.
This does not track the buffer levels for Broker "clients" (as opposed to
"peers"), i.e. WebSockets, since we currently don't have a way to name these,
and we don't want to use ephemeral Broker IDs in their telemetry.
To make the stats accessible to the script layer, the Broker manager (via a new
helper class that lives in the event_observer) maintains a TableVal mapping
Broker IDs to a new BrokerPeeringStats record. The table's members get updated
every time the table is requested. This minimizes new val instantiation and
allows the script layer to customize the BrokerPeeringStats record by
redef'ing it, updating fields, etc. Since we can't use Zeek vals outside the
main thread, this requires some care so that all table updates happen only in
the Zeek-side table updater, PeerBufferState::GetPeeringStatsTable().
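A rough sketch of the kind of bookkeeping described above, with hypothetical
names; the real code lives in the Broker manager's event_observer and
PeerBufferState, and the real stats record is the script-level
BrokerPeeringStats:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-peering counters.
struct PeeringStats {
    int64_t fill_level = 0;      // current number of queued messages
    int64_t max_fill_level = 0;  // maximum since the last stats reset
    uint64_t overflows = 0;      // depeerings or dropped messages, per policy
};

class PeerBufferTracker {
public:
    void OnBufferUpdate(const std::string& endpoint_id, int64_t level) {
        auto& s = stats_[endpoint_id];
        s.fill_level = level;
        if ( level > s.max_fill_level )
            s.max_fill_level = level;
    }

    void OnOverflow(const std::string& endpoint_id) { ++stats_[endpoint_id].overflows; }

    // Called periodically, e.g. driven by a reset interval such as the
    // Broker::buffer_stats_reset_interval tunable mentioned above.
    void ResetMaxima() {
        for ( auto& [id, s] : stats_ )
            s.max_fill_level = s.fill_level;
    }

private:
    // Keyed by a string ID here for simplicity; the actual code keys on
    // broker::endpoint_id and maps to cluster node names in the script layer.
    std::unordered_map<std::string, PeeringStats> stats_;
};
```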
On very busy machines the hardwired scheduling of the ping batches could shift
relative to the arriving pongs, causing baseline deviations. We now wait for
each batch to complete before triggering the next one.
Limit the number of WebSocket events queued from external clients to
dispatcher instances, producing back pressure on the clients if
Zeek's IO loop is overloaded.