Commit graph

31 commits

Author SHA1 Message Date
Christian Kreibich
841a40ff88 Switch Broker's default backpressure policy to drop_oldest, bump buffer sizes
At every site where we've dug into backpressure disconnect findings, it has been
the case that the default values were too small. 8192, so 4x the old default,
suffices at every site to drown out premature disconnects.

With metrics now available for the send buffers regardless of backpressure
overflow policy, this also switches the default from "disconnect" to
"drop_oldest" (for both peers and websockets), meaning that peerings remain
untouched but the oldest queued message simply gets dropped when a new message
is enqueued. With this policy, the number of backpressure overflows is then
simply the count of discarded messages, something that users can tune to see
drop to zero in everyday use.  Another benefit is that marginal overflows cause
less message loss than when an entire buffer's worth (plus potentially more
in-flight messages) gets thrown out with a disconnect.
2025-04-25 10:22:35 -07:00
Christian Kreibich
5008f586ea Deprecate Broker::congestion_queue_size and stop using it internally
Since a reorg in the Broker library (commit b04195183) that revamped flow
control and that we pulled in with Zeek 5.0, this setting hasn't done
anything. Broker's endpoint::make_subscriber() and
endpoint::make_status_subscriber() take a queue size argument (with a default
value) that simply gets dropped in the eventual subscriber::make() call. See:

b041951835 (diff-5c0d2baa7981caeb6a4080708ddca6ad929746d10c73d66598e46d7c2c03c8deL34-R178)
2025-04-25 10:22:35 -07:00
Christian Kreibich
f5fbad23ff Add peer buffer update tracking to the Broker manager's event_observer
This implements basic tracking of each peering's current fill level, the maximum
level over a recent time interval (via a new Broker::buffer_stats_reset_interval
tunable, defaulting to 1min), and the number of times a buffer overflows. For
the disconnect policy this is the number of depeerings, but for drop_newest and
drop_oldest it implies the number of messages lost.

This doesn't use "proper" telemetry metrics for a few reasons: this tracking is
Broker-specific, so we need to track each peering via endpoint_ids, while we
want the metrics to use Cluster node name labels, and the latter live in the
script layer. Using broker::endpoint_id directly as keys also means we rely on
their ability to hash in STL containers, which should be fast.

This does not track the buffer levels for Broker "clients" (as opposed to
"peers"), i.e. WebSockets, since we currently don't have a way to name these,
and we don't want to use ephemeral Broker IDs in their telemetry.

To make the stats accessible to the script layer the Broker manager (via a new
helper class that lives in the event_observer) maintains a TableVal mapping
Broker IDs to a new BrokerPeeringStats record. The table's members get updated
every time that table is requested. This minimizes new val instantiation and
allows the script layer to customize the BrokerPeeringStats record by redefing,
updating fields, etc. Since we can't use Zeek vals outside the main thread, this
requires some care so all table updates happen only in the Zeek-side table
updater, PeerBufferState::GetPeeringStatsTable().
2025-04-24 22:47:18 -07:00
Arne Welzel
ab25e5d24b broker/main: Reference Cluster::publish() for auto_publish() deprecation
In hindsight, this is the better thing to do and with Zeek 7.2 we should
be confident enough that it'll work.
2025-04-23 14:27:43 +02:00
Arne Welzel
a7423104e1 broker/main: Deprecate Broker::listen_websocket()
Optimistically deprecate Broker::listen_websocket() and promote
Cluster::listen_websocket() instead.
2025-04-23 14:27:43 +02:00
Christian Kreibich
b430d5235c Expand Broker APIs to allow tracking directionality of peering establishment
This provides ways to figure out for a given peer, or a given address/port pair,
whether the local node originally established the peering.
2025-04-21 14:08:42 -07:00
Arne Welzel
6bc36e8cf8 broker/main: Adapt enum values to agree with comm.bif
Logic to detect this error already existed, but due to enum identifiers
not having a value set, it never triggered before.

Should probably backport this one.
2025-04-04 15:36:42 +02:00
Arne Welzel
14697ea6ba Merge remote-tracking branch 'origin/topic/neverlord/broker-logging'
* origin/topic/neverlord/broker-logging:
  Integrate review feedback
  Hook into Broker logs via its new API
2025-03-31 18:53:43 +02:00
Dominik Charousset
30615f425e Hook into Broker logs via its new API
The new Broker API allows us to provide a custom logger to Broker that
pulls previously unattainable context information out of Broker to put
them into broker.log for users of Zeek.

Since Broker log events happen asynchronously, we cache them in a queue
and use a flare to notify Zeek of activity. Furthermore, the Broker
manager now implements the `ProcessFd` function to avoid unnecessary
polling of the new log queue. As a side effect, data stores are polled
less as well.
2025-02-08 16:28:02 +01:00
Tim Wojtulewicz
c1a8f8b763 Fix errors from rst linting on the generated docs 2025-01-24 11:41:36 -07:00
Dominik Charousset
4c4eb4b8e2 Add Zeek-level configurability of Broker slow-peer disconnects 2024-12-06 15:18:05 -08:00
Arne Welzel
6abb9d7eda broker/Eventhandler: Deprecate Broker::auto_publish() for v8.1
Relates to #3637
2024-11-14 12:59:22 +01:00
Tim Wojtulewicz
128bf3fe9f Remove Broker metrics configuration values and methods 2024-05-31 13:30:31 -07:00
Tim Wojtulewicz
97a35011a7 Add necessary script-land changes 2024-05-31 13:30:31 -07:00
Josh Soref
21e0d777b3 Spelling fixes: scripts
* accessing
* across
* adding
* additional
* addresses
* afterwards
* analyzer
* ancillary
* answer
* associated
* attempts
* because
* belonging
* buffer
* cleanup
* committed
* connects
* database
* destination
* destroy
* distinguished
* encoded
* entries
* entry
* hopefully
* image
* include
* incorrect
* information
* initial
* initiate
* interval
* into
* java
* negotiation
* nodes
* nonexistent
* ntlm
* occasional
* omitted
* otherwise
* ourselves
* paragraphs
* particular
* perform
* received
* receiver
* referring
* release
* repetitions
* request
* responded
* retrieval
* running
* search
* separate
* separator
* should
* synchronization
* target
* that
* the
* threshold
* timeout
* transaction
* transferred
* transmission
* triggered
* vetoes
* virtual

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2022-11-02 17:36:39 -04:00
Dominik Charousset
6565b4862d Add missing bits for Broker::metrics_import_topics 2022-08-22 17:10:07 +02:00
Arne Welzel
c2ca92d772 Try adding Broker::metrics_import_topics, stuck 2022-08-08 17:20:13 +02:00
Robin Sommer
d99f041ac5
Add WebSocket support for exchanging events with external clients.
This exposes Broker's new WebSocket support in Zeek. To enable it,
call `Broker::listen_websocket()`. Zeek will then start listening on
port 9997 for incoming WebSocket connections.

See the Broker documentation for a description of the message format
expected over these WebSocket connections.
2022-06-02 10:31:52 +02:00
Christian Kreibich
90b7c6961e Simplify the supervisor's listen() on default address/port 2021-08-18 12:35:49 -07:00
Dominik Charousset
7767c3d36c Sync new broker options, fix name inconsistencies 2021-05-25 17:22:45 +02:00
Dominik Charousset
f9cd05f00b Integrate new Broker metric exporter parameters 2021-05-24 17:20:48 +02:00
Jon Siwek
43e54c7930 GH-780: Prevent log batches from indefinite buffering
Logs that got sent sparsely or burstily would get buffered for long
periods of time since the logic to flush them only does so on the next
log write.  In the worst case, a subsequent log write could never happen
and cause a log entry to be indefinitely buffered.

This fix introduces a recurring event/timer to simply flush all pending
logs at frequency of Broker::log_batch_interval.
2020-02-05 13:06:52 -08:00
Jon Siwek
5b64c35185 Switch default CAF scheduler policy to work sharing
It may generally be better for our default use-case, as workers may
save a few percent cpu utilization as this policy does not have to
use any polling like the stealing policy does.

This also helps avoid a potential issue with the implementation of
spinlocks used in the work-stealing policy in current CAF versions,
where there's some conditions where lock contention causes a thread
to spin for long periods without relinquishing the cpu to others.
2019-06-28 16:34:33 -07:00
Jon Siwek
1ce0fcce49 GH-387: update Broker topic names to use "zeek/" prefix 2019-05-29 15:56:37 -07:00
Jon Siwek
7f0fb49612 Add an internal getenv wrapper function: zeekenv
It maps newer environment variable names starting with ZEEK to the
legacy names starting with BRO.
2019-05-23 20:42:42 -07:00
Daniel Thayer
1a74516db1 Rename all BRO-prefixed environment variables
For backward compatibility when reading values, we first check
the ZEEK-prefixed value, and if not set, then check the corresponding
BRO-prefixed value.
2019-05-22 00:12:31 -05:00
Daniel Thayer
be182aac83 More bro-to-zeek renaming in scripts and other files 2019-05-16 02:36:41 -05:00
Jon Siwek
cb6b9a1f1a Allow tuning Broker log batching via scripts
Via redefining "Broker::log_batch_size" or "Broker::log_batch_interval"
2019-05-08 12:44:55 -07:00
Jon Siwek
aebcb1415d GH-234: rename Broxygen to Zeexygen along with roles/directives
* All "Broxygen" usages have been replaced in
  code, documentation, filenames, etc.

* Sphinx roles/directives like ":bro:see" are now ":zeek:see"

* The "--broxygen" command-line option is now "--zeexygen"
2019-04-22 19:45:50 -07:00
Jon Siwek
a994be9eeb Merge remote-tracking branch 'origin/topic/seth/zeek_init'
* origin/topic/seth/zeek_init:
  Some more testing fixes.
  Update docs and tests for bro_(init|done) -> zeek_(init|done)
  Implement the zeek_init handler.
2019-04-19 11:24:29 -07:00
Daniel Thayer
18bd74454b Rename all scripts to have ".zeek" file extension 2019-04-11 21:12:40 -05:00
Renamed from scripts/base/frameworks/broker/main.bro (Browse further)