Commit graph

61 commits

Author SHA1 Message Date
Arne Welzel
a7423104e1 broker/main: Deprecate Broker::listen_websocket()
Optimistically deprecate Broker::listen_websocket() and promote
Cluster::listen_websocket() instead.
2025-04-23 14:27:43 +02:00
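A hedged before/after sketch of what this deprecation means for user scripts; the commit does not spell out the Cluster-level call's arguments, so the replacement below is deliberately left schematic.

```zeek
# Deprecated after this commit (argument list assumed, not from the commit):
#   Broker::listen_websocket("127.0.0.1", 9997/tcp);
# Promoted replacement; consult the Cluster API for its options record:
#   Cluster::listen_websocket(...);
```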
Christian Kreibich
ea88257d4d Management framework: move up addition of agent IPs into deployable cluster configs
Since the changes to port autoassignment in the preceding commits leverage agent
IP address information, we need to ensure that this information is available at
the time of autoassignment. The controller learns IP addresses from connecting
agents, and previously used that information at deploy time. This moves the
augmentation of the cluster config up to port autoassignment time.
2025-01-30 16:43:12 -08:00
Michael Dopheide
0c0769b1b2 Support multiple instances per host addr in auto metrics generation 2025-01-30 16:41:27 -08:00
Michael Dopheide
b120f39bd7 When auto-generating metrics ports for worker nodes, make them more uniform across instances. 2025-01-30 16:41:27 -08:00
Tim Wojtulewicz
535df5e263 Remove deprecated Controller::auto_assign_ports and Controller::auto_assign_start_port 2024-08-07 11:58:21 -07:00
Christian Kreibich
8a4fb0ee19 Management framework: augment deployed configs with instance IP addresses
The controller learns IP addresses from agents that peer with it, but that
information has so far gotten lost when resulting configs get pushed out to the
agents. This makes these updates include that information.
2024-07-08 23:05:24 -07:00
Christian Kreibich
742f7fe340 Management framework: add auto-enumeration of metrics ports
This is quite redundant with the enumeration for Broker ports,
unfortunately. But the logic is subtly different: all nodes obtain a telemetry
port, while not all nodes require a Broker port, for example, and in the metrics
port assignment we also cross-check selected Broker ports. I found more unified
code actually harder to read in the end.

The logic for the two sets remains the same: from a start point, ports get
enumerated sequentially that aren't otherwise taken. These ports are assumed
available; there's nothing that checks their availability -- for now.

The default start port is 9000. I considered 9090, to align with the Prometheus
default, but counting upward from there is likely to hit trouble with the Broker
default ports (9999/9997), used by the Supervisor. Counting downward is a bit
unnatural, and shifting the Broker default ports brings subtle ordering issues.

This also changes the node ordering logic slightly since it seems more intuitive
to keep sequential ports on a given instance, instead of striping across them.
2024-07-08 23:05:24 -07:00
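A minimal sketch of tuning the enumeration described above. The option name is an assumption; only the 9000 default start port and the sequential, skip-taken-ports behavior come from the commit message.

```zeek
# Assumed option name; shifts where metrics-port enumeration begins
# (the commit states the default start port is 9000).
redef Management::Controller::auto_assign_metrics_start_port = 9200/tcp;
```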
Josh Soref
21e0d777b3 Spelling fixes: scripts
* accessing
* across
* adding
* additional
* addresses
* afterwards
* analyzer
* ancillary
* answer
* associated
* attempts
* because
* belonging
* buffer
* cleanup
* committed
* connects
* database
* destination
* destroy
* distinguished
* encoded
* entries
* entry
* hopefully
* image
* include
* incorrect
* information
* initial
* initiate
* interval
* into
* java
* negotiation
* nodes
* nonexistent
* ntlm
* occasional
* omitted
* otherwise
* ourselves
* paragraphs
* particular
* perform
* received
* receiver
* referring
* release
* repetitions
* request
* responded
* retrieval
* running
* search
* separate
* separator
* should
* synchronization
* target
* that
* the
* threshold
* timeout
* transaction
* transferred
* transmission
* triggered
* vetoes
* virtual

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2022-11-02 17:36:39 -04:00
Christian Kreibich
147283c8f5 Management framework: add websocket support to controller
The controller now listens on an additional port, defaulting to 2149, for Broker
connections via websockets. Configuration works as for the existing traditional
Broker port (2150), via ZEEK_CONTROLLER_WEBSOCKET_ADDR and
ZEEK_CONTROLLER_WEBSOCKET_PORT environment variables, as well as corresponding
redef'able constants.

To disable the websockets feature, leave ZEEK_CONTROLLER_WEBSOCKET_PORT unset
and redefine Management::Controller::default_port_websocket to 0/unknown.
2022-10-24 15:59:26 -07:00
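For reference, a snippet matching the configuration knobs named in this commit message; both the constant name and the 0/unknown disable value come straight from the text above.

```zeek
# Pick the controller's websocket port explicitly (default 2149). Setting it
# to 0/unknown disables the listener; ZEEK_CONTROLLER_WEBSOCKET_ADDR/PORT in
# the environment work as well, per the commit message.
redef Management::Controller::default_port_websocket = 2149/tcp;
# redef Management::Controller::default_port_websocket = 0/unknown;
```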
Christian Kreibich
e73b561dca Update Management framework to new Supervisor::NodeConfig script fields 2022-09-02 12:12:19 -07:00
Christian Kreibich
6c3e545306 Management framework: fix early return condition for get-id-value
This erroneously used connectedness of instances, not presence of a deployed
cluster. Without a deployment, there's no point in trying to retrieve global ID
values.
2022-08-09 14:07:16 -07:00
Christian Kreibich
e947e1d1c2 Management framework: additional context in a few log messages
This adds request IDs in a few places that didn't mention them, and makes
requests to the Supervisor that act on all current nodes explicit.
2022-07-11 13:00:35 -07:00
Christian Kreibich
3aa0409792 Management framework: edit pass over docstrings
This expands cross-referencing in the doc strings and adds a bit more
explanation.
2022-06-22 23:26:11 -07:00
Christian Kreibich
b9879a50a0 Management framework: node restart support
This adds restart request/response event pairs that restart nodes in the running
Zeek cluster. The implementation is very similar to get_id_value, which also
involves distributing a list of nodes to agents and aggregating the responses.
2022-06-22 23:26:11 -07:00
Christian Kreibich
d994f33636 Management framework: log the controller's startup deployment attempt
The controller now logs its attempt to deploy a persisted configuration at
startup. This is generally helpful to have on record, and it also explains the
timeout message that results when the underlying request fails.
2022-06-22 23:26:11 -07:00
Christian Kreibich
05447c413f Management framework: bugfix for a get_id_value corner case
For the case of a running cluster with no connected agents, use the
g_instances_known table instead of g_instances. The latter reflects the contents
of the last deployed config, not the live scenario of actually attached agents.
2022-06-22 23:26:06 -07:00
Christian Kreibich
b2f9e29bae Management framework: make "result" argument plural in multi-result response events
No functional change, just a consistency tweak. Since agent and controller send
response events via Broker::publish(), the arguments aren't named and so this
only affects the API definition.
2022-06-22 23:25:15 -07:00
Christian Kreibich
2c1cd1d401 Management framework: rename set_configuration events to stage_configuration
This reflects corresponding renaming of the client's set-config command to
stage-config, to make it clearer what's happening.
2022-06-22 11:54:58 -07:00
Christian Kreibich
68558e2874 Management framework: trigger deployment when instances are ready
More resilience: when an agent restarts, it checks in with the controller. If
the controller has deployed a config, this check-in may lead to an internal
notify_agents_ready event. At that point, we now trigger a deployment when there
currently isn't already one running. This ensures that any agents not yet
running the current cluster will start to do so, and does nothing when those
agents already run it, since they ignore the request in that case.
2022-06-21 17:22:45 -07:00
Christian Kreibich
1faf1ab8b7 Management framework: re-trigger deployment upon controller launch
A resilience feature: when a booting controller has a previously deployed
configuration (just reloaded from persistent state), it now triggers a
deployment. If agents at this point run something else, this restores the
controller's understanding of what's deployed; if the agents still run this
configuration, it does nothing, since agents ignore deployment of a
configuration they already run.
2022-06-21 17:22:45 -07:00
Christian Kreibich
c4862e7c5e Management framework: move most deployment handling to internal function
The controller now runs most of a config deployment via an internal function,
allowing it to be called from multiple places instead of just the deploy_request
event handler.
2022-06-21 17:22:45 -07:00
Christian Kreibich
3120fbc75e Management framework: distinguish internally and externally requested deployments
The controller's deployment request state now features a bit that indicates
whether the deployment was requested by a client, or triggered internally. This
affects logging and the transmission of deployment response events via Broker,
which are skipped when the deployment is internal.

This is in preparation for resilience features when the controller (re-)boots.
2022-06-21 17:22:45 -07:00
Christian Kreibich
7787d84739 Management framework: track instances by their Broker IDs
This allows us to handle loss of Broker peerings, updating instance state as we
see instances go away. This also tweaks logging slightly to differentiate
between an instance checking in for the first time, and checking in when the
controller already knows it.
2022-06-21 17:22:45 -07:00
Christian Kreibich
d7e88fc079 Management framework: make helper function a local 2022-06-21 17:22:45 -07:00
Christian Kreibich
a2525e44ba Management framework: add a helper for rendering result vectors to a string 2022-06-21 17:22:45 -07:00
Christian Kreibich
d367f1bad9 Management framework: agents now skip re-deployment of current config
When an agent is already running the configuration it's asked to deploy,
it will now recognize this and by default do nothing. The requester can force
it if needed, via a new argument to the deploy_request event.
2022-06-21 17:22:45 -07:00
Christian Kreibich
46db4a0e71 Management framework: introduce state machine for configs and persist them
The controller now knows three states that a cluster configuration can be in:

- STAGED: as uploaded by the client
- READY: with needed tweaks applied, e.g. to fill in ports
- DEPLOYED: as sent off to agents for deployment

These states aren't exclusive; they represent checkpoints that a config goes
through from upload through deployment. A deployed configuration will also exist
in its STAGED and READY versions, unless a client has uploaded a new
configuration, which will overwrite the STAGED and READY ones.

The controller saves all of these in a table, which lets us use Broker to
persist all states to disk. We use &broker_allow_complex_type, since we only
ever store entire configurations.
2022-06-21 17:22:45 -07:00
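A hedged sketch of the persistence approach this commit describes. The enum and table names are illustrative only, and the &backend attribute is an assumed way to get the on-disk persistence mentioned above; the commit itself only names &broker_allow_complex_type.

```zeek
# Illustrative names; only the three states and &broker_allow_complex_type
# come from the commit message.
type ConfigState: enum { STAGED, READY, DEPLOYED };

global g_configs: table[ConfigState] of Management::Configuration
    &backend=Broker::SQLITE &broker_allow_complex_type;
```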
Christian Kreibich
77556e9f11 Management framework: introduce deployment API in controller
This splits uploading a configuration and deploying it to the instances into
separate event transactions. set_configuration_request/response remains, but now
only conducts validation and storage of the new configuration (upon validation
success, and not yet persisted to disk). The response event indicates success or
the list of validation errors. Successful upload now returns the configuration's
ID in the result record's data struct.

The new deploy_request/response event takes a previously uploaded configuration
and deploys it to the agents.

The controller now tracks uploaded and deployed configurations
separately. Uploading assigns g_config_staged; deployment assigns
g_config_deployed. Deployment does not affect g_config_staged.

The get_config_request/response event pair now allows selecting the
configuration the caller would like to retrieve.
2022-06-21 17:22:45 -07:00
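A sketch of the resulting two-step client flow, under assumptions: the Management::Controller::API module path, the topic string, and the event argument lists are guesses on my part; only the event names and the upload-then-deploy split come from the commit message.

```zeek
@load frameworks/management  # for the Management types and API events

event zeek_init()
    {
    # Assumed topic string; the real constant lives in the framework's scripts.
    # Also assumes a Broker peering with the controller is already established.
    local controller_topic = "zeek/management/controller";
    local config = Management::Configuration($instances=set(), $nodes=set());

    # Step 1: upload (validate and store) the configuration.
    Broker::publish(controller_topic,
                    Management::Controller::API::set_configuration_request,
                    "req-1", config);

    # Step 2: deploy the previously uploaded configuration.
    Broker::publish(controller_topic,
                    Management::Controller::API::deploy_request, "req-2");
    }
```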
Christian Kreibich
0480b5f39c Management framework: rename agent "set_configuration" to "deploy"
This renames the agent's functionality for setting a configuration to reflect
the controller's upcoming separation of set_configuration and deployment.
2022-06-21 17:22:45 -07:00
Christian Kreibich
3ac5fdfc59 Management framework: trivial changes and comment-only rewording 2022-06-21 17:22:45 -07:00
Christian Kreibich
d6042cf516 Management framework: add config validation
During `set_configuration_request` handling the controller now validates
received configurations, checking for a few common gotchas around naming and
port use. Validation doesn't stop at the first problem it finds; the result is
a list summarizing all identified problems.
2022-06-19 01:20:16 -07:00
Christian Kreibich
620db4d4eb Management framework: improvements to port auto-enumeration
The numbering process now accounts for the possibility of colliding with the
agent port, as well as with ports explicitly assigned in the configuration. It
also avoids nondeterminism that could result from traversal of sets.
2022-06-19 01:19:54 -07:00
Christian Kreibich
5592beaf31 Management framework: handle no-instances corner case in set-config correctly
When the controller receives a configuration with no instances (and thus no
nodes), it needs no roundtrip to agents and can send the response right away.
2022-06-19 01:19:47 -07:00
Christian Kreibich
64741b571e Management framework: switch default network visibilities
Up to now, agents and controllers listened locally only, and the Supervisor
(which listens when we run an agent) listened globally. It's now the other way
around: controllers and agents listen globally and the Supervisor, when
listening, does so locally.
2022-06-08 15:00:19 -07:00
Christian Kreibich
9b4841912c Management framework: also use send_set_configuration_response_error elsewhere 2022-06-07 13:42:07 -07:00
Christian Kreibich
ccf3c24e23 Management framework: minor log formatting tweak, for consistency 2022-06-07 13:41:47 -07:00
Christian Kreibich
7a471df1a1 Management framework: support auto-assignment of ports in cluster nodes
This enables the controller to assign listening ports to managers, loggers, and
proxies. (We don't currently make the workers listen.) The feature is controlled
by the Management::Controller::auto_assign_ports flag. When enabled (the
default), enumeration starts from Management::Controller::auto_assign_start_port,
beginning with the manager, then the logger(s), then proxy(s). When the feature
is disabled and nodes that require a port lack it, the controller rejects the
configuration.
2022-06-07 13:38:04 -07:00
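The two knobs named above, in redef form; the values are illustrative, and the assumption that auto_assign_start_port takes a port value is mine. Note that a later commit in this list removes both constants after their deprecation.

```zeek
# Enable auto-assignment (the default, per the commit) and pick the port from
# which enumeration starts: manager first, then logger(s), then proxy(s).
redef Management::Controller::auto_assign_ports = T;
redef Management::Controller::auto_assign_start_port = 2200/tcp;
```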
Christian Kreibich
c53044981a Management framework: improve address and port handling
The get-nodes command also benefits from showing state on connected agents
more broadly (as opposed to just the state tied to the current configuration).

Also a bugfix: ensure we use an agent's IP address as seen by the
controller. This avoids reporting "0.0.0.0" in some cases.
2022-06-03 02:14:07 -07:00
Christian Kreibich
0c47d45bb9 Management framework: broaden get_instances response data to connected instances
This response so far contained only the connected instances that are relevant to
the current configuration, but this isn't very helpful when troubleshooting
instance connectivity. It now reports all currently connected instances, with
network addresses & ports as known to Broker.
2022-06-03 02:13:30 -07:00
Christian Kreibich
72acf24f52 Management framework: expand notify_agent_hello event arguments
This swaps the host event argument for the Broker ID. The latter is more useful,
since the sending agent doesn't necessarily know its IP address as visible to
the controller, and the controller can pull up the full Broker context via the
ID.

It also adds an explicit argument to the event to indicate whether the agent
connected to the controller or vice versa. This simplifies the controller's
internal logic.

Also minor tweaks to logging to show Broker IDs.
2022-06-03 02:12:19 -07:00
Christian Kreibich
aa689807fa Management framework: comment-only tweaks and typo fixes 2022-06-03 02:12:12 -07:00
Christian Kreibich
f10b94de39 Management framework: enable stdout/stderr reporting
This uses the new frameworks/management/supervisor functionality to maintain
stdout/stderr files, and hooks output context into set_configuration error
results.
2022-05-31 12:55:21 -07:00
Christian Kreibich
49b9f1669c Management framework: move to ResultVec in agent's set_configuration response
We so far reported one result record per agent, which made it hard to report
per-node outcomes for the new configuration. Agents now report one result record
per node they're responsible for.
2022-05-31 12:55:21 -07:00
Christian Kreibich
83c60fd8ac Management framework: tune request timeout granularity and interval
When the controller relays requests to agents, we want agents to time out more
quickly than the corresponding controller requests. This allows agents to
respond with more meaningful errors, while the controller's timeout acts mostly
as a last resort to ensure a response to the client actually happens.

This dials down the table_expire_interval to 2 seconds in both agent and
controller, for more predictable timeout behavior. It also dials the agent-side
request expiration interval down to 5 seconds, compared to the controller's 10
seconds.

We may have to revisit this to allow custom expiration intervals per
request/response message type.
2022-05-31 12:55:21 -07:00
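The one globally named knob here is table_expire_interval; the snippet below mirrors the 2-second setting described above. The per-request expiration intervals are framework-internal constants the commit does not name.

```zeek
# Check expiring table entries every 2 seconds so request-state timeouts
# fire close to their nominal expiration times.
redef table_expire_interval = 2 sec;
```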
Christian Kreibich
c922f749c5 Management framework: a bit of debug-level logging for troubleshooting 2022-05-31 12:55:21 -07:00
Christian Kreibich
93ea03a081 Management framework: place each Zeek process in its own working dir
This establishes a directory "nodes" in Management::state_dir and places each
Zeek process into a subdirectory in it, named after the Zeek process. For
example, node "worker-01" runs with cwd <state_dir>/nodes/worker-01/.

Explicitly configured directories can override the naming logic, and also ignore
the state directory if they're absolute paths. One exception remains: the
Supervisor itself -- we'd have to use LogAscii::logdir to automatically place it
too in its own directory, but that feature currently does not interoperate with
log rotation.
2022-05-26 12:56:02 -07:00
Christian Kreibich
d1cd409e59 Management framework: set defaults for log rotation and persistent state
This adds management/persistence.zeek to establish common configuration for log
rotation and persistent variable state. Log-writing Zeek processes initially
write locally in their working directory, and rotate into subdirectory
"log-queue" of the spool. Since agent and controller have no logger,
persistence.zeek puts in place compatible configurations for them.

Storage folders for Broker-backed tables and clusterized stores default to
subdirectories of the new Zeek-level state folder.

When setting the ZEEK_MANAGEMENT_TESTING environment variable, persistent state
is kept in the local directory, and log rotation remains disabled.

This also tweaks @loads a bit in favor of simply loading frameworks/management,
which is easier to keep track of.
2022-05-26 12:55:10 -07:00
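A one-line illustration of the @load simplification mentioned at the end of the commit message, using the framework's top-level entry point.

```zeek
# Load the Management framework as a whole rather than individual sub-scripts.
@load frameworks/management
```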
Christian Kreibich
e305d9c613 Management framework: establish stdout/stderr files also for cluster nodes 2022-05-25 13:56:23 -07:00
Christian Kreibich
b96a4276eb Management framework: move role variable from logging into framework-wide config
The role isn't just about logging, it can also act as a general indicator to key
in on in role-specific code elsewhere, such as @if.
2022-05-25 13:56:23 -07:00
Christian Kreibich
e78fdc39e4 Management framework: distinguish supervisor/supervisee when loading agent/controller
Load the agent/controller bootstrapping code only from the Supervisor, and the
basic config only from a supervisee. When we're neither (which is likely a
mistake), we do nothing.
2022-05-25 13:56:23 -07:00