This is quite redundant with the enumeration for Broker ports, unfortunately,
but the logic is subtly different: every node obtains a telemetry port while
not every node requires a Broker port, for example, and the metrics port
assignment also cross-checks the selected Broker ports. I found that more
unified code actually ended up harder to read.
The logic for the two sets remains the same: starting from a given port, we
enumerate upward, skipping any ports that are already taken. The resulting
ports are assumed to be available; nothing actually checks their
availability -- for now.
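For illustration, the enumeration amounts to something like the following
sketch; the function and variable names here are hypothetical, not the
framework's actual code:

  # Return the next port at or above p that isn't already claimed.
  function next_free_port(p: port, taken: set[port]): port
      {
      while ( p in taken )
          p = count_to_port(port_to_count(p) + 1, tcp);
      return p;
      }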
The default start port is 9000. I considered 9090, to align with the
Prometheus default, but counting upward from there would likely collide with
the Broker default ports (9999/9997) used by the Supervisor. Counting downward
is a bit unnatural, and shifting the Broker default ports introduces subtle
ordering issues.
This also changes the node ordering logic slightly, since it seems more
intuitive to keep ports sequential on a given instance instead of striping
them across instances.
The controller now listens on an additional port, defaulting to 2149, for
Broker connections via websockets. Configuration works the same way as for the
existing, traditional Broker port (2150): via the ZEEK_CONTROLLER_WEBSOCKET_ADDR
and ZEEK_CONTROLLER_WEBSOCKET_PORT environment variables, as well as
corresponding redef'able constants.
To disable the websockets feature, leave ZEEK_CONTROLLER_WEBSOCKET_PORT unset
and redefine Management::Controller::default_port_websocket to 0/unknown.
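In script form, with ZEEK_CONTROLLER_WEBSOCKET_PORT left unset in the
environment, that amounts to:

  # Disable the controller's websocket listener entirely.
  redef Management::Controller::default_port_websocket = 0/unknown;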
The controller now knows three states that a cluster configuration can be in:
- STAGED: as uploaded by the client
- READY: with needed tweaks applied, e.g. to fill in ports
- DEPLOYED: as sent off to agents for deployment
These states aren't exclusive; they represent checkpoints that a configuration
passes through from upload to deployment. A deployed configuration will also
exist in its STAGED and READY versions, unless a client has since uploaded a
new configuration, which overwrites the STAGED and READY ones.
The controller saves all of these in a table, which lets us use Broker to
persist all states to disk. We use &broker_allow_complex_type, since we only
ever store entire configurations.
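As a rough sketch of that storage, assuming the framework's
Management::Configuration record type (the enum and table names here are
hypothetical, and the actual Broker backend wiring may differ):

  type ConfigState: enum { STAGED, READY, DEPLOYED };

  # One slot per checkpoint; Broker persists the table contents to disk.
  global g_configs: table[ConfigState] of Management::Configuration
      &backend=Broker::SQLITE &broker_allow_complex_type;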
Up to now, agents and controllers listened locally only, and the Supervisor
(which listens when we run an agent) listened globally. It's now the other way
around: controllers and agents listen globally and the Supervisor, when
listening, does so locally.
This enables the controller to assign listening ports to managers, loggers, and
proxies. (We don't currently make the workers listen.) The feature is controlled
by the Management::Controller::auto_assign_ports flag. When enabled (the
default), enumeration starts from Management::Controller::auto_assign_start_port,
beginning with the manager, then the logger(s), then the proxy(s). When the
feature is disabled and a node that requires a port lacks one, the controller
rejects the configuration.
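The relevant knobs, in sketch form (the names are the ones given above; the
values are illustrative, not necessarily the shipped defaults):

  # Keep auto-assignment enabled and pick the port to enumerate from.
  # Setting the flag to F instead requires explicit ports in uploaded
  # configurations.
  redef Management::Controller::auto_assign_ports = T;
  redef Management::Controller::auto_assign_start_port = 2200/tcp;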
This adds management/persistence.zeek to establish common configuration for
log rotation and persistent variable state. Log-writing Zeek processes
initially write locally in their working directory and rotate into the
"log-queue" subdirectory of the spool. Since the agent and controller have no
logger, persistence.zeek puts compatible configurations in place for them.
Storage folders for Broker-backed tables and clusterized stores default to
subdirectories of the new Zeek-level state folder.
When the ZEEK_MANAGEMENT_TESTING environment variable is set, persistent state
is kept in the local directory and log rotation remains disabled.
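A minimal sketch of that switch, using the logging framework's standard
rotation options; the actual redefs in persistence.zeek differ in detail:

  @if ( getenv("ZEEK_MANAGEMENT_TESTING") == "" )
  # Normal operation: rotate logs into the "log-queue" spool subdirectory.
  redef Log::default_rotation_dir = "log-queue";
  redef Log::default_rotation_interval = 1 hr;
  @else
  # Testing: leave log rotation disabled.
  redef Log::default_rotation_interval = 0 secs;
  @endif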
This also tweaks @loads a bit in favor of simply loading frameworks/management,
which is easier to keep track of.
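That is, dependent scripts can now simply say:

  @load frameworks/management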
The fallback mechanism used when no explicit agent/controller names are
configured didn't work properly, because many places in the code relied on
accessing the name via the variables meant for explicit configuration, such as
Management::Agent::name. The agent and controller now offer functions that
compute the correct effective name, and we use those throughout.
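Sketched out, such a helper might look as follows; the function name and the
fallback naming scheme are hypothetical, and the real implementations differ:

  # Prefer the explicitly configured name, otherwise derive a fallback.
  function effective_name(): string
      {
      if ( Management::Agent::name != "" )
          return Management::Agent::name;
      return fmt("agent-%s", gethostname());
      }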
- This gives the cluster controller and agent the common name "Management
framework" and moves the sources from "policy/frameworks/cluster" to
"policy/frameworks/management". This avoids ambiguity with the existing
cluster framework.
- It renames the "ClusterController" and "ClusterAgent" script modules to
"Management::Controller" and "Management::Agent", respectively. This allows us
to anchor tooling common to both controller and agent at the "Management"
module.
- It moves common configuration settings, logging, requests, types, and
utilities to the common "Management" module.
- It removes the explicit "::Types" submodule (so a request/response result is
now a Management::Result, not a Management::Types::Result), which makes
typenames more readable.
- It updates tests that depend on module naming and the full set of scripts.