* topic/christian/management-verify-nodestarts:
Management framework: bump external cluster testsuite
Management framework: bump zeek-client to pull in set-config rendering
Management framework: enable stdout/stderr reporting
Management framework: Supervisor extensions for stdout/stderr handling
Management framework: disambiguate redef field names in agent and controller
Management framework: move to ResultVec in agent's set_configuration response
Management framework: tune request timeout granularity and interval
Management framework: verify node starts when deploying a configuration
Management framework: a bit of debug-level logging for troubleshooting
This uses the new frameworks/management/supervisor functionality to maintain
stdout/stderr files, and hooks output context into set_configuration error
results.
This improves the framework's handling of Zeek node stdout and stderr by
extending the (script-layer) Supervisor functionality.
- The Supervisor _either_ directs Zeek nodes' stdout/stderr to files _or_ lets
you hook into it at the script level. We'd like both: files make sense to allow
inspection outside of the framework, and the framework would benefit from
tapping into the streams e.g. for error context. We now provide the file
redirection functionality in the Supervisor, in addition to the hook
mechanism. The hook mechanism also builds up rolling windows of up to
100 lines (configurable) of recent stdout/stderr output (see the sketch
after this list).
- The new Management::Supervisor::API::notify_node_exit event notifies
subscribers (agents, really) that a particular node has exited (and is possibly
being restarted by the Supervisor). The event includes the name of the node,
plus its recent stdout/stderr context.
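Both pieces can be seen in the following minimal sketch; the window-size option
name, the event's parameter list and the NodeOutputs field names are assumptions
for illustration, not the framework's confirmed API.

    # Sketch only; option, parameter and field names below are assumed.
    redef Management::Supervisor::output_max_lines = 200;  # enlarge the rolling output window

    event Management::Supervisor::API::notify_node_exit(node: string,
        outputs: Management::NodeOutputs)
        {
        # React to a node exiting, e.g. by surfacing its recent stderr lines.
        print fmt("node %s exited; recent stderr follows:", node);
        print outputs$stderr;
        }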
During Zeekygen's doc generation both the agent's and the controller's main.zeek
get loaded. So far this happened not to throw errors only because the redefs
either matched perfectly or used different field names.
So far we reported one result record per agent, which made it hard to convey
per-node outcomes for the new configuration. Agents now report one result record
per node they're responsible for.
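For illustration, assembling the per-node results might look roughly as follows;
the Result/ResultVec type and field names are assumptions based on the
description above, not verified against the framework's code.

    # Illustrative sketch; Result fields and the ResultVec type are assumed.
    event zeek_init()
        {
        local nodes = set("manager", "worker-01");
        local res: Management::ResultVec = vector();

        for ( n in nodes )
            {
            # One result record per node the agent is responsible for.
            res += Management::Result($reqid="example-request-id", $node=n);
            }

        print fmt("reporting %d per-node results", |res|);
        }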
When the controller relays requests to agents, we want agents to time out more
quickly than the corresponding controller requests. This allows agents to
respond with more meaningful errors, while the controller's timeout acts mostly
as a last resort to ensure a response to the client actually happens.
This dials down the table_expire_interval to 2 seconds in both agent and
controller, for more predictable timeout behavior. It also dials the agent-side
request expiration interval down to 5 seconds, compared to the controller's 10
seconds.
We may have to revisit this to allow custom expiration intervals per
request/response message type.
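For illustration, the knobs involved look roughly like this. table_expire_interval
is a standard Zeek tunable; the request-timeout option shown below is an assumed
name standing in for the agent-side setting.

    # table_expire_interval is a core Zeek option; the request timeout below
    # uses an assumed option name for the agent-side expiration interval.
    redef table_expire_interval = 2 sec;
    redef Management::Request::timeout_interval = 5 sec;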
So far we simply hoped for the best when an agent asked the Supervisor to launch a
node. Since the Management::Node::API::notify_node_hello events arriving from
new nodes signal when such nodes are up and running, we can use those events to
track whether and when all launched nodes have checked in, and respond accordingly.
This delays the set_configuration_response event until these check-ins have
occurred or a timeout kicks in. In case of error, the agent's response to the
controller is in error state and has the remaining, unresponsive/failed set of
nodes as its data member.
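The bookkeeping boils down to something like the sketch below. It assumes the
notify_node_hello event carries the node's name; the pending set, the timeout
event and the example node names are made up for illustration, and the actual
agent code differs in detail.

    # Minimal sketch of the check-in tracking; names other than
    # notify_node_hello are illustrative.
    global nodes_pending: set[string] = { "manager", "worker-01" };
    global deploy_timeout: event();

    event Management::Node::API::notify_node_hello(node: string)
        {
        # A launched node has checked in: stop waiting for it.
        delete nodes_pending[node];

        if ( |nodes_pending| == 0 )
            print "all nodes checked in; send a successful set_configuration response";
        }

    event deploy_timeout()
        {
        # Any node still pending becomes the error payload of the response.
        if ( |nodes_pending| > 0 )
            print fmt("timeout; unresponsive/failed nodes: %s", nodes_pending);
        }

    event zeek_init()
        {
        schedule 15 sec { deploy_timeout() };
        }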
* origin/topic/vern/find-unused:
Update spicy-plugin with change that checks for zeek version
deprecation messages for unused base script functions
clearer messages for warning about unused functions
Fixes from review, post-rebase
code formatting and more btest updates
baseline & btest updates
annotate orphan base script components with &deprecated
annotate base scripts with &is_used as needed
--no-usage-warnings flag to suppress analysis
support for associating &is_used attributes with functions
classes for evaluating function/hook/event usage
broader support for AST traversal, including Attr and Attributes objects
include attributes in descriptions of sets and tables
low-level tidying
The Supervisor generates the new Supervisor::node_status event every time it
receives a status update from the stem, meaning a node got created or
re-created. A corresponding SupervisorControl::node_status event relays the
same information to users interacting with the Supervisor over Broker.
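A handler for the new event might look as follows; the parameter list shown here
(node name plus the new process ID) is an assumption.

    # Sketch only; the event's parameters are assumed.
    event Supervisor::node_status(node: string, pid: count)
        {
        # Fires in the Supervisor whenever the stem (re-)creates a node.
        print fmt("node %s now running as PID %d", node, pid);
        }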
* topic/christian/management-cluster-dirs:
Management framework: bump zeek-client to pull in instance serialization fixes
Management framework: bump external cluster testsuite
Management framework: update agent-checkin test to reflect recent changes
Management framework: place each Zeek process in its own working dir
Management framework: set defaults for log rotation and persistent state
Management framework: add spool and state directory config settings
Management framework: establish stdout/stderr files also for cluster nodes
Management framework: default to having agents check in with the (local) controller
Management framework: move role variable from logging into framework-wide config
Management framework: distinguish supervisor/supervisee when loading agent/controller
Management framework: simplify agent and controller stdout/stderr files
Management framework: prefix the management logs with "management-"
Management framework: comment and layout tweaks, no functional change
Management framework: rename env var that labels agents/controllers
Management framework: increase robustness of agent/controller naming
This establishes a directory "nodes" in Management::state_dir and places each
Zeek process into a subdirectory in it, named after the Zeek process. For
example, node "worker-01" runs with cwd <state_dir>/nodes/worker-01/.
Explicitly configured directories override this naming logic and, when they are
absolute paths, ignore the state directory entirely. One exception remains: the
Supervisor itself -- we'd have to use LogAscii::logdir to automatically place it
too in its own directory, but that feature currently does not interoperate with
log rotation.
This adds management/persistence.zeek to establish common configuration for log
rotation and persistent variable state. Log-writing Zeek processes initially
write locally in their working directory, and rotate into subdirectory
"log-queue" of the spool. Since agent and controller have no logger,
persistence.zeek puts in place compatible configurations for them.
Storage folders for Broker-backed tables and clusterized stores default to
subdirectories of the new Zeek-level state folder.
When setting the ZEEK_MANAGEMENT_TESTING environment variable, persistent state
is kept in the local directory, and log rotation remains disabled.
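A sketch of the knobs involved; the directory values are invented, and the spool
option name is assumed to mirror Management::state_dir.

    # Example paths only.
    redef Management::spool_dir = "/var/spool/zeek";  # logs rotate into <spool_dir>/log-queue/
    redef Management::state_dir = "/var/lib/zeek";    # Broker-backed table/store state lives below this
    # Setting ZEEK_MANAGEMENT_TESTING in the environment instead keeps state
    # in the local directory and leaves log rotation disabled.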
This also tweaks @loads a bit in favor of simply loading frameworks/management,
which is easier to keep track of.