Commit graph

18 commits

Author SHA1 Message Date
Christian Kreibich
83c60fd8ac Management framework: tune request timeout granularity and interval
When the controller relays requests to agents, we want agents to time out more
quickly than the corresponding controller requests. This allows agents to
respond with more meaningful errors, while the controller's timeout acts mostly
as a last resort to ensure a response to the client actually happens.

This dials down the table_expire_interval to 2 seconds in both agent and
controller, for more predictable timeout behavior. It also dials the agent-side
request expiration interval down to 5 seconds, compared to the agent's 10
seconds.

We may have to revisit this to allow custom expiration intervals per
request/response message type.
2022-05-31 12:55:21 -07:00
Christian Kreibich
c922f749c5 Management framework: a bit of debug-level logging for troubleshooting 2022-05-31 12:55:21 -07:00
Christian Kreibich
93ea03a081 Management framework: place each Zeek process in its own working dir
This establishes a directory "nodes" in Management::state_dir and places each
Zeek process into a subdirectory in it, named after the Zeek process. For
example, node "worker-01" runs with cwd <state_dir>/nodes/worker-01/.

Explicitly configured directories can override the naming logic, and also ignore
the state directory if they're absolute paths. One exception remains: the
Supervisor itself -- we'd have to use LogAscii::logdir to automatically place it
too in its own directory, but that feature currently does not interoperate with
log rotation.
2022-05-26 12:56:02 -07:00
Christian Kreibich
d1cd409e59 Management framework: set defaults for log rotation and persistent state
This adds management/persistence.zeek to establish common configuration for log
rotation and persistent variable state. Log-writing Zeek processes initially
write locally in their working directory, and rotate into subdirectory
"log-queue" of the spool. Since agent and controller have no logger,
persistence.zeek puts in place compatible configurations for them.

Storage folders for Broker-backed tables and clusterized stores default to
subdirectories of the new Zeek-level state folder.

When setting the ZEEK_MANAGEMENT_TESTING environment variable, persistent state
is kept in the local directory, and log rotation remains disabled.

This also tweaks @loads a bit in favor of simply loading frameworks/management,
which is easier to keep track of.
2022-05-26 12:55:10 -07:00
Christian Kreibich
e305d9c613 Management framework: establish stdout/stderr files also for cluster nodes 2022-05-25 13:56:23 -07:00
Christian Kreibich
b96a4276eb Management framework: move role variable from logging into framework-wide config
The role isn't just about logging, it can also act as a general indicator to key
in on in role-specific code elsewhere, such as @if.
2022-05-25 13:56:23 -07:00
Christian Kreibich
e78fdc39e4 Management framework: distinguish supervisor/supervisee when loading agent/controller
Load the agent/controller bootstrapping code only from the Supervisor, and the
basic config only from a supervisee. When we're neither (which is likely a
mistake), we do nothing.
2022-05-25 13:56:23 -07:00
Christian Kreibich
d40bb6e85f Management framework: simplify agent and controller stdout/stderr files
Moving to a model in which every Zeek process runs out of its own working
directory simplifies the handling of those files.
2022-05-25 13:56:23 -07:00
Christian Kreibich
bd6c1683a2 Management framework: comment and layouting tweaks, no functional change
Also remove additional instances of the term "data cluster".
2022-05-25 13:56:23 -07:00
Christian Kreibich
d4d6f10299 Management framework: rename env var that labels agents/controllers
Just a consistency tweak to avoid confusion with "cluster".
2022-05-25 13:56:23 -07:00
Christian Kreibich
d2903bb645 Management framework: increase robustness of agent/controller naming
The fallback mechanism when no explicit agent/controller names are configured
didn't work properly, because many places in the code relied on accessing the
name via the variables meant for explicit configuration, such as
Management::Agent::name. Agent and controller now offer functions for computing
the correct effective name, and we use that throughout.
2022-05-25 13:56:23 -07:00
Christian Kreibich
001de561fc Management framework: add get_configuration_request/response transaction
Includes submodule bumps for Broker (to pull in better handling of data
structures that are difficult to unserialize in Python), zeek-client (for the
get-config command), and a commit hash update for the external testsuite.
2022-05-05 16:09:21 -07:00
Christian Kreibich
b23d292410 Management framework: consistency fixes around event() vs Broker::publish()
Switch to using Broker::publish() for any event we only send to a peered entity,
and not to drive local processing.

Also minor indentation cleanup.
2022-04-26 23:23:58 -07:00
Christian Kreibich
7edd1a2651 Management framework: allow selecting cluster nodes in get_id_value
This adds an optional set of cluster node names to narrow the querying to. It
similarly expands the dispatch mechanism, since it likely most sense for any
such request to apply only to a subset of nodes.

Requests for invalid nodes trigger Response records in error state.
2022-04-18 12:38:54 -07:00
Christian Kreibich
497b2723d7 Management framework: add get_id_value dispatch
This adds support for retrieving the value of a global identifier from any
subset of cluster nodes. It relies on the lookup_ID() BiF to retrieve the val,
and to_json() to render the value to an easily parsed string. Ideally we'd send
the val directly, but this hits several roadblocks, including the fact that
Broker won't serialize arbitrary values.
2022-04-15 18:51:56 -07:00
Christian Kreibich
788348f9d6 Management framework: allow dispatching "actions" on cluster nodes.
This adds request/response event pairs to enable the controller to dispatch
"actions" (pre-implemented Zeek script actions) on subsets of Zeek cluster nodes
and collect the results. Using generic events to carry multiple such "run X on
the nodes" scenarios simplifies adding these in the future.
2022-04-15 18:51:56 -07:00
Christian Kreibich
337c7267e0 Management framework: allow agents to communicate with cluster nodes
This provides Broker-level plumbing that allows agents to reach out to their
managed Zeek nodes and collect responses.

As a first event, it establishes Management::Node::API::notify_agent_hello,
to notify the agent when the cluster node is ready to communicate.

Also a bit of comment rewording to replace use of "data cluster" with simply
"cluster", to avoid ambiguity with data nodes in SumStats, and expansion of
test-all-policy.zeek and related/dependent tests, since we're introducing new
scripts.
2022-04-15 18:51:54 -07:00
Christian Kreibich
54aaf3a623 Reorg of the cluster controller to new "Management framework" layout
- This gives the cluster controller and agent the common name "Management
framework" and changes the start directory of the sources from
"policy/frameworks/cluster" to "policy/frameworks/management". This avoids
ambiguity with the existing cluster framework.

- It renames the "ClusterController" and "ClusterAgent" script modules to
"Management::Controller" and "Management::Agent", respectively. This allows us
to anchor tooling common to both controller and agent at the "Management"
module.

- It moves common configuration settings, logging, requests, types, and
utilities to the common "Management" module.

- It removes the explicit "::Types" submodule (so a request/response result is
now a Management::Result, not a Management::Types::Result), which makes
typenames more readable.

- It updates tests that depend on module naming and full set of scripts.
2022-02-09 18:09:42 -08:00