Commit graph

21 commits

Author SHA1 Message Date
Christian Kreibich
c5778b7932 Merge branch 'topic/christian/management-restart'
* topic/christian/management-restart:
  Management framework: bump external cluster testsuite
  Management framework: bump zeek-client
  Management framework: edit pass over docstrings
  Management framework: node restart support
  Management framework: more consistent Supervisor interaction in the agent
  Management framework: log the controller's startup deployment attempt
  Management framework: bugfix for a get_id_value corner case
  Management framework: minor timeout bugfix
  Management framework: make "result" argument plural in multi-result response events

(cherry picked from commit 3287b8b793)
2022-06-23 13:19:35 -07:00
Christian Kreibich
708b0df091 Merge branch 'topic/christian/management-deploy'
* topic/christian/management-deploy: (21 commits)
  Management framework: bump external cluster testsuite
  Management framework: bump zeek-client
  Management framework: rename set_configuration events to stage_configuration
  Management framework: trigger deployment upon when instances are ready
  Management framework: more resilient node shutdown upon deployment
  Management framework: re-trigger deployment upon controller launch
  Management framework: move most deployment handling to internal function
  Management framework: distinguish internally and externally requested deployments
  Management framework: track instances by their Broker IDs
  Management framework: tweak Supervisor event logging
  Management framework: make helper function a local
  Management framework: rename "log_level" to "level"
  Management framework: add "finish" callback to requests
  Management framework: add a helper for rendering result vectors to a string
  Management framework: agents now skip re-deployment of current config
  Management framework: suppress notify_agent_hello upon Supervisor peering
  Management framework: introduce state machine for configs and persist them
  Management framework: introduce deployment API in controller
  Management framework: rename agent "set_configuration" to "deploy"
  Management framework: consistency fixes to the Result record
  ...

(cherry picked from commit 54f2f28047)
2022-06-23 13:18:05 -07:00
Christian Kreibich
b36f92b641 Merge branch 'topic/christian/management-config-validation'
* topic/christian/management-config-validation:
  Management framework: bump external cluster testsuite
  Management framework: bump zeek-client
  Management framework: add config validation
  Management framework: improvements to port auto-enumeration
  Management framework: control output-to-console in Supervisor
  Management framework: handle no-instances corner case in set-config correctly

(cherry picked from commit 4deacefa4c)
2022-06-23 13:15:50 -07:00
Christian Kreibich
1ed73ce742 Merge branch 'topic/christian/management-auto-assign-ports'
* topic/christian/management-auto-assign-ports:
  Management framework: bump zeek-client to pull in relaxed port handling
  Management framework: bump external cluster testsuite
  Management framework: also use send_set_configuration_response_error elsewhere
  Management framework: minor log formatting tweak, for consistency
  Management framework: support auto-assignment of ports in cluster nodes

(cherry picked from commit 763b0c8d10)
2022-06-23 13:12:10 -07:00
Christian Kreibich
c53044981a Management framework: improve address and port handling
The get-nodes command also benefits from showing the state on connected agents
more broadly (as opposed to just the one for the current configuration).

Also a bugfix: ensure we use an agent's IP address as seen by the
controller. This avoids reporting "0.0.0.0" in some cases.
2022-06-03 02:14:07 -07:00
Christian Kreibich
0c47d45bb9 Management framework: broaden get_instances response data to connected instances
This response so far contained only the connected instances that are relevant to
the current configuration, but this isn't very helpful when troubleshooting
instance connectivity. It now reports all currently connected instances, with
network addresses & ports as known to Broker.
2022-06-03 02:13:30 -07:00
Christian Kreibich
72acf24f52 Management framework: expand notify_agent_hello event arguments
This swaps the host event argument for the Broker ID. The latter is more useful,
since the sending agent doesn't necessarily know its IP address as visible to
the controller, and the controller can pull up the full Broker context via the
ID.

It also adds an explicit argument to the event to indicate whether the agent
connected to the controller or vice versa. This simplifies the controller's
internal logic.

Also minor tweaks to logging to show Broker IDs.
2022-06-03 02:12:19 -07:00
Christian Kreibich
aa689807fa Management framework: comment-only tweaks and typo fixes 2022-06-03 02:12:12 -07:00
Christian Kreibich
f10b94de39 Management framework: enable stdout/stderr reporting
This uses the new frameworks/management/supervisor functionality to maintain
stdout/stderr files, and hooks output context into set_configuration error
results.
2022-05-31 12:55:21 -07:00
Christian Kreibich
49b9f1669c Management framework: move to ResultVec in agent's set_configuration response
We so far reported one result record per agent, which made it hard to report
per-node outcomes for the new configuration. Agents now report one result record
per node they're responsible for.
2022-05-31 12:55:21 -07:00
Christian Kreibich
83c60fd8ac Management framework: tune request timeout granularity and interval
When the controller relays requests to agents, we want agents to time out more
quickly than the corresponding controller requests. This allows agents to
respond with more meaningful errors, while the controller's timeout acts mostly
as a last resort to ensure a response to the client actually happens.

This dials down the table_expire_interval to 2 seconds in both agent and
controller, for more predictable timeout behavior. It also dials the agent-side
request expiration interval down to 5 seconds, compared to the agent's 10
seconds.

We may have to revisit this to allow custom expiration intervals per
request/response message type.
2022-05-31 12:55:21 -07:00
Christian Kreibich
c922f749c5 Management framework: a bit of debug-level logging for troubleshooting 2022-05-31 12:55:21 -07:00
Christian Kreibich
b96a4276eb Management framework: move role variable from logging into framework-wide config
The role isn't just about logging, it can also act as a general indicator to key
in on in role-specific code elsewhere, such as @if.
2022-05-25 13:56:23 -07:00
Christian Kreibich
bd6c1683a2 Management framework: comment and layouting tweaks, no functional change
Also remove additional instances of the term "data cluster".
2022-05-25 13:56:23 -07:00
Christian Kreibich
001de561fc Management framework: add get_configuration_request/response transaction
Includes submodule bumps for Broker (to pull in better handling of data
structures that are difficult to unserialize in Python), zeek-client (for the
get-config command), and a commit hash update for the external testsuite.
2022-05-05 16:09:21 -07:00
Christian Kreibich
b23d292410 Management framework: consistency fixes around event() vs Broker::publish()
Switch to using Broker::publish() for any event we only send to a peered entity,
and not to drive local processing.

Also minor indentation cleanup.
2022-04-26 23:23:58 -07:00
Christian Kreibich
7edd1a2651 Management framework: allow selecting cluster nodes in get_id_value
This adds an optional set of cluster node names to narrow the querying to. It
similarly expands the dispatch mechanism, since it likely most sense for any
such request to apply only to a subset of nodes.

Requests for invalid nodes trigger Response records in error state.
2022-04-18 12:38:54 -07:00
Christian Kreibich
497b2723d7 Management framework: add get_id_value dispatch
This adds support for retrieving the value of a global identifier from any
subset of cluster nodes. It relies on the lookup_ID() BiF to retrieve the val,
and to_json() to render the value to an easily parsed string. Ideally we'd send
the val directly, but this hits several roadblocks, including the fact that
Broker won't serialize arbitrary values.
2022-04-15 18:51:56 -07:00
Christian Kreibich
788348f9d6 Management framework: allow dispatching "actions" on cluster nodes.
This adds request/response event pairs to enable the controller to dispatch
"actions" (pre-implemented Zeek script actions) on subsets of Zeek cluster nodes
and collect the results. Using generic events to carry multiple such "run X on
the nodes" scenarios simplifies adding these in the future.
2022-04-15 18:51:56 -07:00
Christian Kreibich
337c7267e0 Management framework: allow agents to communicate with cluster nodes
This provides Broker-level plumbing that allows agents to reach out to their
managed Zeek nodes and collect responses.

As a first event, it establishes Management::Node::API::notify_agent_hello,
to notify the agent when the cluster node is ready to communicate.

Also a bit of comment rewording to replace use of "data cluster" with simply
"cluster", to avoid ambiguity with data nodes in SumStats, and expansion of
test-all-policy.zeek and related/dependent tests, since we're introducing new
scripts.
2022-04-15 18:51:54 -07:00
Christian Kreibich
54aaf3a623 Reorg of the cluster controller to new "Management framework" layout
- This gives the cluster controller and agent the common name "Management
framework" and changes the start directory of the sources from
"policy/frameworks/cluster" to "policy/frameworks/management". This avoids
ambiguity with the existing cluster framework.

- It renames the "ClusterController" and "ClusterAgent" script modules to
"Management::Controller" and "Management::Agent", respectively. This allows us
to anchor tooling common to both controller and agent at the "Management"
module.

- It moves common configuration settings, logging, requests, types, and
utilities to the common "Management" module.

- It removes the explicit "::Types" submodule (so a request/response result is
now a Management::Result, not a Management::Types::Result), which makes
typenames more readable.

- It updates tests that depend on module naming and full set of scripts.
2022-02-09 18:09:42 -08:00
Renamed from scripts/policy/frameworks/cluster/controller/main.zeek (Browse further)