Commit graph

901 commits

Author SHA1 Message Date
Christian Kreibich
f10b94de39 Management framework: enable stdout/stderr reporting
This uses the new frameworks/management/supervisor functionality to maintain
stdout/stderr files, and hooks output context into set_configuration error
results.
2022-05-31 12:55:21 -07:00
Christian Kreibich
24a495da42 Management framework: Supervisor extensions for stdout/stderr handling
This improves the framework's handling of Zeek node stdout and stderr by
extending the (script-layer) Supervisor functionality.

- The Supervisor _either_ directs Zeek nodes' stdout/stderr to files _or_ lets
you hook into it at the script level. We'd like both: files make sense to allow
inspection outside of the framework, and the framework would benefit from
tapping into the streams, e.g. for error context. We now provide the file
redirection functionality in the Supervisor, in addition to the hook
mechanism. The hook mechanism also builds up rolling windows of up to
100 lines (configurable) of stdout/stderr output.

- The new Management::Supervisor::API::notify_node_exit event notifies
subscribers (agents, really) that a particular node has exited (and is possibly
being restarted by the Supervisor). The event includes the name of the node,
plus its recent stdout/stderr context.
2022-05-31 12:55:21 -07:00
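The rolling window described above behaves like a bounded queue: once it holds the configured number of lines, each new line evicts the oldest. A minimal Python sketch of that idea follows; the class name RollingOutput is hypothetical, it is not the Supervisor's Zeek-script code, and it assumes output arrives in chunks that end on line boundaries.

```python
from collections import deque


class RollingOutput:
    """Keep only the most recent lines of a node's output stream (hypothetical helper)."""

    def __init__(self, max_lines: int = 100):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self.lines = deque(maxlen=max_lines)

    def feed(self, chunk: str) -> None:
        # Assumes each chunk ends on a line boundary; a real implementation
        # would have to buffer partial lines across chunks.
        for line in chunk.splitlines():
            self.lines.append(line)

    def context(self) -> str:
        # The recent context, e.g. for attaching to a node-exit notification.
        return "\n".join(self.lines)


if __name__ == "__main__":
    out = RollingOutput(max_lines=3)
    out.feed("a\nb\nc\nd\n")
    print(out.context())  # prints b, c and d on separate lines
```

A bounded deque keeps memory use constant per node no matter how chatty the process is, which matters when the context only needs to be attached to occasional exit or error notifications.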
Christian Kreibich
f74f21767a Management framework: disambiguate redef field names in agent and controller
During Zeekygen's doc generation both the agent's and controller's main.zeek get
loaded. This has not caused errors so far only because the redefs either
matched perfectly or used different field names.
2022-05-31 12:55:21 -07:00
Christian Kreibich
49b9f1669c Management framework: move to ResultVec in agent's set_configuration response
So far we reported one result record per agent, which made it hard to report
per-node outcomes for the new configuration. Agents now report one result record
per node they're responsible for.
2022-05-31 12:55:21 -07:00
Christian Kreibich
83c60fd8ac Management framework: tune request timeout granularity and interval
When the controller relays requests to agents, we want agents to time out more
quickly than the corresponding controller requests. This allows agents to
respond with more meaningful errors, while the controller's timeout acts mostly
as a last resort to ensure a response to the client actually happens.

This dials down the table_expire_interval to 2 seconds in both agent and
controller, for more predictable timeout behavior. It also dials the agent-side
request expiration interval down to 5 seconds, compared to the controller's 10
seconds.

We may have to revisit this to allow custom expiration intervals per
request/response message type.
2022-05-31 12:55:21 -07:00
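The reason a short table_expire_interval helps can be checked with a little arithmetic. The Python sketch below uses a simplified model (an assumption, not Zeek's exact expiration logic, which also involves settings such as table_expire_delay): a request entry becomes eligible for expiration at its timeout but is only removed at the next expiration sweep, so it expires somewhere between the timeout and the timeout plus one sweep interval.

```python
# Values taken from the commit message; the sweep model itself is a simplification.
TABLE_EXPIRE_INTERVAL = 2.0  # seconds between expiration sweeps
AGENT_TIMEOUT = 5.0          # agent-side request expiration
CONTROLLER_TIMEOUT = 10.0    # controller-side request expiration


def expiry_window(timeout: float, sweep: float = TABLE_EXPIRE_INTERVAL) -> tuple:
    """Earliest and latest time a request entry can expire.

    The entry becomes eligible at `timeout`, but is only removed at the next
    sweep, which may come up to `sweep` seconds later.
    """
    return (timeout, timeout + sweep)


agent_earliest, agent_latest = expiry_window(AGENT_TIMEOUT)
ctrl_earliest, ctrl_latest = expiry_window(CONTROLLER_TIMEOUT)

# The agent's latest possible timeout still precedes the controller's earliest,
# so the agent gets to send its (more specific) error response first.
print(agent_latest < ctrl_earliest)  # True
```

With a 10-second sweep interval instead, the same model gives windows of 5 to 15 seconds and 10 to 20 seconds, which overlap, so the controller could time out before the agent responds.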
Christian Kreibich
4371c17d4c Management framework: verify node starts when deploying a configuration
So far we hoped for the best when an agent asked the Supervisor to launch a
node. Since the Management::Node::API::notify_node_hello events arriving from
new nodes signal when such nodes are up and running, we can use those events to
track whether and when all launched nodes have checked in, and respond accordingly.

This delays the set_configuration_response event until these check-ins have
occurred, or a timeout kicks in. In case of error, the agent's response to the
controller is in error state and has the remaining, unresponsive/failed set of
nodes as its data member.
2022-05-31 12:55:21 -07:00
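The check-in tracking described above amounts to a set of launched-but-silent nodes that shrinks with each hello event; whatever remains when the timeout fires becomes the error response's data. A minimal Python model of that logic follows; NodeStartTracker and its methods are hypothetical names, not the agent's actual Zeek-script code.

```python
from typing import Iterable, Optional, Set, Tuple


class NodeStartTracker:
    """Wait for launched nodes to check in (hypothetical model, not the Zeek script)."""

    def __init__(self, launched: Iterable[str]):
        # Nodes we asked the Supervisor to launch and have not yet heard from.
        self.waiting: Set[str] = set(launched)

    def on_node_hello(self, node: str) -> bool:
        """Record a node's hello; True once every launched node has checked in."""
        self.waiting.discard(node)
        return not self.waiting

    def on_timeout(self) -> Tuple[bool, Optional[Set[str]]]:
        """Return (success, data) for the set_configuration response.

        On failure, data carries the nodes that never checked in.
        """
        if self.waiting:
            return False, set(self.waiting)
        return True, None
```

The response can be sent as soon as on_node_hello() returns True; on_timeout() only matters when some node never reports in.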
Christian Kreibich
c922f749c5 Management framework: a bit of debug-level logging for troubleshooting 2022-05-31 12:55:21 -07:00
Christian Kreibich
93ea03a081 Management framework: place each Zeek process in its own working dir
This establishes a directory "nodes" in Management::state_dir and places each
Zeek process into a subdirectory in it, named after the Zeek process. For
example, node "worker-01" runs with cwd <state_dir>/nodes/worker-01/.

Explicitly configured directories can override the naming logic, and also ignore
the state directory if they're absolute paths. One exception remains: the
Supervisor itself: we'd have to use LogAscii::logdir to automatically place it
in its own directory too, but that feature currently does not interoperate with
log rotation.
2022-05-26 12:56:02 -07:00
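The directory layout above can be expressed as a small path computation. The Python sketch below is illustrative only: node_cwd is a hypothetical function, /var/lib/zeek is a made-up state directory, and resolving relative overrides under <state_dir>/nodes/ is an assumption, since the message only says that absolute paths bypass the state directory.

```python
from pathlib import PurePosixPath
from typing import Optional


def node_cwd(state_dir: str, node: str, configured: Optional[str] = None) -> PurePosixPath:
    """Working directory for a Zeek node under the naming scheme described above."""
    base = PurePosixPath(state_dir) / "nodes"
    if configured:
        override = PurePosixPath(configured)
        # Absolute paths bypass the state directory entirely.
        if override.is_absolute():
            return override
        # Assumption: relative overrides are resolved under <state_dir>/nodes/.
        return base / override
    # Default: one subdirectory per node, named after the node.
    return base / node


if __name__ == "__main__":
    print(node_cwd("/var/lib/zeek", "worker-01"))  # /var/lib/zeek/nodes/worker-01
```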
Christian Kreibich
d1cd409e59 Management framework: set defaults for log rotation and persistent state
This adds management/persistence.zeek to establish common configuration for log
rotation and persistent variable state. Log-writing Zeek processes initially
write locally in their working directory, and rotate into subdirectory
"log-queue" of the spool. Since agent and controller have no logger,
persistence.zeek puts in place compatible configurations for them.

Storage folders for Broker-backed tables and clusterized stores default to
subdirectories of the new Zeek-level state folder.

When setting the ZEEK_MANAGEMENT_TESTING environment variable, persistent state
is kept in the local directory, and log rotation remains disabled.

This also tweaks @loads a bit in favor of simply loading frameworks/management,
which is easier to keep track of.
2022-05-26 12:55:10 -07:00
Christian Kreibich
7708cbe500 Management framework: add spool and state directory config settings
This allows specifying spool and variable-state directories specifically for the
management framework. They default to the corresponding installation-level
folders.
2022-05-25 13:56:23 -07:00
Christian Kreibich
e305d9c613 Management framework: establish stdout/stderr files also for cluster nodes 2022-05-25 13:56:23 -07:00
Christian Kreibich
da016b8a68 Management framework: default to having agents check in with the (local) controller
This allows single-machine settings to work out of the box when agent and
controller are loaded in Supervisor mode.
2022-05-25 13:56:23 -07:00
Christian Kreibich
b96a4276eb Management framework: move role variable from logging into framework-wide config
The role isn't just about logging; it can also serve as a general indicator for
role-specific code elsewhere, such as in @if directives.
2022-05-25 13:56:23 -07:00
Christian Kreibich
e78fdc39e4 Management framework: distinguish supervisor/supervisee when loading agent/controller
Load the agent/controller bootstrapping code only from the Supervisor, and the
basic config only from a supervisee. When we're neither (which is likely a
mistake), we do nothing.
2022-05-25 13:56:23 -07:00
Christian Kreibich
d40bb6e85f Management framework: simplify agent and controller stdout/stderr files
Moving to a model in which every Zeek process runs out of its own working
directory simplifies the handling of those files.
2022-05-25 13:56:23 -07:00
Christian Kreibich
f8f7fd97e8 Management framework: prefix the management logs with "management-"
These were still using "cluster-", a leftover from earlier days of the
framework.
2022-05-25 13:56:23 -07:00
Christian Kreibich
bd6c1683a2 Management framework: comment and layouting tweaks, no functional change
Also remove additional instances of the term "data cluster".
2022-05-25 13:56:23 -07:00
Christian Kreibich
d4d6f10299 Management framework: rename env var that labels agents/controllers
Just a consistency tweak to avoid confusion with "cluster".
2022-05-25 13:56:23 -07:00
Christian Kreibich
d2903bb645 Management framework: increase robustness of agent/controller naming
The fallback mechanism when no explicit agent/controller names are configured
didn't work properly, because many places in the code relied on accessing the
name via the variables meant for explicit configuration, such as
Management::Agent::name. Agent and controller now offer functions for computing
the correct effective name, and we use that throughout.
2022-05-25 13:56:23 -07:00
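The fix above boils down to routing every use of the name through one function instead of reading the configuration variable directly. A minimal Python sketch of that pattern follows; effective_agent_name is a hypothetical function, and the hostname-based fallback is an assumption for illustration, since the message does not say what the fallback name is.

```python
import socket


def effective_agent_name(configured: str = "") -> str:
    """Name an agent should use, whether or not one was configured explicitly."""
    # An explicitly configured name always wins.
    if configured:
        return configured
    # Hypothetical fallback: derive a name from the local host name.
    return f"agent-{socket.gethostname()}"
```

Because callers never touch the raw setting, an empty configuration value can no longer leak into places that expect a usable name.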
Johanna Amann
5f04f216bc Include certificate information in SSL::Weak_Key notice 2022-05-11 18:56:04 +01:00
Christian Kreibich
001de561fc Management framework: add get_configuration_request/response transaction
Includes submodule bumps for Broker (to pull in better handling of data
structures that are difficult to unserialize in Python), zeek-client (for the
get-config command), and a commit hash update for the external testsuite.
2022-05-05 16:09:21 -07:00
Christian Kreibich
b23d292410 Management framework: consistency fixes around event() vs Broker::publish()
Switch to using Broker::publish() for any event we only send to a peered entity,
and not to drive local processing.

Also minor indentation cleanup.
2022-04-26 23:23:58 -07:00
Christian Kreibich
7edd1a2651 Management framework: allow selecting cluster nodes in get_id_value
This adds an optional set of cluster node names to narrow the querying to. It
similarly expands the dispatch mechanism, since it likely makes most sense for any
such request to apply only to a subset of nodes.

Requests for invalid nodes trigger Response records in error state.
2022-04-18 12:38:54 -07:00
Christian Kreibich
438cd9b9f7 Management framework: minor tweaks to logging component
Use an enum with explicitly assigned values since we rely on enum_to_int() to
reason about log levels, and bump the default level from DEBUG to INFO.
2022-04-18 12:38:20 -07:00
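Explicit enum values matter when code compares levels numerically, as the commit does via enum_to_int(). The Python sketch below shows the same idea with IntEnum; the numeric values and the WARNING/ERROR members are hypothetical, and this is not the Management framework's logging code.

```python
from enum import IntEnum


class Level(IntEnum):
    # Explicit values (hypothetical numbers) tie the ordering to severity
    # rather than to declaration order, so numeric comparison is safe.
    DEBUG = 10
    INFO = 20
    WARNING = 30
    ERROR = 40


DEFAULT_LEVEL = Level.INFO  # the new default, per the commit message


def should_log(level: Level, threshold: Level = DEFAULT_LEVEL) -> bool:
    # Mirrors comparing enum_to_int() results in Zeek script.
    return int(level) >= int(threshold)
```

With INFO as the default threshold, DEBUG messages are suppressed unless the threshold is explicitly lowered.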
Christian Kreibich
fcef7f4925 Management framework: improve handling of node run states
When agents receive a configuration, we don't currently honor requested run
states (there's no such thing as registering a node but not running it, for
example). To reflect this, we now start off nodes in state PENDING as we
launch them via the Supervisor, and move them to RUNNING when they check
in with us via Management::Node::API::notify_node_hello.
2022-04-15 18:51:56 -07:00
Christian Kreibich
497b2723d7 Management framework: add get_id_value dispatch
This adds support for retrieving the value of a global identifier from any
subset of cluster nodes. It relies on the lookup_ID() BiF to retrieve the val,
and to_json() to render the value to an easily parsed string. Ideally we'd send
the val directly, but this hits several roadblocks, including the fact that
Broker won't serialize arbitrary values.
2022-04-15 18:51:56 -07:00
Christian Kreibich
788348f9d6 Management framework: allow dispatching "actions" on cluster nodes.
This adds request/response event pairs to enable the controller to dispatch
"actions" (pre-implemented Zeek script actions) on subsets of Zeek cluster nodes
and collect the results. Using generic events to carry multiple such "run X on
the nodes" scenarios simplifies adding these in the future.
2022-04-15 18:51:56 -07:00
Christian Kreibich
0020cc4af0 Management framework: some renaming to avoid the term "data cluster" 2022-04-15 18:51:56 -07:00
Christian Kreibich
337c7267e0 Management framework: allow agents to communicate with cluster nodes
This provides Broker-level plumbing that allows agents to reach out to their
managed Zeek nodes and collect responses.

As a first event, it establishes Management::Node::API::notify_node_hello,
to notify the agent when the cluster node is ready to communicate.

Also a bit of comment rewording to replace use of "data cluster" with simply
"cluster", to avoid ambiguity with data nodes in SumStats, and expansion of
test-all-policy.zeek and related/dependent tests, since we're introducing new
scripts.
2022-04-15 18:51:54 -07:00
Johanna Amann
d38923cfcf Merge remote-tracking branch 'origin/topic/johanna/tls12-decryption'
Documentation is missing and will be added in the next couple of hours.

* origin/topic/johanna/tls12-decryption: (24 commits)
  TLS decryption: add test, fix small issues
  Address PR feedback
  TLS decryption: refactoring, more comments, less bare pointers
  Small code fix and test baseline update.
  SSL decryption: refactor TLS12_PRF
  SSL decryption: small style changes, a bit of documentation
  Deprecation and warning fixes
  Clang-format updates
  add missing call to EVP_KDF_CTX_set_params
  TLS decryption: remove payload from ssl_encrypted_data again.
  TLS 1.2 decryption: adapt OpenSSL 3.0 changes for 1.1
  ssl: adapt TLS-PRF to openSSL 3.0
  ssl/analyzer: potentially fix memory leaks caused by bytestrings
  analyzer/ssl: several improvements
  analyzer/ssl: defensive key length check + more debug logging
  testing: feature gate ssl/decryption test
  testing: add ssl/decryption test
  analyzer/ssl: handle missing <openssl/kdf.h>
  analyzer/ssl: silence warning in DTLS analyzer
  analyzer/ssl: move proc-{client,server}-hello into the respective analyzers
  ...
2022-03-02 08:20:39 +00:00
Johanna Amann
590d4aa13e TLS decryption: add test, fix small issues
Add a test loading keys from an external file. Make some debug messages
slightly better and remove unnecessary debug output.
2022-03-01 17:45:11 +00:00
Johanna Amann
1c9ea09d9f Address PR feedback
This addresses feedback to GH-1814. The most significant change is the
fact that the CiphertextRecord can now remain &transient, which might
lead to improved speed.
2022-02-23 11:31:21 +00:00
Christian Kreibich
54aaf3a623 Reorg of the cluster controller to new "Management framework" layout
- This gives the cluster controller and agent the common name "Management
framework" and changes the start directory of the sources from
"policy/frameworks/cluster" to "policy/frameworks/management". This avoids
ambiguity with the existing cluster framework.

- It renames the "ClusterController" and "ClusterAgent" script modules to
"Management::Controller" and "Management::Agent", respectively. This allows us
to anchor tooling common to both controller and agent at the "Management"
module.

- It moves common configuration settings, logging, requests, types, and
utilities to the common "Management" module.

- It removes the explicit "::Types" submodule (so a request/response result is
now a Management::Result, not a Management::Types::Result), which makes
typenames more readable.

- It updates tests that depend on module naming and the full set of scripts.
2022-02-09 18:09:42 -08:00
Christian Kreibich
3e0a86e3b3 Updates to the cluster controller scripts to fix the docs build
Mostly trivial changes, except for one aspect: if a module exports a record type
and that record bears Zeekygen comments, then redefs that add to the record in
another module cannot be private to that module. Zeekygen will complain with
"unknown target" errors, even when such redefs have Zeekygen comments. So this
commit also adds two export blocks that aren't technically required at this point.
2022-02-09 12:28:47 -08:00
Christian Kreibich
9a7d5c986e Merge branch 'topic/christian/cluster-controller-get-nodes'
* topic/christian/cluster-controller-get-nodes:
  Bump external cluster testsuite
  Bump zeek-client for the get-nodes command
  Add ClusterController::API::get_nodes_request/response event pair
  Support optional listening ports for cluster nodes
  Don't auto-publish Supervisor response events in the cluster agent
  Make members of the ClusterController::Types::State enum all-caps
  Be more conservative with triggering request timeout events
  Move redefs of ClusterController::Request::Request to their places of use
  Simplify ClusterController::API::set_configuration_request/response
2022-02-03 13:19:34 -08:00
Christian Kreibich
7db8634c8b Add ClusterController::API::get_nodes_request/response event pair
This allows querying the status of Zeek nodes currently running in a cluster.
The controller relays the request to all instances and accumulates their
responses.

The response back to the client contains one Result record per instance
response, each of which carries a ClusterController::Types::NodeState vector in
its $data member to convey the state of each node at that instance.

The NodeState record tracks the name of the node, its role in the controller (if
any), its role in the data cluster (if any), as well as PID and listening port,
if any.
2022-02-02 22:59:22 -08:00
Christian Kreibich
791e5545b1 Support optional listening ports for cluster nodes
This makes cluster node listening ports &optional, and maps absent values to
0/unknown, the value the cluster framework currently uses to indicate that
listening isn't desired.
2022-02-02 16:10:46 -08:00
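Making the port optional moves the "no listening" decision to one mapping step: absent means 0/unknown. A minimal Python sketch of that mapping follows; the Port class and the NO_LISTEN and effective_port names are hypothetical stand-ins for Zeek's port type and script logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    number: int
    proto: str  # "tcp", "udp", "icmp", or "unknown"

    def __str__(self) -> str:
        return f"{self.number}/{self.proto}"


# The "don't listen" value the cluster framework uses, per the commit message.
NO_LISTEN = Port(0, "unknown")


def effective_port(p: Optional[Port]) -> Port:
    """Map an absent (optional) listening port to 0/unknown."""
    return p if p is not None else NO_LISTEN
```

Keeping the sentinel in one place means configurations can simply omit the port for nodes that should not listen.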
Johanna Amann
95f1565498 Match DPD TLS signature on one-sided connections.
This commit changes DPD matching for TLS connections. A one-sided match
is enough to enable DPD now.

This commit also removes DPD for SSLv2 connections. SSLv2 connections
basically no longer happen in the wild. SSLv2 is also really finicky to
identify correctly: there is very little data required to match it, and
basically all matches today will be false positives. If DPD for SSLv2 is
still desired, the optional signature in policy/protocols/ssl/dpd-v2.sig
can be loaded.

Fixes GH-1952
2022-02-01 16:51:21 +00:00
Christian Kreibich
c79c2a2b00 Don't auto-publish Supervisor response events in the cluster agent
This was an oversight: we auto-publish the agent's requests _to_ the supervisor,
not the latter's responses.
2022-01-31 18:42:53 -08:00
Christian Kreibich
ad4744eba6 Make members of the ClusterController::Types::State enum all-caps
A consistency tweak since we mostly use all-caps elsewhere as well.
2022-01-31 18:42:03 -08:00
Christian Kreibich
3da95de5b8 Be more conservative with triggering request timeout events 2022-01-31 18:38:40 -08:00
Christian Kreibich
4b5584a85d Move redefs of ClusterController::Request::Request to their places of use
The Request module does not need to know about additional state tucked onto it
by its users.
2022-01-31 18:29:58 -08:00
Christian Kreibich
f9ac03d6e3 Simplify ClusterController::API::set_configuration_request/response
It's easier to track outstanding controller/agent requests via a simple set of
pending agent names, and we can remove all of the result aggregation logic since
we can simply re-use the results reported by the agents.

This can serve as a template for request-response patterns where a client's
request triggers a request to all agents, followed by a response to the client
once all agents have responded. Once we have a few more of those, it'll become
clearer how to abstract this further.
2022-01-31 17:45:14 -08:00
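The request-response template described above can be modeled as a request object holding the set of agents that still owe a response, plus the results collected so far. A minimal Python sketch follows; RelayedRequest and Result are hypothetical names, not the controller's Zeek-script types.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Result:
    instance: str
    success: bool = True


@dataclass
class RelayedRequest:
    """A client request relayed to every agent (hypothetical structure)."""
    pending: Set[str]  # agents that still owe a response
    results: List[Result] = field(default_factory=list)

    def on_agent_response(self, agent: str, agent_results: List[Result]) -> bool:
        """Record one agent's response; True once the client can be answered."""
        if agent not in self.pending:
            return False  # unknown or duplicate response: ignore it
        self.pending.discard(agent)
        # Re-use the agent's results as-is instead of re-aggregating them.
        self.results.extend(agent_results)
        return not self.pending
```

Ignoring responses from agents not in the pending set keeps duplicate or late messages from corrupting the collected results.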
Johanna Amann
b78f30339f TLS decryption: refactoring, more comments, less bare pointers
This commit refactors TLS decryption, adds more comments in scripts and
in C++ source code, and removes use of bare pointers, instead relying
more on STL data types.
2022-01-17 15:04:44 +00:00
Johanna Amann
689b06d9bd Merge remote-tracking branch 'origin/master' into topic/johanna/tls12-decryption 2022-01-17 10:56:06 +00:00
Tim Wojtulewicz
3d9d6e953b Merge remote-tracking branch 'origin/topic/vern/when-lambda'
* origin/topic/vern/when-lambda:
  explicitly provide the frame for evaluating a "when" timeout expression
  attempt to make "when" btest deterministic
  tests for new "when" semantics/errors
  update existing test suite usage of "when" statements to include captures
  update uses of "when" in base scripts to include captures
  captures for "when" statements
  update Triggers to IntrusivePtr's and simpler AST traversal
  introduce IDSet type, migrate associated "ID*" types to "const ID*"
  logic (other than in profiling) for assignments that yield separate values
  option for internal use to mark a function type as allowing non-expression returns
  removed some now-obsolete profiling functionality
  minor commenting clarifications
2022-01-14 14:41:42 -07:00
Johanna Amann
304a06bb88 Merge remote-tracking branch 'origin/master' into topic/johanna/tls12-decryption 2022-01-11 11:04:20 +00:00
Robin Sommer
964293209b Merge remote-tracking branch 'origin/topic/robin/gh1844-host'
* origin/topic/robin/gh1844-host:
  Fix host header normalization in intel framework.
  Switch to recording unmodified HTTP header.
2022-01-10 14:43:30 +01:00
Vern Paxson
98cd3f2213 update uses of "when" in base scripts to include captures 2022-01-07 14:53:33 -08:00
Johanna Amann
4204615997 SSL decryption: small style changes, a bit of documentation 2022-01-05 15:44:36 +00:00