The get-nodes command also benefits from showing the state of connected agents
more broadly (as opposed to just those in the current configuration).
Also a bugfix: ensure we use an agent's IP address as seen by the
controller. This avoids reporting "0.0.0.0" in some cases.
So far the response contained only the connected instances relevant to the
current configuration, which isn't very helpful when troubleshooting
instance connectivity. It now reports all currently connected instances, with
network addresses & ports as known to Broker.
This swaps the host event argument for the Broker ID. The latter is more useful,
since the sending agent doesn't necessarily know its IP address as visible to
the controller, and the controller can pull up the full Broker context via the
ID.
It also adds an explicit argument to the event to indicate whether the agent
connected to the controller or vice versa. This simplifies the controller's
internal logic.
Also minor tweaks to logging to show Broker IDs.
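To illustrate the lookup this enables, here is a minimal sketch (not the framework's actual code) of resolving a Broker ID to the peer's network address on the controller, using the standard Broker::peers() API; the helper name is illustrative:

    function lookup_peer_address(broker_id: string): string
        {
        local peers = Broker::peers();

        for ( i in peers )
            {
            local ep = peers[i]$peer;

            # Match on the Broker endpoint ID and return the address the
            # controller sees for that peering, if known.
            if ( ep$id == broker_id && ep?$network )
                return ep$network$address;
            }

        return "";
        }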
This uses the new frameworks/management/supervisor functionality to maintain
stdout/stderr files, and hooks output context into set_configuration error
results.
This improves the framework's handling of Zeek node stdout and stderr by
extending the (script-layer) Supervisor functionality.
- The Supervisor _either_ directs Zeek nodes' stdout/stderr to files _or_ lets
you hook into it at the script level. We'd like both: files make sense to allow
inspection outside of the framework, and the framework would benefit from
tapping into the streams e.g. for error context. We now provide the file
redirection functionality in the Supervisor, in addition to the hook
mechanism. The hook mechanism also maintains rolling windows of up to
100 lines (configurable) of each node's stdout and stderr.
- The new Management::Supervisor::API::notify_node_exit event notifies
subscribers (agents, really) that a particular node has exited (and is possibly
being restarted by the Supervisor). The event includes the name of the node,
plus its recent stdout/stderr context.
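A minimal, self-contained sketch of such a rolling window follows; the names are illustrative, and only the 100-line default comes from the description above:

    const max_output_lines = 100 &redef;

    global stdout_window: vector of string = vector();

    function add_output_line(line: string)
        {
        stdout_window += line;

        # Drop the oldest line once the window exceeds its configured size.
        if ( |stdout_window| > max_output_lines )
            stdout_window = stdout_window[1:];
        }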
During Zeekygen's doc generation both the agent's and controller's main.zeek get
loaded. This just happened to not throw errors so far because the redefs either
matched perfectly or used different field names.
We so far reported one result record per agent, which made it hard to report
per-node outcomes for the new configuration. Agents now report one result record
per node they're responsible for.
When the controller relays requests to agents, we want agents to time out more
quickly than the corresponding controller requests. This allows agents to
respond with more meaningful errors, while the controller's timeout acts mostly
as a last resort to ensure a response to the client actually happens.
This dials down the table_expire_interval to 2 seconds in both agent and
controller, for more predictable timeout behavior. It also dials the agent-side
request expiration interval down to 5 seconds, compared to the controller's 10
seconds.
We may have to revisit this to allow custom expiration intervals per
request/response message type.
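For reference, the table-expiration part of this tuning boils down to a redef of the standard Zeek global; the agent- and controller-side request expiration intervals are framework options not spelled out by name here:

    redef table_expire_interval = 2 sec;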
We so far hoped for the best when an agent asked the Supervisor to launch a
node. Since the Management::Node::API::notify_node_hello events arriving from
new nodes signal when such nodes are up and running, we can use those events to
track whether all launched nodes have checked in, and respond accordingly.
This delays the set_configuration_response event until these checkins have
occurred, or a timeout kicks in. In case of error, the agent's response to the
controller is in error state and has the remaining, unresponsive/failed set of
nodes as its data member.
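The gist of the check-in gating, as a self-contained sketch with placeholder names, node set, and timeout value:

    global nodes_pending: set[string] = { "manager", "worker-01" };

    global checkin_timeout: event();

    function send_config_response(success: bool, remaining: set[string])
        {
        # Placeholder for sending set_configuration_response back to the
        # controller; on failure, `remaining` lists the unresponsive nodes.
        print fmt("success=%s remaining=%s", success, remaining);
        }

    function node_checked_in(node: string)
        {
        if ( node in nodes_pending )
            delete nodes_pending[node];

        # All launched nodes have checked in: respond successfully.
        if ( |nodes_pending| == 0 )
            send_config_response(T, nodes_pending);
        }

    event checkin_timeout()
        {
        # Timeout hit with nodes still outstanding: respond in error state.
        if ( |nodes_pending| > 0 )
            send_config_response(F, nodes_pending);
        }

    event zeek_init()
        {
        schedule 15 sec { checkin_timeout() };
        }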
This establishes a directory "nodes" in Management::state_dir and places each
Zeek process into a subdirectory in it, named after the Zeek process. For
example, node "worker-01" runs with cwd <state_dir>/nodes/worker-01/.
Explicitly configured directories can override the naming logic, and also ignore
the state directory if they're absolute paths. One exception remains: the
Supervisor itself -- we'd have to use LogAscii::logdir to automatically place it
too in its own directory, but that feature currently does not interoperate with
log rotation.
This adds management/persistence.zeek to establish common configuration for log
rotation and persistent variable state. Log-writing Zeek processes initially
write locally in their working directory, and rotate into subdirectory
"log-queue" of the spool. Since agent and controller have no logger,
persistence.zeek puts in place compatible configurations for them.
Storage folders for Broker-backed tables and clusterized stores default to
subdirectories of the new Zeek-level state folder.
When setting the ZEEK_MANAGEMENT_TESTING environment variable, persistent state
is kept in the local directory, and log rotation remains disabled.
This also tweaks @loads a bit in favor of simply loading frameworks/management,
which is easier to keep track of.
This allows specifying spool and variable-state directories specifically for the
management framework. They default to the corresponding installation-level
folders.
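For example (directory paths are placeholders; Management::state_dir appears elsewhere in these notes, while the spool option name is assumed by analogy):

    redef Management::state_dir = "/var/lib/zeek/management";
    redef Management::spool_dir = "/var/spool/zeek/management";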
Load the agent/controller bootstrapping code only from the Supervisor, and the
basic config only from a supervisee. When we're neither (which is likely a
mistake), we do nothing.
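The loading pattern uses the Supervisor BiFs to make this distinction; a sketch with placeholder script names:

    @if ( Supervisor::is_supervisor() )
    # Only the Supervisor bootstraps agent/controller processes.
    @load ./boot
    @endif

    @if ( Supervisor::is_supervised() )
    # Only supervised processes pull in the basic configuration.
    @load ./config
    @endif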
The fallback mechanism when no explicit agent/controller names are configured
didn't work properly, because many places in the code relied on accessing the
name via the variables meant for explicit configuration, such as
Management::Agent::name. Agent and controller now offer functions for computing
the correct effective name, and we use that throughout.
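A sketch of such an effective-name helper; "Agent_name" stands in for the framework's Management::Agent::name option, and the hostname-based fallback format is an assumption:

    const Agent_name = "" &redef;

    function effective_agent_name(): string
        {
        # Prefer an explicitly configured name, if any.
        if ( Agent_name != "" )
            return Agent_name;

        # Otherwise derive a stable default from the local host name.
        return fmt("agent-%s", gethostname());
        }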
Includes submodule bumps for Broker (to pull in better handling of data
structures that are difficult to deserialize in Python), zeek-client (for the
get-config command), and a commit hash update for the external testsuite.
This adds an optional set of cluster node names to narrow the querying to. It
similarly expands the dispatch mechanism, since it likely makes most sense for any
such request to apply only to a subset of nodes.
Requests for invalid nodes trigger Response records in error state.
When agents receive a configuration, we don't currently honor requested run
states (there's no such thing as registering a node but not running it, for
example). To reflect this, we now start off nodes in state PENDING as we
launch them via the Supervisor, and move them to RUNNING when they check
in with us via Management::Node::API::notify_node_hello.
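Illustrative bookkeeping for that transition (the framework's actual state enum and node records live in its types module):

    type NodeStateKind: enum { PENDING, RUNNING };

    global node_states: table[string] of NodeStateKind;

    function node_launched(node: string)
        {
        # Launched via the Supervisor, but not yet checked in.
        node_states[node] = PENDING;
        }

    function node_checked_in(node: string)
        {
        # notify_node_hello arrived: the node is up and running.
        if ( node in node_states )
            node_states[node] = RUNNING;
        }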
This adds support for retrieving the value of a global identifier from any
subset of cluster nodes. It relies on the lookup_ID() BiF to retrieve the value,
and to_json() to render it to an easily parsed string. Ideally we'd send the
value directly, but this hits several roadblocks, including the fact that
Broker won't serialize arbitrary values.
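The retrieval step, in essence, using the lookup_ID() and to_json() built-ins (the wrapper function is illustrative):

    function render_global(id: string): string
        {
        # lookup_ID() returns the identifier's current value as "any".
        local val = lookup_ID(id);

        # to_json() renders it into an easily parsed string form.
        return to_json(val);
        }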
This adds request/response event pairs to enable the controller to dispatch
"actions" (pre-implemented Zeek script actions) on subsets of Zeek cluster nodes
and collect the results. Using generic events to carry multiple such "run X on
the nodes" scenarios simplifies adding these in the future.
This provides Broker-level plumbing that allows agents to reach out to their
managed Zeek nodes and collect responses.
As a first event, it establishes Management::Node::API::notify_agent_hello,
to notify the agent when the cluster node is ready to communicate.
Also a bit of comment rewording to replace use of "data cluster" with simply
"cluster", to avoid ambiguity with data nodes in SumStats, and expansion of
test-all-policy.zeek and related/dependent tests, since we're introducing new
scripts.
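A rough node-side sketch of that plumbing, with a placeholder event, topic, and peering address standing in for the framework's actual ones:

    global notify_agent_hello: event(node: string);

    event zeek_init()
        {
        # Peer with the agent; address and port are placeholders.
        Broker::peer("127.0.0.1", 2151/tcp);
        }

    event Broker::peer_added(ep: Broker::EndpointInfo, msg: string)
        {
        # Announce readiness once the peering with the agent is up.
        Broker::publish("zeek/management/node", notify_agent_hello, Cluster::node);
        }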
- This gives the cluster controller and agent the common name "Management
framework" and changes the start directory of the sources from
"policy/frameworks/cluster" to "policy/frameworks/management". This avoids
ambiguity with the existing cluster framework.
- It renames the "ClusterController" and "ClusterAgent" script modules to
"Management::Controller" and "Management::Agent", respectively. This allows us
to anchor tooling common to both controller and agent at the "Management"
module.
- It moves common configuration settings, logging, requests, types, and
utilities to the common "Management" module.
- It removes the explicit "::Types" submodule (so a request/response result is
now a Management::Result, not a Management::Types::Result), which makes
typenames more readable.
- It updates tests that depend on module naming and full set of scripts.
Mostly trivial changes, except for one aspect: if a module exports a record type
and that record bears Zeekygen comments, then redefs that add to the record in
another module cannot be private to that module. Zeekygen will complain with
"unknown target" errors, even when such redefs have Zeekygen comments. So this
commits also adds two export-blocks that aren't technically required at this point.
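A condensed example of the pattern (module and field names are made up): the record carries Zeekygen comments, so the cross-module redef must itself sit in an export block:

    module Example_Base;

    export {
        ## A documented record type.
        type Info: record {
            ## The original field.
            x: count;
        };
    }

    module Example_Ext;

    export {
        # Keeping this redef module-private would make Zeekygen report an
        # "unknown target" error for the added field's documentation.
        redef record Example_Base::Info += {
            ## A field added from another module.
            y: string &optional;
        };
    }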
This allows querying the status of Zeek nodes currently running in a cluster.
The controller relays the request to all instances and accumulates their
responses.
The response back to the client contains one Result record per instance
response, each carrying a ClusterController::Types::NodeState vector in
its $data member to convey the state of each node at that instance.
The NodeState record tracks the name of the node, its role in the controller (if
any), its role in the data cluster (if any), as well as PID and listening port,
if any.
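An approximation of what such a NodeState record conveys (field names and types here are illustrative, not the framework's exact definition):

    type NodeStateSketch: record {
        node: string;                    # Cluster node name
        mgmt_role: string &optional;     # Role in the management framework, if any
        cluster_role: string &optional;  # Role in the Zeek cluster, if any
        pid: int &optional;              # Process ID, when known
        p: port &optional;               # Listening port, if any
    };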
This makes cluster node listening ports &optional, and maps absent values to
0/unknown, the value the cluster framework currently uses to indicate that
listening isn't desired.
It's easier to track outstanding controller/agent requests via a simple set of
pending agent names, and we can remove all of the result aggregation logic since
we can simply re-use the results reported by the agents.
This can serve as a template for request-response patterns where a client's
request triggers a request to all agents, followed by a response to the client
once all agents have responded. Once we have a few more of those, it'll become
clearer how to abstract this further.
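The core of that fan-out/fan-in bookkeeping, as a compact sketch with illustrative names:

    # Maps a request ID to the agents that still need to respond.
    global agents_pending: table[string] of set[string];

    function note_agent_response(reqid: string, agent: string)
        {
        if ( reqid !in agents_pending )
            return;

        local pending = agents_pending[reqid];
        delete pending[agent];

        if ( |pending| == 0 )
            {
            # All agents have responded: assemble the client response from
            # the agents' own results and clean up the request state.
            delete agents_pending[reqid];
            }
        }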
* origin/topic/vern/when-lambda:
explicitly provide the frame for evaluating a "when" timeout expression
attempt to make "when" btest deterministic
tests for new "when" semantics/errors
update existing test suite usage of "when" statements to include captures
update uses of "when" in base scripts to include captures
captures for "when" statements update Triggers to IntrusivePtr's and simpler AST traversal introduce IDSet type, migrate associated "ID*" types to "const ID*"
logic (other than in profiling) for assignments that yield separate values
option for internal use to mark a function type as allowing non-expression returns
removed some now-obsolete profiling functionality
minor commenting clarifications
This changes the agent-controller communication to remove the need for ongoing
pinging of the controller by agents not actively "in service". Instead, agents
now use the notify_agent_hello event to the controller to report only their
identity. The controller puts them into service via an agent_welcome_request/
response pair, and takes them out of service via agent_standby_request/response.
This removes the on_change handler from the set of agents that is ready for
service, because not every change to this set is now a suitable time to
potentially send out the configuration. We now invoke this check explicitly in
the two situations where it's warranted: when an agent reports ready for service,
and when we've received a new configuration.
This has no practical relevance other than allowing the two to be loaded at the
same time, which some of our (cluster-unrelated) tests require. Absence of
namespacing would trigger symbol clashes at this point.
This now features support for the test_timeout_request/response events, as
supported by the client, and also adds a timeout event for set_configuration, in
case agents do not respond in time.
Includes corresponding zeek-client submodule bump.
This establishes a timeout controlled via ClusterController::request_timeout,
triggering a ClusterController::Request::request_expired event whenever a
timeout rolls around before request state has been finalized by a request's
normal processing.
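In essence, the mechanism hangs off table expiration; a self-contained sketch with simplified record and names (the timeout value is illustrative):

    const request_timeout = 10 sec &redef;

    type Request: record {
        id: string;
        finished: bool &default=F;
    };

    global request_expired: event(req: Request);

    global g_requests: table[string] of Request
        &create_expire=request_timeout
        &expire_func=function(reqs: table[string] of Request, reqid: string): interval
            {
            # Only requests never finalized by normal processing count as
            # expired.
            if ( ! reqs[reqid]$finished )
                event request_expired(reqs[reqid]);

            return 0 secs;
            };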