Commit graph

15 commits

Author SHA1 Message Date
Christian Kreibich
4b5584a85d Move redefs of ClusterController::Request::Request to their places of use
The Request module does not need to know about additional state tucked onto it
by its users.
2022-01-31 18:29:58 -08:00
Christian Kreibich
f9ac03d6e3 Simplify ClusterController::API::set_configuration_request/response
It's easier to track outstanding controller/agent requests via a simple set of
pending agent names, and we can remove all of the result aggregation logic since
we can simply re-use the results reported by the agents.

This can serve as a template for request-response patterns where a client's
request triggers a request to all agents, followed by a response to the client
once all agents have responded. Once we have a few more of those, it'll become
clearer how to abstract this further.
2022-01-31 17:45:14 -08:00
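The pending-agent tracking described in f9ac03d6e3 can be illustrated with a minimal Python sketch. It is not the Zeek implementation: the names `ConfigRequest`, `Result`, and `on_agent_response` are hypothetical. It only shows the pattern of keeping a set of pending agent names and reusing the agents' reported results unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Result:
    """One agent-reported outcome (hypothetical shape, for illustration)."""
    instance: str
    success: bool = True
    error: str = ""


@dataclass
class ConfigRequest:
    """State for one client request that was fanned out to all agents."""
    reqid: str
    pending: set[str]                          # agents that have not responded yet
    results: list[Result] = field(default_factory=list)

    def on_agent_response(self, agent: str, results: list[Result]) -> bool:
        """Record an agent's response; return True once every agent has answered."""
        if agent not in self.pending:
            return False                       # unknown or duplicate response: ignore
        self.pending.discard(agent)
        self.results.extend(results)           # reuse the agent's results as-is
        return not self.pending


req = ConfigRequest("req-1", {"agent-a", "agent-b"})
first = req.on_agent_response("agent-a", [Result("agent-a")])
second = req.on_agent_response("agent-b", [Result("agent-b", False, "boom")])
```

Once `on_agent_response` returns True, the caller would send `req.results` back to the client in a single response.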
Christian Kreibich
5a72864ae8 Docs/comment pass over the cluster controller framework
2022-01-03 00:31:03 -08:00
Christian Kreibich
ac40d5c5b2 Remove periodic pinging of controller by agents
This changes the agent-controller communication to remove the need for ongoing
pinging of the controller by agents not actively "in service". Instead, agents
now use the notify_agent_hello event to the controller to report only their
identity. The controller puts them into service via an agent_welcome_request/
response pair, and takes them out of service via agent_standby_request/response.

This removes the on_change handler from the set of agents that is ready for
service, because not every change to this set is now a suitable time to
potentially send out the configuration. We now invoke this check explicitly in
the two situations where it's warranted: when an agent reports ready for service,
and when we've received a new configuration.
2021-12-21 16:44:04 -08:00
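The explicit readiness check described in ac40d5c5b2 might look roughly like the following Python sketch. The class and method names are hypothetical stand-ins loosely modeled on the events named in the commit message, and the logic is simplified. The point is that the check runs only when an agent becomes ready or a new configuration arrives, not on every change to the ready set.

```python
class Controller:
    """Toy model of the controller's agent bookkeeping (illustrative only)."""

    def __init__(self):
        self.known = set()          # agents that have said hello
        self.ready = set()          # agents currently in service
        self.required = set()       # agents the pending config needs
        self.pending_config = None
        self.sent = []              # configs pushed out, for inspection

    def on_agent_hello(self, name):
        # Agents report only their identity; no periodic pinging.
        self.known.add(name)

    def on_agent_welcome_response(self, name):
        # Agent is now in service: one of the two places we check.
        self.ready.add(name)
        self.check_ready()

    def on_agent_standby_response(self, name):
        # Agent taken out of service: deliberately no check here.
        self.ready.discard(name)

    def on_new_config(self, config, required):
        # New configuration received: the other place we check.
        self.pending_config = config
        self.required = set(required)
        self.check_ready()

    def check_ready(self):
        # Send the pending config once all required agents are ready.
        if self.pending_config is not None and self.required <= self.ready:
            self.sent.append(self.pending_config)
            self.pending_config = None


c = Controller()
c.on_new_config("cfg-1", ["a", "b"])
c.on_agent_welcome_response("a")
c.on_agent_standby_response("a")
c.on_agent_welcome_response("a")
c.on_agent_welcome_response("b")
```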
Christian Kreibich
8463f14a52 Move cluster controller/agent main.zeek scripts into their own modules
This has no practical relevance other than allowing the two to be loaded at the
same time, which some of our (cluster-unrelated) tests require. Absence of
namespacing would trigger symbol clashes at this point.
2021-12-21 14:52:29 -08:00
Christian Kreibich
30db1b3bfb First uses of request state timeouts
This now features support for the test_timeout_request/response events, as
supported by the client, and also adds a timeout event for set_configuration, in
case agents do not respond in time.

Includes corresponding zeek-client submodule bump.
2021-12-21 14:52:29 -08:00
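As a rough illustration of the request state timeouts in 30db1b3bfb, the Python sketch below tracks a deadline per request and reports the requests that have expired. `RequestTable` and its methods are hypothetical, and the clock is passed in explicitly. The actual framework's timeout mechanism may differ.

```python
from __future__ import annotations


class RequestTable:
    """Tracks outstanding requests and their deadlines (illustrative only)."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.deadlines: dict[str, float] = {}

    def create(self, reqid: str, now: float) -> None:
        # Remember when this request should be considered timed out.
        self.deadlines[reqid] = now + self.timeout

    def finish(self, reqid: str) -> bool:
        # A response arrived in time; returns False if the request is unknown
        # (for example because it already expired).
        return self.deadlines.pop(reqid, None) is not None

    def expire(self, now: float) -> list[str]:
        # Remove and return all requests whose deadline has passed.
        expired = sorted(r for r, d in self.deadlines.items() if d <= now)
        for reqid in expired:
            del self.deadlines[reqid]
        return expired


table = RequestTable(timeout=10.0)
table.create("set-config-1", now=0.0)
table.create("test-timeout-1", now=5.0)
answered = table.finish("set-config-1")
timed_out = table.expire(now=20.0)
```

For each request id returned by `expire`, the controller would raise a timeout event instead of waiting further for agent responses.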
Christian Kreibich
fc9679e510 Move get_instances_response event to using a Result record
Includes corresponding zeek-client bump.
2021-12-21 14:52:29 -08:00
Christian Kreibich
1461d56340 Track successful config deployment in cluster controller
This allows us to start returning deployed configurations to the client upon
request.
2021-12-21 14:52:29 -08:00
Christian Kreibich
09d9be3433 Add ClusterController::API::notify_agents_ready event
This changes the basic agent-management model to one in which the configurations
received from the client define not just the data cluster, but also the set
of acceptable instances. Unless connectivity already exists, the controller will
establish peerings with new agents that listen, or wait for ones that connect to
the controller to check in.

Once all required agents are available, the controller triggers the new
notify_agents_ready event, an agent/controller-level "cluster-is-ready"
event. The controller also uses this event to submit a pending config update to
the now-ready instances.
2021-12-21 14:52:29 -08:00
Christian Kreibich
b57be021b7 Make all globals start with a "g_" prefix
This makes it easier to spot them in code, and is shorter than using explicit
namespacing.
2021-12-21 14:52:28 -08:00
Christian Kreibich
ddbd83fee4 Support for dropping instances no longer needed after config updates
This sends such expired instances empty configurations that will cause them to
shut down their remaining data cluster nodes.
2021-12-21 14:52:28 -08:00
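The handling of expired instances in ddbd83fee4 amounts to a set difference followed by an empty configuration per dropped instance. The Python sketch below shows this idea; the function `plan_instance_update` and the representation of a configuration as a list of node names are assumptions made for the example.

```python
from __future__ import annotations


def plan_instance_update(current: set[str], wanted: set[str]) -> dict[str, list[str]]:
    """Return the empty node lists to send to instances no longer needed."""
    expired = current - wanted                 # instances absent from the new config
    # An empty node list tells the agent to shut down its remaining nodes.
    return {name: [] for name in sorted(expired)}


plan = plan_instance_update({"inst-a", "inst-b", "inst-c"}, {"inst-b", "inst-d"})
```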
Christian Kreibich
5cb44c2f69 Support on-demand peering with agents when receiving new cluster configuration
Prior to this, static configuration needed to be in place to configure the
controller/agent layout. The configuration update can now include new instances
that the controller will connect to, assuming they're instances with a listening
agent.
2021-12-21 14:52:28 -08:00
Christian Kreibich
aceb05099a Whitespace tweaks in cluster controller and agent scripts
2021-12-21 14:52:28 -08:00
Christian Kreibich
8db985ea78 Merge branch 'topic/christian/cluster-controller'
* topic/christian/cluster-controller:
  Add a cluster controller testcase for agent-controller checkin
  Add zeek-client via new submodule
  Update baselines affected by cluster controller changes
  Introduce cluster controller and cluster agent scripting
  Establish a separate init script when using the supervisor
  Add optional bare-mode boolean flag to Supervisor's node configuration
  Add support for making the supervisor listen for requests
  Add support for setting environment variables via supervisor
2021-07-08 16:51:11 -07:00
Christian Kreibich
c744702f94 Introduce cluster controller and cluster agent scripting
This is a preliminary implementation of a subset of the functionality set out in
our cluster controller architecture. The controller is the central management
node, existing once in any Zeek cluster. The agent is a node that runs once per
instance, where an instance will commonly be a physical machine. The agent in
turn manages the "data cluster", i.e. the traditional notion of a Zeek cluster
with manager, worker nodes, etc.

Agent and controller live in the policy folder, and are activated when loading
policy/frameworks/cluster/agent and policy/frameworks/cluster/controller,
respectively. Both run in nodes forked by the supervisor. When Zeek doesn't use
the supervisor, they do nothing. Otherwise, boot.zeek instructs the supervisor
to create the respective node, running main.zeek.

Both controller and agent have their own config.zeek with relevant knobs. For
both, controller/types.zeek provides common data types, and controller/log.zeek
provides basic logging (without logger communication -- no such node might
exist).

A primitive request-tracking abstraction can be found in controller/request.zeek
to track outstanding request events and their subsequent responses.
2021-07-08 13:12:53 -07:00