.. _framework-management:

====================
Management Framework
====================

.. rst-class:: opening

   The management framework provides a Zeek-based, service-oriented architecture
   and event-driven APIs to manage a Zeek cluster that monitors live traffic. It
   provides a central, stateful *controller* that relays and orchestrates
   cluster management tasks across connected *agents*. Each agent manages Zeek
   processes in its local *instance*, the Zeek process tree controlled by the
   local Zeek :ref:`Supervisor <framework-supervisor>`. A management *client*
   lets the user interact with the controller to initiate cluster management
   tasks, such as deploying cluster configurations, monitoring operational
   aspects, or restarting cluster nodes. The default client is ``zeek-client``,
   included in the Zeek distribution.

.. _framework-management-quickstart:

Quickstart
==========

Run the following (as root) to launch an all-in-one management instance on your
system:

.. code-block:: console

   # zeek -C -j policy/frameworks/management/controller policy/frameworks/management/agent

The above command stays in the foreground. In a new shell, save the following
content to a file ``cluster.cfg`` and adapt the workers' sniffing interfaces to
your system:

.. literalinclude:: management/mini-config.ini
   :language: ini
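
If you don't have the bundled example file at hand, a minimal configuration
along these lines works for a local deployment. This is a sketch: the node
names and interfaces are placeholders to adapt, and the per-node keys follow
the configuration format described later in this chapter:

.. code-block:: ini

   ; Sketch of a minimal local cluster configuration; adapt interfaces as needed.
   [manager]
   role = manager

   [logger]
   role = logger

   [worker-01]
   role = worker
   interface = eth0

   [worker-02]
   role = worker
   interface = eth1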

Run the following command (as any user) to deploy the configuration:

.. literalinclude:: management/mini-deployment.console
   :language: console
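
The command captured in that console listing is the client's combined
stage-and-deploy operation, covered in more detail later in this chapter.
Invoked by hand it looks roughly like this (output omitted):

.. code-block:: console

   $ zeek-client deploy-config cluster.cfg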

You are now running a Zeek cluster on your system. Try ``zeek-client get-nodes``
to see more details about the cluster's current status. (In the above, "testbox"
is the system's hostname.)

Architecture and Terminology
============================

Controller
----------

The controller forms the central hub of cluster management. It exists once in
every installation and runs as a Zeek process solely dedicated to management
tasks. It awaits instructions from a management client and communicates with one
or more agents to manage their cluster nodes.

All controller communication happens via :ref:`Broker <broker-framework>`-based
Zeek event exchange, usually in the form of request-response event pairs tagged
with a request ID to provide context. The controller is stateful and persists
cluster configurations to disk. In a multi-system setup, the controller runs
inside a separate, dedicated Zeek instance. In a single-system setup, the
controller can run as an additional process in the local instance.
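
To illustrate the request-response pattern, here is a purely schematic sketch:
the event and argument names below are placeholders, not the controller's
actual API, which is documented in the module reference linked next.

.. code-block:: zeek

   # Schematic sketch of the request/response pairing described above.
   # These are placeholder events, not the real Management::Controller::API.
   module Example::API;

   export {
   	## A client sends a request, tagging it with a request ID...
   	global example_request: event(reqid: string);
   	## ...and the controller answers with an event carrying the same ID,
   	## letting the client match the response to its pending request.
   	global example_response: event(reqid: string, result: string);
   }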

The controller's API resides in the :zeek:see:`Management::Controller::API` module.
Additional code documentation is :doc:`here </scripts/policy/frameworks/management/controller/index>`.

Instance
--------

A Zeek instance comprises the set of processes managed by a Zeek
:ref:`Supervisor <framework-supervisor>`. The management framework builds
heavily on the Supervisor framework and cannot run without it. Typically, a
single instance includes all Zeek processes on the local system (a physical
machine, a container, etc), but running multiple instances on a system is
possible.

Agent
-----

Management agents implement instance-level cluster management tasks. Every
instance participating in cluster management runs an agent. Agents peer with the
controller to receive instructions (a node restart, say), carry them out, and
respond with the outcome. The direction of connection establishment for the
peering depends on configuration and can go either way (more on this below); by
default, agents connect to the controller.

The agent's API resides in the :zeek:see:`Management::Agent::API` module.
Additional code documentation is :doc:`here </scripts/policy/frameworks/management/agent/index>`.

Agents add script-layer code to both the Supervisor (details :doc:`here
</scripts/policy/frameworks/management/supervisor/index>`) and Zeek cluster
nodes (details :doc:`here </scripts/policy/frameworks/management/node/index>`)
to enable management tasks (e.g. to tap into node stdout/stderr output) and to
receive confirmation of successful node startup.

Cluster nodes
-------------

The Zeek processes involved in traffic analysis and log output make up the Zeek
*cluster*, via the :ref:`cluster framework <cluster-framework>`. The management
framework does not change the cluster framework, and all of its concepts (the
manager, logger(s), workers, etc) apply as before. Cluster *nodes* refer to
individual Zeek processes in the cluster, as managed by the Supervisor.

Client
------

The management client provides the user's interface to cluster management. It
allows configuration and deployment of the Zeek cluster, insight into the
running cluster, the ability to restart nodes, etc. The client uses the
controller's event API to communicate and is the only component in the framework
not (necessarily) implemented in Zeek's script layer. The Zeek distribution
ships with ``zeek-client``, a command-line client implemented in Python, to
provide management functionality. Users are welcome to implement other clients.

.. _framework-management-visual-example:

A Visual Example
================

Consider the following setup, consisting of a single instance, controller, and a
connected ``zeek-client``, all running on different machines:

.. image:: /images/management.png
   :align: center

The cluster system runs a single management instance, with an agent listening on
TCP port 2151, the default. Since the agent needs to communicate with the
Supervisor for node management tasks and the two run in separate processes, the
Supervisor listens for Broker peerings on TCP port 9999 (again, the default),
and the two communicate events over topic ``zeek/supervisor``. As shown, the
agent has launched a 4-node Zeek cluster consisting of two workers, a logger,
and a manager, communicating internally as usual.

The controller system is more straightforward, consisting merely of a
Supervisor-governed management controller. This controller has connected to and
peered with the agent on the cluster system, to relay commands received from the
client via the agent's API and receive responses over Broker topic
``zeek/management/agent``. Since the controller doesn't need to interact with
the Supervisor, the latter doesn't listen on any ports. Standalone controllers,
as running here, still require a Supervisor, to simplify co-located deployment
of agent and controller in a single instance.

Finally, the admin system doesn't run Zeek, but has it installed to provide
``zeek-client``, the CLI for issuing cluster management requests. This client
connects to and peers with the controller, exchanging controller API events over
topic ``zeek/management/controller``. For more details on ``zeek-client``, see
:ref:`below <framework-management-zeek-client>`.

In practice you can simplify the deployment by running ``zeek-client`` directly
on the controller machine, or by running agent and controller jointly on a
single system. We cover this in :ref:`more detail
<framework-management-running>`.

Goals and Relationship to ZeekControl
=====================================

The management framework first shipped in usable form in Zeek 5.0. It will
replace the aging :ref:`ZeekControl <cluster-configuration>` over the course of
the coming releases. The framework is not compatible with ZeekControl's approach
to cluster management: use one or the other, not both.

The framework currently targets single-instance deployments, i.e., setups in
which traffic monitoring happens on a single system. While the management
framework technically supports clusters spanning multiple monitoring systems,
much of the infrastructure users know from ``zeekctl`` (such as the ability to
deploy Zeek scripts and additional configuration) is not yet available in the
management framework.

ZeekControl remains included in the Zeek distribution and is still the
recommended solution for multi-system clusters and those needing rich management
capabilities.

.. _framework-management-running:

Running Controller and Agent
============================

.. _joint-launch:

Joint launch
------------

The easiest approach is to run a single Zeek instance in which the Supervisor
launches both an agent and the controller. The framework comes pre-configured
for this use case. Its invocation looks as follows:

.. code-block:: console

   # zeek -j policy/frameworks/management/controller policy/frameworks/management/agent

The ``-j`` flag enables the Supervisor and is required for successful launch of
the framework. (Without it, the above command will simply return.)

.. note::

   If you're planning to monitor the machine's own traffic, add the ``-C`` flag
   to avoid checksum errors, which commonly happen in local monitoring due to
   offload of the checksum computation to the NIC.

The following illustrates this setup:

.. image:: /images/management-all-in-one.png
   :align: center
   :scale: 75%

Separate controller and agent instances
---------------------------------------

You can also separate the agent and controller instances. For this, you'd say

.. code-block:: console

   # zeek -j policy/frameworks/management/agent

for the agent, and

.. code-block:: console

   # zeek -j policy/frameworks/management/controller

for the controller. You can run the latter as a regular user, assuming the user
has write access to the installation's spool and log directories (more on this
below). While a standalone controller technically doesn't need the Supervisor,
the framework currently requires it in this scenario as well, so don't omit the
``-j``.

This looks as follows:

.. image:: /images/management-all-in-one-two-zeeks.png
   :align: center

Controller and agent instances on separate systems
--------------------------------------------------

You can also separate the two across different systems, though that approach
will only really start to make sense when the framework fully supports running
multiple traffic-sniffing instances. To do this, you either need to configure
the agent to find the controller, or tell the controller where to find the
agent. For the former, redefine the corresponding config setting, for example by
saying

.. code-block:: zeek

   redef Management::Agent::controller = [$address="1.2.3.4", $bound_port=21500/tcp];

in ``local.zeek`` and then launching

.. code-block:: console

   # zeek -j policy/frameworks/management/agent local

The result looks as already covered :ref:`earlier <framework-management-visual-example>`:

.. image:: /images/management.png
   :align: center

To make the controller connect to remote agents, deploy configurations that
include the location of those agents. More on this below.

Multiple instances
------------------

You can run multiple instances on a single system, but doing so requires some
care: each agent needs its own listening port, and each instance's Supervisor
needs its own listening port as well. Since agents communicate with their
Supervisor to facilitate node management, the Supervisor needs to listen (though
only locally). Furthermore, you need to ensure each additional agent runs with a
unique name (see the next section for more on naming).

Assuming you already have an instance running, a launch of an additional agent
might look as follows:

.. code-block:: console

   # zeek -j policy/frameworks/management/agent \
        Management::Agent::default_port=2152/tcp \
        Management::Agent::name=agent-standby \
        Broker::default_port=10001/tcp

Finally, as already mentioned, you can spread multiple instances across multiple
systems to explore distributed cluster management. This simplifies the
individual launch invocations, but for practical distributed cluster use you may
find the framework's current cluster management features lacking when compared
to ZeekControl.

Controller and agent naming
---------------------------

The management framework identifies all nodes in the system by name, and all
nodes (agent(s), controller, and Zeek cluster nodes) must have unique names. By
default, the framework chooses ``agent-<hostname>`` and
``controller-<hostname>`` for agent and controller, respectively. To reconfigure
naming, set the ``ZEEK_AGENT_NAME`` / ``ZEEK_CONTROLLER_NAME`` environment
variables, or redefine the following:

.. code-block:: zeek

   redef Management::Controller::name = "controller1";
   redef Management::Agent::name = "agent1";

Firewalling and encryption
--------------------------

By default, the controller listens for clients and agents on ports ``2149/tcp``
and ``2150/tcp``. The former port supports Broker's WebSocket data format, the
latter its traditional one. Unless you run all components, including the client,
on a single system, you'll want to open up these ports on the controller's
system. The agent's default port is ``2151/tcp``. It always listens; this allows
cluster nodes to connect to it to send status reports. If the agents connect to
the controller, your firewall may block the agent's port, since host-local
connectivity from cluster nodes to the agent process suffices.

To switch agent and/or controller to different ports, set the environment
variables ``ZEEK_CONTROLLER_PORT`` / ``ZEEK_CONTROLLER_WEBSOCKET_PORT`` /
``ZEEK_AGENT_PORT``, or use the following:

.. code-block:: zeek

   redef Management::Controller::default_port_websocket = 21490/tcp;
   redef Management::Controller::default_port = 21500/tcp;
   redef Management::Agent::default_port = 21510/tcp;

By default, agent and controller listen globally. To make them listen on a
specific interface, set the environment variables ``ZEEK_CONTROLLER_ADDR`` /
``ZEEK_CONTROLLER_WEBSOCKET_ADDR`` / ``ZEEK_AGENT_ADDR``, or redefine the
framework's fallback default address:

.. code-block:: zeek

   redef Management::default_address = "127.0.0.1";

The framework inherits Broker's TLS capabilities and defaults. For details,
please refer to the :doc:`Broker config settings
</scripts/base/frameworks/broker/main.zeek>`.

.. note::

   ``zeek-client`` currently doesn't support client-side certificates.

Additional framework configuration
----------------------------------

The framework features a number of additional settings that we cover as needed
in the remainder of this chapter. Refer to the following to browse them all:

* :doc:`General settings </scripts/policy/frameworks/management/config.zeek>`
* :doc:`Controller </scripts/policy/frameworks/management/controller/config.zeek>`
* :doc:`Agents </scripts/policy/frameworks/management/agent/config.zeek>`
* :doc:`Cluster nodes </scripts/policy/frameworks/management/node/config.zeek>`
* :doc:`Supervisor </scripts/policy/frameworks/management/supervisor/config.zeek>`

Node Operation and Outputs
==========================

The framework places every Supervisor-created node into its own working
directory, located in ``$(zeek-config --prefix)/var/lib/nodes/<name>``. You can
reconfigure this by setting the ``ZEEK_MANAGEMENT_STATE_DIR`` environment
variable or redefining :zeek:see:`Management::state_dir`. Doing either will
change the toplevel directory (i.e., replacing the path up to and including
``var/lib`` in the above); the framework will still create the ``nodes/<name>``
directory structure within it.
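
For example, to relocate all node state under a different toplevel directory,
you could say the following in ``local.zeek`` (the path here is merely an
assumption to adapt):

.. code-block:: zeek

   # Hypothetical alternative location for the framework's node state.
   redef Management::state_dir = "/data/zeek/state";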

Outputs in the resulting directory include:

* Two separate ad-hoc logs (not structured by Zeek's logging framework)
  capturing the node's stdout and stderr streams. Their naming is configurable,
  defaulting simply to ``stdout`` and ``stderr``.

* Zeek log files prior to log rotation.

* Persisted Zeek state, such as Broker-backed tables.

Log Management
==============

The framework configures log rotation and archival via Zeek's included
`zeek-archiver tool <https://github.com/zeek/zeek-archiver>`_, as follows:

* The :zeek:see:`Log::default_rotation_interval` is one hour, with both local
  and remote logging enabled. You are free to adjust it as needed.

* The log rotation directory defaults to ``$(zeek-config --prefix)/spool/log-queue``.
  To adjust this, redefine :zeek:see:`Log::default_rotation_dir` as usual.
  You can also relocate the spool by setting the ``ZEEK_MANAGEMENT_SPOOL_DIR``
  environment variable or redefining :zeek:see:`Management::spool_dir`. The
  framework will place ``log-queue`` into that new destination.

* The log rotation callback rotates node-local logs into the log queue, with
  naming suitable for ``zeek-archiver``. An example:

  .. code-block:: console

     conn__2022-06-20-10-00-00__2022-06-20-11-00-00__.log

  For details, take a look at the implementation in
  ``scripts/policy/frameworks/management/persistence.zeek``.

* Once per log rotation interval, the agent launches log archival to archive
  rotated logs into the installation's log directory (``$(zeek-config
  --root)/logs``). By default this invokes ``zeek-archiver``, which establishes
  a datestamp directory in the ``logs`` directory and places the compressed logs
  into it:

  .. code-block:: console

     # cd $(zeek-config --root)/logs
     # ls -l
     total 4
     drwx------. 2 root root 4096 Jun 20 21:17 2022-06-20
     # cd 2022-06-20
     # ls -l
     total 712
     -rw-r--r--. 1 root root   280 Jun 20 20:17 broker.19:00:00-20:00:00.log.gz
     -rw-r--r--. 1 root root 24803 Jun 20 20:17 conn.19:00:00-20:00:00.log.gz
     -rw-r--r--. 1 root root 26036 Jun 20 21:17 conn.20:00:00-21:00:00.log.gz
     -rw-r--r--. 1 root root   350 Jun 20 20:17 dhcp.19:00:00-20:00:00.log.gz
     -rw-r--r--. 1 root root   400 Jun 20 21:17 dhcp.20:00:00-21:00:00.log.gz
     ...
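
As a sketch of the rotation-related settings mentioned above, you might say the
following; the interval and path are assumptions you'd adapt:

.. code-block:: zeek

   # Rotate logs every 30 minutes instead of hourly.
   redef Log::default_rotation_interval = 30 min;
   # Relocate the spool; the framework then places log-queue inside it.
   redef Management::spool_dir = "/data/zeek/spool";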

You can adapt the log archival configuration via the following settings:

* Redefine :zeek:see:`Management::Agent::archive_logs` to ``F`` to disable
  archival entirely.

* Redefine :zeek:see:`Management::Agent::archive_interval` for an interval other
  than the log rotation one.

* Redefine :zeek:see:`Management::Agent::archive_dir` to change the
  destination directory.

* Redefine :zeek:see:`Management::Agent::archive_cmd` to invoke an executable
  other than the included ``zeek-archiver``. The replacement should accept the
  same argument structure: ``<executable> -1 <input dir> <output dir>``. The
  ``-1`` here refers to ``zeek-archiver``'s one-shot processing mode.
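
Putting a few of these together, a tuned archival setup could look like the
following sketch (the interval and directory values are assumptions):

.. code-block:: zeek

   # Archive every 30 minutes, into a non-default destination.
   redef Management::Agent::archive_interval = 30 min;
   redef Management::Agent::archive_dir = "/data/zeek/logs";
   # Or disable archival altogether:
   # redef Management::Agent::archive_logs = F;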

.. _framework-management-zeek-client:

The zeek-client CLI
===================

Zeek ships with a command-line client for the Management framework:
``zeek-client``, installed alongside the other executables in the
distribution. It looks as follows:

.. literalinclude:: management/zeek-client-help.console
   :language: console

Run commands with ``--help`` for additional details.

The majority of ``zeek-client``'s commands send off a request to the controller,
wait for it to act on it, retrieve the response, and render it to the
console. The output is typically in JSON format, though a few commands also
support ``.ini`` output.

Looking at the :zeek:see:`Management::Controller::API` module, you'll notice
that the structure of response event arguments is fairly rigid, consisting of
one or more :zeek:see:`Management::Result` records. ``zeek-client`` does not
render these directly to JSON. Instead, it translates the responses to a more
convenient JSON format reflecting specific types of requests. Several commands
share a common output format.

.. _zeek-client-installation:

Standalone installation
-----------------------

As mentioned above, Zeek ships with ``zeek-client`` by default. Since users will
often want to use the client from machines not otherwise running Zeek, the
client is also available as a standalone Python package via ``pip``:

.. code-block:: console

   $ pip install zeek-client

Users with custom Zeek builds who don't require a Zeek-bundled ``zeek-client``
can skip its installation by configuring their build with
``--disable-zeek-client``.

.. _zeek-client-compatibility:

Compatibility
-------------

Zeek 5.2 switched client/controller communication from Broker's native wire
format to the newer `WebSocket data transport
<https://docs.zeek.org/projects/broker/en/current/web-socket.html>`_, with
``zeek-client`` 1.2.0 being the first version to exclusively use WebSockets.
This has a few implications:

* Since Broker dedicates separate ports to the respective wire formats, the
  controller listens on TCP port 2149 for WebSocket connections, while
  TCP port 2150 remains available for connections by native-Broker clients, as
  well as by management agents connecting to the controller.

* ``zeek-client`` 1.2.0 and newer default to connecting to port 2149.

* Controllers running Zeek older than 5.2 need tweaking to listen on a WebSocket
  port, for example by saying:

  .. code-block:: zeek

     event zeek_init()
         {
         Broker::listen_websocket("0.0.0.0", 2149/tcp);
         }

* Older clients continue to work with Zeek 5.2 and newer.

.. _zeek-client-configuration:

Configuration
-------------

The client features a handful of configuration settings, reported when running
``zeek-client show-settings``:

.. literalinclude:: management/zeek-client-show-settings.console
   :language: console

You can override these via a configuration file, the environment variable
``ZEEK_CLIENT_CONFIG_SETTINGS``, and the ``--set`` command-line argument, in
order of increasing precedence. To identify a setting, use
``<section>.<setting>``, as shown by your client. For example, in order to
specify a controller's location on the network, you could:

* Put the following in a config file, either at its default location shown in
  the help output (usually ``$(zeek-config --prefix)/etc/zeek-client.cfg``)
  or one that you provide via ``-c``/``--configfile``:

  .. code-block:: ini

     [controller]
     host = mycontroller
     port = 21490

* Set the environment:

  .. code-block:: console

     ZEEK_CLIENT_CONFIG_SETTINGS="controller.host=mycontroller controller.port=21490"

* Use the ``--set`` option, possibly repeatedly:

  .. code-block:: console

     $ zeek-client --set controller.host=mycontroller --set controller.port=21490 ...

Other than the controller coordinates, the settings should rarely require
changing. If you're curious about their meaning, please consult the `source code
<https://github.com/zeek/zeek-client/blob/master/zeekclient/config.py>`_.

Auto-complete
-------------

On systems with an installed `argcomplete <https://pypi.org/project/argcomplete/>`_
package, ``zeek-client`` features command-line auto-completion. For example:

.. code-block:: console

   $ zeek-client --set controller.<TAB>
   controller.host=127.0.0.1  controller.port=2149

Common cluster management tasks
===============================

With a running controller and agent, it's time to start using ``zeek-client``
for actual cluster management tasks. By default, the client will connect to a
controller running on the local system. If that doesn't match your setup,
instruct the client to contact the controller via one of the approaches shown
:ref:`earlier <zeek-client-configuration>`.

Checking connected agents
-------------------------

Use ``zeek-client get-instances`` to get a summary of agents currently peered
with the controller:

.. code-block:: console

   $ zeek-client get-instances
   {
     "agent-testbox": {
       "host": "127.0.0.1"
     }
   }

For agents connecting to the controller you'll see the above output; for agents
the controller connected to, you'll also see those agents' listening ports.

Defining a cluster configuration
--------------------------------

For ``zeek-client``, cluster configurations are simple ``.ini`` files with two
types of sections: the special ``instances`` section defines the instances
involved in the cluster, represented by their agents. All other sections in the
file name individual cluster nodes and describe their roles and properties.

Here's a full-featured configuration describing the available options, assuming
a single agent running on a machine "testbox" with default settings:

.. literalinclude:: management/full-config.ini
   :language: ini
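
If you don't have that file in front of you, its overall shape is roughly the
following sketch: an explicit ``instances`` section naming the agent, plus
per-node sections with the properties discussed in this chapter. The node
names, ports, and interfaces here are assumptions; consult the included example
for the full set of supported per-node keys:

.. code-block:: ini

   [instances]
   agent-testbox

   [manager]
   instance = agent-testbox
   role = manager
   port = 5000

   [logger]
   instance = agent-testbox
   role = logger
   port = 5001

   [worker-01]
   instance = agent-testbox
   role = worker
   interface = eth0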

.. _simplification-instance-local:

Simplification for instance-local deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In practice you can omit many of the settings. We already saw in the
:ref:`Quickstart <framework-management-quickstart>` section that a configuration
deployed locally in a :ref:`joint agent-controller setup <joint-launch>` need
not specify any instances at all. In that case, use of the local instance
``agent-<hostname>`` is implied. If you use other agent naming or more complex
setups, every node needs to specify its instance.

Simplification for agent-to-controller connectivity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In setups where agents connect to the controller, you may omit the ``instances``
section if it would merely repeat the list of instances claimed by the nodes.

Simplification for port selection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All but the worker nodes in a Zeek cluster require a listening port, and you can
specify one for each node as shown in the above configuration. If you'd rather
not pick ports, the controller can auto-enumerate ports for you, as follows:

* The :zeek:see:`Management::Controller::auto_assign_broker_ports` Boolean, which defaults to
  ``T``, controls whether port auto-enumeration is active. Redefining it to ``F``
  disables the feature.

* :zeek:see:`Management::Controller::auto_assign_broker_start_port` defines the starting point
  for port enumeration. This defaults to ``2200/tcp``.

* Any nodes with explicitly configured ports will keep them.

* For other nodes, the controller will assign ports first to the manager, then
  logger(s), then proxies. Within each of those groups, it first groups nodes
  in the same instance (to obtain locally sequential ports), and orders these
  alphabetically by name before enumerating. It also avoids conflicts with
  configured agent and controller ports.

* The controller does not verify that selected ports are in fact unclaimed.
  It's up to you to ensure an available pool of unclaimed listening ports from
  the start port onward.
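
If the defaults don't suit you, the two settings named above can be redefined,
for example as follows (the values are illustrative):

.. code-block:: zeek

   # Start Broker port enumeration higher up...
   redef Management::Controller::auto_assign_broker_start_port = 10000/tcp;
   # ...or turn auto-assignment off entirely and pick all ports yourself:
   # redef Management::Controller::auto_assign_broker_ports = F;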

By retrieving the deployed configuration from the controller (see the next two
sections) you can examine which ports the controller selected.

Configuration of the Telemetry framework
----------------------------------------

By default, the framework will enable Prometheus metrics exposition ports,
including a service discovery endpoint on the manager (refer to the
:ref:`Telemetry Framework <framework-telemetry>` for details), and
auto-assign them for you. Specifically, the controller will enumerate ports
starting from
:zeek:see:`Management::Controller::auto_assign_metrics_start_port`, which
defaults to ``9000/tcp``. Any ports you define manually will be preserved. To
disable metrics port auto-assignment, redefine
:zeek:see:`Management::Controller::auto_assign_metrics_ports` to ``F``.
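
A corresponding sketch for the metrics ports (the port value is an assumption):

.. code-block:: zeek

   # Enumerate metrics ports from 9100/tcp instead of the default 9000/tcp.
   redef Management::Controller::auto_assign_metrics_start_port = 9100/tcp;
   # Or disable metrics port auto-assignment entirely:
   # redef Management::Controller::auto_assign_metrics_ports = F;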

Staging and deploying configurations
------------------------------------

The framework structures deployment of a cluster configuration into two
phases:

#. First, the cluster configuration is *staged*: the client uploads it to the
   controller, which validates its content, and --- upon successful validation
   --- persists this configuration to disk. Restarting the controller at this
   point will preserve this configuration in its staged state. Validation checks
   the configuration for consistency and structural errors, such as doubly
   defined nodes, port collisions, or inconsistent instance use. The controller
   only ever stores a single staged configuration.

#. Then, *deployment* applies needed finalization to the configuration (e.g. to
   auto-enumerate ports) and, assuming all needed instances have peered,
   distributes the configuration to their agents. Deployment replaces any
   preexisting Zeek cluster, shutting down the existing node processes. The
   controller also persists the deployed configuration to disk, alongside the
   staged one. Deployment does *not* need to be successful to preserve a
   deployed configuration: it's the attempt to deploy that matters.

Internally, configurations bear an identifier string to allow tracking. The
client selects this identifier, which comes with no further assurances --- for
example, identical configurations need not bear the same identifier.

To stage a configuration, use the following:

.. code-block:: console

   $ zeek-client stage-config cluster.cfg
   {
     "errors": [],
     "results": {
       "id": "5e90197a-f850-11ec-a77f-7c10c94416bb"
     }
   }

The ``errors`` array contains textual descriptions of any validation problems
encountered, causing the client to exit with error. The reported ``id`` is the
configuration's identifier, as set by the client.

Then, trigger deployment of the staged configuration:

.. code-block:: console

   $ zeek-client deploy
   {
     "errors": [],
     "results": {
       "id": "5e90197a-f850-11ec-a77f-7c10c94416bb",
       "nodes": {
         "logger": {
           "instance": "agent-testbox4",
           "success": true
         },
         "manager": {
           "instance": "agent-testbox4",
           "success": true
         },
         "worker-01": {
           "instance": "agent-testbox4",
           "success": true
         },
         "worker-02": {
           "instance": "agent-testbox4",
           "success": true
         }
       }
     }
   }

Success! Note the matching identifiers. The ``errors`` array covers any internal
problems, and per-node summaries report the deployment outcome. In case of
launch errors in individual nodes, stdout/stderr is captured and hopefully
provides clues. Revisiting the quickstart example, let's introduce an error in
``cluster.cfg``:

.. literalinclude:: management/mini-config-with-error.ini
   :language: ini

Since staging and deployment will frequently go hand in hand, the client
provides the ``deploy-config`` command to combine them into one. Let's use it:

.. literalinclude:: management/mini-deployment-error.console
   :language: console

The client exits with an error: the timeout indicates that one of the launch
commands timed out, and ``worker-02``'s stderr shows the problem. The
Supervisor will continue to try to launch the node with ever-increasing
reattempt delays, and keep failing.

Retrieving configurations
-------------------------

The client's ``get-config`` command lets you retrieve staged and deployed
configurations from the controller, in JSON or :file:`.ini` form. This is helpful for
examining the differences between the two. Following the successful deployment
shown above, we have:

.. literalinclude:: management/mini-deployment-get-config-staged.console
   :language: console

You can see here how the client's :ref:`instance-local simplification
<simplification-instance-local>` filled in instances under the hood.

The ``.ini`` output is reusable as a deployable configuration. The same
configuration is available in JSON, showing more detail:

.. literalinclude:: management/mini-deployment-get-config-staged-json.console
   :language: console

Finally, you can also retrieve the deployed configuration (in either format):

.. literalinclude:: management/mini-deployment-get-config-deployed.console
   :language: console

Note the manager's and logger's auto-enumerated ports in this one.

Showing the current instance nodes
----------------------------------

To see the current node status as visible to the Supervisors in each agent's
instance, use the ``get-nodes`` command:

.. literalinclude:: management/mini-deployment-get-nodes.console
   :language: console

This groups nodes by instances, while also showing agents and controllers, so
``agent-testbox`` shows up twice in the above. Nodes can be in two states,
``PENDING`` upon launch and before the new node has checked in with the agent,
and ``RUNNING`` once that has happened. Nodes also have a role either in cluster
management (as ``AGENT`` or ``CONTROLLER``), or in the Zeek cluster. The
information shown per node essentially reflects the framework's
:zeek:see:`Management::NodeStatus` record.

Showing current global identifier values
----------------------------------------

For troubleshooting scripts in production it can be very handy to verify the
contents of global variables in specific nodes. The client supports this via the
``get-id-value`` command. To use it, specify the name of a global identifier, as
well as any node names from which you'd like to retrieve it. The framework
renders the value to JSON directly in the queried cluster node, side-stepping
potential serialization issues for complex types, and integrates the result in
the response:

.. literalinclude:: management/get-id-value-simple.console
   :language: console

.. literalinclude:: management/get-id-value-complex.console
   :language: console

Restarting cluster nodes
------------------------

The ``restart`` command allows you to restart specific cluster nodes, or the
entire cluster. Note that this refers only to the operational cluster (manager,
workers, etc) --- this will not restart any agents or a co-located controller.

Here's the current manager:

.. code-block:: console

   $ zeek-client get-nodes | jq '.results."agent-testbox".manager'
   {
     "cluster_role": "MANAGER",
     "mgmt_role": null,
     "pid": 54073,
     "port": 2200,
     "state": "RUNNING"
   }

Let's restart it:

.. code-block:: console

   $ zeek-client restart manager
   {
     "errors": [],
     "results": {
       "manager": true
     }
   }

It's back up and running (note the PID change):

.. code-block:: console

   $ zeek-client get-nodes | jq '.results."agent-testbox".manager'
   {
     "cluster_role": "MANAGER",
     "mgmt_role": null,
     "pid": 68752,
     "port": 2200,
     "state": "RUNNING"
   }