This is based on commit 2731def9159247e6da8a3191783c89683363689c from the zeek-docs repo.

.. _framework-supervisor:

====================
Supervisor Framework
====================

.. rst-class:: opening

   The Supervisor framework enables an entirely new mode for Zeek, one that
   supervises a set of Zeek processes that are meant to be persistent. A
   Supervisor automatically revives any process that dies or exits prematurely
   and also arranges for an ordered shutdown of the entire process tree upon
   its own termination. This Supervisor mode for Zeek provides the basic
   foundation for process configuration/management that could be used to
   deploy a Zeek cluster similar to what ZeekControl does, but is also simpler
   to integrate as a standard system service.

Simple Example
==============

A simple example of using the Supervisor to monitor one Zeek process
sniffing packets from an interface looks like the following:

.. code-block:: console

   $ zeek -j simple-supervisor.zeek

.. literalinclude:: supervisor/simple-supervisor.zeek
   :caption: simple-supervisor.zeek
   :language: zeek
   :linenos:
   :tab-width: 4

The ``-j`` command-line argument switches Zeek into "Supervisor mode", which
allows it to create and manage child processes. If you're going to test
this locally, be sure to change ``en0`` to a real interface name you can sniff.

Notice that the ``simple-supervisor.zeek`` script is loaded and executed by
both the main Supervisor process and the child Zeek process that it spawns
via :zeek:see:`Supervisor::create`. Use :zeek:see:`Supervisor::is_supervisor`
or :zeek:see:`Supervisor::is_supervised` to tell whether the script is running
in the Supervisor process or in a supervised child process, respectively.
You can also distinguish between multiple supervised child processes by
inspecting the contents of :zeek:see:`Supervisor::node` (e.g. comparing node
names).

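As a minimal sketch of that last point, the script below creates two nodes and
then branches on the node name inside the children. The node names ``foo`` and
``bar`` are made up for illustration, and neither node is given an interface,
so they will not sniff any traffic. The check for an empty result string
assumes that :zeek:see:`Supervisor::create` returns an empty string on success:

.. code-block:: zeek

   event zeek_init()
       {
       if ( Supervisor::is_supervisor() )
           {
           # Hypothetical node names, chosen only for this sketch.
           local names = vector("foo", "bar");

           for ( i in names )
               {
               local res = Supervisor::create(Supervisor::NodeConfig($name=names[i]));

               # Assumption: an empty string signals success.
               if ( res != "" )
                   print fmt("failed to create node '%s': %s", names[i], res);
               }

           return;
           }

       if ( ! Supervisor::is_supervised() )
           return;

       # Only supervised children get here.
       local me = Supervisor::node()$name;

       if ( me == "foo" )
           print "running foo-specific setup";
       else
           print fmt("running default setup in node '%s'", me);
       }
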
If you happen to be running this locally on an interface with checksum
offloading and want Zeek to ignore checksums, simply add the ``-C``
command-line argument:

.. code-block:: console

   $ zeek -j -C simple-supervisor.zeek

Most command-line arguments to Zeek are automatically inherited by any
supervised child processes that get created. The notable ones that are *not*
inherited are the options to read pcap files and live interfaces, ``-r`` and
``-i``, respectively.

For node-specific configuration options, see :zeek:see:`Supervisor::NodeConfig`,
which gets passed as the argument to :zeek:see:`Supervisor::create`.

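For instance, a node's interface and working directory might be set as in the
sketch below. The field names ``interface`` and ``directory`` are recalled
from the ``Supervisor`` API rather than taken from this page, and the node
name, interface, and directory are placeholders, so check them against
:zeek:see:`Supervisor::NodeConfig` before relying on them:

.. code-block:: zeek

   event zeek_init()
       {
       if ( ! Supervisor::is_supervisor() )
           return;

       # "sniffer" and "en0" are placeholders for this sketch.
       local cfg = Supervisor::NodeConfig($name="sniffer", $interface="en0");

       # Use a per-node working directory named after the node, relative to
       # the Supervisor's own working directory.
       cfg$directory = cfg$name;

       # Create the directory up front, in case it is not created for us.
       mkdir(cfg$directory);

       local res = Supervisor::create(cfg);

       # Assumption: an empty string signals success.
       if ( res != "" )
           print fmt("failed to create node '%s': %s", cfg$name, res);
       }

Since ``-i`` is not inherited by supervised children, setting the interface in
the node configuration is how a child ends up sniffing traffic in this sketch.
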
Supervised Cluster Example
==========================

To run a full Zeek cluster similar to what you may already know, try the
following script:

.. code-block:: console

   $ zeek -j cluster-supervisor.zeek

.. literalinclude:: supervisor/cluster-supervisor.zeek
   :caption: cluster-supervisor.zeek
   :language: zeek
   :linenos:
   :tab-width: 4

This script now spawns four nodes: a cluster manager, logger, worker, and
proxy. It also configures each node to use a separate working directory
corresponding to the node's name within the current working directory of the
Supervisor process. Any stdout/stderr output of the nodes is automatically
redirected through the Supervisor process and prefixed with relevant
information, like the name of the node that the output came from.

The Supervisor process also listens on a port of its own for further
instructions from other external/remote processes via
:zeek:see:`Broker::listen`. For example, you could use this other script to
tell the Supervisor to restart all processes, perhaps to reload Zeek scripts
you've changed in the meantime:

.. code-block:: console

   $ zeek supervisor-control.zeek

.. literalinclude:: supervisor/supervisor-control.zeek
   :caption: supervisor-control.zeek
   :language: zeek
   :linenos:
   :tab-width: 4

Any Supervisor instruction you can perform via an API call in a local script
can also be triggered via an associated external event.

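To illustrate the pairing, the sketch below asks a running Supervisor to
restart a single node from a separate Zeek process. It assumes that the
Supervisor listens on ``127.0.0.1`` port ``9999``, that a node named ``worker``
exists, and that the ``SupervisorControl::restart_request`` and
``SupervisorControl::restart_response`` events take a request ID plus a node
name or a result, as recalled from the ``SupervisorControl`` API. In a script
run by the Supervisor itself, the equivalent local call would be
``Supervisor::restart("worker")``:

.. code-block:: zeek

   event zeek_init()
       {
       # Receive responses published under the control topic.
       Broker::subscribe(SupervisorControl::topic_prefix);

       # Address and port are assumptions; match them to the Supervisor's
       # Broker::listen() call.
       Broker::peer("127.0.0.1", 9999/tcp, 1sec);
       }

   event Broker::peer_added(endpoint: Broker::EndpointInfo, msg: string)
       {
       # "req-1" is an arbitrary request ID, assumed to be echoed back in
       # the response. "worker" is a hypothetical node name.
       Broker::publish(SupervisorControl::topic_prefix,
                       SupervisorControl::restart_request, "req-1", "worker");
       }

   event SupervisorControl::restart_response(reqid: string, result: bool)
       {
       print fmt("restart request %s succeeded: %s", reqid, result);
       terminate();
       }
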
For further details, consult the ``Supervisor`` API at
:doc:`/scripts/base/frameworks/supervisor/api.zeek` and the
``SupervisorControl`` API (for remote management) at
:doc:`/scripts/base/frameworks/supervisor/control.zeek`.

Internal Architecture
=====================

The following details aren't necessarily important for most users, but instead
aim to give developers a high-level overview of how the process supervision
framework is implemented. The process tree in "supervisor" mode looks like:

.. figure:: supervisor/zeek-supervisor-architecture.png

The top-level "Supervisor" process does not directly manage any of the
supervised nodes that are created. Instead, it spawns an intermediate process,
called "Stem", to manage the lifetime of supervised nodes. This is done for
two reasons:

1. It avoids the need to ``exec()`` the supervised processes. An ``exec()``
   runs whatever version of the ``zeek`` binary happens to exist on the
   filesystem at the time of the call, and that binary may have changed in the
   meantime. Avoiding it helps sidestep potential incompatibility or
   race-condition pitfalls associated with system maintenance/upgrades. The
   one situation that does still require an ``exec()`` is if the Stem process
   dies prematurely, but that is expected to be a rare scenario.
2. Zeek run-time operation generally taints global state, so creating an early
   ``fork()`` for use as the Stem process provides a pure baseline image to use
   for supervised processes.

Ultimately, there are two tiers of process supervision happening: the
Supervisor will revive the Stem process if needed and the Stem process will
revive any of its children when needed.

Also, both the Stem and its supervised child processes automatically detect
if they are orphaned from their parent process and self-terminate. The Stem
checks for orphaning simply by waking up every second from its ``poll()`` loop
to check whether its parent PID has changed. A supervised node checks for
orphaning similarly, but instead does so from a recurring ``Timer``.
Other than the orphaning check and how it establishes the desired
configuration from a combination of inheriting command-line arguments and
inspecting Supervisor-specific options, a supervised node does not operate
differently at run-time from a traditional Zeek process.

Node Revival
============

The Supervisor framework assumes that supervised nodes run until something asks
the Supervisor to stop them. When a supervised node exits unexpectedly, the Stem
attempts to revive it during its periodic polling routine. This revival
procedure implements exponential delay, as follows: starting from a delay of one
second, the Stem revives the node up to 3 times. At that point, it doubles the
revival delay, and again tries up to 3 times. This continues indefinitely: the
Stem never gives up on a node, while the revival delay keeps growing. Once a
supervised node has remained up for at least 30 seconds, the revival state
clears and will start from scratch as just described, should the node exit
again. The Supervisor codebase currently hard-wires these thresholds and delays.

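
One way to read the schedule described above is sketched below as a standalone
Zeek script. It only prints delays and does not reflect the Stem's actual
implementation. The function name ``revival_delay`` is hypothetical, and the
sketch assumes that the delay doubles exactly after every third attempt:

.. code-block:: zeek

   # Illustrative only: computes the delay before a given revival attempt
   # under the assumed reading of the schedule. Attempts are 0-based.
   function revival_delay(attempt: count): interval
       {
       # Three attempts share each delay level (integer division on counts).
       local level = attempt / 3;
       local delay = 1sec;
       local i = 0;

       # Double the delay once per completed group of three attempts.
       while ( i < level )
           {
           delay = delay + delay;
           ++i;
           }

       return delay;
       }

   event zeek_init()
       {
       local attempt = 0;

       while ( attempt < 9 )
           {
           print fmt("revival attempt %d: wait %s", attempt + 1,
                     revival_delay(attempt));
           ++attempt;
           }
       }

Under this reading, the first three attempts wait one second each, the next
three wait two seconds, and the three after that wait four seconds. A node
that stays up for at least 30 seconds would start again from the first
attempt.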