zeek/doc/frameworks/storage.rst
Tim Wojtulewicz ded98cd373 Copy docs into Zeek repo directly
This is based on commit 2731def9159247e6da8a3191783c89683363689c from the
zeek-docs repo.
2025-09-26 02:58:29 +00:00


.. _framework-storage:
.. versionadded:: 7.2
=================
Storage Framework
=================
The storage framework provides a plugin-based system for short- and long-term storage of
data, accessible from Zeek script-land. This is not packet data itself, but data artifacts
generated from the packet data. It has interchangeable asynchronous and synchronous
modes. The framework provides a simple key-value store, using Zeek values as the keys
to store and look up data.
This chapter gives an overview of the storage framework, plus examples of using it. For
more examples, see the test cases in ``testing/btest/scripts/base/frameworks/storage`` and
an example storage plugin in ``testing/btest/plugin/storage-src``.
Terminology
===========
Zeek's storage framework uses two main components:

Backend
  A backend plugin provides access to a storage system. Backends can be network-based
  storage systems such as Redis, on-disk database systems such as SQLite, etc. Backend
  plugins can define script-level records for configuring them when they're opened. Zeek
  provides backends for Redis and SQLite by default, but others may be implemented as
  external packages.

Serializer
  A serializer plugin provides a mechanism for converting data from Zeek scripts into
  formats that backends can use. Serializers are intended to be agnostic to
  backends. They convert between Zeek values and opaque byte buffers, and backends
  should be able to handle the result of any individual serializer. Zeek provides a JSON
  serializer by default, but others may be implemented as external packages.

Asynchronous Mode vs Synchronous Mode
=====================================
Storage backends support both asynchronous and synchronous modes. The difference between
using the two modes is that asynchronous calls must be used as part of :zeek:see:`when`
statements, whereas synchronous calls can be used either with ``when`` statements or
called directly. Synchronous functions will block until the backend returns
data. Otherwise, all of the arguments and return values are the same between them. They
are split between two script-level modules: :zeek:see:`Storage::Async` loaded from
``base/frameworks/storage/async`` and :zeek:see:`Storage::Sync` loaded from
``base/frameworks/storage/sync``.
When reading pcap data via the ``-r`` Zeek argument, all backends operate in a synchronous
manner internally to ensure that Zeek's timers run correctly. Even in this case,
asynchronous functions must still be used with the ``when`` statement; internally they
are simply executed as synchronous calls.
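
The difference between the two modes can be sketched as follows. This is a minimal
illustration that assumes the SQLite backend policy is loaded and that the file
``test.sqlite`` is writable; the key ``"abc"`` and the timeout values are placeholders
chosen for the example.

.. code-block:: zeek

    @load base/frameworks/storage/async
    @load policy/frameworks/storage/backend/sqlite

    event zeek_init()
        {
        local opts: Storage::BackendOptions;
        opts$sqlite = [$database_path="test.sqlite", $table_name="testing"];

        # Asynchronous calls must appear in the condition of a when statement.
        when [opts] ( local open_res = Storage::Async::open_backend(
                          Storage::STORAGE_BACKEND_SQLITE, opts, string, string) )
            {
            if ( open_res$code == Storage::SUCCESS )
                {
                local backend = open_res$value;

                # Nested asynchronous lookup; the body runs once the backend replies.
                when [backend] ( local get_res = Storage::Async::get(backend, "abc") )
                    {
                    print "get result", get_res$code;
                    }
                timeout 5sec
                    {
                    print "get timed out";
                    }
                }
            else
                print "open failed", open_res$code;
            }
        timeout 5sec
            {
            print "open timed out";
            }
        }

The synchronous equivalents in ``Storage::Sync`` take the same arguments and can be
called directly, without the ``when`` wrapper.
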
Using the Storage Framework
===========================
All of the examples below use the SQLite backend. Usage of other backends follows the same
model. Switching the examples to a different backend involves only using a different tag
and options record with the :zeek:see:`Storage::Async::open_backend`/
:zeek:see:`Storage::Sync::open_backend` functions.
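
As a rough sketch of such a switch, opening the Redis backend might look like the
following. The field names ``server_host``, ``server_port``, and ``key_prefix`` are
assumptions made for illustration; consult the Redis backend's options record for the
authoritative list of fields.

.. code-block:: zeek

    @load base/frameworks/storage/sync
    @load policy/frameworks/storage/backend/redis

    event zeek_init()
        {
        local opts: Storage::BackendOptions;

        # Assumed field names for the Redis options; verify against the
        # Redis backend documentation before relying on them.
        opts$redis = [$server_host="127.0.0.1", $server_port=6379/tcp, $key_prefix="testing"];

        # Only the tag and the options record differ from the SQLite examples.
        local res = Storage::Sync::open_backend(Storage::STORAGE_BACKEND_REDIS, opts, string, string);
        if ( res$code != Storage::SUCCESS )
            print "open failed", res$code;
        }
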
Operation Return Values
-----------------------
All backend methods return a record of type :zeek:see:`Storage::OperationResult`. This
record contains a code that indicates the result of the operation. For failures, backends
may provide more details in the optional error message. The record will also contain data
for operations that return values, namely ``open_backend`` or ``get``.
:zeek:see:`Storage::ReturnCode` contains all of the codes that can be returned from the
various operations. Not all codes are valid for all operations.
:zeek:see:`Storage::ReturnCode` can be redefined by backends to add new backend-specific
status codes as needed.
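
One way to handle these results consistently is a small helper that inspects the
record. The function name ``storage_ok`` is hypothetical and not part of the framework;
the sketch relies only on the ``code`` field and the optional ``error_str`` field of
:zeek:see:`Storage::OperationResult`.

.. code-block:: zeek

    @load base/frameworks/storage/sync

    # Hypothetical helper: returns T on success, otherwise prints the
    # return code and, if the backend supplied one, the error message.
    function storage_ok(op: string, res: Storage::OperationResult): bool
        {
        if ( res$code == Storage::SUCCESS )
            return T;

        # error_str is optional, so test for it before reading it.
        if ( res?$error_str )
            print fmt("%s failed: %s (%s)", op, res$code, res$error_str);
        else
            print fmt("%s failed: %s", op, res$code);

        return F;
        }

A caller could then write ``if ( ! storage_ok("put", res) ) return;`` after each
operation.
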
.. _storage-opening-closing:
Opening and Closing a Backend
-----------------------------
Opening a backend starts with defining a set of options for that backend. The
:zeek:see:`Storage::BackendOptions` record is defined with some fields by default, but
loading a policy for a specific backend type may add new fields to it. In the example
below, we loaded the SQLite policy, which adds a new ``sqlite`` field with additional
options. These options are filled in to denote where to store the SQLite database file
and what table to use. Using distinct table names lets users keep different instances of
a backend separate from each other in a single database file.
The script then sets a serializer. The storage framework sets this to the JSON
(:zeek:see:`Storage::STORAGE_SERIALIZER_JSON`) serializer by default, but setting it
explicitly is included below as an example.
Calling :zeek:see:`Storage::Sync::open_backend` instantiates a backend connection. As
described above, ``open_backend`` returns a :zeek:see:`Storage::OperationResult`. On
success, it stores the handle to the backend in the ``value`` field of the result
record. We check the ``code`` field as well to make sure the operation succeeded. Backend
handles can be stored in global values just like any other value. They can be opened
during startup, such as in a :zeek:see:`zeek_init` event handler, and reused throughout
the runtime of Zeek. When a backend is successfully opened, a
:zeek:see:`Storage::backend_opened` event will be emitted.
The two type arguments to ``open_backend`` define the script-level types for keys and
values. Attempting to use other types with the backend results in
:zeek:see:`Storage::KEY_TYPE_MISMATCH` errors.
Lastly, we call :zeek:see:`Storage::Sync::close_backend` to close the backend before
exiting. When a backend is successfully closed, a :zeek:see:`Storage::backend_lost` event
will be emitted.

.. code-block:: zeek

    @load base/frameworks/storage/sync
    @load policy/frameworks/storage/backend/sqlite

    local opts: Storage::BackendOptions;
    local backend: opaque of Storage::BackendHandle;

    # Loading the sqlite policy adds this field to the options record.
    opts$sqlite = [$database_path="test.sqlite", $table_name="testing"];

    # This is the default, but is shown here for how to set it.
    opts$serializer = Storage::STORAGE_SERIALIZER_JSON;

    local res = Storage::Sync::open_backend(Storage::STORAGE_BACKEND_SQLITE, opts, string, string);
    if ( res$code == Storage::SUCCESS )
        {
        # The handle is only valid if the open succeeded.
        backend = res$value;
        res = Storage::Sync::close_backend(backend);
        }
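
Since handles can be kept in globals and reused for the lifetime of the process, a
common arrangement is to open the backend in :zeek:see:`zeek_init` and close it in
:zeek:see:`zeek_done`. The following is a sketch under that assumption; the global names
``backend`` and ``backend_ready`` are illustrative only.

.. code-block:: zeek

    @load base/frameworks/storage/sync
    @load policy/frameworks/storage/backend/sqlite

    # Illustrative globals: the handle plus a flag recording whether it is usable.
    global backend: opaque of Storage::BackendHandle;
    global backend_ready = F;

    event zeek_init()
        {
        local opts: Storage::BackendOptions;
        opts$sqlite = [$database_path="test.sqlite", $table_name="testing"];

        local res = Storage::Sync::open_backend(Storage::STORAGE_BACKEND_SQLITE, opts, string, string);
        if ( res$code == Storage::SUCCESS )
            {
            backend = res$value;
            backend_ready = T;
            }
        }

    event zeek_done()
        {
        # Only close a handle that was actually opened.
        if ( backend_ready )
            Storage::Sync::close_backend(backend);
        }

Other event handlers can check ``backend_ready`` before issuing ``put`` or ``get``
calls against the shared handle.
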
Storing, Retrieving, and Erasing Data
-------------------------------------
The true point of the storage framework is to store and retrieve data. This example shows
making synchronous calls to add a new key/value pair to a backend, retrieve it, and erase
the entry associated with the key. This assumes that the ``backend`` variable used below
points to an opened backend handle. The idea is that users do not need to worry about the
underlying backend implementation. In terms of Zeek's script-layer API, SQLite, Redis, or
other backends should behave identically.
First, we make a call to :zeek:see:`Storage::Sync::put`, passing a key and a value to be
stored. These must be of the same types that were passed in the arguments to
``open_backend``, as described in the :ref:`earlier section <storage-opening-closing>`.
The arguments passed into ``put`` are contained in a record of type
:zeek:see:`Storage::PutArgs`. See the documentation for that type for descriptions of the
fields available. In this case, we specify a key and a value plus an expiration time. This
expiration time indicates when the data should be automatically removed from the
backend. We check the result value, and print the error string and return if the operation
failed.
Next, we attempt to retrieve the same key from the backend. Assuming that the key hasn't
been erased, either manually or via expiration, the value is returned in the ``value``
field of the result record. If the key has been removed already, the backend should return
a :zeek:see:`Storage::KEY_NOT_FOUND` code.
Finally, we manually attempt to erase the key. This will remove the key/value pair from
the store, assuming that it hasn't already been removed manually or via expiration. As
with ``get``, :zeek:see:`Storage::KEY_NOT_FOUND` should be returned if the key doesn't
exist.

.. code-block:: zeek

    # Store a key/value pair that expires after 45 seconds.
    local res = Storage::Sync::put(backend, [$key="abc", $value="def", $expire_time=45sec]);
    if ( res$code != Storage::SUCCESS )
        {
        print(res$error_str);
        return;
        }

    # Retrieve the value stored under the same key.
    res = Storage::Sync::get(backend, "abc");
    if ( res$code != Storage::SUCCESS )
        {
        print(res$error_str);
        return;
        }

    # Remove the key/value pair from the backend.
    res = Storage::Sync::erase(backend, "abc");
    if ( res$code != Storage::SUCCESS )
        {
        print(res$error_str);
        return;
        }
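
The return codes can also be used to tell a missing key apart from other failures. The
sketch below assumes that :zeek:see:`Storage::PutArgs` has an ``overwrite`` field and
that a ``Storage::KEY_EXISTS`` code is returned when it is set to ``F`` and the key is
already present; check the type's documentation to confirm both. The function name
``demo_lookup`` is hypothetical.

.. code-block:: zeek

    @load base/frameworks/storage/sync

    # Hypothetical function; expects a handle opened with string keys and values.
    function demo_lookup(backend: opaque of Storage::BackendHandle)
        {
        # Assumption: $overwrite=F asks the backend to keep an existing value.
        local res = Storage::Sync::put(backend, [$key="abc", $value="def", $overwrite=F]);
        if ( res$code == Storage::KEY_EXISTS )
            print "key already present, left unchanged";

        res = Storage::Sync::get(backend, "abc");
        if ( res$code == Storage::SUCCESS )
            # The value field is untyped, so cast it to the opened value type.
            print fmt("abc => %s", (res$value as string));
        else if ( res$code == Storage::KEY_NOT_FOUND )
            print "abc is missing or has expired";
        else
            print "get failed", res$code;
        }
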
Events
======
Two events exist for the storage framework: :zeek:see:`Storage::backend_lost` and
:zeek:see:`Storage::backend_opened`. Both events were mentioned in the :ref:`example of
opening and closing a backend <storage-opening-closing>`, but an additional point needs to
be made about the :zeek:see:`Storage::backend_lost` event. This event is also raised when
a connection is lost unexpectedly. This gives users information about connection failures,
as well as an opportunity to handle those failures by reconnecting.
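
Handlers for the two events might look like the sketch below. The parameter lists shown
here (a backend tag, the options used to open the backend, and a reason string for
``backend_lost``) are assumptions; check the event documentation for the exact
signatures in your Zeek version.

.. code-block:: zeek

    @load base/frameworks/storage/sync

    # Assumed signature: backend tag plus the options record passed as "any".
    event Storage::backend_opened(tag: Storage::Backend, options: any)
        {
        print "storage backend opened", tag;
        }

    # Assumed signature: as above, plus a human-readable reason.
    event Storage::backend_lost(tag: Storage::Backend, options: any, reason: string)
        {
        print "storage backend lost", tag, reason;
        # A reconnect attempt, such as another open_backend call, could start here.
        }
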
Notes for Built-in Backends
===========================
Redis
-----
- The Redis backend requires the ``hiredis`` library to be installed on the system in
  order to build. At least version 1.1.0 (released November 2022) is required.
- Redis server version 6.2.0 or later (or a third-party server implementing the equivalent
  level of the Redis API) is required, because the backend relies on API features that
  were not implemented until that version.
SQLite
------
- The default batch of pragmas in :zeek:see:`Storage::Backend::SQLite::Options` sets
  ``journal_mode`` to ``WAL``. ``WAL`` mode does not work over network filesystems. If
  this mode is used, the database file must be stored on the same computer as all of the
  Zeek processes opening it. See the documentation at https://www.sqlite.org/wal.html for
  more information.
- Usage of in-memory databases (i.e. passing ``:memory:`` as the database path) will
  result in data not being synced between nodes. Each process will open its own database
  within that process's memory space.