When multiple loggers are configured in a Supervisor controlled cluster
configuration, encode extra information into the rotated filename to
identify which logger produced the log.
This is similar to the approach taken for ZeekControl, re-using the
log_suffix terminology, but as there's only a single zeek-archiver
process and no postprocessors and no other side-channel for additional
information, we encode extra metadata into the filename. zeek-archiver
is extended to recognize the special metadata part of the filename.
This also solves the issue that multiple loggers in a supervisor setup
overwrite each others log files within a single log-queue directory.
This is similar to what the external corelight/zeek-smb-clear-state script
does, but leverages the smb2_discarded_messages_state() event instead of
regularly checking on the state of SMB connections.
The pcap was created using the dperson/samba container image and mounting
a share with Linux's CIFS filesystem, then copying the content of a
directory with 100 files. The test uses a BPF filter to imitate mostly
"half-duplex" traffic.
* origin/topic/awelzel/zeekctl-multiple-loggers:
NEWS: Add entry for ZeekControl and multi-loggers
Bump zeekctl to multi-logger version
logging: Support rotation_postprocessor_command_env
This set contains the topics to reach all cluster nodes. Due to broker's
forwarding mechanism, we cannot define a single broadcast topic, as it
would create routing loops.
This new table provides a mechanism to add environment variables to the
postprocessor execution. Use case is from ZeekControl to inject a suffix
to be used when running with multiple logger.
While working on a rotation format function, ran into Zeek crashing
when not returning a value from it, fix and recover the same way as
for scripting errors.
An invalid mail transaction is determined as
* RCPT TO command without a preceding MAIL FROM
* a DATA command without a preceding RCPT TO
and logged as a weird.
The testing pcap for invalid mail transactions was produced with a Python
script against a local exim4 configured to accept more errors and unknown
commands than 3 by default:
# exim4.conf.template
smtp_max_synprot_errors = 100
smtp_max_unknown_commands = 100
See also: https://www.rfc-editor.org/rfc/rfc5321#section-3.3
The user and password fields are replicated to each of the ftp.log
entries. Using a very large username (100s of KBs) allows to bloat
the log without actually sending much traffic. Further, limit the
arg and reply_msg columns to large, but not unbounded values.
When an analyzer calls DataIn(), there's a costly callback construct
going through the event queue. If an analyzer does not have a
get_file_handle() handler installed, the produced file_id would
end up empty and ignored. Consequently, the get_file_handle() callback
was invoked for every new DataIn() invocations.
This is surprising and costly. Log a warning when this happens and
instead set a generically generated file handle value instead to
prevent the repeated get_file_handle() invocations.
Add configurability of synchronous and journal_mode for SQLite backed
Broker data stores. Setting these to synchronous=normal and journal_mode=wal
can significantly improve throughput at the cost of some durability in
the presence of power loss or OS crash. In the context of Zeek, this is
likely more than acceptable.
Additionally, add integrity_check and failure_mode options to support deleting
and re-opening a corrupted SQLite database at store creation.
Closes#2698
* origin/topic/awelzel/analyzer-log:
btest/net-control: Use different expiration times for rules
analyzer: Add analyzer.log for logging violations/confirmations
By default this only logs all the violations, regardless of the
confirmation state (for which there's still dpd.log). It includes
packet, protocol and file analyzers.
This uses options, change handlers and event groups for toggling
the functionality at runtime.
Closes#2031
In certain deployment scenarios, all analyzers are disabled by default.
However, conditionally/optionally loaded scripts may rely on analyzers
functioning and declare a request for them.
Add a global set set to the Analyzer module where external scripts can record
their requirement/request for a certain analyzer. Analyzers found in this
set are enabled at zeek_init() time.
This commit adds an optional event_groups field to the Logging::Stream record
to associated event groups with logging streams.
This can be used to disable all event groups of a logging stream when it is
disabled. It does require making an explicit connection between the
logging stream and the involved groups, however.
When a fa_file object is created through the use of Input::add_analysis(),
the fa_file's source is likely not valid representation of an analyzer's
tag and a Files::describe() should not error and instead return an empty
description.
Add a new Analyzer::is_tag() helper that can be used to pre-check `f$source`.
* When a file is transferred over multiple connection, have
create_file_info() just pick the first one instead of none.
* Do not unconditionally assume cid and cuid as set on a
Notice::FileInfo object.
oss-fuzz produced FTP traffic with a ~550KB long FTP command. Cap FTP command
length at 100 bytes, log a weird if a command is larger than that and move
on to the next. Likely it's not actual FTP traffic, but raising an
analyzer violation would allow clients an easy way to disable the analyzer
by sending an overly long command.
The added test PCAP was generated using a fake Python socket server/client.
This allows to enable/disable file analyzers through the same interfaces
as packet and protocol analyzers, specifically Analyzer::disable_analyzer
could be interesting.
This adds machinery to the packet_analysis manager for disabling
and enabling packet analyzers and implements two low-level bifs
to use it.
Extend Analyzer::enable_analyzer() and Analyzer::disable_analyzer()
to transparently work with packet analyzers, too. This also allows
to add packet analyzers to Analyzer::disabled_analyzers.
Introduce two new events for analyzer confirmation and analyzer violation
reporting. The current analyzer_confirmation and analyzer_violation
events assume connection objects and analyzer ids are available which
is not always the case. We're already passing aid=0 for packet analyzers
and there's not currently a way to report violations from file analyzers
using analyzer_violation, for example.
These new events use an extensible Info record approach so that additional
(optional) information can be added later without changing the signature.
It would allow for per analyzer extensions to the info records to pass
analyzer specific info to script land. It's not clear that this would be
a good idea, however.
The previous analyzer_confirmation and analyzer_violation events
continue to exist, but are deprecated and will be removed with Zeek 6.1.
...the only known cases where the `-` for `connection$service` was
handled is to skip/ignore these analyzers.
Slight suspicion that join_string_set() should maybe become a bif
now determine_service() runs once for each connection.
Closes#2388
* origin/topic/awelzel/dpd-analyzer-merger:
analyzer/dpd: Address review comments
Remove @load base/frameworks/dpd from tests
frameworks/dpd: Move to frameworks/analyzer/dpd, load by default
scripts/dce-rpc,ntlm: Do not load base/frameworks/dpd
btest: Remove unnecessary loading of frameworks/dpd
In supervised nodes, the Supervisor's NodeConfig$scripts vector adds scripts to
the end of the user-provided scripts (options.scripts_to_load), so they load
_after_ any user-provided ones. This can cause confusing redef pitfalls when
users expect their customizations to run last, as they normally do.
This adds two members in Supervisor::NodeConfig, `addl_base_scripts` and
`addl_user_scripts`, to store scripts to load before and after the user scripts,
respectively. The latter serves the same purpose as the old `scripts` member,
which is still there but deprecated (in scriptland only). It functions as
before, after any scripts added via `addl_user_scripts`.
* Because frameworks/analyzer is loaded via init-frameworks-and-bifs the
dpd functionality (really just dpd.log and disabling of analyzers) is
now enabled even in bare mode.
* Not sure we need to keep frameworks/base/dpd/__load__.zeek around
or can just remove it right away.
This is a script-only change that unrolls File::Info records into
multiple files.log entries if the same file was seen over different
connections by single worker. Consequently, the File::Info record
gets the commonly used uid and id fields added. These fields are
optional for File::Info - a file may be analyzed without relation
to a network connection (e.g by using Input::add_analysis()).
The existing tx_hosts, rx_hosts and conn_uids fields of Files::Info
are not meaningful after this change and removed by default. Therefore,
files.log will have them removed, too.
The tx_hosts, rx_hosts and conn_uids fields can be revived by using the
policy script frameworks/files/deprecated-txhosts-rxhosts-connuids.zeek
included in the distribution. However, with v6.1 this script will be
removed.