When a shadow file is empty/missing during rotation, Zeek aborts
with an error message, but if the shadow file was empty, it'll still
be there after the restart, causing an endless restart loop. This
solution gracefully handles the rotation in such cases using the
default file extension and post processing function.
This largely copies over Spicy's `.clang-format` configuration file. The
one place where we deviate is header include order since Zeek depends on
headers being included in a certain order.
Removes a bit of reliance around the magic DoLog() rendering at the
cost of needing to open-code some of it. The new obj_desc_short()
helper makes that acceptable, though.
Also modify FormatRotationPath to keep rotated logs within
Log::default_logdir unless the rotation function explicitly
set dir, e.g. by when the user redef'ed default_rotation_interval.
With the introduction of LogAscii::logdir, log filenames can now include
parent directories rather than being plain basenames. Enabling log rotation,
leftover log rotation and setting LogAscii::logdir broke due to not
handling this situation.
This change ensures that .shadow files are placed within the directory where
the respective .log file is created. Previously, the .shadow. (or .tmp.shadow.)
prefix was simply prepended, yielding non-sensical paths such as
.tmp.shadow.foo/bar/packet_filter.log for a logdir of foo/bar.
Additionally, respect LogAscii::logdir when searching for leftover log files
rather than defaulting to the current working directory.
The following quirk exist around LogAscii::logdir, but will be addressed
in a follow-up.
* By default, logs are currently rotated into the working directory of the
process, rather than staying confined within LogAscii::logdir. One of
the added tests shows this behavior.
We're using shadow files for log rotation on systems with ext4 running
Linux 4.19. We've observed zero-length shadow files in the logger's working
directory after a power-outage. This leads to a broken/stuck logger
process due to empty shadow files being considered invalid and the
process exiting:
error: failed to process leftover log 'conn.log.gz': Found leftover log, 'conn.log.gz', but the associated shadow file, '.shadow.conn.log.gz', required to process it is invalid
PR #1137 introduced atomic renaming of shadow files and was supposed to
handle this. However, after more investigation, the rename() has to be
preceded by an fsync() in order to avoid zero-length files in the presence
of hard-crashes or power-failures. This is generally operating system
and filesystem dependent, but should not hurt to add. The performance impact
can likely be neglected due to the low frequency and limited number of
log streams.
This has happened to others, too. Some references around this issue:
* https://stackoverflow.com/questions/7433057/is-rename-without-fsync-safe
* https://unix.stackexchange.com/questions/464382/which-filesystems-require-fsync-for-crash-safety-when-replacing-an-existing-fi
* https://bugzilla.kernel.org/show_bug.cgi?id=15910
Reproducer
This issue was reproduced artificially on Linux using the sysrq-trigger
functionality to hard-reset the system shortly after a .shadow file was
renamed to it's final destination with the following script watching for
.shadow.conn.log.gz:
#!/bin/bash
set -eu
dir=/data/logger-01/
# Allow everything via /proc/sysrq-trigger
echo "1" > /proc/sys/kernel/sysrq
inotifywait -m -e MOVED_TO --format '%e %w%f' "${dir}" | while read -r line; do
if echo "${line}" | grep -q '^MOVED_TO .*/.shadow.conn.log.gz$'; then
echo "RESET: $line"
sleep 4
# Trigger a hard-reset without sync/unmount
echo "b" > /proc/sysrq-trigger
fi
done
This quite reliably (4 out of 4 times) yielded a system with zero-length
shadow files and a broken logger after it came back online:
$ ls -lha /data/logger-01/.shadow.*
-rw-r--r-- 1 bro bro 0 Oct 14 02:26 .shadow.conn.log.gz
-rw-r--r-- 1 bro bro 0 Oct 14 02:26 .shadow.dns.log.gz
-rw-r--r-- 1 bro bro 0 Oct 14 02:26 .shadow.files.log.gz
After this change while running the reproducer, the shadow files always
contained content after a hard-reset.
Rework with util::safe_fsync helper
* 'logging/script-logdir' of https://github.com/kramse/zeek:
Copy of ascii-empty test, just changed path in the beginning
Logdir: Change requested by 0xxon, no problem
Introduce script-land variable that can be used to set logdir.
Closes GH-772
This enables locating the headers within the install-tree using the
dirs provided by `zeek-config --include_dir`.
To enable locating these headers within the build-tree, this change also
creates a 'build/src/include/zeek -> ..' symlink.
A logger process being terminated/killed while in the process of creating
a new .shadow file may leave an empty (invalid) one around. This in turn
causes the logger to error and exit during startup.
$ $ ls -lha .shadow.*
-rw-r--r-- 1 root root 0 Dec 16 18:48 .shadow.dns.log
-rw-r--r-- 1 root root 0 Dec 16 18:48 .shadow.packet_filter.log
$ zeek LogAscii::enable_leftover_log_rotation=T Log::default_rotation_interval=30sec -i wlp0s20f3
error in <params>, line 1: failed to process leftover log 'dns.log': Found leftover log, 'dns.log', but the associated shadow file, '.shadow.dns.log', required to process it is invalid
error in <params>, line 1: failed to process leftover log 'packet_filter.log': Found leftover log, 'packet_filter.log', but the associated shadow file, '.shadow.packet_filter.log', required to process it is invalid
$ ...
Prevent creating invalid .shadow files by atomically creating them.
We have a use case to rotate leftover log files in a non-supervisor
setup. There doesn't seem to be a strict requirement on supervisor
functionality. Allow enabling leftover log rotation through
LogAscii::enable_leftover_log_rotation and redef this for the
logger node in a supervisor setup individually.
These may be redefined to customize log rotation path prefixes,
including use of a directory. File extensions are still up to
individual log writers to add themselves during the actual rotation.
These new also allow for some simplication to the default
ASCII postprocessor function: it eliminates the need for it doing an
extra/awkward rename() operation that only changes the timestamp format.
This also teaches the supervisor framework to use these new options
to rotate ascii logs into a log-queue/ directory with a specific
file name format (intended for an external archiver process to
monitor separately).
This helps prevent a node from being killed/crashing in the middle
of writing a log, restarting, and eventually clobbering that log
file that never underwent the rotation/archival process.
The old `archive-log` and `post-terminate` scripts as used by
ZeekControl previously implemented this behavior, but the new logic is
entirely in the ASCII writer. It uses ".shadow" log files stored
alongside the real log to help detect such scenarios and rotate them
correctly upon the next startup of the Zeek process.
* origin/topic/timw/776-using-statements:
Remove 'using namespace std' from SerialTypes.h
Remove other using statements from headers
GH-776: Remove using statements added by PR 770
Includes small fixes in files that changed since the merge request was
made.
Also includes a few small indentation fixes.
This unfortunately cuases a ton of flow-down changes because a lot of other
code was depending on that definition existing. This has a fairly large chance
to break builds of external plugins, considering how many internal ones it broke.