This allows us to create an EnumType that groups all of the analyzer
tag values into a single type, while still having the existing types
that split them up. We can then use this for certain events that benefit
from taking all of the tag types at once.
We're using shadow files for log rotation on systems with ext4 running
Linux 4.19. We've observed zero-length shadow files in the logger's working
directory after a power-outage. This leads to a broken/stuck logger
process due to empty shadow files being considered invalid and the
process exiting:
error: failed to process leftover log 'conn.log.gz': Found leftover log, 'conn.log.gz', but the associated shadow file, '.shadow.conn.log.gz', required to process it is invalid
PR #1137 introduced atomic renaming of shadow files and was supposed to
handle this. However, after more investigation, the rename() has to be
preceded by an fsync() in order to avoid zero-length files in the presence
of hard-crashes or power-failures. This is generally operating system
and filesystem dependent, but should not hurt to add. The performance impact
can likely be neglected due to the low frequency and limited number of
log streams.
This has happened to others, too. Some references around this issue:
* https://stackoverflow.com/questions/7433057/is-rename-without-fsync-safe
* https://unix.stackexchange.com/questions/464382/which-filesystems-require-fsync-for-crash-safety-when-replacing-an-existing-fi
* https://bugzilla.kernel.org/show_bug.cgi?id=15910
Reproducer
This issue was reproduced artificially on Linux using the sysrq-trigger
functionality to hard-reset the system shortly after a .shadow file was
renamed to it's final destination with the following script watching for
.shadow.conn.log.gz:
#!/bin/bash
set -eu
dir=/data/logger-01/
# Allow everything via /proc/sysrq-trigger
echo "1" > /proc/sys/kernel/sysrq
inotifywait -m -e MOVED_TO --format '%e %w%f' "${dir}" | while read -r line; do
if echo "${line}" | grep -q '^MOVED_TO .*/.shadow.conn.log.gz$'; then
echo "RESET: $line"
sleep 4
# Trigger a hard-reset without sync/unmount
echo "b" > /proc/sysrq-trigger
fi
done
This quite reliably (4 out of 4 times) yielded a system with zero-length
shadow files and a broken logger after it came back online:
$ ls -lha /data/logger-01/.shadow.*
-rw-r--r-- 1 bro bro 0 Oct 14 02:26 .shadow.conn.log.gz
-rw-r--r-- 1 bro bro 0 Oct 14 02:26 .shadow.dns.log.gz
-rw-r--r-- 1 bro bro 0 Oct 14 02:26 .shadow.files.log.gz
After this change while running the reproducer, the shadow files always
contained content after a hard-reset.
Rework with util::safe_fsync helper
This functionality previously lived in the CompHash class, with one difference:
this removes a discrepancy between the offset aligner and the memory pointer
aligner/padder. The size aligner used to align the provided offset and then add an
additional alignment size (for example, 1 aligned to 4 wouldn't yield 4 but 8).
Like the memory aligners it now only rounds up as needed.
Includes unit tests.
This adds a new function for validating UTF-8 sequences by converting to
UTF-32. This allows us to also check for various blocks of codepointsi
that we consider invalid while checking for valid sequences in general.
* origin/topic/vern/zval: (42 commits)
whitespace tweaks
resolved some TODO comments
remove unnecessary casts, and change necessary ones to use static_cast<>
explain cmp_func default
change functions for ZVal type management to static members
fix some unsigned/signed integer warnings
address lint concern about uninitialized variable
Remove use of obsolete forward-declaration macros
fix #include's that lack zeek/ prefixes
explicitly populate holes created in vectors
fixes for now-incorrect assumption that GetField always returns an existing ValPtr
memory management for assignment to vector elements
memory management for assignment to record fields
destructor cleanup from ZAM_vector/ZAM_record
fix #include's that lack zeek/ prefixes
overlooked another way in which vector holes can be created
initialize vector holes to the correct corresponding type
explicitly populate holes created in vectors
fix other instances of GetField().get() assuming long-lived ValPtr's
fix for now-incorrect assumption that GetField always returns an existing ValPtr
...
- Move all of the time handling code out of PktSrc into RunState
- Call packet_mgr->ProcessPacket() from various places to setup layer 2 data in packets
- Did a few whitespace re-adjustments during merge
* origin/topic/timw/266-namespaces-part5:
Update plugin btests for namespace changes
Plugins: Clean up explicit uses of namespaces in places where they're not necessary.
Base: Clean up explicit uses of namespaces in places where they're not necessary.
In deterministic mode, RAND_MAX is not related to the result of
zeek::random_number() (formerly bro_random()), but some logic was
using RAND_MAX as indication of the possible range of values. The
new zeek::max_random() will give the correct upper-bound regardless
of whether deterministic-mode is used.
The type used for storing the state of the RNG is changed from
`unsigned int` to `long int` since the former has a minimal range
of [0, 65,535] while the RNG function itself has a range of
[1, 2147483646]. A `long int` must be capable of
[−2147483647, +2147483647] and is also the return type of `random()`,
which is what zeek::prng() aims to roughly parity.