This adds https://github.com/gulrak/filesystem as a submodule into auxil
as a compiler-independent std::filesystem replacement.
The ghc::filesystem namespace is exposed as zeek::filesystem in util.h.
In the build directory, we add 3rdparty/ghc as a symlink to auxil in
order to support building from the build tree.
<build_dir>/src/3rdparty/ghc -> /path/to/zeek/src/auxil/filesystem/include/ghc
In the installation tree, the headers are installed into include/zeek/3rdparty:
<install_dir>/include/zeek/3rdparty/ghc
Note, this differs from how we approached rapidjson which isn't included
using a zeek/3rdparty and instead requires an additional include path of
the following form for external plugins to find and use it.
<install_dir>/include/zeek/3rdparty/rapidjson/include/
We diverge from this approach. Placing ghc directly into 3rdparty appears
nicer and avoids changing external components (DynamicPlugin.cmake / spicyc)
This allows us to create an EnumType that groups all of the analyzer
tag values into a single type, while still having the existing types
that split them up. We can then use this for certain events that benefit
from taking all of the tag types at once.
We're using shadow files for log rotation on systems with ext4 running
Linux 4.19. We've observed zero-length shadow files in the logger's working
directory after a power-outage. This leads to a broken/stuck logger
process due to empty shadow files being considered invalid and the
process exiting:
error: failed to process leftover log 'conn.log.gz': Found leftover log, 'conn.log.gz', but the associated shadow file, '.shadow.conn.log.gz', required to process it is invalid
PR #1137 introduced atomic renaming of shadow files and was supposed to
handle this. However, after more investigation, the rename() has to be
preceded by an fsync() in order to avoid zero-length files in the presence
of hard-crashes or power-failures. This is generally operating system
and filesystem dependent, but should not hurt to add. The performance impact
can likely be neglected due to the low frequency and limited number of
log streams.
This has happened to others, too. Some references around this issue:
* https://stackoverflow.com/questions/7433057/is-rename-without-fsync-safe
* https://unix.stackexchange.com/questions/464382/which-filesystems-require-fsync-for-crash-safety-when-replacing-an-existing-fi
* https://bugzilla.kernel.org/show_bug.cgi?id=15910
Reproducer
This issue was reproduced artificially on Linux using the sysrq-trigger
functionality to hard-reset the system shortly after a .shadow file was
renamed to it's final destination with the following script watching for
.shadow.conn.log.gz:
#!/bin/bash
set -eu
dir=/data/logger-01/
# Allow everything via /proc/sysrq-trigger
echo "1" > /proc/sys/kernel/sysrq
inotifywait -m -e MOVED_TO --format '%e %w%f' "${dir}" | while read -r line; do
if echo "${line}" | grep -q '^MOVED_TO .*/.shadow.conn.log.gz$'; then
echo "RESET: $line"
sleep 4
# Trigger a hard-reset without sync/unmount
echo "b" > /proc/sysrq-trigger
fi
done
This quite reliably (4 out of 4 times) yielded a system with zero-length
shadow files and a broken logger after it came back online:
$ ls -lha /data/logger-01/.shadow.*
-rw-r--r-- 1 bro bro 0 Oct 14 02:26 .shadow.conn.log.gz
-rw-r--r-- 1 bro bro 0 Oct 14 02:26 .shadow.dns.log.gz
-rw-r--r-- 1 bro bro 0 Oct 14 02:26 .shadow.files.log.gz
After this change while running the reproducer, the shadow files always
contained content after a hard-reset.
Rework with util::safe_fsync helper
This functionality previously lived in the CompHash class, with one difference:
this removes a discrepancy between the offset aligner and the memory pointer
aligner/padder. The size aligner used to align the provided offset and then add an
additional alignment size (for example, 1 aligned to 4 wouldn't yield 4 but 8).
Like the memory aligners it now only rounds up as needed.
Includes unit tests.
In deterministic mode, RAND_MAX is not related to the result of
zeek::random_number() (formerly bro_random()), but some logic was
using RAND_MAX as indication of the possible range of values. The
new zeek::max_random() will give the correct upper-bound regardless
of whether deterministic-mode is used.
The type used for storing the state of the RNG is changed from
`unsigned int` to `long int` since the former has a minimal range
of [0, 65,535] while the RNG function itself has a range of
[1, 2147483646]. A `long int` must be capable of
[−2147483647, +2147483647] and is also the return type of `random()`,
which is what zeek::prng() aims to roughly parity.
Merge adjustments:
- Preserved original `base_type_no_ref` argument type as ::TypeTag
- Removed superfluous #pragma guard around deprecated TableVal ctor
- Clarify NEWS regarding MetaHook{Pre,Post} deprecations
- Simplify some `::zeek::` qualifications to just `zeek::`
- Prefixed FORWARD_DECLARE_NAMESPACED macro with ZEEK_
* origin/topic/timw/266-namespaces:
Disable some deprecation diagnostics for GCC
Rename BroType to Type
Update NEWS
Review cleanup
Move Type types to zeek namespace
Move Flare/Pipe from the bro namespace to zeek::detail
Move Attr to the zeek::detail namespace
Move Trigger into the zeek::detail namespace
Move ID to the zeek::detail namespace
Move Anon.h into zeek::detail namespace
Mark all of the aliased classes in plugin/Plugin.h deprecated, and fix all of the plugins that were using them
Move all of the base plugin classes into the zeek::plugin namespace
Expr: move all classes into zeek::detail
Stmt: move Stmt classes into zeek::detail namespace
Add utility macro for creating namespaced aliases for classes
* origin/master:
Use zeek::detail namespace for fuzzer utils
Set terminating flag during fuzzer cleanup
Add missing include to standalone fuzzer driver
Improve standalone fuzzer driver error messages
Test fuzzers against seed corpus under CI ASan build
Update fuzzing README with OSS-Fuzz integration notes
Link fuzzers against shared library to reduce executable sizes
Improve FuzzBuffer chunking
Fix compiler warning in standalone fuzzer driver
Adjust minor fuzzing documentation
Exit immediately after running unit tests
Add OSS-Fuzz Zeek script search path to fuzzers
Assume libFuzzer when LIB_FUZZING_ENGINE file doesn't exist
Change handling of LIB_FUZZING_ENGINE
Change --enable-fuzzing to --enable-fuzzers
Add standalone driver for fuzz targets
Add basic structure for fuzzing targets
This commit moves some of the hash datastructures and code from
util.cc into Hash.cc - where it seems more appropriate.
It also starts to make more Keyed hash functions available - still
using siphash as the default 64 bit keyed hash, but also making
128 and 256 bit highway hashes available.
There already are a few other functions that are defined but not
yet implemented - these will be "static" keyed hashes - which use
an installation specific key. These will be used to, e.g., get
rid of md5 hashing for the generation of file UIDs.