* origin/topic/jsiwek/fuzzing:
Use zeek::detail namespace for fuzzer utils
Set terminating flag during fuzzer cleanup
Add missing include to standalone fuzzer driver
Improve standalone fuzzer driver error messages
Test fuzzers against seed corpus under CI ASan build
Update fuzzing README with OSS-Fuzz integration notes
Link fuzzers against shared library to reduce executable sizes
Improve FuzzBuffer chunking
Fix compiler warning in standalone fuzzer driver
Adjust minor fuzzing documentation
Exit immediately after running unit tests
Add OSS-Fuzz Zeek script search path to fuzzers
Assume libFuzzer when LIB_FUZZING_ENGINE file doesn't exist
Change handling of LIB_FUZZING_ENGINE
Change --enable-fuzzing to --enable-fuzzers
Add standalone driver for fuzz targets
Add basic structure for fuzzing targets
There's now a couple placeholder/unimplemented status values in Broker
related to upcoming routing features that we don't want to handle
explicitly for compatibility reasons, but also don't want the compiler
warning about unhandled values in the switch.
General changes:
* Add -D/--deterministic command line option as
convenience/alternative to -G/--load-seeds (i.e. no file needed, it just
uses zero-initialized random seeds). It also changes Broker data
stores over to using deterministic timing rather than real time.
* Add option to make Reporter abort on runtime scripting errors
* origin/topic/jsiwek/32bit-compat:
Improve formatting of doubles that are close to integers
Improve HTTP version number comparisons
Add a 32-bit task to Cirrus CI config
Replace va_list fmt() overload with vfmt()
Format tables indexed by patterns consistently across 32-bit/64-bit
Format interval values consistently across 32-bit/64-bit platforms
Using an overload that takes a va_list argument potentially causes
accidental misuse on platforms (e.g. 32-bit) where va_list is
implemented as a type that may collide with commonly-used argument
types.
For example:
char* c = copy_string("hi");
fmt("%s", (const char*)c);
fmt("%s", c);
The first fmt() call correctly goes through fmt(const char*, ...) first,
but the second mistakenly goes through fmt(const char*, va_list) first
because variadic function overloads have lower priority during overload
resolution and va_list on a 32-bit system happens to be defined as a
pointer type that can match with "char*" but not "const char*".
Logs that got sent sparsely or burstily would get buffered for long
periods of time since the logic to flush them only does so on the next
log write. In the worst case, a subsequent log write could never happen
and cause a log entry to be indefinitely buffered.
This fix introduces a recurring event/timer to simply flush all pending
logs at frequency of Broker::log_batch_interval.
The Zeek code base has very inconsistent #includes. Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed. Another side effect was a lot of header
bloat which slows down the build.
First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.
After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations. In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.
This patch speeds up the build by 19%, because each compilation unit
gets smaller. Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):
Before this patch:
3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps
After this patch:
2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
Manual memory management via Ref/Unref is verbose and prone to error. An
intrusive smart pointer automates the reference counting, makes code
more robust (in particular w.r.t. to exceptions) and reduces boilerplate
code. A big benefit of the intrusive smart pointers for Zeek is that
they can co-exist with the manual memory management. Rather than having
to port the entire code base at once, we can migrate components
one-by-one. In this first step, we add the new template
`IntrusivePtr<T>` and start using it in the Broker Manager. This makes
the previous `unref_guard` obsolete.
* origin/topic/timw/deprecate-int-types:
Deprecate the internal int/uint types in favor of the cstdint types they were based on
Merge adjustments:
* A bpf type mistakenly got replaced (inside an unlikely #ifdef)
* Did a few substitutions that got missed (likely due to
pre-processing out of DEBUG macros)
* origin/topic/timw/easy-pdict-replacements:
Cleanups related to PDict -> std::map replacements
Remove other simple uses of PDict
Protocols: Remove uses of PDict
g_dbgfilemaps: Remove uses of PDict
Scope: remove uses of PDict
DFA: remove uses of PDict
EventRegistry: remove uses of PDict
anonymous-functions, their closures, can now be sent over broker.
In order to send an anonymous function the receiver must have parsed
a definition of the functon, but it need not to have been evaluated.
See testing/btest/language/closure-sending.zeek for an example of how
this can be done.
This also sends their closures as well as the closures of regular
functions.
It may generally be better for our default use-case, as workers may
save a few percent cpu utilization as this policy does not have to
use any polling like the stealing policy does.
This also helps avoid a potential issue with the implementation of
spinlocks used in the work-stealing policy in current CAF versions,
where there's some conditions where lock contention causes a thread
to spin for long periods without relinquishing the cpu to others.
For broker.log and cluster.log: there was a race condition. A worker's
first IOSource that it processes is potentially Broker if there were
no packets available yet and thread scheduling happens to work out
such that network connections (inside CAF threads) become established
before we enter the main I/O loop. Such peering establishments would
generate logs with timestamp 0 as there was not yet any code path
taken that would update network_time.
For reporter.log: any non-worker (packet-processing) node would just
unnecessarily use a timestamp of 0 for their reporter messages.
* origin/topic/jsiwek/broker-store-process-n:
Improve Broker I/O loop integration: less mutex locking
Improve processing of broker data store responses
For backward compatibility when reading values, we first check
the ZEEK-prefixed value, and if not set, then check the corresponding
BRO-prefixed value.
Checking a subscriber for available messages required locking a mutex,
but we should never actually need to do that in the main-loop to check
for Broker readiness since we can rely on file descriptor polling.
Note - this compiles, but you cannot run Bro anymore - it crashes
immediately with a 0-pointer access. The reason behind it is that the
required clone functionality does not work anymore.