Calling collect_metrics() from a script would not invoke metric
callbacks, resulting in most of the process metrics being zero
when a Zeek process isn't scraped via Prometheus.
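For illustration, a minimal sketch of the affected pattern (assuming the
process metrics live under the "process" prefix):

```zeek
event zeek_done()
	{
	# Before this fix, collecting metrics from script land returned
	# zeroes for most process metrics unless a Prometheus scrape had
	# already triggered the metric callbacks.
	local ms = Telemetry::collect_metrics("process", "*");
	for ( i in ms )
		print ms[i];
	}
```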
Fixes #4309
This btest uses the exit() BIF to shut down, which immediately calls
::exit() and kills Zeek without doing any shutdown. This can leave the
thread running the storage manager alive, which causes TSan to complain
about a thread leak. Switch to the terminate() BIF instead, which
cleanly shuts down all of Zeek.
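Roughly, the test's shutdown now amounts to this sketch (test logic
elided):

```zeek
event zeek_init()
	{
	# exit() would call ::exit() right away, skipping all shutdown and
	# potentially leaking helper threads such as the storage manager's.
	# terminate() runs Zeek's full shutdown sequence instead.
	terminate();
	}
```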
- New erase/overwrite tests
- Change existing sqlite-basic test to use async
- Test passing bad keys to validate backend type checking
- New test for compound keys and values
This commit renames the `service_violation` column that can be added via
a policy script to `failed_service`. This expresses its intent better -
the column contains services that failed and were removed after
confirmation.
Furthermore, the script is fixed so it actually does this - before, it
would sometimes add services to the list that were not actually removed.
In the course of this, the type of the column was changed from a vector
to an ordered set.
Due to the column rename, the policy script itself is also renamed.
Also adds a NEWS entry for the DPD changes.
This introduces an option, DPD::track_removed_services_in_connection.
It adds failed services to the services column, prefixed with a
"-".
Alternatively, this commit also adds
policy/protocols/conn/failed-services.zeek, which provides the same
information in a new column in conn.log.
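Both variants can be enabled from a site's local.zeek; a minimal sketch,
assuming the new option defaults to off:

```zeek
# Track failed services in conn.log's existing services column,
# prefixed with "-":
redef DPD::track_removed_services_in_connection = T;

# Alternatively, record them in a separate column instead:
@load policy/protocols/conn/failed-services
```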
This commit revamps the handling of analyzer violations that happen
before an analyzer confirms the protocol.
The current state is that an analyzer is disabled after 5 violations if
it has not been confirmed; once confirmed, it is disabled after a single
violation.
The reason for this is a historic mistake. In Zeek versions up to 1.5,
analyzers were unconditionally removed when they raised their first
protocol violation.
When this script was ported to the new layout for Zeek 2.0 in
b4b990cfb5, a logic error was introduced
that caused analyzers to no longer be disabled if they were not
confirmed.
This was the state for ~8 years, until the DPD::max_violations option
was added, which instated the current approach of disabling unconfirmed
analyzers after 5 violations. Sadly, there is not much discussion of
this change - from my hazy memory, I think it was discovered during
performance tests, and the new behavior was added without looking into
the history of previous changes.
This commit reinstates the originally intended behavior of DPD. When an
analyzer that has not been confirmed raises a protocol violation, it is
immediately removed from the connection. This also makes a lot of sense:
it allows the analyzer to be in a "tasting" phase at the beginning of
the connection and to error out quickly once it realizes that it was
attached to a connection not containing the desired protocol.
This change also removes the DPD::max_violations option, as it no longer
serves any purpose after this change. (In practice, the option remains
with a &deprecated warning, but it is no longer used for anything.)
There are relatively minimal test-baseline changes due to this; they are
mostly triggered by the removal of the data structure and by fewer
analyzer errors being raised, as unconfirmed analyzers are now disabled
after their first error.
* origin/topic/johanna/sqlite-pragmas:
Options for SQLite log writer, eliminate duplicate definitions
Test synchronous/journal mode options for SQLite log writer
Added default options for synchronous and journal mode
Support for synchronous and journal_mode
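From the commit subjects, usage presumably looks something like the
sketch below; treating "synchronous" and "journal_mode" as keys of the
filter's $config table is an assumption on my part:

```zeek
event zeek_init()
	{
	# Sketch: tune the SQLite writer's PRAGMAs per log filter.
	local f: Log::Filter = [
		$name = "sqlite",
		$path = "/var/zeek/logs/conn",
		$writer = Log::WRITER_SQLITE,
		$config = table(["synchronous"] = "normal",
		                ["journal_mode"] = "wal")];
	Log::add_filter(Conn::LOG, f);
	}
```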
* origin/topic/awelzel/pluggable-cluster-backends-part1:
btest: Test Broker::make_event() together with Cluster::publish_hrw()
btest: Add cluster dir, minimal test for enum value
broker: Add shim plugin adding a backend component
zeek-setup: Instantiate backend::manager
cluster: Add to src/CMakeLists.txt
cluster: Add Components and ComponentManager for new components
cluster/Backend: Interface for cluster backends
cluster/Serializer: Interface for event and log serializers
logging: Introduce logging/Types.h
SerialTypes/Field: Allow default construction and add move constructor
DebugLogger: Add cluster debugging stream
plugin: Add component enums for pluggable cluster backends
broker: Pass frame to MakeEvent()
@Sheco reported that standalone epoch processing may miss scheduled
observations when the final sumstat epoch runs before they fire. For
example, this easily happens when attempting to make sumstat
observations within connection_state_remove().
Delay final epoch processing to zeek_done() instead.
This doesn't deal with the clustered version - that would need something
more elaborate, potentially a mechanism to delay the shutdown of other
cluster nodes until sumstat processing has completed.
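For reference, the kind of script that hit this (a minimal sketch; the
sumstat id and key are made up):

```zeek
@load base/frameworks/sumstats

event connection_state_remove(c: connection)
	{
	# Observations made here race the final epoch at shutdown:
	# previously, any arriving after it were silently dropped.
	SumStats::observe("conn.count",
	                  SumStats::Key($host=c$id$orig_h),
	                  SumStats::Observation($num=1));
	}
```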
This stops invoking Telemetry::sync() via a scheduled event and instead
only invokes it on demand. This makes metric collection independent of
network time, and lazier, too.
With Prometheus scrape requests being processed on Zeek's main thread
now, we can safely invoke the script layer Telemetry::sync() hook.
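Script-level consumers need no change; a hook like the following sketch
now runs on demand (on a scrape or a collect_metrics() call) rather than
on a timer:

```zeek
hook Telemetry::sync()
	{
	# Update expensive or rarely-needed gauges here; this no longer
	# fires on a fixed schedule.
	print "updating gauges";
	}
```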
Closes #3947
This isn't a straightforward fix, unfortunately. The existing GetLine()
implementation didn't deal well with incrementally produced input where
individually read chunks wouldn't end with the separator.
The prior implementation increased the buffer each time it failed to
find a separator in the current buffer, but then also ended up not
searching the full new buffer for the terminator, repeating this
endlessly.
This change reworks the Raw reader to rely only on bufpos for reading
and searching purposes, and to skip reallocation if the buffer wasn't
actually exhausted.
Closes #3957
Most likely, other similar tests get by because they have more "stuff"
before they call `terminate()`. But, to be safe, just removing the
"received termination signal" line seems like the best approach.
Invalid lines in a file were the one case that would not suppress future
warnings. Make them suppress warnings too, but clear that suppression if
a field in between parses without error.
Fixes #3692