This commit renames the `service_violation` column that can be added via
a policy script to `failed_service`. This expresses its intent better:
the column contains services that failed and were removed after
confirmation.
Furthermore, the script is fixed so that it actually does this; previously
it would sometimes add services to the list that were not actually removed.
In the course of this, the type of the column was changed from a vector
to an ordered set.
Due to the column rename, the policy script itself is also renamed.
Also adds a NEWS entry for the DPD changes.
This introduces a new option, DPD::track_removed_services_in_connection.
It adds failed services to the services column, prefixed with a
"-".
Alternatively, this commit also adds
policy/protocols/conn/failed-services.zeek, which provides the same
information in a new column in conn.log.
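As a rough sketch of using both mechanisms (option and script names as
given in these commits):

    # Track removed services inline in conn.log's services column,
    # prefixed with "-":
    redef DPD::track_removed_services_in_connection = T;

    # Alternatively, record them in a separate conn.log column instead:
    @load policy/protocols/conn/failed-services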
This commit revamps the handling of analyzer violations that happen
before an analyzer confirms the protocol.
The current state is that an analyzer is disabled after 5 violations, if
it has not been confirmed. If it has been confirmed, it is disabled
after a single violation.
The reason for this is a historical mistake. In Zeek up to version 1.5,
analyzers were unconditionally removed when they raised the first
protocol violation.
When this script was ported to the new layout for Zeek 2.0 in
b4b990cfb5, a logic error was introduced
that caused analyzers to no longer be disabled if they were not
confirmed.
This was the state for ~8 years, until the DPD::max_violations option
was added, which instated the current approach of disabling unconfirmed
analyzers after 5 violations. Sadly, there is not much discussion about
this change; from my hazy memory, I think it was discovered during
performance tests and the new behavior was added without checking the
history of previous changes.
This commit reinstates the originally intended behavior of DPD. When an
analyzer that has not been confirmed raises a protocol violation, it is
immediately removed from the connection. This also makes a lot of sense:
it allows the analyzer to be in a "tasting" phase at the beginning
of the connection, and to error out quickly once it realizes that it was
attached to a connection not containing the desired protocol.
This change also removes the DPD::max_violations option, as it no longer
serves any purpose. (In practice, the option remains with an &deprecated
warning, but it is no longer used for anything.)
There are relatively minimal test-baseline changes due to this; they are
mostly triggered by the removal of the data structure and by fewer
analyzer errors being thrown, as unconfirmed analyzers are disabled
after the first error.
* origin/topic/johanna/sqlite-pragmas:
Options for SQLite log writer, eliminate duplicate definitions
Test synchronous/journal mode options for SQLite log writer
Added default options for synchronous and journal mode
Support for synchronous and journal_mode
* origin/topic/awelzel/pluggable-cluster-backends-part1:
btest: Test Broker::make_event() together with Cluster::publish_hrw()
btest: Add cluster dir, minimal test for enum value
broker: Add shim plugin adding a backend component
zeek-setup: Instantiate backend::manager
cluster: Add to src/CMakeLists.txt
cluster: Add Components and ComponentManager for new components
cluster/Backend: Interface for cluster backends
cluster/Serializer: Interface for event and log serializers
logging: Introduce logging/Types.h
SerialTypes/Field: Allow default construction and add move constructor
DebugLogger: Add cluster debugging stream
plugin: Add component enums for pluggable cluster backends
broker: Pass frame to MakeEvent()
@Sheco reported that standalone epoch processing may exclude scheduled
events when the final sumstat epoch runs before they fire. For example, this
easily happens when attempting to make sumstat observations within
connection_state_remove().
Delay final epoch processing to zeek_done() instead.
This doesn't deal with the clustered version - this would need something
more elaborate and potentially a mechanism to delay the shutdown of
other cluster nodes until/after sumstat processing completed.
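For illustration, a minimal sketch of the kind of use that triggered this,
assuming a sumstat with the hypothetical name "conn.per.host" was created
elsewhere:

    event connection_state_remove(c: connection)
        {
        # Observations made this late could previously land after the final epoch.
        SumStats::observe("conn.per.host",
                          SumStats::Key($host=c$id$orig_h),
                          SumStats::Observation($num=1));
        }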
This stops invoking Telemetry::sync() via a scheduled event and instead
invokes it only on demand. This makes metric collection independent of
network time and lazier, too.
With Prometheus scrape requests being processed on Zeek's main thread
now, we can safely invoke the script layer Telemetry::sync() hook.
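For context, a small sketch of the script-side hook affected; it now runs
only when metrics are actually collected or scraped, not on a timer:

    global sync_count = 0;

    hook Telemetry::sync()
        {
        # Update derived or expensive metrics lazily, right before collection.
        ++sync_count;
        }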
Closes #3947
This isn't a straightforward fix, unfortunately. The existing GetLine()
implementation didn't deal well with input that's incrementally produced
where individually read chunks wouldn't end with the separator.
The prior implementation increased the buffer each time it failed to find
a separator in the current buffer, but then also ended up not searching the
full new buffer size for the terminator, doing that endlessly.
This change reworks the Raw reader to rely only on bufpos for reading
and searching purposes and to skip reallocating the buffer if it
wasn't actually exhausted.
Closes #3957
It seems like other similar tests most likely get by because they have more
"stuff" before they call `terminate()`. But, to be safe, just
removing the "received termination signal" line seems like the best
approach.
Invalid lines in a file were the one case that would not suppress future
warnings. Make them suppress warnings too, but clear that suppression
if there is a field in between that doesn't error.
Fixes #3692
* topic/christian/broker-prometheus-cpp:
Update the scripts.base.frameworks.telemetry.internal-metrics test
Revert "Temporarily disable the scripts/base/frameworks/telemetry/internal-metrics btest"
Bump Broker to pull in new Prometheus support and pass in Zeek's registry
This now uses different record fields, and for now we no longer have CAF
telemetry. We indicate we're running under test to get reliable ordering in the
baselined output.
rule_added_policy allows the modification of rules just after they have
been added. This makes it possible to implement some more complex features,
like changing rule states depending on insertion in other plugins.
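A hypothetical sketch of a handler; the hook's exact parameter list isn't
spelled out above, so this assumes it receives at least the rule that was
just added:

    hook NetControl::rule_added_policy(r: NetControl::Rule)
        {
        # e.g. adjust the rule's state based on what other plugins did on insertion
        print "rule added", r$id;
        }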
This introduces a new hook into the Intel::seen() function that allows
users to directly interact with the result of a find() call via external
scripts.
This should solve the use-case brought up by @chrisanag1985 in
discussion #3256: Recording and acting on "no intel match found".
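A minimal sketch, assuming the new hook is Intel::seen_policy and receives
the Seen description plus a flag indicating whether an intel item was found:

    hook Intel::seen_policy(s: Intel::Seen, found: bool)
        {
        # React locally on the node that performed the lookup.
        if ( ! found )
            print "no intel match", s;
        }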
@Canon88 was recently asking on Slack about enabling HTTP logging for a
given connection only when an Intel match occurred and found that the
Intel::match() event would only occur on the manager. The
Intel::match_remote() event might be a workaround, but it possibly runs a
bit too late, and it's also just an internal "detail" event that might not
be stable.
Another internal use case revolved around enabling packet recording
based on Intel matches, which necessarily needs to happen on the worker
where the match happened. The proposed workaround is similar to the above
using Intel::match_remote().
This hook also provides an opportunity to rate-limit heavy hitter intel
items locally on the worker nodes, or even to replace the event approach
currently used with a customized one.
With a bit of tweaking in the JavaScript plugin to support opaque types, this
will allow the delay functionality to work there, too.
Making the LogDelayToken an actual opaque type seems reasonable, too. It's
not supposed to be user-inspected.
This is a verbose, opinionated and fairly restrictive version of the log delay idea.
Main drivers are explicitness, foot-gun avoidance and implementation simplicity.
Calling the new Log::delay() function is only allowed within the execution
of a Log::log_stream_policy() hook for the currently active log write.
Conceptually, the delay is placed between the execution of the global stream
policy hook and the individual filter policy hooks. A post delay callback
can be registered with every Log::delay() invocation. Post delay callbacks
can (1) modify a log record as they see fit, (2) veto the forwarding of the
log record to the log filters and (3) extend the delay duration by calling
Log::delay() again. The last point allows delaying a record for an indefinite
amount of time, rather than a fixed maximum amount. This should be rare and
is therefore explicit.
Log::delay() increases an internal reference count and returns an opaque
token value to be passed to Log::delay_finish() to release a delay reference.
Once all references are released, the record is forwarded to all filters
attached to a stream when the delay completes.
This functionality separates Log::log_stream_policy() and individual filter
policy hooks. One consequence is that a common use-case of filter policy hooks,
removing unproductive log records, may run after a record was delayed. Users
can lift their filtering logic to the stream level (or replicate the condition
before the delay decision). The main motivation here is that deciding on a
stream-level delay in per-filter hooks is too late. Attaching multiple filters
to a stream can additionally result in hard to understand behavior.
On the flip side, filter policy hooks are guaranteed to run after the delay
and can be used for further mangling or filtering of a delayed record.
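A minimal sketch of the intended usage, with the conn stream as an example;
the callback signature and the Log::delay_finish() arguments are
assumptions here:

    hook Log::log_stream_policy(rec: any, id: Log::ID)
        {
        if ( id == Conn::LOG )
            {
            # Delay this write; the callback runs once the delay completes.
            local token = Log::delay(id, rec, function(rec2: any, id2: Log::ID): bool
                {
                # The record may still be modified here; return F to veto
                # forwarding to the stream's filters.
                return T;
                });

            # Later, once the awaited information is available, release the
            # delay reference:
            # Log::delay_finish(id, rec, token);
            }
        }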
This field isn't required by a worker and it's certainly not used by a
worker to listen on that specific interface. It also isn't required to
be set consistently, and its use in-tree is limited to the old load-balancing
script.
There's a bif called packet_source() which, on a worker, provides
information about the packet source actually in use.
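For example, a node can query the source it actually uses at runtime (a
small sketch):

    event zeek_init()
        {
        # On a worker this reports the packet source in use, independent of
        # any interface field in the cluster layout.
        print packet_source();
        }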
Relates to zeek/zeek#2877.
Unsure what it's used for today; it also results in the situation that on
some platforms we generate a reporter.log in bare mode, while on others,
where Spicy is disabled, we do not.
If we want base/frameworks/version loaded by default, we should put it into
init-bare.zeek and possibly remove the loading of the reporter framework
from it; Reporter::error() would still work and be visible on stderr,
just not create a reporter.log.
The input framework currently gives a rather opaque error message when
encountering a line in which a required value is not provided. This
change updates this behavior; the error message now provides the record
element (or the name of the index element) which was not set in the
input data, even though it is required to be set by the underlying Zeek
type.
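To illustrate where this error occurs, a minimal sketch with a hypothetical
input file "input.data" whose lines may leave the non-optional field "b"
empty:

    type Idx: record {
        key: string;
    };

    type Val: record {
        b: count;  # required by the Zeek type; a missing value now names this field
    };

    global tbl: table[string] of Val = table();

    event zeek_init()
        {
        Input::add_table([$source="input.data", $name="example",
                          $idx=Idx, $val=Val, $destination=tbl]);
        }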
This test triggered ubsan by putting a function with the wrong type
as a post-processor into the .shadow file. Don't do that.
Likely Zeek should provide a better error message, but hand-crafting
.shadow files isn't what is normally done and this is to fix the
master build for now.