* topic/christian/cluster-controller:
Add a cluster controller testcase for agent-controller checkin
Add zeek-client via new submodule
Update baselines affected by cluster controller changes
Introduce cluster controller and cluster agent scripting
Establish a separate init script when using the supervisor
Add optional bare-mode boolean flag to Supervisor's node configuration
Add support for making the supervisor listen for requests
Add support for setting environment variables via supervisor
This is a preliminary implementation of a subset of the functionality set out in
our cluster controller architecture. The controller is the central management
node, existing once in any Zeek cluster. The agent is a node that runs once per
instance, where an instance will commonly be a physical machine. The agent in
turn manages the "data cluster", i.e. the traditional notion of a Zeek cluster
with manager, worker nodes, etc.
Agent and controller live in the policy folder, and are activated when loading
policy/frameworks/cluster/agent and policy/frameworks/cluster/controller,
respectively. Both run in nodes forked by the supervisor. When Zeek doesn't use
the supervisor, they do nothing. Otherwise, boot.zeek instructs the supervisor
to create the respective node, running main.zeek.
Both controller and agent have their own config.zeek with relevant knobs. For
both, controller/types.zeek provides common data types, and controller/log.zeek
provides basic logging (without logger communication -- no such node might
exist).
A primitive request-tracking abstraction can be found in controller/request.zeek
to track outstanding request events and their subsequent responses.
The added disable-certificate-events-known-certs.zeek disables repeated
X509 events in SSL connections, given that the connection terminates at
the same server and used the samt SNI as a previously seen connection
with the same certificate.
For people that see significant amounts of TLS 1.2 traffic, this could
reduce the amount of raised events significantly - especially when a
lot of connections are repeat connections to the same servers.
The practical impact of not raising these events is actually very little
- unless a script directly interacts with the x509 events, everything
works as before - the x509 variables in the connection records are still
being set (from the cache).
This policy script significantly extends the details that are logged
about SSL/TLS handshakes.
I am a bit tempted to just make this part of the default log - but it
does add a bunch logging overhead for each connection.
Extract-certs-pem writes pem files to a dedicated file; since it does
not really work in cluster-environments it was never super helpful.
This commit deprecates this file and, instead, adds
log-certs-base64.zeek, which adds the base64-encoded certificate (which
is basically equivalent with a PEM) to the log-file. Since, nowadays,
the log-files are deduplicates this should not add a huge overhead.
The ICSI notary is pretty much inactive. Furthermore - this approach
does no longer make much sense at this point of time - performing, e.g.,
signed certificate timestamp validation is much more worthwhile.
This commit changes the SSL and X.509 logging formats to something that,
hopefully, slowly approaches what they will look like in the future.
X.509 log is not yet deduplicated; this will come in the future.
This commit introduces two new options, which determine if certificate
issuers and subjects are still logged in ssl.log. The default is to have
the host subject/issuer logged, but to remove client-certificate
information. Client-certificates are not a typically used feature
nowadays.
In the past I thought that this is not super interesting. However, it
turns out that this can actually contain a slew of interresting
information - like operating systems querying for the revocation of
software signing certificates, e.g.
So - let's just enable this as a default log for the future.
The larger number was substracted from the smaller one leading to an
integer overflow. However, no information was lost due to everything
also being present in the notice message.
Fixes GH-1454
Changes \x00-\x37 ranges to \x00-\x1f with assumption that the former
was attempting to match ASCII control characters, but mistook an octal
range for hex. This change reduces some false positives.
- Tweaked the Too_Little_Traffic notice message to avoid
cluster-specific terminology.
* origin/topic/vlad/caploss_no_traffic:
Fix scheduling due to network_time being 0 in zeek_init
Add test for CaptureLoss::Too_Little_Traffic
Add CaptureLoss::Too_Little_Traffic
Add CaptureLoss::initial_watch_interval for a quick read on cluster health after startup.
Documentation update, reference the threshold variable. [nomail] [skip ci]
Whitespace fixes only [nomail] [skip ci]
generate_all_events causes all events to be raised internally; this
makes it possible for dump_events to really capture all events (and not
just those that were handled).
Addresses GH-169
This adds a "policy" hook into the logging framework's streams and
filters to replace the existing log filter predicates. The hook
signature is as follows:
hook(rec: any, id: Log::ID, filter: Log::Filter);
The logging manager invokes hooks on each log record. Hooks can veto
log records via a break, and modify them if necessary. Log filters
inherit the stream-level hook, but can override or remove the hook as
needed.
The distribution's existing log streams now come with pre-defined
hooks that users can add handlers to. Their name is standardized as
"log_policy" by convention, with additional suffixes when a module
provides multiple streams. The following adds a handler to the Conn
module's default log policy hook:
hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
{
if ( some_veto_reason(rec) )
break;
}
By default, this handler will get invoked for any log filter
associated with the Conn::LOG stream.
The existing predicates are deprecated for removal in 4.1 but continue
to work.
This adds two new functions: `Conn::register_removal_hook()` and
`Conn::unregister_removal_hook()` for registering a hook function to be
called back during `connection_state_remove`. The benefit of using hook
callback approach is better scalability: the overhead of unrelated
protocols having to dispatch no-op `connection_state_remove` handlers is
avoided.
ACTION_DROP is not only part of catch-n-release subsystem.
Also, historically ACTION_DROP has been bundled with ACTION_LOG, ACTION_ALARM, ACTION_EMAIL... and its helpful that this verb remains in base/frameworks/notice/main.zeek
``NetControl::DROP`` had 3 conflicting definitions that could potentially
be used incorrectly without any warnings or type-checking errors.
Such enum redefinition conflicts are now caught and treated as errors,
so the ``NetControl::DROP`` enums had to be renamed:
* The use as enum of type ``Log::ID`` is renamed to ``NetControl::DROP_LOG``
* The use as enum of type ``NetControl::CatchReleaseInfo`` is renamed to
``NetControl::DROP_REQUESTED``
* The use as enum of type ``NetControl::RuleType`` is unchanged and still
named ``NetControl::DROP``
Previously, a single `icmp_conn` record was built per ICMP "connection"
and re-used for all events generated from it. This may have been a
historical attempt at performance optimization, but:
* By default, Zeek does not load any scripts that handle ICMP events.
* The one script Zeek ships with that does handle ICMP events,
"detect-traceroute", is already noted as being disabled due to
potential performance problems of doing that kind of analysis.
* Re-use of the original `icmp_conn` record tends to misreport
TTL and length values since they come from original packet instead
of the current one.
* Even if we chose to still re-use `icmp_conn` records and just fill
in a new TTL and length value each packet, a user script could have
stored a reference to the record and not be expecting those values
to be changed out from underneath them.
Now, a new `icmp_info` record is created/populated in all ICMP events
and should be used instead of `icmp_conn`. It also removes the
orig_h/resp_h fields as those are redundant with what's already
available in the connection record.
Changes during merge
- Changed the policy script to use an event handler that behaves
for like the base script: &priority=5, msg$opcode != early-out,
no record field existence checks
- Also extended dns_query_reply event with original_query param
- Removed ExtractName overload, and just use default param
* 'dns-original-query-case' of https://github.com/rvictory/zeek:
Fixed some places where tabs became spaces
Stricter checking if we have a dns field on the connection being processed
Modified the DNS protocol analyzer to add a new parameter to the dns_request event which includes the DNS query in its original case. Added a policy script that will add the original_case to the dns.log file as well. Created new btests to test both.
- Updated the logic significantly: still filters out ICMP from being
considered an active service (like before) and adds a new
"Known::service_udp_requires_response" option (defaults to true) for
whether to require UDP server response before being considered an
active service.
* 'topic/dopheide/known-services' of https://github.com/dopheide-esnet/zeek:
Log services with unknown protocols
- Added test case and adjusted whitespace in merge
* 'stats-logging-fix' of https://github.com/brittanydonowho/zeek:
Fixed stats.zeek to log all data before zeek terminates rather than return too soon
* 'master' of https://github.com/mmguero-dev/zeek:
check for the existance of f?$conns in file_sniff event in policy/protocols/ssl/log-hostcerts-only.zeek
In using the corelight/bro-xor-exe-plugin (https://github.com/corelight/bro-xor-exe-plugin) I noticed this error when running the PCAP trace file in its tests directory:
1428602842.525435 expression error in /opt/zeek/share/zeek/policy/protocols/ssl/log-hostcerts-only.zeek, line 44: field value missing (X509::f$conns)
Examining log-hostcerts-only.zeek, I saw that although f$conns is being checked for length, it's not being checked to see if it exists first.
This commit changes "if ( |f$conns| != 1 )" to "if (( ! f?$conns ) || ( |f$conns| != 1 ))" so that the script returns if there is no f$conns field.
In my local testing, this seems to fix the error. My testing was being done with v3.0.5, but I think this patch can be applied to both the 3.0.x and 3.1.x branches.
For event/hook handlers that had a previous declaration, any &default
arguments are ineffective. Only &default uses in the initial
prototype's arguments have an effect (that includes if the handler
is actually the site at which the declaration occurs).