OSS Fuzz generated a CWD request and reply followed by very many EPRT
requests. This caused Zeek to re-log the CWD request and invoke `build_url_ftp()`
over and over again resulting in long processing times.
Avoid this scenario by not logging commands that aren't pending anymore.
(cherry picked from commit b05dd31667ff634ec7d017f09d122f05878fdf65)
OSS-Fuzz generated traffic containing a CWD command with a single very large
path argument (427kb) starting with ".___/` \x00\x00...", This is followed
by a large number of ftp replies with code 250. The directory logic in
ftp_reply() would match every incoming reply with the one pending CWD command,
triggering path buildup ending with something 120MB in size.
Protect from re-using a directory command by setting a flag in the
CmdArg record when it was consumed for the path traversal logic.
This doesn't prevent unbounded path build-up generally, but does prevent the
amplification of a single large command with very many small ftp_replies.
Re-using a pending path command seems like a bug as well.
The medium.trace in the private external test suite contains one
session/server that violates the multi-line reply protocol and
happened to work out fairly well regardless due to how we looked
up the pending commands unconditionally before.
Continue to match up reply lines that "look like they contain status codes"
even if cont_resp = T. This still improves runtime for the OSS-Fuzz
generated test case and keeps the external baselines valid.
The affected session can be extracted as follows:
zcat Traces/medium.trace.gz | tcpdump -r - 'port 1491 and port 21'
We could push this into the analyzer, too, minimally the RFC says:
> If an intermediary line begins with a 3-digit number, the Server
> must pad the front to avoid confusion.
Intermediate lines of multiline replies usually do not contain valid status
codes (even if servers may opt to include them). Their content may be anything
and likely unrelated to the original command. There's little reason for us
trying to match them with a corresponding command.
OSS-Fuzz generated a large command reply with very many intermediate lines
which caused long processing times due to matching every line with all
currently pending commands.
This is a DoS vector against Zeek. The new ipv6-multiline-reply.trace and
ipv6-retr-samba.trace files have been extracted from the external ipv6.trace.
The user and password fields are replicated to each of the ftp.log
entries. Using a very large username (100s of KBs) allows to bloat
the log without actually sending much traffic. Further, limit the
arg and reply_msg columns to large, but not unbounded values.
This commit changes the SSL and X.509 logging formats to something that,
hopefully, slowly approaches what they will look like in the future.
X.509 log is not yet deduplicated; this will come in the future.
This commit introduces two new options, which determine if certificate
issuers and subjects are still logged in ssl.log. The default is to have
the host subject/issuer logged, but to remove client-certificate
information. Client-certificates are not a typically used feature
nowadays.
This adds a "policy" hook into the logging framework's streams and
filters to replace the existing log filter predicates. The hook
signature is as follows:
hook(rec: any, id: Log::ID, filter: Log::Filter);
The logging manager invokes hooks on each log record. Hooks can veto
log records via a break, and modify them if necessary. Log filters
inherit the stream-level hook, but can override or remove the hook as
needed.
The distribution's existing log streams now come with pre-defined
hooks that users can add handlers to. Their name is standardized as
"log_policy" by convention, with additional suffixes when a module
provides multiple streams. The following adds a handler to the Conn
module's default log policy hook:
hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
{
if ( some_veto_reason(rec) )
break;
}
By default, this handler will get invoked for any log filter
associated with the Conn::LOG stream.
The existing predicates are deprecated for removal in 4.1 but continue
to work.
This adds two new functions: `Conn::register_removal_hook()` and
`Conn::unregister_removal_hook()` for registering a hook function to be
called back during `connection_state_remove`. The benefit of using hook
callback approach is better scalability: the overhead of unrelated
protocols having to dispatch no-op `connection_state_remove` handlers is
avoided.
Previously, more data than could effectively be utilized by any remote
Zeek was published (e.g. full list of pending commands or other
transient state that may add up to non-trivial amount of bytes).
And switch Zeek's base scripts over to using it in place of
"connection_state_remove". The difference between the two is
that "connection_state_remove" is raised for all events while
"successful_connection_remove" excludes TCP connections that were never
established (just SYN packets). There can be performance benefits
to this change for some use-cases.
There's also a new event called ``connection_successful`` and a new
``connection`` record field named "successful" to help indicate this new
property of connections.
Minor cleanup in merge: remove print statements and unnecessary @if
directive.
* 'topic/jsbarber/ftp-cluster-fix-patch' of https://github.com/jsbarber/zeek:
Publish ftp_data_expected updates to other workers for synchronization
* All "Broxygen" usages have been replaced in
code, documentation, filenames, etc.
* Sphinx roles/directives like ":bro:see" are now ":zeek:see"
* The "--broxygen" command-line option is now "--zeexygen"
The bytes_threshold_crossed event in the gridftp analyzer is not first
checking to see if the connection passed the initial criteria. This
causes the script to add the gridftp-data service to any connection that
crosses a threshold that is the same as or greater than the gridftp
size_threshold.
The server-reported file size was being collected poorly and if
a file name had a number in it, that was reported as the file
size instead of the actual size.
A new test is included to avoid reintroducing the problem.
* origin/topic/seth/more-file-type-ident-fixes:
File API updates complete.
Fixes for file type identification.
API changes to file analysis mime type detection.
Make HTTP 206 reassembly require ETags by default.
More file type identification improvements
Fix an issue with files having gaps before the bof_buffer is filled.
Fix an issue with packet loss in http file reporting.
Adding WOFF fonts to file type identification.
Extended JSON matching and added OCSP responses.
Another large signature update.
More signature updates.
Even more file type ident clean up.
Lots of fixes for file type identification.
BIT-1368 #merged
* origin/topic/johanna/conn-threshold:
Wrap threshold stuff up - fix two small bugs and update baselines.
update GridFTP analyzer to use connection thresholding instead of polling
Add high level api for thresholding that holds lists of thresholds and raises an event for each threshold exactly once.
Allow setting packet and byte thresholds for connections.
BIT-1377 #merged
Removed "file_mime_type" and "file_mime_types" event, replacing them
with a new event called "file_metadata_inferred". It has a record
argument of type "inferred_file_metadata", which contains the mime type
information that the earlier events used to supply. The idea here is
that future extensions to the record with new metadata will be less
likely to break user code than the alternatives (adding new events or
new event parameters).
Addresses BIT-1368.
This allows the path for the default filter to be specified explicitly
when creating a stream and reduces the need to rely on the default path
function to magically supply the path.
The default path function is now only used if, when a filter is added to
a stream, it has neither a path nor a path function already.
Adapted the existing Log::create_stream calls to explicitly specify a
path value.
Addresses BIT-1324
These functions are now deprecated in favor of alternative versions that
return a vector of strings rather than a table of strings.
Deprecated functions:
- split: use split_string instead.
- split1: use split_string1 instead.
- split_all: use split_string_all instead.
- split_n: use split_string_n instead.
- cat_string_array: see join_string_vec instead.
- cat_string_array_n: see join_string_vec instead.
- join_string_array: see join_string_vec instead.
- sort_string_array: use sort instead instead.
- find_ip_addresses: use extract_ip_addresses instead.
Changed functions:
- has_valid_octets: uses a string_vec parameter instead of string_array.
Addresses BIT-924, BIT-757.