The current_entity tracking in HTTP assumes that client/server never
send HTTP entities at the same time. The attached pcap (generated
artificially) violates this and triggers:
1663698249.307259 expression error in <...>base/protocols/http/./entities.zeek, line 89: field value missing (HTTP::c$http$current_entity)
For the http-no-crlf test, include weird.log as baseline. Now that weird is
@load'ed from http, it is actually created and seems to make sense
to btest-diff it, too.
We used to attempt to remove any port specification before recording
HTTP host headers in logs. Doing so would (1) remove potentially useful
information, (2) not match what the documentation seemed to suggest, and
(3) fail for IP6 addresses containing colons.
We now record the original HOST header as is.
Addresses #1844.
This adds a "policy" hook into the logging framework's streams and
filters to replace the existing log filter predicates. The hook
signature is as follows:
hook(rec: any, id: Log::ID, filter: Log::Filter);
The logging manager invokes hooks on each log record. Hooks can veto
log records via a break, and modify them if necessary. Log filters
inherit the stream-level hook, but can override or remove the hook as
needed.
The distribution's existing log streams now come with pre-defined
hooks that users can add handlers to. Their name is standardized as
"log_policy" by convention, with additional suffixes when a module
provides multiple streams. The following adds a handler to the Conn
module's default log policy hook:
hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
{
if ( some_veto_reason(rec) )
break;
}
By default, this handler will get invoked for any log filter
associated with the Conn::LOG stream.
The existing predicates are deprecated for removal in 4.1 but continue
to work.
This adds two new functions: `Conn::register_removal_hook()` and
`Conn::unregister_removal_hook()` for registering a hook function to be
called back during `connection_state_remove`. The benefit of using hook
callback approach is better scalability: the overhead of unrelated
protocols having to dispatch no-op `connection_state_remove` handlers is
avoided.
This is to avoid missing large sessions where a single side exceeds
the DPD buffer size. It comes with the trade-off that now the analyzer
can be triggered by anybody controlling one of the endpoints (instead
of both).
Test suite changes are minor, and nothing in "external".
Closes#343.
And switch Zeek's base scripts over to using it in place of
"connection_state_remove". The difference between the two is
that "connection_state_remove" is raised for all events while
"successful_connection_remove" excludes TCP connections that were never
established (just SYN packets). There can be performance benefits
to this change for some use-cases.
There's also a new event called ``connection_successful`` and a new
``connection`` record field named "successful" to help indicate this new
property of connections.
* All "Broxygen" usages have been replaced in
code, documentation, filenames, etc.
* Sphinx roles/directives like ":bro:see" are now ":zeek:see"
* The "--broxygen" command-line option is now "--zeexygen"
The "orig_fuids", "orig_filenames", "orig_mime_types" http.log fields as
well as their "resp" counterparts are now limited to having
"HTTP::max_files_orig" or "HTTP::max_files_resp" entries, which are 15
by default. The limit can also be ignored case-by-case via the
"HTTP::max_files_policy" hook.
Fixes GH-289
* origin/topic/jsiwek/empty-lines:
Add 'smtp_excessive_pending_cmds' weird
Fix SMTP command string comparisons
Improve handling of empty lines in several text protocol analyzers
Add rate-limiting sampling mechanism for weird events
Teach timestamp canonifier about timestamps before ~2001
The generation of weird events, by default, are now rate-limited
according to these tunable options:
- Weird::sampling_whitelist
- Weird::sampling_threshold
- Weird::sampling_rate
- Weird::sampling_duration
The new get_reporter_stats() BIF also allows one to query the
total number of weirds generated (pre-sampling) which the new
policy/misc/weird-stats.bro script uses periodically to populate
a weird_stats.log.
There's also new reporter BIFs to allow generating weirds from the
script-layer such that they go through the same, internal
rate-limiting/sampling mechanisms:
- Reporter::conn_weird
- Reporter::flow_weird
- Reporter::net_weird
Some of the code was adapted from previous work by Johanna Amann.
The HTTP "Origin" header is a useful header for CSRF, Chrome plugins making requests, and other scenarios where referrer may not be present.
Reference:
https://tools.ietf.org/html/rfc6454#section-7 ---- "In some sense, the origin granularity is a historical artifact of how the security model evolved."
Especially useful if origin/referrer is a "file://" ---- https://tools.ietf.org/html/rfc6454#section-4
This changes the HTTP log format slightly but shouldn't mess
up anything that anyone was doing because the old "filename"
field was never actually filled out. Tests are updated as well.
* 'topic/jgras/base64-logging' of https://github.com/J-Gras/bro:
Update calls of Base64 functions.
Refactoring of Base64 functions.
I've removed the additional bif for encoding with a connection, as I'm
not sure there's much of a use case for it; we can always add it back
later if it turns out there is. I've also renamed
decode_base64_intern() to decode_base64_conn() to be a bit more
explicit about the difference.
Base64 encoding-errors during authentication in POP3 analyzer,
authentication in FTP analyzer (using GSI) and basic
authentication on HTTP will be logged to Weird.
* origin/topic/seth/more-file-type-ident-fixes:
File API updates complete.
Fixes for file type identification.
API changes to file analysis mime type detection.
Make HTTP 206 reassembly require ETags by default.
More file type identification improvements
Fix an issue with files having gaps before the bof_buffer is filled.
Fix an issue with packet loss in http file reporting.
Adding WOFF fonts to file type identification.
Extended JSON matching and added OCSP responses.
Another large signature update.
More signature updates.
Even more file type ident clean up.
Lots of fixes for file type identification.
BIT-1368 #merged
- Backed out eTag changes. The real world is more complicated
than just using eTags to identify the same file.
- A bit of code simplication in the http base scripts.
- Test updates (more existing small problems were identified!).
-
Removed "file_mime_type" and "file_mime_types" event, replacing them
with a new event called "file_metadata_inferred". It has a record
argument of type "inferred_file_metadata", which contains the mime type
information that the earlier events used to supply. The idea here is
that future extensions to the record with new metadata will be less
likely to break user code than the alternatives (adding new events or
new event parameters).
Addresses BIT-1368.
This allows the path for the default filter to be specified explicitly
when creating a stream and reduces the need to rely on the default path
function to magically supply the path.
The default path function is now only used if, when a filter is added to
a stream, it has neither a path nor a path function already.
Adapted the existing Log::create_stream calls to explicitly specify a
path value.
Addresses BIT-1324
These functions are now deprecated in favor of alternative versions that
return a vector of strings rather than a table of strings.
Deprecated functions:
- split: use split_string instead.
- split1: use split_string1 instead.
- split_all: use split_string_all instead.
- split_n: use split_string_n instead.
- cat_string_array: see join_string_vec instead.
- cat_string_array_n: see join_string_vec instead.
- join_string_array: see join_string_vec instead.
- sort_string_array: use sort instead instead.
- find_ip_addresses: use extract_ip_addresses instead.
Changed functions:
- has_valid_octets: uses a string_vec parameter instead of string_array.
Addresses BIT-924, BIT-757.
- Removed "binary" and "octet-stream" mime type detections. They don't
provide any more information than an uninitialized mime_type field
which implicitly means no magic signature matches and so the media
type is unknown to Bro.
- Slight change to "text/plain" signature. It's still not the most
accurate, which is reflected in its -20 strength value.
- The logic for adding file ids to {orig,resp}_fuids fields of
the http.log incorrectly depended on the state of
{orig,resp}_mime_types fields, so sometimes not all file ids
associated w/ the session were logged.
* topic/robin/http-connect:
HTTP fix for output handlers.
Expanding the HTTP methods used in the signature to detect HTTP traffic.
Updating submodule(s).
Fixing removal of support analyzers, plus some tweaking and cleanup of CONNECT code.
HTTP CONNECT proxy support.
BIT-1132 #merged