The "orig_fuids", "orig_filenames", and "orig_mime_types" http.log fields,
as well as their "resp" counterparts, are now limited to
"HTTP::max_files_orig" or "HTTP::max_files_resp" entries, respectively;
both limits default to 15. The limit can also be ignored case-by-case via
the "HTTP::max_files_policy" hook.
Fixes GH-289
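The capping behavior described above can be modeled roughly as follows. This is a minimal Python sketch, not Bro code: the HttpLogSide class, the add_file() helper, and the ignore_limit callback (standing in for a policy hook) are hypothetical names invented for illustration. Only the default of 15 comes from the entry itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_FILES_DEFAULT = 15  # default stated in the entry above


@dataclass
class HttpLogSide:
    """One direction (orig or resp) of a hypothetical http.log record."""
    fuids: List[str] = field(default_factory=list)


def add_file(side: HttpLogSide, fuid: str,
             max_files: int = MAX_FILES_DEFAULT,
             ignore_limit: Optional[Callable[[str], bool]] = None) -> bool:
    """Append fuid unless the cap is reached; return True if it was logged.

    ignore_limit is an assumed stand-in for a policy hook: when it returns
    True for this file, the cap is skipped for this one entry.
    """
    at_cap = len(side.fuids) >= max_files
    if at_cap and not (ignore_limit is not None and ignore_limit(fuid)):
        return False
    side.fuids.append(fuid)
    return True


side = HttpLogSide()
results = [add_file(side, f"F{i}") for i in range(20)]
logged_without_hook = len(side.fuids)

# A per-file override lets one more entry through despite the cap.
forced = add_file(side, "Fimportant",
                  ignore_limit=lambda fuid: fuid == "Fimportant")
```

In this model, files beyond the cap are simply not appended to the logged list unless the callback opts that particular file out of the limit.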
* origin/topic/jsiwek/empty-lines:
Add 'smtp_excessive_pending_cmds' weird
Fix SMTP command string comparisons
Improve handling of empty lines in several text protocol analyzers
Add rate-limiting sampling mechanism for weird events
Teach timestamp canonifier about timestamps before ~2001
The generation of weird events is now, by default, rate-limited
according to these tunable options:
- Weird::sampling_whitelist
- Weird::sampling_threshold
- Weird::sampling_rate
- Weird::sampling_duration
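One plausible reading of how these four options interact is sketched below in Python. The semantics are assumptions, not taken from this entry: the first "threshold" weirds of a given name are reported, after that only every "rate"-th one is reported, a rate of 0 suppresses all rate-limited weirds, counters reset after "duration" seconds, and whitelisted names bypass sampling. The WeirdSampler class is hypothetical and is not Bro's implementation.

```python
from typing import Dict, Iterable, Tuple


class WeirdSampler:
    """Toy model of threshold-then-sample rate limiting (not Bro's code)."""

    def __init__(self, threshold: int, rate: int, duration: float,
                 whitelist: Iterable[str] = ()) -> None:
        self.threshold = threshold
        self.rate = rate
        self.duration = duration
        self.whitelist = set(whitelist)
        # name -> (count seen in the current window, window start time)
        self.state: Dict[str, Tuple[int, float]] = {}

    def permit(self, name: str, now: float) -> bool:
        """Return True if this occurrence should raise an event."""
        if name in self.whitelist:
            return True
        count, started = self.state.get(name, (0, now))
        if now - started >= self.duration:
            count, started = 0, now  # window expired, start over
        count += 1
        self.state[name] = (count, started)
        if count <= self.threshold:
            return True
        if self.rate == 0:
            return False
        return (count - self.threshold) % self.rate == 0


sampler = WeirdSampler(threshold=3, rate=5, duration=60.0,
                       whitelist={"always_report"})
burst = [sampler.permit("bad_checksum", now=0.0) for _ in range(13)]
reported_positions = [i + 1 for i, ok in enumerate(burst) if ok]
after_expiry = sampler.permit("bad_checksum", now=61.0)
whitelisted = all(sampler.permit("always_report", now=0.0) for _ in range(100))
```

With the illustrative values above, occurrences 1 through 3 pass the threshold, and after that only occurrences 8 and 13 are sampled through.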
The new get_reporter_stats() BIF also allows one to query the
total number of weirds generated (pre-sampling) which the new
policy/misc/weird-stats.bro script uses periodically to populate
a weird_stats.log.
There are also new reporter BIFs to allow generating weirds from the
script-layer such that they go through the same, internal
rate-limiting/sampling mechanisms:
- Reporter::conn_weird
- Reporter::flow_weird
- Reporter::net_weird
Some of the code was adapted from previous work by Johanna Amann.
The HTTP "Origin" header is useful for CSRF, Chrome plugins making requests, and other scenarios where the referrer may not be present.
Reference:
https://tools.ietf.org/html/rfc6454#section-7: "In some sense, the origin granularity is a historical artifact of how the security model evolved."
It is especially useful if the origin/referrer is a "file://" URI; see https://tools.ietf.org/html/rfc6454#section-4
This changes the HTTP log format slightly but shouldn't mess
up anything that anyone was doing because the old "filename"
field was never actually filled out. Tests are updated as well.
* 'topic/jgras/base64-logging' of https://github.com/J-Gras/bro:
Update calls of Base64 functions.
Refactoring of Base64 functions.
I've removed the additional bif for encoding with a connection, as I'm
not sure there's much of a use case for it; we can always add it back
later if it turns out there is. I've also renamed
decode_base64_intern() to decode_base64_conn() to be a bit more
explicit about the difference.
Base64 encoding-errors during authentication in POP3 analyzer,
authentication in FTP analyzer (using GSI) and basic
authentication on HTTP will be logged to Weird.
* origin/topic/seth/more-file-type-ident-fixes:
File API updates complete.
Fixes for file type identification.
API changes to file analysis mime type detection.
Make HTTP 206 reassembly require ETags by default.
More file type identification improvements
Fix an issue with files having gaps before the bof_buffer is filled.
Fix an issue with packet loss in http file reporting.
Adding WOFF fonts to file type identification.
Extended JSON matching and added OCSP responses.
Another large signature update.
More signature updates.
Even more file type ident clean up.
Lots of fixes for file type identification.
BIT-1368 #merged
- Backed out eTag changes. The real world is more complicated
than just using eTags to identify the same file.
- A bit of code simplification in the http base scripts.
- Test updates (more existing small problems were identified!).
Removed "file_mime_type" and "file_mime_types" event, replacing them
with a new event called "file_metadata_inferred". It has a record
argument of type "inferred_file_metadata", which contains the mime type
information that the earlier events used to supply. The idea here is
that future extensions to the record with new metadata will be less
likely to break user code than the alternatives (adding new events or
new event parameters).
Addresses BIT-1368.
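The compatibility argument above can be illustrated with a small Python analogy. It is a sketch only: the InferredFileMetadata class, its field names, and the later "charset" field are hypothetical and do not describe the actual Bro record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferredFileMetadata:
    """Hypothetical stand-in for a record-typed event argument."""
    mime_type: Optional[str] = None
    mime_types: List[str] = field(default_factory=list)
    # Hypothetical later addition: existing handlers never read it,
    # so adding it does not change their signature or behavior.
    charset: Optional[str] = None


def user_handler(file_id: str, meta: InferredFileMetadata) -> str:
    """A handler written before 'charset' existed."""
    return f"{file_id}: {meta.mime_type or 'unknown'}"


before_extension = user_handler(
    "Fabc", InferredFileMetadata(mime_type="text/html"))
after_extension = user_handler(
    "Fabc", InferredFileMetadata(mime_type="text/html", charset="utf-8"))
```

Had the new value been passed as an extra positional parameter instead, every existing handler signature would need updating; with a record, the handler keeps working unchanged.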
This allows the path for the default filter to be specified explicitly
when creating a stream and reduces the need to rely on the default path
function to magically supply the path.
The default path function is now only used if, when a filter is added to
a stream, it has neither a path nor a path function already.
Adapted the existing Log::create_stream calls to explicitly specify a
path value.
Addresses BIT-1324
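The fallback rule described above can be sketched in Python as follows. This is an illustrative model, not Bro's logging framework: the Filter class, add_filter(), resolve_path(), and the behavior of default_path_func() are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Filter:
    """Hypothetical log filter with an optional path and path function."""
    name: str
    path: Optional[str] = None
    path_func: Optional[Callable[[str], str]] = None


def default_path_func(stream_id: str) -> str:
    """Assumed fallback: derive a path from the stream ID's module name."""
    return stream_id.split("::")[0].lower()


def add_filter(stream_id: str, flt: Filter) -> Filter:
    """Attach the default path function only when nothing else is set."""
    if flt.path is None and flt.path_func is None:
        flt.path_func = default_path_func
    return flt


def resolve_path(stream_id: str, flt: Filter) -> Optional[str]:
    """Simplified lookup: a path function wins, else the explicit path."""
    if flt.path_func is not None:
        return flt.path_func(stream_id)
    return flt.path


explicit = add_filter("HTTP::LOG", Filter("default", path="http"))
fallback = add_filter("HTTP::LOG", Filter("bare"))
custom = add_filter("HTTP::LOG",
                    Filter("split", path_func=lambda sid: "http_custom"))

explicit_path = resolve_path("HTTP::LOG", explicit)
fallback_path = resolve_path("HTTP::LOG", fallback)
custom_path = resolve_path("HTTP::LOG", custom)
```

In this model, a filter that already carries an explicit path never touches the default path function, which matches the intent of reducing reliance on it.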
These functions are now deprecated in favor of alternative versions that
return a vector of strings rather than a table of strings.
Deprecated functions:
- split: use split_string instead.
- split1: use split_string1 instead.
- split_all: use split_string_all instead.
- split_n: use split_string_n instead.
- cat_string_array: see join_string_vec instead.
- cat_string_array_n: see join_string_vec instead.
- join_string_array: see join_string_vec instead.
- sort_string_array: use sort instead.
- find_ip_addresses: use extract_ip_addresses instead.
Changed functions:
- has_valid_octets: uses a string_vec parameter instead of string_array.
Addresses BIT-924, BIT-757.
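The practical difference between the two return shapes can be shown with a Python analogy. This sketch assumes the table form is keyed by position starting at 1 and the vector form is indexed from 0; neither detail is stated in this entry. The helper names are hypothetical and are not the Bro functions.

```python
import re
from typing import Dict, List


def split_as_table(s: str, pattern: str) -> Dict[int, str]:
    """Model the deprecated shape: a table keyed by 1-based position."""
    return {i: part for i, part in enumerate(re.split(pattern, s), start=1)}


def split_as_vector(s: str, pattern: str) -> List[str]:
    """Model the replacement shape: an ordered, 0-based vector."""
    return re.split(pattern, s)


table = split_as_table("a,b,c", r",")
vector = split_as_vector("a,b,c", r",")

first_from_table = table[1]
first_from_vector = vector[0]

# Rebuilding the string from a table needs explicit key ordering;
# a vector already carries its order.
joined_from_table = ",".join(table[k] for k in sorted(table))
joined_from_vector = ",".join(vector)
```

Code migrating from the table-returning functions would therefore need to adjust any hard-coded indices under this assumption.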
- Removed "binary" and "octet-stream" mime type detections. They don't
provide any more information than an uninitialized mime_type field
which implicitly means no magic signature matches and so the media
type is unknown to Bro.
- Slight change to "text/plain" signature. It's still not the most
accurate, which is reflected in its -20 strength value.
- The logic for adding file ids to {orig,resp}_fuids fields of
the http.log incorrectly depended on the state of
{orig,resp}_mime_types fields, so sometimes not all file ids
associated w/ the session were logged.
* topic/robin/http-connect:
HTTP fix for output handlers.
Expanding the HTTP methods used in the signature to detect HTTP traffic.
Updating submodule(s).
Fixing removal of support analyzers, plus some tweaking and cleanup of CONNECT code.
HTTP CONNECT proxy support.
BIT-1132 #merged
Removal of support analyzers was broken. The code now actually doesn't
delete them immediately anymore but instead just flags them as
disabled. They'll be destroyed with the parent analyzer later.
Also includes a new leak test exercising the CONNECT code.
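The deferred-removal approach described above (flag as disabled now, destroy with the parent later) can be sketched in Python. This is a model of the idea only, not the C++ analyzer code: the Analyzer class and its method names are hypothetical.

```python
from typing import List


class Analyzer:
    """Toy analyzer with child 'support' analyzers (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disabled = False
        self.destroyed = False
        self.support: List["Analyzer"] = []

    def add_support(self, child: "Analyzer") -> None:
        self.support.append(child)

    def remove_support(self, child: "Analyzer") -> None:
        # Only flag the child; it stays in the list so that code which is
        # still iterating over it never sees a dangling object.
        child.disabled = True

    def deliver(self, data: str) -> List[str]:
        """Forward data to every support analyzer that is still enabled."""
        return [f"{s.name}:{data}" for s in self.support if not s.disabled]

    def destroy(self) -> None:
        """Tear down the parent and all children, disabled or not."""
        for s in self.support:
            s.destroyed = True
        self.support.clear()
        self.destroyed = True


parent = Analyzer("parent")
first = Analyzer("support_a")
second = Analyzer("support_b")
parent.add_support(first)
parent.add_support(second)

parent.remove_support(first)
seen = parent.deliver("chunk")
alive_after_removal = not first.destroyed

parent.destroy()
destroyed_with_parent = first.destroyed and second.destroyed
```

The disabled child stops receiving data immediately, but its lifetime ends only when the parent is destroyed.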
Add a "broxygen" domain Sphinx extension w/ directives to allow
on-the-fly documentation to be generated w/ Bro and included in files.
This means all autogenerated reST docs are now done by Bro. The odd
CMake/Python glue scripts which used to generate some portions are now
gone. Bro and the Sphinx extension handle checking for outdated docs
themselves.
Parallel builds of `make doc` target should now work (mostly because
I don't think there are any tasks that can be done in parallel anymore).
Overall, this seems to simplify things and make the Broxygen-generated
portions of the documentation visible/traceable from the main Sphinx
source tree. The one odd thing still is that per-script documentation
is rsync'd in to a shadow copy of the Sphinx source tree within the
build dir. This is less elegant than using the new broxygen extension
to make per-script docs, but rsync is faster and simpler. Simpler as in
less code because it seems like, in the best case, I'd need to write a
custom Sphinx Builder to be able to get that to even work.
* origin/topic/seth/faf-updates: (27 commits)
Undoing the FTP tests I updated earlier.
Update the last two btest FAF tests.
File analysis fixes and test updates.
Fix a bug with getting analyzer tags.
A few test updates.
Some tests work now (at least they all don't fail anymore!)
Forgot a file.
Added protocol description functions that provide a super compressed log representation.
Fix a bug where orig file information in http wasn't working right.
Added mime types to http.log
Clean up queued but unused file_over_new_connections event args.
Add jar files to the default MHR lookups.
Adding CAB files for MHR checking.
Improve malware hash registry script.
Fix a small issue with finding smtp entities.
Added support for files to the notice framework.
Make the custom libmagic database a git submodule.
Add an is_orig parameter to file_over_new_connection event.
Make magic for emitting application/msword mime type less strict.
Disable more libmagic builtin checks that override the magic database.
...
Conflicts:
doc/scripts/DocSourcesList.cmake
scripts/base/init-bare.bro
scripts/test-all-policy.bro
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
- Several places were just using old variable names or not loading
scripts correctly after they'd been renamed/moved.
- Revert/adjust a change in how HTTP file handles are generated that
broke partial content responses.
- Turn some libmagic builtin checks back on; seems some are actually
useful (e.g. text detection seems to be a builtin). The rule going
forward probably will be only to turn off a builtin if we confirm it
causes issues.
- Removed some tests that are redundant or not necessary anymore because
the generic file analysis tests cover them.
- A couple FTP tests still fail that I think need an actual solution via
script changes.
- This caused us to lose signatures for POP3 and Bittorrent. These will
need to be rediscovered in the repository when we add scripts
for those analyzers.