- Removed "binary" and "octet-stream" mime type detections. They don't
provide any more information than an uninitialized mime_type field
which implicitly means no magic signature matches and so the media
type is unknown to Bro.
- Slight change to "text/plain" signature. It's still not the most
accurate, which is reflected in its -20 strength value.
- The logic for adding file ids to {orig,resp}_fuids fields of
the http.log incorrectly depended on the state of
{orig,resp}_mime_types fields, so sometimes not all file ids
associated w/ the session were logged.
* topic/robin/http-connect:
HTTP fix for output handlers.
Expanding the HTTP methods used in the signature to detect HTTP traffic.
Updating submodule(s).
Fixing removal of support analyzers, plus some tweaking and cleanup of CONNECT code.
HTTP CONNECT proxy support.
BIT-1132 #merged
CONNECT code.
Removal of support analyzers was broken. The code now actually doesn't
delete them immediately anymore but instead just flags them as
disabled. They'll be destroyed with the parent analyzer later.
Also includes a new leak tests exercising the CONNECT code.
Lines starting # with '#' will be ignored, and an empty message aborts
the commit. # On branch topic/robin/http-connect # Changes to be
committed: # modified: scripts/base/protocols/http/main.bro #
modified: scripts/base/protocols/ssl/consts.bro # modified:
src/analyzer/Analyzer.cc # modified: src/analyzer/Analyzer.h #
modified: src/analyzer/protocol/http/HTTP.cc # new file:
testing/btest/core/leaks/http-connect.bro # modified:
testing/btest/scripts/base/protocols/http/http-connect.bro # #
Untracked files: # .tags # changes.txt # conn.log # debug.log # diff #
mpls-in-vlan.patch # newfile.pcap # packet_filter.log # reporter.log #
src/PktSrc.cc.orig # weird.log #
Add a "broxygen" domain Sphinx extension w/ directives to allow
on-the-fly documentation to be generated w/ Bro and included in files.
This means all autogenerated reST docs are now done by Bro. The odd
CMake/Python glue scipts which used to generate some portions are now
gone. Bro and the Sphinx extension handle checking for outdated docs
themselves.
Parallel builds of `make doc` target should now work (mostly because
I don't think there's any tasks that can be done in parallel anymore).
Overall, this seems to simplify things and make the Broxygen-generated
portions of the documentation visible/traceable from the main Sphinx
source tree. The one odd thing still is that per-script documentation
is rsync'd in to a shadow copy of the Sphinx source tree within the
build dir. This is less elegant than using the new broxygen extension
to make per-script docs, but rsync is faster and simpler. Simpler as in
less code because it seems like, in the best case, I'd need to write a
custom Sphinx Builder to be able to get that to even work.
* origin/topic/seth/faf-updates: (27 commits)
Undoing the FTP tests I updated earlier.
Update the last two btest FAF tests.
File analysis fixes and test updates.
Fix a bug with getting analyzer tags.
A few test updates.
Some tests work now (at least they all don't fail anymore!)
Forgot a file.
Added protocol description functions that provide a super compressed log representation.
Fix a bug where orig file information in http wasn't working right.
Added mime types to http.log
Clean up queued but unused file_over_new_connections event args.
Add jar files to the default MHR lookups.
Adding CAB files for MHR checking.
Improve malware hash registry script.
Fix a small issue with finding smtp entities.
Added support for files to the notice framework.
Make the custom libmagic database a git submodule.
Add an is_orig parameter to file_over_new_connection event.
Make magic for emitting application/msword mime type less strict.
Disable more libmagic builtin checks that override the magic database.
...
Conflicts:
doc/scripts/DocSourcesList.cmake
scripts/base/init-bare.bro
scripts/test-all-policy.bro
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
- Several places were just using old variable names or not loading
scripts correctly after they'd been renamed/moved.
- Revert/adjust a change in how HTTP file handles are generated that
broke partial content responses.
- Turn some libmagic builtin checks back on; seems some are actually
useful (e.g. text detection seems to be a builtin). The rule going
forward probably will be only to turn off a builtin if we confirm it
causes issues.
- Removed some tests that are redundant or not necessary anymore because
the generic file analysis tests cover them.
- A couple FTP tests still fail that I think need an actual solution via
script changes.
- This caused us to lose signatures for POP3 and Bittorrent. These will
need discovered in the repository again when we add scripts
for those analyzers.
- Recorrected the module name to Files.
- Added Files::analyzer_name to get a more readable name for a
file analyzer.
- Improved and just overall better handled multipart mime
transfers in HTTP and SMTP. HTTP now has orig_fuids and resp_fuids
log fields since multiple "files" can be transferred with
multipart mime in a single request/response pair. SMTP has
an fuids field which has file unique IDs for all parts
transferred. FTP and IRC have a log field named fuid added
because only a single file can be transferred per irc and ftp
log line.
http.log now has files taken from request and response bodies in
different fields for each, and can now track multiple files per body.
That is, the "extraction_file" field is now "extracted_request_files"
and "extracted_response_files".
Thanks to git this merge was less troublesome that I was afraid it
would be. Not all tests pass yet though (and file hashes have changed
unfortunately).
Conflicts:
cmake
doc/scripts/DocSourcesList.cmake
scripts/base/init-bare.bro
scripts/base/protocols/ftp/main.bro
scripts/base/protocols/irc/dcc-send.bro
scripts/test-all-policy.bro
src/AnalyzerTags.h
src/CMakeLists.txt
src/analyzer/Analyzer.cc
src/analyzer/protocol/file/File.cc
src/analyzer/protocol/file/File.h
src/analyzer/protocol/http/HTTP.cc
src/analyzer/protocol/http/HTTP.h
src/analyzer/protocol/mime/MIME.cc
src/event.bif
src/main.cc
src/util-config.h.in
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/istate.events-ssl/receiver.http.log
testing/btest/Baseline/istate.events-ssl/sender.http.log
testing/btest/Baseline/istate.events/receiver.http.log
testing/btest/Baseline/istate.events/sender.http.log
And added an event called "event_queue_flush_point" to mark where that
occured in the event stream. The FAF now uses an explicit event queue
flush instead of buffering input in order to wait for a file handle to
be returned from script-layer.
- FileAnalysis::Info is now just a record used for logging, the fa_file
record type is defined in init-bare.bro as the analogue to a
connection record.
- Starting to transfer policy hook triggers and analyzer results to
events.
This reverts commit fc267d010d.
There were some diffs caused by this in external test suites I'm
unsure about, I'm going to go over optimizations more closely in
a different branch.
When a file handle is needed and the last event in the queue is also
a get_file_handle event with the same arguments, instead of queueing
a new event, just remember to cache/re-use the resulting handle from
the previous event. This depends on get_file_handle handlers not
changing global state that is also used to derive the file handle
string.
- Add a timeout flag to file_analysis.log so it's easy to tell what
has had at least one timeout trigger happen.
- Fix ftp-data service tag not being set for reused connections.
- Fix HTTP::Incorrect_File_Type because mime types returned by FAF have
the charset still in them, but the HTTP::mime_types_extensions table
does not and it requires an exact string match. (still ugly)
- Add TRIGGER_NEW_CONN to track files going over multiple connections.
- Add an initial file/mime type guess for non-linear file transfers.
- Fix a case where file/mime type detection would never be attempted
if the start of the file was a content gap.
- Improve mime type tracking of HTTP byte-range/partial-content,
even if the requests are pipelined or over multiple connections.
- I changed the modbus.events test because having the baseline output
be 80+ MB is nuts and it was sensitive to connection record redefs.
This is a larger internal change that moves the analyzer
infrastructure to a more flexible model where the available analyzers
don't need to be hardcoded at compile time anymore. While currently
they actually still are, this will in the future enable external
analyzer plugins. For now, it does already add the capability to
dynamically enable/disable analyzers from script-land, replacing the
old Analyzer::Available() methods.
There are three major parts going into this:
- A new plugin infrastructure in src/plugin. This is independent
of analyzers and will eventually support plugins for other parts
of Bro as well (think: readers and writers). The goal is that
plugins can be alternatively compiled in statically or loadead
dynamically at runtime from a shared library. While the latter
isn't there yet, there'll be almost no code change for a plugin
to make it dynamic later (hopefully :)
- New analyzer infrastructure in src/analyzer. I've moved a number
of analyzer-related classes here, including Analyzer and DPM;
the latter now renamed to Analyzer::Manager. More will move here
later. Currently, there's only one plugin here, which provides
*all* existing analyzers. We can modularize this further in the
future (or not).
- A new script interface in base/framework/analyzer. I think that
this will eventually replace the dpm framework, but for now
that's still there as well, though some parts have moved over.
I've also remove the dpd_config table; ports are now configured via
the analyzer framework. For exmaple, for SSH:
const ports = { 22/tcp } &redef;
event bro_init() &priority=5
{
...
Analyzer::register_for_ports(Analyzer::ANALYZER_SSH, ports);
}
As you can see, the old ANALYZER_SSH constants have more into an enum
in the Analyzer namespace.
This is all hardly tested right now, and not everything works yet.
There's also a lot more cleanup to do (moving more classes around;
removing no longer used functionality; documenting script and C++
interfaces; regression tests). But it seems to generally work with a
small trace at least.
The debug stream "dpm" shows more about the loaded/enabled analyzers.
A new option -N lists loaded plugins and what they provide (including
those compiled in statically; i.e., right now it outputs all the
analyzers).
This is all not cast-in-stone yet, for some things we need to see if
they make sense this way. Feedback welcome.
Versus from synchronous function calls, which doesn't work well because
the function call can see a script-layer state that doesn't reflect
the state as it will be in terms of the event/network stream.
Other misc:
- Remove HTTP::MD5 notice.
- Add "last_active" field to FileAnalysis::Info record.
- Replace "conn_uids", "conn_ids" fields in FileAnalysis::Info record
with just a "conns" fields containing full connection records.
- The http-methods unit test is failing now, but I think it will be
fixed once I change the file handle callback mechanism to use events
instead.
A retry happens on every new input and also periodically based on a
timer. If a file handle is returned at those times, the input is
forwarded for analysis, else it keeps retrying until a timeout
threshold.
The framework now cycles through callbacks based on a table indexed
by analyzer tags, or the special case of service strings if a given
analyzer is overloaded for multiple protocols (FTP/IRC data). This
lets each protocol script bundle implement the callback locally and
reduces the FAF's external dependencies.