Notable changes:
- libmagic is no longer used at all. All MIME type detection is
done through new Bro signatures, and there's no longer a means to get
verbose file type descriptions (e.g. "PNG image data, 1435 x 170").
The majority of the default file magic signatures are derived
from the default magic database of libmagic ~5.17.
- File magic signatures consist of two new constructs in the
signature rule parsing grammar: "file-magic" gives a regular
expression to match against, and "file-mime" gives the MIME type
string of content that matches the magic and an optional strength
value for the match.
- Modified signature/rule syntax for identifiers: they can no longer
start with a '-', which made for ambiguous syntax when doing negative
strength values in "file-mime". Also brought syntax for Bro script
identifiers in line with reality (they can't start with numbers or
include '-' at all).
- A new Built-In Function, "file_magic", can be used to get all
file magic matches and their corresponding strength against a given
chunk of data
- The second parameter of the "identify_data" Built-In Function
can no longer be used to get verbose file type descriptions, though it
can still be used to get the strongest matching file magic signature.
- The "file_transferred" event's "descr" parameter no longer
contains verbose file type descriptions.
- The BROMAGIC environment variable no longer changes any behavior
in Bro as magic databases are no longer used/installed.
- Reverted back to minimum requirement of CMake 2.6.3 from 2.8.0
(it's back to being the same requirement as the Bro v2.2 release).
The bump was to accomodate building libmagic as an external project,
which is no longer needed.
Addresses BIT-1143.
If a file is nothing but gaps (e.g. due to missing/dropped packets), Bro
can sometimes detect a file is supposed to have been present and never
saw any of its content, but failed to raise file_over_new_connection
events for it. This was mostly apparent because the tx_hosts/rx_hosts
fields in files.log would not be populated in such cases (but are now
with this change).
- The reassembly behavior can be modified per-file by enabling or
disabling the reassembler and/or modifying the size of the reassembly
buffer.
- Changed the file extraction analyzer to use the stream to avoid
issues with the chunk based approach not immediately triggering
the file_new event due to mime-type detection delay. Early chunks
frequently ended up lost before.
- Generally things are working now and I'd consider this in testing.
- Move more functionality into base class.
- Remove cctors and assignment operators (weren't actually needed anymore)
- Switch from const char* to std::string.
- Enable manager to associate analyzers with a MIME type. With that,
one can now say enable all analyzers for, e.g., "image/gif". This is
exposed to script-land as
Files::add_analyzers_for_mime_type(f: fa_file, mtype: string)
For MIME types identified via libmagic, this happens automatically
(via the file_new() handler in files/main.bro).
- Extend the analyzer API to better match that of protocol analyzers:
- Adding unique analyzer IDs so that we can refer to instances
from script-land.
- Adding subtypes to Components so that a single analyzer
implementation can support different types of analyzers
internally.
- Add an analyzer method SetTag() that allows to set the tag after
construction.
- Adding Init() and Done() methods for consistency with what other
classes offer.
- Add debug logging to the file_analysis stream.
TODO: test cases missing for the new script-land functionality.
Made some class templates for code that seemed duplicated between
file/protocol tags and managers. Seems like it helps a bit and
hopefully can be also be used to transition other things that have
enum value "tags" (e.g. logging writers, input readers) to the
plugin system.
This cleans up internals of how analyzer instances get identified by the
tag plus any args given to it and doesn't change script code a user
would write.
* origin/topic/seth/faf-updates: (27 commits)
Undoing the FTP tests I updated earlier.
Update the last two btest FAF tests.
File analysis fixes and test updates.
Fix a bug with getting analyzer tags.
A few test updates.
Some tests work now (at least they all don't fail anymore!)
Forgot a file.
Added protocol description functions that provide a super compressed log representation.
Fix a bug where orig file information in http wasn't working right.
Added mime types to http.log
Clean up queued but unused file_over_new_connections event args.
Add jar files to the default MHR lookups.
Adding CAB files for MHR checking.
Improve malware hash registry script.
Fix a small issue with finding smtp entities.
Added support for files to the notice framework.
Make the custom libmagic database a git submodule.
Add an is_orig parameter to file_over_new_connection event.
Make magic for emitting application/msword mime type less strict.
Disable more libmagic builtin checks that override the magic database.
...
Conflicts:
doc/scripts/DocSourcesList.cmake
scripts/base/init-bare.bro
scripts/test-all-policy.bro
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
- Remove script-layer data input interface (will be managed directly
by input framework later).
- Only track files internally by file id hash. Chance of collision
too small to justify also tracking unique file string.
Thanks to git this merge was less troublesome that I was afraid it
would be. Not all tests pass yet though (and file hashes have changed
unfortunately).
Conflicts:
cmake
doc/scripts/DocSourcesList.cmake
scripts/base/init-bare.bro
scripts/base/protocols/ftp/main.bro
scripts/base/protocols/irc/dcc-send.bro
scripts/test-all-policy.bro
src/AnalyzerTags.h
src/CMakeLists.txt
src/analyzer/Analyzer.cc
src/analyzer/protocol/file/File.cc
src/analyzer/protocol/file/File.h
src/analyzer/protocol/http/HTTP.cc
src/analyzer/protocol/http/HTTP.h
src/analyzer/protocol/mime/MIME.cc
src/event.bif
src/main.cc
src/util-config.h.in
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/istate.events-ssl/receiver.http.log
testing/btest/Baseline/istate.events-ssl/sender.http.log
testing/btest/Baseline/istate.events/receiver.http.log
testing/btest/Baseline/istate.events/sender.http.log
This works around a bug in libmagic since version 5.12 (current at
time of writing is 5.14) -- second call to magic_load() w/ non-default
database segfaults.
And added an event called "event_queue_flush_point" to mark where that
occured in the event stream. The FAF now uses an explicit event queue
flush instead of buffering input in order to wait for a file handle to
be returned from script-layer.
- FileAnalysis::Info is now just a record used for logging, the fa_file
record type is defined in init-bare.bro as the analogue to a
connection record.
- Starting to transfer policy hook triggers and analyzer results to
events.