* origin/topic/jsiwek/file-signatures:
File type detection changes and fix https.log {orig,resp}_fuids fields.
Various minor changes related to file mime type detection.
Refactor common MIME magic matching code.
Replace libmagic w/ Bro signatures for file MIME type identification.
Conflicts:
scripts/base/init-default.bro
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
BIT-1143 #merged
* origin/topic/bernhard/file-analysis-x509:
Forgot the preamble for the new leak test
(hopefully) last change -> return real opaque vec instead of any_vec
Fix dump-events - it cannot be used with ssl anymore, because openssl does not give the same string results in all versions.
Finishing touches of the x509 file analyzer.
Revert change to only log certificates once per hour.
Change x509 log - now certificates are only logged once per hour.
Fix circular reference problem and a few other small things.
X509 file analyzer nearly done. Verification and most other policy scripts work fine now.
Add verify functionality, including the ability to get the validated chain. This means that it is now possible to get information about the root-certificates that were used to secure a connection.
Second try on the event interface.
Backport crash fix that made it into master with the x509_extension backport from here.
Make x509 certificates an opaque type
rip out x509 code from ssl analyzer. Note that since at the moment the file analyzer does not yet re-populate the info record that means quite a lot of information is simply not available.
parse out extension. One event for general extensions (just returns the openssl-parsed string-value), one event for basicconstraints (is a certificate a CA or not) and one event for subject-alternative-names (only DNS parts).
Very basic file-analyzer for x509 certificates. Mostly ripped from the ssl-analyzer and the topic/bernhard/x509 branch.
Add parsing of several more types to SAN extension.
Make error messages of x509 file analyzer more useful.
Fix file ID generation.
You apparently have to be very careful which EndOfFile function of
the file analysis framework you call... otherwhise it might try
to close another file id. This took me quite a while to find.
addresses BIT-953, BIT-760, BIT-1150
- Improve or just remove some file magic signatures ported from libmagic
that were too general and matched incorrectly too often.
- Fix MHR script's use of fa_file$mime_type before checking if it's
initialized. It may be uninitialized if no signatures match.
- The "fa_file" record now contains a "mime_types" field that contains
all magic signatures that matched the file content (where the
"mime_type" field is just a shortcut for the strongest match).
Put some methods in file_analysis::Manager that can perform the
matching process and return MIME type results. Also helps to
centralize the management/re-use of a signature matcher object.
Notable changes:
- libmagic is no longer used at all. All MIME type detection is
done through new Bro signatures, and there's no longer a means to get
verbose file type descriptions (e.g. "PNG image data, 1435 x 170").
The majority of the default file magic signatures are derived
from the default magic database of libmagic ~5.17.
- File magic signatures consist of two new constructs in the
signature rule parsing grammar: "file-magic" gives a regular
expression to match against, and "file-mime" gives the MIME type
string of content that matches the magic and an optional strength
value for the match.
- Modified signature/rule syntax for identifiers: they can no longer
start with a '-', which made for ambiguous syntax when doing negative
strength values in "file-mime". Also brought syntax for Bro script
identifiers in line with reality (they can't start with numbers or
include '-' at all).
- A new Built-In Function, "file_magic", can be used to get all
file magic matches and their corresponding strength against a given
chunk of data
- The second parameter of the "identify_data" Built-In Function
can no longer be used to get verbose file type descriptions, though it
can still be used to get the strongest matching file magic signature.
- The "file_transferred" event's "descr" parameter no longer
contains verbose file type descriptions.
- The BROMAGIC environment variable no longer changes any behavior
in Bro as magic databases are no longer used/installed.
- Reverted back to minimum requirement of CMake 2.6.3 from 2.8.0
(it's back to being the same requirement as the Bro v2.2 release).
The bump was to accomodate building libmagic as an external project,
which is no longer needed.
Addresses BIT-1143.
SSL::Info now holds a reference to Files::Info instead of the
fa_files record.
Everything should work now, if everyone thinks that the interface is
ok I will update the test baselines in a bit.
addresses BIT-953, BIT-760
work fine now.
Todo:
* update all baselines
* fix the circular reference to the fa_file structure I introduced :)
Sadly this does not seem to be entirely straightforward.
addresses BIT-953, BIT-760
chain. This means that it is now possible to get information about the
root-certificates that were used to secure a connection.
Intermediate commit before changing the script interface again.
addresses BIT-953, BIT-760
* origin/topic/jsiwek/http-file-id-caching:
Revert use of HTTP file ID caching for gaps range request content.
Extend file analysis API to allow file ID caching, adapt HTTP to use it.
BIT-1125 #merged
This allows an analyzer to either provide file IDs associated with some
file content or to cache a file ID that was already determined by
script-layer logic so that subsequent calls to the file analysis
interface can bypass costly detours through script-layer. This can
yield a decent performance improvement for analyzers that are able to
take advantage of it and deal with streaming content (like HTTP).
If a file is nothing but gaps (e.g. due to missing/dropped packets), Bro
can sometimes detect a file is supposed to have been present and never
saw any of its content, but failed to raise file_over_new_connection
events for it. This was mostly apparent because the tx_hosts/rx_hosts
fields in files.log would not be populated in such cases (but are now
with this change).
- The reassembly behavior can be modified per-file by enabling or
disabling the reassembler and/or modifying the size of the reassembly
buffer.
- Changed the file extraction analyzer to use the stream to avoid
issues with the chunk based approach not immediately triggering
the file_new event due to mime-type detection delay. Early chunks
frequently ended up lost before.
- Generally things are working now and I'd consider this in testing.
- Move more functionality into base class.
- Remove cctors and assignment operators (weren't actually needed anymore)
- Switch from const char* to std::string.
- Enable manager to associate analyzers with a MIME type. With that,
one can now say enable all analyzers for, e.g., "image/gif". This is
exposed to script-land as
Files::add_analyzers_for_mime_type(f: fa_file, mtype: string)
For MIME types identified via libmagic, this happens automatically
(via the file_new() handler in files/main.bro).
- Extend the analyzer API to better match that of protocol analyzers:
- Adding unique analyzer IDs so that we can refer to instances
from script-land.
- Adding subtypes to Components so that a single analyzer
implementation can support different types of analyzers
internally.
- Add an analyzer method SetTag() that allows to set the tag after
construction.
- Adding Init() and Done() methods for consistency with what other
classes offer.
- Add debug logging to the file_analysis stream.
TODO: test cases missing for the new script-land functionality.
Fixed reference to wrong field name.
Added documentation of a function arg.
Added a couple references to other parts of the documentation.
Explained how not specifying extraction filename results in automatic
filename generation.
Several other minor clarifications.
Replaced some with InternalWarning or InternalAnalyzerError, the later
being a new method which signals the analyzer to not process further
input. Some usages I just removed if they didn't make sense or clearly
couldn't happen. Also did some minor refactors of related code while
reviewing/exploring ways to get rid of InternalError usages.
Also, for TCP content file write failures there's a new event:
"contents_file_write_failure".
openssl-parsed string-value), one event for basicconstraints (is a certificate
a CA or not) and one event for subject-alternative-names (only DNS parts).
the ssl-analyzer and the topic/bernhard/x509 branch.
Simply prints information about the encountered certificates (I have
not yet my mind up, what I will log...).
Next step: extensions...
BIT-1054 #merged
* origin/topic/seth/unified2-analyzer:
Fixes in case a packet isn't seen that matches an event.
Finished work on unified2 analyzer.
Fixed some tests.
Working unified2 analyzer.
Unified2 file analyzer updated to new plugin style.
Adding the unified2 analyzer.
Conflicts:
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
Made some class templates for code that seemed duplicated between
file/protocol tags and managers. Seems like it helps a bit and
hopefully can be also be used to transition other things that have
enum value "tags" (e.g. logging writers, input readers) to the
plugin system.
This cleans up internals of how analyzer instances get identified by the
tag plus any args given to it and doesn't change script code a user
would write.