There are two new script-level functions to query and look up files
from the core by their IDs. These add feature parity with the
similarly named functions for connections. The function prototypes are
as follows:
Files::file_exists(fuid: string): bool
Files::lookup_file(fuid: string): fa_file
Closes #1830.
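As a rough illustration of the intended semantics only, the following
Python sketch models a table of active files keyed by file ID. The
class name FileRegistry, the dictionary-backed storage, and returning
None for an unknown ID are all assumptions made for this sketch; they
do not describe how the core implements the two functions or what the
lookup function does for a missing ID.

```python
class FileRegistry:
    """Toy stand-in for the core's table of active files (hypothetical)."""

    def __init__(self):
        self._files = {}  # file ID -> record describing the file

    def add(self, fuid, record):
        self._files[fuid] = record

    def file_exists(self, fuid):
        # Pure membership query: does a file with this ID exist right now?
        return fuid in self._files

    def lookup_file(self, fuid):
        # Return the record for the ID. Returning None for an unknown ID
        # is an assumption of this sketch, not documented behavior.
        return self._files.get(fuid)


registry = FileRegistry()
registry.add("ExampleFuid1", {"source": "HTTP"})
```

Checking existence before the lookup avoids having to reason about the
missing-ID case at all, which is the pattern the two prototypes suggest.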
* origin/topic/johanna/ocsp-sct-validate: (82 commits)
Tiny script changes for SSL.
Update CT Log list
SSL: Update OCSP/SCT scripts and documentation.
Revert "add parameter 'status_type' to event ssl_stapled_ocsp"
Revert "parse multiple OCSP stapling responses"
SCT: Fix script error when mime type of file unknown.
SCT: another memory leak in SCT parsing.
SCT validation: fix small memory leak (public keys were not freed)
Change end-of-connection handling for validation
OCSP/TLS/SCT: Fix a number of test failures.
SCT Validate: make caching a bit less aggressive.
SSL: Fix type of ssl validation result
TLS-SCT: compile on old versions of OpenSSL (1.0.1...)
SCT: Add caching support for validation
SCT: Add signed certificate timestamp validation script.
SCT: Allow verification of SCTs in Certs.
SCT: only compare correct OID/NID for Cert/OCSP.
SCT: add validation of proofs for extensions and OCSP.
SCT: pass timestamp as uint64 instead of time
Add CT log information to Bro
...
This makes things much easier for protocols where the MIME type is
known in advance, such as TLS. We no longer have to perform deep
script-level magic.
Broke out the stats collection into a bunch of new Bifs
in stats.bif. Scripts that use stats collection functions
have also been updated. More work to do.
- Re-arrange how some fa_file fields (e.g. source, connection info, mime
type) get updated/set for consistency.
- Add more robust mechanisms for flushing the reassembly buffer.
The goal being to report all gaps and deliveries to file analyzers
regardless of the state of the reassembly buffer at the time it has to
be flushed.
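The flushing goal above can be sketched as follows. This is a minimal
Python model, not the actual reassembly code: the function name flush,
the dict-of-chunks buffer layout, and the ("gap", ...) / ("deliver", ...)
event tuples are all assumptions chosen for illustration.

```python
def flush(buffered, next_offset):
    """Flush a reassembly buffer, reporting every gap and delivery.

    buffered:    dict mapping stream offset -> bytes (out-of-order chunks)
    next_offset: offset up to which data has already been delivered
    Returns (events, new_offset).
    """
    events = []
    for offset in sorted(buffered):
        data = buffered[offset]
        end = offset + len(data)
        if end <= next_offset:
            continue  # chunk is fully covered by data already delivered
        if offset > next_offset:
            # Bytes between next_offset and this chunk never arrived.
            events.append(("gap", next_offset, offset - next_offset))
            next_offset = offset
        # Drop any prefix that overlaps already-delivered data.
        data = data[next_offset - offset:]
        events.append(("deliver", next_offset, data))
        next_offset = end
    buffered.clear()
    return events, next_offset


events, new_offset = flush({10: b"abc", 20: b"xyz"}, 5)
```

In this model nothing held in the buffer is silently dropped on flush:
each chunk yields a delivery, and each hole before it yields a gap.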
* origin/topic/jsiwek/file-signatures:
File type detection changes and fix https.log {orig,resp}_fuids fields.
Various minor changes related to file mime type detection.
Refactor common MIME magic matching code.
Replace libmagic w/ Bro signatures for file MIME type identification.
Conflicts:
scripts/base/init-default.bro
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
BIT-1143 #merged
Add parsing of several more types to SAN extension.
Make error messages of x509 file analyzer more useful.
Fix file ID generation.
You apparently have to be very careful which EndOfFile function of
the file analysis framework you call... otherwise it might try
to close another file id. This took me quite a while to find.
addresses BIT-953, BIT-760, BIT-1150
- Improve or just remove some file magic signatures ported from libmagic
that were too general and matched incorrectly too often.
- Fix MHR script's use of fa_file$mime_type before checking if it's
initialized. It may be uninitialized if no signatures match.
- The "fa_file" record now contains a "mime_types" field that contains
all magic signatures that matched the file content (where the
"mime_type" field is just a shortcut for the strongest match).
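The relationship between the two fields can be sketched in Python as
follows. The function name summarize_matches, the (strength, MIME type)
pair representation, and the alphabetical tie-break are assumptions
made for this sketch and are not taken from the signature matcher.

```python
def summarize_matches(matches):
    """Derive (mime_type, mime_types) from signature matches.

    matches: list of (strength, mime_type) pairs, one per matching
             signature; a higher strength means a stronger match.
    """
    # Strongest first; ties broken alphabetically so output is stable.
    ordered = sorted(matches, key=lambda m: (-m[0], m[1]))
    mime_types = [mime for _, mime in ordered]
    # The single-value field is just a shortcut for the strongest match.
    # It stays unset (None here) when no signature matched at all.
    mime_type = mime_types[0] if mime_types else None
    return mime_type, mime_types


best, all_types = summarize_matches([(20, "text/plain"), (80, "image/gif")])
```

The empty case is the one the MHR fix above guards against: with no
matches there is no strongest match, so the shortcut must be checked
before use.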
Put some methods in file_analysis::Manager that can perform the
matching process and return MIME type results. Also helps to
centralize the management/re-use of a signature matcher object.
* origin/topic/jsiwek/http-file-id-caching:
Revert use of HTTP file ID caching for gaps range request content.
Extend file analysis API to allow file ID caching, adapt HTTP to use it.
BIT-1125 #merged
This allows an analyzer to either provide file IDs associated with some
file content or to cache a file ID that was already determined by
script-layer logic so that subsequent calls to the file analysis
interface can bypass costly detours through script-layer. This can
yield a decent performance improvement for analyzers that are able to
take advantage of it and deal with streaming content (like HTTP).
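A minimal Python sketch of the caching idea follows. The class
StreamingAnalyzer, its method names, and the callable standing in for
the script-layer lookup are hypothetical; the real interface lives in
the file analysis API and is not reproduced here.

```python
class StreamingAnalyzer:
    """Toy analyzer that caches a file ID across chunks (hypothetical)."""

    def __init__(self, get_file_handle):
        # Stands in for the costly detour through the script layer.
        self._get_file_handle = get_file_handle
        self._cached_id = None

    def deliver(self, data, sink):
        # Only consult the script layer when no ID is cached yet.
        if self._cached_id is None:
            self._cached_id = self._get_file_handle()
        sink.append((self._cached_id, data))

    def end_of_file(self):
        # The next file on this stream needs a fresh ID.
        self._cached_id = None


calls = []

def script_layer_lookup():
    calls.append("lookup")
    return "file-%d" % len(calls)

analyzer = StreamingAnalyzer(script_layer_lookup)
delivered = []
for chunk in (b"GET", b" /", b" HTTP/1.1"):
    analyzer.deliver(chunk, delivered)
```

Three chunks cost one lookup instead of three, which is where the
saving for streaming content would come from in this model.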
- The reassembly behavior can be modified per-file by enabling or
disabling the reassembler and/or modifying the size of the reassembly
buffer.
- Changed the file extraction analyzer to use the stream to avoid
issues with the chunk based approach not immediately triggering
the file_new event due to mime-type detection delay. Early chunks
frequently ended up lost before.
- Generally things are working now and I'd consider this in testing.
- Enable manager to associate analyzers with a MIME type. With that,
one can now say enable all analyzers for, e.g., "image/gif". This is
exposed to script-land as
Files::add_analyzers_for_mime_type(f: fa_file, mtype: string)
For MIME types identified via libmagic, this happens automatically
(via the file_new() handler in files/main.bro).
- Extend the analyzer API to better match that of protocol analyzers:
- Adding unique analyzer IDs so that we can refer to instances
from script-land.
- Adding subtypes to Components so that a single analyzer
implementation can support different types of analyzers
internally.
- Add an analyzer method SetTag() that allows setting the tag after
construction.
- Adding Init() and Done() methods for consistency with what other
classes offer.
- Add debug logging to the file_analysis stream.
TODO: test cases missing for the new script-land functionality.
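The MIME type association described above can be modeled roughly as
follows. This is a Python sketch only: the class AnalyzerRegistry, the
registration method, the analyzer names, and the dict used as a file
record are hypothetical and do not mirror the manager's real interface.

```python
from collections import defaultdict


class AnalyzerRegistry:
    """Toy manager that maps MIME types to analyzers (hypothetical)."""

    def __init__(self):
        self._by_mime = defaultdict(list)

    def register_for_mime_type(self, analyzer, mtype):
        # Hypothetical registration step; each analyzer is kept once.
        if analyzer not in self._by_mime[mtype]:
            self._by_mime[mtype].append(analyzer)

    def add_analyzers_for_mime_type(self, file_record, mtype):
        # Attach every analyzer associated with mtype to the file.
        attached = file_record.setdefault("analyzers", [])
        for analyzer in self._by_mime.get(mtype, []):
            if analyzer not in attached:
                attached.append(analyzer)
        return file_record


manager = AnalyzerRegistry()
manager.register_for_mime_type("EXAMPLE_PARSER", "image/gif")
manager.register_for_mime_type("EXAMPLE_HASHER", "image/gif")
gif_file = manager.add_analyzers_for_mime_type({}, "image/gif")
```

A file whose type has no registered analyzers simply ends up with an
empty list in this model.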
Made some class templates for code that seemed duplicated between
file/protocol tags and managers. Seems like it helps a bit and
hopefully can also be used to transition other things that have
enum value "tags" (e.g. logging writers, input readers) to the
plugin system.
This cleans up internals of how analyzer instances get identified by the
tag plus any args given to it and doesn't change script code a user
would write.
- Fix examples/references in the file analysis how-to/usage doc.
- Add Broxygen-generated docs for file analyzer plugins.
- Break FTP::Info type declaration out into its own file to get
rid of some circular dependencies (between s/b/p/ftp/main and
s/b/p/ftp/utils).
- Remove script-layer data input interface (will be managed directly
by input framework later).
- Only track files internally by file id hash. Chance of collision
too small to justify also tracking unique file string.
Thanks to git this merge was less troublesome than I was afraid it
would be. Not all tests pass yet though (and file hashes have changed
unfortunately).
Conflicts:
cmake
doc/scripts/DocSourcesList.cmake
scripts/base/init-bare.bro
scripts/base/protocols/ftp/main.bro
scripts/base/protocols/irc/dcc-send.bro
scripts/test-all-policy.bro
src/AnalyzerTags.h
src/CMakeLists.txt
src/analyzer/Analyzer.cc
src/analyzer/protocol/file/File.cc
src/analyzer/protocol/file/File.h
src/analyzer/protocol/http/HTTP.cc
src/analyzer/protocol/http/HTTP.h
src/analyzer/protocol/mime/MIME.cc
src/event.bif
src/main.cc
src/util-config.h.in
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/istate.events-ssl/receiver.http.log
testing/btest/Baseline/istate.events-ssl/sender.http.log
testing/btest/Baseline/istate.events/receiver.http.log
testing/btest/Baseline/istate.events/sender.http.log
And added an event called "event_queue_flush_point" to mark where that
occurred in the event stream. The FAF now uses an explicit event queue
flush instead of buffering input in order to wait for a file handle to
be returned from script-layer.
- FileAnalysis::Info is now just a record used for logging, the fa_file
record type is defined in init-bare.bro as the analogue to a
connection record.
- Starting to transfer policy hook triggers and analyzer results to
events.
This reverts commit fc267d010d.
There were some diffs caused by this in external test suites I'm
unsure about, I'm going to go over optimizations more closely in
a different branch.
When a file handle is needed and the last event in the queue is also
a get_file_handle event with the same arguments, instead of queueing
a new event, just remember to cache/re-use the resulting handle from
the previous event. This depends on get_file_handle handlers not
changing global state that is also used to derive the file handle
string.
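The coalescing rule above can be sketched as follows. This Python model
is an illustration under assumptions: the EventQueue class, the waiter
labels, and the way results are handed back are hypothetical and are
not the event manager's real data structures.

```python
from collections import deque


class EventQueue:
    """Toy event queue that coalesces repeated get_file_handle requests."""

    def __init__(self):
        self._events = deque()  # entries: (name, args, waiters)

    def enqueue(self, name, args):
        self._events.append((name, args, []))

    def request_file_handle(self, args, waiter):
        """Return True if a new event was queued, False if one was reused."""
        if self._events:
            name, last_args, waiters = self._events[-1]
            if name == "get_file_handle" and last_args == args:
                # Same request is already last in the queue: just remember
                # that this waiter also wants the resulting handle.
                waiters.append(waiter)
                return False
        self._events.append(("get_file_handle", args, [waiter]))
        return True

    def drain(self, handler):
        """Run queued events in order; map each waiter to its handle."""
        results = {}
        while self._events:
            name, args, waiters = self._events.popleft()
            if name == "get_file_handle":
                handle = handler(*args)
                for waiter in waiters:
                    results[waiter] = handle
        return results


queue = EventQueue()
first = queue.request_file_handle(("conn1", True), "chunk1")
second = queue.request_file_handle(("conn1", True), "chunk2")
```

Only the last queued event is compared, so an intervening event forces
a fresh request. That matches the stated caveat: reuse is only safe if
handlers derive the handle from state that nothing in between changed.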
Versus synchronous function calls, which don't work well because
a function call can see a script-layer state that doesn't reflect
the state as it will be in terms of the event/network stream.