Commit graph

58 commits

Author SHA1 Message Date
Johanna Amann
d5678418da SSL SCT/OCSP: small fixes by robin; mostly update comments.
SetMime now only works on the first call (as it was documented) and
unused code was used from one of the x.509 functions.
2017-08-01 16:30:08 -07:00
Johanna Amann
9fd7816501 Allow File analyzers to direcly pass mime type.
This makes it much easier for protocols where the mime type is known in
advance like, for example, TLS. We now do no longer have to perform deep
script-level magic.
2017-02-10 17:03:33 -08:00
Johanna Amann
1de6cfc2e3 Fix memory leak in file analyzer.
This undoes the changes applied in merge 9db27a6d60
and goes back to the state in the branch as of the merge 5ab3b86.

Getting rid of the additional layer of removing analyzers and just
keeping them in the set introduced subtle differences in behavior since
a few calls were still passed along. Skipping all of these with SetSkip
introduced yet other subtle behavioral differences.
2017-02-04 16:47:07 -08:00
Johanna Amann
9db27a6d60 Merge remote-tracking branch 'origin/topic/robin/file-analysis-fixes'
* origin/topic/robin/file-analysis-fixes:
  Adding test with command line that used to trigger a crash.
  Cleaning up a couple of comments.
  Fix delay in disabling file analyzers.
  Fix file analyzer memory management.

The merge changes around functionality a bit again - instead of having
a list of done analyzers, analyzers are simply set to skipping when they
are removed, and cleaned up later on destruction of the AnalyzerSet.

BIT-1782 #merged
2017-02-01 14:20:14 -08:00
Robin Sommer
fead5f5d5e Fix delay in disabling file analyzers.
When a file analyzer signaled being done with data delivery, the
analyzer would only be scheduled for removal at that poing, meaning it
could still receive more data until that action actually took effect.
Now we make sure to not send any more data to an analyzer.
2017-01-28 13:24:13 -08:00
Robin Sommer
3ce6a031d4 Fix file analyzer memory management.
File analyzers got deleted immediately once the queue with the
corresponding removal operation got drained. That however can happen
while the analyzer is still doing stuff: the queue is drained whenever
any the "special" file analysis events needing immediate attention has
been executed. This fix now only schedules the analyzer for deletion
at that time, but postpones the actual operation until file object
itself is being destroyed.
2017-01-28 13:07:51 -08:00
Johanna Amann
c464cf78dd Fix a number of format errors when using debug macros. 2016-08-12 15:42:02 -07:00
Johanna Amann
2756dfe581 Make x509 intel seen script robust against file analyzer ordering.
Now it consistently works, even if the SHA1 file analyzer gets the data
before the X509 file analyzer.
2016-08-11 16:12:08 -07:00
Robin Sommer
ed91732e09 Merge remote-tracking branch 'origin/topic/seth/more-file-type-ident-fixes'
* origin/topic/seth/more-file-type-ident-fixes:
  File API updates complete.
  Fixes for file type identification.
  API changes to file analysis mime type detection.
  Make HTTP 206 reassembly require ETags by default.
  More file type identification improvements
  Fix an issue with files having gaps before the bof_buffer is filled.
  Fix an issue with packet loss in http file reporting.
  Adding WOFF fonts to file type identification.
  Extended JSON matching and added OCSP responses.
  Another large signature update.
  More signature updates.
  Even more file type ident clean up.
  Lots of fixes for file type identification.

BIT-1368 #merged
2015-04-20 13:31:00 -07:00
Seth Hall
ed375167c8 File API updates complete.
Addresses BIT-1368.
2015-04-20 10:46:48 -04:00
Seth Hall
038e4c24f6 Merge remote-tracking branch 'origin/topic/jsiwek/bit-1368' into topic/seth/more-file-type-ident-fixes
Conflicts:
	src/file_analysis/File.cc
	testing/btest/Baseline/plugins.hooks/output
2015-04-20 09:36:40 -04:00
Jon Siwek
a55ce01ef3 API changes to file analysis mime type detection.
Removed "file_mime_type" and "file_mime_types" event, replacing them
with a new event called "file_metadata_inferred".  It has a record
argument of type "inferred_file_metadata", which contains the mime type
information that the earlier events used to supply.  The idea here is
that future extensions to the record with new metadata will be less
likely to break user code than the alternatives (adding new events or
new event parameters).

Addresses BIT-1368.
2015-04-10 16:31:29 -05:00
Seth Hall
6162d986a2 Fix an issue with files having gaps before the bof_buffer is filled.
When files had gaps prior to the bof_buffer completely filling, the
file gap handling code was never sniffing and passing along as much
data as possible so file type identification wasn't working correctly.
2015-04-08 13:41:03 -04:00
Seth Hall
ee3e885712 Lots of fixes for file type identification.
- Plain text now identified with BOMs for UTF8,16,32
   (even though 16 and 32 wouldn't get identified as plain text, oh-well)
 - X.509 certificates are now populating files.log with
   the mime type application/pkix-cert.
 - File signatures are split apart into file types
   to help group and organize signatures a bit better.
 - Normalized some FILE_ANALYSIS debug messages.
 - Improved Javascript detection.
 - Improved HTML detection.
 - Removed a bunch of bad signatures.
 - Merged a bunch of signatures that ultimately detected
   the same mime type.
 - Added detection for MS LNK files.
 - Added detection for cross-domain-policy XML files.
 - Added detection for SOAP envelopes.
2015-03-13 22:14:44 -04:00
Jon Siwek
1012539ded Merge branch 'topic/seth/small-files-bof-handling-fix'
* topic/seth/small-files-bof-handling-fix:
  Fix a bug in the core files framework with handling the BOF buffer.

BIT-1310 #merged
2015-02-05 10:10:00 -06:00
Seth Hall
a97cd1f3a2 Fix a bug in the core files framework with handling the BOF buffer.
- Any files where the total size was below the size of the
   default bof_buffer size couldn't have stream analyzers successfully
   attached because the bof_buffer never reached the full size
   and was never flushed.  This branch explicitly marks the buf_buffer
   as full and flushes it when the file is being removed.
2015-02-05 09:09:08 -05:00
Jon Siwek
6941538f81 Fix reference counting bug in refactored file reassembly code. 2014-12-16 20:58:27 -06:00
Jon Siwek
cbbe7b52dc Review/fix/change file reassembly functionality.
- Re-arrange how some fa_file fields (e.g. source, connection info, mime
  type) get updated/set for consistency.

- Add more robust mechanisms for flushing the reassembly buffer.
  The goal being to report all gaps and deliveries to file analyzers
  regardless of the state of the reassembly buffer at the time it has to
  be flushed.
2014-12-16 14:05:15 -06:00
Seth Hall
cafd35e746 Updates the files event api and brings file reassembly up to master. 2014-09-26 00:40:37 -04:00
Seth Hall
42b2d56279 Merge remote-tracking branch 'origin/master' into topic/seth/files-tracking
Conflicts:
	scripts/base/frameworks/files/main.bro
	src/file_analysis/File.cc
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.actions.data_event/out
2014-09-23 13:05:39 -04:00
Jon Siwek
7a46a70b77 BIT-1240: Fix MIME entity file data/gap ordering.
MIME entities buffered data and passed it along to protocol analyzers in
discrete amounts, but a gap is always passed along right away, so the
ordering of these "events" can cause incorrect file analysis.  The
change here is to never leave any MIME data buffered -- it should now be
passed along line by line as it is seen, but may still temporarily make
use of a buffer allocated by the analyzer as it works on decoding
content.
2014-09-08 18:04:03 -05:00
Seth Hall
8d9940c8c3 Merge remote-tracking branch 'origin/master' into topic/seth/files-tracking
Conflicts:
	src/Reassem.cc
	src/Reassem.h
	src/analyzer/protocol/tcp/TCP_Reassembler.cc
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.bifs.set_timeout_interval/bro..stdout
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.http.partial-content/b.out
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.http.partial-content/c.out
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.logging/files.log
2014-05-27 10:56:11 -04:00
Robin Sommer
bbd409d274 Merge remote-tracking branch 'origin/master' into topic/robin/dynamic-plugins-2.3
(Never good to name a branch after version anticipated to include it ...)
2014-05-14 16:23:04 -07:00
Jon Siwek
8126f06ffb Enforce data size limit when checking files for MIME matches.
The value of *bof_buffer_size* in the *fa_file* record was supposed to
always limit the amount of data used by the signature matching engine,
but some corner cases would cause matching to be performed on data
beyond that.
2014-04-21 16:51:45 -05:00
Robin Sommer
9efb549236 Merge remote-tracking branch 'origin/topic/jsiwek/file-signatures'
* origin/topic/jsiwek/file-signatures:
  File type detection changes and fix https.log {orig,resp}_fuids fields.
  Various minor changes related to file mime type detection.
  Refactor common MIME magic matching code.
  Replace libmagic w/ Bro signatures for file MIME type identification.

Conflicts:
	scripts/base/init-default.bro
	testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log

BIT-1143 #merged
2014-03-30 22:51:05 +02:00
Jon Siwek
095a68b2ec Various minor changes related to file mime type detection.
- Improve or just remove some file magic signatures ported from libmagic
  that were too general and matched incorrectly too often.

- Fix MHR script's use of fa_file$mime_type before checking if it's
  initialized.  It may be uninitialized if no signatures match.

- The "fa_file" record now contains a "mime_types" field that contains
  all magic signatures that matched the file content (where the
  "mime_type" field is just a shortcut for the strongest match).
2014-03-06 11:41:10 -06:00
Jon Siwek
0865b152bb Refactor common MIME magic matching code.
Put some methods in file_analysis::Manager that can perform the
matching process and return MIME type results.  Also helps to
centralize the management/re-use of a signature matcher object.
2014-03-05 10:49:57 -06:00
Jon Siwek
b22ca5d0a3 Replace libmagic w/ Bro signatures for file MIME type identification.
Notable changes:

- libmagic is no longer used at all.  All MIME type detection is
  done through new Bro signatures, and there's no longer a means to get
  verbose file type descriptions (e.g. "PNG image data, 1435 x 170").
  The majority of the default file magic signatures are derived
  from the default magic database of libmagic ~5.17.

- File magic signatures consist of two new constructs in the
  signature rule parsing grammar: "file-magic" gives a regular
  expression to match against, and "file-mime" gives the MIME type
  string of content that matches the magic and an optional strength
  value for the match.

- Modified signature/rule syntax for identifiers: they can no longer
  start with a '-', which made for ambiguous syntax when doing negative
  strength values in "file-mime".  Also brought syntax for Bro script
  identifiers in line with reality (they can't start with numbers or
  include '-' at all).

- A new Built-In Function, "file_magic", can be used to get all
  file magic matches and their corresponding strength against a given
  chunk of data

- The second parameter of the "identify_data" Built-In Function
  can no longer be used to get verbose file type descriptions, though it
  can still be used to get the strongest matching file magic signature.

- The "file_transferred" event's "descr" parameter no longer
  contains verbose file type descriptions.

- The BROMAGIC environment variable no longer changes any behavior
  in Bro as magic databases are no longer used/installed.

- Reverted back to minimum requirement of CMake 2.6.3 from 2.8.0
  (it's back to being the same requirement as the Bro v2.2 release).
  The bump was to accomodate building libmagic as an external project,
  which is no longer needed.

Addresses BIT-1143.
2014-03-04 11:12:06 -06:00
Jon Siwek
e09763e061 Fix file_over_new_connection event to trigger when entire file is missed.
If a file is nothing but gaps (e.g. due to missing/dropped packets), Bro
can sometimes detect a file is supposed to have been present and never
saw any of its content, but failed to raise file_over_new_connection
events for it.  This was mostly apparent because the tx_hosts/rx_hosts
fields in files.log would not be populated in such cases (but are now
with this change).
2014-01-24 16:47:00 -06:00
Seth Hall
38dbba7622 More file reassembly work.
- The reassembly behavior can be modified per-file by enabling or
   disabling the reassembler and/or modifying the size of the reassembly
   buffer.

 - Changed the file extraction analyzer to use the stream to avoid
   issues with the chunk based approach not immediately triggering
   the file_new event due to mime-type detection delay.  Early chunks
   frequently ended up lost before.

 - Generally things are working now and I'd consider this in testing.
2014-01-05 04:58:01 -05:00
Seth Hall
0b78f444a1 Initial commit of file reassembly. 2013-12-20 00:05:08 -05:00
Robin Sommer
987452beff Cleanup of plugin component API.
- Move more functionality into base class.
- Remove cctors and assignment operators (weren't actually needed anymore)
- Switch from const char* to std::string.
2013-12-16 10:07:20 -08:00
Robin Sommer
d34f23c8d4 A set of file analysis extensions.
- Enable manager to associate analyzers with a MIME type. With that,
  one can now say enable all analyzers for, e.g., "image/gif". This is
  exposed to script-land as

    Files::add_analyzers_for_mime_type(f: fa_file, mtype: string)

  For MIME types identified via libmagic, this happens automatically
  (via the file_new() handler in files/main.bro).

- Extend the analyzer API to better match that of protocol analyzers:

    - Adding unique analyzer IDs so that we can refer to instances
      from script-land.

    - Adding subtypes to Components so that a single analyzer
      implementation can support different types of analyzers
      internally.

    - Add an analyzer method SetTag() that allows to set the tag after
      construction.

    - Adding Init() and Done() methods for consistency with what other
      classes offer.

- Add debug logging to the file_analysis stream.

TODO: test cases missing for the new script-land functionality.
2013-11-26 11:20:14 -08:00
Jon Siwek
89ae4ffd05 Add options to limit extracted file sizes w/ 100MB default. 2013-08-22 16:37:58 -05:00
Jon Siwek
99c89b42d7 Internal refactoring of how plugin components are tagged/managed.
Made some class templates for code that seemed duplicated between
file/protocol tags and managers.  Seems like it helps a bit and
hopefully can be also be used to transition other things that have
enum value "tags" (e.g. logging writers, input readers) to the
plugin system.
2013-08-01 10:35:47 -05:00
Jon Siwek
9bd7a65071 Merge branch 'master' into topic/jsiwek/faf-updates
Conflicts:
	testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
2013-07-31 10:05:36 -05:00
Jon Siwek
5fa9c5865b Factor out the need for a tag field in Files::AnalyzerArgs record.
This cleans up internals of how analyzer instances get identified by the
tag plus any args given to it and doesn't change script code a user
would write.
2013-07-31 09:48:19 -05:00
Robin Sommer
984e9793db Merge remote-tracking branch 'origin/topic/seth/faf-updates'
* origin/topic/seth/faf-updates: (27 commits)
  Undoing the FTP tests I updated earlier.
  Update the last two btest FAF tests.
  File analysis fixes and test updates.
  Fix a bug with getting analyzer tags.
  A few test updates.
  Some tests work now (at least they all don't fail anymore!)
  Forgot a file.
  Added protocol description functions that provide a super compressed log representation.
  Fix a bug where orig file information in http wasn't working right.
  Added mime types to http.log
  Clean up queued but unused file_over_new_connections event args.
  Add jar files to the default MHR lookups.
  Adding CAB files for MHR checking.
  Improve malware hash registry script.
  Fix a small issue with finding smtp entities.
  Added support for files to the notice framework.
  Make the custom libmagic database a git submodule.
  Add an is_orig parameter to file_over_new_connection event.
  Make magic for emitting application/msword mime type less strict.
  Disable more libmagic builtin checks that override the magic database.
  ...

Conflicts:
	doc/scripts/DocSourcesList.cmake
	scripts/base/init-bare.bro
	scripts/test-all-policy.bro
	testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
2013-07-29 14:21:52 -07:00
Jon Siwek
1a60fae41c Clean up queued but unused file_over_new_connections event args. 2013-07-11 11:36:49 -05:00
Jon Siwek
73155c321b Add an is_orig parameter to file_over_new_connection event. 2013-07-09 15:58:28 -05:00
Jon Siwek
6a5b825058 Delay file_over_new_connection events until after file_new occurs. 2013-07-09 14:25:41 -05:00
Seth Hall
58d133e764 Merge remote-tracking branch 'origin/master' into topic/seth/faf-updates
Conflicts:
	scripts/base/frameworks/files/main.bro
	scripts/base/init-bare.bro
	scripts/base/protocols/ftp/file-analysis.bro
	scripts/base/protocols/http/file-analysis.bro
	scripts/base/protocols/irc/file-analysis.bro
	scripts/base/protocols/smtp/file-analysis.bro
	src/const.bif
	src/event.bif
	src/file_analysis/Analyzer.h
	src/file_analysis/file_analysis.bif
2013-07-05 02:13:27 -04:00
Seth Hall
2b48396d23 Check file_over_new_connetion to fire for each connection (including the first). 2013-07-05 02:00:35 -04:00
Jon Siwek
f2574636b6 Merge branch 'master' into topic/jsiwek/faf-cleanup
Conflicts:
	scripts/base/protocols/ftp/file-analysis.bro
	scripts/base/protocols/http/file-analysis.bro
	scripts/base/protocols/irc/file-analysis.bro
	scripts/base/protocols/smtp/file-analysis.bro
	src/file_analysis/File.cc
	src/file_analysis/File.h
	src/file_analysis/Manager.cc
	src/file_analysis/Manager.h
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.logging/file_analysis.log
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-0.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-1.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-2.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-3.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-BTsa70Ua9x7-1.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-BTsa70Ua9x7.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-Rqjkzoroau4-0.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-Rqjkzoroau4.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-VLQvJybrm38-2.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-VLQvJybrm38.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-zrfwSs9K1yk-3.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp-item-zrfwSs9K1yk.dat
	testing/btest/Baseline/scripts.base.protocols.ftp.ftp-extract/ftp.log
	testing/btest/Baseline/scripts.base.protocols.http.http-extract-files/http-item-BFymS6bFgT3-0.dat
	testing/btest/Baseline/scripts.base.protocols.http.http-extract-files/http-item-BFymS6bFgT3.dat
	testing/btest/Baseline/scripts.base.protocols.http.http-extract-files/http-item.dat
	testing/btest/Baseline/scripts.base.protocols.http.http-extract-files/http.log
	testing/btest/Baseline/scripts.base.protocols.irc.dcc-extract/irc-dcc-item-wqKMAamJVSb-0.dat
	testing/btest/Baseline/scripts.base.protocols.irc.dcc-extract/irc-dcc-item-wqKMAamJVSb.dat
	testing/btest/Baseline/scripts.base.protocols.irc.dcc-extract/irc-dcc-item.dat
	testing/btest/Baseline/scripts.base.protocols.irc.dcc-extract/irc.log
	testing/btest/Baseline/scripts.base.protocols.smtp.mime-extract/smtp-entity-0.dat
	testing/btest/Baseline/scripts.base.protocols.smtp.mime-extract/smtp-entity-1.dat
	testing/btest/Baseline/scripts.base.protocols.smtp.mime-extract/smtp-entity-Ltd7QO7jEv3-1.dat
	testing/btest/Baseline/scripts.base.protocols.smtp.mime-extract/smtp-entity-Ltd7QO7jEv3.dat
	testing/btest/Baseline/scripts.base.protocols.smtp.mime-extract/smtp-entity-cwR7l6Zctxb-0.dat
	testing/btest/Baseline/scripts.base.protocols.smtp.mime-extract/smtp-entity-cwR7l6Zctxb.dat
	testing/btest/Baseline/scripts.base.protocols.smtp.mime-extract/smtp_entities.log
	testing/btest/scripts/base/protocols/ftp/ftp-extract.bro
	testing/btest/scripts/base/protocols/http/http-extract-files.bro
	testing/btest/scripts/base/protocols/irc/dcc-extract.test
	testing/btest/scripts/base/protocols/smtp/mime-extract.test
2013-06-07 15:44:36 -05:00
Jon Siwek
0ef074594d Add input interface to forward data for file analysis.
The new Input::add_analysis function is used to automatically forward
input data on to the file analysis framework.
2013-05-21 10:29:22 -05:00
Jon Siwek
90fa331279 File analysis framework interface simplifications.
- Remove script-layer data input interface (will be managed directly
  by input framework later).

- Only track files internally by file id hash.  Chance of collision
  too small to justify also tracking unique file string.
2013-05-20 12:02:48 -05:00
Robin Sommer
eb637f9f3e Merge remote-tracking branch 'origin/master' into topic/robin/plugins
Thanks to git this merge was less troublesome that I was afraid it
would be. Not all tests pass yet though (and file hashes have changed
unfortunately).

Conflicts:
	cmake
	doc/scripts/DocSourcesList.cmake
	scripts/base/init-bare.bro
	scripts/base/protocols/ftp/main.bro
	scripts/base/protocols/irc/dcc-send.bro
	scripts/test-all-policy.bro
	src/AnalyzerTags.h
	src/CMakeLists.txt
	src/analyzer/Analyzer.cc
	src/analyzer/protocol/file/File.cc
	src/analyzer/protocol/file/File.h
	src/analyzer/protocol/http/HTTP.cc
	src/analyzer/protocol/http/HTTP.h
	src/analyzer/protocol/mime/MIME.cc
	src/event.bif
	src/main.cc
	src/util-config.h.in
	testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/istate.events-ssl/receiver.http.log
	testing/btest/Baseline/istate.events-ssl/sender.http.log
	testing/btest/Baseline/istate.events/receiver.http.log
	testing/btest/Baseline/istate.events/sender.http.log
2013-05-16 17:58:48 -07:00
Robin Sommer
7610aa31b6 Various smalle tweaks in preparation for merging. 2013-05-13 16:47:00 -07:00
Jon Siwek
0141f51801 FileAnalysis: load custom mime magic database just once.
This works around a bug in libmagic since version 5.12 (current at
time of writing is 5.14) -- second call to magic_load() w/ non-default
database segfaults.
2013-04-29 12:49:22 -05:00
Jon Siwek
f07760ba00 FileAnalysis: add is_orig field to fa_file & Info. 2013-04-23 10:50:43 -05:00