Commit graph

60 commits

Author SHA1 Message Date
Daniel Thayer
01a899255e Convert more redef-able constants to runtime options 2018-08-24 16:05:44 -05:00
Seth Hall
7cb6cf24a6 Functions for retrieving files by their id.
There are two new script level functions to query and lookup files
from the core by their IDs.  These are adding feature parity for
similarly named functions for files.  The function prototypes are
as follows:

  Files::file_exists(fuid: string): bool
  Files::lookup_File(fuid: string): fa_file
2018-01-09 12:16:17 -05:00
Seth Hall
809660d48a Tiny mime-type fix from Dan Caselden. 2017-02-14 07:21:00 -08:00
Seth Hall
645ec39f4b New file types sigs from Keith Lehigh. 2017-01-31 23:33:58 -05:00
Seth Hall
04d41dce5c Tiny xlsx file signature fix.
Thanks to Dan Caselden for noticing!
2016-12-08 08:32:45 -05:00
Seth Hall
15f5deed87 Add a files framework signature for VIM tmp files. 2016-11-02 11:51:38 -04:00
Seth Hall
514dfc3479 Merge remote-tracking branch 'origin/master' into topic/seth/smb
# Conflicts:
#	testing/btest/Baseline/plugins.hooks/output
#	testing/btest/Baseline/scripts.policy.misc.dump-events/all-events.log
#	testing/btest/Baseline/scripts.policy.misc.dump-events/smtp-events.log
2016-06-29 09:43:31 -04:00
Seth Hall
2e9491482f Add ACE archive files to the identified file types.
Addresses BIT-1609.  Thanks Stephen Hosom!
2016-06-14 22:27:09 -04:00
Seth Hall
1fe9e522fb Merge remote-tracking branch 'origin/master' into topic/seth/smb 2016-04-14 21:39:48 -04:00
Seth Hall
9aa9618473 Additional mime types for file identification and a few fixes.
Some of the existing mime types received extended matchers
to fix problems with UTF-16 BOMs.

New file mime types:
 - .ini files
 - MS Registry policy files
 - MS Registry files
 - MS Registry format files (e.g. DESKTOP.DAT)
 - MS Outlook PST files
 - Apple AFPInfo files

Mime type fixes:
 - MP3 files with ID3 tags.
 - JSON and XML matchers were extended
2016-04-14 10:06:58 -04:00
Seth Hall
017fa13393 Fix mime type identification for Windows LNK files. 2016-04-04 15:20:03 -04:00
dmfreemon@users.noreply.github.com
b14b189d12 add support for MIME type video/MP2T
BIT-1457 #merged
2015-08-21 17:32:19 -07:00
Seth Hall
217ccf6063 Add signature support for F4M files. 2015-06-02 12:48:53 -04:00
Robin Sommer
ed91732e09 Merge remote-tracking branch 'origin/topic/seth/more-file-type-ident-fixes'
* origin/topic/seth/more-file-type-ident-fixes:
  File API updates complete.
  Fixes for file type identification.
  API changes to file analysis mime type detection.
  Make HTTP 206 reassembly require ETags by default.
  More file type identification improvements
  Fix an issue with files having gaps before the bof_buffer is filled.
  Fix an issue with packet loss in http file reporting.
  Adding WOFF fonts to file type identification.
  Extended JSON matching and added OCSP responses.
  Another large signature update.
  More signature updates.
  Even more file type ident clean up.
  Lots of fixes for file type identification.

BIT-1368 #merged
2015-04-20 13:31:00 -07:00
Seth Hall
ed375167c8 File API updates complete.
Addresses BIT-1368.
2015-04-20 10:46:48 -04:00
Seth Hall
038e4c24f6 Merge remote-tracking branch 'origin/topic/jsiwek/bit-1368' into topic/seth/more-file-type-ident-fixes
Conflicts:
	src/file_analysis/File.cc
	testing/btest/Baseline/plugins.hooks/output
2015-04-20 09:36:40 -04:00
Seth Hall
faabe8a5e3 Fixes for file type identification.
- Backed out eTag changes.  The real world is more complicated
   than just using eTags to identify the same file.
 - A bit of code simplication in the http base scripts.
 - Test updates (more existing small problems were identified!).
 -
2015-04-20 09:34:09 -04:00
Jon Siwek
a55ce01ef3 API changes to file analysis mime type detection.
Removed "file_mime_type" and "file_mime_types" event, replacing them
with a new event called "file_metadata_inferred".  It has a record
argument of type "inferred_file_metadata", which contains the mime type
information that the earlier events used to supply.  The idea here is
that future extensions to the record with new metadata will be less
likely to break user code than the alternatives (adding new events or
new event parameters).

Addresses BIT-1368.
2015-04-10 16:31:29 -05:00
Seth Hall
49926ad7bf Merge remote-tracking branch 'origin/master' into topic/seth/more-file-type-ident-fixes 2015-04-09 23:58:52 -04:00
Seth Hall
e8c87e19bd More file type identification improvements
- Split fonts into their own file.
 - Improved JSON matching.
 - Added XML-RPC content matching using application/xml-rpc
 - Added OCSP requests
2015-04-09 01:23:55 -04:00
Seth Hall
8fd5e7f382 Adding WOFF fonts to file type identification. 2015-04-07 02:06:02 -04:00
Seth Hall
422e558d77 Extended JSON matching and added OCSP responses. 2015-04-07 00:46:10 -04:00
Seth Hall
99061fff4c Another large signature update.
- Lots of cleanup and expansion of XML match types.
   - Signatures for ATOM and RSS (text/atom, text/rss).
   - Improved SOAP signature.
   - Improved text/cross-domain-policy signature
 - Improved and expanded javascript matching a bit.
 - Removed a lot of potentially problematic signatures (performance)
 - Split out more signatures from libmagic.sig
 - Added a signature for matching JSON.  Seems to work ok.
 - Signature for MPEGv4 audio.
 - Expanded java applet signature.
 - Improved PNG matching.
 - Improved MP3 matching.
2015-04-06 23:40:20 -04:00
Seth Hall
6861ecc046 More signature updates. 2015-04-06 17:21:53 -04:00
Jon Siwek
186e67ec1d Allow logging filters to inherit default path from stream.
This allows the path for the default filter to be specified explicitly
when creating a stream and reduces the need to rely on the default path
function to magically supply the path.

The default path function is now only used if, when a filter is added to
a stream, it has neither a path nor a path function already.

Adapted the existing Log::create_stream calls to explicitly specify a
path value.

Addresses BIT-1324
2015-03-19 14:49:55 -05:00
Seth Hall
19f498b4a4 Even more file type ident clean up.
- Add detection for ColdFusion scripts.
 - Support detection of XML/HTML with prefixed comment blocks.
2015-03-14 00:25:13 -04:00
Seth Hall
ee3e885712 Lots of fixes for file type identification.
- Plain text now identified with BOMs for UTF8,16,32
   (even though 16 and 32 wouldn't get identified as plain text, oh-well)
 - X.509 certificates are now populating files.log with
   the mime type application/pkix-cert.
 - File signatures are split apart into file types
   to help group and organize signatures a bit better.
 - Normalized some FILE_ANALYSIS debug messages.
 - Improved Javascript detection.
 - Improved HTML detection.
 - Removed a bunch of bad signatures.
 - Merged a bunch of signatures that ultimately detected
   the same mime type.
 - Added detection for MS LNK files.
 - Added detection for cross-domain-policy XML files.
 - Added detection for SOAP envelopes.
2015-03-13 22:14:44 -04:00
Robin Sommer
23b9705a7b Fixing analyzer tag types for some Files::* functions. 2015-02-08 18:23:22 -08:00
Jon Siwek
cbbe7b52dc Review/fix/change file reassembly functionality.
- Re-arrange how some fa_file fields (e.g. source, connection info, mime
  type) get updated/set for consistency.

- Add more robust mechanisms for flushing the reassembly buffer.
  The goal being to report all gaps and deliveries to file analyzers
  regardless of the state of the reassembly buffer at the time it has to
  be flushed.
2014-12-16 14:05:15 -06:00
Seth Hall
e879aa78f5 Merge remote-tracking branch 'origin/topic/seth/mime-updates' into topic/seth/files-reassembly-and-mime-updates
Conflicts:
	scripts/base/init-bare.bro
	testing/btest/Baseline/scripts.policy.misc.dump-events/all-events-no-args.log
	testing/btest/Baseline/scripts.policy.misc.dump-events/all-events.log
2014-11-05 11:42:34 -05:00
Seth Hall
842dfd8b4a Merge remote-tracking branch 'origin/topic/seth/files-tracking' into topic/seth/files-reassembly-and-mime-updates
Conflicts:
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.http.multipart/out
	testing/btest/Baseline/scripts.policy.misc.dump-events/all-events.log
2014-11-05 11:40:26 -05:00
Seth Hall
7ee34981aa Improve TAR file detection and other small changes.
- Remove all of the x-c detections.  Nearly all false
    positives.

  - Remove the back up TAR detections.  Not very helpful.

  - Remove one of the x-elc detections that was too loose
    and caused many false positives.
2014-11-05 11:31:48 -05:00
Seth Hall
d77243823f Updates for file mime type identification.
- Change to the default BOF buffer size to 3000 (was 1024).
 - Reorganized MS signatures into a separate file
 - Improved lots of the signatures and added new ones.
2014-10-08 02:12:10 -04:00
Seth Hall
80656d5294 Improves shockwave flash file signatures.
- This moves the signatures out of the libmagic imported signatures
   and into our own general.sig.

 - Expand the detection to LZMA compressed flash files.
2014-10-06 11:13:13 -04:00
Seth Hall
cafd35e746 Updates the files event api and brings file reassembly up to master. 2014-09-26 00:40:37 -04:00
Seth Hall
42b2d56279 Merge remote-tracking branch 'origin/master' into topic/seth/files-tracking
Conflicts:
	scripts/base/frameworks/files/main.bro
	src/file_analysis/File.cc
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.actions.data_event/out
2014-09-23 13:05:39 -04:00
Daniel Thayer
d226fef723 Fixed some "make doc" warnings caused by reST formatting 2014-09-16 12:44:51 -05:00
Jon Siwek
69b1ba653d Minor adjustments to plugin code/docs.
Mostly whitespace/typos.
Moved some Plugin methods out from public access.
2014-07-30 16:48:23 -05:00
Robin Sommer
c9524757d2 Adding Files::register_for_mime_type() to associate a file analyzer
with a MIME type.

Whenever that MIME is detected, Bro will now automatically activate
the analyzer. The interface mimics how well-known ports are defined
for protocol analyzers.

This isn't actually used by any existing file analyzer (because we
don't have any yet that target a specific file format), but there's a
test making sure it works.
2014-07-21 16:31:22 +02:00
Robin Sommer
9616cd8e61 Further polishing and cleanup in preparation for merge. 2014-07-12 18:12:09 -07:00
Seth Hall
8d9940c8c3 Merge remote-tracking branch 'origin/master' into topic/seth/files-tracking
Conflicts:
	src/Reassem.cc
	src/Reassem.h
	src/analyzer/protocol/tcp/TCP_Reassembler.cc
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.bifs.set_timeout_interval/bro..stdout
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.http.partial-content/b.out
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.http.partial-content/c.out
	testing/btest/Baseline/scripts.base.frameworks.file-analysis.logging/files.log
2014-05-27 10:56:11 -04:00
Robin Sommer
bbd409d274 Merge remote-tracking branch 'origin/master' into topic/robin/dynamic-plugins-2.3
(Never good to name a branch after version anticipated to include it ...)
2014-05-14 16:23:04 -07:00
Robin Sommer
9efb549236 Merge remote-tracking branch 'origin/topic/jsiwek/file-signatures'
* origin/topic/jsiwek/file-signatures:
  File type detection changes and fix https.log {orig,resp}_fuids fields.
  Various minor changes related to file mime type detection.
  Refactor common MIME magic matching code.
  Replace libmagic w/ Bro signatures for file MIME type identification.

Conflicts:
	scripts/base/init-default.bro
	testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log

BIT-1143 #merged
2014-03-30 22:51:05 +02:00
Jon Siwek
8dad5026fd File type detection changes and fix https.log {orig,resp}_fuids fields.
- Removed "binary" and "octet-stream" mime type detections. They don't
  provide any more information than an uninitialized mime_type field
  which implicitly means no magic signature matches and so the media
  type is unknown to Bro.

- Slight change to "text/plain" signature.  It's still not the most
  accurate, which is reflected in its -20 strength value.

- The logic for adding file ids to {orig,resp}_fuids fields of
  the http.log incorrectly depended on the state of
  {orig,resp}_mime_types fields, so sometimes not all file ids
  associated w/ the session were logged.
2014-03-25 12:44:11 -05:00
Jon Siwek
b1fd161274 Improve type checking of records, addresses BIT-1159. 2014-03-20 13:54:26 -05:00
Jon Siwek
095a68b2ec Various minor changes related to file mime type detection.
- Improve or just remove some file magic signatures ported from libmagic
  that were too general and matched incorrectly too often.

- Fix MHR script's use of fa_file$mime_type before checking if it's
  initialized.  It may be uninitialized if no signatures match.

- The "fa_file" record now contains a "mime_types" field that contains
  all magic signatures that matched the file content (where the
  "mime_type" field is just a shortcut for the strongest match).
2014-03-06 11:41:10 -06:00
Jon Siwek
b22ca5d0a3 Replace libmagic w/ Bro signatures for file MIME type identification.
Notable changes:

- libmagic is no longer used at all.  All MIME type detection is
  done through new Bro signatures, and there's no longer a means to get
  verbose file type descriptions (e.g. "PNG image data, 1435 x 170").
  The majority of the default file magic signatures are derived
  from the default magic database of libmagic ~5.17.

- File magic signatures consist of two new constructs in the
  signature rule parsing grammar: "file-magic" gives a regular
  expression to match against, and "file-mime" gives the MIME type
  string of content that matches the magic and an optional strength
  value for the match.

- Modified signature/rule syntax for identifiers: they can no longer
  start with a '-', which made for ambiguous syntax when doing negative
  strength values in "file-mime".  Also brought syntax for Bro script
  identifiers in line with reality (they can't start with numbers or
  include '-' at all).

- A new Built-In Function, "file_magic", can be used to get all
  file magic matches and their corresponding strength against a given
  chunk of data

- The second parameter of the "identify_data" Built-In Function
  can no longer be used to get verbose file type descriptions, though it
  can still be used to get the strongest matching file magic signature.

- The "file_transferred" event's "descr" parameter no longer
  contains verbose file type descriptions.

- The BROMAGIC environment variable no longer changes any behavior
  in Bro as magic databases are no longer used/installed.

- Reverted back to minimum requirement of CMake 2.6.3 from 2.8.0
  (it's back to being the same requirement as the Bro v2.2 release).
  The bump was to accomodate building libmagic as an external project,
  which is no longer needed.

Addresses BIT-1143.
2014-03-04 11:12:06 -06:00
Seth Hall
38dbba7622 More file reassembly work.
- The reassembly behavior can be modified per-file by enabling or
   disabling the reassembler and/or modifying the size of the reassembly
   buffer.

 - Changed the file extraction analyzer to use the stream to avoid
   issues with the chunk based approach not immediately triggering
   the file_new event due to mime-type detection delay.  Early chunks
   frequently ended up lost before.

 - Generally things are working now and I'd consider this in testing.
2014-01-05 04:58:01 -05:00
Robin Sommer
d34f23c8d4 A set of file analysis extensions.
- Enable manager to associate analyzers with a MIME type. With that,
  one can now say enable all analyzers for, e.g., "image/gif". This is
  exposed to script-land as

    Files::add_analyzers_for_mime_type(f: fa_file, mtype: string)

  For MIME types identified via libmagic, this happens automatically
  (via the file_new() handler in files/main.bro).

- Extend the analyzer API to better match that of protocol analyzers:

    - Adding unique analyzer IDs so that we can refer to instances
      from script-land.

    - Adding subtypes to Components so that a single analyzer
      implementation can support different types of analyzers
      internally.

    - Add an analyzer method SetTag() that allows to set the tag after
      construction.

    - Adding Init() and Done() methods for consistency with what other
      classes offer.

- Add debug logging to the file_analysis stream.

TODO: test cases missing for the new script-land functionality.
2013-11-26 11:20:14 -08:00
Daniel Thayer
b5af589246 Improvements to file analysis docs
Fixed reference to wrong field name.
Added documentation of a function arg.
Added a couple references to other parts of the documentation.
Explained how not specifying extraction filename results in automatic
filename generation.
Several other minor clarifications.
2013-10-11 16:31:53 -05:00