Commit graph

4 commits

Author SHA1 Message Date
Robin Sommer
9efb549236 Merge remote-tracking branch 'origin/topic/jsiwek/file-signatures'
* origin/topic/jsiwek/file-signatures:
  File type detection changes and fix https.log {orig,resp}_fuids fields.
  Various minor changes related to file mime type detection.
  Refactor common MIME magic matching code.
  Replace libmagic w/ Bro signatures for file MIME type identification.

Conflicts:
	scripts/base/init-default.bro
	testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
	testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log

BIT-1143 #merged
2014-03-30 22:51:05 +02:00
Jon Siwek
8dad5026fd File type detection changes and fix https.log {orig,resp}_fuids fields.
- Removed "binary" and "octet-stream" mime type detections. They don't
  provide any more information than an uninitialized mime_type field
  which implicitly means no magic signature matches and so the media
  type is unknown to Bro.

- Slight change to "text/plain" signature.  It's still not the most
  accurate, which is reflected in its -20 strength value.

- The logic for adding file ids to {orig,resp}_fuids fields of
  the http.log incorrectly depended on the state of
  {orig,resp}_mime_types fields, so sometimes not all file ids
  associated w/ the session were logged.
2014-03-25 12:44:11 -05:00
Jon Siwek
095a68b2ec Various minor changes related to file mime type detection.
- Improve or just remove some file magic signatures ported from libmagic
  that were too general and matched incorrectly too often.

- Fix MHR script's use of fa_file$mime_type before checking if it's
  initialized.  It may be uninitialized if no signatures match.

- The "fa_file" record now contains a "mime_types" field that contains
  all magic signatures that matched the file content (where the
  "mime_type" field is just a shortcut for the strongest match).
2014-03-06 11:41:10 -06:00
Jon Siwek
b22ca5d0a3 Replace libmagic w/ Bro signatures for file MIME type identification.
Notable changes:

- libmagic is no longer used at all.  All MIME type detection is
  done through new Bro signatures, and there's no longer a means to get
  verbose file type descriptions (e.g. "PNG image data, 1435 x 170").
  The majority of the default file magic signatures are derived
  from the default magic database of libmagic ~5.17.

- File magic signatures consist of two new constructs in the
  signature rule parsing grammar: "file-magic" gives a regular
  expression to match against, and "file-mime" gives the MIME type
  string of content that matches the magic and an optional strength
  value for the match.

- Modified signature/rule syntax for identifiers: they can no longer
  start with a '-', which made for ambiguous syntax when doing negative
  strength values in "file-mime".  Also brought syntax for Bro script
  identifiers in line with reality (they can't start with numbers or
  include '-' at all).

- A new Built-In Function, "file_magic", can be used to get all
  file magic matches and their corresponding strength against a given
  chunk of data

- The second parameter of the "identify_data" Built-In Function
  can no longer be used to get verbose file type descriptions, though it
  can still be used to get the strongest matching file magic signature.

- The "file_transferred" event's "descr" parameter no longer
  contains verbose file type descriptions.

- The BROMAGIC environment variable no longer changes any behavior
  in Bro as magic databases are no longer used/installed.

- Reverted back to minimum requirement of CMake 2.6.3 from 2.8.0
  (it's back to being the same requirement as the Bro v2.2 release).
  The bump was to accomodate building libmagic as an external project,
  which is no longer needed.

Addresses BIT-1143.
2014-03-04 11:12:06 -06:00