Replace libmagic w/ Bro signatures for file MIME type identification.

Notable changes:

- libmagic is no longer used at all.  All MIME type detection is
  done through new Bro signatures, and there's no longer a means to get
  verbose file type descriptions (e.g. "PNG image data, 1435 x 170").
  The majority of the default file magic signatures are derived
  from the default magic database of libmagic ~5.17.

- File magic signatures consist of two new constructs in the
  signature rule parsing grammar: "file-magic" gives a regular
  expression to match against, and "file-mime" gives the MIME type
  string of content that matches the magic and an optional strength
  value for the match.

- Modified signature/rule syntax for identifiers: they can no longer
  start with a '-', which made the syntax ambiguous when specifying
  negative strength values in "file-mime".  Also brought the syntax for
  Bro script identifiers in line with reality (they can't start with
  numbers or include '-' at all).

- A new Built-In Function, "file_magic", can be used to get all
  file magic matches and their corresponding strength against a given
  chunk of data.

- The second parameter of the "identify_data" Built-In Function
  can no longer be used to get verbose file type descriptions, though it
  can still be used to get the strongest matching file magic signature.

- The "file_transferred" event's "descr" parameter no longer
  contains verbose file type descriptions.

- The BROMAGIC environment variable no longer changes any behavior
  in Bro as magic databases are no longer used/installed.

- Reverted to the minimum requirement of CMake 2.6.3 from 2.8.0
  (it's back to being the same requirement as the Bro v2.2 release).
  The bump was to accommodate building libmagic as an external project,
  which is no longer needed.

Addresses BIT-1143.
Jon Siwek 2014-03-04 11:12:06 -06:00
parent f2f817c8b1
commit b22ca5d0a3
40 changed files with 4636 additions and 173 deletions


@@ -3,6 +3,10 @@
#include <limits.h>
#include <vector>
#include <map>
#include <functional>
#include <set>
#include <string>
#include "IPAddr.h"
#include "BroString.h"
@@ -191,6 +195,31 @@ private:
int_list matched_rules; // Rules for which all conditions have matched
};
/**
* A state object used for matching file magic signatures.
*/
class RuleFileMagicState {
friend class RuleMatcher;
public:
~RuleFileMagicState();
private:
// Ctor is private; use RuleMatcher::InitFileMagic() for instantiation.
RuleFileMagicState()
{ }
struct Matcher {
RE_Match_State* state;
};
declare(PList, Matcher);
typedef PList(Matcher) matcher_list;
matcher_list matchers;
};
// RuleMatcher is the main class which builds up the data structures
// and performs the actual matching.
@@ -205,6 +234,42 @@ public:
// Parse the given files and build up data structures.
bool ReadFiles(const name_list& files);
/**
* Initialize a state object for matching file magic signatures.
* @return A state object that can be used for file magic mime type
* identification.
*/
RuleFileMagicState* InitFileMagic() const;
/**
* Data structure containing a set of matching file magic signatures.
* Ordered from greatest to least strength. Matches of the same strength
* will be in the set in lexicographic order of the MIME type string.
*/
typedef map<int, set<string>, std::greater<int> > MIME_Matches;
/**
* Matches a chunk of data against file magic signatures.
* @param state A state object previously returned from
* RuleMatcher::InitFileMagic()
* @param data Chunk of data to match signatures against.
* @param len Length of \a data in bytes.
* @param matches An optional pre-existing match result object to
* modify with additional matches. If it's a null
* pointer, one will be instantiated and returned from
* this method.
* @return The results of the signature matching.
*/
MIME_Matches* Match(RuleFileMagicState* state, const u_char* data,
uint64 len, MIME_Matches* matches = 0) const;
/**
* Resets a state object used with matching file magic signatures.
* @param state The state object to reset to an initial condition.
*/
void ClearFileMagicState(RuleFileMagicState* state) const;
// Initialize the matching state for an endpoint of a connection based on
// the given packet (which should be the first packet encountered for
// this endpoint). If the matching is triggered by a PIA, a pointer to