* topic/robin/ascii-escape-normalization:
Updating NEWS.
In bifs, change ODesc objects to have RAW_STYLE.
Changing what's escaped when printing.
Remove several BroString escaping methods that are no longer useful.
BIT-1333 #merged
- Backed out eTag changes. The real world is more complicated
than just using eTags to identify the same file.
- A bit of code simplication in the http base scripts.
- Test updates (more existing small problems were identified!).
-
With this patch the model is:
- "print" cleans the data so that non-printable characters get
escaped. This is not necessarily reversible.
- to print in a reversible way, one can go through
escape_string(); this escapes backslashes as well to make the
decoding non-ambigious.
- Logging always escapes similar to escape_string(), making it
reversible.
Compared to master, we also change the escaping as follows:
- We now only escape with "\xXX", no more "^X" or "\0". Exception:
backslashes.
- We escape backlashes as "\\".
- There's no "alternative" output style anymore, i.e., fmt() '%A'
qualifier is gone.
Baselines in testing/btest are updated, external tests not yet.
Addresses BIT-1333.
- Re-arrange how some fa_file fields (e.g. source, connection info, mime
type) get updated/set for consistency.
- Add more robust mechanisms for flushing the reassembly buffer.
The goal being to report all gaps and deliveries to file analyzers
regardless of the state of the reassembly buffer at the time it has to
be flushed.
- Removed "binary" and "octet-stream" mime type detections. They don't
provide any more information than an uninitialized mime_type field
which implicitly means no magic signature matches and so the media
type is unknown to Bro.
- Slight change to "text/plain" signature. It's still not the most
accurate, which is reflected in its -20 strength value.
- The logic for adding file ids to {orig,resp}_fuids fields of
the http.log incorrectly depended on the state of
{orig,resp}_mime_types fields, so sometimes not all file ids
associated w/ the session were logged.
Notable changes:
- libmagic is no longer used at all. All MIME type detection is
done through new Bro signatures, and there's no longer a means to get
verbose file type descriptions (e.g. "PNG image data, 1435 x 170").
The majority of the default file magic signatures are derived
from the default magic database of libmagic ~5.17.
- File magic signatures consist of two new constructs in the
signature rule parsing grammar: "file-magic" gives a regular
expression to match against, and "file-mime" gives the MIME type
string of content that matches the magic and an optional strength
value for the match.
- Modified signature/rule syntax for identifiers: they can no longer
start with a '-', which made for ambiguous syntax when doing negative
strength values in "file-mime". Also brought syntax for Bro script
identifiers in line with reality (they can't start with numbers or
include '-' at all).
- A new Built-In Function, "file_magic", can be used to get all
file magic matches and their corresponding strength against a given
chunk of data
- The second parameter of the "identify_data" Built-In Function
can no longer be used to get verbose file type descriptions, though it
can still be used to get the strongest matching file magic signature.
- The "file_transferred" event's "descr" parameter no longer
contains verbose file type descriptions.
- The BROMAGIC environment variable no longer changes any behavior
in Bro as magic databases are no longer used/installed.
- Reverted back to minimum requirement of CMake 2.6.3 from 2.8.0
(it's back to being the same requirement as the Bro v2.2 release).
The bump was to accomodate building libmagic as an external project,
which is no longer needed.
Addresses BIT-1143.
- The reassembly behavior can be modified per-file by enabling or
disabling the reassembler and/or modifying the size of the reassembly
buffer.
- Changed the file extraction analyzer to use the stream to avoid
issues with the chunk based approach not immediately triggering
the file_new event due to mime-type detection delay. Early chunks
frequently ended up lost before.
- Generally things are working now and I'd consider this in testing.
- Several places were just using old variable names or not loading
scripts correctly after they'd been renamed/moved.
- Revert/adjust a change in how HTTP file handles are generated that
broke partial content responses.
- Turn some libmagic builtin checks back on; seems some are actually
useful (e.g. text detection seems to be a builtin). The rule going
forward probably will be only to turn off a builtin if we confirm it
causes issues.
- Removed some tests that are redundant or not necessary anymore because
the generic file analysis tests cover them.
- A couple FTP tests still fail that I think need an actual solution via
script changes.
Thanks to git this merge was less troublesome that I was afraid it
would be. Not all tests pass yet though (and file hashes have changed
unfortunately).
Conflicts:
cmake
doc/scripts/DocSourcesList.cmake
scripts/base/init-bare.bro
scripts/base/protocols/ftp/main.bro
scripts/base/protocols/irc/dcc-send.bro
scripts/test-all-policy.bro
src/AnalyzerTags.h
src/CMakeLists.txt
src/analyzer/Analyzer.cc
src/analyzer/protocol/file/File.cc
src/analyzer/protocol/file/File.h
src/analyzer/protocol/http/HTTP.cc
src/analyzer/protocol/http/HTTP.h
src/analyzer/protocol/mime/MIME.cc
src/event.bif
src/main.cc
src/util-config.h.in
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/istate.events-ssl/receiver.http.log
testing/btest/Baseline/istate.events-ssl/sender.http.log
testing/btest/Baseline/istate.events/receiver.http.log
testing/btest/Baseline/istate.events/sender.http.log
- It's derived from the magic database of libmagic 5.14, but with most
everything not related to mime types removed.
- The custom database is always used by default for mime detection, but
the more verbose file type detection will fall back on the default
libmagic installation's database. The result is: mime type strings
are now guaranteed to be consistent across platforms, but the verbose
file type descriptions are not.
- The custom database gets installed in $prefix/share/bro/magic, and
should even be extensible if files with new patterns are added inside
the directory.
- The search path for the mime magic database can be controlled via
BROMAGIC environment variable.
- Remove mime_desc field from ftp.log.
- Stop using the mime/file type canonifier with unit tests.
- libmagic >= 5.04 is now a requirement.
- FileAnalysis::Info is now just a record used for logging, the fa_file
record type is defined in init-bare.bro as the analogue to a
connection record.
- Starting to transfer policy hook triggers and analyzer results to
events.
- Add a timeout flag to file_analysis.log so it's easy to tell what
has had at least one timeout trigger happen.
- Fix ftp-data service tag not being set for reused connections.
- Fix HTTP::Incorrect_File_Type because mime types returned by FAF have
the charset still in them, but the HTTP::mime_types_extensions table
does not and it requires an exact string match. (still ugly)
- Add TRIGGER_NEW_CONN to track files going over multiple connections.
- Add an initial file/mime type guess for non-linear file transfers.
- Fix a case where file/mime type detection would never be attempted
if the start of the file was a content gap.
- Improve mime type tracking of HTTP byte-range/partial-content,
even if the requests are pipelined or over multiple connections.
- I changed the modbus.events test because having the baseline output
be 80+ MB is nuts and it was sensitive to connection record redefs.
Other misc:
- Remove HTTP::MD5 notice.
- Add "last_active" field to FileAnalysis::Info record.
- Replace "conn_uids", "conn_ids" fields in FileAnalysis::Info record
with just a "conns" fields containing full connection records.
- The http-methods unit test is failing now, but I think it will be
fixed once I change the file handle callback mechanism to use events
instead.