Merge remote-tracking branch 'origin/master' into topic/vladg/ssh

2025-10-09 18:18:19 +00:00 · 2015-02-06 18:58:38 -05:00 · 2015-02-06 18:58:38 -05:00 · fc721d2d25
commit fc721d2d25
parent 05ecac2497 1012539ded
197 changed files with 9026 additions and 2574 deletions
--- a/137
+++ b/137
@ -1,4 +1,141 @@

+2.3-411 | 2015-02-05 10:05:48 -0600
+
+  * Fix file analysis of files with total size below the bof_buffer size
+    never delivering content to stream analyzers. (Seth Hall)
+
+  * Add/fix log fields in x509 diff canonifier. (Jon Siwek)
+
+  * Fix "id" not being defined for debug code when using -DPROFILE_BRO_FUNCTIONS
+    (Mike Smiley)
+
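The first entry above fixes files smaller than the BOF buffer never reaching stream analyzers. As a rough illustration, here is a minimal Python model, assuming content is held back until either `bof_size` bytes have accumulated or the file ends. `BofBuffer` and its methods are hypothetical names, not Bro's C++ internals. The essential point is that end-of-file must flush whatever is still buffered.

```python
class BofBuffer:
    """Hold back the beginning of a file until enough bytes exist for
    MIME detection, then forward everything to a stream consumer.

    Hypothetical model: `deliver` stands in for a stream analyzer."""

    def __init__(self, bof_size, deliver):
        self.bof_size = bof_size
        self.deliver = deliver
        self.buffer = bytearray()
        self.flushed = False

    def data_in(self, chunk):
        if self.flushed:
            # After the BOF phase, data passes straight through.
            self.deliver(bytes(chunk))
            return
        self.buffer.extend(chunk)
        if len(self.buffer) >= self.bof_size:
            self._flush()

    def end_of_file(self):
        # Without this flush, a file shorter than bof_size would never
        # deliver any content to the stream consumer.
        if not self.flushed:
            self._flush()

    def _flush(self):
        self.flushed = True
        if self.buffer:
            self.deliver(bytes(self.buffer))
        self.buffer.clear()


received = []
buf = BofBuffer(bof_size=3000, deliver=received.append)
buf.data_in(b"short file")
buf.end_of_file()
print(received)  # [b'short file']
```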
+2.3-406 | 2015-02-03 17:02:45 -0600
+
+  * Add x509 canonifier to a unit test. (Jon Siwek)
+
+2.3-405 | 2015-02-02 11:14:24 -0600
+
+  * Fix memory leak in new split_string* functions. (Jon Siwek)
+
+2.3-404 | 2015-01-30 14:23:27 -0800
+
+  * Update documentation (broken links, outdated tests). (Jon Siwek)
+
+  * Deprecate split* family of BIFs. (Jon Siwek)
+
+    These functions are now deprecated in favor of alternative versions that
+    return a vector of strings rather than a table of strings.
+
+    Deprecated functions:
+
+    - split: use split_string instead.
+    - split1: use split_string1 instead.
+    - split_all: use split_string_all instead.
+    - split_n: use split_string_n instead.
+    - cat_string_array: see join_string_vec instead.
+    - cat_string_array_n: see join_string_vec instead.
+    - join_string_array: see join_string_vec instead.
+    - sort_string_array: use sort instead.
+    - find_ip_addresses: use extract_ip_addresses instead.
+
+    Changed functions:
+
+    - has_valid_octets: uses a string_vec parameter instead of string_array.
+
+    Addresses BIT-924.
+
+  * Add a new attribute: &deprecated. While scripts are parsed, a
+    warning is raised for each usage of an identifier marked as
+    &deprecated.  This also works for BIFs. Addresses BIT-924,
+    BIT-757. (Jon Siwek)
+
+2.3-397 | 2015-01-27 10:13:10 -0600
+
+  * Handle guess_lexer exceptions in pygments reST directive (Jon Siwek)
+
+2.3-396 | 2015-01-23 10:49:15 -0600
+
+  * DNP3: fix reachable assertion and buffer over-read/overflow.
+    CVE number pending. (Travis Emmert, Jon Siwek)
+
+  * Update binpac: Fix potential out-of-bounds memory reads in generated
+    code. CVE-2014-9586. (John Villamil and Chris Rohlf - Yahoo
+    Paranoids, Jon Siwek)
+
+  * Fixing (harmless) Coverity warning. (Robin Sommer)
+
+2.3-392 | 2015-01-15 09:44:15 -0800
+
+  * Small changes to EC curve names in a newer draft. (Johanna Amann)
+
+2.3-390 | 2015-01-14 13:27:34 -0800
+
+  * Updating MySQL analyses. (Vlad Grigorescu)
+     - Use a boolean success instead of a result string.
+     - Change the affected_rows response detail string to a "rows" count.
+     - Fix the state tracking to log incomplete commands.
+
+  * Extend DNP3 to support communication over UDP. (Hui Lin)
+
+  * Fix a bug in DNP3 determining the length of an object in some
+    cases. (Hui Lin)
+
+2.3-376 | 2015-01-12 09:38:10 -0600
+
+  * Improve documentation for connection_established event. (Jon Siwek)
+
+2.3-375 | 2015-01-08 13:10:09 -0600
+
+  * Increase minimum required CMake version to 2.8. (Jon Siwek)
+
+2.3-374 | 2015-01-07 10:03:17 -0600
+
+  * Improve documentation of the Intelligence Framework. (Daniel Thayer)
+
+2.3-371 | 2015-01-06 09:58:09 -0600
+
+  * Update/improve file mime type identification. (Seth Hall)
+
+     - Change the default BOF buffer size to 3000 (was 1024).
+
+     - Reorganized MS signatures into a separate file.
+
+     - Remove all of the x-c detections.  Nearly all false positives.
+
+     - Improve TAR detections, removing old backup TAR detections.
+
+     - Remove one of the x-elc detections that was too loose
+       and caused many false positives.
+
+     - Improve many of the signatures and add new ones. (Seth Hall)
+
+  * Add support for file reassembly in the file analysis framework
+    (Seth Hall, Jon Siwek).
+
+     - The reassembly behavior can be modified per-file by enabling or
+       disabling the reassembler and/or modifying the size of the
+       reassembly buffer.
+
+     - Changed the file extraction analyzer to use stream-wise input to
+       avoid issues with the chunk-wise approach not immediately
+       triggering the file_new event due to mime-type detection delay.
+       Previously, early chunks were frequently lost.  Extraction also
+       will now explicitly NUL-fill gaps in the file instead of
+       implicitly relying on pwrite to do it.
+
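To make the reassembly and NUL-fill behavior concrete, here is a simplified Python sketch, assuming chunks arrive as `(offset, data)` pairs and the total file size is known. `extract_stream` is a hypothetical helper, not the file analysis framework's API. Holes are filled with explicit NUL bytes rather than relying on the sparse-file behavior of `pwrite`, and overlapping data keeps the first copy written.

```python
def extract_stream(chunks, total_size):
    """Write (offset, data) chunks in offset order, NUL-filling gaps.

    Returns (content, seen, missed): bytes actually supplied by chunks
    and bytes that had to be filled in."""
    out = bytearray()
    seen = 0
    missed = 0
    for offset, data in sorted(chunks):
        if offset > len(out):
            gap = offset - len(out)
            out += b"\x00" * gap        # explicit NUL-fill for the hole
            missed += gap
        elif offset < len(out):
            # Overlap with data already written: keep the first copy.
            data = data[len(out) - offset:]
        out += data
        seen += len(data)
    if total_size > len(out):
        # Trailing bytes that never arrived.
        gap = total_size - len(out)
        out += b"\x00" * gap
        missed += gap
    return bytes(out), seen, missed


content, seen, missed = extract_stream([(6, b"world"), (0, b"hello")], 12)
print(content, seen, missed)  # b'hello\x00world\x00' 10 2
```

In this model `seen + missed` always equals the number of bytes written, which parallels the NEWS note that adding the `seen_bytes` and `missed_bytes` fields of `fa_file` gives the current offset.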
+2.3-349 | 2015-01-05 15:21:13 -0600
+
+  * Fix race condition in unified2 file analyzer startup. (Jon Siwek)
+
+2.3-348 | 2014-12-31 09:19:34 -0800
+
+  * Changing Makefile's test-all to run test-all for broctl, which now
+    executes trace-summary tests as well. (Robin Sommer)
+
+2.3-345 | 2014-12-31 09:06:15 -0800
+
+  * Correct a typo in the Notice framework doc. (Daniel Thayer)
+
 2.3-343 | 2014-12-12 12:43:46 -0800

  * Fix PIA packet replay to deliver copy of IP header. This prevented
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@ -2,7 +2,7 @@ project(Bro C CXX)

 # When changing the minimum version here, also adapt
 # aux/bro-aux/plugin-support/skeleton/CMakeLists.txt
-cmake_minimum_required(VERSION 2.6.3 FATAL_ERROR)
+cmake_minimum_required(VERSION 2.8 FATAL_ERROR)

 include(cmake/CommonCMakeConfig.cmake)

--- a/2
+++ b/2
@ -54,7 +54,7 @@ test:
 	@( cd testing && make )

 test-all: test
-	test -d aux/broctl && ( cd aux/broctl && make test )
+	test -d aux/broctl && ( cd aux/broctl && make test-all )
 	test -d aux/btest  && ( cd aux/btest && make test )
 	test -d aux/bro-aux && ( cd aux/bro-aux && make test )
 	test -d aux/plugins && ( cd aux/plugins && make test-all )
--- a/52
+++ b/52
@ -28,11 +28,63 @@ New Functionality
 - Bro now has support for the MySQL wire protocol. Activity gets
  logged into mysql.log.

+- Bro's file analysis now supports reassembly of files that are not
+  transferred/seen sequentially.
+
 Changed Functionality
 ---------------------

 - bro-cut has been rewritten in C, and is hence much faster.

+- File analysis
+
+    * Removed ``fa_file`` record's ``mime_type`` and ``mime_types``
+      fields.  The events ``file_mime_type`` and ``file_mime_types``
+      have been added which contain the same information.  The
+      ``mime_type`` field of ``Files::Info`` also still has this info.
+
+    * Removed ``Files::add_analyzers_for_mime_type`` function.
+
+    * Removed ``offset`` parameter of the ``file_extraction_limit``
+      event.  Since file extraction now internally depends on file
+      reassembly for non-sequential files, "offset" can be obtained
+      with other information already available -- adding together
+      ``seen_bytes`` and ``missed_bytes`` fields of the ``fa_file``
+      record gives how many bytes have been written so far (i.e.
+      the "offset").
+
+- has_valid_octets: now uses a string_vec parameter instead of
+  string_array.
+
+Deprecated Functionality
+------------------------
+
+- The split* family of functions are to be replaced with alternate
+  versions that return a vector of strings rather than a table of
+  strings. This also allows deprecation for some related string
+  concatenation/extraction functions. Note that the new functions use
+  0-based indexing, rather than 1-based.
+
+  The full list of now-deprecated functions is:
+
+    * split: use split_string instead.
+
+    * split1: use split_string1 instead.
+
+    * split_all: use split_string_all instead.
+
+    * split_n: use split_string_n instead.
+
+    * cat_string_array: see join_string_vec instead.
+
+    * cat_string_array_n: see join_string_vec instead.
+
+    * join_string_array: see join_string_vec instead.
+
+    * sort_string_array: use sort instead.
+
+    * find_ip_addresses: use extract_ip_addresses instead.
+
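Because the new functions index from 0, every subscript into the result shifts down by one, as the unified2 changes in this commit show (`parts[1]` becomes `parts[0]`, and so on). The Python sketch below emulates both return shapes with `re.split`, assuming that is close enough to illustrate the indexing change. It does not model Bro's `incl_sep` argument or other exact semantics; the helper names and the input line are made up.

```python
import re


def old_style_split_n(s, pattern, max_num_sep):
    """Emulate the deprecated table-returning split_n: keys start at 1."""
    pieces = re.split(pattern, s, maxsplit=max_num_sep)
    return {i + 1: piece for i, piece in enumerate(pieces)}


def new_style_split_string_n(s, pattern, max_num_sep):
    """Emulate the vector-returning split_string_n: indices start at 0."""
    return re.split(pattern, s, maxsplit=max_num_sep)


line = "2000001 || Example rule message || url,example.com"
old = old_style_split_n(line, r" \|\| ", 100)
new = new_style_split_string_n(line, r" \|\| ", 100)

# The same field sits one index lower in the vector version.
print(old[1], new[0])  # 2000001 2000001
```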
 Bro 2.3
 =======

--- a/2
+++ b/2
@ -1 +1 @@
-2.3-343
+2.3-411
--- a/aux/bro-aux
+++ b/aux/bro-aux
@ -1 +1 @@
-Subproject commit 43a9f360c9bf6b35fcb25d61ebff80c7feb1812b
+Subproject commit 0b713c027d3efaaca50e5df995c02656175573cd
--- a/aux/broccoli
+++ b/aux/broccoli
@ -1 +1 @@
-Subproject commit acb8fbe8e7bc6ace5135fb73dca8e29432cdc1ca
+Subproject commit d43cc790e5b8709b5e032e52ad0e00936494739b
--- a/aux/broctl
+++ b/aux/broctl
@ -1 +1 @@
-Subproject commit 90f9ca0ffa2306f0d1d2ac208cdbb7787199f890
+Subproject commit 8c9b87bc73e1ddaa304e3d89028c1e7b95d37a91
--- a/aux/btest
+++ b/aux/btest
@ -1 +1 @@
-Subproject commit d67d89aaee32ad5edb9068db55d1310c2f36970a
+Subproject commit 93d4989ed1537e4d143cf09d44077159f869a4b2
--- a/doc/ext/rst_directive.py
+++ b/doc/ext/rst_directive.py
@ -135,7 +135,10 @@ class Pygments(Directive):
                # lexer not found, use default.
                lexer = TextLexer()
        else:
+            try:
                lexer = guess_lexer(content)
+            except:
+                lexer = TextLexer()

        # import sys
        # print >>sys.stderr, self.arguments, lexer.__class__
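The patch above falls back to `TextLexer` whenever `guess_lexer` raises. Below is a minimal sketch of the same fallback that catches `Exception` instead of using a bare `except:`, so `KeyboardInterrupt` and `SystemExit` still propagate. To keep it runnable without Pygments, the guesser and the default are injected as callables; `pick_lexer` is a hypothetical name. In current Pygments releases, `guess_lexer` reports failure with `pygments.util.ClassNotFound`, a `ValueError` subclass, so an even narrower catch may be possible. Check this against the installed version.

```python
def pick_lexer(content, guess, default_factory):
    """Return guess(content), or default_factory() if guessing fails.

    `guess` stands in for pygments.lexers.guess_lexer and
    `default_factory` for pygments.lexers.TextLexer."""
    try:
        return guess(content)
    except Exception:
        # Unlike a bare `except:`, this lets KeyboardInterrupt and
        # SystemExit propagate to the caller.
        return default_factory()


def failing_guess(content):
    raise ValueError("no lexer matched")


print(pick_lexer("???", failing_guess, lambda: "TextLexer"))  # TextLexer
```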
--- a/doc/frameworks/file_analysis_02.bro
+++ b/doc/frameworks/file_analysis_02.bro
@ -1,7 +1,7 @@
-event file_new(f: fa_file)
+event file_mime_type(f: fa_file, mime_type: string)
    {
    print "new file", f$id;
-    if ( f?$mime_type && f$mime_type == "text/plain" )
+    if ( mime_type == "text/plain" )
        Files::add_analyzer(f, Files::ANALYZER_MD5);
    }

--- a/doc/frameworks/intel.rst
+++ b/doc/frameworks/intel.rst
@ -14,32 +14,35 @@ consume that data, make it available for matching, and provide
 infrastructure around improving performance, memory utilization, and
 generally making all of this easier.

-Data in the Intelligence Framework is the atomic piece of intelligence
+Data in the Intelligence Framework is an atomic piece of intelligence
 such as an IP address or an e-mail address along with a suite of
 metadata about it such as a freeform source field, a freeform
 descriptive field and a URL which might lead to more information about
 the specific item.  The metadata in the default scripts has been
 deliberately kept minimal so that the community can find the
-appropriate fields that need added by writing scripts which extend the
+appropriate fields that need to be added by writing scripts which extend the
 base record using the normal record extension mechanism.

 Quick Start
 -----------

-Load the package of scripts that sends data into the Intelligence
-Framework to be checked by loading this script in local.bro::
-
-	@load policy/frameworks/intel/seen
-
 Refer to the "Loading Intelligence" section below to see the format
 for Intelligence Framework text files, then load those text files with
 this line in local.bro::

 	redef Intel::read_files += { "/somewhere/yourdata.txt" };

-The data itself only needs to reside on the manager if running in a
+The text files need to reside only on the manager if running in a
 cluster.

+Add the following line to local.bro in order to load the scripts
+that send "seen" data into the Intelligence Framework to be checked against
+the loaded intelligence data::
+
+	@load policy/frameworks/intel/seen
+
+Intelligence data matches will be logged to the intel.log file.
+
 Architecture
 ------------

@ -58,8 +61,10 @@ manager is the only node that needs the intelligence data.  The
 intelligence framework has distribution mechanisms which will push
 data out to all of the nodes that need it.

-Here is an example of the intelligence data format.  Note that all
-whitespace field separators are literal tabs and fields containing only a
+Here is an example of the intelligence data format (note that there will be
+additional fields if you are using CIF intelligence data or if you are
+using the policy/frameworks/intel/do_notice script).  Note that all fields
+must be separated by a single tab character and fields containing only a
 hyphen are considered to be null values. ::

 	#fields	indicator	indicator_type	meta.source	meta.desc	meta.url
@ -69,8 +74,21 @@ hyphen are considered to be null values. ::
 For a list of all built-in `indicator_type` values, please refer to the
 documentation of :bro:see:`Intel::Type`.
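The format rules above (a `#fields` header, one tab between fields, `-` for null) can be sketched as a small Python parser. This is an illustration only, assuming a single `#fields` line precedes the data. Bro itself reads these files through the input framework; `parse_intel` and the sample record (a documentation-range address) are hypothetical.

```python
def parse_intel(text):
    """Parse Intelligence Framework-style text: a '#fields' header,
    single-tab separators, and '-' meaning an unset (null) field."""
    fields = None
    records = []
    for line in text.splitlines():
        if line.startswith("#fields\t"):
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other comments
        if fields is None:
            raise ValueError("data line before #fields header")
        values = line.split("\t")  # a doubled tab yields an extra field
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
        records.append({k: (None if v == "-" else v) for k, v in zip(fields, values)})
    return records


sample = (
    "#fields\tindicator\tindicator_type\tmeta.source\tmeta.desc\tmeta.url\n"
    "192.0.2.10\tIntel::ADDR\tsource1\tExample bad host\t-\n"
)
print(parse_intel(sample)[0]["meta.url"])  # None
```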

-To load the data once files are created, use the following example
-code to define files to load with your own file names of course::
+Note that if you are using data from the Collective Intelligence Framework,
+then you will need to add the following line to your local.bro in order
+to support additional metadata fields used by CIF::
+
+	@load policy/integration/collective-intel
+
+There is a simple mechanism to raise a Bro notice (of type Intel::Notice)
+for user-specified intelligence matches.  To use this feature, add the
+following line to local.bro in order to support additional metadata fields
+(documented in the :bro:see:`Intel::MetaData` record)::
+
+	@load policy/frameworks/intel/do_notice
+
+To load the data once the files are created, use the following example
+to specify which files to load (with your own file names of course)::

 	redef Intel::read_files += {
 		"/somewhere/feed1.txt",
@ -85,24 +103,23 @@ Seen Data

 When some bit of data is extracted (such as an email address in the
 "From" header in a message over SMTP), the Intelligence Framework
-needs to be informed that this data was discovered and it's presence
-should be checked within the intelligence data set.  This is
-accomplished through the :bro:see:`Intel::seen` function.
+needs to be informed that this data was discovered so that its presence
+will be checked within the loaded intelligence data.  This is
+accomplished through the :bro:see:`Intel::seen` function.  However,
+users typically won't need to call this function directly, because
+the scripts included with Bro call it for them.

-Typically users won't need to work with this function due to built in
-hook scripts that Bro ships with that will "see" data and send it into
-the intelligence framework.  A user may only need to load the entire
-package of hook scripts as a module or pick and choose specific
-scripts to load.  Keep in mind that as more data is sent into the
+To load all of the scripts included with Bro for sending "seen" data to
+the intelligence framework, just add this line to local.bro::
+
+	@load policy/frameworks/intel/seen
+
+Alternatively, specific scripts in that directory can be loaded.
+Keep in mind that as more data is sent into the
 intelligence framework, the CPU load consumed by Bro will increase
 depending on how many times the :bro:see:`Intel::seen` function is
 being called, which is heavily traffic dependent.

-The full package of hook scripts that Bro ships with for sending this
-"seen" data into the intelligence framework can be loading by adding
-this line to local.bro::
-
-	@load policy/frameworks/intel/seen

 Intelligence Matches
 ********************
@ -111,6 +128,7 @@ Against all hopes, most networks will eventually have a hit on
 intelligence data which could indicate a possible compromise or other
 unwanted activity.  The Intelligence Framework provides an event that
 is generated whenever a match is discovered named :bro:see:`Intel::match`.
+
 Due to design restrictions placed upon
 the intelligence framework, there is no assurance as to where this
 event will be generated.  It could be generated on the worker where
@ -119,3 +137,7 @@ handled, only the data given as event arguments to the event can be
 assured since the host where the data was seen may not be where
 ``Intel::match`` is handled.

+Intelligence matches are logged to the intel.log file.  For a description of
+each field in that file, see the documentation for the :bro:see:`Intel::Info`
+record.
+
--- a/doc/frameworks/notice.rst
+++ b/doc/frameworks/notice.rst
@ -271,7 +271,7 @@ script that is generating the notice has indicated to the notice framework how
 to identify notices that are intrinsically the same. Identification of these
 "intrinsically duplicate" notices is implemented with an optional field in
 :bro:see:`Notice::Info` records named ``$identifier`` which is a simple string.
-If the ``$identifier`` and ``$type`` fields are the same for two notices, the
+If the ``$identifier`` and ``$note`` fields are the same for two notices, the
 notice framework actually considers them to be the same thing and can use that
 information to suppress duplicates for a configurable period of time.
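A minimal Python sketch of suppression keyed on the `($note, $identifier)` pair, assuming a fixed suppression interval measured from the last notice actually raised. `NoticeSuppressor` is a hypothetical class, not part of the notice framework, and the note names are made up.

```python
class NoticeSuppressor:
    """Suppress notices whose (note, identifier) pair was already raised
    within `interval` seconds. In this sketch, notices without an
    identifier are never suppressed."""

    def __init__(self, interval):
        self.interval = interval
        self.last_raised = {}  # (note, identifier) -> timestamp

    def should_raise(self, note, identifier, ts):
        if identifier is None:
            return True
        key = (note, identifier)
        last = self.last_raised.get(key)
        if last is not None and ts - last < self.interval:
            return False  # intrinsically duplicate: suppress
        self.last_raised[key] = ts
        return True


sup = NoticeSuppressor(interval=3600)
print(sup.should_raise("Example::Note", "192.0.2.1", ts=0))   # True
print(sup.should_raise("Example::Note", "192.0.2.1", ts=10))  # False
print(sup.should_raise("Other::Note", "192.0.2.1", ts=10))    # True
```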

--- a/doc/httpmonitor/file_extraction.bro
+++ b/doc/httpmonitor/file_extraction.bro
@ -7,18 +7,15 @@ global mime_to_ext: table[string] of string = {
 	["text/html"] = "html",
 };

-event file_new(f: fa_file)
+event file_mime_type(f: fa_file, mime_type: string)
 	{
 	if ( f$source != "HTTP" )
 		return;

-	if ( ! f?$mime_type )
+	if ( mime_type !in mime_to_ext )
 		return;

-	if ( f$mime_type !in mime_to_ext )
-		return;
-
-	local fname = fmt("%s-%s.%s", f$source, f$id, mime_to_ext[f$mime_type]);
+	local fname = fmt("%s-%s.%s", f$source, f$id, mime_to_ext[mime_type]);
 	print fmt("Extracting file %s", fname);
 	Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
 	}
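To compare the old and new handlers, the decision logic can be mirrored in Python as follows. Only the `text/html` entry of `mime_to_ext` is visible in this diff, so the table below is deliberately incomplete. `extraction_filename` and the file ID are hypothetical.

```python
# Only the "text/html" entry of the original mime_to_ext table is
# visible in this diff; the real table has more entries.
MIME_TO_EXT = {"text/html": "html"}


def extraction_filename(source, file_id, mime_type):
    """Return the name to extract to, or None if the file is skipped."""
    if source != "HTTP":
        return None
    ext = MIME_TO_EXT.get(mime_type)
    if ext is None:
        return None
    # Same layout as fmt("%s-%s.%s", f$source, f$id, ext).
    return f"{source}-{file_id}.{ext}"


print(extraction_filename("HTTP", "FakeFileId1", "text/html"))  # HTTP-FakeFileId1.html
```

Because `file_mime_type` always receives the MIME type as an argument, the old `f?$mime_type` existence check has no counterpart here.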
--- a/doc/install/install.rst
+++ b/doc/install/install.rst
@ -35,7 +35,7 @@ before you begin:

 To build Bro from source, the following additional dependencies are required:

-    * CMake 2.6.3 or greater            (http://www.cmake.org)
+    * CMake 2.8 or greater              (http://www.cmake.org)
    * Make
    * C/C++ compiler
    * SWIG                              (http://www.swig.org)
--- a/doc/script-reference/attributes.rst
+++ b/doc/script-reference/attributes.rst
@ -49,6 +49,8 @@ The Bro scripting language supports the following attributes.
 +-----------------------------+-----------------------------------------------+
 | :bro:attr:`&type_column`    |Used by input framework for "port" type.       |
 +-----------------------------+-----------------------------------------------+
+| :bro:attr:`&deprecated`     |Marks an identifier as deprecated.             |
++-----------------------------+-----------------------------------------------+

 Here is a more detailed explanation of each attribute:

@ -230,3 +232,9 @@ Here is a more detailed explanation of each attribute:
            msg: string;
        };

+.. bro:attr:: &deprecated
+
+    The associated identifier is marked as deprecated and will be
+    removed in a future version of Bro.  See the NEWS file for more
+    explanation and/or instructions on migrating code that uses deprecated
+    functionality.
--- a/doc/scripting/index.rst
+++ b/doc/scripting/index.rst
@ -103,9 +103,9 @@ In the ``file_hash`` event handler, there is an ``if`` statement that is used
 to check for the correct type of hash, in this case
 a SHA1 hash.  It also checks for a mime type we've defined as
 being of interest as defined in the constant ``match_file_types``.
-The comparison is made against the expression ``f$mime_type``, which uses
+The comparison is made against the expression ``f$info$mime_type``, which uses
 the ``$`` dereference operator to check the value ``mime_type``
-inside the variable ``f``.  If the entire expression evaluates to true,
+inside the variable ``f$info``.  If the entire expression evaluates to true,
 then a helper function is called to do the rest of the work.  In that
 function, a local variable is defined to hold a string comprised of
 the SHA1 hash concatenated with ``.malware.hash.cymru.com``; this
--- a/scripts/base/files/unified2/main.bro
+++ b/scripts/base/files/unified2/main.bro
@ -71,11 +71,50 @@ global classification_map: table[count] of string;
 global sid_map: table[count] of string;
 global gen_map: table[count] of string;

+global num_classification_map_reads = 0;
+global num_sid_map_reads = 0;
+global num_gen_map_reads = 0;
+global watching = F;
+
 # For reading in config files.
 type OneLine: record {
 	line: string;
 };

+function mappings_initialized(): bool
+	{
+	return num_classification_map_reads > 0 &&
+	       num_sid_map_reads > 0 &&
+	       num_gen_map_reads > 0;
+	}
+
+function start_watching()
+	{
+	if ( watching )
+		return;
+
+	watching = T;
+
+	if ( watch_dir != "" )
+		{
+		Dir::monitor(watch_dir, function(fname: string)
+			{
+			Input::add_analysis([$source=fname,
+			                     $reader=Input::READER_BINARY,
+			                     $mode=Input::STREAM,
+			                     $name=fname]);
+			}, 10secs);
+		}
+
+	if ( watch_file != "" )
+		{
+		Input::add_analysis([$source=watch_file,
+		                     $reader=Input::READER_BINARY,
+		                     $mode=Input::STREAM,
+		                     $name=watch_file]);
+		}
+	}
+
 function create_info(ev: IDSEvent): Info
 	{
 	local info = Info($ts=ev$ts,
@ -113,34 +152,56 @@ redef record fa_file += {

 event Unified2::read_sid_msg_line(desc: Input::EventDescription, tpe: Input::Event, line: string)
 	{
-	local parts = split_n(line, / \|\| /, F, 100);
-	if ( |parts| >= 2 && /^[0-9]+$/ in parts[1] )
-		sid_map[to_count(parts[1])] = parts[2];
+	local parts = split_string_n(line, / \|\| /, F, 100);
+	if ( |parts| >= 2 && /^[0-9]+$/ in parts[0] )
+		sid_map[to_count(parts[0])] = parts[1];
 	}

 event Unified2::read_gen_msg_line(desc: Input::EventDescription, tpe: Input::Event, line: string)
 	{
-	local parts = split_n(line, / \|\| /, F, 3);
-	if ( |parts| >= 2 && /^[0-9]+$/ in parts[1] )
-		gen_map[to_count(parts[1])] = parts[3];
+	local parts = split_string_n(line, / \|\| /, F, 3);
+	if ( |parts| >= 2 && /^[0-9]+$/ in parts[0] )
+		gen_map[to_count(parts[0])] = parts[2];
 	}

 event Unified2::read_classification_line(desc: Input::EventDescription, tpe: Input::Event, line: string)
 	{
-	local parts = split_n(line, /: /, F, 2);
+	local parts = split_string_n(line, /: /, F, 2);
 	if ( |parts| == 2 )
 		{
-		local parts2 = split_n(parts[2], /,/, F, 4);
+		local parts2 = split_string_n(parts[1], /,/, F, 4);
 		if ( |parts2| > 1 )
-			classification_map[|classification_map|+1] = parts2[1];
+			classification_map[|classification_map|+1] = parts2[0];
 		}
 	}

+event Input::end_of_data(name: string, source: string)
+	{
+	if ( name == classification_config )
+		++num_classification_map_reads;
+	else if ( name == sid_msg )
+		++num_sid_map_reads;
+	else if ( name == gen_msg )
+		++num_gen_map_reads;
+	else
+		return;
+
+	if ( watching )
+		return;
+
+	if ( mappings_initialized() )
+		start_watching();
+	}
+
 event bro_init() &priority=5
 	{
 	Log::create_stream(Unified2::LOG, [$columns=Info, $ev=log_unified2]);

-	if ( sid_msg != "" )
+	if ( sid_msg == "" )
+		{
+		num_sid_map_reads = 1;
+		}
+	else
 		{
 		Input::add_event([$source=sid_msg,
 		                  $reader=Input::READER_RAW,
@ -151,7 +212,11 @@ event bro_init() &priority=5
 		                  $ev=Unified2::read_sid_msg_line]);
 		}

-	if ( gen_msg != "" )
+	if ( gen_msg == "" )
+		{
+		num_gen_map_reads = 1;
+		}
+	else
 		{
 		Input::add_event([$source=gen_msg,
 		                  $name=gen_msg,
@ -162,7 +227,11 @@ event bro_init() &priority=5
 		                  $ev=Unified2::read_gen_msg_line]);
 		}

-	if ( classification_config != "" )
+	if ( classification_config == "" )
+		{
+		num_classification_map_reads = 1;
+		}
+	else
 		{
 		Input::add_event([$source=classification_config,
 		                  $name=classification_config,
@ -173,32 +242,16 @@ event bro_init() &priority=5
 		                  $ev=Unified2::read_classification_line]);
 		}

-	if ( watch_dir != "" )
-		{
-		Dir::monitor(watch_dir, function(fname: string)
-			{
-			Input::add_analysis([$source=fname,
-			                     $reader=Input::READER_BINARY,
-			                     $mode=Input::STREAM,
-			                     $name=fname]);
-			}, 10secs);
-		}
-
-	if ( watch_file != "" )
-		{
-		Input::add_analysis([$source=watch_file,
-		                     $reader=Input::READER_BINARY,
-		                     $mode=Input::STREAM,
-		                     $name=watch_file]);
-		}
+	if ( mappings_initialized() )
+		start_watching();
 	}
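The reworked startup delays watching for unified2 files until each configured mapping file has reported `Input::end_of_data` once. Unconfigured mappings count as already read. Below is a Python model of that gating, assuming end-of-data notifications arrive by stream name. The class, its methods, and the file names are hypothetical stand-ins for `mappings_initialized`, `start_watching`, and the `Input::end_of_data` handler.

```python
class Unified2Startup:
    """Model of the unified2 startup gating: start watching only after
    every configured mapping file has been read at least once."""

    def __init__(self, sid_msg, gen_msg, classification_config):
        self.sources = {
            "sid": sid_msg,
            "gen": gen_msg,
            "classification": classification_config,
        }
        # An unconfigured (empty) mapping counts as already read.
        self.reads = {k: (1 if v == "" else 0) for k, v in self.sources.items()}
        self.watching = False
        self.watch_starts = 0  # how often watching was actually started

    def mappings_initialized(self):
        return all(n > 0 for n in self.reads.values())

    def start_watching(self):
        if self.watching:
            return
        self.watching = True
        self.watch_starts += 1

    def init(self):
        # Mirrors the check at the end of bro_init.
        if self.mappings_initialized():
            self.start_watching()

    def end_of_data(self, name):
        # Mirrors the Input::end_of_data handler.
        for key, source in self.sources.items():
            if source != "" and name == source:
                self.reads[key] += 1
                break
        else:
            return  # not one of the mapping files
        if not self.watching and self.mappings_initialized():
            self.start_watching()


s = Unified2Startup("sid-msg.map", "", "classification.config")
s.init()
s.end_of_data("sid-msg.map")
print(s.watching)  # False: classification.config not read yet
s.end_of_data("classification.config")
print(s.watching)  # True
```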

 event file_new(f: fa_file)
 	{
 	local file_dir = "";
-	local parts = split_all(f$source, /\/[^\/]*$/);
+	local parts = split_string_all(f$source, /\/[^\/]*$/);
 	if ( |parts| == 3 )
-		file_dir = parts[1];
+		file_dir = parts[0];

 	if ( (watch_file != "" && f$source == watch_file) || 
 	     (watch_dir != "" && compress_path(watch_dir) == file_dir) )
--- a/scripts/base/frameworks/files/magic/load.bro
+++ b/scripts/base/frameworks/files/magic/load.bro
@ -1,2 +1,3 @@
@load-sigs ./general
+@load-sigs ./msoffice
@load-sigs ./libmagic
--- a/scripts/base/frameworks/files/magic/general.sig
+++ b/scripts/base/frameworks/files/magic/general.sig
@ -1,16 +1,137 @@
 # General purpose file magic signatures.

 signature file-plaintext {
-    file-magic /([[:print:][:space:]]{10})/
+    file-magic /^([[:print:][:space:]]{10})/
    file-mime "text/plain", -20
 }

 signature file-tar {
-    file-magic /([[:print:]\x00]){100}(([[:digit:]\x00\x20]){8}){3}/
-    file-mime "application/x-tar", 150
+    file-magic /^[[:print:]\x00]{100}([[:digit:]\x20]{7}\x00){3}([[:digit:]\x20]{11}\x00){2}([[:digit:]\x00\x20]{7}[\x20\x00])[0-7\x00]/
+    file-mime "application/x-tar", 100
 }

+signature file-zip {
+	file-mime "application/zip", 10
+	file-magic /^PK\x03\x04.{2}/
+}
+
+signature file-jar {
+	file-mime "application/java-archive", 100
+	file-magic /^PK\x03\x04.{1,200}\x14\x00..META-INF\/MANIFEST\.MF/
+}
+
+signature file-java-applet {
+	file-magic /^\xca\xfe\xba\xbe...[\x2e-\x34]/
+	file-mime "application/x-java-applet", 71
+}
+
+# Shockwave flash
 signature file-swf {
-	file-magic /(F|C|Z)WS/
+	file-magic /^(F|C|Z)WS/
 	file-mime "application/x-shockwave-flash", 60
 }
+
+# Microsoft Outlook's Transport Neutral Encapsulation Format
+signature file-tnef {
+	file-magic /^\x78\x9f\x3e\x22/
+	file-mime "application/vnd.ms-tnef", 100
+}
+
+# Mac OS X DMG files
+signature file-dmg {
+	file-magic /^(\x78\x01\x73\x0D\x62\x62\x60|\x78\xDA\x63\x60\x18\x05|\x78\x01\x63\x60\x18\x05|\x78\xDA\x73\x0D|\x78[\x01\xDA]\xED[\xD0-\xD9])/
+	file-mime "application/x-dmg", 100
+}
+
+# Mac OS X Mach-O executable
+signature file-mach-o {
+	file-magic /^[\xce\xcf]\xfa\xed\xfe/
+	file-mime "application/x-mach-o-executable", 100
+}
+
+# Mac OS X Universal Mach-O executable
+signature file-mach-o-universal {
+	file-magic /^\xca\xfe\xba\xbe..\x00[\x01-\x14]/
+	file-mime "application/x-mach-o-executable", 100
+}
+
+# XAR (eXtensible ARchive) format. 
+# Mac OS X uses this for the .pkg format.
+signature file-xar {
+	file-magic /^xar\!/
+	file-mime "application/x-xar", 100
+}
+
+signature file-pkcs7 {
+	file-magic /^MIME-Version:.*protocol=\"application\/pkcs7-signature\"/
+	file-mime "application/pkcs7-signature", 100
+}
+
+# Concatenated X.509 certificates in textual format.
+signature file-pem {
+	file-magic /^-----BEGIN CERTIFICATE-----/
+	file-mime "application/x-pem"
+}
+
+# Java Web Start file.
+signature file-jnlp {
+	file-magic /^\<jnlp\x20/
+	file-mime "application/x-java-jnlp-file", 100
+}
+
+signature file-ico {
+	file-magic /^\x00\x00\x01\x00/
+	file-mime "image/x-icon", 70
+}
+
+signature file-cur {
+	file-magic /^\x00\x00\x02\x00/
+	file-mime "image/x-cursor", 70
+}
+
+signature file-pcap {
+	file-magic /^(\xa1\xb2\xc3\xd4|\xd4\xc3\xb2\xa1)/
+	file-mime "application/vnd.tcpdump.pcap", 70
+}
+
+signature file-pcap-ng {
+	file-magic /^\x0a\x0d\x0d\x0a.{4}(\x1a\x2b\x3c\x4d|\x4d\x3c\x2b\x1a)/
+	file-mime "application/vnd.tcpdump.pcap", 100
+}
+
+signature file-shellscript {
+	file-mime "text/x-shellscript", 250
+	file-magic /^\x23\x21[^\n]{1,15}bin\/(env[[:space:]]+)?(ba|tc|c|z|fa|ae|k)?sh/
+}
+
+signature file-perl {
+	file-magic /^\x23\x21[^\n]{1,15}bin\/(env[[:space:]]+)?perl/
+	file-mime "text/x-perl", 60
+}
+
+signature file-ruby {
+	file-magic /^\x23\x21[^\n]{1,15}bin\/(env[[:space:]]+)?ruby/
+	file-mime "text/x-ruby", 60
+}
+
+signature file-python {
+	file-magic /^\x23\x21[^\n]{1,15}bin\/(env[[:space:]]+)?python/
+	file-mime "text/x-python", 60
+}
+
+signature file-php {
+	file-magic /^.*<\?php/
+	file-mime "text/x-php", 40
+}
+
+# Stereolithography ASCII format
+signature file-stl-ascii {
+	file-magic /^solid\x20/
+	file-mime "application/sla", 10
+}
+
+# Sketchup model file
+signature file-skp {
+	file-magic /^\xFF\xFE\xFF\x0E\x53\x00\x6B\x00\x65\x00\x74\x00\x63\x00\x68\x00\x55\x00\x70\x00\x20\x00\x4D\x00\x6F\x00\x64\x00\x65\x00\x6C\x00/
+	file-mime "application/skp", 100
+}
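To sanity-check two of the new signatures outside Bro, here they are translated into Python byte regexes. The translation is done by hand and is an assumption: Python's `re` does not support POSIX classes, so `[[:print:]]`, `[[:digit:]]` and `[[:space:]]` are expanded to explicit ranges. The tar pattern is tested against a header built with the standard `tarfile` module. An unanchored copy shows why the added `^` matters.

```python
import re
import tarfile

# file-tar, with POSIX classes expanded by hand.
TAR_MAGIC = re.compile(
    rb"^[\x20-\x7e\x00]{100}"        # name: printable bytes or NUL padding
    rb"([0-9\x20]{7}\x00){3}"        # mode, uid, gid: 7 octal digits + NUL
    rb"([0-9\x20]{11}\x00){2}"       # size, mtime: 11 octal digits + NUL
    rb"[0-9\x00\x20]{7}[\x20\x00]"   # checksum
    rb"[0-7\x00]"                    # typeflag
)

# file-shellscript, with [[:space:]] expanded by hand.
SHELL_MAGIC = re.compile(
    rb"^\x23\x21[^\n]{1,15}bin/(env[ \t\n\r\f\v]+)?(ba|tc|c|z|fa|ae|k)?sh"
)

# The same tar pattern without the leading "^", as in the old signatures.
TAR_UNANCHORED = re.compile(TAR_MAGIC.pattern[1:])

header = tarfile.TarInfo("hello.txt").tobuf(format=tarfile.USTAR_FORMAT)
print(bool(TAR_MAGIC.match(header)))                 # True
print(bool(TAR_MAGIC.search(b"x" + header)))         # False: must start at byte 0
print(TAR_UNANCHORED.search(b"x" + header).start())  # 1
```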
--- a/scripts/base/frameworks/files/magic/libmagic.sig
+++ b/scripts/base/frameworks/files/magic/libmagic.sig
@ -7,42 +7,18 @@
 # The instrumented version of the `file` command used to generate these
 # is located at: https://github.com/jsiwek/file/tree/bro-signatures.

-# >2080  string,=Foglio di lavoro Microsoft Exce (len=31), ["%s"], swap_endian=0
-signature file-magic-auto0 {
-	file-mime "application/vnd.ms-excel", 340
-	file-magic /(.{2080})(Foglio di lavoro Microsoft Exce)/
-}
-
 # >2  string,=---BEGIN PGP PUBLIC KEY BLOCK- (len=30), ["PGP public key block"], swap_endian=0
 signature file-magic-auto1 {
 	file-mime "application/pgp-keys", 330
 	file-magic /(.{2})(\x2d\x2d\x2dBEGIN PGP PUBLIC KEY BLOCK\x2d)/
 }

-# >2080  string,=Microsoft Excel 5.0 Worksheet (len=29), ["%s"], swap_endian=0
-signature file-magic-auto2 {
-	file-mime "application/vnd.ms-excel", 320
-	file-magic /(.{2080})(Microsoft Excel 5\x2e0 Worksheet)/
-}
-
 # >11  string,=must be converted with BinHex (len=29), ["BinHex binary text"], swap_endian=0
 signature file-magic-auto3 {
 	file-mime "application/mac-binhex40", 320
 	file-magic /(.{11})(must be converted with BinHex)/
 }

-# >2080  string,=Microsoft Word 6.0 Document (len=27), ["%s"], swap_endian=0
-signature file-magic-auto4 {
-	file-mime "application/msword", 300
-	file-magic /(.{2080})(Microsoft Word 6\x2e0 Document)/
-}
-
-# >2080  string,=Documento Microsoft Word 6 (len=26), ["Spanish Microsoft Word 6 document data"], swap_endian=0
-signature file-magic-auto5 {
-	file-mime "application/msword", 290
-	file-magic /(.{2080})(Documento Microsoft Word 6)/
-}
-
 # >0  string,=-----BEGIN PGP SIGNATURE- (len=25), ["PGP signature"], swap_endian=0
 signature file-magic-auto6 {
 	file-mime "application/pgp-signature", 280
@ -92,36 +68,6 @@ signature file-magic-auto13 {
 	file-magic /(\x23\x21 ?\x2fusr\x2flocal\x2fbin\x2fgawk)/
 }

-# >0  string/wt,=#! /usr/local/bin/bash (len=22), ["Bourne-Again shell script text executable"], swap_endian=0
-signature file-magic-auto14 {
-	file-mime "text/x-shellscript", 250
-	file-magic /(\x23\x21 ?\x2fusr\x2flocal\x2fbin\x2fbash)/
-}
-
-# >0  string/wt,=#! /usr/local/bin/tcsh (len=22), ["Tenex C shell script text executable"], swap_endian=0
-signature file-magic-auto15 {
-	file-mime "text/x-shellscript", 250
-	file-magic /(\x23\x21 ?\x2fusr\x2flocal\x2fbin\x2ftcsh)/
-}
-
-# >0  string/wt,=#! /usr/local/bin/zsh (len=21), ["Paul Falstad's zsh script text executable"], swap_endian=0
-signature file-magic-auto16 {
-	file-mime "text/x-shellscript", 240
-	file-magic /(\x23\x21 ?\x2fusr\x2flocal\x2fbin\x2fzsh)/
-}
-
-# >0  string/wt,=#! /usr/local/bin/ash (len=21), ["Neil Brown's ash script text executable"], swap_endian=0
-signature file-magic-auto17 {
-	file-mime "text/x-shellscript", 240
-	file-magic /(\x23\x21 ?\x2fusr\x2flocal\x2fbin\x2fash)/
-}
-
-# >0  string/wt,=#! /usr/local/bin/ae (len=20), ["Neil Brown's ae script text executable"], swap_endian=0
-signature file-magic-auto18 {
-	file-mime "text/x-shellscript", 230
-	file-magic /(\x23\x21 ?\x2fusr\x2flocal\x2fbin\x2fae)/
-}
-
 # >0  string,=# PaCkAgE DaTaStReAm (len=20), ["pkg Datastream (SVR4)"], swap_endian=0
 signature file-magic-auto19 {
 	file-mime "application/x-svr4-package", 230
@ -140,30 +86,12 @@ signature file-magic-auto21 {
 	file-magic /(\x5bKDE Desktop Entry\x5d)/
 }

-# >512  string,=R\000o\000o\000t\000 \000E\000n\000t\000r\000y (len=19), ["Microsoft Word Document"], swap_endian=0
-signature file-magic-auto22 {
-	file-mime "application/msword", 220
-	file-magic /(.{512})(R\x00o\x00o\x00t\x00 \x00E\x00n\x00t\x00r\x00y)/
-}
-
 # >0  string,=!<arch>\n__________E (len=19), ["MIPS archive"], swap_endian=0
 signature file-magic-auto23 {
 	file-mime "application/x-archive", 220
 	file-magic /(\x21\x3carch\x3e\x0a\x5f\x5f\x5f\x5f\x5f\x5f\x5f\x5f\x5f\x5fE)/
 }

-# >0  string/wt,=#! /usr/local/tcsh (len=18), ["Tenex C shell script text executable"], swap_endian=0
-signature file-magic-auto24 {
-	file-mime "text/x-shellscript", 210
-	file-magic /(\x23\x21 ?\x2fusr\x2flocal\x2ftcsh)/
-}
-
-# >0  string/wt,=#! /usr/local/bash (len=18), ["Bourne-Again shell script text executable"], swap_endian=0
-signature file-magic-auto25 {
-	file-mime "text/x-shellscript", 210
-	file-magic /(\x23\x21 ?\x2fusr\x2flocal\x2fbash)/
-}
-
 # >0  string/t,=# KDE Config File (len=17), ["KDE config file"], swap_endian=0
 signature file-magic-auto26 {
 	file-mime "application/x-kdelnk", 200
@ -189,12 +117,6 @@ signature file-magic-auto29 {
 	file-magic /(\x23\x21 ?\x2fusr\x2fbin\x2fnawk)/
 }

-# >0  string/wt,=#! /usr/bin/tcsh (len=16), ["Tenex C shell script text executable"], swap_endian=0
-signature file-magic-auto30 {
-	file-mime "text/x-shellscript", 190
-	file-magic /(\x23\x21 ?\x2fusr\x2fbin\x2ftcsh)/
-}
-
 # >0  string/wt,=#! /usr/bin/gawk (len=16), ["GNU awk script text executable"], swap_endian=0
 signature file-magic-auto31 {
 	file-mime "text/x-gawk", 190
@ -207,12 +129,6 @@ signature file-magic-auto32 {
 	file-magic /(.{369})(MICROSOFT PIFEX\x00)/
 }

-# >0  string/wt,=#! /usr/bin/bash (len=16), ["Bourne-Again shell script text executable"], swap_endian=0
-signature file-magic-auto33 {
-	file-mime "text/x-shellscript", 190
-	file-magic /(\x23\x21 ?\x2fusr\x2fbin\x2fbash)/
-}
-
 # >0  string/w,=#VRML V1.0 ascii (len=16), ["VRML 1 file"], swap_endian=0
 signature file-magic-auto34 {
 	file-mime "model/vrml", 190
@ -334,12 +250,6 @@ signature file-magic-auto51 {
 	file-magic /(\x23\x21 ?\x2fusr\x2fbin\x2fawk)/
 }

-# >0  string/wt,=#! /usr/bin/zsh (len=15), ["Paul Falstad's zsh script text executable"], swap_endian=0
-signature file-magic-auto52 {
-	file-mime "text/x-shellscript", 180
-	file-magic /(\x23\x21 ?\x2fusr\x2fbin\x2fzsh)/
-}
-
 # >0  string,=MAS_UTrack_V00 (len=14), [""], swap_endian=0
 # >>14  string,>/0 (len=2), ["ultratracker V1.%.1s module sound data"], swap_endian=0
 signature file-magic-auto53 {
@ -457,12 +367,6 @@ signature file-magic-auto70 {
 	file-magic /(\x3cmap ?version)/
 }

-# >0  string/wt,=#! /bin/tcsh (len=12), ["Tenex C shell script text executable"], swap_endian=0
-signature file-magic-auto71 {
-	file-mime "text/x-shellscript", 150
-	file-magic /(\x23\x21 ?\x2fbin\x2ftcsh)/
-}
-
 # >0  string/wt,=#! /bin/nawk (len=12), ["new awk script text executable"], swap_endian=0
 signature file-magic-auto72 {
 	file-mime "text/x-nawk", 150
@ -475,12 +379,6 @@ signature file-magic-auto73 {
 	file-magic /(\x23\x21 ?\x2fbin\x2fgawk)/
 }

-# >0  string/wt,=#! /bin/bash (len=12), ["Bourne-Again shell script text executable"], swap_endian=0
-signature file-magic-auto74 {
-	file-mime "text/x-shellscript", 150
-	file-magic /(\x23\x21 ?\x2fbin\x2fbash)/
-}
-
 # >0  string/wt,=#! /bin/awk (len=11), ["awk script text executable"], swap_endian=0
 signature file-magic-auto75 {
 	file-mime "text/x-awk", 140
@ -505,24 +403,6 @@ signature file-magic-auto78 {
 	file-magic /(d8\x3aannounce)/
 }

-# >0  string/wt,=#! /bin/csh (len=11), ["C shell script text executable"], swap_endian=0
-signature file-magic-auto79 {
-	file-mime "text/x-shellscript", 140
-	file-magic /(\x23\x21 ?\x2fbin\x2fcsh)/
-}
-
-# >0  string/wt,=#! /bin/ksh (len=11), ["Korn shell script text executable"], swap_endian=0
-signature file-magic-auto80 {
-	file-mime "text/x-shellscript", 140
-	file-magic /(\x23\x21 ?\x2fbin\x2fksh)/
-}
-
-# >0  string/wt,=#! /bin/zsh (len=11), ["Paul Falstad's zsh script text executable"], swap_endian=0
-signature file-magic-auto81 {
-	file-mime "text/x-shellscript", 140
-	file-magic /(\x23\x21 ?\x2fbin\x2fzsh)/
-}
-
 # >0  string/c,=BEGIN:VCARD (len=11), ["vCard visiting card"], swap_endian=0
 signature file-magic-auto82 {
 	file-mime "text/x-vcard", 140
@ -545,12 +425,6 @@ signature file-magic-auto84 {
 	file-magic /(Forward to)/
 }

-# >0  string/wt,=#! /bin/sh (len=10), ["POSIX shell script text executable"], swap_endian=0
-signature file-magic-auto85 {
-	file-mime "text/x-shellscript", 130
-	file-magic /(\x23\x21 ?\x2fbin\x2fsh)/
-}
-
 # >0  string,=II*\000\020\000\000\000CR (len=10), ["Canon CR2 raw image data"], swap_endian=0
 signature file-magic-auto86 {
 	file-mime "image/x-canon-cr2", 130
@ -585,12 +459,6 @@ signature file-magic-auto90 {
 	file-magic /(\x3cBookFile)/
 }

-# >2112  string,=MSWordDoc (len=9), ["Microsoft Word document data"], swap_endian=0
-signature file-magic-auto91 {
-	file-mime "application/msword", 120
-	file-magic /(.{2112})(MSWordDoc)/
-}
-
 # >0  string/t,=N#! rnews (len=9), ["mailed, batched news text"], swap_endian=0
 signature file-magic-auto92 {
 	file-mime "message/rfc822", 120
@ -656,12 +524,6 @@ signature file-magic-auto100 {
 	file-magic /(MSCF\x00\x00\x00\x00)/
 }

-# >0  string/b,=\320\317\021\340\241\261\032\341 (len=8), ["Microsoft Office Document"], swap_endian=0
-signature file-magic-auto101 {
-	file-mime "application/msword", 110
-	file-magic /(\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1)/
-}
-
 # >21  string/c,=!SCREAM! (len=8), ["Screamtracker 2 module sound data"], swap_endian=0
 signature file-magic-auto102 {
 	file-mime "audio/x-mod", 110
@ -754,10 +616,10 @@ signature file-magic-auto116 {
 }

 # >257  string,=ustar  \000 (len=8), ["GNU tar archive"], swap_endian=0
-signature file-magic-auto117 {
-	file-mime "application/x-tar", 110
-	file-magic /(.{257})(ustar  \x00)/
-}
+#signature file-magic-auto117 {
+#	file-mime "application/x-tar", 110
+#	file-magic /(.{257})(ustar  \x00)/
+#}

 # >0  string,=<MIFFile (len=8), ["FrameMaker MIF (ASCII) file"], swap_endian=0
 signature file-magic-auto118 {
@ -771,12 +633,6 @@ signature file-magic-auto119 {
 	file-magic /(PK\x07\x08PK\x03\x04)/
 }

-# >0  string/b,=\t\004\006\000\000\000\020\000 (len=8), ["Microsoft Excel Worksheet"], swap_endian=0
-signature file-magic-auto120 {
-	file-mime "application/vnd.ms-excel", 110
-	file-magic /(\x09\x04\x06\x00\x00\x00\x10\x00)/
-}
-
 # >0  string/b,=WordPro\000 (len=8), ["Lotus WordPro"], swap_endian=0
 signature file-magic-auto121 {
 	file-mime "application/vnd.lotus-wordpro", 110
@ -844,10 +700,10 @@ signature file-magic-auto130 {
 }

 # >257  string,=ustar\000 (len=6), ["POSIX tar archive"], swap_endian=0
-signature file-magic-auto131 {
-	file-mime "application/x-tar", 90
-	file-magic /(.{257})(ustar\x00)/
-}
+#signature file-magic-auto131 {
+#	file-mime "application/x-tar", 90
+#	file-magic /(.{257})(ustar\x00)/
+#}

 # >0  string,=AC1.40 (len=6), ["DWG AutoDesk AutoCAD Release 1.40"], swap_endian=0
 signature file-magic-auto132 {
@ -994,12 +850,6 @@ signature file-magic-auto155 {
 	file-magic /(\x23 xmcd)/
 }

-# >0  string/b,=\333\245-\000\000\000 (len=6), ["Microsoft Office Document"], swap_endian=0
-signature file-magic-auto156 {
-	file-mime "application/msword", 90
-	file-magic /(\xdb\xa5\x2d\x00\x00\x00)/
-}
-
 # >2  string,=MMXPR3 (len=6), ["Motorola Quark Express Document (English)"], swap_endian=0
 signature file-magic-auto157 {
 	file-mime "application/x-quark-xpress-3", 90
@ -1046,36 +896,6 @@ signature file-magic-auto162 {
 	file-magic /(\x3c\x3fxml)(.{15})(.*)( xmlns\x3d)(['"]http:\x2f\x2fwww.opengis.net\x2fkml)/
 }

-# >0  string,=PK\003\004 (len=4), [""], swap_endian=0
-# >>30  regex,=[Content_Types].xml|_rels/.rels (len=31), [""], swap_endian=0
-# >>>18 (lelong,+49), search/2000,=PK\003\004 (len=4), [""], swap_endian=0
-# >>>>&26  search/1000,=PK\003\004 (len=4), [""], swap_endian=0
-# >>>>>&26  string,=word/ (len=5), ["Microsoft Word 2007+"], swap_endian=0
-signature file-magic-auto163 {
-	file-mime "application/vnd.openxmlformats-officedocument.wordprocessingml.document", 80
-	file-magic /(PK\x03\x04)(.{26})(\[Content_Types\].xml|_rels\x2f.rels)(.*)(PK\x03\x04)(.{26})(.*)(PK\x03\x04)(.{26})(word\x2f)/
-}
-
-# >0  string,=PK\003\004 (len=4), [""], swap_endian=0
-# >>30  regex,=[Content_Types].xml|_rels/.rels (len=31), [""], swap_endian=0
-# >>>18 (lelong,+49), search/2000,=PK\003\004 (len=4), [""], swap_endian=0
-# >>>>&26  search/1000,=PK\003\004 (len=4), [""], swap_endian=0
-# >>>>>&26  string,=ppt/ (len=4), ["Microsoft PowerPoint 2007+"], swap_endian=0
-signature file-magic-auto164 {
-	file-mime "application/vnd.openxmlformats-officedocument.presentationml.presentation", 70
-	file-magic /(PK\x03\x04)(.{26})(\[Content_Types\].xml|_rels\x2f.rels)(.*)(PK\x03\x04)(.{26})(.*)(PK\x03\x04)(.{26})(ppt\x2f)/
-}
-
-# >0  string,=PK\003\004 (len=4), [""], swap_endian=0
-# >>30  regex,=[Content_Types].xml|_rels/.rels (len=31), [""], swap_endian=0
-# >>>18 (lelong,+49), search/2000,=PK\003\004 (len=4), [""], swap_endian=0
-# >>>>&26  search/1000,=PK\003\004 (len=4), [""], swap_endian=0
-# >>>>>&26  string,=xl/ (len=3), ["Microsoft Excel 2007+"], swap_endian=0
-signature file-magic-auto165 {
-	file-mime "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", 60
-	file-magic /(PK\x03\x04)(.{26})(\[Content_Types\].xml|_rels\x2f.rels)(.*)(PK\x03\x04)(.{26})(.*)(PK\x03\x04)(.{26})(xl\x2f)/
-}
-
 # >60  string,=RINEX (len=5), [""], swap_endian=0
 # >>80  search/256,=XXRINEXB (len=8), ["RINEX Data, GEO SBAS Broadcast"], swap_endian=0
 # >>>5  string,x, [", version %6.6s"], swap_endian=0
@ -1229,30 +1049,12 @@ signature file-magic-auto187 {
 	file-magic /(\x00\x01\x00\x00\x00)/
 }

-# >0  string/b,=PO^Q` (len=5), ["Microsoft Word 6.0 Document"], swap_endian=0
-signature file-magic-auto188 {
-	file-mime "application/msword", 80
-	file-magic /(PO\x5eQ\x60)/
-}
-
 # >0  string,=%PDF- (len=5), ["PDF document"], swap_endian=0
 signature file-magic-auto189 {
 	file-mime "application/pdf", 80
 	file-magic /(\x25PDF\x2d)/
 }

-# >2114  string,=Biff5 (len=5), ["Microsoft Excel 5.0 Worksheet"], swap_endian=0
-signature file-magic-auto190 {
-	file-mime "application/vnd.ms-excel", 80
-	file-magic /(.{2114})(Biff5)/
-}
-
-# >2121  string,=Biff5 (len=5), ["Microsoft Excel 5.0 Worksheet"], swap_endian=0
-signature file-magic-auto191 {
-	file-mime "application/vnd.ms-excel", 80
-	file-magic /(.{2121})(Biff5)/
-}
-
 # >0  string/t,=Path: (len=5), ["news text"], swap_endian=0
 signature file-magic-auto192 {
 	file-mime "message/news", 80
@ -1383,12 +1185,6 @@ signature file-magic-auto211 {
 	file-magic /(\x00\x00\x00\x01)([\x07\x27\x47\x67\x87\xa7\xc7\xe7])/
 }

-# >0  belong&,=-889275714 (0xcafebabe), [""], swap_endian=0
-signature file-magic-auto212 {
-	file-mime "application/x-java-applet", 71
-	file-magic /(\xca\xfe\xba\xbe)/
-}
-
 # >0  belong&ffffffffffffff00,=256 (0x00000100), [""], swap_endian=0
 # >>3  byte&,=0xba, ["MPEG sequence"], swap_endian=0
 signature file-magic-auto213 {
@ -1706,46 +1502,6 @@ signature file-magic-auto245 {
 	file-magic /(PK\x03\x04)(.{22})(\x08\x00\x00\x00mimetypeapplication\x2f)(epub\x2bzip)/
 }

-# Seems redundant with other zip signature below.
-# >0  string,=PK\003\004 (len=4), [""], swap_endian=0
-# >>26  string,=\b\000\000\000mimetypeapplication/ (len=24), [""], swap_endian=0
-# >>>50  string,!epub+zip (len=8), [""], swap_endian=0
-# >>>>50  string,!vnd.oasis.opendocument. (len=23), [""], swap_endian=0
-# >>>>>50  string,!vnd.sun.xml. (len=12), [""], swap_endian=0
-# >>>>>>50  string,!vnd.kde. (len=8), [""], swap_endian=0
-# >>>>>>>38  regex,=[!-OQ-~]+ (len=9), ["Zip data (MIME type "%s"?)"], swap_endian=0
-#signature file-magic-auto246 {
-#	file-mime "application/zip", 39
-#	file-magic /(PK\x03\x04)(.{22})(\x08\x00\x00\x00mimetypeapplication\x2f)/
-#}
-
-# >0  string,=PK\003\004 (len=4), [""], swap_endian=0
-# >>26  string,=\b\000\000\000mimetype (len=12), [""], swap_endian=0
-# >>>38  string,!application/ (len=12), [""], swap_endian=0
-# >>>>38  regex,=[!-OQ-~]+ (len=9), ["Zip data (MIME type "%s"?)"], swap_endian=0
-signature file-magic-auto247 {
-	file-mime "application/zip", 39
-	file-magic /(PK\x03\x04)(.{22})(\x08\x00\x00\x00mimetype)/
-}
-
-# The indirect offset makes this difficult to convert.
-# The (.*) may be too generous.
-# >0  string,=PK\003\004 (len=4), [""], swap_endian=0
-# >>26 (leshort,+30), leshort&,=-13570 (0xcafe), ["Java archive data (JAR)"], swap_endian=0
-signature file-magic-auto248 {
-	file-mime "application/java-archive", 50
-	file-magic /(PK\x03\x04)(.*)(\xfe\xca)/
-}
-
-# The indeirect offset and string inequality make this difficult to convert.
-# >0  string,=PK\003\004 (len=4), [""], swap_endian=0
-# >>26 (leshort,+30), leshort&,!-13570 (0xcafe), [""], swap_endian=0
-# >>>26  string,!\b\000\000\000mimetype (len=12), ["Zip archive data"], swap_endian=0
-signature file-magic-auto249 {
-	file-mime "application/zip", 10
-	file-magic /(PK\x03\x04)(.{2})/
-}
-
 # >0  belong&,=442 (0x000001ba), [""], swap_endian=0
 # >>4  byte&,&0x40, [""], swap_endian=0
 signature file-magic-auto250 {
@ -2065,18 +1821,6 @@ signature file-magic-auto299 {
 	file-magic /(PDN3)/
 }

-# >0  ulelong&,=2712847316 (0xa1b2c3d4), ["tcpdump capture file (little-endian)"], swap_endian=0
-signature file-magic-auto300 {
-	file-mime "application/vnd.tcpdump.pcap", 70
-	file-magic /(\xd4\xc3\xb2\xa1)/
-}
-
-# >0  ubelong&,=2712847316 (0xa1b2c3d4), ["tcpdump capture file (big-endian)"], swap_endian=0
-signature file-magic-auto301 {
-	file-mime "application/vnd.tcpdump.pcap", 70
-	file-magic /(\xa1\xb2\xc3\xd4)/
-}
-
 # >0  belong&,=-17957139 (0xfeedfeed), ["Java KeyStore"], swap_endian=0
 signature file-magic-auto302 {
 	file-mime "application/x-java-keystore", 70
@ -2297,12 +2041,6 @@ signature file-magic-auto335 {
 	file-magic /(SIT\x21)/
 }

-# >0  lelong&,=574529400 (0x223e9f78), ["Transport Neutral Encapsulation Format"], swap_endian=0
-signature file-magic-auto336 {
-	file-mime "application/vnd.ms-tnef", 70
-	file-magic /(\x78\x9f\x3e\x22)/
-}
-
 # >0  string,=<ar> (len=4), ["System V Release 1 ar archive"], swap_endian=0
 signature file-magic-auto337 {
 	file-mime "application/x-archive", 70
@ -2433,48 +2171,6 @@ signature file-magic-auto357 {
 	file-magic /(RIFF)(.{4})(AVI )/
 }

-# >0  belong&,=834535424 (0x31be0000), ["Microsoft Word Document"], swap_endian=0
-signature file-magic-auto358 {
-	file-mime "application/msword", 70
-	file-magic /(\x31\xbe\x00\x00)/
-}
-
-# >0  string/b,=\3767\000# (len=4), ["Microsoft Office Document"], swap_endian=0
-signature file-magic-auto359 {
-	file-mime "application/msword", 70
-	file-magic /(\xfe7\x00\x23)/
-}
-
-# >0  string/b,=\333\245-\000 (len=4), ["Microsoft WinWord 2.0 Document"], swap_endian=0
-signature file-magic-auto360 {
-	file-mime "application/msword", 70
-	file-magic /(\xdb\xa5\x2d\x00)/
-}
-
-# >0  string/b,=\333\245-\000 (len=4), ["Microsoft WinWord 2.0 Document"], swap_endian=0
-signature file-magic-auto361 {
-	file-mime "application/msword", 70
-	file-magic /(\xdb\xa5\x2d\x00)/
-}
-
-# >0  belong&,=6656 (0x00001a00), ["Lotus 1-2-3"], swap_endian=0
-signature file-magic-auto362 {
-	file-mime "application/x-123", 70
-	file-magic /(\x00\x00\x1a\x00)/
-}
-
-# >0  belong&,=512 (0x00000200), ["Lotus 1-2-3"], swap_endian=0
-signature file-magic-auto363 {
-	file-mime "application/x-123", 70
-	file-magic /(\x00\x00\x02\x00)/
-}
-
-# >0  string/b,=\000\000\001\000 (len=4), ["MS Windows icon resource"], swap_endian=0
-signature file-magic-auto364 {
-	file-mime "image/x-icon", 70
-	file-magic /(\x00\x00\x01\x00)/
-}
-
 # >0  lelong&,=268435536 (0x10000050), ["Psion Series 5"], swap_endian=0
 # >>4  lelong&,=268435565 (0x1000006d), ["database"], swap_endian=0
 # >>>8  lelong&,=268435588 (0x10000084), ["Agenda file"], swap_endian=0
@ -2737,12 +2433,6 @@ signature file-magic-auto403 {
 	file-magic /(SBI)/
 }

-# >0  string/b,=\224\246. (len=3), ["Microsoft Word Document"], swap_endian=0
-signature file-magic-auto404 {
-	file-mime "application/msword", 60
-	file-magic /(\x94\xa6\x2e)/
-}
-
 # >0  string,=\004%! (len=3), ["PostScript document text"], swap_endian=0
 signature file-magic-auto405 {
 	file-mime "application/postscript", 60
@ -2763,17 +2453,11 @@ signature file-magic-auto407 {
 	file-magic /(.*)([ \x09]*(class|module)[ \x09][A-Z])((modul|includ)e [A-Z]|def [a-z])(^[ \x09]*end([ \x09]*[;#].*)?$)/
 }

-# >512  string/b,=\354\245\301 (len=3), ["Microsoft Word Document"], swap_endian=0
-signature file-magic-auto408 {
-	file-mime "application/msword", 60
-	file-magic /(.{512})(\xec\xa5\xc1)/
-}
-
 # >0  regex/20,=^\.[A-Za-z0-9][A-Za-z0-9][ \t] (len=29), ["troff or preprocessor input text"], swap_endian=0
-signature file-magic-auto411 {
-	file-mime "text/troff", 59
-	file-magic /(^\.[A-Za-z0-9][A-Za-z0-9][ \x09])/
-}
+#signature file-magic-auto411 {
+#	file-mime "text/troff", 59
+#	file-magic /(^\.[A-Za-z0-9][A-Za-z0-9][ \x09])/
+#}

 # >0  search/4096,=\documentclass (len=14), ["LaTeX 2e document text"], swap_endian=0
 signature file-magic-auto412 {
@ -2806,10 +2490,10 @@ signature file-magic-auto416 {
 }

 # >0  regex/20,=^\.[A-Za-z0-9][A-Za-z0-9]$ (len=26), ["troff or preprocessor input text"], swap_endian=0
-signature file-magic-auto417 {
-	file-mime "text/troff", 56
-	file-magic /(^\.[A-Za-z0-9][A-Za-z0-9]$)/
-}
+#signature file-magic-auto417 {
+#	file-mime "text/troff", 56
+#	file-magic /(^\.[A-Za-z0-9][A-Za-z0-9]$)/
+#}

 # >0  search/w/1,=#! /usr/bin/php (len=15), ["PHP script text executable"], swap_endian=0
 signature file-magic-auto418 {
@ -2829,30 +2513,12 @@ signature file-magic-auto420 {
 	file-magic /(.*)(eval \x22exec \x2fusr\x2fbin\x2fperl)/
 }

-# >0  search/w/1,=#! /usr/local/bin/python (len=24), ["Python script text executable"], swap_endian=0
-signature file-magic-auto421 {
-	file-mime "text/x-python", 54
-	file-magic /(.*)(\x23\x21 ?\x2fusr\x2flocal\x2fbin\x2fpython)/
-}
-
 # >0  search/1,=Common subdirectories:  (len=23), ["diff output text"], swap_endian=0
 signature file-magic-auto422 {
 	file-mime "text/x-diff", 53
 	file-magic /(.*)(Common subdirectories\x3a )/
 }

-# >0  search/1,=#! /usr/bin/env python (len=22), ["Python script text executable"], swap_endian=0
-signature file-magic-auto423 {
-	file-mime "text/x-python", 52
-	file-magic /(.*)(\x23\x21 \x2fusr\x2fbin\x2fenv python)/
-}
-
-# >0  search/w/1,=#! /usr/local/bin/ruby (len=22), ["Ruby script text executable"], swap_endian=0
-signature file-magic-auto424 {
-	file-mime "text/x-ruby", 52
-	file-magic /(.*)(\x23\x21 ?\x2fusr\x2flocal\x2fbin\x2fruby)/
-}
-
 # >0  search/w/1,=#! /usr/local/bin/wish (len=22), ["Tcl/Tk script text executable"], swap_endian=0
 signature file-magic-auto425 {
 	file-mime "text/x-tcl", 52
@ -2871,12 +2537,6 @@ signature file-magic-auto427 {
 	file-magic /(\xff\xd8)/
 }

-# >0  search/1,=#!/usr/bin/env python (len=21), ["Python script text executable"], swap_endian=0
-signature file-magic-auto428 {
-	file-mime "text/x-python", 51
-	file-magic /(.*)(\x23\x21\x2fusr\x2fbin\x2fenv python)/
-}
-
 # >0  search/1,=#!/usr/bin/env nodejs (len=21), ["Node.js script text executable"], swap_endian=0
 signature file-magic-auto429 {
 	file-mime "application/javascript", 51
@ -3189,12 +2849,6 @@ signature file-magic-auto474 {
 	file-magic /(\x25\x21)/
 }

-# >0  search/1,=#! /usr/bin/env ruby (len=20), ["Ruby script text executable"], swap_endian=0
-signature file-magic-auto475 {
-	file-mime "text/x-ruby", 50
-	file-magic /(.*)(\x23\x21 \x2fusr\x2fbin\x2fenv ruby)/
-}
-
 # >0  regex/1,=(^[0-9]{5})[acdn][w] (len=20), ["MARC21 Classification"], swap_endian=0
 signature file-magic-auto476 {
 	file-mime "application/marc", 50
@ -3228,10 +2882,10 @@ signature file-magic-auto480 {
 }

 # >0  string,=\n( (len=2), ["Emacs v18 byte-compiled Lisp data"], swap_endian=0
-signature file-magic-auto481 {
-	file-mime "application/x-elc", 50
-	file-magic /(\x0a\x28)/
-}
+#signature file-magic-auto481 {
+#	file-mime "application/x-elc", 50
+#	file-magic /(\x0a\x28)/
+#}

 # >0  string,=\021\t (len=2), ["Award BIOS Logo, 136 x 126"], swap_endian=0
 signature file-magic-auto482 {
@ -3305,17 +2959,17 @@ signature file-magic-auto493 {
 	file-magic /(\xf7\x02)/
 }

-# >2  string,=\000\021 (len=2), ["TeX font metric data"], swap_endian=0
-signature file-magic-auto494 {
-	file-mime "application/x-tex-tfm", 50
-	file-magic /(.{2})(\x00\x11)/
-}
-
-# >2  string,=\000\022 (len=2), ["TeX font metric data"], swap_endian=0
-signature file-magic-auto495 {
-	file-mime "application/x-tex-tfm", 50
-	file-magic /(.{2})(\x00\x12)/
-}
+## >2  string,=\000\021 (len=2), ["TeX font metric data"], swap_endian=0
+#signature file-magic-auto494 {
+#	file-mime "application/x-tex-tfm", 50
+#	file-magic /(.{2})(\x00\x11)/
+#}
+#
+## >2  string,=\000\022 (len=2), ["TeX font metric data"], swap_endian=0
+#signature file-magic-auto495 {
+#	file-mime "application/x-tex-tfm", 50
+#	file-magic /(.{2})(\x00\x12)/
+#}

 # >0  beshort&,=-31486 (0x8502), ["GPG encrypted data"], swap_endian=0
 signature file-magic-auto496 {
@ -3470,12 +3124,6 @@ signature file-magic-auto514 {
 	file-magic /(.*)(\x23\x21 \x2fusr\x2fbin\x2fenv lua)/
 }

-# >0  search/1,=#!/usr/bin/env ruby (len=19), ["Ruby script text executable"], swap_endian=0
-signature file-magic-auto515 {
-	file-mime "text/x-ruby", 49
-	file-magic /(.*)(\x23\x21\x2fusr\x2fbin\x2fenv ruby)/
-}
-
 # >0  search/1,=#! /usr/bin/env tcl (len=19), ["Tcl script text executable"], swap_endian=0
 signature file-magic-auto516 {
 	file-mime "text/x-tcl", 49
@ -3493,12 +3141,6 @@ signature file-magic-auto519 {
 	file-magic /(.*)(\x23\x21\x2fusr\x2fbin\x2fenv lua)/
 }

-# >0  search/w/1,=#! /usr/bin/python (len=18), ["Python script text executable"], swap_endian=0
-signature file-magic-auto520 {
-	file-mime "text/x-python", 48
-	file-magic /(.*)(\x23\x21 ?\x2fusr\x2fbin\x2fpython)/
-}
-
 # >0  search/w/1,=#!/usr/bin/nodejs (len=17), ["Node.js script text executable"], swap_endian=0
 signature file-magic-auto521 {
 	file-mime "application/javascript", 47
@ -3506,10 +3148,10 @@ signature file-magic-auto521 {
 }

 # >0  regex,=^class[ \t\n]+ (len=12), ["C++ source text"], swap_endian=0
-signature file-magic-auto522 {
-	file-mime "text/x-c++", 47
-	file-magic /(.*)(class[ \x09\x0a]+[[:alnum:]_]+)(.*)(\x7b)(.*)(public:)/
-}
+#signature file-magic-auto522 {
+#	file-mime "text/x-c++", 47
+#	file-magic /(.*)(class[ \x09\x0a]+[[:alnum:]_]+)(.*)(\x7b)(.*)(public:)/
+#}

 # >0  search/1,=This is Info file (len=17), ["GNU Info text"], swap_endian=0
 signature file-magic-auto528 {
@ -3658,12 +3300,6 @@ signature file-magic-auto545 {
 	file-magic /(.*)(\x23\x21 ?\x2fusr\x2fbin\x2fwish)/
 }

-# >0  search/w/1,=#! /usr/bin/ruby (len=16), ["Ruby script text executable"], swap_endian=0
-signature file-magic-auto546 {
-	file-mime "text/x-ruby", 46
-	file-magic /(.*)(\x23\x21 ?\x2fusr\x2fbin\x2fruby)/
-}
-
 # >0  search/w/1,=#! /usr/bin/lua (len=15), ["Lua script text executable"], swap_endian=0
 signature file-magic-auto547 {
 	file-mime "text/x-lua", 45
@ -3727,10 +3363,10 @@ signature file-magic-auto556 {
 }

 # >0  regex,=^extern[ \t\n]+ (len=13), ["C source text"], swap_endian=0
-signature file-magic-auto557 {
-	file-mime "text/x-c", 43
-	file-magic /(.*)(extern[ \x09\x0a]+)/
-}
+#signature file-magic-auto557 {
+#	file-mime "text/x-c", 43
+#	file-magic /(.*)(extern[ \x09\x0a]+)/
+#}

 # >0  search/4096,=% -*-latex-*- (len=13), ["LaTeX document text"], swap_endian=0
 signature file-magic-auto558 {
@ -3746,10 +3382,10 @@ signature file-magic-auto558 {
 #}

 # >0  regex,=^struct[ \t\n]+ (len=13), ["C source text"], swap_endian=0
-signature file-magic-auto560 {
-	file-mime "text/x-c", 43
-	file-magic /(.*)(struct[ \x09\x0a]+)/
-}
+#signature file-magic-auto560 {
+#	file-mime "text/x-c", 43
+#	file-magic /(.*)(struct[ \x09\x0a]+)/
+#}

 # >0  search/w/1,=#!/bin/nodejs (len=13), ["Node.js script text executable"], swap_endian=0
 signature file-magic-auto561 {
@ -3802,10 +3438,10 @@ signature file-magic-auto567 {
 }

 # >0  regex,=^char[ \t\n]+ (len=11), ["C source text"], swap_endian=0
-signature file-magic-auto568 {
-	file-mime "text/x-c", 41
-	file-magic /(.*)(char[ \x09\x0a]+)/
-}
+#signature file-magic-auto568 {
+#	file-mime "text/x-c", 41
+#	file-magic /(.*)(char[ \x09\x0a]+)/
+#}

 # >0  search/1,=#! (len=2), [""], swap_endian=0
 # >>0  regex,=^#!.*/bin/perl$ (len=15), ["Perl script text executable"], swap_endian=0
@ -3887,23 +3523,11 @@ signature file-magic-auto578 {
 	file-magic /(^dnl )/
 }

-# >0  regex,=^all: (len=5), ["makefile script text"], swap_endian=0
-signature file-magic-auto579 {
-	file-mime "text/x-makefile", 40
-	file-magic /(^all:)/
-}
-
-# >0  regex,=^.PRECIOUS (len=10), ["makefile script text"], swap_endian=0
-signature file-magic-auto580 {
-	file-mime "text/x-makefile", 40
-	file-magic /(^.PRECIOUS)/
-}
-
 # >0  search/8192,=main( (len=5), ["C source text"], swap_endian=0
-signature file-magic-auto581 {
-	file-mime "text/x-c", 40
-	file-magic /(.*)(main\x28)/
-}
+#signature file-magic-auto581 {
+#	file-mime "text/x-c", 40
+#	file-magic /(.*)(main\x28)/
+#}

 # Not specific enough.
 # >0  search/1,=\" (len=2), ["troff or preprocessor input text"], swap_endian=0
@ -3932,22 +3556,22 @@ signature file-magic-auto584 {
 #}

 # >0  regex,=^#include (len=9), ["C source text"], swap_endian=0
-signature file-magic-auto586 {
-	file-mime "text/x-c", 39
-	file-magic /(.*)(#include)/
-}
+#signature file-magic-auto586 {
+#	file-mime "text/x-c", 39
+#	file-magic /(.*)(#include)/
+#}

 # >0  search/1,=.\" (len=3), ["troff or preprocessor input text"], swap_endian=0
-signature file-magic-auto587 {
-	file-mime "text/troff", 39
-	file-magic /(.*)(\x2e\x5c\x22)/
-}
+#signature file-magic-auto587 {
+#	file-mime "text/troff", 39
+#	file-magic /(.*)(\x2e\x5c\x22)/
+#}

 # >0  search/1,='\" (len=3), ["troff or preprocessor input text"], swap_endian=0
-signature file-magic-auto588 {
-	file-mime "text/troff", 39
-	file-magic /(.*)(\x27\x5c\x22)/
-}
+#signature file-magic-auto588 {
+#	file-mime "text/troff", 39
+#	file-magic /(.*)(\x27\x5c\x22)/
+#}

 # >0  search/1,=<TeXmacs| (len=9), ["TeXmacs document text"], swap_endian=0
 signature file-magic-auto589 {
@ -3974,10 +3598,10 @@ signature file-magic-auto592 {
 }

 # >0  search/1,=''' (len=3), ["troff or preprocessor input text"], swap_endian=0
-signature file-magic-auto593 {
-	file-mime "text/troff", 39
-	file-magic /(.*)(\x27\x27\x27)/
-}
+#signature file-magic-auto593 {
+#	file-mime "text/troff", 39
+#	file-magic /(.*)(\x27\x27\x27)/
+#}

 # >0  search/4096,=try: (len=4), [""], swap_endian=0
 # >>&0  regex,=^\s*except.*: (len=13), ["Python script text executable"], swap_endian=0
@ -3999,12 +3623,6 @@ signature file-magic-auto596 {
 	file-magic /(.*)(\x22LIBHDR\x22)/
 }

-# >0  regex,=^SUBDIRS (len=8), ["automake makefile script text"], swap_endian=0
-signature file-magic-auto597 {
-	file-mime "text/x-makefile", 38
-	file-magic /(.*)(SUBDIRS)/
-}
-
 # >0  search/4096,=(defvar  (len=8), ["Lisp/Scheme program text"], swap_endian=0
 signature file-magic-auto598 {
 	file-mime "text/x-lisp", 38
@ -4031,19 +3649,6 @@ signature file-magic-auto600 {
 #	file-magic /(.*)(\x2a\x2a\x2a )/
 #}

-# >0  search/1,='.\" (len=4), ["troff or preprocessor input text"], swap_endian=0
-signature file-magic-auto602 {
-	file-mime "text/troff", 38
-	file-magic /(.*)(\x27\x2e\x5c\x22)/
-}
-
-# LDFLAGS appears in other contexts, e.g. shell script.
-# >0  regex,=^LDFLAGS (len=8), ["makefile script text"], swap_endian=0
-#signature file-magic-auto603 {
-#	file-mime "text/x-makefile", 38
-#	file-magic /(.*)(LDFLAGS)/
-#}
-
 # >0  search/8192,="libhdr" (len=8), ["BCPL source text"], swap_endian=0
 signature file-magic-auto604 {
 	file-mime "text/x-bcpl", 38
@ -4057,12 +3662,6 @@ signature file-magic-auto604 {
 #	file-magic /(^record)/
 #}

-# >0  regex,=^CFLAGS (len=7), ["makefile script text"], swap_endian=0
-signature file-magic-auto606 {
-	file-mime "text/x-makefile", 37
-	file-magic /(.*)(CFLAGS)/
-}
-
 # >0  search/4096,=(defun  (len=7), ["Lisp/Scheme program text"], swap_endian=0
 signature file-magic-auto607 {
 	file-mime "text/x-lisp", 37
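In the converted signatures above, a libmagic test at a fixed offset becomes a regex whose leading `(.{N})` group consumes exactly N bytes. For example, `# >257 string,=ustar\000` turns into `/(.{257})(ustar\x00)/`. Libmagic `search` tests get a `(.*)` prefix instead, which suggests these patterns are matched from the start of the file. The sketch below reproduces that mapping in Python. `offset_string_to_regex` and `magic_matches` are hypothetical helpers, not the tool that generated these signatures. Python's `re` with `re.DOTALL` is only an approximation of Bro's regex engine. The helper escapes every non-alphanumeric byte, so its output matches some generated patterns exactly, such as `(\x25PDF\x2d)`, but not all of them: the generator leaves spaces unescaped, for example.

```python
import re


def offset_string_to_regex(offset: int, literal: bytes) -> bytes:
    """Hypothetical helper: express a libmagic 'string' test at a fixed
    offset as a signature-style regex such as (.{257})(ustar\\x00).

    Every non-alphanumeric byte is written as a \\xNN escape; the real
    converter is less uniform (it leaves spaces unescaped, for example).
    """
    escaped = b"".join(
        bytes([c]) if bytes([c]).isalnum() else b"\\x%02x" % c
        for c in literal
    )
    # A leading (.{N}) consumes exactly N bytes, emulating the offset.
    prefix = b"(.{%d})" % offset if offset > 0 else b""
    return prefix + b"(" + escaped + b")"


def magic_matches(pattern: bytes, data: bytes) -> bool:
    # re.match anchors at the start of data; DOTALL lets "." consume any
    # byte, including newlines, which the offset prefix needs.
    return re.match(pattern, data, re.DOTALL) is not None


if __name__ == "__main__":
    tar_sig = offset_string_to_regex(257, b"ustar\x00")
    print(tar_sig)  # b'(.{257})(ustar\\x00)'
    header = b"\n" * 257 + b"ustar\x00" + b"00"
    print(magic_matches(tar_sig, header))  # True
```

Because the offset is emulated with `.{N}`, the pattern only fires when the literal begins exactly N bytes into the file. This is why the large-offset entries (such as 257 or 2112) need `.` to match arbitrary binary bytes.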
--- a/scripts/base/frameworks/files/magic/msoffice.sig
+++ b/scripts/base/frameworks/files/magic/msoffice.sig
@ -0,0 +1,28 @@
+
+# This signature is non-specific (the OLE2 compound document header is
+# shared by .doc, .xls and .ppt files) and terrible, but after searching
+# for a long time there doesn't seem to be a better option.
+signature file-msword {
+	file-magic /^\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1/
+	file-mime "application/msword", 50
+}
+
+signature file-ooxml {
+	file-magic /^PK\x03\x04\x14\x00\x06\x00/
+	file-mime "application/vnd.openxmlformats-officedocument", 50
+}
+
+signature file-docx {
+	file-magic /^PK\x03\x04.{26}(\[Content_Types\]\.xml|_rels\x2f\.rels|word\x2f).*PK\x03\x04.{26}word\x2f/
+	file-mime "application/vnd.openxmlformats-officedocument.wordprocessingml.document", 80
+}
+
+signature file-xlsx {
+	file-magic /^PK\x03\x04.{26}(\[Content_Types\]\.xml|_rels\x2f\.rels|xl\x2f).*PK\x03\x04.{26}xl\x2f/
+	file-mime "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", 80
+}
+
+signature file-pptx {
+	file-magic /^PK\x03\x04.{26}(\[Content_Types\]\.xml|_rels\x2f\.rels|ppt\x2f).*PK\x03\x04.{26}ppt\x2f/
+	file-mime "application/vnd.openxmlformats-officedocument.presentationml.presentation", 80
+}
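The `.{26}` in the OOXML signatures skips the fixed-size fields of a ZIP local file header: version, flags, compression method, time, date, CRC-32, the two sizes, and the name and extra-field lengths. The group that follows therefore lines up with the entry's file name at offset 30. The sketch below builds small synthetic archives and checks which of the docx/xlsx/pptx patterns fire. It uses Python's `re` with `re.DOTALL` as a stand-in for Bro's regex engine, which is an assumption because the two dialects are not identical. `local_file_header` and `classify` are hypothetical helpers. The xlsx pattern in the sketch uses `xl\x2f`, matching the escape used by the docx and pptx signatures.

```python
import re
import struct

# Patterns from the file-docx / file-xlsx / file-pptx signatures, compiled
# with Python's re as a stand-in for Bro's signature regex engine.
# Assumption: re.DOTALL approximates Bro's "." matching any byte, including \n.
OOXML_PATTERNS = {
    "docx": rb"^PK\x03\x04.{26}(\[Content_Types\]\.xml|_rels\x2f\.rels|word\x2f).*PK\x03\x04.{26}word\x2f",
    "xlsx": rb"^PK\x03\x04.{26}(\[Content_Types\]\.xml|_rels\x2f\.rels|xl\x2f).*PK\x03\x04.{26}xl\x2f",
    "pptx": rb"^PK\x03\x04.{26}(\[Content_Types\]\.xml|_rels\x2f\.rels|ppt\x2f).*PK\x03\x04.{26}ppt\x2f",
}
COMPILED = {kind: re.compile(p, re.DOTALL) for kind, p in OOXML_PATTERNS.items()}


def local_file_header(name: bytes, data: bytes = b"") -> bytes:
    """Build a minimal ZIP local file header followed by the entry's data.

    The 4-byte signature is followed by exactly 26 bytes of fixed fields,
    which is what the ".{26}" in each pattern skips before the file name.
    """
    fixed = struct.pack(
        "<HHHHHIIIHH",
        20,         # version needed to extract (0x14)
        6,          # general purpose flags
        8,          # compression method (deflate)
        0, 0,       # modification time, date
        0,          # CRC-32 (dummy)
        len(data),  # compressed size
        len(data),  # uncompressed size
        len(name),  # file name length
        0,          # extra field length
    )
    assert len(fixed) == 26
    return b"PK\x03\x04" + fixed + name + data


def classify(blob: bytes) -> list:
    """Return the OOXML kinds whose pattern matches from the start of blob."""
    return [kind for kind, rx in COMPILED.items() if rx.search(blob)]


if __name__ == "__main__":
    docx = (local_file_header(b"[Content_Types].xml", b"<Types/>")
            + local_file_header(b"word/document.xml", b"<w:document/>"))
    print(classify(docx))  # ['docx']
```

Each pattern requires two local file headers: the first must name `[Content_Types].xml`, `_rels/.rels`, or the format's own directory, and a later one must name an entry inside that directory. A bare ZIP header alone does not trigger any of the three.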
--- a/scripts/base/frameworks/files/main.bro
+++ b/scripts/base/frameworks/files/main.bro
@ -100,8 +100,9 @@ export {
 		## during the process of analysis e.g. due to dropped packets.
 		missing_bytes: count &log &default=0;

-		## The number of not all-in-sequence bytes in the file stream that
-		## were delivered to file analyzers due to reassembly buffer overflow.
+		## The number of bytes in the file stream that were not delivered to
+		## stream file analyzers.  These may be overlapping bytes or bytes
+		## that couldn't be reassembled.
 		overflow_bytes: count &log &default=0;

 		## Whether the file analysis timed out at least once for the file.
@ -124,6 +125,37 @@ export {
 	## generate two handles that would hash to the same file id.
 	const salt = "I recommend changing this." &redef;

+	## Whether to automatically attach analyzers to files based on the
+	## detected MIME type of the file.
+	const analyze_by_mime_type_automatically = T &redef;
+
+	## The default setting for whether the file reassembler is enabled
+	## for each file.
+	const enable_reassembler = T &redef;
+
+	## The default per-file reassembly buffer size, in bytes.
+	const reassembly_buffer_size = 1048576 &redef;
+
+	## Enables the file reassembler on this file, which is needed if the
+	## file is transferred out of order.
+	##
+	## f: the file.
+	global enable_reassembly: function(f: fa_file);
+
+	## Disables the file reassembler on this file.  If the file is not
+	## transferred out of order, this will have no effect.
+	##
+	## f: the file.
+	global disable_reassembly: function(f: fa_file);
+
+	## Sets the maximum size to which the reassembly buffer may grow
+	## for the given file.
+	##
+	## f: the file.
+	##
+	## max: Maximum allowed size of the reassembly buffer.
+	global set_reassembly_buffer_size: function(f: fa_file, max: count);
+
 	## Sets the *timeout_interval* field of :bro:see:`fa_file`, which is
 	## used to determine the length of inactivity that is allowed for a file
 	## before internal state related to it is cleaned up.  When used within
@ -153,15 +185,6 @@ export {
 	                              tag: Files::Tag,
 	                              args: AnalyzerArgs &default=AnalyzerArgs()): bool;

-	## Adds all analyzers associated with a give MIME type to the analysis of
-	## a file.  Note that analyzers added via MIME types cannot take further
-	## arguments.
-	##
-	## f: the file.
-	##
-	## mtype: the MIME type; it will be compared case-insensitive.
-	global add_analyzers_for_mime_type: function(f: fa_file, mtype: string);
-
 	## Removes an analyzer from the analysis of a given file.
 	##
 	## f: the file.
@ -284,6 +307,7 @@ global registered_protocols: table[Analyzer::Tag] of ProtoRegistration = table()

 # Store the MIME type to analyzer mappings.
 global mime_types: table[Analyzer::Tag] of set[string];
+global mime_type_to_analyzers: table[string] of set[Analyzer::Tag];

 global analyzer_add_callbacks: table[Files::Tag] of function(f: fa_file, args: AnalyzerArgs) = table();

@ -313,8 +337,6 @@ function set_info(f: fa_file)
 	f$info$overflow_bytes = f$overflow_bytes;
 	if ( f?$is_orig )
 		f$info$is_orig = f$is_orig;
-	if ( f?$mime_type )
-		f$info$mime_type = f$mime_type;
 	}

 function set_timeout_interval(f: fa_file, t: interval): bool
@ -322,6 +344,21 @@ function set_timeout_interval(f: fa_file, t: interval): bool
 	return __set_timeout_interval(f$id, t);
 	}

+function enable_reassembly(f: fa_file)
+	{
+	__enable_reassembly(f$id);
+	}
+
+function disable_reassembly(f: fa_file)
+	{
+	__disable_reassembly(f$id);
+	}
+
+function set_reassembly_buffer_size(f: fa_file, max: count)
+	{
+	__set_reassembly_buffer(f$id, max);
+	}
+
 function add_analyzer(f: fa_file, tag: Files::Tag, args: AnalyzerArgs): bool
 	{
 	add f$info$analyzers[Files::analyzer_name(tag)];
@ -337,15 +374,6 @@ function add_analyzer(f: fa_file, tag: Files::Tag, args: AnalyzerArgs): bool
 	return T;
 	}

-function add_analyzers_for_mime_type(f: fa_file, mtype: string)
-	{
-	local dummy_args: AnalyzerArgs;
-	local analyzers = __add_analyzers_for_mime_type(f$id, mtype, dummy_args);
-
-	for ( tag in analyzers )
-		add f$info$analyzers[Files::analyzer_name(tag)];
-	}
-
 function register_analyzer_add_callback(tag: Files::Tag, callback: function(f: fa_file, args: AnalyzerArgs))
 	{
 	analyzer_add_callbacks[tag] = callback;
@ -366,42 +394,6 @@ function analyzer_name(tag: Files::Tag): string
 	return __analyzer_name(tag);
 	}

-event file_new(f: fa_file) &priority=10
-	{
-	set_info(f);
-
-	if ( f?$mime_type )
-		add_analyzers_for_mime_type(f, f$mime_type);
-	}
-
-event file_over_new_connection(f: fa_file, c: connection, is_orig: bool) &priority=10
-	{
-	set_info(f);
-	add f$info$conn_uids[c$uid];
-	local cid = c$id;
-	add f$info$tx_hosts[f$is_orig ? cid$orig_h : cid$resp_h];
-	if( |Site::local_nets| > 0 )
-		f$info$local_orig=Site::is_local_addr(f$is_orig ? cid$orig_h : cid$resp_h);
-
-	add f$info$rx_hosts[f$is_orig ? cid$resp_h : cid$orig_h];
-	}
-
-event file_timeout(f: fa_file) &priority=10
-	{
-	set_info(f);
-	f$info$timedout = T;
-	}
-
-event file_state_remove(f: fa_file) &priority=10
-	{
-	set_info(f);
-	}
-
-event file_state_remove(f: fa_file) &priority=-10
-	{
-	Log::write(Files::LOG, f$info);
-	}
-
 function register_protocol(tag: Analyzer::Tag, reg: ProtoRegistration): bool
 	{
 	local result = (tag !in registered_protocols);
@ -424,13 +416,18 @@ function register_for_mime_types(tag: Analyzer::Tag, mime_types: set[string]) :

 function register_for_mime_type(tag: Analyzer::Tag, mt: string) : bool
 	{
-	if ( ! __register_for_mime_type(tag, mt) )
-		return F;
-
 	if ( tag !in mime_types )
+		{
 		mime_types[tag] = set();
-
+		}
 	add mime_types[tag][mt];
+
+	if ( mt !in mime_type_to_analyzers )
+		{
+		mime_type_to_analyzers[mt] = set();
+		}
+	add mime_type_to_analyzers[mt][tag];
+
 	return T;
 	}

@ -462,3 +459,61 @@ event get_file_handle(tag: Analyzer::Tag, c: connection, is_orig: bool) &priorit
 	local handler = registered_protocols[tag];
 	set_file_handle(handler$get_file_handle(c, is_orig));
 	}
+
+event file_new(f: fa_file) &priority=10
+	{
+	set_info(f);
+
+	if ( enable_reassembler )
+		{
+		Files::enable_reassembly(f);
+		Files::set_reassembly_buffer_size(f, reassembly_buffer_size);
+		}
+	}
+
+event file_over_new_connection(f: fa_file, c: connection, is_orig: bool) &priority=10
+	{
+	set_info(f);
+
+	add f$info$conn_uids[c$uid];
+	local cid = c$id;
+	add f$info$tx_hosts[f$is_orig ? cid$orig_h : cid$resp_h];
+	if ( |Site::local_nets| > 0 )
+		f$info$local_orig = Site::is_local_addr(f$is_orig ? cid$orig_h : cid$resp_h);
+
+	add f$info$rx_hosts[f$is_orig ? cid$resp_h : cid$orig_h];
+	}
+
+event file_mime_type(f: fa_file, mime_type: string) &priority=10
+	{
+	set_info(f);
+
+	f$info$mime_type = mime_type;
+
+	if ( analyze_by_mime_type_automatically &&
+	     mime_type in mime_type_to_analyzers )
+		{
+		local analyzers = mime_type_to_analyzers[mime_type];
+		for ( a in analyzers )
+			{
+			add f$info$analyzers[Files::analyzer_name(a)];
+			Files::add_analyzer(f, a);
+			}
+		}
+	}
+
+event file_timeout(f: fa_file) &priority=10
+	{
+	set_info(f);
+	f$info$timedout = T;
+	}
+
+event file_state_remove(f: fa_file) &priority=10
+	{
+	set_info(f);
+	}
+
+event file_state_remove(f: fa_file) &priority=-10
+	{
+	Log::write(Files::LOG, f$info);
+	}
--- a/scripts/base/frameworks/intel/main.bro
+++ b/scripts/base/frameworks/intel/main.bro
@ -67,6 +67,7 @@ export {
 		IN_ANYWHERE,
 	};

+	## Information about a piece of "seen" data.
 	type Seen: record {
 		## The string if the data is about a string.
 		indicator:       string        &log &optional;
@ -124,7 +125,7 @@ export {
 		sources:  set[string]    &log &default=string_set();
 	};

-	## Intelligence data manipulation functions.
+	## Intelligence data manipulation function.
 	global insert: function(item: Item);

 	## Function to declare discovery of a piece of data in order to check
@ -289,8 +290,8 @@ event Intel::match(s: Seen, items: set[Item]) &priority=5
 		if ( ! info?$fuid )
 			info$fuid = s$f$id;

-		if ( ! info?$file_mime_type && s$f?$mime_type )
-			info$file_mime_type = s$f$mime_type;
+		if ( ! info?$file_mime_type && s$f?$info && s$f$info?$mime_type )
+			info$file_mime_type = s$f$info$mime_type;

 		if ( ! info?$file_desc )
 			info$file_desc = Files::describe(s$f);
--- a/scripts/base/frameworks/logging/main.bro
+++ b/scripts/base/frameworks/logging/main.bro
@ -405,30 +405,30 @@ function default_path_func(id: ID, path: string, rec: any) : string

 	local id_str = fmt("%s", id);

-	local parts = split1(id_str, /::/);
+	local parts = split_string1(id_str, /::/);
 	if ( |parts| == 2 )
 		{
 		# Example: Notice::LOG -> "notice"
-		if ( parts[2] == "LOG" )
+		if ( parts[1] == "LOG" )
 			{
-			local module_parts = split_n(parts[1], /[^A-Z][A-Z][a-z]*/, T, 4);
+			local module_parts = split_string_n(parts[0], /[^A-Z][A-Z][a-z]*/, T, 4);
 			local output = "";
-			if ( 1 in module_parts )
-				output = module_parts[1];
+			if ( 0 in module_parts )
+				output = module_parts[0];
+			if ( 1 in module_parts && module_parts[1] != "" )
+				output = cat(output, sub_bytes(module_parts[1],1,1), "_", sub_bytes(module_parts[1], 2, |module_parts[1]|));
 			if ( 2 in module_parts && module_parts[2] != "" )
-				output = cat(output, sub_bytes(module_parts[2],1,1), "_", sub_bytes(module_parts[2], 2, |module_parts[2]|));
+				output = cat(output, "_", module_parts[2]);
 			if ( 3 in module_parts && module_parts[3] != "" )
-				output = cat(output, "_", module_parts[3]);
-			if ( 4 in module_parts && module_parts[4] != "" )
-				output = cat(output, sub_bytes(module_parts[4],1,1), "_", sub_bytes(module_parts[4], 2, |module_parts[4]|));
+				output = cat(output, sub_bytes(module_parts[3],1,1), "_", sub_bytes(module_parts[3], 2, |module_parts[3]|));
 			return to_lower(output);
 			}

 		# Example: Notice::POLICY_LOG -> "notice_policy"
-		if ( /_LOG$/ in parts[2] )
-			parts[2] = sub(parts[2], /_LOG$/, "");
+		if ( /_LOG$/ in parts[1] )
+			parts[1] = sub(parts[1], /_LOG$/, "");

-		return cat(to_lower(parts[1]),"_",to_lower(parts[2]));
+		return cat(to_lower(parts[0]),"_",to_lower(parts[1]));
 		}
 	else
 		return to_lower(id_str);
--- a/scripts/base/frameworks/notice/main.bro
+++ b/scripts/base/frameworks/notice/main.bro
@ -531,8 +531,8 @@ function create_file_info(f: fa_file): Notice::FileInfo
 	local fi: Notice::FileInfo = Notice::FileInfo($fuid = f$id,
 	                                              $desc = Files::describe(f));

-	if ( f?$mime_type )
-		fi$mime = f$mime_type;
+	if ( f?$info && f$info?$mime_type )
+		fi$mime = f$info$mime_type;

 	if ( f?$conns && |f$conns| == 1 )
 		for ( id in f$conns )
--- a/scripts/base/frameworks/software/main.bro
+++ b/scripts/base/frameworks/software/main.bro
@ -133,62 +133,62 @@ function parse(unparsed_version: string): Description
 		{
 		# The regular expression should match the complete version number
 		# and software name.
-		local version_parts = split_n(unparsed_version, /\/?( [\(])?v?[0-9\-\._, ]{2,}/, T, 1);
-		if ( 1 in version_parts )
+		local version_parts = split_string_n(unparsed_version, /\/?( [\(])?v?[0-9\-\._, ]{2,}/, T, 1);
+		if ( 0 in version_parts )
 			{
-			if ( /^\(/ in version_parts[1] )
-				software_name = strip(sub(version_parts[1], /[\(]/, ""));
+			if ( /^\(/ in version_parts[0] )
+				software_name = strip(sub(version_parts[0], /[\(]/, ""));
 			else
-				software_name = strip(version_parts[1]);
+				software_name = strip(version_parts[0]);
 			}
 		if ( |version_parts| >= 2 )
 			{
 			# Remove the name/version separator if it's left at the beginning
 			# of the version number from the previous split_all.
-			local sv = strip(version_parts[2]);
+			local sv = strip(version_parts[1]);
 			if ( /^[\/\-\._v\(]/ in sv )
-				sv = strip(sub(version_parts[2], /^\(?[\/\-\._v\(]/, ""));
-			local version_numbers = split_n(sv, /[\-\._,\[\(\{ ]/, F, 3);
-			if ( 5 in version_numbers && version_numbers[5] != "" )
-				v$addl = strip(version_numbers[5]);
-			else if ( 3 in version_parts && version_parts[3] != "" &&
-			          version_parts[3] != ")" )
+				sv = strip(sub(version_parts[1], /^\(?[\/\-\._v\(]/, ""));
+			local version_numbers = split_string_n(sv, /[\-\._,\[\(\{ ]/, F, 3);
+			if ( 4 in version_numbers && version_numbers[4] != "" )
+				v$addl = strip(version_numbers[4]);
+			else if ( 2 in version_parts && version_parts[2] != "" &&
+			          version_parts[2] != ")" )
 				{
-				if ( /^[[:blank:]]*\([a-zA-Z0-9\-\._[:blank:]]*\)/ in version_parts[3] )
+				if ( /^[[:blank:]]*\([a-zA-Z0-9\-\._[:blank:]]*\)/ in version_parts[2] )
 					{
-					v$addl = split_n(version_parts[3], /[\(\)]/, F, 2)[2];
+					v$addl = split_string_n(version_parts[2], /[\(\)]/, F, 2)[1];
 					}
 				else
 					{
-					local vp = split_n(version_parts[3], /[\-\._,;\[\]\(\)\{\} ]/, F, 3);
-					if ( |vp| >= 1 && vp[1] != "" )
+					local vp = split_string_n(version_parts[2], /[\-\._,;\[\]\(\)\{\} ]/, F, 3);
+					if ( |vp| >= 1 && vp[0] != "" )
+						{
+						v$addl = strip(vp[0]);
+						}
+					else if ( |vp| >= 2 && vp[1] != "" )
 						{
 						v$addl = strip(vp[1]);
 						}
-					else if ( |vp| >= 2 && vp[2] != "" )
+					else if ( |vp| >= 3 && vp[2] != "" )
 						{
 						v$addl = strip(vp[2]);
 						}
-					else if ( |vp| >= 3 && vp[3] != "" )
-						{
-						v$addl = strip(vp[3]);
-						}
 					else
 						{
-						v$addl = strip(version_parts[3]);
+						v$addl = strip(version_parts[2]);
 						}
 						
 					}
 				}
 			
-			if ( 4 in version_numbers && version_numbers[4] != "" )
-				v$minor3 = extract_count(version_numbers[4]);
 			if ( 3 in version_numbers && version_numbers[3] != "" )
-				v$minor2 = extract_count(version_numbers[3]);
+				v$minor3 = extract_count(version_numbers[3]);
 			if ( 2 in version_numbers && version_numbers[2] != "" )
-				v$minor = extract_count(version_numbers[2]);
+				v$minor2 = extract_count(version_numbers[2]);
 			if ( 1 in version_numbers && version_numbers[1] != "" )
-				v$major = extract_count(version_numbers[1]);
+				v$minor = extract_count(version_numbers[1]);
+			if ( 0 in version_numbers && version_numbers[0] != "" )
+				v$major = extract_count(version_numbers[0]);
 			}
 		}
 	
@ -200,14 +200,14 @@ function parse_mozilla(unparsed_version: string): Description
 	{
 	local software_name = "<unknown browser>";
 	local v: Version;
-	local parts: table[count] of string;
+	local parts: string_vec;
 	
 	if ( /Opera [0-9\.]*$/ in unparsed_version )
 		{
 		software_name = "Opera";
-		parts = split_all(unparsed_version, /Opera [0-9\.]*$/);
-		if ( 2 in parts )
-			v = parse(parts[2])$version;
+		parts = split_string_all(unparsed_version, /Opera [0-9\.]*$/);
+		if ( 1 in parts )
+			v = parse(parts[1])$version;
 		}
 	else if ( / MSIE |Trident\// in unparsed_version )
 		{
@ -222,28 +222,28 @@ function parse_mozilla(unparsed_version: string): Description
 			v = [$major=11,$minor=0];
 		else
 			{
-			parts = split_all(unparsed_version, /MSIE [0-9]{1,2}\.*[0-9]*b?[0-9]*/);
-			if ( 2 in parts )
-				v = parse(parts[2])$version;
+			parts = split_string_all(unparsed_version, /MSIE [0-9]{1,2}\.*[0-9]*b?[0-9]*/);
+			if ( 1 in parts )
+				v = parse(parts[1])$version;
 			}
 		}
 	else if ( /Version\/.*Safari\// in unparsed_version )
 		{
 		software_name = "Safari";
-		parts = split_all(unparsed_version, /Version\/[0-9\.]*/);
-		if ( 2 in parts )
+		parts = split_string_all(unparsed_version, /Version\/[0-9\.]*/);
+		if ( 1 in parts )
 			{
-			v = parse(parts[2])$version;
+			v = parse(parts[1])$version;
 			if ( / Mobile\/?.* Safari/ in unparsed_version )
 				v$addl = "Mobile";
 			}
 		}
 	else if ( /(Firefox|Netscape|Thunderbird)\/[0-9\.]*/ in unparsed_version )
 		{
-		parts = split_all(unparsed_version, /(Firefox|Netscape|Thunderbird)\/[0-9\.]*/);
-		if ( 2 in parts )
+		parts = split_string_all(unparsed_version, /(Firefox|Netscape|Thunderbird)\/[0-9\.]*/);
+		if ( 1 in parts )
 			{
-			local tmp_s = parse(parts[2]);
+			local tmp_s = parse(parts[1]);
 			software_name = tmp_s$name;
 			v = tmp_s$version;
 			}
@ -251,48 +251,48 @@ function parse_mozilla(unparsed_version: string): Description
 	else if ( /Chrome\/.*Safari\// in unparsed_version )
 		{
 		software_name = "Chrome";
-		parts = split_all(unparsed_version, /Chrome\/[0-9\.]*/);
-		if ( 2 in parts )
-			v = parse(parts[2])$version;
+		parts = split_string_all(unparsed_version, /Chrome\/[0-9\.]*/);
+		if ( 1 in parts )
+			v = parse(parts[1])$version;
 		}
 	else if ( /^Opera\// in unparsed_version )
 		{
 		if ( /Opera M(ini|obi)\// in unparsed_version )
 			{
-			parts = split_all(unparsed_version, /Opera M(ini|obi)/);
-			if ( 2 in parts )
-				software_name = parts[2];
-			parts = split_all(unparsed_version, /Version\/[0-9\.]*/);
-			if ( 2 in parts )
-				v = parse(parts[2])$version;
+			parts = split_string_all(unparsed_version, /Opera M(ini|obi)/);
+			if ( 1 in parts )
+				software_name = parts[1];
+			parts = split_string_all(unparsed_version, /Version\/[0-9\.]*/);
+			if ( 1 in parts )
+				v = parse(parts[1])$version;
 			else
 				{
-				parts = split_all(unparsed_version, /Opera Mini\/[0-9\.]*/);
-				if ( 2 in parts )
-					v = parse(parts[2])$version;
+				parts = split_string_all(unparsed_version, /Opera Mini\/[0-9\.]*/);
+				if ( 1 in parts )
+					v = parse(parts[1])$version;
 				}
 			}
 		else
 			{
 			software_name = "Opera";
-			parts = split_all(unparsed_version, /Version\/[0-9\.]*/);
-			if ( 2 in parts )
-				v = parse(parts[2])$version;
+			parts = split_string_all(unparsed_version, /Version\/[0-9\.]*/);
+			if ( 1 in parts )
+				v = parse(parts[1])$version;
 			}
 		}
 	else if ( /AppleWebKit\/[0-9\.]*/ in unparsed_version )
 		{
 		software_name = "Unspecified WebKit";
-		parts = split_all(unparsed_version, /AppleWebKit\/[0-9\.]*/);
-		if ( 2 in parts )
-			v = parse(parts[2])$version;
+		parts = split_string_all(unparsed_version, /AppleWebKit\/[0-9\.]*/);
+		if ( 1 in parts )
+			v = parse(parts[1])$version;
 		}
 	else if ( / Java\/[0-9]\./ in unparsed_version )
 		{
 		software_name = "Java";
-		parts = split_all(unparsed_version, /Java\/[0-9\._]*/);
-		if ( 2 in parts )
-			v = parse(parts[2])$version;
+		parts = split_string_all(unparsed_version, /Java\/[0-9\._]*/);
+		if ( 1 in parts )
+			v = parse(parts[1])$version;
 		}

 	return [$version=v, $unparsed_version=unparsed_version, $name=software_name];
--- a/scripts/base/init-bare.bro
+++ b/scripts/base/init-bare.bro
@ -353,9 +353,10 @@ type connection: record {
 ## gives up and discards any internal state related to the file.
 const default_file_timeout_interval: interval = 2 mins &redef;

-## Default amount of bytes that file analysis will buffer before raising
-## :bro:see:`file_new`.
-const default_file_bof_buffer_size: count = 1024 &redef;
+## Default number of bytes that file analysis will buffer for use in
+## MIME type matching.  File analyzers attached at the time of MIME type
+## matching or later will receive a copy of this buffer.
+const default_file_bof_buffer_size: count = 4096 &redef;

 ## A file that Bro is analyzing.  This is Bro's type for describing the basic
 ## internal metadata collected about a "file", which is essentially just a
@ -394,8 +395,10 @@ type fa_file: record {
 	## during the process of analysis e.g. due to dropped packets.
 	missing_bytes: count &default=0;

-	## The number of not all-in-sequence bytes in the file stream that
-	## were delivered to file analyzers due to reassembly buffer overflow.
+	## The number of bytes in the file stream that were not delivered to
+	## stream file analyzers.  Generally, this consists of bytes that
+	## couldn't be reassembled, either because reassembly simply isn't
+	## enabled, or due to size limitations of the reassembly buffer.
 	overflow_bytes: count &default=0;

 	## The amount of time between receiving new data for this file that
@ -409,16 +412,6 @@ type fa_file: record {
 	## The content of the beginning of a file up to *bof_buffer_size* bytes.
 	## This is also the buffer that's used for file/mime type detection.
 	bof_buffer: string &optional;
-
-	## The mime type of the strongest file magic signature matches against
-	## the data chunk in *bof_buffer*, or in the cases where no buffering
-	## of the beginning of file occurs, an initial guess of the mime type
-	## based on the first data seen.
-	mime_type: string &optional;
-
-	## All mime types that matched file magic signatures against the data
-	## chunk in *bof_buffer*, in order of their strength value.
-	mime_types: mime_matches &optional;
 } &redef;

 ## Fields of a SYN packet.
--- a/scripts/base/protocols/dhcp/utils.bro
+++ b/scripts/base/protocols/dhcp/utils.bro
@ -13,7 +13,7 @@ export {

 function reverse_ip(ip: addr): addr
 	{
-	local octets = split(cat(ip), /\./);
-	return to_addr(cat(octets[4], ".", octets[3], ".", octets[2], ".", octets[1]));
+	local octets = split_string(cat(ip), /\./);
+	return to_addr(cat(octets[3], ".", octets[2], ".", octets[1], ".", octets[0]));
 	}

--- a/scripts/base/protocols/dnp3/dpd.sig
+++ b/scripts/base/protocols/dnp3/dpd.sig
@ -5,5 +5,11 @@ signature dpd_dnp3_server {
 	ip-proto == tcp
 	payload /\x05\x64/
 	tcp-state responder
- 	enable "dnp3"
+ 	enable "dnp3_tcp"
+}
+
+signature dpd_dnp3_server_udp {
+	ip-proto == udp
+	payload /\x05\x64/
+	enable "dnp3_udp"
 }
--- a/scripts/base/protocols/dnp3/main.bro
+++ b/scripts/base/protocols/dnp3/main.bro
@ -31,16 +31,16 @@ redef record connection += {
 	dnp3: Info &optional;
 };

-const ports = { 20000/tcp };
+const ports = { 20000/tcp, 20000/udp };
 redef likely_server_ports += { ports };

 event bro_init() &priority=5
 	{
 	Log::create_stream(DNP3::LOG, [$columns=Info, $ev=log_dnp3]);
-	Analyzer::register_for_ports(Analyzer::ANALYZER_DNP3, ports);
+	Analyzer::register_for_ports(Analyzer::ANALYZER_DNP3_TCP, ports);
 	}

-event dnp3_application_request_header(c: connection, is_orig: bool, fc: count)
+event dnp3_application_request_header(c: connection, is_orig: bool, application_control: count, fc: count)
 	{
 	if ( ! c?$dnp3 )
 		c$dnp3 = [$ts=network_time(), $uid=c$uid, $id=c$id];
@ -49,7 +49,7 @@ event dnp3_application_request_header(c: connection, is_orig: bool, fc: count)
 	c$dnp3$fc_request = function_codes[fc];
 	}

-event dnp3_application_response_header(c: connection, is_orig: bool, fc: count, iin: count)
+event dnp3_application_response_header(c: connection, is_orig: bool, application_control: count, fc: count, iin: count)
 	{
 	if ( ! c?$dnp3 )
 		c$dnp3 = [$ts=network_time(), $uid=c$uid, $id=c$id];
--- a/scripts/base/protocols/ftp/files.bro
+++ b/scripts/base/protocols/ftp/files.bro
@ -17,6 +17,10 @@ export {

 	## Describe the file being transferred.
 	global describe_file: function(f: fa_file): string;
+
+	redef record fa_file += {
+		ftp: FTP::Info &optional;
+	};
 }

 function get_file_handle(c: connection, is_orig: bool): string
@ -48,7 +52,6 @@ event bro_init() &priority=5
 	                          $describe        = FTP::describe_file]);
 	}

-
 event file_over_new_connection(f: fa_file, c: connection, is_orig: bool) &priority=5
 	{
 	if ( [c$id$resp_h, c$id$resp_p] !in ftp_data_expected ) 
@ -56,6 +59,14 @@ event file_over_new_connection(f: fa_file, c: connection, is_orig: bool) &priori

 	local ftp = ftp_data_expected[c$id$resp_h, c$id$resp_p];
 	ftp$fuid = f$id;
-	if ( f?$mime_type )
-		ftp$mime_type = f$mime_type;
+
+	f$ftp = ftp;
+	}
+
+event file_mime_type(f: fa_file, mime_type: string) &priority=5
+	{
+	if ( ! f?$ftp )
+		return;
+
+	f$ftp$mime_type = mime_type;
 	}
--- a/scripts/base/protocols/ftp/main.bro
+++ b/scripts/base/protocols/ftp/main.bro
@ -274,7 +274,7 @@ event file_transferred(c: connection, prefix: string, descr: string,
 	if ( [id$resp_h, id$resp_p] in ftp_data_expected )
 		{
 		local s = ftp_data_expected[id$resp_h, id$resp_p];
-		s$mime_type = split1(mime_type, /;/)[1];
+		s$mime_type = split_string1(mime_type, /;/)[0];
 		}
 	}

--- a/scripts/base/protocols/http/entities.bro
+++ b/scripts/base/protocols/http/entities.bro
@ -35,6 +35,10 @@ export {
 		## body.
 		resp_mime_depth: count            &default=0;
 	};
+
+	redef record fa_file += {
+		http: HTTP::Info &optional;
+	};
 }

 event http_begin_entity(c: connection, is_orig: bool) &priority=10
@ -67,6 +71,8 @@ event file_over_new_connection(f: fa_file, c: connection, is_orig: bool) &priori
 	{
 	if ( f$source == "HTTP" && c?$http ) 
 		{
+		f$http = c$http;
+
 		if ( c$http?$current_entity && c$http$current_entity?$filename )
 			f$info$filename = c$http$current_entity$filename;

@ -76,14 +82,6 @@ event file_over_new_connection(f: fa_file, c: connection, is_orig: bool) &priori
 				c$http$orig_fuids = string_vec(f$id);
 			else
 				c$http$orig_fuids[|c$http$orig_fuids|] = f$id;
-
-			if ( f?$mime_type )
-				{
-				if ( ! c$http?$orig_mime_types )
-					c$http$orig_mime_types = string_vec(f$mime_type);
-				else
-					c$http$orig_mime_types[|c$http$orig_mime_types|] = f$mime_type;
-				}
 			}
 		else
 			{
@ -91,17 +89,29 @@ event file_over_new_connection(f: fa_file, c: connection, is_orig: bool) &priori
 				c$http$resp_fuids = string_vec(f$id);
 			else
 				c$http$resp_fuids[|c$http$resp_fuids|] = f$id;
+			}
+		}
+	}

-			if ( f?$mime_type )
+event file_mime_type(f: fa_file, mime_type: string) &priority=5
 	{
-				if ( ! c$http?$resp_mime_types )
-					c$http$resp_mime_types = string_vec(f$mime_type);
-				else
-					c$http$resp_mime_types[|c$http$resp_mime_types|] = f$mime_type;
-				}
-			}
-		}
+	if ( ! f?$http || ! f?$is_orig )
+		return;

+	if ( f$is_orig )
+		{
+		if ( ! f$http?$orig_mime_types )
+			f$http$orig_mime_types = string_vec(mime_type);
+		else
+			f$http$orig_mime_types[|f$http$orig_mime_types|] = mime_type;
+		}
+	else
+		{
+		if ( ! f$http?$resp_mime_types )
+			f$http$resp_mime_types = string_vec(mime_type);
+		else
+			f$http$resp_mime_types[|f$http$resp_mime_types|] = mime_type;
+		}
 	}

 event http_end_entity(c: connection, is_orig: bool) &priority=5
--- a/scripts/base/protocols/http/main.bro
+++ b/scripts/base/protocols/http/main.bro
@ -242,7 +242,7 @@ event http_header(c: connection, is_orig: bool, name: string, value: string) &pr

 		else if ( name == "HOST" )
 			# The split is done to remove the occasional port value that shows up here.
-			c$http$host = split1(value, /:/)[1];
+			c$http$host = split_string1(value, /:/)[0];

 		else if ( name == "RANGE" )
 			c$http$range_request = T;
@ -262,12 +262,12 @@ event http_header(c: connection, is_orig: bool, name: string, value: string) &pr
 			if ( /^[bB][aA][sS][iI][cC] / in value )
 				{
 				local userpass = decode_base64(sub(value, /[bB][aA][sS][iI][cC][[:blank:]]/, ""));
-				local up = split(userpass, /:/);
+				local up = split_string(userpass, /:/);
 				if ( |up| >= 2 )
 					{
-					c$http$username = up[1];
+					c$http$username = up[0];
 					if ( c$http$capture_password )
-						c$http$password = up[2];
+						c$http$password = up[1];
 					}
 				else
 					{
--- a/scripts/base/protocols/http/utils.bro
+++ b/scripts/base/protocols/http/utils.bro
@ -42,12 +42,12 @@ function extract_keys(data: string, kv_splitter: pattern): string_vec
 	{
 	local key_vec: vector of string = vector();
 	
-	local parts = split(data, kv_splitter);
+	local parts = split_string(data, kv_splitter);
 	for ( part_index in parts )
 		{
-		local key_val = split1(parts[part_index], /=/);
-		if ( 1 in key_val )
-			key_vec[|key_vec|] = key_val[1];
+		local key_val = split_string1(parts[part_index], /=/);
+		if ( 0 in key_val )
+			key_vec[|key_vec|] = key_val[0];
 		}
 	return key_vec;
 	}
--- a/scripts/base/protocols/irc/files.bro
+++ b/scripts/base/protocols/irc/files.bro
@ -12,6 +12,10 @@ export {

 	## Default file handle provider for IRC.
 	global get_file_handle: function(c: connection, is_orig: bool): string;
+
+	redef record fa_file += {
+		irc: IRC::Info &optional;
+	};
 }

 function get_file_handle(c: connection, is_orig: bool): string
@ -34,6 +38,12 @@ event file_over_new_connection(f: fa_file, c: connection, is_orig: bool) &priori
 	irc$fuid = f$id;
 	if ( irc?$dcc_file_name )
 		f$info$filename = irc$dcc_file_name;
-	if ( f?$mime_type )
-		irc$dcc_mime_type = f$mime_type;
+
+	f$irc = irc;
+	}
+
+event file_mime_type(f: fa_file, mime_type: string) &priority=5
+	{
+	if ( f?$irc )
+		f$irc$dcc_mime_type = mime_type;
 	}
--- a/scripts/base/protocols/mysql/main.bro
+++ b/scripts/base/protocols/mysql/main.bro
@ -18,8 +18,10 @@ export {
 		cmd:	string	&log;
 		## The argument issued to the command
 		arg:	string	&log;
-		## The result (error, OK, etc.) from the server
-		result: string &log &optional;
+		## Did the server tell us that the command succeeded?
+		success: bool &log &optional;
+		## The number of affected rows, if any
+		rows: count &log &optional;
 		## Server message, if any
 		response: string &log &optional;
 	};
@ -57,8 +59,14 @@ event mysql_handshake(c: connection, username: string)

 event mysql_command_request(c: connection, command: count, arg: string) &priority=5
 	{
-	if ( ! c?$mysql )
+	if ( c?$mysql )
 		{
+		# We got a request, but we haven't logged our
+		# previous request yet, so let's do that now.
+		Log::write(mysql::LOG, c$mysql);
+		delete c$mysql;
+		}
+
 	local info: Info;
 	info$ts = network_time();
 	info$uid = c$uid;
@ -67,7 +75,6 @@ event mysql_command_request(c: connection, command: count, arg: string) &priorit
 	info$arg = sub(arg, /\0$/, "");
 	c$mysql = info;
 	}
-	}

 event mysql_command_request(c: connection, command: count, arg: string) &priority=-5
 	{
@ -83,7 +90,7 @@ event mysql_error(c: connection, code: count, msg: string) &priority=5
 	{
 	if ( c?$mysql )
 		{
-		c$mysql$result = "error";
+		c$mysql$success = F;
 		c$mysql$response = msg;
 		}
 	}
@ -101,8 +108,8 @@ event mysql_ok(c: connection, affected_rows: count) &priority=5
 	{
 	if ( c?$mysql )
 		{
-		c$mysql$result = "ok";
-		c$mysql$response = fmt("Affected rows: %d", affected_rows);
+		c$mysql$success = T;
+		c$mysql$rows = affected_rows;
 		}
 	}

@ -114,3 +121,12 @@ event mysql_ok(c: connection, affected_rows: count) &priority=-5
 		delete c$mysql;
 		}
 	}
+
+event connection_state_remove(c: connection) &priority=-5
+	{
+	if ( c?$mysql )
+		{
+		Log::write(mysql::LOG, c$mysql);
+		delete c$mysql;
+		}
+	}
--- a/scripts/base/protocols/smtp/main.bro
+++ b/scripts/base/protocols/smtp/main.bro
@ -98,7 +98,7 @@ event bro_init() &priority=5

 function find_address_in_smtp_header(header: string): string
 {
-	local ips = find_ip_addresses(header);
+	local ips = extract_ip_addresses(header);
 	# If there are more than one IP address found, return the second.
 	if ( |ips| > 1 )
 		return ips[1];
@ -163,7 +163,7 @@ event smtp_request(c: connection, is_orig: bool, command: string, arg: string) &
 		{
 		if ( ! c$smtp?$rcptto )
 			c$smtp$rcptto = set();
-		add c$smtp$rcptto[split1(arg, /:[[:blank:]]*/)[2]];
+		add c$smtp$rcptto[split_string1(arg, /:[[:blank:]]*/)[1]];
 		c$smtp$has_client_activity = T;
 		}

@ -172,8 +172,8 @@ event smtp_request(c: connection, is_orig: bool, command: string, arg: string) &
 		# Flush last message in case we didn't see the server's acknowledgement.
 		smtp_message(c);

-		local partially_done = split1(arg, /:[[:blank:]]*/)[2];
-		c$smtp$mailfrom = split1(partially_done, /[[:blank:]]?/)[1];
+		local partially_done = split_string1(arg, /:[[:blank:]]*/)[1];
+		c$smtp$mailfrom = split_string1(partially_done, /[[:blank:]]?/)[0];
 		c$smtp$has_client_activity = T;
 		}
 	}
@ -234,14 +234,14 @@ event mime_one_header(c: connection, h: mime_header_rec) &priority=5
 		if ( ! c$smtp?$to )
 			c$smtp$to = set();

-		local to_parts = split(h$value, /[[:blank:]]*,[[:blank:]]*/);
+		local to_parts = split_string(h$value, /[[:blank:]]*,[[:blank:]]*/);
 		for ( i in to_parts )
 			add c$smtp$to[to_parts[i]];
 		}

 	else if ( h$name == "X-ORIGINATING-IP" )
 		{
-		local addresses = find_ip_addresses(h$value);
+		local addresses = extract_ip_addresses(h$value);
-		if ( 1 in addresses )
-			c$smtp$x_originating_ip = to_addr(addresses[1]);
+		if ( 0 in addresses )
+			c$smtp$x_originating_ip = to_addr(addresses[0]);
 		}
--- a/scripts/base/protocols/ssl/consts.bro
+++ b/scripts/base/protocols/ssl/consts.bro
@ -158,12 +158,11 @@ export {
 		[26] = "brainpoolP256r1",
 		[27] = "brainpoolP384r1",
 		[28] = "brainpoolP512r1",
-		# draft-ietf-tls-negotiated-ff-dhe-02
-		[256] = "ffdhe2432",
+		# draft-ietf-tls-negotiated-ff-dhe-05
+		[256] = "ffdhe2048",
 		[257] = "ffdhe3072",
 		[258] = "ffdhe4096",
-		[259] = "ffdhe6144",
-		[260] = "ffdhe8192",
+		[259] = "ffdhe8192",
 		[0xFF01] = "arbitrary_explicit_prime_curves",
 		[0xFF02] = "arbitrary_explicit_char2_curves"
 	} &default=function(i: count):string { return fmt("unknown-%d", i); };
--- a/scripts/base/utils/active-http.bro
+++ b/scripts/base/utils/active-http.bro
@ -105,21 +105,21 @@ function request(req: Request): ActiveHTTP::Response
 			# The reply is the first line.
 			if ( i == 0 )
 				{
-				local response_line = split_n(headers[0], /[[:blank:]]+/, F, 2);
+				local response_line = split_string_n(headers[0], /[[:blank:]]+/, F, 2);
 				if ( |response_line| != 3 )
 					return resp;

-				resp$code = to_count(response_line[2]);
-				resp$msg = response_line[3];
+				resp$code = to_count(response_line[1]);
+				resp$msg = response_line[2];
 				resp$body = join_string_vec(result$files[bodyfile], "");
 				}
 			else
 				{
 				local line = headers[i];
-				local h = split1(line, /:/);
+				local h = split_string1(line, /:/);
 				if ( |h| != 2 )
 					next;
-				resp$headers[h[1]] = sub_bytes(h[2], 0, |h[2]|-1);
+				resp$headers[h[0]] = sub_bytes(h[1], 0, |h[1]|-1);
 				}
 			}
 		return resp;
--- a/scripts/base/utils/addrs.bro
+++ b/scripts/base/utils/addrs.bro
@ -32,7 +32,7 @@ const ip_addr_regex =
 ## octets: an array of strings to check for valid octet values.
 ##
 ## Returns: T if every element is between 0 and 255, inclusive, else F.
-function has_valid_octets(octets: string_array): bool
+function has_valid_octets(octets: string_vec): bool
 	{
 	local num = 0;
 	for ( i in octets )
@ -51,10 +51,10 @@ function has_valid_octets(octets: string_array): bool
 ## Returns: T if the string is a valid IPv4 or IPv6 address format.
 function is_valid_ip(ip_str: string): bool
 	{
-	local octets: string_array;
+	local octets: string_vec;
 	if ( ip_str == ipv4_addr_regex )
 		{
-		octets = split(ip_str, /\./);
+		octets = split_string(ip_str, /\./);
 		if ( |octets| != 4 )
 			return F;

@ -67,13 +67,13 @@ function is_valid_ip(ip_str: string): bool
 			{
 			# the regexes for hybrid IPv6-IPv4 address formats don't for valid
 			# octets within the IPv4 part, so do that now
-			octets = split(ip_str, /\./);
+			octets = split_string(ip_str, /\./);
 			if ( |octets| != 4 )
 				return F;

 			# get rid of remaining IPv6 stuff in first octet
-			local tmp = split(octets[1], /:/);
-			octets[1] = tmp[|tmp|];
+			local tmp = split_string(octets[0], /:/);
+			octets[0] = tmp[|tmp| - 1];

 			return has_valid_octets(octets);
 			}
@ -92,14 +92,32 @@ function is_valid_ip(ip_str: string): bool
 ## input: a string that may contain an IP address anywhere within it.
 ##
 ## Returns: an array containing all valid IP address strings found in *input*.
-function find_ip_addresses(input: string): string_array
+function find_ip_addresses(input: string): string_array &deprecated
 	{
-	local parts = split_all(input, ip_addr_regex);
+	local parts = split_string_all(input, ip_addr_regex);
 	local output: string_array;

 	for ( i in parts )
 		{
-		if ( i % 2 == 0 && is_valid_ip(parts[i]) )
+		if ( i % 2 == 1 && is_valid_ip(parts[i]) )
+			output[|output|] = parts[i];
+		}
+	return output;
+	}
+
+## Extracts all IP (v4 or v6) address strings from a given string.
+##
+## input: a string that may contain an IP address anywhere within it.
+##
+## Returns: a vector containing all valid IP address strings found in *input*.
+function extract_ip_addresses(input: string): string_vec
+	{
+	local parts = split_string_all(input, ip_addr_regex);
+	local output: string_vec;
+
+	for ( i in parts )
+		{
+		if ( i % 2 == 1 && is_valid_ip(parts[i]) )
 			output[|output|] = parts[i];
 		}
 	return output;
--- a/scripts/base/utils/exec.bro
+++ b/scripts/base/utils/exec.bro
@ -82,9 +82,9 @@ event Exec::line(description: Input::EventDescription, tpe: Input::Event, s: str

 event Exec::file_line(description: Input::EventDescription, tpe: Input::Event, s: string)
 	{
-	local parts = split1(description$name, /_/);
-	local name = parts[1];
-	local track_file = parts[2];
+	local parts = split_string1(description$name, /_/);
+	local name = parts[0];
+	local track_file = parts[1];

 	local result = results[name];
 	if ( ! result?$files )
@ -99,13 +99,13 @@ event Exec::file_line(description: Input::EventDescription, tpe: Input::Event, s
 event Input::end_of_data(orig_name: string, source:string)
 	{
 	local name = orig_name;
-	local parts = split1(name, /_/);
-	name = parts[1];
+	local parts = split_string1(name, /_/);
+	name = parts[0];

 	if ( name !in pending_commands || |parts| < 2 )
 		return;

-	local track_file = parts[2];
+	local track_file = parts[1];

 	# If the file is empty, still add it to the result$files table. This is needed
 	# because it is expected that the file was read even if it was empty.
--- a/scripts/base/utils/files.bro
+++ b/scripts/base/utils/files.bro
@ -23,7 +23,7 @@ function extract_filename_from_content_disposition(data: string): string

 	# Remove quotes around the filename if they are there.
 	if ( /^\"/ in filename )
-		filename = split_n(filename, /\"/, F, 2)[2];
+		filename = split_string_n(filename, /\"/, F, 2)[1];

 	# Remove the language and encoding if it's there.
 	if ( /^[a-zA-Z0-9\!#$%&+-^_`{}~]+'[a-zA-Z0-9\!#$%&+-^_`{}~]*'/ in filename )
--- a/scripts/base/utils/numbers.bro
+++ b/scripts/base/utils/numbers.bro
@ -2,9 +2,9 @@
 ## If no integer can be found, 0 is returned.
 function extract_count(s: string): count
 	{
-	local parts = split_n(s, /[0-9]+/, T, 1);
-	if ( 2 in parts )
-		return to_count(parts[2]);
+	local parts = split_string_n(s, /[0-9]+/, T, 1);
+	if ( 1 in parts )
+		return to_count(parts[1]);
 	else
 		return 0;
 	}
--- a/scripts/base/utils/paths.bro
+++ b/scripts/base/utils/paths.bro
@ -13,12 +13,12 @@ const absolute_path_pat = /(\/|[A-Za-z]:[\\\/]).*/;
 function extract_path(input: string): string
 	{
 	const dir_pattern = /(\/|[A-Za-z]:[\\\/])([^\"\ ]|(\\\ ))*/;
-	local parts = split_all(input, dir_pattern);
+	local parts = split_string_all(input, dir_pattern);

 	if ( |parts| < 3 )
 		return "";

-	return parts[2];
+	return parts[1];
 	}

 ## Compresses a given path by removing '..'s and the parent directory it
@ -31,27 +31,27 @@ function compress_path(dir: string): string
 	{
 	const cdup_sep = /((\/)*([^\/]|\\\/)+)?((\/)+\.\.(\/)*)/;

-	local parts = split_n(dir, cdup_sep, T, 1);
+	local parts = split_string_n(dir, cdup_sep, T, 1);
 	if ( |parts| > 1 )
 		{
 		# reaching a point with two parent dir references back-to-back means
 		# we don't know about anything higher in the tree to pop off
-		if ( parts[2] == "../.." )
-			return cat_string_array(parts);
-		if ( sub_bytes(parts[2], 0, 1) == "/" )
-			parts[2] = "/";
+		if ( parts[1] == "../.." )
+			return join_string_vec(parts, "");
+		if ( sub_bytes(parts[1], 0, 1) == "/" )
+			parts[1] = "/";
 		else
-			parts[2] = "";
-		dir = cat_string_array(parts);
+			parts[1] = "";
+		dir = join_string_vec(parts, "");
 		return compress_path(dir);
 		}

 	const multislash_sep = /(\/\.?){2,}/;
-	parts = split_all(dir, multislash_sep);
+	parts = split_string_all(dir, multislash_sep);
 	for ( i in parts )
-		if ( i % 2 == 0 )
+		if ( i % 2 == 1 )
 			parts[i] = "/";
-	dir = cat_string_array(parts);
+	dir = join_string_vec(parts, "");

 	# remove trailing slashes from path
 	if ( |dir| > 1 && sub_bytes(dir, |dir|, 1) == "/" )
--- a/scripts/base/utils/patterns.bro
+++ b/scripts/base/utils/patterns.bro
@ -50,11 +50,11 @@ type PatternMatchResult: record {
 ## Returns: a record indicating the match status.
 function match_pattern(s: string, p: pattern): PatternMatchResult
 	{
-	local a = split_n(s, p, T, 1);
+	local a = split_string_n(s, p, T, 1);

 	if ( |a| == 1 )
 		# no match
 		return [$matched = F, $str = "", $off = 0];
 	else
-		return [$matched = T, $str = a[2], $off = |a[1]| + 1];
+		return [$matched = T, $str = a[1], $off = |a[0]| + 1];
 	}
--- a/scripts/base/utils/urls.bro
+++ b/scripts/base/utils/urls.bro
@ -48,7 +48,7 @@ function find_all_urls_without_scheme(s: string): string_set

 function decompose_uri(s: string): URI
 	{
-	local parts: string_array;
+	local parts: string_vec;
 	local u: URI = [$netlocation="", $path="/"];

 	if ( /\?/ in s)
@ -56,55 +56,55 @@ function decompose_uri(s: string): URI
 		# Parse query.
 		u$params = table();

-		parts = split1(s, /\?/);
-		s = parts[1];
-		local query: string = parts[2];
+		parts = split_string1(s, /\?/);
+		s = parts[0];
+		local query: string = parts[1];

 		if ( /&/ in query )
 			{
-			local opv: table[count] of string = split(query, /&/);
+			local opv = split_string(query, /&/);

 			for ( each in opv )
 				{
 				if ( /=/ in opv[each] )
 					{
-					parts = split1(opv[each], /=/);
-					u$params[parts[1]] = parts[2];
+					parts = split_string1(opv[each], /=/);
+					u$params[parts[0]] = parts[1];
 					}
 				}
 			}
 		else
 			{
-			parts = split1(query, /=/);
-			u$params[parts[1]] = parts[2];
+			parts = split_string1(query, /=/);
+			u$params[parts[0]] = parts[1];
 			}
 		}

 	if ( /:\/\// in s )
 		{
 		# Parse scheme and remove from s.
-		parts = split1(s, /:\/\//);
-		u$scheme = parts[1];
-		s = parts[2];
+		parts = split_string1(s, /:\/\//);
+		u$scheme = parts[0];
+		s = parts[1];
 		}

 	if ( /\// in s )
 		{
 		# Parse path and remove from s.
-		parts = split1(s, /\//);
-		s = parts[1];
-		u$path = fmt("/%s", parts[2]);
+		parts = split_string1(s, /\//);
+		s = parts[0];
+		u$path = fmt("/%s", parts[1]);

 		if ( |u$path| > 1 && u$path[|u$path| - 1] != "/" )
 			{
 			local last_token: string = find_last(u$path, /\/.+/);
-			local full_filename = split1(last_token, /\//)[2];
+			local full_filename = split_string1(last_token, /\//)[1];

 			if ( /\./ in full_filename )
 				{
 				u$file_name = full_filename;
-				u$file_base = split1(full_filename, /\./)[1];
-				u$file_ext = split1(full_filename, /\./)[2];
+				u$file_base = split_string1(full_filename, /\./)[0];
+				u$file_ext = split_string1(full_filename, /\./)[1];
 				}
 			else
 				{
@ -117,9 +117,9 @@ function decompose_uri(s: string): URI
 	if ( /:/ in s )
 		{
 		# Parse location and port.
-		parts = split1(s, /:/);
-		u$netlocation = parts[1];
-		u$portnum = to_count(parts[2]);
+		parts = split_string1(s, /:/);
+		u$netlocation = parts[0];
+		u$portnum = to_count(parts[1]);
 		}
 	else
 		u$netlocation = s;
--- a/scripts/policy/frameworks/files/detect-MHR.bro
+++ b/scripts/policy/frameworks/files/detect-MHR.bro
@ -42,15 +42,15 @@ function do_mhr_lookup(hash: string, fi: Notice::FileInfo)
 	when ( local MHR_result = lookup_hostname_txt(hash_domain) )
 		{
 		# Data is returned as "<dateFirstDetected> <detectionRate>"
-		local MHR_answer = split1(MHR_result, / /);
+		local MHR_answer = split_string1(MHR_result, / /);

 		if ( |MHR_answer| == 2 )
 			{
-			local mhr_detect_rate = to_count(MHR_answer[2]);
+			local mhr_detect_rate = to_count(MHR_answer[1]);

 			if ( mhr_detect_rate >= notice_threshold )
 				{
-				local mhr_first_detected = double_to_time(to_double(MHR_answer[1]));
+				local mhr_first_detected = double_to_time(to_double(MHR_answer[0]));
 				local readable_first_detected = strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected);
 				local message = fmt("Malware Hash Registry Detection rate: %d%%  Last seen: %s", mhr_detect_rate, readable_first_detected);
 				local virustotal_url = fmt(match_sub_url, hash);
@ -66,6 +66,7 @@ function do_mhr_lookup(hash: string, fi: Notice::FileInfo)

 event file_hash(f: fa_file, kind: string, hash: string)
 	{
-	if ( kind == "sha1" && f?$mime_type && match_file_types in f$mime_type )
+	if ( kind == "sha1" && f?$info && f$info?$mime_type &&
+	     match_file_types in f$info$mime_type )
 		do_mhr_lookup(hash, Notice::create_file_info(f));
 	}
--- a/scripts/policy/frameworks/intel/seen/http-headers.bro
+++ b/scripts/policy/frameworks/intel/seen/http-headers.bro
@ -31,7 +31,7 @@ event http_header(c: connection, is_orig: bool, name: string, value: string)
 			case "X-FORWARDED-FOR":
 			if ( is_valid_ip(value) )
 				{
-				local addrs = find_ip_addresses(value);
+				local addrs = extract_ip_addresses(value);
 				for ( i in addrs )
 					{
 					Intel::seen([$host=to_addr(addrs[i]),
--- a/scripts/policy/frameworks/intel/seen/smtp.bro
+++ b/scripts/policy/frameworks/intel/seen/smtp.bro
@ -30,10 +30,10 @@ event mime_end_entity(c: connection)

 		if ( c$smtp?$mailfrom )
 			{
-			local mailfromparts = split_n(c$smtp$mailfrom, /<.+>/, T, 1);
+			local mailfromparts = split_string_n(c$smtp$mailfrom, /<.+>/, T, 1);
 			if ( |mailfromparts| > 2 )
 				{
-				Intel::seen([$indicator=mailfromparts[2][1:-2],
+				Intel::seen([$indicator=mailfromparts[1][1:-2],
 				             $indicator_type=Intel::EMAIL,
 				             $conn=c,
 				             $where=SMTP::IN_MAIL_FROM]);
@ -44,10 +44,10 @@ event mime_end_entity(c: connection)
 			{
 			for ( rcptto in c$smtp$rcptto )
 				{
-				local rcpttoparts = split_n(rcptto, /<.+>/, T, 1);
+				local rcpttoparts = split_string_n(rcptto, /<.+>/, T, 1);
 				if ( |rcpttoparts| > 2 )
 					{
-					Intel::seen([$indicator=rcpttoparts[2][1:-2],
+					Intel::seen([$indicator=rcpttoparts[1][1:-2],
 					             $indicator_type=Intel::EMAIL,
 					             $conn=c,
 					             $where=SMTP::IN_RCPT_TO]);
@ -57,10 +57,10 @@ event mime_end_entity(c: connection)

 		if ( c$smtp?$from )
 			{
-			local fromparts = split_n(c$smtp$from, /<.+>/, T, 1);
+			local fromparts = split_string_n(c$smtp$from, /<.+>/, T, 1);
 			if ( |fromparts| > 2 )
 				{
-				Intel::seen([$indicator=fromparts[2][1:-2],
+				Intel::seen([$indicator=fromparts[1][1:-2],
 				             $indicator_type=Intel::EMAIL,
 				             $conn=c,
 				             $where=SMTP::IN_FROM]);
@ -71,10 +71,10 @@ event mime_end_entity(c: connection)
 			{
 			for ( email_to in c$smtp$to )
 				{
-				local toparts = split_n(email_to, /<.+>/, T, 1);
+				local toparts = split_string_n(email_to, /<.+>/, T, 1);
 				if ( |toparts| > 2 )
 					{
-					Intel::seen([$indicator=toparts[2][1:-2],
+					Intel::seen([$indicator=toparts[1][1:-2],
 					             $indicator_type=Intel::EMAIL,
 					             $conn=c,
 					             $where=SMTP::IN_TO]);
@ -84,10 +84,10 @@ event mime_end_entity(c: connection)

 		if ( c$smtp?$reply_to )
 			{
-			local replytoparts = split_n(c$smtp$reply_to, /<.+>/, T, 1);
+			local replytoparts = split_string_n(c$smtp$reply_to, /<.+>/, T, 1);
 			if ( |replytoparts| > 2 )
 				{
-				Intel::seen([$indicator=replytoparts[2][1:-2],
+				Intel::seen([$indicator=replytoparts[1][1:-2],
 				             $indicator_type=Intel::EMAIL,
 				             $conn=c,
 				             $where=SMTP::IN_REPLY_TO]);
--- a/scripts/policy/frameworks/software/vulnerable.bro
+++ b/scripts/policy/frameworks/software/vulnerable.bro
@ -55,18 +55,18 @@ function decode_vulnerable_version_range(vuln_sw: string): VulnerableVersionRang
 		return vvr;
 		}

-	local versions = split1(vuln_sw, /\x09/);
+	local versions = split_string1(vuln_sw, /\x09/);

 	for ( i in versions )
 		{
-		local field_and_ver = split1(versions[i], /=/);
+		local field_and_ver = split_string1(versions[i], /=/);
 		if ( |field_and_ver| != 2 )
 			return vvr; #failure!

-		local ver = Software::parse(field_and_ver[2])$version;
-		if ( field_and_ver[1] == "min" )
+		local ver = Software::parse(field_and_ver[1])$version;
+		if ( field_and_ver[0] == "min" )
 			vvr$min = ver;
-		else if ( field_and_ver[1] == "max" )
+		else if ( field_and_ver[0] == "max" )
 			vvr$max = ver;
 		}

@ -84,15 +84,15 @@ event grab_vulnerable_versions(i: count)

 	when ( local result = lookup_hostname_txt(cat(i,".",vulnerable_versions_update_endpoint)) )
 		{
-		local parts = split1(result, /\x09/);
+		local parts = split_string1(result, /\x09/);
 		if ( |parts| != 2 ) #failure or end of list!
 			{
 			schedule vulnerable_versions_update_interval { grab_vulnerable_versions(1) };
 			return;
 			}

-		local sw = parts[1];
-		local vvr = decode_vulnerable_version_range(parts[2]);
+		local sw = parts[0];
+		local vvr = decode_vulnerable_version_range(parts[1]);
 		if ( sw !in internal_vulnerable_versions )
 			internal_vulnerable_versions[sw] = set();
 		add internal_vulnerable_versions[sw][vvr];
--- a/scripts/policy/misc/detect-traceroute/main.bro
+++ b/scripts/policy/misc/detect-traceroute/main.bro
@ -74,10 +74,10 @@ event bro_init() &priority=5
 	                  $threshold=icmp_time_exceeded_threshold,
 	                  $threshold_crossed(key: SumStats::Key, result: SumStats::Result) =
 	                  	{
-	                  	local parts = split_n(key$str, /-/, F, 2);
-	                  	local src = to_addr(parts[1]);
-	                  	local dst = to_addr(parts[2]);
-	                  	local proto = parts[3];
+	                  	local parts = split_string_n(key$str, /-/, F, 2);
+	                  	local src = to_addr(parts[0]);
+	                  	local dst = to_addr(parts[1]);
+	                  	local proto = parts[2];
 	                  	Log::write(LOG, [$ts=network_time(), $src=src, $dst=dst, $proto=proto]);
 	                  	NOTICE([$note=Traceroute::Detected,
 	                  	        $msg=fmt("%s seems to be running traceroute using %s", src, proto),
--- a/scripts/policy/protocols/http/software-browser-plugins.bro
+++ b/scripts/policy/protocols/http/software-browser-plugins.bro
@ -45,13 +45,13 @@ event log_http(rec: Info)
 	if ( rec$omniture && rec?$uri )
 		{
 		# We do {5,} because sometimes we see p=6 in the urls.
-		local parts = split_n(rec$uri, /&p=([^&]{5,});&/, T, 1);
-		if ( 2 in parts )
+		local parts = split_string_n(rec$uri, /&p=([^&]{5,});&/, T, 1);
+		if ( 1 in parts )
 			{
 			# We do sub_bytes here just to remove the extra extracted 
 			# characters from the regex split above.
-			local sw = sub_bytes(parts[2], 4, |parts[2]|-5);
-			local plugins = split(sw, /[[:blank:]]*;[[:blank:]]*/);
+			local sw = sub_bytes(parts[1], 4, |parts[1]|-5);
+			local plugins = split_string(sw, /[[:blank:]]*;[[:blank:]]*/);
 			
 			for ( i in plugins )
 				Software::found(rec$id, [$unparsed_version=plugins[i], $host=rec$id$orig_h, $software_type=BROWSER_PLUGIN]);
--- a/scripts/policy/protocols/smtp/blocklists.bro
+++ b/scripts/policy/protocols/smtp/blocklists.bro
@ -47,7 +47,7 @@ event smtp_reply(c: connection, is_orig: bool, code: count, cmd: string,
 			local message = fmt("%s received an error message mentioning an SMTP block list", c$id$orig_h);

 			# Determine if the originator's IP address is in the message.
-			local ips = find_ip_addresses(msg);
+			local ips = extract_ip_addresses(msg);
 			local text_ip = "";
 			if ( |ips| > 0 && to_addr(ips[0]) == c$id$orig_h )
 				{
--- a/scripts/policy/protocols/ssl/notary.bro
+++ b/scripts/policy/protocols/ssl/notary.bro
@ -70,23 +70,23 @@ event ssl_established(c: connection) &priority=3
 			clear_waitlist(digest);
 			return;
 			}
-		local fields = split(str, / /);
+		local fields = split_string(str, / /);
 		if ( |fields| != 5 ) # version 1 has 5 fields.
 			{
 			clear_waitlist(digest);
 			return;
 			}
-		local version = split(fields[1], /=/)[2];
+		local version = split_string(fields[0], /=/)[1];
 		if ( version != "1" )
 			{
 			clear_waitlist(digest);
 			return;
 			}
 		local r = notary_cache[digest];
-		r$first_seen = to_count(split(fields[2], /=/)[2]);
-		r$last_seen = to_count(split(fields[3], /=/)[2]);
-		r$times_seen = to_count(split(fields[4], /=/)[2]);
-		r$valid = split(fields[5], /=/)[2] == "1";
+		r$first_seen = to_count(split_string(fields[1], /=/)[1]);
+		r$last_seen = to_count(split_string(fields[2], /=/)[1]);
+		r$times_seen = to_count(split_string(fields[3], /=/)[1]);
+		r$valid = split_string(fields[4], /=/)[1] == "1";

 		# Assign notary answer to all records waiting for this digest.
 		if ( digest in waitlist )
--- a/src/Attr.cc
+++ b/src/Attr.cc
@ -18,7 +18,7 @@ const char* attr_name(attr_tag t)
 		"&encrypt",
 		"&raw_output", "&mergeable", "&priority",
 		"&group", "&log", "&error_handler", "&type_column",
-		"(&tracked)",
+		"(&tracked)", "&deprecated",
 	};

 	return attr_names[int(t)];
@ -212,6 +212,7 @@ void Attributes::DescribeReST(ODesc* d) const
 void Attributes::CheckAttr(Attr* a)
 	{
 	switch ( a->Tag() ) {
+	case ATTR_DEPRECATED:
 	case ATTR_OPTIONAL:
 	case ATTR_REDEF:
 		break;
--- a/src/Attr.h
+++ b/src/Attr.h
@ -34,7 +34,8 @@ typedef enum {
 	ATTR_ERROR_HANDLER,
 	ATTR_TYPE_COLUMN,	// for input framework
 	ATTR_TRACKED,	// hidden attribute, tracked by NotifierRegistry
-#define NUM_ATTRS (int(ATTR_TRACKED) + 1)
+	ATTR_DEPRECATED,
+#define NUM_ATTRS (int(ATTR_DEPRECATED) + 1)
 } attr_tag;

 class Attr : public BroObj {
--- a/src/Expr.cc
+++ b/src/Expr.cc
@ -3213,6 +3213,10 @@ FieldExpr::FieldExpr(Expr* arg_op, const char* arg_field_name)
 			{
 			SetType(rt->FieldType(field)->Ref());
 			td = rt->FieldDecl(field);
+
+			if ( td->FindAttr(ATTR_DEPRECATED) )
+				reporter->Warning("deprecated (%s$%s)", rt->GetName().c_str(),
+				                  field_name);
 			}
 		}
 	}
@ -3333,6 +3337,9 @@ HasFieldExpr::HasFieldExpr(Expr* arg_op, const char* arg_field_name)

 		if ( field < 0 )
 			ExprError("no such field in record");
+		else if ( rt->FieldDecl(field)->FindAttr(ATTR_DEPRECATED) )
+			reporter->Warning("deprecated (%s?$%s)", rt->GetName().c_str(),
+			                  field_name);

 		SetType(base_type(TYPE_BOOL));
 		}
@ -4147,17 +4154,29 @@ RecordCoerceExpr::RecordCoerceExpr(Expr* op, RecordType* r)
 			}

 		for ( i = 0; i < map_size; ++i )
-			if ( map[i] == -1 &&
-			     ! t_r->FieldDecl(i)->FindAttr(ATTR_OPTIONAL) )
+			{
+			if ( map[i] == -1 )
+				{
+				if ( ! t_r->FieldDecl(i)->FindAttr(ATTR_OPTIONAL) )
 					{
 					char buf[512];
 					safe_snprintf(buf, sizeof(buf),
-					      "non-optional field \"%s\" missing", t_r->FieldName(i));
+					              "non-optional field \"%s\" missing",
+					              t_r->FieldName(i));
 					Error(buf);
 					SetError();
 					break;
 					}
 				}
+			else
+				{
+				if ( t_r->FieldDecl(i)->FindAttr(ATTR_DEPRECATED) )
+					reporter->Warning("deprecated (%s$%s)",
+					                  t_r->GetName().c_str(),
+					                  t_r->FieldName(i));
+				}
+			}
+		}
 	}

 RecordCoerceExpr::~RecordCoerceExpr()
--- a/src/Frag.cc
+++ b/src/Frag.cc
@ -28,7 +28,7 @@ void FragTimer::Dispatch(double t, int /* is_expire */)
 FragReassembler::FragReassembler(NetSessions* arg_s,
 			const IP_Hdr* ip, const u_char* pkt,
 			HashKey* k, double t)
-	: Reassembler(0, REASSEM_IP)
+	: Reassembler(0)
 	{
 	s = arg_s;
 	key = k;
--- a/src/Func.cc
+++ b/src/Func.cc
@ -323,7 +323,7 @@ int BroFunc::IsPure() const
 Val* BroFunc::Call(val_list* args, Frame* parent) const
 	{
 #ifdef PROFILE_BRO_FUNCTIONS
-	DEBUG_MSG("Function: %s\n", id->Name());
+	DEBUG_MSG("Function: %s\n", Name());
 #endif

 	SegmentProfiler(segment_logger, location);
--- a/src/ID.cc
+++ b/src/ID.cc
@ -248,6 +248,16 @@ void ID::UpdateValAttrs()
 		}
 	}

+void ID::MakeDeprecated()
+	{
+	if ( IsDeprecated() )
+		return;
+
+	attr_list* attr = new attr_list;
+	attr->append(new Attr(ATTR_DEPRECATED));
+	AddAttrs(new Attributes(attr, Type(), false));
+	}
+
 void ID::AddAttrs(Attributes* a)
 	{
 	if ( attrs )
--- a/src/ID.h
+++ b/src/ID.h
@ -80,6 +80,11 @@ public:
 	Attr* FindAttr(attr_tag t) const
 		{ return attrs ? attrs->FindAttr(t) : 0; }

+	bool IsDeprecated() const
+		{ return FindAttr(ATTR_DEPRECATED) != 0; }
+
+	void MakeDeprecated();
+
 	void Error(const char* msg, const BroObj* o2 = 0);

 	void Describe(ODesc* d) const;
--- a/src/Reassem.cc
+++ b/src/Reassem.cc
@ -31,7 +31,7 @@ DataBlock::DataBlock(const u_char* data, uint64 size, uint64 arg_seq,

 uint64 Reassembler::total_size = 0;

-Reassembler::Reassembler(uint64 init_seq, ReassemblerType arg_type)
+Reassembler::Reassembler(uint64 init_seq)
 	{
 	blocks = last_block = 0;
 	trim_seq = last_reassem_seq = init_seq;
--- a/src/Reassem.h
+++ b/src/Reassem.h
@ -22,11 +22,10 @@ public:
 };


-enum ReassemblerType { REASSEM_IP, REASSEM_TCP };

 class Reassembler : public BroObj {
 public:
-	Reassembler(uint64 init_seq, ReassemblerType arg_type);
+	Reassembler(uint64 init_seq);
 	virtual ~Reassembler();

 	void NewBlock(double t, uint64 seq, uint64 len, const u_char* data);
--- a/src/SerialTypes.h
+++ b/src/SerialTypes.h
@ -87,6 +87,7 @@ SERIAL_TCP_CONTENTS(TCP_NVT, 3)
 #define SERIAL_REASSEMBLER(name, val) SERIAL_CONST(name, val, REASSEMBLER)
 SERIAL_REASSEMBLER(REASSEMBLER, 1)
 SERIAL_REASSEMBLER(TCP_REASSEMBLER, 2)
+SERIAL_REASSEMBLER(FILE_REASSEMBLER, 3)

 #define SERIAL_VAL(name, val) SERIAL_CONST(name, val, VAL)
 SERIAL_VAL(VAL, 1)
--- a/src/Type.cc
+++ b/src/Type.cc
@ -1434,7 +1434,7 @@ EnumType::~EnumType()
 // Note, we use reporter->Error() here (not Error()) to include the current script
 // location in the error message, rather than the one where the type was
 // originally defined.
-void EnumType::AddName(const string& module_name, const char* name, bool is_export)
+void EnumType::AddName(const string& module_name, const char* name, bool is_export, bool deprecated)
 	{
 	/* implicit, auto-increment */
 	if ( counter < 0)
@ -1443,11 +1443,11 @@ void EnumType::AddName(const string& module_name, const char* name, bool is_expo
 		SetError();
 		return;
 		}
-	CheckAndAddName(module_name, name, counter, is_export);
+	CheckAndAddName(module_name, name, counter, is_export, deprecated);
 	counter++;
 	}

-void EnumType::AddName(const string& module_name, const char* name, bro_int_t val, bool is_export)
+void EnumType::AddName(const string& module_name, const char* name, bro_int_t val, bool is_export, bool deprecated)
 	{
 	/* explicit value specified */
 	if ( counter > 0 )
@ -1457,11 +1457,11 @@ void EnumType::AddName(const string& module_name, const char* name, bro_int_t va
 		return;
 		}
 	counter = -1;
-	CheckAndAddName(module_name, name, val, is_export);
+	CheckAndAddName(module_name, name, val, is_export, deprecated);
 	}

 void EnumType::CheckAndAddName(const string& module_name, const char* name,
-                               bro_int_t val, bool is_export)
+                               bro_int_t val, bool is_export, bool deprecated)
 	{
 	if ( Lookup(val) )
 		{
@ -1477,6 +1477,10 @@ void EnumType::CheckAndAddName(const string& module_name, const char* name,
 		id = install_ID(name, module_name.c_str(), true, is_export);
 		id->SetType(this->Ref());
 		id->SetEnumConst();
+
+		if ( deprecated )
+			id->MakeDeprecated();
+
 		broxygen_mgr->Identifier(id);
 		}
 	else
--- a/src/Type.h
+++ b/src/Type.h
@ -554,12 +554,12 @@ public:

 	// The value of this name is next internal counter value, starting
 	// with zero. The internal counter is incremented.
-	void AddName(const string& module_name, const char* name, bool is_export);
+	void AddName(const string& module_name, const char* name, bool is_export, bool deprecated);

 	// The value of this name is set to val. Once a value has been
 	// explicitly assigned using this method, no further names can be
 	// added that aren't likewise explicitly initialized.
-	void AddName(const string& module_name, const char* name, bro_int_t val, bool is_export);
+	void AddName(const string& module_name, const char* name, bro_int_t val, bool is_export, bool deprecated);

 	// -1 indicates not found.
 	bro_int_t Lookup(const string& module_name, const char* name) const;
@ -580,7 +580,8 @@ protected:
 			const char* name, bro_int_t val, bool is_export);

 	void CheckAndAddName(const string& module_name,
-	                     const char* name, bro_int_t val, bool is_export);
+	                     const char* name, bro_int_t val, bool is_export,
+	                     bool deprecated);

 	typedef std::map< const char*, bro_int_t, ltstr > NameMap;
 	NameMap names;
--- a/src/Var.cc
+++ b/src/Var.cc
@ -435,6 +435,10 @@ void end_func(Stmt* body, attr_list* attrs)
 		loop_over_list(*attrs, i)
 			{
 			Attr* a = (*attrs)[i];
+
+			if ( a->Tag() == ATTR_DEPRECATED )
+				continue;
+
 			if ( a->Tag() != ATTR_PRIORITY )
 				{
 				a->Error("illegal attribute for function body");
--- a/src/analyzer/protocol/dnp3/DNP3.cc
+++ b/src/analyzer/protocol/dnp3/DNP3.cc
@ -97,7 +97,6 @@
 //                                            Binpac DNP3 Analyzer

 #include "DNP3.h"
-#include "analyzer/protocol/tcp/TCP_Reassembler.h"
 #include "events.bif.h"

 using namespace analyzer::dnp3;
@ -109,12 +108,14 @@ const unsigned int PSEUDO_APP_LAYER_INDEX = 11;		// index of first DNP3 app-laye
 const unsigned int PSEUDO_TRANSPORT_LEN = 1;		// length of DNP3 Transport Layer
 const unsigned int PSEUDO_LINK_LAYER_LEN = 8;		// length of DNP3 Pseudo Link Layer

-bool DNP3_Analyzer::crc_table_initialized = false;
-unsigned int DNP3_Analyzer::crc_table[256];
+bool DNP3_Base::crc_table_initialized = false;
+unsigned int DNP3_Base::crc_table[256];

-DNP3_Analyzer::DNP3_Analyzer(Connection* c) : TCP_ApplicationAnalyzer("DNP3", c)
+
+DNP3_Base::DNP3_Base(analyzer::Analyzer* arg_analyzer)
 	{
-	interp = new binpac::DNP3::DNP3_Conn(this);
+	analyzer = arg_analyzer;
+	interp = new binpac::DNP3::DNP3_Conn(analyzer);

 	ClearEndpointState(true);
 	ClearEndpointState(false);
@ -123,49 +124,12 @@ DNP3_Analyzer::DNP3_Analyzer(Connection* c) : TCP_ApplicationAnalyzer("DNP3", c)
 		PrecomputeCRCTable();
 	}

-DNP3_Analyzer::~DNP3_Analyzer()
+DNP3_Base::~DNP3_Base()
 	{
 	delete interp;
 	}

-void DNP3_Analyzer::Done()
-	{
-	TCP_ApplicationAnalyzer::Done();
-
-	interp->FlowEOF(true);
-	interp->FlowEOF(false);
-	}
-
-void DNP3_Analyzer::DeliverStream(int len, const u_char* data, bool orig)
-	{
-	TCP_ApplicationAnalyzer::DeliverStream(len, data, orig);
-
-	try
-		{
-		if ( ! ProcessData(len, data, orig) )
-			SetSkip(1);
-		}
-
-	catch ( const binpac::Exception& e )
-		{
-		SetSkip(1);
-		throw;
-		}
-	}
-
-void DNP3_Analyzer::Undelivered(uint64 seq, int len, bool orig)
-	{
-	TCP_ApplicationAnalyzer::Undelivered(seq, len, orig);
-	interp->NewGap(orig, len);
-	}
-
-void DNP3_Analyzer::EndpointEOF(bool is_orig)
-	{
-	TCP_ApplicationAnalyzer::EndpointEOF(is_orig);
-	interp->FlowEOF(is_orig);
-	}
-
-bool DNP3_Analyzer::ProcessData(int len, const u_char* data, bool orig)
+bool DNP3_Base::ProcessData(int len, const u_char* data, bool orig)
 	{
 	Endpoint* endp = orig ? &orig_state : &resp_state;

@ -174,25 +138,30 @@ bool DNP3_Analyzer::ProcessData(int len, const u_char* data, bool orig)
 		if ( endp->in_hdr )
 			{
 			// We're parsing the DNP3 header and link layer, get that in full.
-			if ( ! AddToBuffer(endp, PSEUDO_APP_LAYER_INDEX, &data, &len) )
+			int res = AddToBuffer(endp, PSEUDO_APP_LAYER_INDEX, &data, &len);
+
+			if ( res == 0 )
 				return true;

+			if ( res < 0 )
+				return false;
+
 			// The first two bytes must always be 0x0564.
 			if( endp->buffer[0] != 0x05 || endp->buffer[1] != 0x64 )
 				{
-				Weird("dnp3_header_lacks_magic");
+				analyzer->Weird("dnp3_header_lacks_magic");
 				return false;
 				}

 			// Make sure header checksum is correct.
 			if ( ! CheckCRC(PSEUDO_LINK_LAYER_LEN, endp->buffer, endp->buffer + PSEUDO_LINK_LAYER_LEN, "header") )
 				{
-				ProtocolViolation("broken_checksum");
+				analyzer->ProtocolViolation("broken_checksum");
 				return false;
 				}

 			// If the checksum works out, we're pretty certainly DNP3.
-			ProtocolConfirmation();
+			analyzer->ProtocolConfirmation();

 			// DNP3 packets without transport and application
 			// layers can happen, we ignore them.
@ -207,7 +176,7 @@ bool DNP3_Analyzer::ProcessData(int len, const u_char* data, bool orig)
 			u_char ctrl = endp->buffer[PSEUDO_CONTROL_FIELD_INDEX];

 			if ( orig != (bool)(ctrl & 0x80) )
-				Weird("dnp3_unexpected_flow_direction");
+				analyzer->Weird("dnp3_unexpected_flow_direction");

 			// Update state.
 			endp->pkt_length = endp->buffer[PSEUDO_LENGTH_INDEX];
@ -222,7 +191,11 @@ bool DNP3_Analyzer::ProcessData(int len, const u_char* data, bool orig)

 		if ( ! endp->in_hdr )
 			{
-			assert(endp->pkt_length);
+			if ( endp->pkt_length <= 0 )
+				{
+				analyzer->Weird("dnp3_negative_or_zero_length_link_layer");
+				return false;
+				}

 			// We're parsing the DNP3 application layer, get that
 			// in full now as well. We calculate the number of
@ -230,11 +203,17 @@ bool DNP3_Analyzer::ProcessData(int len, const u_char* data, bool orig)
 			// the packet length by determining how many 16-byte
 			// chunks fit in there, and then add 2 bytes of CRC for
 			// each, counting a trailing partial chunk as one more.
-			int n = PSEUDO_APP_LAYER_INDEX + (endp->pkt_length - 5) + ((endp->pkt_length - 5) / 16) * 2 + 2 - 1;
+			int n = PSEUDO_APP_LAYER_INDEX + (endp->pkt_length - 5) + ((endp->pkt_length - 5) / 16) * 2
+					+ 2 * (((endp->pkt_length - 5) % 16 == 0) ? 0 : 1) - 1;

-			if ( ! AddToBuffer(endp, n, &data, &len) )
+			int res = AddToBuffer(endp, n, &data, &len);
+
+			if ( res == 0 )
 				return true;

+			if ( res < 0 )
+				return false;
+
 			// Parse the application layer data.
 			if ( ! ParseAppLayer(endp) )
 				return false;
@ -248,22 +227,45 @@ bool DNP3_Analyzer::ProcessData(int len, const u_char* data, bool orig)
 	return true;
 	}

-bool DNP3_Analyzer::AddToBuffer(Endpoint* endp, int target_len, const u_char** data, int* len)
+int DNP3_Base::AddToBuffer(Endpoint* endp, int target_len, const u_char** data, int* len)
 	{
 	if ( ! target_len )
-		return true;
+		return 1;
+
+	if ( *len < 0 )
+		{
+		reporter->AnalyzerError(analyzer, "dnp3 negative input length: %d", *len);
+		return -1;
+		}
+
+	if ( target_len < endp->buffer_len )
+		{
+		reporter->AnalyzerError(analyzer, "dnp3 invalid target length: %d - %d",
+		                        target_len, endp->buffer_len);
+		return -1;
+		}

 	int to_copy = min(*len, target_len - endp->buffer_len);

+	if ( endp->buffer_len + to_copy > MAX_BUFFER_SIZE )
+		{
+		reporter->AnalyzerError(analyzer, "dnp3 buffer length exceeded: %d + %d",
+		                        endp->buffer_len, to_copy);
+		return -1;
+		}
+
 	memcpy(endp->buffer + endp->buffer_len, *data, to_copy);
 	*data += to_copy;
 	*len -= to_copy;
 	endp->buffer_len += to_copy;

-	return endp->buffer_len == target_len;
+	if ( endp->buffer_len == target_len )
+		return 1;
+
+	return 0;
 	}

-bool DNP3_Analyzer::ParseAppLayer(Endpoint* endp)
+bool DNP3_Base::ParseAppLayer(Endpoint* endp)
 	{
 	bool orig = (endp == &orig_state);
 	binpac::DNP3::DNP3_Flow* flow = orig ? interp->upflow() : interp->downflow();
@ -291,8 +293,15 @@ bool DNP3_Analyzer::ParseAppLayer(Endpoint* endp)
 		if ( ! CheckCRC(n, data, data + n, "app_chunk") )
 			return false;

+		if ( data + n >= endp->buffer + endp->buffer_len )
+			{
+			reporter->AnalyzerError(analyzer,
+			                        "dnp3 app layer parsing overflow %d - %d",
+			                        endp->buffer_len, n);
+			return false;
+			}
+
 		// Pass on to BinPAC.
-		assert(data + n < endp->buffer + endp->buffer_len);
 		flow->flow_buffer()->BufferData(data + transport, data + n);
 		transport = 0;

@ -306,7 +315,7 @@ bool DNP3_Analyzer::ParseAppLayer(Endpoint* endp)
 	if ( ! is_first && ! endp->encountered_first_chunk )
 		{
 		// We lost the first chunk.
-		Weird("dnp3_first_application_layer_chunk_missing");
+		analyzer->Weird("dnp3_first_application_layer_chunk_missing");
 		return false;
 		}

@ -320,7 +329,7 @@ bool DNP3_Analyzer::ParseAppLayer(Endpoint* endp)
 	return true;
 	}

-void DNP3_Analyzer::ClearEndpointState(bool orig)
+void DNP3_Base::ClearEndpointState(bool orig)
 	{
 	Endpoint* endp = orig ? &orig_state : &resp_state;
 	binpac::DNP3::DNP3_Flow* flow = orig ? interp->upflow() : interp->downflow();
@ -333,18 +342,18 @@ void DNP3_Analyzer::ClearEndpointState(bool orig)
 	endp->pkt_cnt = 0;
 	}

-bool DNP3_Analyzer::CheckCRC(int len, const u_char* data, const u_char* crc16, const char* where)
+bool DNP3_Base::CheckCRC(int len, const u_char* data, const u_char* crc16, const char* where)
 	{
 	unsigned int crc = CalcCRC(len, data);

 	if ( crc16[0] == (crc & 0xff) && crc16[1] == (crc & 0xff00) >> 8 )
 		return true;

-	Weird(fmt("dnp3_corrupt_%s_checksum", where));
+	analyzer->Weird(fmt("dnp3_corrupt_%s_checksum", where));
 	return false;
 	}

-void DNP3_Analyzer::PrecomputeCRCTable()
+void DNP3_Base::PrecomputeCRCTable()
 	{
 	for( unsigned int i = 0; i < 256; i++)
 		{
@ -362,7 +371,7 @@ void DNP3_Analyzer::PrecomputeCRCTable()
 		}
 	}

-unsigned int DNP3_Analyzer::CalcCRC(int len, const u_char* data)
+unsigned int DNP3_Base::CalcCRC(int len, const u_char* data)
 	{
 	unsigned int crc = 0x0000;

@ -374,3 +383,76 @@ unsigned int DNP3_Analyzer::CalcCRC(int len, const u_char* data)

 	return ~crc & 0xFFFF;
 	}
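CheckCRC() compares the two CRC bytes, low byte first, against the value from the table-driven CalcCRC(). Below is a minimal bitwise sketch of the same checksum that can be used to cross-check the table. It assumes the standard CRC-16/DNP parameters: polynomial 0x3D65 processed least-significant bit first (0xA6BC in reflected form), initial value 0, and a final one's complement. PrecomputeCRCTable() is only partly visible in this hunk, so treat that parameter match as an assumption. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise (table-free) CRC-16/DNP, assuming reflected polynomial 0xA6BC,
// initial value 0x0000 and a final one's complement.
uint16_t dnp3_crc16(const uint8_t* data, size_t len)
	{
	uint16_t crc = 0x0000;

	for ( size_t i = 0; i < len; ++i )
		{
		crc ^= data[i];

		for ( int bit = 0; bit < 8; ++bit )
			crc = ( crc & 1 ) ? ( crc >> 1 ) ^ 0xA6BC : crc >> 1;
		}

	return static_cast<uint16_t>(~crc);
	}

// The CRC is transmitted low byte first, as in the comparison in CheckCRC().
bool dnp3_crc_ok(const uint8_t* data, size_t len, const uint8_t* crc16)
	{
	uint16_t crc = dnp3_crc16(data, len);
	return crc16[0] == ( crc & 0xff ) && crc16[1] == ( crc >> 8 );
	}
```

The published check value for CRC-16/DNP over the ASCII string "123456789" is 0xEA82. It makes a convenient unit test for either implementation.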
+
+DNP3_TCP_Analyzer::DNP3_TCP_Analyzer(Connection* c)
+	: DNP3_Base(this), TCP_ApplicationAnalyzer("DNP3_TCP", c)
+	{
+	}
+
+DNP3_TCP_Analyzer::~DNP3_TCP_Analyzer()
+	{
+	}
+
+void DNP3_TCP_Analyzer::Done()
+	{
+	TCP_ApplicationAnalyzer::Done();
+
+	Interpreter()->FlowEOF(true);
+	Interpreter()->FlowEOF(false);
+	}
+
+void DNP3_TCP_Analyzer::DeliverStream(int len, const u_char* data, bool orig)
+	{
+	TCP_ApplicationAnalyzer::DeliverStream(len, data, orig);
+
+	try
+		{
+		if ( ! ProcessData(len, data, orig) )
+			SetSkip(1);
+		}
+
+	catch ( const binpac::Exception& e )
+		{
+		SetSkip(1);
+		throw;
+		}
+	}
+
+void DNP3_TCP_Analyzer::Undelivered(uint64 seq, int len, bool orig)
+	{
+	TCP_ApplicationAnalyzer::Undelivered(seq, len, orig);
+	Interpreter()->NewGap(orig, len);
+	}
+
+void DNP3_TCP_Analyzer::EndpointEOF(bool is_orig)
+	{
+	TCP_ApplicationAnalyzer::EndpointEOF(is_orig);
+	Interpreter()->FlowEOF(is_orig);
+	}
+
+DNP3_UDP_Analyzer::DNP3_UDP_Analyzer(Connection* c)
+	: DNP3_Base(this), Analyzer("DNP3_UDP", c)
+	{
+	}
+
+DNP3_UDP_Analyzer::~DNP3_UDP_Analyzer()
+	{
+	}
+
+void DNP3_UDP_Analyzer::DeliverPacket(int len, const u_char* data, bool orig, uint64 seq, const IP_Hdr* ip, int caplen)
+	{
+	Analyzer::DeliverPacket(len, data, orig, seq, ip, caplen);
+
+	try
+		{
+		if ( ! ProcessData(len, data, orig) )
+			SetSkip(1);
+		}
+
+	catch ( const binpac::Exception& e )
+		{
+		SetSkip(1);
+		throw;
+		}
+	}
+
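The revised length computation in ProcessData() counts the CRC-protected user data that follows the link-layer header. The sketch below reproduces that arithmetic under the standard DNP3 link-layer framing, which it assumes:

* The LENGTH field counts control, destination and source (5 bytes) plus the user data, but no CRCs.
* The user data is split into 16-byte blocks.
* Every block, including a short final one, is followed by a 2-byte CRC.

The old formula always added one trailing CRC, so it overcounted by 2 whenever the user data was an exact multiple of 16 bytes. The 10-byte header constant assumes two start bytes, LENGTH, control, two destination bytes, two source bytes and the header CRC. The function names are hypothetical.

```cpp
#include <cassert>

// Bytes of user data plus per-block CRCs on the wire, for a frame whose
// LENGTH field is length_field (assumed to count 5 header bytes + user data).
int dnp3_user_data_wire_bytes(int length_field)
	{
	assert(length_field >= 5);

	int user = length_field - 5;
	int full_blocks = user / 16;
	int partial = ( user % 16 == 0 ) ? 0 : 1;

	// Every block, including a trailing partial one, carries a 2-byte CRC.
	return user + ( full_blocks + partial ) * 2;
	}

// Whole frame: assumed 10-byte header (including its CRC) plus the data blocks.
int dnp3_frame_size(int length_field)
	{
	return 10 + dnp3_user_data_wire_bytes(length_field);
	}
```

With the maximum LENGTH of 255, this gives a 292-byte frame, which fits in MAX_BUFFER_SIZE (300).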
--- a/src/analyzer/protocol/dnp3/DNP3.h
+++ b/src/analyzer/protocol/dnp3/DNP3.h
@ -3,24 +3,20 @@
 #define ANALYZER_PROTOCOL_DNP3_DNP3_H

 #include "analyzer/protocol/tcp/TCP.h"
+#include "analyzer/protocol/udp/UDP.h"
+
 #include "dnp3_pac.h"

 namespace analyzer { namespace dnp3 {

-class DNP3_Analyzer : public tcp::TCP_ApplicationAnalyzer {
+class DNP3_Base {
 public:
-	DNP3_Analyzer(Connection* conn);
-	virtual ~DNP3_Analyzer();
+	DNP3_Base(analyzer::Analyzer* analyzer);
+	virtual ~DNP3_Base();

-	virtual void Done();
-	virtual void DeliverStream(int len, const u_char* data, bool orig);
-	virtual void Undelivered(uint64 seq, int len, bool orig);
-	virtual void EndpointEOF(bool is_orig);
+	binpac::DNP3::DNP3_Conn* Interpreter()	{ return interp; }

-	static Analyzer* Instantiate(Connection* conn)
-		{ return new DNP3_Analyzer(conn); }
-
-private:
+protected:
 	static const int MAX_BUFFER_SIZE = 300;

 	struct Endpoint	{
@ -35,22 +31,64 @@ private:

 	bool ProcessData(int len, const u_char* data, bool orig);
 	void ClearEndpointState(bool orig);
-	bool AddToBuffer(Endpoint* endp, int target_len, const u_char** data, int* len);
+
+	/**
+	 * Buffers packet data until it reaches a specified length.
+	 * @param endp an endpoint speaking DNP3 to which data will be buffered.
+	 * @param target_len the required length of the buffer
+	 * @param data source buffer to copy bytes from.  Will be incremented
+	 * by the number of bytes copied by this function.
+	 * @param len the number of bytes available in \a data.  Will be decremented
+	 * by the number of bytes copied by this function.
+	 * @return -1 if invalid input parameters were supplied, 0 if the endpoint's
+	 * buffer is not yet \a target_len bytes in size, or 1 if the buffer is the
+	 * required size.
+	 */
+	int AddToBuffer(Endpoint* endp, int target_len, const u_char** data, int* len);
+
 	bool ParseAppLayer(Endpoint* endp);
 	bool CheckCRC(int len, const u_char* data, const u_char* crc16, const char* where);
 	unsigned int CalcCRC(int len, const u_char* data);

-	binpac::DNP3::DNP3_Conn* interp;
-
-	Endpoint orig_state;
-	Endpoint resp_state;
-
 	static void PrecomputeCRCTable();

 	static bool crc_table_initialized;
 	static unsigned int crc_table[256];
+
+	analyzer::Analyzer* analyzer;
+	binpac::DNP3::DNP3_Conn* interp;
+
+	Endpoint orig_state;
+	Endpoint resp_state;
 };

+class DNP3_TCP_Analyzer : public DNP3_Base, public tcp::TCP_ApplicationAnalyzer {
+public:
+	DNP3_TCP_Analyzer(Connection* conn);
+	virtual ~DNP3_TCP_Analyzer();
+
+	virtual void Done();
+	virtual void DeliverStream(int len, const u_char* data, bool orig);
+	virtual void Undelivered(uint64 seq, int len, bool orig);
+	virtual void EndpointEOF(bool is_orig);
+
+	static Analyzer* Instantiate(Connection* conn)
+		{ return new DNP3_TCP_Analyzer(conn); }
+};
+
+class DNP3_UDP_Analyzer : public DNP3_Base, public analyzer::Analyzer {
+public:
+	DNP3_UDP_Analyzer(Connection* conn);
+	virtual ~DNP3_UDP_Analyzer();
+
+	virtual void DeliverPacket(int len, const u_char* data, bool orig,
+                    uint64 seq, const IP_Hdr* ip, int caplen);
+
+	static analyzer::Analyzer* Instantiate(Connection* conn)
+		{ return new DNP3_UDP_Analyzer(conn); }
+};
+
+
 } } // namespace analyzer::*

 #endif
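AddToBuffer() now returns a tri-state int. This lets ProcessData() tell "need more data" (0: return and wait) apart from "bad input" (-1: return false). Below is a reduced sketch of that contract. It uses a hypothetical Endpoint struct that keeps only the buffer fields, mirrors the checks in the diff, and omits the reporter calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Hypothetical stand-in for DNP3_Base::Endpoint, reduced to the buffer fields.
struct Endpoint
	{
	static const int MAX_BUFFER_SIZE = 300;
	unsigned char buffer[MAX_BUFFER_SIZE];
	int buffer_len = 0;
	};

// Returns -1 on invalid input, 0 if more data is needed, or 1 once the
// buffer holds target_len bytes. Advances *data and shrinks *len by the
// number of bytes copied.
int add_to_buffer(Endpoint* endp, int target_len, const unsigned char** data, int* len)
	{
	if ( target_len == 0 )
		return 1;

	if ( *len < 0 || target_len < endp->buffer_len )
		return -1;

	int to_copy = std::min(*len, target_len - endp->buffer_len);

	if ( endp->buffer_len + to_copy > Endpoint::MAX_BUFFER_SIZE )
		return -1;

	std::memcpy(endp->buffer + endp->buffer_len, *data, to_copy);
	*data += to_copy;
	*len -= to_copy;
	endp->buffer_len += to_copy;

	return endp->buffer_len == target_len ? 1 : 0;
	}
```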
--- a/src/analyzer/protocol/dnp3/Plugin.cc
+++ b/src/analyzer/protocol/dnp3/Plugin.cc
@ -12,11 +12,12 @@ class Plugin : public plugin::Plugin {
 public:
 	plugin::Configuration Configure()
 		{
-		AddComponent(new ::analyzer::Component("DNP3", ::analyzer::dnp3::DNP3_Analyzer::Instantiate));
+		AddComponent(new ::analyzer::Component("DNP3_TCP", ::analyzer::dnp3::DNP3_TCP_Analyzer::Instantiate));
+		AddComponent(new ::analyzer::Component("DNP3_UDP", ::analyzer::dnp3::DNP3_UDP_Analyzer::Instantiate));

 		plugin::Configuration config;
 		config.name = "Bro::DNP3";
-		config.description = "DNP3 analyzer";
+		config.description = "DNP3 UDP/TCP analyzers";
 		return config;
 		}
 } plugin;
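The split into DNP3_Base plus DNP3_TCP_Analyzer and DNP3_UDP_Analyzer is a mixin. The shared code reaches the analyzer through a stored pointer (used for Weird() and AnalyzerError()) rather than by inheriting from it. Below is a stripped-down sketch of the pattern. All class names are hypothetical stand-ins, not Bro's classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for analyzer::Analyzer.
class Analyzer {
public:
	explicit Analyzer(const std::string& arg_name) : name(arg_name) { }
	virtual ~Analyzer() { }

	void Weird(const std::string& what) { last_weird = name + ": " + what; }

	std::string name;
	std::string last_weird;
};

// Hypothetical stand-in for tcp::TCP_ApplicationAnalyzer.
class TCPAppAnalyzer : public Analyzer {
public:
	explicit TCPAppAnalyzer(const std::string& arg_name) : Analyzer(arg_name) { }
};

// Shared protocol logic: holds a back-pointer instead of deriving from
// Analyzer, so it can be combined with any transport-specific parent.
class ProtoBase {
public:
	explicit ProtoBase(Analyzer* arg_analyzer) : analyzer(arg_analyzer) { }
	virtual ~ProtoBase() { }

protected:
	void ReportBadChecksum() { analyzer->Weird("corrupt_checksum"); }

	Analyzer* analyzer;
};

// The transport parent is listed first so it is constructed before `this`
// is converted to Analyzer* for the ProtoBase constructor.
class ProtoTCP : public TCPAppAnalyzer, public ProtoBase {
public:
	ProtoTCP() : TCPAppAnalyzer("PROTO_TCP"), ProtoBase(this) { }
	void Deliver() { ReportBadChecksum(); }
};

class ProtoUDP : public Analyzer, public ProtoBase {
public:
	ProtoUDP() : Analyzer("PROTO_UDP"), ProtoBase(this) { }
	void Deliver() { ReportBadChecksum(); }
};
```

The diff lists DNP3_Base first and only stores the pointer during construction. The sketch's ordering goes further and avoids converting `this` to a base class that has not started construction yet.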
--- a/src/analyzer/protocol/dnp3/dnp3-analyzer.pac
+++ b/src/analyzer/protocol/dnp3/dnp3-analyzer.pac
@ -38,7 +38,7 @@ flow DNP3_Flow(is_orig: bool) {
 		return true;
 		%}

-	function get_dnp3_application_request_header(fc: uint8): bool
+	function get_dnp3_application_request_header(application_control: uint8, fc: uint8): bool
 		%{
 		if ( ::dnp3_application_request_header )
 			{
@ -46,13 +46,14 @@ flow DNP3_Flow(is_orig: bool) {
 				connection()->bro_analyzer(),
 				connection()->bro_analyzer()->Conn(),
 				is_orig(),
+				application_control,
 				fc
 				);
 			}
 		return true;
 		%}

-	function get_dnp3_application_response_header(fc: uint8, iin: uint16): bool
+	function get_dnp3_application_response_header(application_control: uint8, fc: uint8, iin: uint16): bool
 		%{
 		if ( ::dnp3_application_response_header )
 			{
@ -60,6 +61,7 @@ flow DNP3_Flow(is_orig: bool) {
 				connection()->bro_analyzer(),
 				connection()->bro_analyzer()->Conn(),
 				is_orig(),
+				application_control,
 				fc,
 				iin
 				);
@ -743,11 +745,11 @@ refine typeattr Header_Block += &let {
 };

 refine typeattr DNP3_Application_Request_Header += &let {
-	process_request: bool =  $context.flow.get_dnp3_application_request_header(function_code);
+	process_request: bool =  $context.flow.get_dnp3_application_request_header(application_control, function_code);
 };

 refine typeattr DNP3_Application_Response_Header += &let {
-	process_request: bool =  $context.flow.get_dnp3_application_response_header(function_code, internal_indications);
+	process_request: bool =  $context.flow.get_dnp3_application_response_header(application_control, function_code, internal_indications);
 };

 refine typeattr Object_Header += &let {
--- a/src/analyzer/protocol/dnp3/dnp3-protocol.pac
+++ b/src/analyzer/protocol/dnp3/dnp3-protocol.pac
@ -90,7 +90,7 @@ type DNP3_Application_Response_Header = record {
 type Request_Objects(function_code: uint8) = record {
 	object_header: Object_Header(function_code);
 	data: case (object_header.object_type_field) of {
-		0x0c03 -> bocmd_PM: Request_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1 ];
+		0x0c03 -> bocmd_PM: Request_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1*( object_header.number_of_item > ( (object_header.number_of_item / 8)*8 ) ) ];
 		0x3202 -> time_interval_ojbects: Request_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ object_header.number_of_item]
 							&check( object_header.qualifer_field == 0x0f && object_header.number_of_item == 0x01);
 		default -> ojbects: Request_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ object_header.number_of_item];
@ -112,10 +112,10 @@ type Request_Objects(function_code: uint8) = record {
 type Response_Objects(function_code: uint8) = record {
 	object_header: Object_Header(function_code);
 	data: case (object_header.object_type_field) of {
-		0x0101 -> biwoflag: Response_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1 ];
-		0x0301 -> diwoflag: Response_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1 ];
-		0x0a01 -> bowoflag: Response_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1 ];
-		0x0c03 -> bocmd_PM: Response_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1 ];
+		0x0101 -> biwoflag: Response_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1*( object_header.number_of_item > ( (object_header.number_of_item / 8)*8 ) ) ];
+		0x0301 -> diwoflag: Response_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1*( object_header.number_of_item > ( (object_header.number_of_item / 8)*8 ) ) ];
+		0x0a01 -> bowoflag: Response_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1*( object_header.number_of_item > ( (object_header.number_of_item / 8)*8 ) )];
+		0x0c03 -> bocmd_PM: Response_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ ( object_header.number_of_item / 8 ) + 1*( object_header.number_of_item > ( (object_header.number_of_item / 8)*8 ) )];
 		default -> ojbects: Response_Data_Object(function_code, object_header.qualifier_field, object_header.object_type_field )[ object_header.number_of_item];
 	};
 };
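For the object types changed in dnp3-protocol.pac, the element count is derived from number_of_item as one byte per eight items. The old count, n / 8 + 1, reserved one byte too many whenever n was a multiple of 8 (including 0). The new expression rounds up instead. Below is a small C++ check of that arithmetic. The function names are hypothetical.

```cpp
#include <cassert>

// Previous count: always adds a byte, so 8 items were read as 2 bytes.
unsigned packed_bytes_old(unsigned n)
	{
	return n / 8 + 1;
	}

// Revised count, written like the .pac expression: the extra byte is added
// only when n is not a multiple of 8.
unsigned packed_bytes(unsigned n)
	{
	return n / 8 + 1 * ( n > ( n / 8 ) * 8 );
	}
```

The revised form is equivalent to the more common `(n + 7) / 8` ceiling division.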
--- a/src/analyzer/protocol/dnp3/events.bif
+++ b/src/analyzer/protocol/dnp3/events.bif
@ -7,7 +7,7 @@
 ##
 ## fc: function code.
 ##
-event dnp3_application_request_header%(c: connection, is_orig: bool, fc: count%);
+event dnp3_application_request_header%(c: connection, is_orig: bool, application: count, fc: count%);

 ## Generated for a DNP3 response header.
 ##
@ -19,7 +19,7 @@ event dnp3_application_request_header%(c: connection, is_orig: bool, fc: count%)
 ##
 ## iin: internal indication number.
 ##
-event dnp3_application_response_header%(c: connection, is_orig: bool, fc: count, iin: count%);
+event dnp3_application_response_header%(c: connection, is_orig: bool, application: count, fc: count, iin: count%);

 ## Generated for the object header found in both DNP3 requests and responses.
 ##
--- a/src/analyzer/protocol/mysql/events.bif
+++ b/src/analyzer/protocol/mysql/events.bif
@ -9,7 +9,7 @@
 ##
 ## arg: The argument for the command (empty string if not provided).
 ##
-## .. bro:see:: mysql_error mysql_ok mysql_server_version mysql_handshake_response
+## .. bro:see:: mysql_error mysql_ok mysql_server_version mysql_handshake
 event mysql_command_request%(c: connection, command: count, arg: string%);

 ## Generated for an unsuccessful MySQL response.
@ -23,7 +23,7 @@ event mysql_command_request%(c: connection, command: count, arg: string%);
 ##
 ## msg: Any extra details about the error (empty string if not provided).
 ##
-## .. bro:see:: mysql_command_request mysql_ok mysql_server_version mysql_handshake_response
+## .. bro:see:: mysql_command_request mysql_ok mysql_server_version mysql_handshake
 event mysql_error%(c: connection, code: count, msg: string%);

 ## Generated for a successful MySQL response.
@ -35,7 +35,7 @@ event mysql_error%(c: connection, code: count, msg: string%);
 ##
 ## affected_rows: The number of rows that were affected.
 ##
-## .. bro:see:: mysql_command_request mysql_error mysql_server_version mysql_handshake_response
+## .. bro:see:: mysql_command_request mysql_error mysql_server_version mysql_handshake
 event mysql_ok%(c: connection, affected_rows: count%);

 ## Generated for the initial server handshake packet, which includes the MySQL server version.
@ -47,7 +47,7 @@ event mysql_ok%(c: connection, affected_rows: count%);
 ##
 ## ver: The server version string.
 ##
-## .. bro:see:: mysql_command_request mysql_error mysql_ok mysql_handshake_response
+## .. bro:see:: mysql_command_request mysql_error mysql_ok mysql_handshake
 event mysql_server_version%(c: connection, ver: string%);

 ## Generated for a client handshake response packet, which includes the username the client is attempting
--- a/src/analyzer/protocol/tcp/TCP_Reassembler.cc
+++ b/src/analyzer/protocol/tcp/TCP_Reassembler.cc
@ -28,7 +28,7 @@ TCP_Reassembler::TCP_Reassembler(analyzer::Analyzer* arg_dst_analyzer,
 				TCP_Analyzer* arg_tcp_analyzer,
 				TCP_Reassembler::Type arg_type,
 				TCP_Endpoint* arg_endp)
-	: Reassembler(1, REASSEM_TCP)
+	: Reassembler(1)
 	{
 	dst_analyzer = arg_dst_analyzer;
 	tcp_analyzer = arg_tcp_analyzer;
--- a/src/analyzer/protocol/tcp/TCP_Reassembler.h
+++ b/src/analyzer/protocol/tcp/TCP_Reassembler.h
@ -11,9 +11,6 @@ namespace analyzer { namespace tcp {

 class TCP_Analyzer;

-const int STOP_ON_GAP = 1;
-const int PUNT_ON_PARTIAL = 1;
-
 class TCP_Reassembler : public Reassembler {
 public:
 	enum Type {
--- a/src/analyzer/protocol/tcp/events.bif
+++ b/src/analyzer/protocol/tcp/events.bif
@ -29,8 +29,10 @@ event new_connection_contents%(c: connection%);
 ##    new_connection new_connection_contents partial_connection
 event connection_attempt%(c: connection%);

-## Generated when a SYN-ACK packet is seen in response to a SYN packet during
-## a TCP handshake.  The final ACK of the handshake in response to SYN-ACK may
+## Generated when seeing a SYN-ACK packet from the responder in a TCP
+## handshake.  An associated SYN packet was not seen from the originator
+## side if its state is not set to :bro:see:`TCP_ESTABLISHED`.
+## The final ACK of the handshake in response to SYN-ACK may
 ## or may not occur later; one way to tell is to check the *history* field of
 ## :bro:type:`connection` to see if the originator sent an ACK, indicated by
 ## 'A' in the history string.
--- a/src/builtin-func.y
+++ b/src/builtin-func.y
@ -287,7 +287,7 @@ void record_bif_item(const char* id, const char* type)

 %left ',' ':'

-%type <str> TOK_C_TOKEN TOK_ID TOK_CSTR TOK_WS TOK_COMMENT TOK_ATTR TOK_INT opt_ws type attr_list opt_attr_list
+%type <str> TOK_C_TOKEN TOK_ID TOK_CSTR TOK_WS TOK_COMMENT TOK_ATTR TOK_INT opt_ws type attr_list opt_attr_list opt_func_attrs
 %type <val> TOK_ATOM TOK_BOOL

 %union	{
@ -372,7 +372,13 @@ type_def_types: TOK_RECORD
 			{ set_definition_type(TYPE_DEF, "Table"); }
 	;

-event_def:	event_prefix opt_ws plain_head opt_attr_list
+opt_func_attrs:	attr_list opt_ws
+		{ $$ = $1; }
+	| /* nothing */
+		{ $$ = ""; }
+	;
+
+event_def:	event_prefix opt_ws plain_head opt_func_attrs
 			{ fprintf(fp_bro_init, "%s", $4); } end_of_head ';'
 			{
 			print_event_c_prototype(fp_func_h, true);
@ -380,13 +386,16 @@ event_def:	event_prefix opt_ws plain_head opt_attr_list
 			print_event_c_body(fp_func_def);
 			}

-func_def:	func_prefix opt_ws typed_head end_of_head body
+func_def:	func_prefix opt_ws typed_head opt_func_attrs
+			{ fprintf(fp_bro_init, "%s", $4); } end_of_head body
 	;

-enum_def:	enum_def_1 enum_list TOK_RPB
+enum_def:	enum_def_1 enum_list TOK_RPB opt_attr_list
 			{
 			// First, put an end to the enum type decl.
-			fprintf(fp_bro_init, "};\n");
+			fprintf(fp_bro_init, "} ");
+			fprintf(fp_bro_init, "%s", $4);
+			fprintf(fp_bro_init, ";\n");
 			if ( decl.module_name != GLOBAL_MODULE_NAME )
 				fprintf(fp_netvar_h, "}; } }\n");
 			else
--- a/src/event.bif
+++ b/src/event.bif
@ -905,7 +905,8 @@ event get_file_handle%(tag: Analyzer::Tag, c: connection, is_orig: bool%);
 ##
 ## f: The file.
 ##
-## .. bro:see:: file_over_new_connection file_timeout file_gap file_state_remove
+## .. bro:see:: file_over_new_connection file_timeout file_gap file_mime_type
+##    file_state_remove
 event file_new%(f: fa_file%);

 ## Indicates that a file has been seen being transferred over a connection
@ -917,16 +918,39 @@ event file_new%(f: fa_file%);
 ##
 ## is_orig: true if the originator of *c* is the one sending the file.
 ##
-## .. bro:see:: file_new file_timeout file_gap file_state_remove
+## .. bro:see:: file_new file_timeout file_gap file_mime_type
+##    file_state_remove
 event file_over_new_connection%(f: fa_file, c: connection, is_orig: bool%);

+## Provides the most likely matching MIME type for this file. The analysis
+## can be augmented at this time via :bro:see:`Files::add_analyzer`.
+##
+## f: The file.
+##
+## mime_type: The MIME type that was discovered.
+##
+## .. bro:see:: file_over_new_connection file_timeout file_gap
+##    file_mime_types file_state_remove
+event file_mime_type%(f: fa_file, mime_type: string%);
+
+## Provides all matching MIME types for this file. The analysis can be
+## augmented at this time via :bro:see:`Files::add_analyzer`.
+##
+## f: The file.
+##
+## mime_types: The MIME types that were discovered.
+##
+## .. bro:see:: file_over_new_connection file_timeout file_gap file_mime_type
+##    file_state_remove
+event file_mime_types%(f: fa_file, mime_types: mime_matches%);
+
 ## Indicates that file analysis has timed out because no activity was seen
 ## for the file in a while.
 ##
 ## f: The file.
 ##
-## .. bro:see:: file_new file_over_new_connection file_gap file_state_remove
-##    default_file_timeout_interval Files::set_timeout_interval
+## .. bro:see:: file_new file_over_new_connection file_gap file_mime_type
+##    file_mime_types file_state_remove default_file_timeout_interval
 ##    Files::set_timeout_interval
 event file_timeout%(f: fa_file%);

@ -938,14 +962,34 @@ event file_timeout%(f: fa_file%);
 ##
 ## len: The number of missing bytes.
 ##
-## .. bro:see:: file_new file_over_new_connection file_timeout file_state_remove
+## .. bro:see:: file_new file_over_new_connection file_timeout file_mime_type
+##    file_mime_types file_state_remove file_reassembly_overflow
 event file_gap%(f: fa_file, offset: count, len: count%);

+## Indicates that the file had an overflow of the reassembly buffer.
+## This is a specialization of the :bro:id:`file_gap` event.
+##
+## f: The file.
+##
+## offset: The byte offset from the start of the file at which the reassembly
+##         couldn't continue due to running out of reassembly buffer space.
+##
+## skipped: The number of bytes of the file skipped over to flush some
+##          file data and get back under the reassembly buffer size limit.
+##          This value will also be represented as a gap.
+##
+## .. bro:see:: file_new file_over_new_connection file_timeout file_mime_type
+##    file_mime_types file_state_remove file_gap Files::enable_reassembly
+##    Files::reassembly_buffer_size Files::disable_reassembly
+##    Files::set_reassembly_buffer_size
+event file_reassembly_overflow%(f: fa_file, offset: count, skipped: count%);
+
 ## This event is generated each time file analysis is ending for a given file.
 ##
 ## f: The file.
 ##
 ## .. bro:see:: file_new file_over_new_connection file_timeout file_gap
+##    file_mime_type file_mime_types
 event file_state_remove%(f: fa_file%);

 ## Generated when an internal DNS lookup produces the same result as last time.
--- a/src/file_analysis/Analyzer.h
+++ b/src/file_analysis/Analyzer.h
@ -111,6 +111,18 @@ public:
 	 */
 	void SetAnalyzerTag(const file_analysis::Tag& tag);

+	/**
+	 * @return true if the analyzer has ever seen a stream-wise delivery.
+	 */
+	bool GotStreamDelivery() const
+		{ return got_stream_delivery; }
+
+	/**
+	 * Flag the analyzer as having seen a stream-wise delivery.
+	 */
+	void SetGotStreamDelivery()
+		{ got_stream_delivery = true; }
+
 protected:

 	/**
@ -123,7 +135,8 @@ protected:
 	Analyzer(file_analysis::Tag arg_tag, RecordVal* arg_args, File* arg_file)
 	    : tag(arg_tag),
 	      args(arg_args->Ref()->AsRecordVal()),
-	      file(arg_file)
+	      file(arg_file),
+	      got_stream_delivery(false)
 		{
 		id = ++id_counter;
 		}
@ -140,7 +153,8 @@ protected:
 	Analyzer(RecordVal* arg_args, File* arg_file)
 	    : tag(),
 	      args(arg_args->Ref()->AsRecordVal()),
-	      file(arg_file)
+	      file(arg_file),
+	      got_stream_delivery(false)
 		{
 		id = ++id_counter;
 		}
@ -151,6 +165,7 @@ private:
 	file_analysis::Tag tag;	/**< The particular type of the analyzer instance. */
 	RecordVal* args;	/**< \c AnalyzerArgs val gives tunable analyzer params. */
 	File* file;	/**< The file to which the analyzer is attached. */
+	bool got_stream_delivery;

 	static ID id_counter;
 };
--- a/src/file_analysis/AnalyzerSet.cc
+++ b/src/file_analysis/AnalyzerSet.cc
@ -72,7 +72,7 @@ bool AnalyzerSet::Add(file_analysis::Tag tag, RecordVal* args)
 	return true;
 	}

-bool AnalyzerSet::QueueAdd(file_analysis::Tag tag, RecordVal* args)
+Analyzer* AnalyzerSet::QueueAdd(file_analysis::Tag tag, RecordVal* args)
 	{
 	HashKey* key = GetKey(tag, args);
 	file_analysis::Analyzer* a = InstantiateAnalyzer(tag, args);
@ -80,12 +80,12 @@ bool AnalyzerSet::QueueAdd(file_analysis::Tag tag, RecordVal* args)
 	if ( ! a )
 		{
 		delete key;
-		return false;
+		return 0;
 		}

 	mod_queue.push(new AddMod(a, key));

-	return true;
+	return a;
 	}

 bool AnalyzerSet::AddMod::Perform(AnalyzerSet* set)
--- a/src/file_analysis/AnalyzerSet.h
+++ b/src/file_analysis/AnalyzerSet.h
@ -57,9 +57,10 @@ public:
 	 * Queue the attachment of an analyzer to #file.
 	 * @param tag the analyzer tag of the file analyzer to add.
 	 * @param args an \c AnalyzerArgs value which specifies an analyzer.
-	 * @return true if analyzer was able to be instantiated, else false.
+	 * @return a pointer to the newly instantiated analyzer if successful,
+	 * else a null pointer.  The caller does *not* take ownership of the memory.
 	 */
-	bool QueueAdd(file_analysis::Tag tag, RecordVal* args);
+	file_analysis::Analyzer* QueueAdd(file_analysis::Tag tag, RecordVal* args);

 	/**
 	 * Remove an analyzer from #file immediately.
--- a/src/file_analysis/CMakeLists.txt
+++ b/src/file_analysis/CMakeLists.txt
@ -11,6 +11,7 @@ set(file_analysis_SRCS
    Manager.cc
    File.cc
    FileTimer.cc
+    FileReassembler.cc
    Analyzer.cc
    AnalyzerSet.cc
    Component.cc
--- a/src/file_analysis/File.cc
+++ b/src/file_analysis/File.cc
@ -53,8 +53,6 @@ int File::overflow_bytes_idx = -1;
 int File::timeout_interval_idx = -1;
 int File::bof_buffer_size_idx = -1;
 int File::bof_buffer_idx = -1;
-int File::mime_type_idx = -1;
-int File::mime_types_idx = -1;

 void File::StaticInit()
 	{
@ -74,15 +72,14 @@ void File::StaticInit()
 	timeout_interval_idx = Idx("timeout_interval");
 	bof_buffer_size_idx = Idx("bof_buffer_size");
 	bof_buffer_idx = Idx("bof_buffer");
-	mime_type_idx = Idx("mime_type");
-	mime_types_idx = Idx("mime_types");
 	}

-File::File(const string& file_id, Connection* conn, analyzer::Tag tag,
-           bool is_orig)
-	: id(file_id), val(0), postpone_timeout(false), first_chunk(true),
-	  missed_bof(false), need_reassembly(false), done(false),
-	  did_file_new_event(false), analyzers(this)
+File::File(const string& file_id, const string& source_name, Connection* conn,
+           analyzer::Tag tag, bool is_orig)
+	: id(file_id), val(0), file_reassembler(0), stream_offset(0),
+	  reassembly_max_buffer(0), did_mime_type(false),
+	  reassembly_enabled(false), postpone_timeout(false), done(false),
+	  analyzers(this)
 	{
 	StaticInit();

@ -90,11 +87,10 @@ File::File(const string& file_id, Connection* conn, analyzer::Tag tag,

 	val = new RecordVal(fa_file_type);
 	val->Assign(id_idx, new StringVal(file_id.c_str()));
+	SetSource(source_name);

 	if ( conn )
 		{
-		// add source, connection, is_orig fields
-		SetSource(analyzer_mgr->GetComponentName(tag));
 		val->Assign(is_orig_idx, new Val(is_orig, TYPE_BOOL));
 		UpdateConnectionFields(conn, is_orig);
 		}
@ -106,12 +102,7 @@ File::~File()
 	{
 	DBG_LOG(DBG_FILE_ANALYSIS, "[%s] Destroying File object", id.c_str());
 	Unref(val);
-
-	while ( ! fonc_queue.empty() )
-		{
-		delete_vals(fonc_queue.front().second);
-		fonc_queue.pop();
-		}
+	delete file_reassembler;
 	}

 void File::UpdateLastActivityTime()
@ -124,10 +115,10 @@ double File::GetLastActivityTime() const
 	return val->Lookup(last_active_idx)->AsTime();
 	}

-void File::UpdateConnectionFields(Connection* conn, bool is_orig)
+bool File::UpdateConnectionFields(Connection* conn, bool is_orig)
 	{
 	if ( ! conn )
-		return;
+		return false;

 	Val* conns = val->Lookup(conns_idx);

@ -138,29 +129,30 @@ void File::UpdateConnectionFields(Connection* conn, bool is_orig)
 		}

 	Val* idx = get_conn_id_val(conn);
-	if ( ! conns->AsTableVal()->Lookup(idx) )
-		{
-		Val* conn_val = conn->BuildConnVal();
-		conns->AsTableVal()->Assign(idx, conn_val);

-		if ( FileEventAvailable(file_over_new_connection) )
+	if ( conns->AsTableVal()->Lookup(idx) )
+		{
+		Unref(idx);
+		return false;
+		}
+
+	conns->AsTableVal()->Assign(idx, conn->BuildConnVal());
+	Unref(idx);
+	return true;
+	}
+
+void File::RaiseFileOverNewConnection(Connection* conn, bool is_orig)
+	{
+	if ( conn && FileEventAvailable(file_over_new_connection) )
 		{
 		val_list* vl = new val_list();
 		vl->append(val->Ref());
-			vl->append(conn_val->Ref());
+		vl->append(conn->BuildConnVal());
 		vl->append(new Val(is_orig, TYPE_BOOL));
-
-			if ( did_file_new_event )
 		FileEvent(file_over_new_connection, vl);
-			else
-				fonc_queue.push(pair<EventHandlerPtr, val_list*>(
-				        file_over_new_connection, vl));
 		}
 	}

-	Unref(idx);
-	}
-
 uint64 File::LookupFieldDefaultCount(int idx) const
 	{
 	Val* v = val->LookupWithDefault(idx);
@ -242,7 +234,7 @@ bool File::IsComplete() const
 	if ( ! total )
 		return false;

-	if ( LookupFieldDefaultCount(seen_bytes_idx) >= total->AsCount() )
+	if ( stream_offset >= total->AsCount() )
 		return true;

 	return false;
@ -258,7 +250,10 @@ bool File::AddAnalyzer(file_analysis::Tag tag, RecordVal* args)
 	DBG_LOG(DBG_FILE_ANALYSIS, "[%s] Queuing addition of %s analyzer",
 		id.c_str(), file_mgr->GetComponentName(tag).c_str());

-	return done ? false : analyzers.QueueAdd(tag, args);
+	if ( done )
+		return false;
+
+	return analyzers.QueueAdd(tag, args) != 0;
 	}

 bool File::RemoveAnalyzer(file_analysis::Tag tag, RecordVal* args)
@ -269,9 +264,70 @@ bool File::RemoveAnalyzer(file_analysis::Tag tag, RecordVal* args)
 	return done ? false : analyzers.QueueRemove(tag, args);
 	}

+void File::EnableReassembly()
+	{
+	reassembly_enabled = true;
+	}
+
+void File::DisableReassembly()
+	{
+	reassembly_enabled = false;
+	delete file_reassembler;
+	file_reassembler = 0;
+	}
+
+void File::SetReassemblyBuffer(uint64 max)
+	{
+	reassembly_max_buffer = max;
+	}
+
+bool File::DetectMIME()
+	{
+	did_mime_type = true;
+
+	Val* bof_buffer_val = val->Lookup(bof_buffer_idx);
+
+	if ( ! bof_buffer_val )
+		{
+		if ( bof_buffer.size == 0 )
+			return false;
+
+		BroString* bs = concatenate(bof_buffer.chunks);
+		bof_buffer_val = new StringVal(bs);
+		val->Assign(bof_buffer_idx, bof_buffer_val);
+		}
+
+	RuleMatcher::MIME_Matches matches;
+	const u_char* data = bof_buffer_val->AsString()->Bytes();
+	uint64 len = bof_buffer_val->AsString()->Len();
+	len = min(len, LookupFieldDefaultCount(bof_buffer_size_idx));
+	file_mgr->DetectMIME(data, len, &matches);
+
+	if ( matches.empty() )
+		return false;
+
+	if ( FileEventAvailable(file_mime_type) )
+		{
+		val_list* vl = new val_list();
+		vl->append(val->Ref());
+		vl->append(new StringVal(*(matches.begin()->second.begin())));
+		FileEvent(file_mime_type, vl);
+		}
+
+	if ( FileEventAvailable(file_mime_types) )
+		{
+		val_list* vl = new val_list();
+		vl->append(val->Ref());
+		vl->append(file_analysis::GenMIMEMatchesVal(matches));
+		FileEvent(file_mime_types, vl);
+		}
+
+	return true;
+	}
+
 bool File::BufferBOF(const u_char* data, uint64 len)
 	{
-	if ( bof_buffer.full || bof_buffer.replayed )
+	if ( bof_buffer.full )
 		return false;

 	uint64 desired_size = LookupFieldDefaultCount(bof_buffer_size_idx);
@ -279,131 +335,154 @@ bool File::BufferBOF(const u_char* data, uint64 len)
 	bof_buffer.chunks.push_back(new BroString(data, len, 0));
 	bof_buffer.size += len;

-	if ( bof_buffer.size >= desired_size )
-		{
+	if ( bof_buffer.size < desired_size )
+		return true;
+
 	bof_buffer.full = true;
-		ReplayBOF();
-		}

-	return true;
-	}
-
-bool File::DetectMIME(const u_char* data, uint64 len)
+	if ( bof_buffer.size > 0 )
 		{
-	RuleMatcher::MIME_Matches matches;
-	len = min(len, LookupFieldDefaultCount(bof_buffer_size_idx));
-	file_mgr->DetectMIME(data, len, &matches);
-
-	if ( matches.empty() )
-		return false;
-
-	val->Assign(mime_type_idx,
-	            new StringVal(*(matches.begin()->second.begin())));
-	val->Assign(mime_types_idx, file_analysis::GenMIMEMatchesVal(matches));
-
-	return true;
-	}
-
-void File::ReplayBOF()
-	{
-	if ( bof_buffer.replayed )
-		return;
-
-	bof_buffer.replayed = true;
-
-	if ( bof_buffer.chunks.empty() )
-		{
-		// Since we missed the beginning, try file type detect on next data in.
-		missed_bof = true;
-		return;
-		}
-
 		BroString* bs = concatenate(bof_buffer.chunks);
 		val->Assign(bof_buffer_idx, new StringVal(bs));
+		}

-	DetectMIME(bs->Bytes(), bs->Len());
-	FileEvent(file_new);
+	return false;
+	}

-	for ( size_t i = 0; i < bof_buffer.chunks.size(); ++i )
-		DataIn(bof_buffer.chunks[i]->Bytes(), bof_buffer.chunks[i]->Len());
+void File::DeliverStream(const u_char* data, uint64 len)
+	{
+	bool bof_was_full = bof_buffer.full;
+	// Buffer enough data for the BOF buffer
+	BufferBOF(data, len);
+
+	if ( ! did_mime_type && bof_buffer.full &&
+	     LookupFieldDefaultCount(missing_bytes_idx) == 0 )
+		DetectMIME();
+
+	DBG_LOG(DBG_FILE_ANALYSIS,
+	        "[%s] %" PRIu64 " stream bytes in at offset %" PRIu64 "; %s [%s%s]",
+	        id.c_str(), len, stream_offset,
+	        IsComplete() ? "complete" : "incomplete",
+	        fmt_bytes((const char*) data, min((uint64)40, len)),
+	        len > 40 ? "..." : "");
+
+	file_analysis::Analyzer* a = 0;
+	IterCookie* c = analyzers.InitForIteration();
+
+	while ( (a = analyzers.NextEntry(c)) )
+		{
+		if ( ! a->GotStreamDelivery() )
+			{
+			int num_bof_chunks_behind = bof_buffer.chunks.size();
+
+			if ( ! bof_was_full )
+				// We just added a chunk to the BOF buffer; don't count it,
+				// as it will be delivered on its own below.
+				num_bof_chunks_behind -= 1;
+
+			uint64 bytes_delivered = 0;
+
+			// Catch this analyzer up with the BOF buffer.
+			for ( int i = 0; i < num_bof_chunks_behind; ++i )
+				{
+				if ( ! a->DeliverStream(bof_buffer.chunks[i]->Bytes(),
+				                        bof_buffer.chunks[i]->Len()) )
+					analyzers.QueueRemove(a->Tag(), a->Args());
+
+				bytes_delivered += bof_buffer.chunks[i]->Len();
+				}
+
+			a->SetGotStreamDelivery();
+			// May need to catch analyzer up on missed gap?
+			// Analyzer should be fully caught up to stream_offset now.
+			}
+
+		if ( ! a->DeliverStream(data, len) )
+			analyzers.QueueRemove(a->Tag(), a->Args());
+		}
+
+	stream_offset += len;
+	IncrementByteCount(len, seen_bytes_idx);
+	}
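The catch-up logic in `DeliverStream` can be illustrated with a small standalone sketch. The `Analyzer` and `BofBuffer` types below are hypothetical stand-ins, not Bro's classes. The sketch shows that an analyzer attached after some beginning-of-file chunks were buffered is first replayed those chunks and then receives the current chunk. Bytes that arrived after the BOF buffer filled but before the analyzer was attached are not replayed, which matches the open question left in the diff's comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stream analyzer: it only records the bytes it receives.
struct Analyzer {
    bool got_stream_delivery = false;
    std::string seen;
    void DeliverStream(const std::string& data) { seen += data; }
};

// Hypothetical beginning-of-file buffer, loosely modeled on File::BOF_Buffer.
struct BofBuffer {
    bool full = false;
    std::size_t size = 0;
    std::vector<std::string> chunks;
};

// Buffer chunks until `desired` bytes have accumulated, then mark it full.
void BufferBOF(BofBuffer& bof, const std::string& data, std::size_t desired) {
    if (bof.full)
        return;
    bof.chunks.push_back(data);
    bof.size += data.size();
    if (bof.size >= desired)
        bof.full = true;
}

// Deliver one in-order chunk, replaying buffered BOF chunks to any
// analyzer that has not yet received stream data.
void DeliverStream(BofBuffer& bof, std::vector<Analyzer*>& analyzers,
                   const std::string& data, std::size_t desired) {
    bool bof_was_full = bof.full;
    BufferBOF(bof, data, desired);

    for (Analyzer* a : analyzers) {
        if (!a->got_stream_delivery) {
            std::size_t behind = bof.chunks.size();
            // The chunk just buffered is delivered below, so skip it here.
            if (!bof_was_full && behind > 0)
                behind -= 1;
            for (std::size_t i = 0; i < behind; ++i)
                a->DeliverStream(bof.chunks[i]);
            a->got_stream_delivery = true;
        }
        a->DeliverStream(data);
    }
}
```

With a 4-byte BOF size, an analyzer attached after "ab" and "cd" were buffered still ends up seeing "abcdef" once "ef" arrives.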
+
+void File::DeliverChunk(const u_char* data, uint64 len, uint64 offset)
+	{
+	// Potentially handle reassembly and deliver to the stream analyzers.
+	if ( file_reassembler )
+		{
+		if ( reassembly_max_buffer > 0 &&
+		     reassembly_max_buffer < file_reassembler->TotalSize() )
+			{
+			uint64 current_offset = stream_offset;
+			uint64 gap_bytes = file_reassembler->Flush();
+			IncrementByteCount(gap_bytes, overflow_bytes_idx);
+
+			if ( FileEventAvailable(file_reassembly_overflow) )
+				{
+				val_list* vl = new val_list();
+				vl->append(val->Ref());
+				vl->append(new Val(current_offset, TYPE_COUNT));
+				vl->append(new Val(gap_bytes, TYPE_COUNT));
+				FileEvent(file_reassembly_overflow, vl);
+				}
+			}
+
+		// Forward data to the reassembler.
+		file_reassembler->NewBlock(network_time, offset, len, data);
+		}
+	else if ( stream_offset == offset )
+		{
+		// This is the normal case where a file is transferred linearly.
+		// Nothing special should be done here.
+		DeliverStream(data, len);
+		}
+	else if ( reassembly_enabled )
+		{
+		// This data doesn't start at the current stream offset, so a
+		// reassembler needs to be created to put it back in order.
+		file_reassembler = new FileReassembler(this, stream_offset);
+		file_reassembler->NewBlock(network_time, offset, len, data);
+		}
+	else
+		{
+		// We can't reassemble so we throw out the data for streaming.
+		IncrementByteCount(len, overflow_bytes_idx);
+		}
+
+	DBG_LOG(DBG_FILE_ANALYSIS,
+	        "[%s] %" PRIu64 " chunk bytes in at offset %" PRIu64 "; %s [%s%s]",
+	        id.c_str(), len, offset,
+	        IsComplete() ? "complete" : "incomplete",
+	        fmt_bytes((const char*) data, min((uint64)40, len)),
+	        len > 40 ? "..." : "");
+
+	file_analysis::Analyzer* a = 0;
+	IterCookie* c = analyzers.InitForIteration();
+
+	while ( (a = analyzers.NextEntry(c)) )
+		{
+		if ( ! a->DeliverChunk(data, len, offset) )
+			{
+			analyzers.QueueRemove(a->Tag(), a->Args());
+			}
+		}
+
+	if ( IsComplete() )
+		EndOfFile();
 	}
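The if/else chain at the top of `DeliverChunk` decides where a chunk goes before the chunk-wise analyzers see it; chunk-wise analyzers receive every chunk regardless of that decision. The sketch below restates the routing as a pure function. The `Route` enum and `RouteChunk` name are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the four branches in File::DeliverChunk.
enum class Route { ToReassembler, StreamDirect, StartReassembler, Overflow };

// An existing reassembler takes everything; otherwise in-order data is
// streamed directly, out-of-order data starts a reassembler if reassembly
// is enabled, and anything else is counted as overflow and dropped.
Route RouteChunk(bool have_reassembler, bool reassembly_enabled,
                 uint64_t stream_offset, uint64_t chunk_offset) {
    if (have_reassembler)
        return Route::ToReassembler;
    if (chunk_offset == stream_offset)
        return Route::StreamDirect;
    if (reassembly_enabled)
        return Route::StartReassembler;
    return Route::Overflow;
}
```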

 void File::DataIn(const u_char* data, uint64 len, uint64 offset)
 	{
 	analyzers.DrainModifications();
-
-	if ( first_chunk )
-		{
-		// TODO: this should all really be delayed until we attempt reassembly
-		DetectMIME(data, len);
-		FileEvent(file_new);
-		first_chunk = false;
-		}
-
-	DBG_LOG(DBG_FILE_ANALYSIS, "[%s] %" PRIu64 " bytes in at offset" PRIu64 "; %s [%s]",
-		id.c_str(), len, offset,
-		IsComplete() ? "complete" : "incomplete",
-		fmt_bytes((const char*) data, min((uint64)40, len)), len > 40 ? "..." : "");
-
-	file_analysis::Analyzer* a = 0;
-	IterCookie* c = analyzers.InitForIteration();
-
-	while ( (a = analyzers.NextEntry(c)) )
-		{
-		if ( ! a->DeliverChunk(data, len, offset) )
-			analyzers.QueueRemove(a->Tag(), a->Args());
-		}
-
+	DeliverChunk(data, len, offset);
 	analyzers.DrainModifications();
-
-	// TODO: check reassembly requirement based on buffer size in record
-	if ( need_reassembly )
-		reporter->InternalError("file_analyzer::File TODO: reassembly not yet supported");
-
-	// TODO: reassembly overflow stuff, increment overflow count, eval trigger
-
-	IncrementByteCount(len, seen_bytes_idx);
 	}

 void File::DataIn(const u_char* data, uint64 len)
 	{
 	analyzers.DrainModifications();
-
-	if ( BufferBOF(data, len) )
-		return;
-
-	if ( missed_bof )
-		{
-		DetectMIME(data, len);
-		FileEvent(file_new);
-		missed_bof = false;
-		}
-
-	DBG_LOG(DBG_FILE_ANALYSIS, "[%s] %" PRIu64 " bytes in; %s [%s]",
-		id.c_str(), len,
-		IsComplete() ? "complete" : "incomplete",
-		fmt_bytes((const char*) data, min((uint64)40, len)), len > 40 ? "..." : "");
-
-	file_analysis::Analyzer* a = 0;
-	IterCookie* c = analyzers.InitForIteration();
-
-	while ( (a = analyzers.NextEntry(c)) )
-		{
-		if ( ! a->DeliverStream(data, len) )
-			{
-			analyzers.QueueRemove(a->Tag(), a->Args());
-			continue;
-			}
-
-		uint64 offset = LookupFieldDefaultCount(seen_bytes_idx) +
-		                LookupFieldDefaultCount(missing_bytes_idx);
-
-		if ( ! a->DeliverChunk(data, len, offset) )
-			analyzers.QueueRemove(a->Tag(), a->Args());
-		}
-
+	DeliverChunk(data, len, stream_offset);
 	analyzers.DrainModifications();
-	IncrementByteCount(len, seen_bytes_idx);
 	}

 void File::EndOfFile()
@ -413,10 +492,21 @@ void File::EndOfFile()
 	if ( done )
 		return;

-	analyzers.DrainModifications();
+	if ( file_reassembler )
+		{
+		file_reassembler->Flush();
+		}

-	// Send along anything that's been buffered, but never flushed.
-	ReplayBOF();
+	// Mark the bof_buffer as full in case it isn't yet
+	// so that the whole thing can be flushed out to
+	// any stream analyzers.
+	if ( ! bof_buffer.full )
+		{
+		bof_buffer.full = true;
+		DeliverStream((const u_char*) "", 0);
+		}
+
+	analyzers.DrainModifications();

 	done = true;

@ -436,14 +526,17 @@ void File::EndOfFile()

 void File::Gap(uint64 offset, uint64 len)
 	{
-	DBG_LOG(DBG_FILE_ANALYSIS, "[%s] Gap of size %" PRIu64 " at offset %" PRIu64,
+	DBG_LOG(DBG_FILE_ANALYSIS, "[%s] Gap of size %" PRIu64 " at offset %" PRIu64,
 		id.c_str(), len, offset);

-	analyzers.DrainModifications();
+	if ( file_reassembler && ! file_reassembler->IsCurrentlyFlushing() )
+		{
+		file_reassembler->FlushTo(offset + len);
+		// The reassembler will call us back with all the gaps we need to know.
+		return;
+		}

-	// If we were buffering the beginning of the file, a gap means we've got
-	// as much contiguous stuff at the beginning as possible, so work with that.
-	ReplayBOF();
+	analyzers.DrainModifications();

 	file_analysis::Analyzer* a = 0;
 	IterCookie* c = analyzers.InitForIteration();
@ -464,6 +557,8 @@ void File::Gap(uint64 offset, uint64 len)
 		}

 	analyzers.DrainModifications();
+
+	stream_offset += len;
 	IncrementByteCount(len, missing_bytes_idx);
 	}

@ -482,30 +577,13 @@ void File::FileEvent(EventHandlerPtr h)
 	FileEvent(h, vl);
 	}

-static void flush_file_event_queue(queue<pair<EventHandlerPtr, val_list*> >& q)
-	{
-	while ( ! q.empty() )
-		{
-		pair<EventHandlerPtr, val_list*> p = q.front();
-		mgr.QueueEvent(p.first, p.second);
-		q.pop();
-		}
-	}
-
 void File::FileEvent(EventHandlerPtr h, val_list* vl)
 	{
-	if ( h == file_state_remove )
-		flush_file_event_queue(fonc_queue);
-
 	mgr.QueueEvent(h, vl);

-	if ( h == file_new )
-		{
-		did_file_new_event = true;
-		flush_file_event_queue(fonc_queue);
-		}
-
-	if ( h == file_new || h == file_timeout || h == file_extraction_limit )
+	if ( h == file_new || h == file_over_new_connection ||
+	     h == file_mime_type ||
+	     h == file_timeout || h == file_extraction_limit )
 		{
 		// immediate feedback is required for these events.
 		mgr.Drain();
--- a/src/file_analysis/File.h
+++ b/src/file_analysis/File.h
@ -3,11 +3,11 @@
 #ifndef FILE_ANALYSIS_FILE_H
 #define FILE_ANALYSIS_FILE_H

-#include <queue>
 #include <string>
 #include <utility>
 #include <vector>

+#include "FileReassembler.h"
 #include "Conn.h"
 #include "Val.h"
 #include "Tag.h"
@ -16,6 +16,8 @@

 namespace file_analysis {

+class FileReassembler;
+
 /**
 * Wrapper class around \c fa_file record values from script layer.
 */
@ -86,10 +88,10 @@ public:
 	void SetTotalBytes(uint64 size);

 	/**
-	 * Compares "seen_bytes" field to "total_bytes" field of #val record to
-	 * determine if the full file has been seen.
-	 * @return false if "total_bytes" hasn't been set yet or "seen_bytes" is
-	 *         less than it, else true.
+	 * @return true if file analysis is complete for the file, else false.
+	 * It is incomplete if the total size is unknown or if the number of bytes
+	 * streamed to analyzers (either as data deliveries or gap information)
+	 * has not yet reached the known total size.
 	 */
 	bool IsComplete() const;

@ -166,18 +168,20 @@ public:

 protected:
 	friend class Manager;
+	friend class FileReassembler;

 	/**
 	 * Constructor; only file_analysis::Manager should be creating these.
 	 * @param file_id an identifier string for the file in pretty hash form
 	 *        (similar to connection uids).
+	 * @param source_name the value for the source field to fill in.
 	 * @param conn a network connection over which the file is transferred.
 	 * @param tag the network protocol over which the file is transferred.
 	 * @param is_orig true if the file is being transferred from the originator
 	 *        of the connection to the responder.  False indicates the other
 	 *        direction.
 	 */
-	File(const string& file_id, Connection* conn = 0,
+	File(const string& file_id, const string& source_name, Connection* conn = 0,
 	     analyzer::Tag tag = analyzer::Tag::Error, bool is_orig = false);

 	/**
@ -185,8 +189,14 @@ protected:
 	 * \c conn_id and UID taken from \a conn.
 	 * @param conn the connection over which a part of the file has been seen.
 	 * @param is_orig true if the connection originator is sending the file.
+	 * @return true if the connection was previously unknown.
 	 */
-	void UpdateConnectionFields(Connection* conn, bool is_orig);
+	bool UpdateConnectionFields(Connection* conn, bool is_orig);
+
+	/**
+	 * Raise the file_over_new_connection event with given arguments.
+	 */
+	void RaiseFileOverNewConnection(Connection* conn, bool is_orig);

 	/**
 	 * Increment a byte count field of #val record by \a size.
@ -219,20 +229,40 @@ protected:
 	 */
 	bool BufferBOF(const u_char* data, uint64 len);

-	/**
-	 * Forward any beginning-of-file buffered data on to DataIn stream.
-	 */
-	void ReplayBOF();
-
 	/**
 	 * Does mime type detection via file magic signatures and assigns
 	 * strongest matching mime type (if available) to \c mime_type
-	 * field in #val.
-	 * @param data pointer to a chunk of file data.
-	 * @param len number of bytes in the data chunk.
+	 * field in #val.  It uses the data in the BOF buffer.
 	 * @return whether a mime type match was found.
 	 */
-	bool DetectMIME(const u_char* data, uint64 len);
+	bool DetectMIME();
+
+	/**
+	 * Enables reassembly on the file.
+	 */
+	void EnableReassembly();
+
+	/**
+	 * Disables reassembly on the file.  If there is an existing reassembler
+	 * for the file, this will cause it to be deleted and won't allow a new
+	 * one to be created until reassembly is reenabled.
+	 */
+	void DisableReassembly();
+
+	/**
+	 * Set the maximum number of bytes of memory allowed for reassembling this file.
+	 */
+	void SetReassemblyBuffer(uint64 max);
+
+	/**
+	 * Perform stream-wise delivery for analyzers that need it.
+	 */
+	void DeliverStream(const u_char* data, uint64 len);
+
+	/**
+	 * Perform chunk-wise delivery for analyzers that need it.
+	 */
+	void DeliverChunk(const u_char* data, uint64 len, uint64 offset);

 	/**
 	 * Lookup a record field index/offset by name.
@ -246,25 +276,24 @@ protected:
 	 */
 	static void StaticInit();

-private:
+protected:
 	string id;                 /**< A pretty hash that likely identifies file */
 	RecordVal* val;            /**< \c fa_file from script layer. */
+	FileReassembler* file_reassembler; /**< A reassembler for the file if it's needed. */
+	uint64 stream_offset;      /**< Offset up to which the file has been streamed (as data or gaps). */
+	uint64 reassembly_max_buffer;      /**< Maximum allowed buffer for reassembly. */
+	bool did_mime_type;        /**< Whether MIME type identification has already been attempted. */
+	bool reassembly_enabled;           /**< Whether file stream reassembly is enabled. */
 	bool postpone_timeout;     /**< Whether postponing timeout is requested. */
-	bool first_chunk;          /**< Track first non-linear chunk. */
-	bool missed_bof;           /**< Flags that we missed start of file. */
-	bool need_reassembly;      /**< Whether file stream reassembly is needed. */
 	bool done;                 /**< If this object is about to be deleted. */
-	bool did_file_new_event;   /**< Whether the file_new event has been done. */
-	AnalyzerSet analyzers;     /**< A set of attached file analyzer. */
-	queue<pair<EventHandlerPtr, val_list*> > fonc_queue;
+	AnalyzerSet analyzers;     /**< A set of attached file analyzers. */

 	struct BOF_Buffer {
-		BOF_Buffer() : full(false), replayed(false), size(0) {}
+		BOF_Buffer() : full(false), size(0) {}
 		~BOF_Buffer()
 			{ for ( size_t i = 0; i < chunks.size(); ++i ) delete chunks[i]; }

 		bool full;
-		bool replayed;
 		uint64 size;
 		BroString::CVec chunks;
 	} bof_buffer;              /**< Beginning of file buffer. */
--- a/src/file_analysis/FileReassembler.cc
+++ b/src/file_analysis/FileReassembler.cc
@ -0,0 +1,128 @@
+
+#include "FileReassembler.h"
+#include "File.h"
+
+
+namespace file_analysis {
+
+class File;
+
+FileReassembler::FileReassembler(File *f, uint64 starting_offset)
+	: Reassembler(starting_offset), the_file(f), flushing(false)
+	{
+	}
+
+FileReassembler::FileReassembler()
+	: Reassembler(), the_file(0), flushing(false)
+	{
+	}
+
+FileReassembler::~FileReassembler()
+	{
+	}
+
+uint64 FileReassembler::Flush()
+	{
+	if ( flushing )
+		return 0;
+
+	if ( last_block )
+		{
+		// This is expected to call back into FileReassembler::Undelivered().
+		flushing = true;
+		uint64 rval = TrimToSeq(last_block->upper);
+		flushing = false;
+		return rval;
+		}
+
+	return 0;
+	}
+
+uint64 FileReassembler::FlushTo(uint64 sequence)
+	{
+	if ( flushing )
+		return 0;
+
+	flushing = true;
+	uint64 rval = TrimToSeq(sequence);
+	flushing = false;
+	last_reassem_seq = sequence;
+	return rval;
+	}
+
+void FileReassembler::BlockInserted(DataBlock* start_block)
+	{
+	if ( start_block->seq > last_reassem_seq ||
+	     start_block->upper <= last_reassem_seq )
+		return;
+
+	for ( DataBlock* b = start_block;
+	      b && b->seq <= last_reassem_seq; b = b->next )
+		{
+		if ( b->seq == last_reassem_seq )
+			{ // New stuff.
+			uint64 len = b->Size();
+			last_reassem_seq += len;
+			the_file->DeliverStream(b->block, len);
+			}
+		}
+
+	// Throw out forwarded data
+	TrimToSeq(last_reassem_seq);
+	}
+
+void FileReassembler::Undelivered(uint64 up_to_seq)
+	{
+	// If we have blocks that begin below up_to_seq, deliver them.
+	DataBlock* b = blocks;
+
+	while ( b )
+		{
+		if ( b->seq < last_reassem_seq )
+			{
+			// Already delivered this block.
+			b = b->next;
+			continue;
+			}
+
+		if ( b->seq >= up_to_seq )
+			// Block is beyond what we need to process at this point.
+			break;
+
+		uint64 gap_at_seq = last_reassem_seq;
+		uint64 gap_len = b->seq - last_reassem_seq;
+		the_file->Gap(gap_at_seq, gap_len);
+		last_reassem_seq += gap_len;
+		BlockInserted(b);
+		// Inserting a block may cause trimming of what's buffered,
+		// so have to assume 'b' is invalid, hence re-assign to start.
+		b = blocks;
+		}
+
+	if ( up_to_seq > last_reassem_seq )
+		{
+		the_file->Gap(last_reassem_seq, up_to_seq - last_reassem_seq);
+		last_reassem_seq = up_to_seq;
+		}
+	}
+
+void FileReassembler::Overlap(const u_char* b1, const u_char* b2, uint64 n)
+	{
+	// Not doing anything here yet.
+	}
+
+IMPLEMENT_SERIAL(FileReassembler, SER_FILE_REASSEMBLER);
+
+bool FileReassembler::DoSerialize(SerialInfo* info) const
+	{
+	reporter->InternalError("FileReassembler::DoSerialize not implemented");
+	return false; // Cannot be reached.
+	}
+
+bool FileReassembler::DoUnserialize(UnserialInfo* info)
+	{
+	reporter->InternalError("FileReassembler::DoUnserialize not implemented");
+	return false; // Cannot be reached.
+	}
+
+} // end file_analysis
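The idea behind `FileReassembler` (buffer out-of-order blocks, deliver contiguous data in order, and report holes as gaps when forced to flush) can be sketched without Bro's `Reassembler` base class. The `MiniReassembler` class below is hypothetical: it stores blocks in a `std::map` keyed by offset and uses callbacks in place of `File::DeliverStream()` and `File::Gap()`. Both callbacks must be set before use. Overlapping data is resolved by keeping whatever was already delivered.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical minimal reassembler; not Bro's FileReassembler.
class MiniReassembler {
public:
    std::function<void(const std::string&)> on_data;  // in-order data
    std::function<void(uint64_t, uint64_t)> on_gap;   // (offset, len)

    explicit MiniReassembler(uint64_t start) : next_seq_(start) {}

    // Buffer a block and deliver whatever is now contiguous.
    void NewBlock(uint64_t offset, const std::string& data) {
        if (data.empty())
            return;
        blocks_[offset] = data;
        DeliverContiguous();
    }

    // Force progress up to `seq`, reporting holes as gaps (cf. FlushTo()).
    void FlushTo(uint64_t seq) {
        while (!blocks_.empty() && blocks_.begin()->first < seq) {
            uint64_t start = blocks_.begin()->first;
            if (start > next_seq_) {
                on_gap(next_seq_, start - next_seq_);
                next_seq_ = start;
            }
            DeliverContiguous();
        }
        if (seq > next_seq_) {
            on_gap(next_seq_, seq - next_seq_);
            next_seq_ = seq;
        }
    }

    uint64_t NextSeq() const { return next_seq_; }

private:
    void DeliverContiguous() {
        for (;;) {
            // Discard buffered blocks lying entirely before next_seq_.
            while (!blocks_.empty() &&
                   blocks_.begin()->first + blocks_.begin()->second.size() <= next_seq_)
                blocks_.erase(blocks_.begin());
            if (blocks_.empty() || blocks_.begin()->first > next_seq_)
                return;  // Hole at next_seq_: wait for more data.
            // Deliver only the part of the first block past next_seq_.
            auto it = blocks_.begin();
            std::string fresh = it->second.substr(next_seq_ - it->first);
            blocks_.erase(it);
            next_seq_ += fresh.size();
            on_data(fresh);
        }
    }

    uint64_t next_seq_;
    std::map<uint64_t, std::string> blocks_;
};
```

Feeding "efgh" at offset 4 before "abcd" at offset 0 yields "abcdefgh" once the hole is filled; a later `FlushTo()` turns any remaining holes into gap callbacks, much as `Undelivered()` calls `File::Gap()` above.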
--- a/src/file_analysis/FileReassembler.h
+++ b/src/file_analysis/FileReassembler.h
@ -0,0 +1,65 @@
+#ifndef FILE_ANALYSIS_FILEREASSEMBLER_H
+#define FILE_ANALYSIS_FILEREASSEMBLER_H
+
+#include "Reassem.h"
+#include "File.h"
+
+class BroFile;
+class Connection;
+
+namespace file_analysis {
+
+class File;
+
+class FileReassembler : public Reassembler {
+public:
+
+	FileReassembler(File* f, uint64 starting_offset);
+	virtual ~FileReassembler();
+
+	void Done();
+
+	// Checks if we have delivered all contents that we can possibly
+	// deliver for this file.
+	void CheckEOF();
+
+	/**
+	 * Discards all contents of the reassembly buffer.  This will spin through
+	 * the buffer and call File::DeliverStream() and File::Gap() wherever
+	 * appropriate.
+	 * @return the number of new bytes now detected as gaps in the file.
+	 */
+	uint64 Flush();
+
+	/**
+	 * Discards all contents of the reassembly buffer up to a given sequence
+	 * number.  This will spin through the buffer and call
+	 * File::DeliverStream() and File::Gap() wherever appropriate.
+	 * @param sequence the sequence number to flush until.
+	 * @return the number of new bytes now detected as gaps in the file.
+	 */
+	uint64 FlushTo(uint64 sequence);
+
+	/**
+	 * @return whether the reassembler is currently in the process of flushing
+	 * out the contents of its buffer.
+	 */
+	bool IsCurrentlyFlushing() const
+		{ return flushing; }
+
+protected:
+	FileReassembler();
+
+	DECLARE_SERIAL(FileReassembler);
+
+	void Undelivered(uint64 up_to_seq);
+	void BlockInserted(DataBlock* b);
+	void Overlap(const u_char* b1, const u_char* b2, uint64 n);
+
+	File* the_file;
+	bool flushing;
+};
+
+} // namespace file_analysis
+
+#endif
--- a/src/file_analysis/Manager.cc
+++ b/src/file_analysis/Manager.cc
@ -154,14 +154,12 @@ string Manager::DataIn(const u_char* data, uint64 len, analyzer::Tag tag,
 void Manager::DataIn(const u_char* data, uint64 len, const string& file_id,
                     const string& source)
 	{
-	File* file = GetFile(file_id);
+	File* file = GetFile(file_id, 0, analyzer::Tag::Error, false, false,
+	                     source.c_str());

 	if ( ! file )
 		return;

-	if ( file->GetSource().empty() )
-		file->SetSource(source);
-
 	file->DataIn(data, len);

 	if ( file->IsComplete() )
@ -232,6 +230,39 @@ bool Manager::SetTimeoutInterval(const string& file_id, double interval) const
 	return true;
 	}

+bool Manager::EnableReassembly(const string& file_id)
+	{
+	File* file = LookupFile(file_id);
+
+	if ( ! file )
+		return false;
+
+	file->EnableReassembly();
+	return true;
+	}
+
+bool Manager::DisableReassembly(const string& file_id)
+	{
+	File* file = LookupFile(file_id);
+
+	if ( ! file )
+		return false;
+
+	file->DisableReassembly();
+	return true;
+	}
+
+bool Manager::SetReassemblyBuffer(const string& file_id, uint64 max)
+	{
+	File* file = LookupFile(file_id);
+
+	if ( ! file )
+		return false;
+
+	file->SetReassemblyBuffer(max);
+	return true;
+	}
+
 bool Manager::SetExtractionLimit(const string& file_id, RecordVal* args,
                                 uint64 n) const
 	{
@ -254,28 +285,6 @@ bool Manager::AddAnalyzer(const string& file_id, file_analysis::Tag tag,
 	return file->AddAnalyzer(tag, args);
 	}

-TableVal* Manager::AddAnalyzersForMIMEType(const string& file_id, const string& mtype,
-					   RecordVal* args)
-	{
-	if ( ! tag_set_type )
-		tag_set_type = internal_type("files_tag_set")->AsTableType();
-
-	TableVal* sval = new TableVal(tag_set_type);
-	TagSet* l = LookupMIMEType(mtype, false);
-
-	if ( ! l )
-		return sval;
-
-	for ( TagSet::const_iterator i = l->begin(); i != l->end(); i++ )
-		{
-		file_analysis::Tag tag = *i;
-		if ( AddAnalyzer(file_id, tag, args) )
-			sval->Assign(tag.AsEnumVal(), 0);
-		}
-
-	return sval;
-	}
-
 bool Manager::RemoveAnalyzer(const string& file_id, file_analysis::Tag tag,
                             RecordVal* args) const
 	{
@ -288,7 +297,8 @@ bool Manager::RemoveAnalyzer(const string& file_id, file_analysis::Tag tag,
 	}

 File* Manager::GetFile(const string& file_id, Connection* conn,
-                       analyzer::Tag tag, bool is_orig, bool update_conn)
+                       analyzer::Tag tag, bool is_orig, bool update_conn,
+                       const char* source_name)
 	{
 	if ( file_id.empty() )
 		return 0;
@ -300,10 +310,19 @@ File* Manager::GetFile(const string& file_id, Connection* conn,

 	if ( ! rval )
 		{
-		rval = new File(file_id, conn, tag, is_orig);
+		rval = new File(file_id,
+		                source_name ? source_name
+		                            : analyzer_mgr->GetComponentName(tag),
+		                conn, tag, is_orig);
 		id_map.Insert(file_id.c_str(), rval);
 		rval->ScheduleInactivityTimer();

+		// Generate file_new after inserting the file into the manager's mapping
+		// in case the script layer calls back into the core from the event.
+		rval->FileEvent(file_new);
+		// Same for file_over_new_connection.
+		rval->RaiseFileOverNewConnection(conn, is_orig);
+
 		if ( IsIgnored(file_id) )
 			return 0;
 		}
@ -311,8 +330,8 @@ File* Manager::GetFile(const string& file_id, Connection* conn,
 		{
 		rval->UpdateLastActivityTime();

-		if ( update_conn )
-			rval->UpdateConnectionFields(conn, is_orig);
+		if ( update_conn && rval->UpdateConnectionFields(conn, is_orig) )
+			rval->RaiseFileOverNewConnection(conn, is_orig);
 		}

 	return rval;
@ -461,63 +480,6 @@ Analyzer* Manager::InstantiateAnalyzer(Tag tag, RecordVal* args, File* f) const
 	return a;
 	}

-Manager::TagSet* Manager::LookupMIMEType(const string& mtype, bool add_if_not_found)
-	{
-	MIMEMap::const_iterator i = mime_types.find(to_upper(mtype));
-
-	if ( i != mime_types.end() )
-		return i->second;
-
-	if ( ! add_if_not_found )
-		return 0;
-
-	TagSet* l = new TagSet;
-	mime_types.insert(std::make_pair(to_upper(mtype), l));
-	return l;
-	}
-
-bool Manager::RegisterAnalyzerForMIMEType(EnumVal* tag, StringVal* mtype)
-	{
-	Component* p = Lookup(tag);
-
-	if ( ! p  )
-		return false;
-
-	return RegisterAnalyzerForMIMEType(p->Tag(), mtype->CheckString());
-	}
-
-bool Manager::RegisterAnalyzerForMIMEType(Tag tag, const string& mtype)
-	{
-	TagSet* l = LookupMIMEType(mtype, true);
-
-	DBG_LOG(DBG_FILE_ANALYSIS, "Register analyzer %s for MIME type %s",
-		GetComponentName(tag).c_str(), mtype.c_str());
-
-	l->insert(tag);
-	return true;
-	}
-
-bool Manager::UnregisterAnalyzerForMIMEType(EnumVal* tag, StringVal* mtype)
-	{
-	Component* p = Lookup(tag);
-
-	if ( ! p  )
-		return false;
-
-	return UnregisterAnalyzerForMIMEType(p->Tag(), mtype->CheckString());
-	}
-
-bool Manager::UnregisterAnalyzerForMIMEType(Tag tag, const string& mtype)
-	{
-	TagSet* l = LookupMIMEType(mtype, true);
-
-	DBG_LOG(DBG_FILE_ANALYSIS, "Unregister analyzer %s for MIME type %s",
-		GetComponentName(tag).c_str(), mtype.c_str());
-
-	l->erase(tag);
-	return true;
-	}
-
 RuleMatcher::MIME_Matches* Manager::DetectMIME(const u_char* data, uint64 len,
        RuleMatcher::MIME_Matches* rval) const
 	{
--- a/src/file_analysis/Manager.h
+++ b/src/file_analysis/Manager.h
@ -213,6 +213,21 @@ public:
 	 */
 	bool SetTimeoutInterval(const string& file_id, double interval) const;

+	/**
+	 * Enable the reassembler for a file.
+	 */
+	bool EnableReassembly(const string& file_id);
+
+	/**
+	 * Disable the reassembler for a file.
+	 */
+	bool DisableReassembly(const string& file_id);
+
+	/**
+	 * Set the maximum reassembly buffer size, in bytes, for a file.
+	 */
+	bool SetReassemblyBuffer(const string& file_id, uint64 max);
+
 	/**
 	 * Sets a limit on the maximum size allowed for extracting the file
 	 * to local disk;
@ -238,18 +253,6 @@ public:
 	bool AddAnalyzer(const string& file_id, file_analysis::Tag tag,
 	                 RecordVal* args) const;

-	/**
-	 * Queue attachment of an all analyzers associated with a given MIME
-	 * type to the file identifier.
-	 *
-	 * @param file_id the file identifier/hash.
-	 * @param mtype the MIME type; comparisions will be performanced case-insensitive.
-	 * @param args a \c AnalyzerArgs value which describes a file analyzer.
-	 * @return A ref'ed \c set[Tag] with all added analyzers.
-	 */
-	TableVal* AddAnalyzersForMIMEType(const string& file_id, const string& mtype,
-					  RecordVal* args);
-
 	/**
 	 * Queue removal of an analyzer for a given file identifier.
 	 * @param file_id the file identifier/hash.
@ -276,62 +279,6 @@ public:
 	 */
 	Analyzer* InstantiateAnalyzer(Tag tag, RecordVal* args, File* f) const;

-	/**
-	 * Registers a MIME type for an analyzer. Once registered, files of
-	 * that MIME type will automatically get a corresponding analyzer
-	 * assigned.
-	 *
-	 * @param tag The analyzer's tag as an enum of script type \c
-	 * Files::Tag.
-	 *
-	 * @param mtype The MIME type. It will be matched case-insenistive.
-	 *
-	 * @return True if successful.
-	 */
-	bool RegisterAnalyzerForMIMEType(EnumVal* tag, StringVal* mtype);
-
-	/**
-	 * Registers a MIME type for an analyzer. Once registered, files of
-	 * that MIME type will automatically get a corresponding analyzer
-	 * assigned.
-	 *
-	 * @param tag The analyzer's tag as an enum of script type \c
-	 * Files::Tag.
-	 *
-	 * @param mtype The MIME type. It will be matched case-insenistive.
-	 *
-	 * @return True if successful.
-	 */
-	bool RegisterAnalyzerForMIMEType(Tag tag, const string& mtype);
-
-	/**
-	 * Unregisters a MIME type for an analyzer.
-	 *
-	 * @param tag The analyzer's tag as an enum of script type \c
-	 * Files::Tag.
-	 *
-	 * @param mtype The MIME type. It will be matched case-insenistive.
-	 *
-	 * @return True if successful (incl. when the type wasn't actually
-	 * registered for the analyzer).
-	 *
-	 */
-	bool UnregisterAnalyzerForMIMEType(EnumVal* tag, StringVal* mtype);
-
-	/**
-	 * Unregisters a MIME type for an analyzer.
-	 *
-	 * @param tag The analyzer's tag as an enum of script type \c
-	 * Files::Tag.
-	 *
-	 * @param mtype The MIME type. It will be matched case-insenistive.
-	 *
-	 * @return True if successful (incl. when the type wasn't actually
-	 * registered for the analyzer).
-	 *
-	 */
-	bool UnregisterAnalyzerForMIMEType(Tag tag, const string& mtype);
-
 	/**
 	 * Returns a set of all matching MIME magic signatures for a given
 	 * chunk of data.
@ -372,6 +319,7 @@ protected:
 	 *        this file isn't related to a connection).
 	 * @param update_conn whether we need to update connection-related field
 	 *        in the \c fa_file record value associated with the file.
+	 * @param source_name an optional value of the source field to fill in.
 	 * @return the File object mapped to \a file_id or a null pointer if
 	 *         analysis is being ignored for the associated file.  An File
 	 *         object may be created if a mapping doesn't exist, and if it did
@ -380,7 +328,8 @@ protected:
 	 */
 	File* GetFile(const string& file_id, Connection* conn = 0,
 	              analyzer::Tag tag = analyzer::Tag::Error,
-	              bool is_orig = false, bool update_conn = true);
+	              bool is_orig = false, bool update_conn = true,
+	              const char* source_name = 0);

 	/**
 	 * Try to retrieve a file that's being analyzed, using its identifier/hash.
--- a/src/file_analysis/analyzer/extract/Extract.cc
+++ b/src/file_analysis/analyzer/extract/Extract.cc
@ -12,9 +12,9 @@ using namespace file_analysis;
 Extract::Extract(RecordVal* args, File* file, const string& arg_filename,
                 uint64 arg_limit)
    : file_analysis::Analyzer(file_mgr->GetComponentTag("EXTRACT"), args, file),
-      filename(arg_filename), limit(arg_limit)
+      filename(arg_filename), limit(arg_limit), depth(0)
 	{
-	fd = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0666);
+	fd = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0666);

 	if ( fd < 0 )
 		{
@ -53,7 +53,7 @@ file_analysis::Analyzer* Extract::Instantiate(RecordVal* args, File* file)
 	                   limit->AsCount());
 	}

-static bool check_limit_exceeded(uint64 lim, uint64 off, uint64 len, uint64* n)
+static bool check_limit_exceeded(uint64 lim, uint64 depth, uint64 len, uint64* n)
 	{
 	if ( lim == 0 )
 		{
@ -61,29 +61,31 @@ static bool check_limit_exceeded(uint64 lim, uint64 off, uint64 len, uint64* n)
 		return false;
 		}

-	if ( off >= lim )
+	if ( depth >= lim )
 		{
 		*n = 0;
 		return true;
 		}
-
-	*n = lim - off;
-
-	if ( len > *n )
+	else if ( depth + len > lim )
+		{
+		*n = lim - depth;
 		return true;
+		}
 	else
+		{
 		*n = len;
+		}

 	return false;
 	}
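To make the new depth-based semantics concrete, below is a standalone copy of the check using `uint64_t` in place of Bro's `uint64` typedef. One assumption: the body of the `lim == 0` branch is elided in this hunk, and the sketch assumes it means "no limit", so the whole chunk may be written. The key change from the old version is that `depth` counts bytes actually written rather than the chunk's file offset.

```cpp
#include <cassert>
#include <cstdint>

// Standalone sketch of check_limit_exceeded() from Extract.cc.
// Assumption: lim == 0 means "unlimited" (that branch is elided above).
static bool check_limit_exceeded(uint64_t lim, uint64_t depth, uint64_t len,
                                 uint64_t* n) {
    if (lim == 0) {
        *n = len;  // Assumed: no limit, write the whole chunk.
        return false;
    }

    if (depth >= lim) {
        *n = 0;  // Already at the limit: write nothing.
        return true;
    } else if (depth + len > lim) {
        *n = lim - depth;  // Write only up to the limit.
        return true;
    } else {
        *n = len;  // The whole chunk fits.
    }

    return false;
}
```

For a 1000-byte limit, a 100-byte chunk at depth 950 is truncated to 50 bytes and reported as exceeding the limit, while the same chunk at depth 900 fits exactly.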

-bool Extract::DeliverChunk(const u_char* data, uint64 len, uint64 offset)
+bool Extract::DeliverStream(const u_char* data, uint64 len)
 	{
 	if ( ! fd )
 		return false;

 	uint64 towrite = 0;
-	bool limit_exceeded = check_limit_exceeded(limit, offset, len, &towrite);
+	bool limit_exceeded = check_limit_exceeded(limit, depth, len, &towrite);

 	if ( limit_exceeded && file_extraction_limit )
 		{
@ -92,16 +94,31 @@ bool Extract::DeliverChunk(const u_char* data, uint64 len, uint64 offset)
 		vl->append(f->GetVal()->Ref());
 		vl->append(Args()->Ref());
 		vl->append(new Val(limit, TYPE_COUNT));
-		vl->append(new Val(offset, TYPE_COUNT));
 		vl->append(new Val(len, TYPE_COUNT));
 		f->FileEvent(file_extraction_limit, vl);

-		// Limit may have been modified by BIF, re-check it.
-		limit_exceeded = check_limit_exceeded(limit, offset, len, &towrite);
+		// Limit may have been modified by a BIF, re-check it.
+		limit_exceeded = check_limit_exceeded(limit, depth, len, &towrite);
 		}

 	if ( towrite > 0 )
-		safe_pwrite(fd, data, towrite, offset);
+		{
+		safe_write(fd, reinterpret_cast<const char*>(data), towrite);
+		depth += towrite;
+		}

 	return ( ! limit_exceeded );
 	}
+
+bool Extract::Undelivered(uint64 offset, uint64 len)
+	{
+	if ( depth == offset )
+		{
+		char* tmp = new char[len]();
+		safe_write(fd, tmp, len);
+		delete [] tmp;
+		depth += len;
+		}
+
+	return true;
+	}
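Because the extractor now appends with `O_APPEND` instead of using `pwrite` at an offset, gaps must be filled explicitly or later data would land at the wrong position. The hypothetical `MemExtract` struct below mirrors `DeliverStream()` and `Undelivered()` using an in-memory `std::string` in place of the file descriptor, which makes the zero-padding behavior easy to check.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical in-memory stand-in for the Extract analyzer: `out` plays the
// role of the extraction file and `depth` counts bytes written so far.
struct MemExtract {
    std::string out;
    uint64_t depth = 0;

    bool DeliverStream(const char* data, uint64_t len) {
        out.append(data, len);
        depth += len;
        return true;
    }

    // Like Extract::Undelivered(): pad a gap with zero bytes, but only when
    // the gap starts exactly where the written data currently ends.
    bool Undelivered(uint64_t offset, uint64_t len) {
        if (depth == offset) {
            out.append(len, '\0');
            depth += len;
        }
        return true;
    }
};
```

Note that the diff allocates a zeroed buffer as large as the whole gap; for very large gaps, writing zeros in fixed-size pieces would bound memory use.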
--- a/src/file_analysis/analyzer/extract/Extract.h
+++ b/src/file_analysis/analyzer/extract/Extract.h
@ -28,11 +28,18 @@ public:
 	 * Write a chunk of file data to the local extraction file.
 	 * @param data pointer to a chunk of file data.
 	 * @param len number of bytes in the data chunk.
-	 * @param offset number of bytes from start of file at which chunk starts.
 	 * @return false if there was no extraction file open and the data couldn't
 	 *         be written, else true.
 	 */
-	virtual bool DeliverChunk(const u_char* data, uint64 len, uint64 offset);
+	virtual bool DeliverStream(const u_char* data, uint64 len);
+
+	/**
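+	 * Handle undelivered bytes by writing zeros in their place (only if the
+	 * gap starts at the current write position), so that later data lands
+	 * at the correct offset in the extraction file.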
+	 * @param offset distance into the file where the gap occurred.
+	 * @param len number of bytes undelivered.
+	 * @return true
+	 */
+	virtual bool Undelivered(uint64 offset, uint64 len);

 	/**
 	 * Create a new instance of an Extract analyzer.
@ -67,6 +74,7 @@ private:
 	string filename;
 	int fd;
 	uint64 limit;
+	uint64 depth;
 };

 } // namespace file_analysis
--- a/src/file_analysis/analyzer/extract/events.bif
+++ b/src/file_analysis/analyzer/extract/events.bif
@ -11,9 +11,7 @@
 ##
 ## limit: The limit, in bytes, the extracted file is about to breach.
 ##
-## offset: The offset at which a file chunk is about to be written.
-##
 ## len: The length of the file chunk about to be written.
 ##
 ## .. bro:see:: Files::add_analyzer Files::ANALYZER_EXTRACT
-event file_extraction_limit%(f: fa_file, args: any, limit: count, offset: count, len: count%);
+event file_extraction_limit%(f: fa_file, args: any, limit: count, len: count%);
--- a/src/file_analysis/file_analysis.bif
+++ b/src/file_analysis/file_analysis.bif
@ -15,6 +15,27 @@ function Files::__set_timeout_interval%(file_id: string, t: interval%): bool
 	return new Val(result, TYPE_BOOL);
 	%}

+## :bro:see:`Files::enable_reassembly`.
+function Files::__enable_reassembly%(file_id: string%): bool
+	%{
+	bool result = file_mgr->EnableReassembly(file_id->CheckString());
+	return new Val(result, TYPE_BOOL);
+	%}
+
+## :bro:see:`Files::disable_reassembly`.
+function Files::__disable_reassembly%(file_id: string%): bool
+	%{
+	bool result = file_mgr->DisableReassembly(file_id->CheckString());
+	return new Val(result, TYPE_BOOL);
+	%}
+
+## :bro:see:`Files::set_reassembly_buffer_size`.
+function Files::__set_reassembly_buffer%(file_id: string, max: count%): bool
+	%{
+	bool result = file_mgr->SetReassemblyBuffer(file_id->CheckString(), max);
+	return new Val(result, TYPE_BOOL);
+	%}
+
 ## :bro:see:`Files::add_analyzer`.
 function Files::__add_analyzer%(file_id: string, tag: Files::Tag, args: any%): bool
 	%{
@ -26,16 +47,6 @@ function Files::__add_analyzer%(file_id: string, tag: Files::Tag, args: any%): b
 	return new Val(result, TYPE_BOOL);
 	%}

-## :bro:see:`Files::add_analyzers_for_mime_type`.
-function Files::__add_analyzers_for_mime_type%(file_id: string, mtype: string, args: any%): files_tag_set
-	%{
-	using BifType::Record::Files::AnalyzerArgs;
-	RecordVal* rv = args->AsRecordVal()->CoerceTo(AnalyzerArgs);
-	Val* analyzers = file_mgr->AddAnalyzersForMIMEType(file_id->CheckString(), mtype->CheckString(), rv);
-	Unref(rv);
-	return analyzers;
-	%}
-
 ## :bro:see:`Files::remove_analyzer`.
 function Files::__remove_analyzer%(file_id: string, tag: Files::Tag, args: any%): bool
 	%{
@ -60,13 +71,6 @@ function Files::__analyzer_name%(tag: Files::Tag%) : string
 	return new StringVal(file_mgr->GetComponentName(tag));
 	%}

-## :bro:see:`Files::register_for_mime_type`.
-function Files::__register_for_mime_type%(id: Analyzer::Tag, mt: string%) : bool
-	%{
-	bool result = file_mgr->RegisterAnalyzerForMIMEType(id->AsEnumVal(), mt);
-	return new Val(result, TYPE_BOOL);
-	%}
-
 module GLOBAL;

 ## For use within a :bro:see:`get_file_handle` handler to set a unique
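The three new BIFs expose per-file reassembly controls: `Files::__enable_reassembly`, `Files::__disable_reassembly`, and `Files::__set_reassembly_buffer`, whose `max` argument presumably caps how much out-of-order data is held. The reassembler itself is not part of this diff, so the C++ sketch below only illustrates the general idea and is not Bro's implementation. `ReassemblerSketch` is a hypothetical class, and both the non-overlapping-chunk assumption and the policy of skipping the hole and recording a gap once the buffer exceeds its cap are assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of bounded file reassembly. Out-of-order chunks wait in
// a buffer until the bytes before them arrive; when more than max_buffer
// bytes are waiting, the missing range is skipped and recorded as a gap.
// Assumption: chunks never overlap and are never delivered twice.
class ReassemblerSketch {
public:
    explicit ReassemblerSketch(uint64_t max_buffer)
        : max_(max_buffer), next_(0), buffered_(0) {}

    void Deliver(uint64_t offset, const std::string& data) {
        pending_[offset] = data;
        buffered_ += data.size();
        Flush();

        // Over budget: give up on the hole before the earliest buffered chunk.
        while (buffered_ > max_ && !pending_.empty()) {
            uint64_t first = pending_.begin()->first;
            gaps_.emplace_back(next_, first - next_);
            next_ = first;
            Flush();
        }
    }

    const std::string& Stream() const { return stream_; }
    const std::vector<std::pair<uint64_t, uint64_t>>& Gaps() const { return gaps_; }

private:
    // Move every chunk that now continues the in-order stream out of the buffer.
    void Flush() {
        auto it = pending_.begin();
        while (it != pending_.end() && it->first == next_) {
            stream_ += it->second;
            next_ += it->second.size();
            buffered_ -= it->second.size();
            it = pending_.erase(it);
        }
    }

    uint64_t max_, next_, buffered_;
    std::map<uint64_t, std::string> pending_;
    std::string stream_;
    std::vector<std::pair<uint64_t, uint64_t>> gaps_;
};
```

Each recorded `(offset, len)` gap corresponds to the kind of range an analyzer's `Undelivered(offset, len)` hook, such as the one added to the extraction analyzer, would be told about.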
--- a/src/parse.y
+++ b/src/parse.y
@ -2,7 +2,7 @@
 // See the file "COPYING" in the main distribution directory for copyright.
 %}

-%expect 75
+%expect 78

 %token TOK_ADD TOK_ADD_TO TOK_ADDR TOK_ANY
 %token TOK_ATENDIF TOK_ATELSE TOK_ATIF TOK_ATIFDEF TOK_ATIFNDEF
@ -24,7 +24,7 @@
 %token TOK_ATTR_PERSISTENT TOK_ATTR_SYNCHRONIZED
 %token TOK_ATTR_RAW_OUTPUT TOK_ATTR_MERGEABLE
 %token TOK_ATTR_PRIORITY TOK_ATTR_LOG TOK_ATTR_ERROR_HANDLER
-%token TOK_ATTR_TYPE_COLUMN
+%token TOK_ATTR_TYPE_COLUMN TOK_ATTR_DEPRECATED

 %token TOK_DEBUG

@ -44,7 +44,7 @@
 %right '!'
 %left '$' '[' ']' '(' ')' TOK_HAS_FIELD TOK_HAS_ATTR

-%type <b> opt_no_test opt_no_test_block
+%type <b> opt_no_test opt_no_test_block opt_deprecated
 %type <str> TOK_ID TOK_PATTERN_TEXT single_pattern
 %type <id> local_id global_id def_global_id event_id global_or_event_id resolve_id begin_func
 %type <id_l> local_id_list
@ -227,6 +227,18 @@ static bool expr_is_table_type_name(const Expr* expr)

 	return false;
 	}
+
+static bool has_attr(const attr_list* al, attr_tag tag)
+	{
+	if ( ! al )
+		return false;
+
+	for ( int i = 0; i < al->length(); ++i )
+		if ( (*al)[i]->Tag() == tag )
+			return true;
+
+	return false;
+	}
 %}

 %union {
@ -671,6 +683,9 @@ expr:
 					}
 				else
 					$$ = new NameExpr(id);
+
+				if ( id->IsDeprecated() )
+					reporter->Warning("deprecated (%s)", id->Name());
 				}
 			}

@ -759,7 +774,7 @@ enum_body_elem:
 		   error messages if someboy tries to use constant variables as
 		   enumerator.
 		*/
-		TOK_ID '=' TOK_CONSTANT
+		TOK_ID '=' TOK_CONSTANT opt_deprecated
 			{
 			set_location(@1, @3);
 			assert(cur_enum_type);
@ -768,7 +783,7 @@ enum_body_elem:
 				reporter->Error("enumerator is not a count constant");
 			else
 				cur_enum_type->AddName(current_module, $1,
-				                       $3->InternalUnsigned(), is_export);
+				                       $3->InternalUnsigned(), is_export, $4);
 			}

 	|	TOK_ID '=' '-' TOK_CONSTANT
@ -780,11 +795,11 @@ enum_body_elem:
 			reporter->Error("enumerator is not a count constant");
 			}

-	|	TOK_ID
+	|	TOK_ID opt_deprecated
 			{
 			set_location(@1);
 			assert(cur_enum_type);
-			cur_enum_type->AddName(current_module, $1, is_export);
+			cur_enum_type->AddName(current_module, $1, is_export, $2);
 			}
 	;

@ -963,7 +978,12 @@ type:
 				$$ = error_type();
 				}
 			else
+				{
 				Ref($$);
+
+				if ( $1->IsDeprecated() )
+					reporter->Warning("deprecated (%s)", $1->Name());
+				}
 			}
 	;

@ -1139,6 +1159,9 @@ func_body:
 			{
 			saved_in_init.push_back(in_init);
 			in_init = 0;
+
+			if ( has_attr($1, ATTR_DEPRECATED) )
+				current_scope()->ScopeID()->MakeDeprecated();
 			}

 		stmt_list
@ -1265,6 +1288,8 @@ attr:
 			{ $$ = new Attr(ATTR_LOG); }
 	|	TOK_ATTR_ERROR_HANDLER
 			{ $$ = new Attr(ATTR_ERROR_HANDLER); }
+	|	TOK_ATTR_DEPRECATED
+			{ $$ = new Attr(ATTR_DEPRECATED); }
 	;

 stmt:
@ -1450,6 +1475,10 @@ event:
 			{
 			set_location(@1, @4);
 			$$ = new EventExpr($1, $3);
+			ID* id = lookup_ID($1, current_module.c_str());
+
+			if ( id && id->IsDeprecated() )
+				reporter->Warning("deprecated (%s)", id->Name());
 			}
 	;

@ -1556,6 +1585,15 @@ global_or_event_id:
 				if ( ! $$->IsGlobal() )
 					$$->Error("already a local identifier");

+				if ( $$->IsDeprecated() )
+					{
+					BroType* t = $$->Type();
+
+					if ( t->Tag() != TYPE_FUNC ||
+					     t->AsFuncType()->Flavor() != FUNC_FLAVOR_FUNCTION )
+						reporter->Warning("deprecated (%s)", $$->Name());
+					}
+
 				delete [] $1;
 				}

@ -1597,6 +1635,12 @@ opt_no_test_block:
 	|
 			{ $$ = false; }

+opt_deprecated:
+		TOK_ATTR_DEPRECATED
+			{ $$ = true; }
+	|
+			{ $$ = false; }
+
 %%

 int yyerror(const char msg[])
--- a/src/plugin/ComponentManager.h
+++ b/src/plugin/ComponentManager.h
@ -243,7 +243,8 @@ void ComponentManager<T, C>::RegisterComponent(C* component,
 	// Install an identfier for enum value
 	string id = fmt("%s%s", prefix.c_str(), cname.c_str());
 	tag_enum_type->AddName(module, id.c_str(),
-	                       component->Tag().AsEnumVal()->InternalInt(), true);
+	                       component->Tag().AsEnumVal()->InternalInt(), true,
+	                       false);
 	}

 } // namespace plugin
--- a/src/scan.l
+++ b/src/scan.l
@ -260,6 +260,7 @@ when	return TOK_WHEN;
 &create_expire	return TOK_ATTR_EXPIRE_CREATE;
 &default	return TOK_ATTR_DEFAULT;
 &delete_func	return TOK_ATTR_DEL_FUNC;
+&deprecated	return TOK_ATTR_DEPRECATED;
 &raw_output return TOK_ATTR_RAW_OUTPUT;
 &encrypt	return TOK_ATTR_ENCRYPT;
 &error_handler	return TOK_ATTR_ERROR_HANDLER;
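With the new `&deprecated` token, the parser marks identifiers as deprecated. Enum values are marked via `opt_deprecated`, and functions via `has_attr()` in `func_body`. The parser then warns with `deprecated (<name>)` wherever a marked name is referenced. The C++ sketch below models only that define/use flow. `AttrTag`, `IdSketch`, and `ScopeSketch` are hypothetical simplifications, not Bro's `Attr`, `ID`, or `Scope` classes.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for Bro's attr_tag and ID types.
enum class AttrTag { Log, Default, Deprecated };

struct IdSketch {
    std::string name;
    bool deprecated;
};

// Linear scan over an attribute list, mirroring has_attr() in parse.y:
// a missing (null) list simply has no attributes.
bool has_attr(const std::vector<AttrTag>* attrs, AttrTag tag) {
    if (!attrs)
        return false;

    for (AttrTag t : *attrs)
        if (t == tag)
            return true;

    return false;
}

class ScopeSketch {
public:
    // Declare an identifier; a Deprecated tag in its attributes marks it.
    void Define(const std::string& name, const std::vector<AttrTag>* attrs) {
        ids_[name] = IdSketch{name, has_attr(attrs, AttrTag::Deprecated)};
    }

    // Resolve a use of an identifier, recording a warning in the same
    // "deprecated (name)" format the parser passes to reporter->Warning().
    bool Use(const std::string& name) {
        auto it = ids_.find(name);
        if (it == ids_.end())
            return false;

        if (it->second.deprecated)
            warnings_.push_back("deprecated (" + name + ")");

        return true;
    }

    const std::vector<std::string>& Warnings() const { return warnings_; }

private:
    std::unordered_map<std::string, IdSketch> ids_;
    std::vector<std::string> warnings_;
};
```

In `global_or_event_id`, the diff does not warn when a deprecated identifier has a plain function type (`FUNC_FLAVOR_FUNCTION`). The sketch does not model this exception.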
--- a/src/strings.bif
+++ b/src/strings.bif
@ -130,7 +130,7 @@ BroString* cat_string_array_n(TableVal* tbl, int start, int end)
 ## .. bro:see:: cat cat_sep string_cat cat_string_array_n
 ##              fmt
 ##              join_string_vec join_string_array
-function cat_string_array%(a: string_array%): string
+function cat_string_array%(a: string_array%): string &deprecated
 	%{
 	TableVal* tbl = a->AsTableVal();
 	return new StringVal(cat_string_array_n(tbl, 1, a->AsTable()->Length()));
@ -149,7 +149,7 @@ function cat_string_array%(a: string_array%): string
 ## .. bro:see:: cat string_cat cat_string_array
 ##              fmt
 ##              join_string_vec join_string_array
-function cat_string_array_n%(a: string_array, start: count, end: count%): string
+function cat_string_array_n%(a: string_array, start: count, end: count%): string &deprecated
 	%{
 	TableVal* tbl = a->AsTableVal();
 	return new StringVal(cat_string_array_n(tbl, start, end));
@ -168,7 +168,7 @@ function cat_string_array_n%(a: string_array, start: count, end: count%): string
 ## .. bro:see:: cat cat_sep string_cat cat_string_array cat_string_array_n
 ##              fmt
 ##              join_string_vec
-function join_string_array%(sep: string, a: string_array%): string
+function join_string_array%(sep: string, a: string_array%): string &deprecated
 	%{
 	vector<const BroString*> vs;
 	TableVal* tbl = a->AsTableVal();
@ -230,7 +230,7 @@ function join_string_vec%(vec: string_vec, sep: string%): string
 ## Returns: A sorted copy of *a*.
 ##
 ## .. bro:see:: sort
-function sort_string_array%(a: string_array%): string_array
+function sort_string_array%(a: string_array%): string_array &deprecated
 	%{
 	TableVal* tbl = a->AsTableVal();
 	int n = a->AsTable()->Length();
@ -338,6 +338,62 @@ static int match_prefix(int s_len, const char* s, int t_len, const char* t)
 	return 1;
 	}

+VectorVal* do_split_string(StringVal* str_val, RE_Matcher* re, int incl_sep,
+                           int max_num_sep)
+	{
+	VectorVal* rval = new VectorVal(string_vec);
+	const u_char* s = str_val->Bytes();
+	int n = str_val->Len();
+	const u_char* end_of_s = s + n;
+	int num = 0;
+	int num_sep = 0;
+
+	int offset = 0;
+	while ( n >= 0 )
+		{
+		offset = 0;
+		// Find next match offset.
+		int end_of_match = 0;
+		while ( n > 0 &&
+		        (end_of_match = re->MatchPrefix(s + offset, n)) <= 0 )
+			{
+			// Move on to next byte.
+			++offset;
+			--n;
+			}
+
+		if ( max_num_sep && num_sep >= max_num_sep )
+			{
+			offset = end_of_s - s;
+			n = 0;
+			}
+
+		rval->Assign(num++, new StringVal(offset, (const char*) s));
+
+		// No more separators will be needed if this is the end of string.
+		if ( n <= 0 )
+			break;
+
+		if ( incl_sep )
+			{ // including the part that matches the pattern
+			rval->Assign(num++, new StringVal(end_of_match, (const char*) s+offset));
+			}
+
+		if ( max_num_sep && num_sep >= max_num_sep )
+			break;
+
+		++num_sep;
+
+		n -= end_of_match;
+		s += offset + end_of_match;
+
+		if ( s > end_of_s )
+			reporter->InternalError("RegMatch in split goes beyond the string");
+		}
+
+	return rval;
+	}
+
 Val* do_split(StringVal* str_val, RE_Matcher* re, int incl_sep, int max_num_sep)
 	{
 	TableVal* a = new TableVal(string_array);
@ -493,17 +549,33 @@ Val* do_sub(StringVal* str_val, RE_Matcher* re, StringVal* repl, int do_all)
 ## Returns: An array of strings where each element corresponds to a substring
 ##          in *str* separated by *re*.
 ##
-## .. bro:see:: split1 split_all split_n str_split
+## .. bro:see:: split1 split_all split_n str_split split_string split_string1 split_string_all split_string_n
 ##
 ## .. note:: The returned table starts at index 1. Note that conceptually the
 ##           return value is meant to be a vector and this might change in the
 ##           future.
 ##
-function split%(str: string, re: pattern%): string_array
+function split%(str: string, re: pattern%): string_array &deprecated
 	%{
 	return do_split(str, re, 0, 0);
 	%}

+## Splits a string into a vector of strings according to a pattern.
+##
+## str: The string to split.
+##
+## re: The pattern describing the element separator in *str*.
+##
+## Returns: A vector of strings where each element corresponds to a substring
+##          in *str* separated by *re*.
+##
+## .. bro:see:: split_string1 split_string_all split_string_n str_split
+##
+function split_string%(str: string, re: pattern%): string_vec
+	%{
+	return do_split_string(str, re, 0, 0);
+	%}
+
 ## Splits a string *once* into a two-element array of strings according to a
 ## pattern. This function is the same as :bro:id:`split`, but *str* is only
 ## split once (if possible) at the earliest position and an array of two strings
@ -518,12 +590,32 @@ function split%(str: string, re: pattern%): string_array
 ##          second everything after *re*. An array of one string is returned
 ##          when *s* cannot be split.
 ##
-## .. bro:see:: split split_all split_n str_split
-function split1%(str: string, re: pattern%): string_array
+## .. bro:see:: split split_all split_n str_split split_string split_string1 split_string_all split_string_n
+function split1%(str: string, re: pattern%): string_array &deprecated
 	%{
 	return do_split(str, re, 0, 1);
 	%}

+## Splits a string *once* into a two-element vector of strings according to
+## a pattern. This function is the same as :bro:id:`split_string`, but *str*
+## is only split once (if possible) at the earliest position and a vector of
+## two strings is returned.
+##
+## str: The string to split.
+##
+## re: The pattern describing the separator to split *str* in two pieces.
+##
+## Returns: A vector of strings with two elements in which the first represents
+##          the substring in *str* up to the first occurrence of *re*, and the
+##          second everything after *re*. A vector of one string is returned
+##          when *str* cannot be split.
+##
+## .. bro:see:: split_string split_string_all split_string_n str_split
+function split_string1%(str: string, re: pattern%): string_vec
+	%{
+	return do_split_string(str, re, 0, 1);
+	%}
+
 ## Splits a string into an array of strings according to a pattern. This
 ## function is the same as :bro:id:`split`, except that the separators are
 ## returned as well. For example, ``split_all("a-b--cd", /(\-)+/)`` returns
@ -538,12 +630,32 @@ function split1%(str: string, re: pattern%): string_array
 ##          to a substring in *str* of the part not matching *re* (odd-indexed)
 ##          and the part that matches *re* (even-indexed).
 ##
-## .. bro:see:: split split1 split_n str_split
-function split_all%(str: string, re: pattern%): string_array
+## .. bro:see:: split split1 split_n str_split split_string split_string1 split_string_all split_string_n
+function split_all%(str: string, re: pattern%): string_array &deprecated
 	%{
 	return do_split(str, re, 1, 0);
 	%}

+## Splits a string into a vector of strings according to a pattern. This
+## function is the same as :bro:id:`split_string`, except that the separators
+## are returned as well. For example, ``split_string_all("a-b--cd", /(\-)+/)``
+## returns ``{"a", "-", "b", "--", "cd"}``: odd-indexed elements do match the
+## pattern and even-indexed ones do not.
+##
+## str: The string to split.
+##
+## re: The pattern describing the element separator in *str*.
+##
+## Returns: A vector of strings where each two successive elements correspond
+##          to a substring in *str* of the part not matching *re* (even-indexed)
+##          and the part that matches *re* (odd-indexed).
+##
+## .. bro:see:: split_string split_string1 split_string_n str_split
+function split_string_all%(str: string, re: pattern%): string_vec
+	%{
+	return do_split_string(str, re, 1, 0);
+	%}
+
 ## Splits a string a given number of times into an array of strings according
 ## to a pattern. This function is similar to :bro:id:`split1` and
 ## :bro:id:`split_all`, but with customizable behavior with respect to
@ -563,13 +675,39 @@ function split_all%(str: string, re: pattern%): string_array
 ##          not matching *re* (odd-indexed) and the part that matches *re*
 ##          (even-indexed).
 ##
-## .. bro:see:: split split1 split_all str_split
+## .. bro:see:: split split1 split_all str_split split_string split_string1 split_string_all split_string_n
 function split_n%(str: string, re: pattern,
-		incl_sep: bool, max_num_sep: count%): string_array
+		incl_sep: bool, max_num_sep: count%): string_array &deprecated
 	%{
 	return do_split(str, re, incl_sep, max_num_sep);
 	%}

+## Splits a string a given number of times into a vector of strings according
+## to a pattern. This function is similar to :bro:id:`split_string1` and
+## :bro:id:`split_string_all`, but with customizable behavior with respect to
+## including separators in the result and the number of times to split.
+##
+## str: The string to split.
+##
+## re: The pattern describing the element separator in *str*.
+##
+## incl_sep: A flag indicating whether to include the separator matches in the
+##           result (as in :bro:id:`split_string_all`).
+##
+## max_num_sep: The maximum number of times to split *str*, or 0 for no limit.
+##
+## Returns: A vector of strings where, if *incl_sep* is true, each two
+##          successive elements correspond to a substring in *str* of the part
+##          not matching *re* (even-indexed) and the part that matches *re*
+##          (odd-indexed).
+##
+## .. bro:see:: split_string split_string1 split_string_all str_split
+function split_string_n%(str: string, re: pattern,
+		incl_sep: bool, max_num_sep: count%): string_vec
+	%{
+	return do_split_string(str, re, incl_sep, max_num_sep);
+	%}
+
 ## Substitutes a given replacement string for the first occurrence of a pattern
 ## in a given string.
 ##
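The new `split_string*` BIFs all funnel into `do_split_string()`, which returns a `string_vec`. This vector is indexed from 0, unlike the `string_array` tables of the deprecated `split*` functions, which start at index 1. The sketch below approximates that contract in standalone C++ using `std::regex`. `split_string_n_sketch` is a hypothetical name, and `std::regex` ECMAScript matching is an assumed stand-in for Bro's `RE_Matcher`, so results can differ for some patterns. As in the diff, a `max_num_sep` of 0 means no limit, empty matches never count as separators, and the final remainder is always appended, even when empty.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical approximation of split_string_n(str, re, incl_sep, max_num_sep).
// std::regex stands in for Bro's pattern matcher (an assumption).
std::vector<std::string> split_string_n_sketch(const std::string& str,
                                               const std::regex& re,
                                               bool incl_sep,
                                               unsigned max_num_sep) {
    std::vector<std::string> out;
    auto pos = str.cbegin();
    unsigned num_sep = 0;
    std::smatch m;

    // match_not_null: an empty match never counts as a separator.
    while ((max_num_sep == 0 || num_sep < max_num_sep) &&
           std::regex_search(pos, str.cend(), m, re,
                             std::regex_constants::match_not_null)) {
        out.emplace_back(pos, m[0].first);             // Piece before the separator.
        if (incl_sep)
            out.emplace_back(m[0].first, m[0].second); // The separator itself.
        pos = m[0].second;
        ++num_sep;
    }

    out.emplace_back(pos, str.cend());  // Remainder, possibly empty.
    return out;
}
```

With `incl_sep` set and no limit, this reproduces the documented `split_string_all("a-b--cd", /(\-)+/)` result `{"a", "-", "b", "--", "cd"}`. With `max_num_sep` of 1 and no separators in the output, it behaves like `split_string1` and yields `{"a", "b--cd"}`.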
--- a/testing/btest/Baseline/bifs.split_string/out
+++ b/testing/btest/Baseline/bifs.split_string/out
@ -0,0 +1,32 @@
+t
+s is a t
+t
+---------------------
+t
+s is a test
+---------------------
+t
+hi
+s is a t
+es
+t
+---------------------
+t
+s is a test
+---------------------
+t
+hi
+s is a test
+---------------------
+[, thi, s i, s a tes, t]
+---------------------
+X-Mailer
+Testing Test (http://www.example.com)
+---------------------
+A 
+=
+ B 
+=
+ C 
+=
+ D
--- a/Show more
+++ b/Show more
 @ -1 +1 @@
 .3-343
 .3-411