While we support initializing records via coercion from an expression
list, e.g.,
local x: X = [$x1=1, $x2=2];
this can sometimes obscure the code for readers, e.g., when assigning to
a value declared and typed elsewhere. The language runtime pays a similar
cost: instead of just constructing a known type, it needs to
check at runtime that the coercion from the expression list is valid.
This can be slower than just writing the readable code in the first
place, see #4559.
With this patch we use explicit construction, e.g.,
local x = X($x1=1, $x2=2);
The previous "fix" caused significant performance degradation without
the signature ever having a chance to trigger. Moving it to policy
seems the best compromise, the alternative being outright removing it.
Repeating the message for every new call to get_file_handle() is not
very useful. It's pretty much an analyzer configuration issue so logging
it once should be enough.
When an analyzer calls DataIn(), there's a costly callback construct
going through the event queue. If an analyzer does not have a
get_file_handle() handler installed, the produced file_id would
end up empty and ignored. Consequently, the get_file_handle() callback
was invoked for every new DataIn() invocation.
This is surprising and costly. Log a warning when this happens and
set a generically generated file handle value instead to
prevent the repeated get_file_handle() invocations.
When a fa_file object is created through the use of Input::add_analysis(),
the fa_file's source is likely not a valid representation of an analyzer's
tag. Files::describe() should not error in that case and should instead
return an empty description.
Add a new Analyzer::is_tag() helper that can be used to pre-check `f$source`.
This is a script-only change that unrolls Files::Info records into
multiple files.log entries if the same file was seen over different
connections by a single worker. Consequently, the Files::Info record
gets the commonly used uid and id fields added. These fields are
optional for Files::Info: a file may be analyzed without relation
to a network connection (e.g., by using Input::add_analysis()).
The existing tx_hosts, rx_hosts and conn_uids fields of Files::Info
are no longer meaningful after this change and are removed by default,
so they disappear from files.log, too.
The tx_hosts, rx_hosts and conn_uids fields can be revived by using the
policy script frameworks/files/deprecated-txhosts-rxhosts-connuids.zeek
included in the distribution. However, with v6.1 this script will be
removed.
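For sites that still depend on the old fields, loading the script might look like the line below. This is a sketch that assumes the policy directory is on the script search path, so the path from the text can be used without the .zeek suffix; where the line goes (for example a site-local script) is a deployment choice.

```zeek
# Restores the deprecated tx_hosts, rx_hosts and conn_uids fields in files.log.
@load frameworks/files/deprecated-txhosts-rxhosts-connuids
```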
This adds a "policy" hook into the logging framework's streams and
filters to replace the existing log filter predicates. The hook
signature is as follows:
hook(rec: any, id: Log::ID, filter: Log::Filter);
The logging manager invokes hooks on each log record. Hooks can veto
log records via a break, and modify them if necessary. Log filters
inherit the stream-level hook, but can override or remove the hook as
needed.
The distribution's existing log streams now come with pre-defined
hooks that users can add handlers to. Their name is standardized as
"log_policy" by convention, with additional suffixes when a module
provides multiple streams. The following adds a handler to the Conn
module's default log policy hook:
hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
    {
    if ( some_veto_reason(rec) )
        break;
    }
By default, this handler will get invoked for any log filter
associated with the Conn::LOG stream.
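Because the handler runs for every filter on the stream, a handler that should only affect one filter can check the filter argument. The following is a minimal sketch, assuming the stream still has its filter named "default" and that vetoing DNS connections is the (purely illustrative) goal:

```zeek
hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
    {
    # Veto only for the default filter; other filters attached to
    # Conn::LOG still receive the record.
    if ( filter$name == "default" && rec?$service && rec$service == "dns" )
        break;
    }
```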
The existing predicates are deprecated for removal in 4.1 but continue
to work.
This commit switches UID hashing from md5 to a highway hash. It also
moves the salt value out of the file plugin and into the global
namespace, making it installation-specific.
There now are digest hash functions that make "static"
installation-specific hashes, stable across workers, available to
everyone; hashes can be 64, 128 or 256 bits in size.
Because we switch the file hashing algorithm, all file
hashes change.
The underlying algorithm that is used for hashing is highwayhash-128,
which is significantly faster than md5.
This signature is relevant for process dumps on Windows that could be extracted by various tools. The unencrypted transmission of the dump of a critical system process (for example, lsass.exe) via network would be detected by this rule.
* All "Broxygen" usages have been replaced in
code, documentation, filenames, etc.
* Sphinx roles/directives like ":bro:see" are now ":zeek:see"
* The "--broxygen" command-line option is now "--zeexygen"
There are two new script-level functions to query and look up files
from the core by their IDs. These add feature parity with the
similarly named functions for connections. The function prototypes are
as follows:
Files::file_exists(fuid: string): bool
Files::lookup_file(fuid: string): fa_file
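A typical use is to guard the lookup with the existence check. The sketch below assumes the lookup function is spelled Files::lookup_file; the helper name describe_fuid is hypothetical and only serves the illustration:

```zeek
# Hypothetical helper: returns a description of the file with the given
# ID, or an empty string if the core does not currently know that file.
function describe_fuid(fuid: string): string
    {
    if ( ! Files::file_exists(fuid) )
        return "";

    local f = Files::lookup_file(fuid);
    return Files::describe(f);
    }
```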
Some of the existing mime types received extended matchers
to fix problems with UTF-16 BOMs.
New file mime types:
- .ini files
- MS Registry policy files
- MS Registry files
- MS Registry format files (e.g. DESKTOP.DAT)
- MS Outlook PST files
- Apple AFPInfo files
Mime type fixes:
- MP3 files with ID3 tags.
- JSON and XML matchers were extended
* origin/topic/seth/more-file-type-ident-fixes:
File API updates complete.
Fixes for file type identification.
API changes to file analysis mime type detection.
Make HTTP 206 reassembly require ETags by default.
More file type identification improvements
Fix an issue with files having gaps before the bof_buffer is filled.
Fix an issue with packet loss in http file reporting.
Adding WOFF fonts to file type identification.
Extended JSON matching and added OCSP responses.
Another large signature update.
More signature updates.
Even more file type ident clean up.
Lots of fixes for file type identification.
BIT-1368 #merged
- Backed out eTag changes. The real world is more complicated
than just using eTags to identify the same file.
- A bit of code simplification in the http base scripts.
- Test updates (more existing small problems were identified!).
- Removed the "file_mime_type" and "file_mime_types" events, replacing them
with a new event called "file_metadata_inferred". It has a record
argument of type "inferred_file_metadata", which contains the mime type
information that the earlier events used to supply. The idea here is
that future extensions to the record with new metadata will be less
likely to break user code than the alternatives (adding new events or
new event parameters).
Addresses BIT-1368.
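A handler for the new event might look roughly like this. The parameter names and the optional mime_type field are assumptions made for illustration; only the event name and the record type name come from the description above:

```zeek
event file_metadata_inferred(f: fa_file, meta: inferred_file_metadata)
    {
    # Assumption: the record carries the best-guess mime type in an
    # optional string field named "mime_type".
    if ( meta?$mime_type )
        print fmt("%s looks like %s", f$id, meta$mime_type);
    }
```

Since the metadata arrives as a record, fields added to it later would not require changing this handler's signature.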
- Lots of cleanup and expansion of XML match types.
- Signatures for ATOM and RSS (text/atom, text/rss).
- Improved SOAP signature.
- Improved text/cross-domain-policy signature
- Improved and expanded javascript matching a bit.
- Removed a lot of potentially problematic signatures (performance)
- Split out more signatures from libmagic.sig
- Added a signature for matching JSON. Seems to work ok.
- Signature for MPEGv4 audio.
- Expanded java applet signature.
- Improved PNG matching.
- Improved MP3 matching.
This allows the path for the default filter to be specified explicitly
when creating a stream and reduces the need to rely on the default path
function to magically supply the path.
The default path function is now only used if, when a filter is added to
a stream, it has neither a path nor a path function already.
Adapted the existing Log::create_stream calls to explicitly specify a
path value.
Addresses BIT-1324
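An adapted call could look like the sketch below. The module Foo, its Info record and the path "foo" are hypothetical, and zeek_init is used as the initialization event (older releases used a different event name):

```zeek
module Foo;

export {
    redef enum Log::ID += { LOG };

    type Info: record {
        ts:  time   &log;
        msg: string &log;
    };
}

event zeek_init()
    {
    # The path is given explicitly, so the default filter does not
    # depend on the default path function to derive it.
    Log::create_stream(Foo::LOG, [$columns=Info, $path="foo"]);
    }
```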