Commit graph

84 commits

Author SHA1 Message Date
Benjamin Bannier
d5fd29edcd Prefer explicit construction to coercion in record initialization
While we support initializing records via coercion from an expression
list, e.g.,

    local x: X = [$x1=1, $x2=2];

this can sometimes obscure the code to readers, e.g., when assigning to
value declared and typed elsewhere. The language runtime has a similar
overhead since instead of just constructing a known type it needs to
check at runtime that the coercion from the expression list is valid;
this can be slower than just writing the readible code in the first
place, see #4559.

With this patch we use explicit construction, e.g.,

    local x = X($x1=1, $x2=2);
2025-07-11 16:28:37 -07:00
Tim Wojtulewicz
ccefd66d37 Move python signatures to a separate file 2024-12-09 11:08:30 -07:00
Tim Wojtulewicz
bbd7f56dcc Add signatures for Python bytecode for 3.8-3.14 2024-12-06 13:45:46 -07:00
Johanna Amann
2217eab38a Fix cid propagation into files.log
Changes to the connection id were not propagated to files.log in all
cases.

Fixes GH-3700
2024-04-29 14:13:19 +01:00
Arne Welzel
1a5ce65e3d signatures: Move ISO 9660 signature to policy
The previous "fix" caused significant performance degradation without
the signature ever having a chance to trigger. Moving it to policy
seems the best compromise, the alternative being outright removing it.
2024-02-26 13:35:23 +01:00
Arne Welzel
d2409dd432 signatures: Fix ISO 9960 signature
This signature only really works when default_file_bof_buffer_size is bumped
to a sufficient value (40k).
2024-02-22 12:37:40 +01:00
Arne Welzel
d4c99e7c3f files: Warn once for missing get_file_handle()
Repeating the message for every new call to get_file_handle() is not
very useful. It's pretty much an analyzer configuration issue so logging
it once should be enough.
2023-05-19 09:37:51 -07:00
Arne Welzel
e4ab7b2d70 files/main: No empty file_ids
When an analyzer calls DataIn(), there's a costly callback construct
going through the event queue. If an analyzer does not have a
get_file_handle() handler installed, the produced file_id would
end up empty and ignored. Consequently, the get_file_handle() callback
was invoked for every new DataIn() invocations.

This is surprising and costly. Log a warning when this happens and
instead set a generically generated file handle value instead to
prevent the repeated get_file_handle() invocations.
2023-02-06 18:08:05 +01:00
Arne Welzel
85ce48eb1e analyzer/files: handle non-analyzer names in describe_file()
When a fa_file object is created through the use of Input::add_analysis(),
the fa_file's source is likely not valid representation of an analyzer's
tag and a Files::describe() should not error and instead return an empty
description.

Add a new Analyzer::is_tag() helper that can be used to pre-check `f$source`.
2022-12-06 11:17:30 +01:00
Josh Soref
21e0d777b3 Spelling fixes: scripts
* accessing
* across
* adding
* additional
* addresses
* afterwards
* analyzer
* ancillary
* answer
* associated
* attempts
* because
* belonging
* buffer
* cleanup
* committed
* connects
* database
* destination
* destroy
* distinguished
* encoded
* entries
* entry
* hopefully
* image
* include
* incorrect
* information
* initial
* initiate
* interval
* into
* java
* negotiation
* nodes
* nonexistent
* ntlm
* occasional
* omitted
* otherwise
* ourselves
* paragraphs
* particular
* perform
* received
* receiver
* referring
* release
* repetitions
* request
* responded
* retrieval
* running
* search
* separate
* separator
* should
* synchronization
* target
* that
* the
* threshold
* timeout
* transaction
* transferred
* transmission
* triggered
* vetoes
* virtual

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2022-11-02 17:36:39 -04:00
Arne Welzel
d2314d2666 files.log: Unroll and introduce uid and id fields
This is a script-only change that unrolls File::Info records into
multiple files.log entries if the same file was seen over different
connections by single worker. Consequently, the File::Info record
gets the commonly used uid and id fields added. These fields are
optional for File::Info - a file may be analyzed without relation
to a network connection (e.g by using Input::add_analysis()).

The existing tx_hosts, rx_hosts and conn_uids fields of Files::Info
are not meaningful after this change and removed by default. Therefore,
files.log will have them removed, too.

The tx_hosts, rx_hosts and conn_uids fields can be revived by using the
policy script frameworks/files/deprecated-txhosts-rxhosts-connuids.zeek
included in the distribution. However, with v6.1 this script will be
removed.
2022-08-16 17:22:20 +02:00
Tim Wojtulewicz
a6378531db Remove trailing whitespace from script files 2021-10-20 09:57:09 -07:00
Jon Siwek
737d2c390b Support explicit disabling of file analyzers 2021-02-23 15:50:18 -08:00
Christian Kreibich
1bd658da8f Support for log filter policy hooks
This adds a "policy" hook into the logging framework's streams and
filters to replace the existing log filter predicates. The hook
signature is as follows:

    hook(rec: any, id: Log::ID, filter: Log::Filter);

The logging manager invokes hooks on each log record. Hooks can veto
log records via a break, and modify them if necessary. Log filters
inherit the stream-level hook, but can override or remove the hook as
needed.

The distribution's existing log streams now come with pre-defined
hooks that users can add handlers to. Their name is standardized as
"log_policy" by convention, with additional suffixes when a module
provides multiple streams. The following adds a handler to the Conn
module's default log policy hook:

    hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
            {
            if ( some_veto_reason(rec) )
                break;
            }

By default, this handler will get invoked for any log filter
associated with the Conn::LOG stream.

The existing predicates are deprecated for removal in 4.1 but continue
to work.
2020-09-30 12:32:45 -07:00
Johanna Amann
04ed125941 Merge remote-tracking branch 'origin/master' into topic/johanna/hash-unification 2020-05-06 23:18:33 +00:00
Johanna Amann
3bce313b12 Switch file UID hashing from md5 to highwayhash.
This commit switches UID hashing from md5 to a highway hash. It also
moves the salt value out of the file plugin - and makes it
installation-specific instead - it is moved to the global namespace.

There now are digest hash functions to make "static"
installation-specific hashes that are stable over workers available to
everyone; hashes can be 64, 128 or 256 bits in size.

Due to the fact that we switch the file hashing algorithm, all file
hashes change.

The underlyigng algorithm that is used for hashing is highwayhash-128,
which is significantly faster than md5.
2020-04-30 10:20:09 -07:00
Seth Hall
dac96a6be3 Fixes a small bug in one signature with a duplicate name.
Also update a single failing test.
2020-04-29 11:22:42 -04:00
Seth Hall
15d43dfbcd Organized and added to the shipped file identification signatures.
- Added ISO 9660 disk image
 - Created new files for categorizing signatures better.
   - executable.sig - Executable (and bytecode) files.
   - java.sig - Java related files (class/jar, etc).
   - programming.sig - Mostly scripting language identification
2020-04-29 11:08:32 -04:00
Alexander Bolshakov
1759205930
Add Windows Minidump file signature
This signature is relevant for process dumps on Windows that could be extracted by various tools. The unencrypted transmission of the dump of a critical system process (for example, lsass.exe) via network would be detected by this rule.
2019-06-28 14:43:38 +03:00
Jon Siwek
aebcb1415d GH-234: rename Broxygen to Zeexygen along with roles/directives
* All "Broxygen" usages have been replaced in
  code, documentation, filenames, etc.

* Sphinx roles/directives like ":bro:see" are now ":zeek:see"

* The "--broxygen" command-line option is now "--zeexygen"
2019-04-22 19:45:50 -07:00
Jon Siwek
a994be9eeb Merge remote-tracking branch 'origin/topic/seth/zeek_init'
* origin/topic/seth/zeek_init:
  Some more testing fixes.
  Update docs and tests for bro_(init|done) -> zeek_(init|done)
  Implement the zeek_init handler.
2019-04-19 11:24:29 -07:00
Seth Hall
8cefb9be42 Implement the zeek_init handler.
Implements the change and a test.
2019-04-14 08:37:35 -04:00
Daniel Thayer
18bd74454b Rename all scripts to have ".zeek" file extension 2019-04-11 21:12:40 -05:00
Daniel Thayer
bff8392ad4 Remove unnecessary ".bro" from @load directives
Removed ".bro" file extensions from "@load" directives because
they are not needed.
2019-03-31 02:24:47 -05:00
Daniel Thayer
01a899255e Convert more redef-able constants to runtime options 2018-08-24 16:05:44 -05:00
Seth Hall
7cb6cf24a6 Functions for retrieving files by their id.
There are two new script level functions to query and lookup files
from the core by their IDs.  These are adding feature parity for
similarly named functions for files.  The function prototypes are
as follows:

  Files::file_exists(fuid: string): bool
  Files::lookup_File(fuid: string): fa_file
2018-01-09 12:16:17 -05:00
Seth Hall
809660d48a Tiny mime-type fix from Dan Caselden. 2017-02-14 07:21:00 -08:00
Seth Hall
645ec39f4b New file types sigs from Keith Lehigh. 2017-01-31 23:33:58 -05:00
Seth Hall
04d41dce5c Tiny xlsx file signature fix.
Thanks to Dan Caselden for noticing!
2016-12-08 08:32:45 -05:00
Seth Hall
15f5deed87 Add a files framework signature for VIM tmp files. 2016-11-02 11:51:38 -04:00
Seth Hall
514dfc3479 Merge remote-tracking branch 'origin/master' into topic/seth/smb
# Conflicts:
#	testing/btest/Baseline/plugins.hooks/output
#	testing/btest/Baseline/scripts.policy.misc.dump-events/all-events.log
#	testing/btest/Baseline/scripts.policy.misc.dump-events/smtp-events.log
2016-06-29 09:43:31 -04:00
Seth Hall
2e9491482f Add ACE archive files to the identified file types.
Addresses BIT-1609.  Thanks Stephen Hosom!
2016-06-14 22:27:09 -04:00
Seth Hall
1fe9e522fb Merge remote-tracking branch 'origin/master' into topic/seth/smb 2016-04-14 21:39:48 -04:00
Seth Hall
9aa9618473 Additional mime types for file identification and a few fixes.
Some of the existing mime types received extended matchers
to fix problems with UTF-16 BOMs.

New file mime types:
 - .ini files
 - MS Registry policy files
 - MS Registry files
 - MS Registry format files (e.g. DESKTOP.DAT)
 - MS Outlook PST files
 - Apple AFPInfo files

Mime type fixes:
 - MP3 files with ID3 tags.
 - JSON and XML matchers were extended
2016-04-14 10:06:58 -04:00
Seth Hall
017fa13393 Fix mime type identification for Windows LNK files. 2016-04-04 15:20:03 -04:00
dmfreemon@users.noreply.github.com
b14b189d12 add support for MIME type video/MP2T
BIT-1457 #merged
2015-08-21 17:32:19 -07:00
Seth Hall
217ccf6063 Add signature support for F4M files. 2015-06-02 12:48:53 -04:00
Robin Sommer
ed91732e09 Merge remote-tracking branch 'origin/topic/seth/more-file-type-ident-fixes'
* origin/topic/seth/more-file-type-ident-fixes:
  File API updates complete.
  Fixes for file type identification.
  API changes to file analysis mime type detection.
  Make HTTP 206 reassembly require ETags by default.
  More file type identification improvements
  Fix an issue with files having gaps before the bof_buffer is filled.
  Fix an issue with packet loss in http file reporting.
  Adding WOFF fonts to file type identification.
  Extended JSON matching and added OCSP responses.
  Another large signature update.
  More signature updates.
  Even more file type ident clean up.
  Lots of fixes for file type identification.

BIT-1368 #merged
2015-04-20 13:31:00 -07:00
Seth Hall
ed375167c8 File API updates complete.
Addresses BIT-1368.
2015-04-20 10:46:48 -04:00
Seth Hall
038e4c24f6 Merge remote-tracking branch 'origin/topic/jsiwek/bit-1368' into topic/seth/more-file-type-ident-fixes
Conflicts:
	src/file_analysis/File.cc
	testing/btest/Baseline/plugins.hooks/output
2015-04-20 09:36:40 -04:00
Seth Hall
faabe8a5e3 Fixes for file type identification.
- Backed out eTag changes.  The real world is more complicated
   than just using eTags to identify the same file.
 - A bit of code simplication in the http base scripts.
 - Test updates (more existing small problems were identified!).
 -
2015-04-20 09:34:09 -04:00
Jon Siwek
a55ce01ef3 API changes to file analysis mime type detection.
Removed "file_mime_type" and "file_mime_types" event, replacing them
with a new event called "file_metadata_inferred".  It has a record
argument of type "inferred_file_metadata", which contains the mime type
information that the earlier events used to supply.  The idea here is
that future extensions to the record with new metadata will be less
likely to break user code than the alternatives (adding new events or
new event parameters).

Addresses BIT-1368.
2015-04-10 16:31:29 -05:00
Seth Hall
49926ad7bf Merge remote-tracking branch 'origin/master' into topic/seth/more-file-type-ident-fixes 2015-04-09 23:58:52 -04:00
Seth Hall
e8c87e19bd More file type identification improvements
- Split fonts into their own file.
 - Improved JSON matching.
 - Added XML-RPC content matching using application/xml-rpc
 - Added OCSP requests
2015-04-09 01:23:55 -04:00
Seth Hall
8fd5e7f382 Adding WOFF fonts to file type identification. 2015-04-07 02:06:02 -04:00
Seth Hall
422e558d77 Extended JSON matching and added OCSP responses. 2015-04-07 00:46:10 -04:00
Seth Hall
99061fff4c Another large signature update.
- Lots of cleanup and expansion of XML match types.
   - Signatures for ATOM and RSS (text/atom, text/rss).
   - Improved SOAP signature.
   - Improved text/cross-domain-policy signature
 - Improved and expanded javascript matching a bit.
 - Removed a lot of potentially problematic signatures (performance)
 - Split out more signatures from libmagic.sig
 - Added a signature for matching JSON.  Seems to work ok.
 - Signature for MPEGv4 audio.
 - Expanded java applet signature.
 - Improved PNG matching.
 - Improved MP3 matching.
2015-04-06 23:40:20 -04:00
Seth Hall
6861ecc046 More signature updates. 2015-04-06 17:21:53 -04:00
Jon Siwek
186e67ec1d Allow logging filters to inherit default path from stream.
This allows the path for the default filter to be specified explicitly
when creating a stream and reduces the need to rely on the default path
function to magically supply the path.

The default path function is now only used if, when a filter is added to
a stream, it has neither a path nor a path function already.

Adapted the existing Log::create_stream calls to explicitly specify a
path value.

Addresses BIT-1324
2015-03-19 14:49:55 -05:00
Seth Hall
19f498b4a4 Even more file type ident clean up.
- Add detection for ColdFusion scripts.
 - Support detection of XML/HTML with prefixed comment blocks.
2015-03-14 00:25:13 -04:00