When an analyzer calls DataIn(), there's a costly callback construct
going through the event queue. If an analyzer does not have a
get_file_handle() handler installed, the produced file_id would
end up empty and ignored. Consequently, the get_file_handle() callback
was invoked for every new DataIn() invocations.
This is surprising and costly. Log a warning when this happens and
instead set a generically generated file handle value instead to
prevent the repeated get_file_handle() invocations.
We previously used the Spicy plugin's `Spicy::available` to test for
Spicy support. However, having Spicy support does not necessarily mean that we
have built Zeek with its in-tree Spicy analyzers: the Spicy plugin
could have been pulled in from external. The new BIF now reliably
tells us whether the Spicy analyzers are available; its result
corresponds to what `zeek-config --have-spicy-analyzers` returns as
well.
We also move the two current checks over to use this BIF.
(Note: I refrained from renaming the CMake-side `USE_SPICY_ANALYERS`
to `HAVE_SPICY_ANALYZERS`. We should do this eventually for
consistency, but I didn't want to make more changes than necessary
right now.)
As initial examples, this branch ports the Syslog and Finger analyzers
over. We leave the old analyzers in place for now and activate them
iff we compile without any Spicy.
Needs `zeek-spicy-infra` branches in `spicy/`, `spicy-plugin/`,
`CMake/`, and `zeek/zeek-testing-private`.
Note that the analyzer events remain associated with the Spicy plugin
for now: that's where they will show up with `-NN`, and also inside
the Zeekygen documentation.
We switch CMake over to linking the runtime library into the plugin,
vs. at the top-level through object libraries.
Add configurability of synchronous and journal_mode for SQLite backed
Broker data stores. Setting these to synchronous=normal and journal_mode=wal
can significantly improve throughput at the cost of some durability in
the presence of power loss or OS crash. In the context of Zeek, this is
likely more than acceptable.
Additionally, add integrity_check and failure_mode options to support deleting
and re-opening a corrupted SQLite database at store creation.
Closes#2698
b41a4bf06d removed a field from this record
because it had a duplicate name as another field. The field does need to
exist, but it needs the correct name.
We were parsing MySQL using bigendian even though the protocol is
specified as with "least significant byte first" [1]. This is most
problematic when parsing length encoded strings with 2 byte length
fields...
Further, I think, the EOF_Packet parsing was borked, either due to
testing the CLIENT_DEPRECATE_EOF with the wrong endianness, or due to
the workaround in Resultset processing raising mysql_ok(). Introduce a
new mysql_eof() that triggers for EOF_Packet's and remove the fake
mysql_ok() Resultset invocation to fix. Adapt the mysql script and tests
to account for the new event.
This is a quite backwards incompatible change on the event level, but
due to being quite buggy in general, doubt this matters to many.
I think there is more buried, but this fixes the violation of the simple
"SHOW ENGINE INNODB STATUS" and the existing tests continue to
succeed...
[1] https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_basic_dt_integers.html
* jeff-bb/patch-2:
Log raw keyboard value on best guess
Avoid excessive fmt calls, return default behavior on unknown
"Best Guess" unknown keyboard / language variants
When disabling_analyzer() was introduced, it was added to the GLOBAL
module. The awkward side-effect is that implementing a hook handler
in another module requires to prefix it with GLOBAL. Alternatively, one
can re-open the GLOBAL module and implement the handler in that scope.
Both are not great, and prefixing with GLOBAL is ugly, so move the
identifier to the Analyzer module and ask users to prefix with Analyzer.
Using "in" to query the language const. This also handles the case of not having a best guess and continue using the existing behavior.
Given
keyboard_layout = 1033 (0x0409), "keyboard-English - United States"
keyboard_layout = 66569 (0x00010409), "keyboard-English - United States (Best Guess)"
keyboard_layout = 12345 (0x3039), "keyboard-12345"
If the lookup table does not have an entry, it will just log as the raw decimal language/keyboard code. With this change, if we do not have an entry in the lookup table, we'll look at the low order / 4 least significant bits to see if we have a match. The high order / 4 most significant bits are flags/modifiers to the base language/keyboard code. We'll append that it is a "Best Guess"
(This is my first attempt at Zeek scripting, apologies upfront if I'm missing obvious language features. I feel like the const language lookup should return a success/fail return code that we would key off of, but unsure how to accomplish that so instead went for string matching on value in == value out).
* origin/topic/awelzel/analyzer-log:
btest/net-control: Use different expiration times for rules
analyzer: Add analyzer.log for logging violations/confirmations
By default this only logs all the violations, regardless of the
confirmation state (for which there's still dpd.log). It includes
packet, protocol and file analyzers.
This uses options, change handlers and event groups for toggling
the functionality at runtime.
Closes#2031
In certain deployment scenarios, all analyzers are disabled by default.
However, conditionally/optionally loaded scripts may rely on analyzers
functioning and declare a request for them.
Add a global set set to the Analyzer module where external scripts can record
their requirement/request for a certain analyzer. Analyzers found in this
set are enabled at zeek_init() time.
This commit adds an optional event_groups field to the Logging::Stream record
to associated event groups with logging streams.
This can be used to disable all event groups of a logging stream when it is
disabled. It does require making an explicit connection between the
logging stream and the involved groups, however.
When a fa_file object is created through the use of Input::add_analysis(),
the fa_file's source is likely not valid representation of an analyzer's
tag and a Files::describe() should not error and instead return an empty
description.
Add a new Analyzer::is_tag() helper that can be used to pre-check `f$source`.
* When a file is transferred over multiple connection, have
create_file_info() just pick the first one instead of none.
* Do not unconditionally assume cid and cuid as set on a
Notice::FileInfo object.
From Vern in GH-846: This is a conscious decision in the TCP analysis to
consider a connection's "duration" to run up through the end of its
productive (= data can be delivered) lifetime, not extending beyond that. So
once it's closed, packets seen subsequently (until the state-holding for the
connection times out) get processed in terms of updating the associated
history, but not the duration. This can include (unnecessarily) retransmitted
data packets, like in one of the examples above. An advantage of this definition
of "duration" is it allows more accurate computation of connection data rates.
* origin/topic/awelzel/2514-expire-all-timers-special-case:
TimerMgr: Add back max_timer_expires=0 special case
Add btest for expiration of all pending timers.
Commit 58fae22708 removed the max_expire==0
handling from DoAdvance() due to not being obvious what use it is. Jan
later reported that it broke the `redef max_timer_expires=0` (#2514).
This commit adds back the special case re-introducing the `max_timer_expires=0` ,
trying to make it fairly explicit that it exists.
This is an adaption of #2516 not adding a new option and trying a bit
to avoid global variable accesses down in DoAdvance(), though that
just moved to InitPostScript().
Fixes#2514.
oss-fuzz produced FTP traffic with a ~550KB long FTP command. Cap FTP command
length at 100 bytes, log a weird if a command is larger than that and move
on to the next. Likely it's not actual FTP traffic, but raising an
analyzer violation would allow clients an easy way to disable the analyzer
by sending an overly long command.
The added test PCAP was generated using a fake Python socket server/client.
When a negotiate request offers no dialects, but the response contains
an ntlm record which selects a dialect, a script error is triggered.
$ zeek -C -r ./f2b0e.pcap 'DPD::ignore_violations+={ Analyzer::ANALYZER_SMB }'
1668615340.837882 expression error in /home/awelzel/corelight-oss/zeek/scripts/base/protocols/smb/./smb1-main.zeek, line 96: no such index (SMB1::c$smb_state$current_cmd$smb1_offered_dialects[SMB1::response$ntlm$dialect_index])
Script error triggered by fuzzing when testing Tim's all-the-fuzzing branch.
While unusual, analyzer_confirmation() may never be called for the
SSH analyzer, but still ssh_auth_attempted is invoked later indicating
successful authentication. I haven't checked how that is actually possible,
but seems prudent to check for the existence of c$ssh$analyzer_id before
referencing it (also in light of runtime enable/disabling of events).
This was found testing Tim's all-the-fuzzing branch on large system,
merging this should avoid oss-fuzz telling us about it.
$ zeek -C -r ./e83db.pcap 'DPD::ignore_violations+={ Analyzer::ANALYZER_SSH }'
1668610572.429058 expression error in scripts/base/protocols/ssh/./main.zeek, line 260: field value missing (SSH::c$ssh$analyzer_id)