This moves the ports the LDAP analyzers should be triggered on from the
EVT file to the Zeek module. This gives users full control over which
ports the analyzers are registered for; previously they could only
register them for additional ports, as there is no Zeek script equivalent
of `Manager::UnregisterAnalyzerForPort`.
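With the ports now defined in the Zeek module, they can be adjusted via
redef. A minimal sketch, assuming the module exposes a redef-able
`LDAP::ports_tcp` set (treat the exact name as illustrative):
```zeek
# Restrict the LDAP TCP analyzer to the standard port only; LDAP::ports_tcp
# is assumed to be the redef-able set[port] now living in the Zeek module.
redef LDAP::ports_tcp = { 389/tcp };

# Alternatively, add a non-standard port on top of the defaults.
redef LDAP::ports_tcp += { 10389/tcp };
```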
The analyzers can still be triggered via DPD, but this is intentional.
To fully disable an analyzer, users can do, e.g.:
```zeek
event zeek_init()
	{
	Analyzer::disable_analyzer(Analyzer::ANALYZER_LDAP_TCP);
	}
```
On Linux with a default ext4 or tmpfs filesystem, the buffer size used for
reading a pcap ends up being 4KB (validated via strace/gdb). When reading
large pcaps containing raw data transfers, the read() syscall overhead
becomes visible in profiles. Support configuring the buffer size and
default to 128KB.
When processing a ~830MB pcap (16 UDP connections, each transferring ~50MB)
in bare mode, this change improves runtime from 1.39 sec to 1.29 sec.
Increasing the buffer further didn't provide a noticeable boost.
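A sketch of tuning the buffer, assuming the new setting is exposed as a
redef-able `Pcap::bufsize_offline_bytes` option (name taken as an
assumption from this change):
```zeek
# Raise the buffer used for reading pcap files from the 128KB default
# to 1MB; the option name is assumed, adjust to the actual one.
redef Pcap::bufsize_offline_bytes = 1024 * 1024;
```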
Setting this option to false does not count missing bytes in files towards
the extraction limits and allows extracting data up to the desired limit,
even when partial files are written.
When missing bytes are encountered, files are now written as sparse
files.
Using this option requires the underlying storage and utilities to support
sparse files.
In the past, we allocated a zero-filled buffer and wrote it out with
fwrite. Now we instead just fseek to the correct offset.
This slightly changes how the file extraction limit is counted;
skipped bytes no longer count against the file size limit.
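A sketch of how this could be toggled, assuming the option is the
`FileExtract::default_limit_includes_missing` knob introduced alongside
this change (treat the exact name as an assumption):
```zeek
@load base/files/extract

# Do not count missing bytes (content gaps) towards the extraction limit,
# so partial files can still be extracted up to the configured size limit.
# Gaps then show up as holes in the (sparse) output files.
redef FileExtract::default_limit_includes_missing = F;
```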
(cherry picked from commit 5071592e9b7105090a1d9de19689c499070749d4)
Setting this option to false does not count missing bytes in files towards
the extraction limits and allows extracting data up to the desired limit,
even when partial files are written.
When missing bytes are encountered, files are now written as sparse
files.
Using this option requires the underlying storage and utilities to support
sparse files.
(cherry picked from commit afa6f3a0d3b8db1ec5b5e82d26225504c2891089)
OSS-Fuzz generated a CWD request and reply followed by a very large number
of EPRT requests. This caused Zeek to re-log the CWD request and invoke
`build_url_ftp()` over and over again, resulting in long processing times.
Avoid this scenario by not logging commands that are no longer pending.
(cherry picked from commit b05dd31667ff634ec7d017f09d122f05878fdf65)
A call to `extract_filename_from_content_disposition()` is only
efficient if the string is guaranteed to contain the pattern that
is removed by `sub()`. Due to missing brackets around the `[:blank:]`
character class, an overly long string (756KB) ending in
"Type:dtanameaa=" matched the wrong pattern, causing `sub()` to
exhibit quadratic runtime. Besides that, wrong information may have
been extracted from a crafted header value.
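For illustration, the difference between a bare POSIX class name and a
properly bracketed one in a Zeek pattern (the patterns shown are
simplified examples, not the analyzer's actual ones):
```zeek
# Wrong: without the outer brackets, [:blank:] is a plain character set
# matching the literal characters ':', 'b', 'l', 'a', 'n' and 'k'.
const broken_name_pat = /[nN][aA][mM][eE][:blank:]*=/;

# Right: a POSIX character class must be nested inside a bracket
# expression, so only spaces and tabs are allowed before the '='.
const fixed_name_pat = /[nN][aA][mM][eE][[:blank:]]*=/;
```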
(cherry picked from commit 6d385b1ca724a10444865e4ad38a58b31a2e2288)
When http_reply events are received before http_request events, either
through forged traffic or possible re-ordering, it is possible to trigger
unbounded state growth because later http_requests are never matched
with responses again.
Prevent this by synchronizing the request/response counters when late
requests come in.
Also, forcefully flush pending requests when http_replies are never
observed, either due to the analyzer having been disabled or because of
half-duplex traffic.
Fixes #1705
This works around the new semantics of is_orig=T for "connections"
from DHCP servers to broadcast addresses. IMO, having the server address
as originator in the conn.log is still more intuitive.
Using pcaps from https://interop.seemann.io/ as samples for QUIC protocol
data didn't produce a conn.log for the contained data, even though
`tcpdump -r` and Wireshark do show the contained IP/UDP packets. Teach
Zeek how to handle link type DLT_PPP (0x09) with a new PPP packet analyzer
based on the PPPSerial analyzer code.
The usual update to the files/x509 baseline after adding a new analyzer,
due to enum values changing.
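A hedged sketch of how such a link-layer dispatch is typically wired up
in script land, following the existing PPPSerial registration pattern
(the `ANALYZER_PPP` tag name is an assumption):
```zeek
event zeek_init() &priority=20
	{
	# Dispatch frames with libpcap link type DLT_PPP (0x09) to the new
	# PPP analyzer; the ANALYZER_PPP tag name is assumed here.
	PacketAnalyzer::register_packet_analyzer(PacketAnalyzer::ANALYZER_ROOT,
	    0x09, PacketAnalyzer::ANALYZER_PPP);
	}
```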
* origin/topic/awelzel/3145-dcerpc-state-clean:
dce-rpc: Test cases for unbounded state growth
dce-rpc: Handle smb2_close_request() in scripts
smb/dce-rpc: Cleanup DCE-RPC analyzers when fid is closed and limit them
dce-rpc: Do not repeatedly register removal hooks
Roughly 2.5 years ago, all events taking the ``icmp_conn`` parameter were
removed with 44ad614094, and the NetVar.cc type was no longer populated.
Remove the leftovers in script land, too.
This patch does two things:
1) For SMB close requests, tear down any associated DCE-RPC
analyzer if one exists.
2) Protect fid_to_analyzer_map from growing unbounded by introducing a
new SMB::max_dce_rpc_analyzers limit and forcefully wiping the
analyzers if it is exceeded. Propagate this to script land as the event
smb_discarded_dce_rpc_analyzers() for additional cleanup (a handler
sketch follows below).
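A hedged sketch of reacting to the new event from script land to drop
script-level state kept alongside the discarded analyzers (the event
signature and the per-connection table are assumptions for illustration):
```zeek
# Hypothetical script-level state kept per connection, e.g. pipe names per fid.
global pipe_names: table[conn_id] of table[count] of string;

event smb_discarded_dce_rpc_analyzers(c: connection)
	{
	# The SMB analyzer wiped its DCE-RPC analyzers for this connection after
	# exceeding SMB::max_dce_rpc_analyzers; drop our related state, too.
	delete pipe_names[c$id];
	}
```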
This is mostly to fix how the binpac SMB analyzer tracks individual
DCE-RPC analyzers per open fid. Connections that re-open the same or
different pipe may currently allocate an unbounded number of analyzers.
Closes #3145.
When a JSON document contains key names with colons or other special
characters that are not valid in Zeek identifiers, from_json() cannot be
used to parse such input.
This change allows specifying a customizable key normalization function.
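A minimal sketch of what such a normalization hook could look like,
assuming it is passed as an additional argument to from_json() (the exact
parameter shape is an assumption):
```zeek
type Endpoint: record {
	ip_addr: addr;
};

event zeek_init()
	{
	# Hypothetical normalizer mapping characters that are invalid in Zeek
	# identifiers to '_', so a JSON key like "ip:addr" can populate "ip_addr".
	local normalize = function(key: string): string
		{
		return gsub(key, /[^a-zA-Z0-9_]/, "_");
		};

	print from_json("{\"ip:addr\": \"127.0.0.1\"}", Endpoint, normalize);
	}
```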
Closes #3142.
* topic/awelzel/3112-log-suffix-left-over-log-rotation:
cluster/logger: Fix leftover-log-rotation in multi-logger setups
cluster/logger: Fix global var reference
Using break in either of the hooks allows suppressing the default reporter
error message, rather than suppressing it solely based on the existence of
an assertion_failure() handler.
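A sketch of suppressing the default error via break in one of the hooks
(the hook signature shown is an assumption for illustration):
```zeek
hook assertion_failure(cond: string, msg: string, bt: Backtrace)
	{
	# Handle the failed assertion ourselves, e.g. just log it...
	print fmt("assertion failed: %s %s", cond, msg);

	# ...and use break to suppress the default reporter error message.
	break;
	}
```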
Populating log_metadata during zeek_init() is too late for the
leftover-log-rotation functionality, so do it at script parse time.
Also, prefix the log_metadata table and the encoding function with
archiver_, since they live in the global namespace and this aligns them
with archiver_rotation_format_func. This hasn't been in a released
version yet, so it is still fine to rename.
Closes #3112
* origin/topic/awelzel/cluster-at-if-removal:
test-all-policy: Do not load nodes-experimental/manager.zeek
cluster/main: Remove extra @if ( Cluster::is_enabled() )
These have been discussed in the context of "@if &analyze" [1], and I am
much in favor of not disabling/removing ~100 lines (more than fits on a
single terminal) from the middle of a file. There's no performance impact
from having these handlers enabled unconditionally. Also, any future work
on "@if &analyze" would have to look at them again, which we can now skip.
This also reverts to the behavior where the Cluster::LOG stream is
created even in non-cluster setups, as in previous Zeek versions.
As long as no one writes to it, there's essentially no difference. If
someone does write to Cluster::LOG, I'd argue not black-holing these
messages is better. Schema generators using Log::active_streams will
continue to discover Cluster::LOG even when they run in non-cluster
mode.
[1] https://github.com/zeek/zeek/pull/3062#discussion_r1200498905