* origin/topic/christian/mmdb-configurability:
Modernize various C++/Zeek-isms in the MMDB code.
Fix MMDB code to re-open explicitly opened DBs correctly
Add btest to verify behavior of re-opened MMDBs opened directly via BIFs
Simplify MMDB code by moving more lookup functionality into MMDB class
Move MMDB logic out of mmdb.bif and into MMDB.cc/h.
Fix mmdb.temporary-error testcase when MMDBs are installed on system
Adapt MMDB BiF code to new script-layer variables
Update btest baselines to reflect introduction of mmdb.bif
Move MaxMind/GeoIP BiF functionality into separate file
Provide script-level configurability of MaxMind DB placement on disk
Sort toplevel .bif list in CMakeLists
In AWS GLB environments, the max_depth of 2 is easily reached due to packets
being encapsulated with GENEVE and VXLAN [1]. Any additional encapsulation
layer causes Zeek to raise a weird and ignore the inner traffic. Bump the default
maximum depth to 4: while not common, it is not unusual to observe deeper
encapsulation in the wild.
[1] https://docs.aws.amazon.com/vpc/latest/mirroring/traffic-mirroring-packet-formats.html
Closes #3439
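The effect of the depth limit described above can be sketched in Python. This is a toy model, not Zeek's packet analysis code: the function name decapsulate, the list-of-strings layer representation and the third "GRE" layer are assumptions made for illustration, and stopping early stands in for raising a weird and ignoring the inner traffic.

```python
from typing import List, Tuple


def decapsulate(layers: List[str], max_depth: int) -> Tuple[List[str], bool]:
    """Walk tunnel layers outermost first.

    Returns the layers that were unwrapped and whether the inner traffic
    was reached. Needing more than max_depth layers stops processing.
    """
    unwrapped: List[str] = []
    for layer in layers:
        if len(unwrapped) >= max_depth:
            # Limit reached with layers still left: give up on the inner traffic.
            return unwrapped, False
        unwrapped.append(layer)
    return unwrapped, True


# GENEVE + VXLAN already uses two levels; one more (hypothetical) tunnel
# layer exceeds a limit of 2 but fits within a limit of 4.
stack = ["GENEVE", "VXLAN", "GRE"]
print(decapsulate(stack, max_depth=2))  # (['GENEVE', 'VXLAN'], False)
print(decapsulate(stack, max_depth=4))  # (['GENEVE', 'VXLAN', 'GRE'], True)
```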
The mmdb_open_location_db() and mmdb_open_asn_db() BiFs were untested, and Zeek
has a bug that makes any DBs opened that way fall back to looking up DBs via the
existing script-level config mechanism (via mmdb_dir), which is at least
unexpected and might well be unconfigured if somebody uses the direct BiFs.
The test would previously fail in settings where the user has MaxMind DBs
installed in the hardwired system locations, because the fallback logic still
picked those up.
The initial (prefix) and final (suffix) strings are specified individually
with a variable number of "any" matches that can occur between these.
The previous implementation assumed a single string and rendered it
as *<string>*.
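The rendering described above can be illustrated with a minimal Python sketch. It is not the parser's actual code: the function name render_substring_filter and its parameters are hypothetical, and it only shows how an optional initial string, any number of "any" strings and an optional final string combine with "*" separators.

```python
from typing import Optional, Sequence


def render_substring_filter(
    initial: Optional[str] = None,
    any_parts: Sequence[str] = (),
    final: Optional[str] = None,
) -> str:
    """Join the substring components with '*' wildcards.

    A missing initial or final component leaves an empty slot, so the
    result starts or ends with '*' in that case.
    """
    parts = [initial or "", *any_parts, final or ""]
    return "*".join(parts)


print(render_substring_filter("foo", ["bar", "baz"], "qux"))  # foo*bar*baz*qux
print(render_substring_filter(any_parts=["bar"]))             # *bar*
print(render_substring_filter(initial="foo"))                 # foo*
```

The second call reproduces the single-string shape that the previous implementation assumed for every filter.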
Reported and PCAP provided by @martinvanhensbergen, thanks!
Closes zeek/spicy-ldap#27
* origin/topic/awelzel/3504-ldap-logs-scalars:
Update external baselines
ldap: Use scalar values in logs where appropriate
ldap: Rename LDAP::search_result to LDAP::search_result_entry
Skimming through the RFC, the previous approach of having containers for most
fields seems unfounded for normal protocol operation. The new weirds could just
as well be considered protocol violations. Outside of duplicated or missed data
they just shouldn't happen for well-behaved client/server behavior.
Additionally, with non-conformant traffic it would be trivial to cause
unbounded state growth and immense log record sizes.
Unfortunately, things have become a bit clunky now.
Closes #3504
While it seems like interesting functionality, this hasn't been documented,
maintained or knowingly leveraged for many years.
There are various other approaches today, too:
* We track the number of event handler invocations regardless of
profiling. It's possible to approximate a load_sample event by
comparing the result of two get_event_stats() calls. Or, visualize
the corresponding counters in a Prometheus setup to get an idea of
event/s broken down by event names.
* HookCallFunction() allows intercepting script execution, including
measuring how long execution takes.
* The global call_stack and g_frame_stack can be used from plugins
(and even external processes) to walk the Zeek script stack at certain
points to implement a sampling profiler.
* USDT probes or more plugin hooks will likely be preferred over Zeek
builtin functionality in the future.
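The first approach in the list, comparing two get_event_stats() results, amounts to a counter delta. A rough Python sketch of that arithmetic follows; it assumes each snapshot has already been converted into a mapping from event name to invocation count, and the function name event_rate_delta and the sample numbers are hypothetical.

```python
from typing import Dict


def event_rate_delta(
    before: Dict[str, int], after: Dict[str, int], interval_secs: float
) -> Dict[str, float]:
    """Return events per second for every handler whose counter increased."""
    rates: Dict[str, float] = {}
    for name, count in after.items():
        delta = count - before.get(name, 0)
        if delta > 0:
            rates[name] = delta / interval_secs
    return rates


# Hypothetical counter snapshots taken 10 seconds apart.
first = {"new_connection": 100, "http_request": 40}
second = {"new_connection": 250, "http_request": 40, "dns_request": 30}
print(event_rate_delta(first, second, 10.0))
# {'new_connection': 15.0, 'dns_request': 3.0}
```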
Relates to #3458
The produced coverage files are of little use in current local workflows
and usually just end up taking disk space. ZEEK_PROFILER_FILE can be
set explicitly if there's a one-off need to produce these locally, too.
* origin/topic/vern/CSE-opt:
incorporate latest version of gen-zam to correctly generate indirect calls
added sub-directory for tracking ZAM maintenance issues
BTest to stress-test AST optimizer's assessment of side effects
reworked AST optimizer's analysis of side effects during aggregate operations & calls
script optimization support for tracking information associated with BiFs/functions
fix for AST analysis of inlined functions
improved AST optimizer's analysis of variable usage in inlined functions
new method for Stmt nodes to report whether they could execute a "return"
bug fixes for indirect function calls when using ZAM
minor fixes for script optimization, exporting of attr_name, script layout tweak
This change allows specifying a signature-specific event, overriding
the default signature_match event. It further removes the message
parameter from such events if no message is provided in the signature.
This also tracks the message as StringValPtr directly to avoid
allocating the same StringVal for every DoAction() call.
Closes #3403
* origin/topic/awelzel/log-write-delay-3:
logging: ref() to record_ref() renaming
logging: Fix typos from review
logging/Manager: Make LogDelayExpiredTimer an implementation detail
logging/WriteToFilters: Use range-based for loop
testing/btest: Log::delay() from JavaScript
NEWS: Entry for delayed log writes
Bump doc submodule to branch
logging: Do not keep delay state persistent
logging: delay documentation polishing
logging: Better error messages for invalid Log::delay() calls
logging/Manager: Implement DelayTokenType as an actual opaque
logging: Implement get_delay_queue_size()
logging: Introduce Log::delay() and Log::delay_finish()
logging/Manager: zeek::detail'ify
logging/Manager: Split Write()
Timer: Add LOG_DELAY_EXPIRE timer type
Ascii: Remove extra include
With a bit of tweaking in the JavaScript plugin to support opaque types, this
will allow the delay functionality to work there, too.
Making the LogDelayToken an actual opaque seems reasonable, too. It's not
supposed to be user inspected.
This is a verbose, opinionated and fairly restrictive version of the log delay idea.
Main drivers are explicitness, foot-gun avoidance and implementation simplicity.
Calling the new Log::delay() function is only allowed within the execution
of a Log::log_stream_policy() hook for the currently active log write.
Conceptually, the delay is placed between the execution of the global stream
policy hook and the individual filter policy hooks. A post delay callback
can be registered with every Log::delay() invocation. Post delay callbacks
can (1) modify a log record as they see fit, (2) veto the forwarding of the
log record to the log filters and (3) extend the delay duration by calling
Log::delay() again. The last point allows delaying a record by an indefinite
amount of time, rather than a fixed maximum amount. This should be rare and
is therefore explicit.
Log::delay() increases an internal reference count and returns an opaque
token value to be passed to Log::delay_finish() to release a delay reference.
Once all references are released, the record is forwarded to all filters
attached to a stream when the delay completes.
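The reference counting and post delay callbacks described above can be modeled with a small Python sketch. This is a conceptual toy, not the logging manager's implementation: the class DelayedWrite and its method names are hypothetical, tokens are plain objects, and timer-based expiration of a delay is left out entirely.

```python
from typing import Callable, List, Optional, Set


class DelayedWrite:
    """Toy model of one delayed log write with reference-counted tokens."""

    def __init__(self, record: dict, forward: Callable[[dict], None]):
        self.record = record
        self.forward = forward  # stands in for "send to all filters"
        self.tokens: Set[object] = set()
        self.callbacks: List[Callable[[dict], bool]] = []

    def delay(self, post_delay_cb: Optional[Callable[[dict], bool]] = None) -> object:
        """Take a delay reference and return a token for delay_finish()."""
        token = object()
        self.tokens.add(token)
        if post_delay_cb is not None:
            self.callbacks.append(post_delay_cb)
        return token

    def delay_finish(self, token: object) -> None:
        """Release one reference; forward the record once none remain."""
        self.tokens.discard(token)
        if self.tokens:
            return  # other references are still outstanding
        pending, self.callbacks = self.callbacks, []
        for cb in pending:
            if not cb(self.record):
                return  # a callback vetoed forwarding
        if not self.tokens:  # no callback extended the delay
            self.forward(self.record)


written: List[dict] = []
write = DelayedWrite({"uid": "C1"}, written.append)


def enrich(rec: dict) -> bool:
    rec["extra"] = "resolved"  # callbacks may modify the record
    return True


t1 = write.delay(enrich)
t2 = write.delay()
write.delay_finish(t1)
print(written)  # [] because t2 is still outstanding
write.delay_finish(t2)
print(written)  # [{'uid': 'C1', 'extra': 'resolved'}]
```

A callback that calls delay() again adds a new token, so the final check fails and the record stays delayed until that token is released as well.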
This functionality separates Log::log_stream_policy() and individual filter
policy hooks. One consequence is that a common use-case of filter policy hooks,
removing unproductive log records, may run after a record was delayed. Users
can lift their filtering logic to the stream level (or replicate the condition
before the delay decision). The main motivation here is that deciding on a
stream-level delay in per-filter hooks is too late. Attaching multiple filters
to a stream can additionally result in hard-to-understand behavior.
On the flip side, filter policy hooks are guaranteed to run after the delay
and can be used for further mangling or filtering of a delayed record.
Update cipher consts.
Furthermore, some past updates were applied to scriptland without
considering that some of them also have to be applied to the binpac
code in order to correctly parse the ServerKeyExchange message.
(As a side-note - this was discovered due to a test discrepancy with the
Spicy parser)
Allow Spicy parsers to generate their own file IDs and provide them to
Zeek. This duplicates functionality that is currently available to (and
used by) some binpac-based analyzers. One example of an analyzer
creating its own file IDs is the SSL analyzer.
Provide a script-accessible way to introspect the DFA stats that can be
leveraged to gather runtime statistics of the underlying DFA. This
re-uses the existing MatcherStats used by ``get_matcher_stats()``.
Not sure how useful this is (and the implementation isn't optimized in
any way), but seems reasonable for consistency.
Vern suggested that set[pattern] can already be achieved via
set_to_regex(), so left out any set[pattern] variants.