This change adds two new hooks to the Intel framework that can be used
to intercept added and removed indicators and their type.
These hooks are fairly low-level. One immediate use-case is to count the
number of indicators loaded per Intel::Type and enable and disable the
corresponding event groups of the intel/seen scripts.
I attempted to gauge the overhead and while it's definitely there, loading
a file with ~500k DOMAIN entries takes somewhere around ~0.5 seconds hooks
when populated via the min_data_store store mechanism. While that
doesn't sound great, it actually takes the manager on my system 2.5
seconds to serialize and Cluster::publish() the min_data_store alone
and its doing that serially for every active worker. Mostly to say that
the bigger overhead in that area on the manager doing redundant work
per worker.
Co-authored-by: Mohan Dhawan <mohan@corelight.com>
This introduces a new hook into the Intel::seen() function that allows
users to directly interact with the result of a find() call via external
scripts.
This should solve the use-case brought up by @chrisanag1985 in
discussion #3256: Recording and acting on "no intel match found".
@Canon88 was recently asking on Slack about enabling HTTP logging for a
given connection only when an Intel match occurred and found that the
Intel::match() event would only occur on the manager. The
Intel::match_remote() event might be a workaround, but possibly running a
bit too late and also it's just an internal "detail" event that might not
be stable.
Another internal use case revolved around enabling packet recording
based on Intel matches which necessarily needs to happen on the worker
where the match happened. The proposed workaround is similar to the above
using Intel::match_remote().
This hook also provides an opportunity to rate-limit heavy hitter intel
items locally on the worker nodes, or even replacing the event approach
currently used with a customized approach.
This simply expands this test to match the behavior of
cluster-transparency-with-proxy, since the two are so similar. This test does
not seem to need disabling the worker's initial send of the data store.
This test was unstable for two reasons:
- Nothing verified whether the two workers had checked in with the proxy,
meaning that messages between the workers and proxies could get lost. This adds
an extra node_up event that the proxy generates synthetically, with values
recognizable to the manager, once the proxy sees both workers connected. This is
a test-level workaround for what should really be a cluster-is-ready event in
the cluster framework proper.
- More subtle: the Intel framework makes the manager send its current
min_data_store to newly connected workers, which in the case of this tests
introduces a race: since the data store, arriving at the worker, replaces the
existing value, it could actually remove already established items if timing was
right. This would lead to the count in the test reaching 3, assuming that 3
intel items are available, when in reality it was less, causing the
Intel::seen() call to do nothing. We now disable the sending of the data store
upon connect, via the global added in the previous commit.
This also expands the test slightly so that both workers call Intel::seen() for
the items inserted by the other worker. This is added validation for the second
point above, because in the presence of that race one occasionally sees one log
entry make it, and the other fail.
- Use `-b` most everywhere, it will save time.
- Start some intel tests upon the input file being fully read instead of
at an arbitrary time.
- Improve termination condition for some sumstats/cluster tests.
- Filter uninteresting output from some supervisor tests.
- Test for `notice_policy.log` is no longer needed.
Haven't checked different build configurations yet, but all except
a few SumStats tests are stable for me now. The external tests
are also completely failing, but haven't looked at those yet.
* Generally increase timeouts for tests that have recent transient
failures
* Change any test that relied on `btest-bg-wait -k` since that's never
going to play with with CI systems. Instead, we always need to have
a well-defined termination condition in the test itself (and most
already did, so didn't really need the `-k` flag anyway).
For backward compatibility when reading values, we first check
the ZEEK-prefixed value, and if not set, then check the corresponding
BRO-prefixed value.
This also installs symlinks from "zeek" and "bro-config" to a wrapper
script that prints a deprecation warning.
The btests pass, but this is still WIP. broctl renaming is still
missing.
#239
* 'topic/jgras/intel-filter' of https://github.com/J-Gras/zeek:
Added new intel policy script to policy test.
Added test for intel removal policy script.
Added policy script for intel removal.
Added test for intel item filtering.
Added hook to filter intelligence items.
This introduces the following redefinable string constants, empty by
default:
- InputAscii::path_prefix
- InputBinary::path_prefix
- Intel::path_prefix
When using ASCII or binary reades in the Input/Intel Framework with an
input stream source that does not have an absolute path, these
constants cause Zeek to prefix the resulting paths accordingly. For
example, in the following the location on disk from which Zeek loads
the input becomes "/path/to/input/whitelist.data":
redef InputAscii::path_prefix = "/path/to/input";
event bro_init()
{
Input::add_table([$source="whitelist.data", ...]);
}
These path prefixes can be absolute or relative. When an input stream
source already uses an absolute path, this path is preserved and the
new variables have no effect (i.e., we do not affect configurations
already using absolute paths).
Since the Intel framework builds upon the Input framework, the first
two paths also affect Intel file locations. If this is undesirable,
the Intel::path_prefix variable allows specifying a separate path:
when its value is absolute, the resulting source seen by the Input
framework is absolute, therefore no further changes to the paths
happen.
Mostly trying to standardize the way tests sleep for arbitrary amounts
of time to make it easier to tell at which particular point the
unit test actually may need the timeout interval increased (or else
debugged further).
When inserting, existance of the given subnet is checked using exact
matching instead of longest prefix matching. Before, inserting a subnet
would have updated the subnet item, which is the longest prefix of the
inserted subnet, if present.
The intel-framework now supports the new indicator type Intel::SUBNET.
As subnets are matched against seen addresses, the field matched was
introduced to indicate which indicator types caused the hit. A testcase
for subents was added and the old ones have been updated accordingly.
By addind debug output to Intel::insert() the testcase reveals that
updating an intel item will cause its metadata to be inserted again,
without the old being deleted.
- Intel importing format has changed (refer to docs).
- All string matching is now case insensitive.
- SMTP intel script has been updated to extract email
addresses correctly.
- Small fix sneaking into the smtp base script to actually
extract individual email addresses in the To: field
correctly.