Commit graph

49 commits

Author SHA1 Message Date
Tim Wojtulewicz
2c2a595af5 Fix clang-tidy bugprone-switch-missing-default-case warnings 2025-05-27 11:58:27 -07:00
Tim Wojtulewicz
8b992320cb Remove unnecessary #includes in cluster/broker/iosource/probabilistic/session 2025-05-19 10:25:05 -07:00
Dominik Charousset
a69928d977 Integrate review feedback 2023-12-04 15:23:56 +01:00
Dominik Charousset
647fdf7737 Add facade types to avoid using raw Broker types
By avoiding to use `broker::data` directly, we gain a degree of freedom
that allows us to swap out `broker::data` for something else (e.g.,
`broker::variant`) in the future. Furthermore, it also helps us to keep
Broker types "local" to the Broker manager and gives us a nicer
interface.

Also replaces uses of `broker::expected` with `std::optional`. While an
`expected `can carry additional information as to why a value is not
present, nothing in Zeek ever cared about that. Hence, using
`std::optional` removes an unnecessary dependency on a Broker detail
while also being more efficient (no extra heap allocation when no value
is present).
2023-12-04 15:23:28 +01:00
Dominik Charousset
c500370563 Avoid OpenSSL header dependencies 2023-11-03 15:54:46 +01:00
Benjamin Bannier
f5a76c1aed Reformat Zeek in Spicy style
This largely copies over Spicy's `.clang-format` configuration file. The
one place where we deviate is header include order since Zeek depends on
headers being included in a certain order.
2023-10-30 09:40:55 +01:00
Tim Wojtulewicz
64b78f6fb9 Use emplace_back over push_back where appropriate 2023-07-07 09:17:05 -07:00
Tim Wojtulewicz
4957dace64 Simplify type trait usage (remove ::value usage) 2023-07-07 09:17:05 -07:00
Dominik Charousset
56f30b500a Update to latest Broker without public CAF dep 2021-12-20 08:16:21 +01:00
Tim Wojtulewicz
b2f171ec69 Reformat the world 2021-09-16 15:35:39 -07:00
Tim Wojtulewicz
96d9115360 GH-1079: Use full paths starting with zeek/ when including files 2020-11-12 12:15:26 -07:00
Tim Wojtulewicz
fe0c22c789 Base: Clean up explicit uses of namespaces in places where they're not necessary.
This commit covers all of the common and base classes.
2020-08-24 12:07:00 -07:00
Tim Wojtulewicz
ddf48d7529 Move a few of the zeek::util methods and variables to zeek::util::detail 2020-08-20 16:11:44 -07:00
Tim Wojtulewicz
8d2d867a65 Move everything in util.h to zeek::util namespace.
This commit includes renaming a number of methods prefixed with bro_ to be prefixed with zeek_.
2020-08-20 16:00:33 -07:00
Tim Wojtulewicz
f310795d79 Move probabilistic code into zeek namespaces 2020-08-20 15:55:17 -07:00
Tim Wojtulewicz
a2a435360a Move all of the hashing classes/functions to zeek::detail namespace 2020-07-31 16:23:34 -04:00
Jon Siwek
6bbb0a6b48 Deprecate bro_prng(), replace with zeek::prng()
The type used for storing the state of the RNG is changed from
`unsigned int` to `long int` since the former has a minimal range
of [0, 65,535] while the RNG function itself has a range of
[1, 2147483646].  A `long int` must be capable of
[−2147483647, +2147483647] and is also the return type of `random()`,
which is what zeek::prng() aims to roughly parity.
2020-07-22 14:01:33 -07:00
Tim Wojtulewicz
64332ca22c Move all Val classes to the zeek namespaces 2020-06-30 20:48:09 -07:00
Jon Siwek
4debad8caf Switch zeek:🆔:lookup to zeek:🆔:find
For parity with Scope since it now uses Find instead of Lookup
2020-05-14 18:00:18 -07:00
Jon Siwek
a5762c12cc Move various elements into ID.h and zeek::id namespace
* A handful of generic/useful/common global type pointers that used
  to be in NetVar.h

* Lookup functions that used to be Var.h
2020-05-14 17:24:20 -07:00
Jon Siwek
d34b24e776 Deprecate global Val pointers in NetVar.h
All of these have fairly niche uses, so better maintained as
lookup/static closer to the usage site.
2020-05-14 17:23:20 -07:00
Johanna Amann
360c06a3f8 Start refactoring hashing.
This commit moves some of the hash datastructures and code from
util.cc into Hash.cc - where it seems more appropriate.

It also starts to make more Keyed hash functions available - still
using siphash as the default 64 bit keyed hash, but also making
128 and 256 bit highway hashes available.

There already are a few other functions that are defined but not
yet implemented - these will be "static" keyed hashes - which use
an installation specific key. These will be used to, e.g., get
rid of md5 hashing for the generation of file UIDs.
2020-04-24 18:27:09 -07:00
Johanna Amann
5e7915ae7a Remove the siphash->hmac-md5 switch after 36 bytes.
Currently, siphash is used for strings up to 36 bytes. hmac-md5 is used
for longer strings.

This switch-over is a remnant of the previous hash-function that was
used, which apparently was slower with longer input strings.

This change serves no purpose anymore. I performed a few performance tests
on strings of varying sizes:

For a 40 byte string with 10 million iterations:

siphash: 0.31 seconds
hmac-md5: 3.8 seconds

For a 1080 byte string with 10 million iterations:

siphash: 4.2 seconds
hmac-md5: 17 seconds

For a 18360 byte string with 10 million iterations:

siphash: 69 seconds
hmac-md5: 240 seconds

Hence, this commit removes the use of hmac-md5.

This change causes reordering of lines in a few logs.

This commit also changes the datastructure for the seed in probabilistic/Hasher
to get rid of a type-punning warning.
2020-04-24 13:14:29 -07:00
Johanna Amann
3937fff57f Replace siphash with Google implementation
This adds the entirety of the highwayhash implementation of Google.
This includes siphash as well as severl highwayhash variants - which
are faster.

This first commit only switches out the siphash implementation. All
hashes that are generated are exactly the same as before. However, this
does make all other hashes available to be used by us.

I did some performance tests vs the previous siphash implementation by
running the 2009-M57-day11-18 trace 100x through both cases. The average
runtime was virtually the same (within 0.014 seconds of each other).

Note that the way that I included the highwayhash implementation in our
cmake setup is... well, let's say hacky. This definitely needs to be
changed a bit before including this in a real build.
2020-04-23 16:05:03 -07:00
Max Kellermann
0db61f3094 include cleanup
The Zeek code base has very inconsistent #includes.  Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed.  Another side effect was a lot of header
bloat which slows down the build.

First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.

After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations.  In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.

This patch speeds up the build by 19%, because each compilation unit
gets smaller.  Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):

Before this patch:

 3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
 760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps

After this patch:

 2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
 72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
2020-02-04 20:51:02 +01:00
Tim Wojtulewicz
54752ef9a1 Deprecate the internal int/uint types in favor of the cstdint types they were based on 2019-08-12 13:50:07 -07:00
Jon Siwek
399496efa8 Merge remote-tracking branch 'origin/topic/johanna/remove-serializer'
* origin/topic/johanna/remove-serializer:
  Fix memory leak introduced by removing opaque of ocsp_resp.
  Change return value of OpaqueVal::DoSerialize.
  Add missing ShallowClone implementation for SetType
  Remove opaque of ocsp_resp.
  Remove remnants of event serializer.
  Fix cardinalitycounter deserialization.
  Smaller compile fixes for the new opaque serialization.
  Reimplement serialization infrastructure for OpaqueVals.
  Couple of compile fixes.
  Remove const from ShallowClone.
  Remove test-case for removed functionality
  Implement a Shallow Clone operation for types.
  Remove value serialization.

Various changes I made:

- Fix memory leak in type-checker for opaque vals wrapped in broker::data

- Noticed the two "copy-all" leak tests weren't actually checking for
  memory leaks because the heap checker isn't active until after zeek_init()
  is evaluated.

- Change OpaqueVal::DoClone to use the clone caching mechanism

- Improve copy elision for broker::expected return types in the various
  OpaqueVal serialize methods

  - Not all compilers end up properly treating the return of
    local/automatic variable as an rvalue that can be moved, and ends up
    copying it instead.

  - Particularly, until GCC 8, this pattern ends up copying instead of
    moving, and we still support platforms whose default compiler
    pre-dates that version.

  - Generally seems it's something that wasn't addressed until C++14.
    See http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1579

- Change OpaqueVal::SerializeType to return broker::expected

- Change probabilistic DoSerialize methods to return broker::expected
2019-06-20 13:38:54 -07:00
Johanna Amann
618f0802f4 Smaller compile fixes for the new opaque serialization.
Also remove the non-existing clone function for EntrypyVals - which now
can just use serialization :)
2019-06-17 14:48:02 -07:00
Robin Sommer
01e662b3e0 Reimplement serialization infrastructure for OpaqueVals.
We need this to sender through Broker, and we also leverage it for
cloning opaques. The serialization methods now produce Broker data
instances directly, and no longer go through the binary formatter.

Summary of the new API for types derived from OpaqueVal:

    - Add DECLARE_OPAQUE_VALUE(<class>) to the class declaration
    - Add IMPLEMENT_OPAQUE_VALUE(<class>) to the class' implementation file
    - Implement these two methods (which are declated by the 1st macro):
        - broker::data DoSerialize() const
        - bool DoUnserialize(const broker::data& data)

This machinery should work correctly from dynamic plugins as well.

OpaqueVal provides a default implementation of DoClone() as well that
goes through serialization. Derived classes can provide a more
efficient version if they want.

The declaration of the "OpaqueVal" class has moved into the header
file "OpaqueVal.h", along with the new serialization infrastructure.
This is breaking existing code that relies on the location, but
because the API is changing anyways that seems fine.

This adds an internal BiF
"Broker::__opaque_clone_through_serialization" that does what the name
says: deep-copying an opaque by serializing, then-deserializing. That
can be used to tests the new functionality from btests.

Not quite done yet. TODO:
    - Not all tests pass yet:
        [  0%] language.named-set-ctors ... failed
        [ 16%] language.copy-all-opaques ... failed
        [ 33%] language.set-type-checking ... failed
        [ 50%] language.table-init-container-ctors ... failed
        [ 66%] coverage.sphinx-zeekygen-docs ... failed
        [ 83%] scripts.base.frameworks.sumstats.basic-cluster ... failed

      (Some of the serialization may still be buggy.)

    - Clean up the code a bit more.
2019-06-17 16:13:54 +00:00
Johanna Amann
474efe9e69 Remove value serialization.
Note - this compiles, but you cannot run Bro anymore - it crashes
immediately with a 0-pointer access. The reason behind it is that the
required clone functionality does not work anymore.
2019-05-09 11:54:38 -07:00
Johanna Amann
86161c85c4 A few more updates to the digest functions.
This builds upon the previous commit to make Zeek compile on FIPS
systems.

This patch makes the changes a bit more aggressive. Instead of having a
number of different hash functions with different return values, we now
standardize on EVP_MD_CTX and just have one set of functions, to which
the hash algorithm that is desired is passed.

On the positive side, this enables us to support a wider range of hash
algorithm (and to easily add to them in the future).

I reimplemented the internal_md5 function - we don't support ebdic
systems in any case.

The md5/sha1 serialization functions are now also tested (I don't think
they were before).
2019-01-24 10:44:28 -08:00
Robert Clark
a72e9a8126
Tell OpenSSL that MD5 is not used for security in order to allow bro to work properly on a FIPS system 2019-01-11 16:09:42 -05:00
Jon Siwek
0c9878f136 Fix strict-aliasing compiler warning 2018-08-29 17:18:56 -05:00
Robin Sommer
4d84ee82da Merge remote-tracking branch 'origin/topic/johanna/bit-1612'
Addig a new random seed for external tests.

I added a wrapper around the siphash() function to make calling it a
little bit safer at least.

BIT-1612 #merged

* origin/topic/johanna/bit-1612:
  HLL: Fix missing typecast in test case.
  Remove the -K/-J options for setting keys.
  Add test checking the quality of HLL by adding a lot of elements.
  Fix serializing probabilistic hashers.
  Baseline updates after hash function change.
  Also switch BloomFilters from H3 to siphash.
  Change Hashing from H3 to Siphash.
  HLL: Remove unnecessary comparison.
  Hyperloglog: change calculation of Rho
2016-07-14 16:26:17 -07:00
Johanna Amann
4a14fd4688 Fix serializing probabilistic hashers. 2016-07-13 10:12:17 -07:00
Johanna Amann
f1bae871e9 Also switch BloomFilters from H3 to siphash.
This removes all dependencies on H3 in our source tree.
2016-07-13 09:04:10 -07:00
Matthias Vallentin
1d50874256 Use full digest length instead of just one byte.
When our universal hash function fell back to MD5 for inputs larger than
supported by H3, the computation only returned the first byte of the MD5 result
instead of as many bytes as needed to cover sizeof(Hasher::digest).
2014-06-05 16:01:20 +02:00
Daniel Thayer
1d33883dfc Fix compiler warnings 2013-09-13 00:30:18 -05:00
Matthias Vallentin
516e044e34 Use Bro-style platform-independent integer types. 2013-08-16 13:29:52 -07:00
Jon Siwek
774dadfe9a Change bloom filter's dependence on size_t.
That type can vary across platforms, but factored in to a bloom
filter's internal state, e.g. size of the seed.
2013-08-16 12:39:21 -05:00
Robin Sommer
00e4369eae Merge remote-tracking branch 'origin/topic/matthias/bloom-filter' into topic/robin/bloom-filter-merge
* origin/topic/matthias/bloom-filter:
  Support UHF hashing for >= UHASH_KEY_SIZE bytes.
2013-08-01 10:38:33 -07:00
Matthias Vallentin
34965b4e77 Support UHF hashing for >= UHASH_KEY_SIZE bytes. 2013-08-01 19:15:28 +02:00
Robin Sommer
2a0790c231 Changing the Bloom filter hashing so that it's independent of
CompositeHash.

We do this by hashing values added to a BloomFilter another time more
with a stable hash seeded only by either the filter's name or the
global_hash_seed (or Bro's random() seed if neither is defined).

I'm also adding a new bif bloomfilter_internal_state() that returns a
string representation of a Bloom filter's current internal state. This
is solely for writing tests that check that the filters end up
consistent when seeded with the same value.
2013-07-31 19:56:34 -07:00
Robin Sommer
6c197fbebf Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
* origin/topic/matthias/bloom-filter:
  Add new BiF for low-level Bloom filter initialization.
  Introduce global_hash_seed script variable.
2013-07-31 11:41:08 -07:00
Matthias Vallentin
8ca76dd4ee Introduce global_hash_seed script variable.
This commit adds support for script-level specification of a seed to be used by
hashers. For example, if the given name of a Bloom filter is not empty, then
the seed used by the underlying hasher only depends on the Bloom filter name.
If the name is empty, we check whether the user defined a non-empty
global_hash_seed string variable at script and use it instead. If that script
variable does not exist, then we fall back to the initial seed computed a
Bro startup (which is affected ultimately by $BRO_SEED).

See Hasher::MakeSeed for details.
2013-07-31 17:59:08 +02:00
Robin Sommer
629c331ca0 Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
* origin/topic/matthias/bloom-filter:
  Update submodules.
  Make hashers serializable.
  Add docs and use default value for hasher names.
2013-07-30 10:06:44 -07:00
Matthias Vallentin
2fc5ca53ff Make hashers serializable.
There exists still a small bug that I could not find; the unit test
istate/opaque.bro fails. If someone sees why, please chime in.
2013-07-25 17:35:35 +02:00
Robin Sommer
474107fe40 Broifying the code.
Also extending API documentation a bit more and fixing a memory leak.
2013-07-23 20:10:32 -07:00
Robin Sommer
21685d2529 Merge remote-tracking branch 'origin/topic/matthias/bloom-filter'
I'm moving the new files into a subdirectory probabilistic, and into a
corresponding namespace. We can later put code for the other
probabilistic data structures there as well.

* origin/topic/matthias/bloom-filter: (45 commits)
  Implement and test Bloom filter merging.
  Make hash functions equality comparable.
  Make counter vectors mergeable.
  Use half adder for bitwise addition and subtraction.
  Fix and test counting Bloom filter.
  Implement missing CounterVector functions.
  Tweak hasher interface.
  Add missing include for GCC.
  Fixing for unserializion error.
  Small fixes and style tweaks.
  Only serialize Bloom filter type if available.
  Create hash policies through factory.
  Remove lingering debug code.
  Factor implementation and change interface.
  Expose Bro's linear congruence PRNG as utility function.
  H3 does not check for zero length input.
  Support seeding for hashers.
  Add utility function to access first random seed.
  Update H3 documentation (and minor style nits.)
  Make H3 seed configurable.
  ...
2013-07-23 16:40:56 -07:00