Commit graph

48 commits

Author SHA1 Message Date
Tim Wojtulewicz
e618d00326 Remove including <cinttypes> from util.h 2025-05-16 10:14:37 -07:00
Arne Welzel
0f1c1cb754 clang-format: Sort doctest header at the bottom 2024-11-15 17:00:00 +01:00
Dominik Charousset
c500370563 Avoid OpenSSL header dependencies 2023-11-03 15:54:46 +01:00
Benjamin Bannier
f5a76c1aed Reformat Zeek in Spicy style
This largely copies over Spicy's `.clang-format` configuration file. The
one place where we deviate is header include order since Zeek depends on
headers being included in a certain order.
2023-10-30 09:40:55 +01:00
Tim Wojtulewicz
4957dace64 Simplify type trait usage (remove ::value usage) 2023-07-07 09:17:05 -07:00
Arne Welzel
92e73606ba HashKey: Do not call Describe() unconditionally in DEBUG mode
An unnecessary overhead of the Hash() method was uncovered for DEBUG builds
due to computing a description of every HashKey() even when the DBG_HASHKEY
stream is not enabled. Squelch it.
2023-02-14 10:52:54 +01:00
Tim Wojtulewicz
a8fc63e182 Merge remote-tracking branch 'microsoft/master'
* microsoft/master: (71 commits)
  Clang formatting
  Mask ports before inserting them into the map
  Fix compiler warning from applied patch
  Remove statistics plugin in favor of stats bif
  Add EventHandler version of stats plugin
  Mark a few EventHandler methods const
  Changed implementation from std::map to std::unordered_map of Val.cc
  Removed const, Windows build is now working
  Added fixes suggested in PR
  Update src/packet_analysis/protocol/ip/IP.cc
  Apply suggestions from code review
  Clang format again but now with v13.0.1
  Rewrote usages of define(_MSC_VER) to ifdef _MSC_VER
  Clang format it all
  Fixed initial CR comments
  Add NEWS entry about Windows port
  Add a couple of extra unistd.h includes to fix a build failure
  Use std::chrono instead of gettimeofday
  Update libkqueue submodule [nomail]
  Don't call tokenize_string if the input string is empty
  ...
2022-11-11 15:23:21 -07:00
Josh Soref
cd201aa24e Spelling src
These are non-functional changes.

* accounting
* activation
* actual
* added
* addresult
* aggregable
* aligned
* alternatively
* ambiguous
* analysis
* analyzer
* anticlimactic
* apparently
* application
* appropriate
* arithmetic
* assignment
* assigns
* associated
* authentication
* authoritative
* barrier
* boundary
* broccoli
* buffering
* caching
* called
* canonicalized
* capturing
* certificates
* ciphersuite
* columns
* communication
* comparison
* comparisons
* compilation
* component
* concatenating
* concatenation
* connection
* convenience
* correctly
* corresponding
* could
* counting
* data
* declared
* decryption
* defining
* dependent
* deprecated
* detached
* dictionary
* directional
* directly
* directory
* discarding
* disconnecting
* distinguishes
* documentation
* elsewhere
* emitted
* empty
* endianness
* endpoint
* enumerator
* essentially
* evaluated
* everything
* exactly
* execute
* explicit
* expressions
* facilitates
* fiddling
* filesystem
* flag
* flagged
* for
* fragments
* guarantee
* guaranteed
* happen
* happening
* hemisphere
* identifier
* identifies
* identify
* implementation
* implemented
* implementing
* including
* inconsistency
* indeterminate
* indices
* individual
* information
* initial
* initialization
* initialize
* initialized
* initializes
* instantiate
* instantiated
* instantiates
* interface
* internal
* interpreted
* interpreter
* into
* it
* iterators
* length
* likely
* log
* longer
* mainly
* mark
* maximum
* message
* minimum
* module
* must
* name
* namespace
* necessary
* nonexistent
* not
* notifications
* notifier
* number
* objects
* occurred
* operations
* original
* otherwise
* output
* overridden
* override
* overriding
* overwriting
* ownership
* parameters
* particular
* payload
* persistent
* potential
* precision
* preexisting
* preservation
* preserved
* primarily
* probably
* procedure
* proceed
* process
* processed
* processes
* processing
* propagate
* propagated
* prototype
* provides
* publishing
* purposes
* queue
* reached
* reason
* reassem
* reassemble
* reassembler
* recommend
* record
* reduction
* reference
* regularly
* representation
* request
* reserved
* retrieve
* returning
* separate
* should
* shouldn't
* significant
* signing
* simplified
* simultaneously
* single
* somebody
* sources
* specific
* specification
* specified
* specifies
* specify
* statement
* subdirectories
* succeeded
* successful
* successfully
* supplied
* synchronization
* tag
* temporarily
* terminating
* that
* the
* transmitted
* true
* truncated
* try
* understand
* unescaped
* unforwarding
* unknown
* unknowndata
* unspecified
* update
* usually
* which
* wildcard

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2022-11-09 12:08:15 -05:00
Elad Solomon
3a80b79497 Compile Zeek with MSVC
Allow Zeek to be embedded in another project
2022-11-09 18:15:30 +02:00
Tim Wojtulewicz
3b69dd38f3 Add equality, inequality, copy, and move operators to HashKey 2022-10-10 10:08:58 -07:00
Tim Wojtulewicz
f624c18383 Deprecate bro_int_t and bro_uint_t 2022-07-12 12:01:23 -07:00
Tim Wojtulewicz
9cb54f5d44 clang-format: Force zeek-config.h to be earlier in the config ordering 2021-09-25 11:52:55 -07:00
Christian Kreibich
10e8d36340 Remove unused HashKey constructor and reorder for consistency
One of the HashKey constructors was only used in the old CompHash code.
This aso reorders some constructors and the destructor for readability.
2021-09-20 17:51:43 -07:00
Christian Kreibich
b6a11a69db Add debug string and ODesc support to HashKey class
This allows tracing of hash key buffer reservations, reads, and writes via a new
debug stream, and supports printing a summary of a HashKey object via
Describe(). The latter comes in handy e.g. in TableVal::Describe() (where
including the hash key is now available but commented out).
2021-09-20 17:51:43 -07:00
Christian Kreibich
82822b1e07 Refactor HashKey class to support read/write operations
This preserves the optimization of storing values directly in the key_u member
union when feasible, and using a variable size buffer otherwise. It also adds
bounds-checking for that buffer, moves size arguments to size_t, decouples
construction from hash computation, emulates the tagging feature found in
SerializationFormat to assist troubleshooting, and switches feasible
reinterpret_casts to static_casts.
2021-09-20 17:51:43 -07:00
Tim Wojtulewicz
b2f171ec69 Reformat the world 2021-09-16 15:35:39 -07:00
Tim Wojtulewicz
5e00f78920 Fix a number of Coverity findings
- 1458048: Use-after-free in the SQLite logger
- 1457823: Missing a break statement in script-opt reduction
- 1453966: Dead code in CompHash
- 1445417: Unintialized variable in StaticHash64
- 1437716: Unintialized variables in FileInfo in scan.l
2021-07-02 11:18:19 -07:00
Jon Siwek
8a8a983c49 Add missing zeek/ to header includes
Related to https://github.com/zeek/zeek/pull/1377
2021-01-29 19:16:29 -08:00
Tim Wojtulewicz
96d9115360 GH-1079: Use full paths starting with zeek/ when including files 2020-11-12 12:15:26 -07:00
Tim Wojtulewicz
fe0c22c789 Base: Clean up explicit uses of namespaces in places where they're not necessary.
This commit covers all of the common and base classes.
2020-08-24 12:07:00 -07:00
Tim Wojtulewicz
a2a435360a Move all of the hashing classes/functions to zeek::detail namespace 2020-07-31 16:23:34 -04:00
Tim Wojtulewicz
bfab224d7c Move Reporter to zeek namespace 2020-07-31 16:22:41 -04:00
Tim Wojtulewicz
45d2c96643 Rename BroString files to ZeekString 2020-07-02 17:24:22 -07:00
Tim Wojtulewicz
736a3f53d4 Rename BroString to zeek::String 2020-07-02 16:15:01 -07:00
Tim Wojtulewicz
58c6e10b62 Move BroString to zeek namespace 2020-06-30 21:12:26 -07:00
Jon Siwek
8561c79363 Remove inline from some static KeyedHash members
Coverity Scan builds currently encounter catastrophic error, claiming
alignas requires use on both declaration and definition, so appears to
actually not understand "static inline" in combo with alignas.
2020-06-05 18:20:05 -07:00
Jon Siwek
0db5c920f2 Deprecate names in BifConst, replace with zeek::BifConst
Some Val* types are also replaced with IntrusivePtr at the new location
2020-05-14 17:26:00 -07:00
Johanna Amann
3bce313b12 Switch file UID hashing from md5 to highwayhash.
This commit switches UID hashing from md5 to a highway hash. It also
moves the salt value out of the file plugin - and makes it
installation-specific instead - it is moved to the global namespace.

There now are digest hash functions to make "static"
installation-specific hashes that are stable over workers available to
everyone; hashes can be 64, 128 or 256 bits in size.

Due to the fact that we switch the file hashing algorithm, all file
hashes change.

The underlyigng algorithm that is used for hashing is highwayhash-128,
which is significantly faster than md5.
2020-04-30 10:20:09 -07:00
Johanna Amann
bc546634d1 Switch most internal md5 calls to digest calls.
The places that used md5 basically already used it as a digest
algorithm. Switching to a digest just means that the internal values
used to not change between runs - which is actually wanted in these
cases.

This commit also removes our special cmake subdirectory. We don't expose
highwayhash in headers anymore - so we can just treat it as an internal
implementation choice that is not directly exposed to plugins.
2020-04-29 16:05:31 -07:00
Johanna Amann
360c06a3f8 Start refactoring hashing.
This commit moves some of the hash datastructures and code from
util.cc into Hash.cc - where it seems more appropriate.

It also starts to make more Keyed hash functions available - still
using siphash as the default 64 bit keyed hash, but also making
128 and 256 bit highway hashes available.

There already are a few other functions that are defined but not
yet implemented - these will be "static" keyed hashes - which use
an installation specific key. These will be used to, e.g., get
rid of md5 hashing for the generation of file UIDs.
2020-04-24 18:27:09 -07:00
Johanna Amann
5e7915ae7a Remove the siphash->hmac-md5 switch after 36 bytes.
Currently, siphash is used for strings up to 36 bytes. hmac-md5 is used
for longer strings.

This switch-over is a remnant of the previous hash-function that was
used, which apparently was slower with longer input strings.

This change serves no purpose anymore. I performed a few performance tests
on strings of varying sizes:

For a 40 byte string with 10 million iterations:

siphash: 0.31 seconds
hmac-md5: 3.8 seconds

For a 1080 byte string with 10 million iterations:

siphash: 4.2 seconds
hmac-md5: 17 seconds

For a 18360 byte string with 10 million iterations:

siphash: 69 seconds
hmac-md5: 240 seconds

Hence, this commit removes the use of hmac-md5.

This change causes reordering of lines in a few logs.

This commit also changes the datastructure for the seed in probabilistic/Hasher
to get rid of a type-punning warning.
2020-04-24 13:14:29 -07:00
Johanna Amann
3937fff57f Replace siphash with Google implementation
This adds the entirety of the highwayhash implementation of Google.
This includes siphash as well as severl highwayhash variants - which
are faster.

This first commit only switches out the siphash implementation. All
hashes that are generated are exactly the same as before. However, this
does make all other hashes available to be used by us.

I did some performance tests vs the previous siphash implementation by
running the 2009-M57-day11-18 trace 100x through both cases. The average
runtime was virtually the same (within 0.014 seconds of each other).

Note that the way that I included the highwayhash implementation in our
cmake setup is... well, let's say hacky. This definitely needs to be
changed a bit before including this in a real build.
2020-04-23 16:05:03 -07:00
Tim Wojtulewicz
fd5e15b116 The Great Embooleanating
A large number of functions had return values and/or arguments changed
to use ``bool`` types instead of ``int``.
2020-03-31 06:41:54 +00:00
Max Kellermann
0db61f3094 include cleanup
The Zeek code base has very inconsistent #includes.  Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed.  Another side effect was a lot of header
bloat which slows down the build.

First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.

After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations.  In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.

This patch speeds up the build by 19%, because each compilation unit
gets smaller.  Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):

Before this patch:

 3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
 760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps

After this patch:

 2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
 72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
2020-02-04 20:51:02 +01:00
Tim Wojtulewicz
54752ef9a1 Deprecate the internal int/uint types in favor of the cstdint types they were based on 2019-08-12 13:50:07 -07:00
Daniel Thayer
0ae1bfa29d Rename bro to zeek in error messages
More renaming in error messages and a few other places.
2019-06-16 23:08:45 -05:00
Robin Sommer
789cb376fd GH-239: Rename bro to zeek, bro-config to zeek-config, and bro-path-dev to zeek-path-dev.
This also installs symlinks from "zeek" and "bro-config" to a wrapper
script that prints a deprecation warning.

The btests pass, but this is still WIP. broctl renaming is still
missing.

#239
2019-05-01 21:43:45 +00:00
Robin Sommer
4d84ee82da Merge remote-tracking branch 'origin/topic/johanna/bit-1612'
Addig a new random seed for external tests.

I added a wrapper around the siphash() function to make calling it a
little bit safer at least.

BIT-1612 #merged

* origin/topic/johanna/bit-1612:
  HLL: Fix missing typecast in test case.
  Remove the -K/-J options for setting keys.
  Add test checking the quality of HLL by adding a lot of elements.
  Fix serializing probabilistic hashers.
  Baseline updates after hash function change.
  Also switch BloomFilters from H3 to siphash.
  Change Hashing from H3 to Siphash.
  HLL: Remove unnecessary comparison.
  Hyperloglog: change calculation of Rho
2016-07-14 16:26:17 -07:00
Johanna Amann
e1218cc7fa Change Hashing from H3 to Siphash.
This commit mostly changes the hash function that is used for Internal
hashing of data < 36 bytes from H3 to Siphash. This change is motivated
by the fact that it turns out that H3 apparently does not deliver a very
good source of data uniqueness; running HLL with H3 as a hashing
function results in quite poor results (up to of 75% off in my tests).
In difference, running HLL with Siphash (or HMAC-MD5) changes this
factor to ~2%.

This also fixes a long-standing bug in Hash.h which truncated our hash
values to 32 bit on most machines.

Furthermore, it once again fixes a problem with the Rank function in
HLL.
2016-07-13 06:44:51 -07:00
Robin Sommer
3957091e1b Renaming config.h to bro-config.h.
A couple times now I had this conflicting with files of the same name
in other projects.
2015-07-28 11:57:04 -07:00
Jon Siwek
d7dafe2fe2 Refactoring various usages of new IPAddr class.
Reducing number of places that internal representation was exposed
via GetBytes/CopyIPv6.

Also fixed a bug in remask_addr bif.
2012-02-22 14:45:44 -06:00
Jon Siwek
0f207c243c Port DNS_Mgr to use new IPAddr class, enable lookups on IPv6 addrs.
Host lookups still need to be changed to also do AAAA queries.
2012-02-13 15:57:59 -06:00
Robin Sommer
bd2e30f521 Merge remote-tracking branch 'origin/topic/dist-cleanup'
* origin/topic/dist-cleanup:
  Updating INSTALL
  Updating README
  Remove $Id$ tags
  Remove policy.old directory, adresses #511
2011-09-18 16:17:42 -07:00
Jon Siwek
d412aa9d63 Fix H3 assumption of an 8-bit byte/char.
The hash function was internally casting the void* data argument into an
unsigned char* and then using values from that to index another internal
array that's dimensioned based on the assumption of 256 values possible
for an unsigned char (8-bit chars/bytes).  This is probably a correct
assumption most of the time, but should be safer to use the limits as
defined in standard headers to get it right for the particular
system/compiler.

There was an unused uint8* casted variable in HashKey::HashBytes that
seemed like it might have been meant to be passed to H3's hash function
as an unfinished attempt to solve the 8-bit byte assumption problem, but
that doesn't seem as good as taking care of that internally in H3 so
users of the API are only concerned with byte sizes as reported by
`sizeof`.  Removing the unused variable addresses #530.

Also a minor tweak to an hmac_md5 call that was casting away const from
one argument (which doesn't match the prototype).
2011-08-17 15:03:18 -05:00
Jon Siwek
495e987938 Remove $Id$ tags 2011-08-04 15:21:18 -05:00
Robin Sommer
03c0d587a4 Removing code for unused hash functions. 2011-04-01 16:09:28 -07:00
Jon Siwek
13569aaab7 Removal of the --enable-int64 config option.
This will now be always on.  As such, uses of the USE_INT64 preprocessor
definition have been cleaned out.
2010-11-17 20:38:33 -06:00
Robin Sommer
61757ac78b Initial import of svn+ssh:://svn.icir.org/bro/trunk/bro as of r7088 2010-09-27 20:42:30 -07:00