Commit graph

180 commits

Author SHA1 Message Date
Johanna Amann
5e7915ae7a Remove the siphash->hmac-md5 switch after 36 bytes.
Currently, siphash is used for strings up to 36 bytes. hmac-md5 is used
for longer strings.

This switch-over is a remnant of the previous hash-function that was
used, which apparently was slower with longer input strings.

This change serves no purpose anymore. I performed a few performance tests
on strings of varying sizes:

For a 40 byte string with 10 million iterations:

siphash: 0.31 seconds
hmac-md5: 3.8 seconds

For a 1080 byte string with 10 million iterations:

siphash: 4.2 seconds
hmac-md5: 17 seconds

For a 18360 byte string with 10 million iterations:

siphash: 69 seconds
hmac-md5: 240 seconds

Hence, this commit removes the use of hmac-md5.

This change causes reordering of lines in a few logs.

This commit also changes the datastructure for the seed in probabilistic/Hasher
to get rid of a type-punning warning.
2020-04-24 13:14:29 -07:00
Johanna Amann
3937fff57f Replace siphash with Google implementation
This adds the entirety of the highwayhash implementation of Google.
This includes siphash as well as severl highwayhash variants - which
are faster.

This first commit only switches out the siphash implementation. All
hashes that are generated are exactly the same as before. However, this
does make all other hashes available to be used by us.

I did some performance tests vs the previous siphash implementation by
running the 2009-M57-day11-18 trace 100x through both cases. The average
runtime was virtually the same (within 0.014 seconds of each other).

Note that the way that I included the highwayhash implementation in our
cmake setup is... well, let's say hacky. This definitely needs to be
changed a bit before including this in a real build.
2020-04-23 16:05:03 -07:00
Jon Siwek
8f1b34b915 Add basic structure for fuzzing targets
General changes:

* Add -D/--deterministic command line option as
  convenience/alternative to -G/--load-seeds (i.e. no file needed, it just
  uses zero-initialized random seeds).  It also changes Broker data
  stores over to using deterministic timing rather than real time.

* Add option to make Reporter abort on runtime scripting errors
2020-04-23 12:51:25 -07:00
Tim Wojtulewicz
0a47588d0b The remaining nulls 2020-04-07 16:08:34 -07:00
Tim Wojtulewicz
01df20c782 Merge remote-tracking branch 'origin/topic/jsiwek/deprecated-attribute'
* origin/topic/jsiwek/deprecated-attribute:
  Switch to using [[deprecated]] attribute
2020-02-24 18:53:37 -07:00
Jon Siwek
0d5e395c6c Switch to using [[deprecated]] attribute 2020-02-22 11:58:38 -08:00
Jon Siwek
8fed26824b Replace va_list fmt() overload with vfmt()
Using an overload that takes a va_list argument potentially causes
accidental misuse on platforms (e.g. 32-bit) where va_list is
implemented as a type that may collide with commonly-used argument
types.

For example:

    char* c = copy_string("hi");
    fmt("%s", (const char*)c);
    fmt("%s", c);

The first fmt() call correctly goes through fmt(const char*, ...) first,
but the second mistakenly goes through fmt(const char*, va_list) first
because variadic function overloads have lower priority during overload
resolution and va_list on a 32-bit system happens to be defined as a
pointer type that can match with "char*" but not "const char*".
2020-02-14 21:40:36 -08:00
Max Kellermann
98241bbc60 util: pass string_view to without_bropath_component() 2020-02-07 10:56:14 +01:00
Jon Siwek
cd74d6f392 Change various functions to by-value std::string_view args 2020-01-31 15:08:48 -08:00
Max Kellermann
26da10ca05 util: add a tokenize_string() overload which returns string_views
Additionally, it uses a single "char" as delimiter, which is also
faster.

This patch speeds up Zeek startup by 10%.
2020-01-31 13:46:45 +01:00
Max Kellermann
0589f295fa util: pass std::string_view to normalize_path()
Reduce overhead in some callers.
2020-01-31 13:46:44 +01:00
Max Kellermann
f1566bda14 util: pass std::string_view to tokenize_string()
This saves some overhead because some callers pass a plain C string
here which needed to be copied to a temporary std::string.
2020-01-31 13:46:42 +01:00
Max Kellermann
e068ad8a53 util: don't modify the input string in tokenize_string()
This saves one full copy of the input string and avoids moving memory
around at O(n^2) in the erase() call in each loop iteration.
2020-01-31 13:42:30 +01:00
Jon Siwek
70b45d1aba Merge remote-tracking branch 'origin/topic/robin/631-deprecation-v2'
During merge I split the test for bro_init/bro_done/bro_script_loaded
event errors into individual tests since the other testing of the zeek
versions of those events seemed fine to otherwise keep.

* origin/topic/robin/631-deprecation-v2:
  Update NEWS for naming changes.
  Small cleanup and updating submodules.
  Remove test for legacy plugin.
  Remove legancy symlinks in aux/.
  Add warnings when loading scripts ending in ".bro", or using legacy environment variables.
  Fix missing rename.
  No longer symlink local.zeek to local.bro.
  Update notice user agent.
  Remove old_comm_usage_is_ok.
  Remove bro-config.h.in and bro-path-dev.in.
  Change Bro wrapper script to now abort when old executable names are still used.
  Remove APIs that were explicitly deprecated to be removed in 3.1.
2020-01-30 19:19:56 -08:00
Max Kellermann
32bb019e3a util, nb_dns: fix off-by-one bugs in strncpy() calls
Fortunately, these bugs had no effect because the following lines
overwrote the last character with a null byte.
2020-01-29 20:22:16 +01:00
Robin Sommer
649301b667 Add warnings when loading scripts ending in ".bro", or using legacy environment variables. 2020-01-29 12:08:10 +00:00
Jon Siwek
9c0d252c2b Merge branch 'master' into topic/jsiwek/supervisor 2020-01-21 12:17:56 -08:00
Tim Wojtulewicz
23f551876c Optimize json_escape_utf8 a bit by removing repeated calls to string methods 2020-01-14 15:43:25 -07:00
Jon Siwek
6046da9993 Merge branch 'master' into topic/jsiwek/supervisor 2020-01-07 16:57:58 -08:00
Tim Wojtulewicz
67fcc9b5af Mark safe_snprintf and safe_vsnprintf as deprecated, remove uses of them
safe_snprintf and safe_vsnprintf just exist to ensure that the resulting strings are always null-terminated. The documentation for snprintf/vsnprintf states that the output of those methods are always null-terminated, thus making the safe versions obsolete.
2020-01-02 15:36:39 -07:00
Jon Siwek
cc37e505e4 Merge remote-tracking branch 'origin/master' into topic/jsiwek/supervisor 2019-11-05 10:11:47 -08:00
Johanna Amann
e2a8dd4db1 Replace build_unique with make_unique
This was a rarely used convenience function from when we did not yet
have c++17 support.
2019-10-29 11:50:30 +01:00
Jon Siwek
29f386e388 Implement minimal supervised cluster configuration
More aspects of the cluster configuration to get fleshed out later,
but a basic cluster like one would use for a live deployment
can now be instantiated and run under supervision.  The new
clusterized-pcap-processing supervisor mode is also not done yet.
2019-10-23 17:37:53 -07:00
Jon Siwek
4959d438fa Initial structure for supervisor-mode
The full process hierarchy isn't set up yet, but these changes
help prepare by doing two things:

- Add a -j option to enable supervisor-mode.  Currently, just a single
  "stem" process gets forked early on to be used as the basis for
  further forking into real cluster nodes.

- Separates the parsing of command-line options from their consumption.
  i.e. need to parse whether we're in -j supervisor-mode before
  modifying any global state since that would taint the "stem" process.
  The new intermediate structure containing the parsed options may
  also serve as a way to pass configuration info from "stem" to its
  descendent cluster node processes.
2019-09-27 19:17:58 -07:00
Dominik Charousset
c1f3fe7829 Switch from header guards to pragma once 2019-09-17 14:10:30 +02:00
The Alchemist
a4e20bb58a
fix another minor typo 2019-08-29 16:10:26 -04:00
Tim Wojtulewicz
54752ef9a1 Deprecate the internal int/uint types in favor of the cstdint types they were based on 2019-08-12 13:50:07 -07:00
Tim Wojtulewicz
385de9b0e7 Add new method for escaping UTF8 strings for JSON output 2019-07-02 12:52:26 -07:00
Johanna Amann
c139ad07f4 Unbreak build on Linux.
Turns out os-x does not to include memory...
2019-06-24 15:51:04 -07:00
Johanna Amann
5f9a9bbcbe Merge branch 'paraglob' of https://github.com/ZekeMedley/zeek
* 'paraglob' of https://github.com/ZekeMedley/zeek:
  Add leak test to paraglob.
  Catch paraglob serialization errors in DoClone.
  Update paraglob serialization.
  Stop execution on paraglob error.
  Update paraglob submodule
  Change C++11 detection in paraglob.
  Make paraglob serializable and copyable.
  Initial paraglob integration.

I made a bunch of small changes:
 * paraglob now deals better with \0 characters
 * I rolled back the changes to Binary Serialization format,
 * there were some small formatting issue
 * the error output was slightly unsafe
 * build_unique is now in util.h.

and perhaps a few more small things.
2019-06-24 15:21:46 -07:00
Jon Siwek
d3927d9266 Rename BRO_DEPRECATED macro to ZEEK_DEPRECATED 2019-06-05 16:23:43 -07:00
Jon Siwek
7f0fb49612 Add an internal getenv wrapper function: zeekenv
It maps newer environment variable names starting with ZEEK to the
legacy names starting with BRO.
2019-05-23 20:42:42 -07:00
Daniel Thayer
1a74516db1 Rename all BRO-prefixed environment variables
For backward compatibility when reading values, we first check
the ZEEK-prefixed value, and if not set, then check the corresponding
BRO-prefixed value.
2019-05-22 00:12:31 -05:00
Robin Sommer
789cb376fd GH-239: Rename bro to zeek, bro-config to zeek-config, and bro-path-dev to zeek-path-dev.
This also installs symlinks from "zeek" and "bro-config" to a wrapper
script that prints a deprecation warning.

The btests pass, but this is still WIP. broctl renaming is still
missing.

#239
2019-05-01 21:43:45 +00:00
Daniel Thayer
18bd74454b Rename all scripts to have ".zeek" file extension 2019-04-11 21:12:40 -05:00
Daniel Thayer
7366155bad Update script search logic for new file extension
When searching for script files, look for both the new and old file
extensions.  If a file with ".zeek" can't be found, then search for
a file with ".bro" as a fallback.
2019-04-09 01:26:16 -05:00
Jon Siwek
e0b8b4b6b1 Replace some bro.org usages with zeek.org 2019-01-04 17:51:25 -06:00
Robin Sommer
fe7e1ee7f0 Merge topic/actor-system throug a squashed commit. 2018-05-18 22:39:23 +00:00
Johanna Amann
6d612ced3d Mark one-parameter constructors as explicit & use override where possible
This commit marks (hopefully) ever one-parameter constructor as explicit.

It also uses override in (hopefully) all circumstances where a virtual
method is overridden.

There are a very few other minor changes - most of them were necessary
to get everything to compile (like one additional constructor). In one
case I changed an implicit operation to an explicit string conversion -
I think the automatically chosen conversion was much more convoluted.

This took longer than I want to admit but not as long as I feared :)
2018-03-27 07:17:32 -07:00
Jon Siwek
054c4a67c4 Add BRO_DEPRECATED macro. 2017-12-12 11:34:49 -06:00
Johanna Amann
fc33bf2014 Make strerror_r portable.
This uses the same code that broker already uses to determine if we use
the XSI or gnu version of strerror_r. Patch by Thomas Petersen.
2017-09-18 14:50:30 -07:00
Johanna Amann
6b9abe85a7 Add error events to input framework.
This change introduces error events for Table and Event readers. Users
can now specify an event that is called when an info, warning, or error
is emitted by their input reader. This can, e.g., be used to raise
notices in case errors occur when reading an important input stream.

Example:

event error_event(desc: Input::TableDescription, msg: string, level: Reporter::Level)
	{
	...
	}

event bro_init()
	{
	Input::add_table([$source="a", $error_ev=error_event, ...]);
	}

For the moment, this converts all errors in the Asciiformatter into
warnings (to show that they are non-fatal) - the Reader itself also has
to throw an Error to show that a fatal error occurred and processing
will be abort.

It might be nicer to change this and require readers to mark fatal
errors as such when throwing them.

Addresses BIT-1181
2016-07-22 19:45:28 -07:00
Robin Sommer
4d84ee82da Merge remote-tracking branch 'origin/topic/johanna/bit-1612'
Addig a new random seed for external tests.

I added a wrapper around the siphash() function to make calling it a
little bit safer at least.

BIT-1612 #merged

* origin/topic/johanna/bit-1612:
  HLL: Fix missing typecast in test case.
  Remove the -K/-J options for setting keys.
  Add test checking the quality of HLL by adding a lot of elements.
  Fix serializing probabilistic hashers.
  Baseline updates after hash function change.
  Also switch BloomFilters from H3 to siphash.
  Change Hashing from H3 to Siphash.
  HLL: Remove unnecessary comparison.
  Hyperloglog: change calculation of Rho
2016-07-14 16:26:17 -07:00
Johanna Amann
499ed5b566 Remove the -K/-J options for setting keys.
The options were never really used and do not seem especially useful;
initialization with a seed file still works.

This also fixes a bug with the initialization of the siphash key.
2016-07-13 16:57:53 -07:00
Johanna Amann
e1218cc7fa Change Hashing from H3 to Siphash.
This commit mostly changes the hash function that is used for Internal
hashing of data < 36 bytes from H3 to Siphash. This change is motivated
by the fact that it turns out that H3 apparently does not deliver a very
good source of data uniqueness; running HLL with H3 as a hashing
function results in quite poor results (up to of 75% off in my tests).
In difference, running HLL with Siphash (or HMAC-MD5) changes this
factor to ~2%.

This also fixes a long-standing bug in Hash.h which truncated our hash
values to 32 bit on most machines.

Furthermore, it once again fixes a problem with the Rank function in
HLL.
2016-07-13 06:44:51 -07:00
Seth Hall
d9d579c52c Merge remote-tracking branch 'origin/master' into topic/seth/stats-improvement 2016-05-02 14:34:29 -04:00
Johanna Amann
446a44787a Remove old string functions.
More specifically, this removes the functions:
strcasecmp_n
strchr_n
strrchr_n

and replaces the calls with the respective C-library calls that should
be part of just about all operating systems by now.
2016-03-04 12:02:19 -08:00
Seth Hall
6d836b7956 More stats improvements
Broke out the stats collection into a bunch of new Bifs
in stats.bif.  Scripts that use stats collection functions
have also been updated.  More work to do.
2016-01-07 16:20:24 -05:00
Robin Sommer
3957091e1b Renaming config.h to bro-config.h.
A couple times now I had this conflicting with files of the same name
in other projects.
2015-07-28 11:57:04 -07:00
Robin Sommer
0f96d06252 Making plugin names case-insensitive for some internal comparisions.
Makes the plugin system a bit more tolerant against spelling
inconsistencies that would be hard to catch otherwise.
2015-02-16 20:26:23 -08:00