Commit graph

237 commits

Author SHA1 Message Date
Tim Wojtulewicz
fe0c22c789 Base: Clean up explicit uses of namespaces in places where they're not necessary.
This commit covers all of the common and base classes.
2020-08-24 12:07:00 -07:00
Tim Wojtulewicz
54215ab9cd Rename methods in RunState to remove 'net' from their names 2020-08-20 16:11:47 -07:00
Tim Wojtulewicz
0ac3fafe13 Move zeek::net namespace to zeek::run_state namespace.
This also moves all of the code from Net.{h,cc} to RunState.{h,cc} and marks Net.h as deprecated
2020-08-20 16:11:47 -07:00
Tim Wojtulewicz
db36688bf0 Move a few smaller files to zeek namespaces 2020-08-20 16:11:46 -07:00
Tim Wojtulewicz
ddf48d7529 Move a few of the zeek::util methods and variables to zeek::util::detail 2020-08-20 16:11:44 -07:00
Tim Wojtulewicz
8d2d867a65 Move everything in util.h to zeek::util namespace.
This commit includes renaming a number of methods prefixed with bro_ to be prefixed with zeek_.
2020-08-20 16:00:33 -07:00
Tim Wojtulewicz
8862b585fa Deprecate ptr_compat_uint and ptr_compat_int in util.h 2020-08-20 15:55:17 -07:00
Tim Wojtulewicz
e7c6d51ae7 Move the functions and variables in Net.h to the zeek::net namespace. This includes moving network_time out of util.h. 2020-08-20 15:55:17 -07:00
Tim Wojtulewicz
be92bd536f Move iosource code to zeek namespaces 2020-08-20 15:55:17 -07:00
Tim Wojtulewicz
f1ed66d52c Fix some printf warnings with size_t values 2020-08-11 13:42:03 -07:00
Tim Wojtulewicz
4e9a5e9d98 Move ODesc to zeek namespace 2020-07-31 16:25:54 -04:00
Tim Wojtulewicz
a2a435360a Move all of the hashing classes/functions to zeek::detail namespace 2020-07-31 16:23:34 -04:00
Tim Wojtulewicz
bfab224d7c Move Reporter to zeek namespace 2020-07-31 16:22:41 -04:00
Jon Siwek
b17627fa09 Deprecate bro_srandom(), replace with zeek::seed_random().
Avoiding zeek::srandom() to avoid potential for confusion with srandom()
2020-07-22 14:01:33 -07:00
Jon Siwek
d486af06b1 Add zeek::max_random() & fix misuse of RAND_MAX w/ zeek::random_number()
In deterministic mode, RAND_MAX is not related to the result of
zeek::random_number() (formerly bro_random()), but some logic was
using RAND_MAX as indication of the possible range of values.  The
new zeek::max_random() will give the correct upper-bound regardless
of whether deterministic-mode is used.
2020-07-22 14:01:33 -07:00
Jon Siwek
bde38893ce Deprecate bro_random(), replace with zeek::random_number()
Avoiding the use of zeek::random() due to potential for confusion
with random().
2020-07-22 14:01:33 -07:00
Jon Siwek
6bbb0a6b48 Deprecate bro_prng(), replace with zeek::prng()
The type used for storing the state of the RNG is changed from
`unsigned int` to `long int` since the former has a minimal range
of [0, 65,535] while the RNG function itself has a range of
[1, 2147483646].  A `long int` must be capable of
[−2147483647, +2147483647] and is also the return type of `random()`,
which is what zeek::prng() aims to roughly parity.
2020-07-22 14:01:33 -07:00
Jon Siwek
887b53b7f3 GH-1076: Fix bro_srandom() to replace 0 seeds with 1
The bro_prng() implementation cannot generate 0 as a result since it
causes every subsequent number from the PRNG to also be 0, so use the
number 1 instead of 0.
2020-07-22 14:01:33 -07:00
Jon Siwek
0f4eb9af02 GH-1076: Fix bro_prng() implementation
The intermediate result of the PRNG used unsigned storage, preventing
the ( result < 0 ) branch from ever being evaluated.  This could cause
return values to exceed the modulus as well as RAND_MAX.

One interesting effect of this is potential for the rand() BIF to
return values outside the requested maximum limit.

Another interesting effect of this is that a PacketFilter may start
randomly dropping packets even if it was not configured for
random-packet-drops.
2020-07-22 14:01:33 -07:00
Jon Siwek
dba764386b GH-1076: Fix use of getrandom()
The availability and use of getrandom() actually caused unrandom and
deterministic results in terms of Zeek's random number generation.
2020-07-22 14:01:33 -07:00
Jon Siwek
d60f16c229 Fix race condition in ensure_dir()
If something else created the dir between the stat() and mkdir(),
it previously reported that as a failure.
2020-07-16 12:32:10 -07:00
Jon Siwek
dfc34563b5 Fix tokenize_string() to work with delimiters of length > 1 2020-07-16 11:51:40 -07:00
Tim Wojtulewicz
64332ca22c Move all Val classes to the zeek namespaces 2020-06-30 20:48:09 -07:00
Jon Siwek
5b4313b593 Deprecate Val(double, TypeTag) ctor, add TimeVal/DoubleVal subclasses
This also updates all usages of the deprecated Val ctor to use
either IntervalVal, TimeVal, or DoubleVal ctors.  The reason for
doing away with the old constructor is that using it with TYPE_INTERVAL
isn't strictly correct since there exists a more specific subclass,
IntervalVal, with overriden ValDescribe() method that ought to be used
to print such values in a more descriptive way.
2020-06-02 23:33:40 -07:00
Jon Siwek
1a60fb7c0d Add missing include in util.cc 2020-06-01 20:45:37 -07:00
Jon Siwek
34a1875e74 Merge remote-tracking branch 'origin/topic/timw/reduce-func-inclusion'
- Minor tweaks to some odd includes during merge

* origin/topic/timw/reduce-func-inclusion:
  Remove Analyzer.h from bro-bif.h
  Remove IPAddr.h from Reporter.h
  Remove the inclusion of Func.h from NetVar.h, which reduces the inclusion of Func.h overall.
2020-06-01 19:26:55 -07:00
Tim Wojtulewicz
ea3c679101 Remove the inclusion of Func.h from NetVar.h, which reduces the inclusion of Func.h overall. 2020-06-01 15:00:35 -07:00
Tim Wojtulewicz
503ef26a17 Merge remote-tracking branch 'origin/topic/jsiwek/gh-893-intrusive-ptr-migration'
* origin/topic/jsiwek/gh-893-intrusive-ptr-migration: (151 commits)
  Integrate review feedback
  Switch Broker Val converter visitor to return IntrusivePtr
  Change BroFunc ctor to take const-ref IntrusivePtr<ID>
  Add version of Frame::SetElement() taking IntrusivePtr<ID>
  Change Scope/Func inits from id_list* to vector<IntrusivePtr<ID>>
  Change Scope::GenerateTemporary() to return IntrusivePtr
  Deprecate Scope::ReturnType(), replace with GetReturnType()
  Deprecate Scope::ScopeID(), replace with GetID()
  Switch parsing to use vector<IntrusivePtr<Attr>> from attr_list
  Deprecate TableVal::FindAttr(), replace with GetAttr()
  Deprecate TypeDecl::FindAttr(), replace with GetAttr()
  Deprecate ID::FindAttr(), replace with GetAttr()
  Deprecate Attributes::FindAttr(), replace with Find()
  Deprecate Attributes::AddAttrs(Attributes*)
  Add Attributes ctor that takes IntrusivePtrs
  Change Attributes to store std:vector<IntrusivePtr<Attr>>
  Change Attr::SetAttrExpr() to non-template
  Deprecate Attr::AttrExpr(), replace with GetExpr()
  Deprecate ID::Attrs(), replace with GetAttrs()
  Remove weak_ref param from ID::SetVal()
  ...
2020-06-01 10:58:02 -07:00
Jon Siwek
1c08be1c0f Fix crash on using some deprecated environment variables
If the global Reporter hasn't been created before trying to use a
deprecated environment variable, emit the warning to stderr directly
instead of via Reporter.

Fixes GH-989
2020-05-28 15:24:25 -07:00
Jon Siwek
f3d160d034 Deprecate RecordVal::Assign(int, Val*)
And adapt all usages to the existing overload taking IntrusivePtr.
2020-05-19 15:44:15 -07:00
Johanna Amann
892023ed9a Merge remote-tracking branch 'origin/master' into topic/johanna/hash-unification
* origin/master:
  Use zeek::detail namespace for fuzzer utils
  Set terminating flag during fuzzer cleanup
  Add missing include to standalone fuzzer driver
  Improve standalone fuzzer driver error messages
  Test fuzzers against seed corpus under CI ASan build
  Update fuzzing README with OSS-Fuzz integration notes
  Link fuzzers against shared library to reduce executable sizes
  Improve FuzzBuffer chunking
  Fix compiler warning in standalone fuzzer driver
  Adjust minor fuzzing documentation
  Exit immediately after running unit tests
  Add OSS-Fuzz Zeek script search path to fuzzers
  Assume libFuzzer when LIB_FUZZING_ENGINE file doesn't exist
  Change handling of LIB_FUZZING_ENGINE
  Change --enable-fuzzing to --enable-fuzzers
  Add standalone driver for fuzz targets
  Add basic structure for fuzzing targets
2020-05-13 14:19:44 +00:00
Johanna Amann
ce8b121e12 Hash unification: address PR feedback 2020-05-13 14:07:59 +00:00
Johanna Amann
360c06a3f8 Start refactoring hashing.
This commit moves some of the hash datastructures and code from
util.cc into Hash.cc - where it seems more appropriate.

It also starts to make more Keyed hash functions available - still
using siphash as the default 64 bit keyed hash, but also making
128 and 256 bit highway hashes available.

There already are a few other functions that are defined but not
yet implemented - these will be "static" keyed hashes - which use
an installation specific key. These will be used to, e.g., get
rid of md5 hashing for the generation of file UIDs.
2020-04-24 18:27:09 -07:00
Jon Siwek
98845e89aa Add OSS-Fuzz Zeek script search path to fuzzers 2020-04-24 17:53:52 -07:00
Johanna Amann
5e7915ae7a Remove the siphash->hmac-md5 switch after 36 bytes.
Currently, siphash is used for strings up to 36 bytes. hmac-md5 is used
for longer strings.

This switch-over is a remnant of the previous hash-function that was
used, which apparently was slower with longer input strings.

This change serves no purpose anymore. I performed a few performance tests
on strings of varying sizes:

For a 40 byte string with 10 million iterations:

siphash: 0.31 seconds
hmac-md5: 3.8 seconds

For a 1080 byte string with 10 million iterations:

siphash: 4.2 seconds
hmac-md5: 17 seconds

For a 18360 byte string with 10 million iterations:

siphash: 69 seconds
hmac-md5: 240 seconds

Hence, this commit removes the use of hmac-md5.

This change causes reordering of lines in a few logs.

This commit also changes the datastructure for the seed in probabilistic/Hasher
to get rid of a type-punning warning.
2020-04-24 13:14:29 -07:00
Johanna Amann
3937fff57f Replace siphash with Google implementation
This adds the entirety of the highwayhash implementation of Google.
This includes siphash as well as severl highwayhash variants - which
are faster.

This first commit only switches out the siphash implementation. All
hashes that are generated are exactly the same as before. However, this
does make all other hashes available to be used by us.

I did some performance tests vs the previous siphash implementation by
running the 2009-M57-day11-18 trace 100x through both cases. The average
runtime was virtually the same (within 0.014 seconds of each other).

Note that the way that I included the highwayhash implementation in our
cmake setup is... well, let's say hacky. This definitely needs to be
changed a bit before including this in a real build.
2020-04-23 16:05:03 -07:00
Jon Siwek
8f1b34b915 Add basic structure for fuzzing targets
General changes:

* Add -D/--deterministic command line option as
  convenience/alternative to -G/--load-seeds (i.e. no file needed, it just
  uses zero-initialized random seeds).  It also changes Broker data
  stores over to using deterministic timing rather than real time.

* Add option to make Reporter abort on runtime scripting errors
2020-04-23 12:51:25 -07:00
Jon Siwek
15a19414ca GH-895: Remove use of Variable-Length-Arrays 2020-04-15 16:25:21 -07:00
Johanna Amann
876c803d75 Merge remote-tracking branch 'origin/topic/timw/776-using-statements'
* origin/topic/timw/776-using-statements:
  Remove 'using namespace std' from SerialTypes.h
  Remove other using statements from headers
  GH-776: Remove using statements added by PR 770

Includes small fixes in files that changed since the merge request was
made.

Also includes a few small indentation fixes.
2020-04-09 13:31:07 -07:00
Tim Wojtulewicz
0a47588d0b The remaining nulls 2020-04-07 16:08:34 -07:00
Tim Wojtulewicz
d53c1454c0 Remove 'using namespace std' from SerialTypes.h
This unfortunately cuases a ton of flow-down changes because a lot of other
code was depending on that definition existing. This has a fairly large chance
to break builds of external plugins, considering how many internal ones it broke.
2020-04-07 15:59:59 -07:00
Jon Siwek
8fed26824b Replace va_list fmt() overload with vfmt()
Using an overload that takes a va_list argument potentially causes
accidental misuse on platforms (e.g. 32-bit) where va_list is
implemented as a type that may collide with commonly-used argument
types.

For example:

    char* c = copy_string("hi");
    fmt("%s", (const char*)c);
    fmt("%s", c);

The first fmt() call correctly goes through fmt(const char*, ...) first,
but the second mistakenly goes through fmt(const char*, va_list) first
because variadic function overloads have lower priority during overload
resolution and va_list on a 32-bit system happens to be defined as a
pointer type that can match with "char*" but not "const char*".
2020-02-14 21:40:36 -08:00
Max Kellermann
4aac78cf29 Val: forward-declare class PDict, reduce includes 2020-02-12 10:10:26 +01:00
Max Kellermann
95e646fca7 util: optimize the normal_path() common case
Speeds up Zeek startup by 2%.
2020-02-07 10:56:14 +01:00
Max Kellermann
98241bbc60 util: pass string_view to without_bropath_component() 2020-02-07 10:56:14 +01:00
Max Kellermann
0db61f3094 include cleanup
The Zeek code base has very inconsistent #includes.  Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed.  Another side effect was a lot of header
bloat which slows down the build.

First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.

After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations.  In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.

This patch speeds up the build by 19%, because each compilation unit
gets smaller.  Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):

Before this patch:

 3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
 760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps

After this patch:

 2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
 72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
2020-02-04 20:51:02 +01:00
Jon Siwek
cd74d6f392 Change various functions to by-value std::string_view args 2020-01-31 15:08:48 -08:00
Jon Siwek
fa5b3bb91e Merge branch 'no_sscanf' of https://github.com/MaxKellermann/zeek
* 'no_sscanf' of https://github.com/MaxKellermann/zeek:
  util: optimize expand_escape() by avoiding sscanf()
2020-01-31 14:19:12 -08:00
Jon Siwek
d39bb42b14 Merge branch 'optimize_normalize_path' of https://github.com/MaxKellermann/zeek
- Minor changes in merge: extended unit test, prefer emplace_back(),
  remove unused "found" count in new function

* 'optimize_normalize_path' of https://github.com/MaxKellermann/zeek:
  util: add a tokenize_string() overload which returns string_views
  util: store std::string_view in "final_components" vector
  util: use "auto" in normalize_path()
  util: reserve space in normalize_path()
  util: skip "." completely in normalize_path()
  util: pass std::string_view to normalize_path()
  util: pass std::string_view to tokenize_string()
  util: don't modify the input string in tokenize_string()
2020-01-31 13:23:39 -08:00
Tim Wojtulewicz
a2a2ff325f Have terminate_processing() raise SIGTERM instead of calling the signal handler directly 2020-01-31 10:13:09 -07:00