Using exit() here may generally not work well since:
* That will result in calling global destructors
* We have global state that we potentially modify at run-time and
are in the middle of modiying at the time the FatalError occurs.
E.g. out-of-memory is one situation where it's likely we could
call the dtor of an object in which operation on it's internal
state is no longer consistent/safe.
For broker.log and cluster.log: there was a race condition. A worker's
first IOSource that it processes is potentially Broker if there were
no packets available yet and thread scheduling happens to work out
such that network connections (inside CAF threads) become established
before we enter the main I/O loop. Such peering establishments would
generate logs with timestamp 0 as there was not yet any code path
taken that would update network_time.
For reporter.log: any non-worker (packet-processing) node would just
unnecessarily use a timestamp of 0 for their reporter messages.
Otherwise, setting Reporter::errors_to_stderr=F causes important
error messages to be lost (and this setting is the default for
ZeekCtl). E.g. now that we terminate if there's errors during
zeek_init, GH-369 shows that the only error message given was
"fatal error: errors occurred while initializing", which is not
helpful in determining the actual issue.
This also installs symlinks from "zeek" and "bro-config" to a wrapper
script that prints a deprecation warning.
The btests pass, but this is still WIP. broctl renaming is still
missing.
#239
Added ConnectionEventFast() and QueueEventFast() methods to avoid
redundant event handler existence checks.
It's common practice for caller to already check for event handler
existence before doing all the work of constructing the arguments, so
it's desirable to not have to check for existence again.
E.g. going through ConnectionEvent() means 3 existence checks:
one you do yourself before calling it, one in ConnectionEvent(), and then
another in QueueEvent().
The existence check itself can be more than a few operations sometimes
as it needs to check a few flags that determine if it's enabled, has
a local body, or has any remote receivers in the old comm. system or
has been flagged as something to publish in the new comm. system.
Majority of PLists are now created as automatic/stack objects,
rather than on heap and initialized either with the known-capacity
reserved upfront or directly from an initializer_list (so there's no
wasted slack in the memory that gets allocated for lists containing
a fixed/known number of elements).
Added versions of the ConnectionEvent/QueueEvent methods that take
a val_list by value.
Added a move ctor/assign-operator to Plists to allow passing them
around without having to copy the underlying array of pointers.
This changes many weird names to move non-static content from the
weird name into the "addl" field to help ensure the total number of
weird names is reasonably bounded. Note the net_weird and flow_weird
events do not have an "addl" parameter, so information may no longer
be available in those cases -- to make it available again we'd need
to either (1) define new events that contain such a parameter, or
(2) change net_weird/flow_weird event signature (which is a breaking
change for user-code at the moment).
Also, the generic handling of binpac exceptions for analyzers which
to not otherwise catch and handle them has been changed from a Weird
to a ProtocolViolation.
Finally, a new "file_weird" event has been added for reporting
weirdness found during file analysis.
Scripting errors/mistakes now consistently generate a runtime error
which have the behavior of unwinding the call stack all the way out of
the current event handler.
Before, such errors were not treated consistently and either aborted
the process entirely or emitted a message while continuing to execute
subsequent statements without well-defined behavior (possibly causing
a cascade of errors).
The previous behavior also would only unwind out of the current
function (if within a function body), not out the current event
handler, which is especially problematic for functions that return
a value: the caller is essentially left a mess with no way to deal
with it.
This also changes the behavior of the startup/initialization process
to abort if there's errors during bro_init() rather than continue one
to the main run loop. The `allow_init_errors` option may change this
new, default behavior.
This change allows a weird sampling rate of 0, which completely suppresses
all notifications (previously this crashed Bro). If also fixes the sampling
threshold to work with sampling rates of 0.
The generation of weird events, by default, are now rate-limited
according to these tunable options:
- Weird::sampling_whitelist
- Weird::sampling_threshold
- Weird::sampling_rate
- Weird::sampling_duration
The new get_reporter_stats() BIF also allows one to query the
total number of weirds generated (pre-sampling) which the new
policy/misc/weird-stats.bro script uses periodically to populate
a weird_stats.log.
There's also new reporter BIFs to allow generating weirds from the
script-layer such that they go through the same, internal
rate-limiting/sampling mechanisms:
- Reporter::conn_weird
- Reporter::flow_weird
- Reporter::net_weird
Some of the code was adapted from previous work by Johanna Amann.
The hook being added is:
bool HookReporter(const std::string& prefix, const EventHandlerPtr event,
const Connection* conn, const val_list* addl, bool location,
const Location* location1, const Location* location2,
bool time, const std::string& buffer) override;
This hook gives access to basically all information that is available in
the function in Reporter.cc that performs the logging. The hook is
called each time when anything passes through the reporter in the cases
in which an event usually would be called. This includes weirds. The
hook can return false to prevent the normal reporter events from being
raised.
This change introduces error events for Table and Event readers. Users
can now specify an event that is called when an info, warning, or error
is emitted by their input reader. This can, e.g., be used to raise
notices in case errors occur when reading an important input stream.
Example:
event error_event(desc: Input::TableDescription, msg: string, level: Reporter::Level)
{
...
}
event bro_init()
{
Input::add_table([$source="a", $error_ev=error_event, ...]);
}
For the moment, this converts all errors in the Asciiformatter into
warnings (to show that they are non-fatal) - the Reader itself also has
to throw an Error to show that a fatal error occurred and processing
will be abort.
It might be nicer to change this and require readers to mark fatal
errors as such when throwing them.
Addresses BIT-1181
This issue is especially helpful in the case of the Val::CONVERTER error and having:
"fatal error in <no location>: Val::CONVERTER ..."
Nebulous error and sans location, it is extremely hard to figure out the culprit. Thus, if Bro is built DEBUG, fatal should provide a core.
This subtle change prevents having to change FatalErrors to FatalErrorWithCore everywhere.
Replaced some with InternalWarning or InternalAnalyzerError, the later
being a new method which signals the analyzer to not process further
input. Some usages I just removed if they didn't make sense or clearly
couldn't happen. Also did some minor refactors of related code while
reviewing/exploring ways to get rid of InternalError usages.
Also, for TCP content file write failures there's a new event:
"contents_file_write_failure".
Moved this functionality to be internal instead of in the script-layer
event handlers. The issue with the later is that bad things can happen
between the time a reporter event handler is dispatched and the time it
is executed, and if bro crashes in that time, the message may never be
seen/logged.
Addressed #930 (and revisits #836).
Internally, all BROv6 preprocessor switches were removed and
addr/subnet representations wrapped in the new IPAddr/IPPrefix classes.
Some script-layer changes of note:
- dns_AAAA_reply event signature changed: the string representation
of an IPv6 addr is easily derived from the addr value, it doesn't
need to be another parameter. This event also now generated directly
by the DNS analyzer instead of being "faked" into a dns_A_reply event.
- removed addr_to_count BIF. It used to return the host-order
count representation of IPv4 addresses only. To make it more
generic, we might later add a BIF to return a vector of counts
in order to support IPv6.
- changed the result of enclosing addr variables in vertical pipes
(e.g. |my_addr|) to return the bit-width of the address type which
is 128 for IPv6 and 32 for IPv4. It used to function the same
way as addr_to_count mentioned above.
- remove bro_has_ipv6 BIF
Currently, a lot of interpreter runtime errors, such as an access to
an unset optional record field, cause Bro to abort with an internal
error. This is an experimental branch that turns such errors into
non-fatal runtime errors by internally raising exceptions. These are
caught upstream and processing continues afterwards.
For now, not many errors actually raise exceptions (the example above
does though). We'll need to go through them eventually and adapt the
current Internal() calls (and potentially others). More generally, at
some point we should cleanup the interpreter error handling (unifying
errors reported at parse- and runtime; and switching to exceptions for
all Expr/Stmt/Vals). But that's a larger change and left for later.
The main question for now is if this code is already helpful enough to
go into 2.0. It will quite likely prevent a number of crashes due to
script errors.
Both related to Val lists constructed as arguments to events that were
not freed because the event function was never called (e.g. no handlers).
Addresses #574
Binpac exceptions caught in Analyzer.cc are passed to Reporter::Weird
and from there to Reporter::WeirdHelper. WeirdHelper has var args, to
support passing them on to DoLog, but there were no forced format
strings. Since the binpac exception can contain network data which can
contain %-characters, that caused segfaults.
When not reporting via events, the final contents of the message buffer
after formatting was being used as a format string to fprintf instead of
writing out the actual string.
* origin/topic/robin/reporting:
Syslog BiF now goes through the reporter as well.
Avoiding infinite loops when an error message handlers triggers errors itself.
Renaming the Logger to Reporter.
Overhauling the internal reporting of messages to the user.
Updating a bunch of tests/baselines as well.
Conflicts:
aux/broccoli
policy.old/alarm.bro
policy/all.bro
policy/bro.init
policy/frameworks/notice/weird.bro
policy/notice.bro
src/SSL-binpac.cc
src/bro.bif
src/main.cc