* 'intrusive_ptr' of https://github.com/MaxKellermann/zeek: (32 commits)
Scope: store IntrusivePtr in `local`
Scope: pass IntrusivePtr to AddInit()
DNS_Mgr: use class IntrusivePtr
Scope: use class IntrusivePtr
Attr: use class IntrusivePtr
Expr: check_and_promote_expr() returns IntrusivePtr
Frame: use class IntrusivePtr
Val: RecordVal::LookupWithDefault() returns IntrusivePtr
Type: RecordType::FieldDefault() returns IntrusivePtr
Val: TableVal::Delete() returns IntrusivePtr
Type: base_type() returns IntrusivePtr
Type: init_type() returns IntrusivePtr
Type: merge_types() returns IntrusivePtr
Type: use class IntrusivePtr in VectorType
Type: use class IntrusivePtr in EnumType
Type: use class IntrusivePtr in FileType
Type: use class IntrusivePtr in TypeDecl
Type: make TypeDecl `final` and the dtor non-`virtual`
Type: use class IntrusivePtr in TypeType
Type: use class IntrusivePtr in FuncType
...
The Zeek code base has very inconsistent #includes. Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed. Another side effect was a lot of header
bloat which slows down the build.
First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.
After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations. In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.
This patch speeds up the build by 19%, because each compilation unit
gets smaller. Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):
Before this patch:
3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps
After this patch:
2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
Added ConnectionEventFast() and QueueEventFast() methods to avoid
redundant event handler existence checks.
It's common practice for caller to already check for event handler
existence before doing all the work of constructing the arguments, so
it's desirable to not have to check for existence again.
E.g. going through ConnectionEvent() means 3 existence checks:
one you do yourself before calling it, one in ConnectionEvent(), and then
another in QueueEvent().
The existence check itself can be more than a few operations sometimes
as it needs to check a few flags that determine if it's enabled, has
a local body, or has any remote receivers in the old comm. system or
has been flagged as something to publish in the new comm. system.
Majority of PLists are now created as automatic/stack objects,
rather than on heap and initialized either with the known-capacity
reserved upfront or directly from an initializer_list (so there's no
wasted slack in the memory that gets allocated for lists containing
a fixed/known number of elements).
Added versions of the ConnectionEvent/QueueEvent methods that take
a val_list by value.
Added a move ctor/assign-operator to Plists to allow passing them
around without having to copy the underlying array of pointers.
The two hooks being added are:
void HookLogInit(const std::string& writer, const std::string& instantiating_filter, bool local, bool remote, const logging::WriterBackend::WriterInfo& info, int num_fields, const threading::Field* const* fields);
which is called when a writer is being instantiated and contains
information about the fields being logged, as well as
bool HookLogWrite(const std::string& writer, const std::string& filter, const logging::WriterBackend::WriterInfo& info, int num_fields, const threading::Field* const* fields, threading::Value** vals);
which is called for each log line being written by each writer. It
contains all the data being written. The data can be changed in the
function call and lines can be prevented from being written.
This commit also fixes a few small problems with plugin hooks itself,
and extends the tests that were already there, besides introducing tests
for the added functionality.
Broker had changed the semantics of remote logging: it sent over the
original Bro record containing the values to be logged, which on the
receiving side would then pass through the logging framework normally,
including triggering filters and events. The old communication system
however special-cases logs: it sends already processed log entries,
just as they go into the log files, and without any receiver-side
filtering etc. This more efficient as it short-cuts the processing
path, and also avoids the more expensive Val serialization. It also
lets the sender determine the specifics of what gets logged (and how).
This commit changes Broker over to now use the same semantics as the
old communication system.
TODOs:
- The new Broker code doesn't have consistent #ifdefs yet.
- Right now, when a new log receiver connects, all existing logs
are broadcasted out again to all current clients. That doesn't so
any harm, but is unncessary. Need to add a way to send the
existing logs to just the new client.
* origin/topic/seth/log-framework-ext:
Log extensions: series of small fixes and new tests.
Change the function for log extension to take a path only and update tests.
Final changes to log framework ext code.
Add logging framework metadata mechanism.
Add unrolling separator & field name map to logging framework.
The extensions now work with optional types, as well with complex types
(like subrecords). Not returning a record in the ext_func no longer
crashes bro.
The default_ext_func was switched to return void in
cases where no extension revord is defined (was bool).
I also got rid of the offsets in the indices - with the rest of the
implementation, that was not really necessary and made the code more
complex.
The "metadata" functionality has been renamed to "ext" to
represent that the logs are being extended. The function that
returns the record which is used to extend the log now receives
a log filter as it's single argument.
The field name "unrolling" is now renamed to "scope" so the variables
names now look like this: "Log::default_scope_sep"
- When a log record is being "unrolled" (sub-records flattened
out into a single record), it's now possible to choose the
character/string to separate the outer name from the inner
name. This can be used to work around the problems
with ElasticSearch 2.0 not supporting dots "." in field names.
This value can be provided per-filter as well as a global
default value.
- Log fields can be renamed by providing a table per-filter
(or a global default) to rename fields for any log writer.
The name translation is performed after unrolling so the
value in the field name table must match whatever is being
used to separate field names.
For example if the unrolling separator was set to "*":
redef Log::default_unrolling_sep = "*";
The field name map would need to reflect it:
redef Log::default_field_name_map = {
["id*orig_h"] = "src",
["id*orig_p"] = "src_port",
["id*resp_h"] = "dst",
["id*resp_p"] = "dst_port",
};
It now works a bit differently than before: whether to send a remote log
write is now a property of the logging stream, not the logging filter
and it's now up the the receiver side filters to instantiate the desired
writer. i.e. the sender now has no say in what the receiver should use
as the log writer backend.
Under the new style of remote logging, the "Log::enable_remote_logging"
option is repurposed to set the default behavior for new logging
streams. There's also "Comm::{enable,disable}_remote_logging()" to
explicitly set the desired behavior for a given logging stream. To
receive remote logs, one calls "Comm::subscribe_to_logs(<topic>)", where
senders implicitly use topics of the form "bro/log/<stream id>".
It doesn't do anything else than simply forwarding to FlushBuffers().
This is just for consistency in terminate_bro() where components get
their Terminate() called so that the main code doesn't need to know
anything more specific about what particular action to take at
shutdown.
* origin/topic/bernhard/thread-cleanup:
and just to be really sure - always make threads go through OnWaitForStop
hopefully finally fix last interesting race-condition
it is apparently getting a bit late for changes at important code...
spoke to soon (forgot to comment in line again).
Change thread shutdown again to also work with input framework.
Changing semantics of thread stop methods.
Support for cleaning up threads that have terminated.
Seems to work, tests pass, but not really verified.
Major change 1:
finished flag in MsgThread was replaced by 2 flags:
child_finished and main_finished.
child_finished is set by child_thread and means that the processing
loop is stopped immediately (no longer needed, no new input messages
will be processed, if loop continues running there is an ugly delay
on shutdown). (This took me a while to realize...)
main_finished is set by a message that is sent back by the child
to the main thread when Finished() is called (and child_finished
is set). when main_finished is set, processing of output messages
stops. But all messages that the child thread pushed in the queue
before calling Finish() are still processed.
Change 2:
Logging terminate call was replaced by a smaller call that just
flushes out the cache held by the main thread. This call
has to be done before thread shutdown is called - otherwhise
the threads will be shut down before all messages are pushed
on them. (This also took me a while to realize...).
Change 3:
Input framework actually calls it stop methods correctly (everything
was prepared, function call was missing)
Once a BasicThread leaves its run() method, a thread is now marked for
cleaning up, and the ThreadMgr will soon join it to release the OS
resources.
Also, adding a function Log::remove_stream() that remove a logging
stream, stopping all writer threads that are associated with it.
Note, however, that removing a *filter* from a stream still doesn't
clean up any threads. The problem is that because of the output paths
potentially being created dynamically it's unclear if the writer
thread will still be needed in the future. We could add clean writers
up with timeouts, but that doesn't sound great either. So for now, the
only way to sure clean up logging threads is to remove the entire
stream.
Also note that cleanup doesn't work with input threads yet, which
don't seem to terminate (at least in the case I tried).
- 'when' statements were problematic when used in a function/event/hook
that had local variables with an assigned function value. This was
because 'when' blocks operate on a clone of the frame and the cloning
process serializes locals and the serialization of functions had an
infinite cycle in it (ID -> BroFunc -> ID -> BroFunc ...). The ID
was only used for the function name and type information, so
refactoring Func and subclasses to depend on those two things instead
fixes the issue.
- 'return when' blocks, specifically, didn't work whenever execution
of the containing function's body does another function call before
reaching the 'return when' block, because of an assertion. This was
was due to logic in CallExpr::Eval always clearing the CallExpr
associated with the Frame after doing the call, instead of restoring
any previous CallExpr, which the code in Trigger::Eval expected to
have available.
- An assert could be reached when the condition of a 'when' statement
depended on checking the value of global state variables. The assert
in Trigger::QueueTrigger that checks that the Trigger isn't disabled
would get hit because Trigger::Eval/Timeout disable themselves after
running, but don't unregister themselves from the NotifierRegistry,
which keeps calling QueueTrigger for every state access of the global.