The Zeek code base has very inconsistent #includes. Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed. Another side effect was a lot of header
bloat which slows down the build.
First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.
After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations. In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.
This patch speeds up the build by 19%, because each compilation unit
gets smaller. Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):
Before this patch:
3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps
After this patch:
2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
Closes BIT-1900.
* origin/topic/johanna/config:
Use port_mgr->Get() in the input framework config changes.
Allow the empty field separator to be empty; use in config framework.
Fix small bug in config reader.
Fix segmentation fault when parsing sets containing invalid elements.
Add config framework.
The configuration framework consists of three mostly distinct parts:
* option variables
* the config reader
* the script level framework
I will describe the three elements in the following.
Internally, this commit also performs a range of changes to the Input
manager; it marks a lot of functions as const and introduces a new
ValueToVal method (which could in theory replace the already existing
one - it is a bit more powerful).
This also changes SerialTypes to have a subtype for Values, just as
Fields already have it; I think it was mostly an oversight that this was
not introduced from the beginning. This should not necessitate any code
changes for people already using SerialTypes.
option variable
===============
The option keyword allows variables to be specified as run-tine options.
Such variables cannot be changed using normal assignments. Instead, they
can be changed using Option::set. It is possible to "subscribe" to
options and be notified when an option value changes.
Change handlers can also change values before they are applied; this
gives them the opportunity to reject changes. Priorities can be
specified if there are several handlers for one option.
Example script:
option testbool: bool = T;
function option_changed(ID: string, new_value: bool): bool
{
print fmt("Value of %s changed from %s to %s", ID, testbool, new_value);
return new_value;
}
event bro_init()
{
print "Old value", testbool;
Option::set_change_handler("testbool", option_changed);
Option::set("testbool", F);
print "New value", testbool;
}
config reader
=============
The config reader provides a way to read configuration files back into
Bro. Most importantly it automatically converts values to the correct
types. This is important because it is at least inconvenient (and
sometimes near impossible) to perform the necessary type conversions in
Bro scripts themselves. This is especially true for sets/vectors.
Configuration generally look like this:
[option name][tab/spaces][new variable value]
so, for example:
testaddr 2607:f8b0:4005:801::200e
testinterval 60
testtime 1507321987
test_set a b c d erdbeerschnitzel
The reader uses the option name to look up the type that variable has in
the Bro core and automatically converts the value to the correct type.
Example script use:
type Idx: record {
option_name: string;
};
type Val: record {
option_val: string;
};
global currconfig: table[string] of string = table();
event InputConfig::new_value(name: string, source: string, id: string, value: any)
{
print id, value;
}
event bro_init()
{
Input::add_table([$reader=Input::READER_CONFIG, $source="../configfile", $name="configuration", $idx=Idx, $val=Val, $destination=currconfig, $want_record=F]);
}
Script-level config framework
=============================
The script-level framework ties these two features together and makes
them a bit more convenient to use. Configuration files can simply be
specified by placing them into Config::config_files. The framework also
creates a config.log that shows all value changes that took place.
Usage example:
redef Config::config_files += {configfile};
export {
option testbool : bool = F;
}
The file is now monitored for changes; when a change occurs the
respective option values are automatically updated and the value change
is written to config.log.
This change introduces error events for Table and Event readers. Users
can now specify an event that is called when an info, warning, or error
is emitted by their input reader. This can, e.g., be used to raise
notices in case errors occur when reading an important input stream.
Example:
event error_event(desc: Input::TableDescription, msg: string, level: Reporter::Level)
{
...
}
event bro_init()
{
Input::add_table([$source="a", $error_ev=error_event, ...]);
}
For the moment, this converts all errors in the Asciiformatter into
warnings (to show that they are non-fatal) - the Reader itself also has
to throw an Error to show that a fatal error occurred and processing
will be abort.
It might be nicer to change this and require readers to mark fatal
errors as such when throwing them.
Addresses BIT-1181
into BroVals (especially enums)
Not we do not force an internal error anymore. Instead, we raise an
normal error and set an error flag that signals to the top-level
functions that the value could not be converted and should not be
propagated to the Bro core. This sadly makes the already messy code even
more messy - but since errors can happen in deeply nested data
structures, the alternative (catching the error at every possible
location and then trying to clean up there instead of recursively
deleting the data that cannot be used later) is much worse.
Addresses BIT-1199
Closes#1021.
* origin/topic/bernhard/input-update:
this event handler fails the unused-event-handlers test because it is a bit of a special case.
...and fix the event ordering issue. Dispatch != QueueEvent
add Terminate to input framework to prevent potential shutdown race-conditions.
fix warning.
fix stderr test. ls behaves differently on errors on linux...
small fixes.
linux does not have strnstr
and close only fds that are currently open (the logging framework really did not like that :) )
A bunch of more changes for the raw reader
make reading from stdout and stderr simultaneously work.
allow sending data to stdin of child process
Streaming reads from external commands work without blocking anything.
replace popen with fork and exec.
change raw reader to use basic c io instead of fdstream encapsulation class.
more cases.
It will now not only fire after table-reads have been completed,
but also after the last event of a whole-file-read (or whole-db-read, etc.).
The interface also has been extended a bit to allow readers to
directly fire the event should they so choose. This allows the
event to be fired in direct table-setting/event-sending modes,
which was previously not possible.
Also going through the internal_handler() function will set the
event as "used" (i.e. it's marked as being raised somewhere) and
fixes the core.check-unused-event-handlers test failure
(addresses #823).
* topic/robin/input-threads-merge: (130 commits)
And now it even compiles after my earlier changes.
A set of input framework refactoring, cleanup, and polishing.
another small memory leak in ascii reader:
and another small memory leak when using streaming reads.
fix another memory lead (when updating tables).
Input framework merge in progress.
filters have been called streams for eternity. And I always was too lazy to change it everywhere...
reactivate network_time check in threading manager. previously this line made all input framework tests fail - it works now. Some of the other recent changes of the threading manager must have fixed that problem.
fix up the executeraw test - now it works for the first time and does not always fail
baselines for the autostart removal.
remove last remnants of autostart, which has been removed for quite a while.
make input framework source (hopefully) adhere to the usual indentation style. No functional changes.
fix two memory leaks which occured when one used filters.
update description to current interface.
rename a couple of structures and make the names in manager fit the api more.
fix memory leak in tables and vectors that are read into tables
fix missing get call for heart beat in benchmark reader.
fix heart_beat_interval -- initialization in constructor does not work anymore (probably due to change in init ordering?)
fix memory leak for tables... nearly completely.
fix a couple more leaks. But - still leaking quite a lot with tables.
...
This should it make easier for other people to understand what is going on without having knowledge of an "internal api * means * in external api" mapping.
compiles, not really tested.
basic test works 70% of the time, coredumps in the other 30 - but was not easy to debug on a first glance (most interestingly the crash happens in the logging framework - I wonder how that works).
Other tests are not adjusted to the new interface yet.
But: there are still a few places where I am sure that there are race conditions & memory leaks & I do not really like the current interface & I have to add a few more messages between the front and backend.
But - it works :)
most stuff is inplace, logging framework needs a few changes merged before continuing here...
Conflicts:
src/CMakeLists.txt
src/LogMgr.h
src/logging/Manager.cc
src/main.cc