Only one instance of base_type() getting a NewRef instead of AdoptRef
fixed in merge. All other changes are superficial formatting and
factoring.
* 'leaks' of https://github.com/MaxKellermann/zeek: (22 commits)
Stmt: use class IntrusivePtr
Stmt: remove unused default constructors and `friend` declarations
Val: remove unimplemented prototype recover_val()
Val: cast_value_to_type() returns IntrusivePtr
Val: use IntrusivePtr in check_and_promote()
Val: use nullptr instead of 0
zeekygen: use class IntrusivePtr
ID: use class IntrusivePtr
Expr: use class IntrusivePtr
Var: copy Location to stack, to fix use-after-free crash bug
Scope: lookup_ID() and install_ID() return IntrusivePtr<ID>
Scope: delete duplicate locals
EventRegistry: automatically delete EventHandlers
main: destroy event_registry after iosource_mgr
zeekygen/IdentifierInfo: delete duplicate fields
main: free the global scope in terminate_bro()
Scope: pop_scope() returns IntrusivePtr<>
Scope: unref all inits in destructor
Var: pass IntrusivePtr to add_global(), add_local() etc.
plugin/ComponentManager: hold a reference to the EnumType
...
Those functions don't have a well-defined reference passing API, and
we had lots of memory leaks here. By using IntrusivePtr, reference
ownership is well-defined.
The Zeek code base has very inconsistent #includes. Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed. Another side effect was a lot of header
bloat which slows down the build.
First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.
After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations. In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.
This patch speeds up the build by 19%, because each compilation unit
gets smaller. Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):
Before this patch:
3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps
After this patch:
2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
For example, circular references between a lambda function the frame
it's stored within and/or its closure could cause memory leaks.
This also fixes other various reference-count ownership issues that
could lead to memory errors.
There may still be some potential/undiscovered issues because the "outer
ID" finding logic doesn't look quite right as the AST traversal descends
within nested lambdas and considers their locals as "outer", but
possibly the other logic for locating values in closures or cloning
closures just works around that behavior.
This allows anonymous functions in Zeek to capture their closures.
they do so by creating a copy of their enclosing frame and joining
that with their own frame.
There is no way to specify what specific items to capture from the
closure like C++, nor is there a nonlocal keyword like Python.
Attemptying to declare a local variable that has already been caught
by the closure will error nicely. At the worst this is an inconvenience
for people who are using lambdas which use the same variable names
as their closures.
As a result of functions copying their enclosing frames there is no
way for a function with a closure to reach back up and modify the
state of the frame that it was created in. This lets functions that
generate functions work as expected. The function can reach back and
modify its copy of the frame that it is captured in though.
Implementation wise this is done by creating two new subclasses in
Zeek. The first is a LambdaExpression which can be thought of as a
function generator. It gathers all of the ingredients for a function
at parse time, and then when evaluated creats a new version of that
function with the frame it is being evaluated in as a closure. The
second subclass is a ClosureFrame. This acts for most intents and
purposes like a regular Frame, but it routes lookups of values to its
closure as needed.
The configuration framework consists of three mostly distinct parts:
* option variables
* the config reader
* the script level framework
I will describe the three elements in the following.
Internally, this commit also performs a range of changes to the Input
manager; it marks a lot of functions as const and introduces a new
ValueToVal method (which could in theory replace the already existing
one - it is a bit more powerful).
This also changes SerialTypes to have a subtype for Values, just as
Fields already have it; I think it was mostly an oversight that this was
not introduced from the beginning. This should not necessitate any code
changes for people already using SerialTypes.
option variable
===============
The option keyword allows variables to be specified as run-tine options.
Such variables cannot be changed using normal assignments. Instead, they
can be changed using Option::set. It is possible to "subscribe" to
options and be notified when an option value changes.
Change handlers can also change values before they are applied; this
gives them the opportunity to reject changes. Priorities can be
specified if there are several handlers for one option.
Example script:
option testbool: bool = T;
function option_changed(ID: string, new_value: bool): bool
{
print fmt("Value of %s changed from %s to %s", ID, testbool, new_value);
return new_value;
}
event bro_init()
{
print "Old value", testbool;
Option::set_change_handler("testbool", option_changed);
Option::set("testbool", F);
print "New value", testbool;
}
config reader
=============
The config reader provides a way to read configuration files back into
Bro. Most importantly it automatically converts values to the correct
types. This is important because it is at least inconvenient (and
sometimes near impossible) to perform the necessary type conversions in
Bro scripts themselves. This is especially true for sets/vectors.
Configuration generally look like this:
[option name][tab/spaces][new variable value]
so, for example:
testaddr 2607:f8b0:4005:801::200e
testinterval 60
testtime 1507321987
test_set a b c d erdbeerschnitzel
The reader uses the option name to look up the type that variable has in
the Bro core and automatically converts the value to the correct type.
Example script use:
type Idx: record {
option_name: string;
};
type Val: record {
option_val: string;
};
global currconfig: table[string] of string = table();
event InputConfig::new_value(name: string, source: string, id: string, value: any)
{
print id, value;
}
event bro_init()
{
Input::add_table([$reader=Input::READER_CONFIG, $source="../configfile", $name="configuration", $idx=Idx, $val=Val, $destination=currconfig, $want_record=F]);
}
Script-level config framework
=============================
The script-level framework ties these two features together and makes
them a bit more convenient to use. Configuration files can simply be
specified by placing them into Config::config_files. The framework also
creates a config.log that shows all value changes that took place.
Usage example:
redef Config::config_files += {configfile};
export {
option testbool : bool = F;
}
The file is now monitored for changes; when a change occurs the
respective option values are automatically updated and the value change
is written to config.log.
Doesn't generate any docs, but it's hooked in to all places needed to
gather the necessary stuff w/ significantly less coupling than before.
The gathering now always occurs unconditionally to make documentation
available at runtime and a command line switch (-X) only toggles whether
to output docs to disk (reST format).
Should also improve the treatment of type name aliasing which wasn't a
big problem in practice before, but I think it's more correct now:
there's now a distinct BroType for each alias, but extensible types
(record/enum) will automatically update the types for aliases on redef.
Other misc refactoring of note:
- Removed a redundant/unused way of declaring event types.
- Changed type serialization format/process to preserve type name
information and remove compatibility code (since broccoli will
have be updated anyway).