* origin/topic/timw/266-namespaces-part4: (34 commits)
Add missing namespace to usage of get_exe_path in fuzzer
Rename methods in RunState to remove 'net' from their names
Move zeek::net namespace to zeek::run_state namespace.
Move ScannedFile class and associated globals into ScannedFile.h and out of Net.h and scan.l
Rename types in ZeekList.h to be consistent with the style guide
Move NetVar from zeek to zeek::detail namespace
Remove PRI_PTR_COMPAT macros
Fix indentation of namespaced aliases
Move zeek-setup code into namespaces
Move ZeekList types to zeek namespace
Move __RegisterBif from zeek::detail::plugin to zeek::plugin::detail
Remove unimplemented zeek_magic_path/bro_magic_path method
Move all plugin classes into zeek::plugin::detail namespaces
Rename BroList.h to ZeekList.h
Move a few smaller files to zeek namespaces
Tag the end of some namespaces for consistency
Move a few of the zeek::util methods and variables to zeek::util::detail
Move zeekygen code to zeek::zeekygen::detail namespace
Mark zeek::util::pad_size as constexpr, which provides a small performance improvement
Move everything in util.h to zeek::util namespace.
...
The Zeek code base has very inconsistent #includes. Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed. Another side effect was a lot of header
bloat which slows down the build.
First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.
After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations. In most headers,
object pointers are never dereferenced, so declaring the function
prototypes with forward-declared classes is just fine.
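A minimal sketch of the forward-declaration technique described above. The names Connection and ConnectionId are hypothetical and not taken from the code base; the point is only that a prototype taking a pointer or reference compiles against an incomplete type, so the header does not need the full #include.

```cpp
#include <cassert>

// --- what would live in a header (hypothetical names) ---
// Forward declaration: enough for prototypes that only pass the type
// by pointer or reference, so no #include of its header is needed.
class Connection;

int ConnectionId(const Connection& conn);

// --- what would live in the matching source file ---
// Only the code that actually accesses members needs the full definition.
class Connection {
public:
    explicit Connection(int id) : id_(id) { assert(id >= 0); }
    int Id() const { return id_; }

private:
    int id_;
};

int ConnectionId(const Connection& conn) {
    return conn.Id();
}
```

Every translation unit that includes only the header sees just the one-line declaration, which is where the smaller compilation units come from.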
This patch speeds up the build by 19%, because each compilation unit
gets smaller. Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):
Before this patch:
3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps
After this patch:
2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
* origin/topic/jsiwek/reassembly-improvements-map:
Rename a reassembly DataBlockList function
Add comments to reassembly classes
Use DataBlock value instead of pointer in reassembly map
Remove linked list from reassembly data structures
Use an std::map for reassembly DataBlock searches
Refactor Reassembler/DataBlock bookkeeping
Reorganize reassembly data structures
Remove a superfluous reassembler DataBlock member
Started by factoring some details into a new DataBlockList class to at
least make it more clear where modifications occur. More abstractions
likely to happen later as I experiment with alternate data structures
aimed at improving worst-case scenarios.
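As an illustration of why an ordered map helps with block searches, here is a minimal sketch, assuming blocks are keyed by their starting sequence number and do not overlap. Block, BlockMap and FindContaining are hypothetical names, not the actual DataBlock/DataBlockList interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a reassembly block covering [seq, upper).
struct Block {
    uint64_t seq;
    uint64_t upper;
};

// Blocks keyed by starting sequence number; assumed non-overlapping.
using BlockMap = std::map<uint64_t, Block>;

// Returns the block containing `seq`, or nullptr if `seq` falls in a gap.
// The lookup is O(log n) instead of a linear walk over a linked list.
const Block* FindContaining(const BlockMap& blocks, uint64_t seq) {
    auto it = blocks.upper_bound(seq);  // first block starting after seq
    if (it == blocks.begin())
        return nullptr;                 // every block starts after seq
    --it;                               // last block starting at or before seq
    return seq < it->second.upper ? &it->second : nullptr;
}
```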
Note - this compiles, but you cannot run Bro anymore - it crashes
immediately with a null-pointer access. The reason is that the
required clone functionality no longer works.
This commit marks (hopefully) every one-parameter constructor as explicit.
It also uses override in (hopefully) all circumstances where a virtual
method is overridden.
There are a few other minor changes - most of them were necessary
to get everything to compile (like one additional constructor). In one
case I changed an implicit operation to an explicit string conversion -
I think the automatically chosen conversion was much more convoluted.
This took longer than I want to admit but not as long as I feared :)
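For readers unfamiliar with the two keywords used in the commit above, a small sketch follows. The classes are hypothetical and only illustrate what explicit and override buy you; they are not taken from the code base.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical base class with one virtual method.
class Val {
public:
    virtual ~Val() = default;
    virtual std::string Describe() const { return "val"; }
};

class CountVal : public Val {
public:
    // explicit: an integer can no longer be silently converted to a CountVal.
    explicit CountVal(unsigned long long n) : n_(n) {}

    // override: the compiler rejects this if it stops matching a base method.
    std::string Describe() const override {
        return "count " + std::to_string(n_);
    }

private:
    unsigned long long n_;
};

std::string DescribeVal(const Val& v) { return v.Describe(); }

// With the explicit constructor, implicit conversion is not allowed.
static_assert(!std::is_convertible<unsigned long long, CountVal>::value,
              "CountVal must be constructed explicitly");
```

A call such as DescribeVal(CountVal(3)) still works, while passing a bare integer where a CountVal is expected no longer compiles.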
- Re-arrange how some fa_file fields (e.g. source, connection info, mime
type) get updated/set for consistency.
- Add more robust mechanisms for flushing the reassembly buffer.
The goal is to report all gaps and deliveries to file analyzers
regardless of the state of the reassembly buffer at the time it has to
be flushed.
For example, if we have a connection between TCP "A" and TCP "B" and "A"
sends segments "1" and "2", but we don't see the first and then the next
acknowledgement from "B" is for everything up to, and including, "2",
the gap would be reported to include both segments instead of just the
first and then delivering the second. Put generally: any segments that
weren't yet delivered because they're waiting for an earlier gap to be
filled would be dropped when an ACK comes in that includes the gap as
well as those pending segments. (If a distinct ACK had been seen for just
the gap, that situation would have worked.)
Addresses BIT-1246.
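The intended behaviour can be sketched as follows. This is a simplified model, not the actual reassembler code: Event and FlushUpTo are hypothetical names, and it assumes buffered segments do not overlap and lie entirely below the acknowledged sequence number.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of what gets reported for the range [start, end).
struct Event {
    std::string kind;  // "gap" or "deliver"
    uint64_t start;
    uint64_t end;
};

// pending: buffered segments, start -> end (exclusive).
// next_seq: first sequence number not yet reported.
// On an ACK up to ack_seq, report each missing range as a gap and still
// deliver the buffered segments behind it, instead of dropping them.
std::vector<Event> FlushUpTo(std::map<uint64_t, uint64_t>& pending,
                             uint64_t& next_seq, uint64_t ack_seq) {
    std::vector<Event> events;
    auto it = pending.begin();
    while (it != pending.end() && it->first < ack_seq) {
        if (it->first > next_seq)
            events.push_back({"gap", next_seq, it->first});
        events.push_back({"deliver", it->first, it->second});
        next_seq = it->second;
        it = pending.erase(it);
    }
    if (next_seq < ack_seq) {  // trailing range that was never seen
        events.push_back({"gap", next_seq, ack_seq});
        next_seq = ack_seq;
    }
    return events;
}
```

With segment "1" missing and segment "2" buffered, an ACK covering both yields one gap event for the first range followed by a delivery of the second.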
The main change is that reassembly code (e.g. for TCP) now uses
int64/uint64 (signedness is situational) data types in place of int
types in order to support delivering data to analyzers that pass 2GB
thresholds. There are also changes in logic that accompany the change in
data types, e.g. to fix TCP sequence space arithmetic inconsistencies.
Another significant change is in the Analyzer API: the *Packet and
*Undelivered methods now use a uint64 in place of an int for the
relative sequence space offset parameter.
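To illustrate the kind of sequence space arithmetic involved, here is a rough sketch of extending wrapping 32-bit TCP sequence numbers into a 64-bit relative offset. SeqDelta and ExtendSeq are hypothetical helpers, not functions from the code base, and the sketch assumes two's-complement conversion and that consecutive observations are less than 2^31 apart.

```cpp
#include <cassert>
#include <cstdint>

// Signed distance from b to a in 32-bit sequence space. The unsigned
// subtraction wraps modulo 2^32; the cast reinterprets it as signed.
int32_t SeqDelta(uint32_t a, uint32_t b) {
    return static_cast<int32_t>(a - b);
}

// Advance a 64-bit relative offset using the newly observed 32-bit
// sequence number, so offsets keep growing past the 2GB/4GB marks.
// Assumes the caller never steps back before offset 0.
uint64_t ExtendSeq(uint64_t last_rel, uint32_t last_seq, uint32_t seq) {
    int64_t delta = SeqDelta(seq, last_seq);
    return last_rel + static_cast<uint64_t>(delta);
}
```

A plain int offset would overflow at 2GB, whereas the 64-bit offset here simply continues across the 32-bit wrap.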
- Since it's just the handshake packets out of order, they're no
longer treated as partial connections, which some protocol analyzers
immediately refuse to look at.
- The TCP_Reassembler "is_orig" state failed to change, which led to
protocol analyzers sometimes using the wrong value for that.
- Add a unit test which exercises the Connection::FlipRoles() code
path (i.e. the SYN/SYN-ACK reversal situation).
Addresses BIT-1148.