The Zeek code base has very inconsistent #includes. Many sources
include only a few headers, those headers include other headers, and
in the end nearly everything is included everywhere, so missing
#includes are never noticed. Another side effect is a lot of header
bloat, which slows down the build.
First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.
After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations. In most headers,
object pointers are never dereferenced, so declaring the function
prototypes with forward-declared classes is just fine.
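A minimal sketch of the forward-declaration pattern described above.
Packet and Logger are hypothetical names, not Zeek classes, and the
header and source file are shown as one listing for brevity:

```cpp
#include <cassert>

// --- what would live in a header, e.g. "Logger.h" (hypothetical) ---
// A forward declaration is enough here: the header only passes Packet
// around by pointer and never looks inside it.
class Packet;

class Logger {
public:
    // Declaring a prototype with an incomplete type is legal.
    int Describe(const Packet* p) const;
};

// --- what would live in the source file, e.g. "Logger.cc" ---
// The source file dereferences the pointer, so it needs the complete
// type (in a real tree this would be: #include "Packet.h").
class Packet {
public:
    explicit Packet(int len) : len(len) {}
    int Length() const { return len; }

private:
    int len;
};

int Logger::Describe(const Packet* p) const {
    // Returns the packet length, or -1 for a null pointer.
    return p ? p->Length() : -1;
}
```

Only translation units that actually use Packet's members pay for
parsing its full definition; everyone else sees a one-line declaration.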
This patch speeds up the build by 19%, because each compilation unit
gets smaller. Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):
Before this patch:
3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps
After this patch:
2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
Started by factoring some details into a new DataBlockList class to at
least make it more clear where modifications occur. More abstractions
are likely to happen later as I experiment with alternate data
structures aimed at improving worst-case scenarios.
To be more exact: &encrypt, &mergeable, &rotate_interval, &rotate_size
Also removes no longer used redef-able constants:
log_rotate_interval, log_max_size, log_encryption_key
GH-243
Note - this compiles, but you cannot run Bro anymore - it crashes
immediately with a null pointer access. The reason behind it is that
the required clone functionality does not work anymore.
* origin/topic/jsiwek/plist-and-event-cleanup:
Add comments to QueueEvent() and ConnectionEvent()
Add methods to queue events without handler existence check
Cleanup/improve PList usage and Event API
Added ConnectionEventFast() and QueueEventFast() methods to avoid
redundant event handler existence checks.
It's common practice for the caller to already check for event handler
existence before doing all the work of constructing the arguments, so
it's desirable to not have to check for existence again.
E.g. going through ConnectionEvent() means 3 existence checks:
one you do yourself before calling it, one in ConnectionEvent(), and then
another in QueueEvent().
The existence check itself can be more than a few operations sometimes
as it needs to check a few flags that determine if it's enabled, has
a local body, or has any remote receivers in the old comm. system or
has been flagged as something to publish in the new comm. system.
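A rough sketch of the idea, using hypothetical stand-in types
(HandlerSketch and EventQueueSketch are not the real Zeek classes, and
the flag names are assumptions made for illustration):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event handler. The existence check looks
// at several flags, which is why repeating it is worth avoiding.
struct HandlerSketch {
    bool enabled = false;
    bool has_local_body = false;
    bool has_remote_receivers = false;
    mutable int checks = 0;  // counts how often the check ran

    explicit operator bool() const {
        ++checks;
        return enabled && (has_local_body || has_remote_receivers);
    }
};

struct EventQueueSketch {
    std::vector<std::vector<int>> queued;

    // Checked variant: safe to call without a prior existence check.
    void QueueEvent(const HandlerSketch& h, std::vector<int> args) {
        if (!h)
            return;
        QueueEventFast(h, std::move(args));
    }

    // Fast variant: the caller promises it already checked the handler.
    void QueueEventFast(const HandlerSketch&, std::vector<int> args) {
        queued.push_back(std::move(args));
    }
};
```

A caller that has already tested the handler before building the
arguments can call the Fast variant and pay for the check only once.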
The majority of PLists are now created as automatic/stack objects
rather than on the heap, and are initialized either with a
known capacity reserved up front or directly from an initializer_list
(so there's no wasted slack in the memory that gets allocated for
lists containing a fixed/known number of elements).
Added versions of the ConnectionEvent/QueueEvent methods that take
a val_list by value.
Added a move ctor/assign-operator to PLists to allow passing them
around without having to copy the underlying array of pointers.
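For illustration, a simplified pointer list with exact-size
construction and move support might look like the following.
PtrListSketch is a hypothetical class written for this note; it is not
the real PList interface:

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>

// Hypothetical simplified list of non-owned pointers.
template <typename T>
class PtrListSketch {
public:
    // Reserve a known capacity up front.
    explicit PtrListSketch(std::size_t capacity)
        : data(capacity ? new T*[capacity] : nullptr), cap(capacity) {}

    // Allocate exactly as many slots as there are elements: no slack.
    PtrListSketch(std::initializer_list<T*> init)
        : data(init.size() ? new T*[init.size()] : nullptr), cap(init.size()) {
        for (T* p : init)
            data[num++] = p;
    }

    // Move ctor: steal the array instead of copying it.
    PtrListSketch(PtrListSketch&& other) noexcept
        : data(other.data), cap(other.cap), num(other.num) {
        other.data = nullptr;
        other.cap = other.num = 0;
    }

    // Move assignment: release our array, then steal the other one.
    PtrListSketch& operator=(PtrListSketch&& other) noexcept {
        if (this != &other) {
            delete[] data;
            data = other.data;
            cap = other.cap;
            num = other.num;
            other.data = nullptr;
            other.cap = other.num = 0;
        }
        return *this;
    }

    // Copying is disabled to keep the sketch short.
    PtrListSketch(const PtrListSketch&) = delete;
    PtrListSketch& operator=(const PtrListSketch&) = delete;

    ~PtrListSketch() { delete[] data; }

    void Append(T* p) {
        assert(num < cap);  // the sketch never grows the array
        data[num++] = p;
    }

    std::size_t Length() const { return num; }
    std::size_t Capacity() const { return cap; }
    T* operator[](std::size_t i) const { return data[i]; }
    T** Raw() const { return data; }

private:
    T** data = nullptr;
    std::size_t cap = 0;
    std::size_t num = 0;
};
```

After a move, the destination holds the same underlying array pointer
the source had, and the source is left empty.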
Includes small readability tweaks, see BIT-1854.
Closes BIT-1854.
* origin/topic/jsiwek/bit-1854-reassembler-improvements:
BIT-1854: improve reassembly overlap checking
BIT-1854: fix the 'tcp_excessive_data_without_further_acks' option
This previously checked against the amount of out-of-sequence data
being buffered by the reassembler. It now checks against the total
size of all blocks being buffered in the reassembler, which, by nature
of still being buffered there, means it's not been acked yet.
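A sketch of the new style of check, under the assumption that buffered
blocks are kept in a map keyed by sequence offset (BufferSketch and its
members are hypothetical, not the actual reassembler code):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical reassembler buffer keyed by starting sequence offset.
struct BufferSketch {
    std::map<uint64_t, std::vector<unsigned char>> blocks;

    // Total bytes held in all blocks. Anything still buffered here has
    // not been acked yet, whether or not it is out of sequence.
    uint64_t TotalSize() const {
        uint64_t total = 0;
        for (const auto& kv : blocks)
            total += kv.second.size();
        return total;
    }

    // Assumption of this sketch: a threshold of 0 disables the check.
    bool ExcessiveUnackedData(uint64_t threshold) const {
        return threshold > 0 && TotalSize() > threshold;
    }
};
```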
- Removed the gap_report event. It wasn't used anymore
and was functionally no more capable than scheduling events
and using the get_gap_summary bif.
- Added functionality to Dictionaries to count cumulative
numbers of inserts performed. This is further used to
measure the total number of connections of various types.
Previously only the number of active connections was
available.
- The Reassembler base class now tracks active reassembly
size for all subclasses (File/TCP/Frag & unknown).
- Improvements to the stats.log. Mostly, more information.
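The cumulative insert counter mentioned in the list above could be
sketched roughly as follows. CountingDictSketch is a hypothetical
wrapper over std::unordered_map, not the real Dictionary class, and it
counts every Insert() call, including overwrites of an existing key:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical dictionary that tracks both its current size and the
// total number of inserts ever performed.
template <typename V>
class CountingDictSketch {
public:
    void Insert(const std::string& key, V value) {
        entries[key] = std::move(value);
        ++cumulative_inserts;  // never decremented on removal
    }

    bool Remove(const std::string& key) { return entries.erase(key) > 0; }

    // Number of entries currently stored (e.g. active connections).
    std::size_t Length() const { return entries.size(); }

    // Number of inserts over the dictionary's lifetime (e.g. total
    // connections ever seen).
    uint64_t NumCumulativeInserts() const { return cumulative_inserts; }

private:
    std::unordered_map<std::string, V> entries;
    uint64_t cumulative_inserts = 0;
};
```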
TCP_Reassembler can now keep a history of old TCP segments using the
`tcp_max_old_segments` option. A value of zero will disable it.
An overlapping segment with different data can indicate a possible
TCP injection attack. The rexmit_inconsistency event will fire if this
is the case.
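A minimal sketch of how an overlap inconsistency could be detected,
assuming old segments are kept as (offset, payload) pairs.
SegmentSketch and OverlapIsInconsistent are hypothetical names for this
illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record of a previously seen segment.
struct SegmentSketch {
    uint64_t seq;      // relative sequence offset of the first byte
    std::string data;  // payload bytes
};

// Returns true if the two segments overlap and disagree on at least one
// byte inside the overlapping range.
bool OverlapIsInconsistent(const SegmentSketch& old_seg,
                           const SegmentSketch& new_seg) {
    uint64_t start = std::max(old_seg.seq, new_seg.seq);
    uint64_t end = std::min(old_seg.seq + old_seg.data.size(),
                            new_seg.seq + new_seg.data.size());

    if (start >= end)
        return false;  // no overlap at all

    // Compare only the bytes both segments claim to cover.
    return old_seg.data.compare(start - old_seg.seq, end - start,
                                new_seg.data,
                                start - new_seg.seq, end - start) != 0;
}
```

A plain retransmission carries identical bytes in the overlap and
returns false; differing bytes are what would justify raising
rexmit_inconsistency.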
Due to the change in f1cef9d2a9, it was possible for the TCP reassembler
to deliver the same data twice because Undelivered did not take into
account that the reassembly stream could now advance past the end of the
gap.
Addresses BIT-1259.
* origin/topic/jsiwek/jj-bugs:
Fix incorrect data delivery skips after gap in HTTP Content-Range.
Fix file analysis placement of data after gap in HTTP Content-Range.
Fix issue w/ TCP reassembler not delivering some segments.
Raise http_entity_data in line with data arrival.
Implement file ID caching for MIME_Mail.
BIT-1240: Fix MIME entity file data/gap ordering.
BIT-1240 #closed
BIT-1246 #closed
BIT-1247 #closed
BIT-1248 #closed
For example, if we have a connection between TCP "A" and TCP "B" and "A"
sends segments "1" and "2", but we don't see the first, and then the next
acknowledgement from "B" is for everything up to, and including, "2",
the gap would be reported to include both segments instead of just the
first and then delivering the second. Put generally: any segments that
weren't yet delivered because they're waiting for an earlier gap to be
filled would be dropped when an ACK comes in that includes the gap as
well as those pending segments. (If a distinct ACK was seen for just
the gap, that situation would have worked).
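The intended behavior can be sketched with a heavily simplified
reassembler. ReassemblerSketch is hypothetical, assumes non-overlapping
segments, and ignores everything else the real TCP reassembler handles:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reassembler: delivers in-order data, buffers the rest.
struct ReassemblerSketch {
    uint64_t next_seq = 0;                            // next offset to deliver
    std::map<uint64_t, std::string> pending;          // buffered segments
    std::string delivered;                            // data handed upward
    std::vector<std::pair<uint64_t, uint64_t>> gaps;  // (offset, length)

    void NewSegment(uint64_t seq, const std::string& data) {
        pending[seq] = data;
        Flush();
    }

    // The peer acknowledged everything below ack_seq.
    void Ack(uint64_t ack_seq) {
        while (next_seq < ack_seq) {
            Flush();
            if (next_seq >= ack_seq)
                break;

            // There is a hole at next_seq. It ends at the next buffered
            // segment, or at the ACK if nothing is buffered before it.
            auto it = pending.lower_bound(next_seq);
            uint64_t gap_end = ack_seq;
            if (it != pending.end() && it->first < ack_seq)
                gap_end = it->first;

            gaps.emplace_back(next_seq, gap_end - next_seq);
            next_seq = gap_end;
        }
        Flush();
    }

private:
    // Deliver every buffered segment that is now contiguous.
    void Flush() {
        auto it = pending.find(next_seq);
        while (it != pending.end()) {
            delivered += it->second;
            next_seq += it->second.size();
            pending.erase(it);
            it = pending.find(next_seq);
        }
    }
};
```

With segment "1" missing and segment "2" buffered, a single ACK
covering both reports a gap for "1" only and still delivers "2".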
Addresses BIT-1246.
The main change is that reassembly code (e.g. for TCP) now uses
int64/uint64 (signedness is situational) data types in place of int
types in order to support delivering data to analyzers that pass 2GB
thresholds. There are also changes in logic that accompany the change in
data types, e.g. to fix TCP sequence space arithmetic inconsistencies.
Another significant change is in the Analyzer API: the *Packet and
*Undelivered methods now use a uint64 in place of an int for the
relative sequence space offset parameter.
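To illustrate why a 64-bit relative offset is needed, here is a sketch
that unwraps 32-bit TCP sequence numbers into a uint64 offset.
SeqUnwrapSketch is a hypothetical helper, not the code in this change,
and it assumes each sequence number is within 2^31 of the previous one
and a two's complement platform:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tracks a 64-bit offset relative to the initial
// sequence number while the on-wire value wraps at 2^32.
class SeqUnwrapSketch {
public:
    explicit SeqUnwrapSketch(uint32_t isn) : last_seq(isn) {}

    uint64_t Relative(uint32_t seq) {
        // Unsigned subtraction wraps modulo 2^32; reading the result as
        // signed gives the shortest forward or backward distance.
        int32_t delta = static_cast<int32_t>(seq - last_seq);
        int64_t next = static_cast<int64_t>(last_rel) + delta;
        assert(next >= 0);  // never move before the initial sequence number

        last_rel = static_cast<uint64_t>(next);
        last_seq = seq;
        return last_rel;
    }

private:
    uint32_t last_seq;
    uint64_t last_rel = 0;
};
```

An int offset would overflow once a stream passes 2GB, while the
64-bit offset keeps growing across 32-bit wraparounds.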
- Since it's just the handshake packets out of order, they're no
longer treated as partial connections, which some protocol analyzers
immediately refuse to look at.
- The TCP_Reassembler "is_orig" state failed to change, which led to
protocol analyzers sometimes using the wrong value for that.
- Add a unit test which exercises the Connection::FlipRoles() code
path (i.e. the SYN/SYN-ACK reversal situation).
Addresses BIT-1148.
The previous behavior was to accommodate SYN/FIN/RST-filtered traces by
not reporting missing data (via the content_gap event) for such
connections. The new behavior always reports gaps for connections that
are established and terminate normally, but whose sequence numbers
indicate that all data packets of the connection were missed. The
behavior can be reverted by redef'ing "detect_filtered_trace".
Replaced some with InternalWarning or InternalAnalyzerError, the latter
being a new method which signals the analyzer to not process further
input. Some usages I just removed if they didn't make sense or clearly
couldn't happen. Also did some minor refactors of related code while
reviewing/exploring ways to get rid of InternalError usages.
Also, for TCP content file write failures there's a new event:
"contents_file_write_failure".