* origin/topic/vern/remove-uu:
fix up for linking w/ doc update
documentation update
script simplification that removes an unnecessary &is_assigned
removing -uu functionality and associated script analysis that is no longer needed
New connections already do
conn_val->Assign(6, val_mgr->EmptyString());
This second assignment was effectively doing
conn_val->Assign(6, "")
for all new connections, causing a new empty ZeekString to be allocated.
On a pcap consisting entirely of SYN packets, this gives a noticeable performance improvement.
Benchmark #1: zeek.orig -r /data/pcaps/scan.pcap
Time (mean ± σ): 47.082 s ± 0.547 s [User: 57.555 s, System: 9.114 s]
Range (min … max): 46.516 s … 47.834 s 5 runs
Benchmark #2: zeek -r /data/pcaps/scan.pcap
Time (mean ± σ): 45.260 s ± 0.378 s [User: 55.438 s, System: 8.537 s]
Range (min … max): 44.783 s … 45.789 s 5 runs
Summary
'zeek -r /data/pcaps/scan.pcap' ran
1.04 ± 0.01 times faster than 'zeek.orig -r /data/pcaps/scan.pcap'
This preserves the previous hash key buffer layout (so the testsuite still
passes) and the overall approach, but gets rid of the code path for writing
singleton serializations. That code path required a fourth switch block over all
types (besides those for reads, writes, and size computation) and was
inconsistent with the one for writing non-atomic types.
This allows tracing of hash key buffer reservations, reads, and writes via a new
debug stream, and supports printing a summary of a HashKey object via
Describe(). The latter comes in handy, e.g., in TableVal::Describe(), where
including the hash key is now possible but commented out.
This preserves the optimization of storing values directly in the key_u member
union when feasible, and using a variable size buffer otherwise. It also adds
bounds-checking for that buffer, moves size arguments to size_t, decouples
construction from hash computation, emulates the tagging feature found in
SerializationFormat to assist troubleshooting, and switches feasible
reinterpret_casts to static_casts.
This functionality previously lived in the CompHash class, with one difference:
this removes a discrepancy between the offset aligner and the memory pointer
aligner/padder. The offset aligner used to align the provided offset and then
add an additional alignment size (for example, 1 aligned to 4 would yield not 4
but 8). Like the memory aligners, it now only rounds up as needed.
Includes unit tests.
This takes the existing sorting for table index hashkeys we had in place during
hash key writes and applies it also during buffer size reservation. It changes
the approach slightly: the underlying map now points to the TableVal entry index
vals directly, rather than to numerical indexes into an additional list built up
to store them. Doing so removes the need for that list.
Changes during merge:
- Add dedicated test (w/ trace "client_timestamp_enabled.pcapng" from Cloudshark).
- Change types from signed to unsigned.
- Add cast for bit-shifting operand.
- clang-format run.
* origin/topic/seth/tsv-logs-utf8-by-default:
Fix misuse of string::append that leads to an overflow
Use json_escape_utf8 for all UTF-8 data in ODesc
Switch the TSV Zeek logs to be UTF-8 by default.