This commit switches UID hashing from md5 to a highway hash. It also
moves the salt value out of the file plugin - and makes it
installation-specific instead - it is moved to the global namespace.
There now are digest hash functions to make "static"
installation-specific hashes that are stable over workers available to
everyone; hashes can be 64, 128 or 256 bits in size.
Due to the fact that we switch the file hashing algorithm, all file
hashes change.
The underlyigng algorithm that is used for hashing is highwayhash-128,
which is significantly faster than md5.
* origin/topic/timw/776-using-statements:
Remove 'using namespace std' from SerialTypes.h
Remove other using statements from headers
GH-776: Remove using statements added by PR 770
Includes small fixes in files that changed since the merge request was
made.
Also includes a few small indentation fixes.
This unfortunately cuases a ton of flow-down changes because a lot of other
code was depending on that definition existing. This has a fairly large chance
to break builds of external plugins, considering how many internal ones it broke.
The Zeek code base has very inconsistent #includes. Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed. Another side effect was a lot of header
bloat which slows down the build.
First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.
After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations. In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.
This patch speeds up the build by 19%, because each compilation unit
gets smaller. Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):
Before this patch:
3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps
After this patch:
2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
* origin/topic/timw/deprecate-int-types:
Deprecate the internal int/uint types in favor of the cstdint types they were based on
Merge adjustments:
* A bpf type mistakenly got replaced (inside an unlikely #ifdef)
* Did a few substitutions that got missed (likely due to
pre-processing out of DEBUG macros)
Added ConnectionEventFast() and QueueEventFast() methods to avoid
redundant event handler existence checks.
It's common practice for caller to already check for event handler
existence before doing all the work of constructing the arguments, so
it's desirable to not have to check for existence again.
E.g. going through ConnectionEvent() means 3 existence checks:
one you do yourself before calling it, one in ConnectionEvent(), and then
another in QueueEvent().
The existence check itself can be more than a few operations sometimes
as it needs to check a few flags that determine if it's enabled, has
a local body, or has any remote receivers in the old comm. system or
has been flagged as something to publish in the new comm. system.
Majority of PLists are now created as automatic/stack objects,
rather than on heap and initialized either with the known-capacity
reserved upfront or directly from an initializer_list (so there's no
wasted slack in the memory that gets allocated for lists containing
a fixed/known number of elements).
Added versions of the ConnectionEvent/QueueEvent methods that take
a val_list by value.
Added a move ctor/assign-operator to Plists to allow passing them
around without having to copy the underlying array of pointers.
Closes#1830.
* origin/topic/johanna/ocsp-sct-validate: (82 commits)
Tiny script changes for SSL.
Update CT Log list
SSL: Update OCSP/SCT scripts and documentation.
Revert "add parameter 'status_type' to event ssl_stapled_ocsp"
Revert "parse multiple OCSP stapling responses"
SCT: Fix script error when mime type of file unknown.
SCT: another memory leak in SCT parsing.
SCT validation: fix small memory leak (public keys were not freed)
Change end-of-connection handling for validation
OCSP/TLS/SCT: Fix a number of test failures.
SCT Validate: make caching a bit less aggressive.
SSL: Fix type of ssl validation result
TLS-SCT: compile on old versions of OpenSSL (1.0.1...)
SCT: Add caching support for validation
SCT: Add signed certificate timestamp validation script.
SCT: Allow verification of SCTs in Certs.
SCT: only compare correct OID/NID for Cert/OCSP.
SCT: add validation of proofs for extensions and OCSP.
SCT: pass timestamp as uint64 instead of time
Add CT log information to Bro
...
This makes it much easier for protocols where the mime type is known in
advance like, for example, TLS. We now do no longer have to perform deep
script-level magic.
- Plain text now identified with BOMs for UTF8,16,32
(even though 16 and 32 wouldn't get identified as plain text, oh-well)
- X.509 certificates are now populating files.log with
the mime type application/pkix-cert.
- File signatures are split apart into file types
to help group and organize signatures a bit better.
- Normalized some FILE_ANALYSIS debug messages.
- Improved Javascript detection.
- Improved HTML detection.
- Removed a bunch of bad signatures.
- Merged a bunch of signatures that ultimately detected
the same mime type.
- Added detection for MS LNK files.
- Added detection for cross-domain-policy XML files.
- Added detection for SOAP envelopes.
- Re-arrange how some fa_file fields (e.g. source, connection info, mime
type) get updated/set for consistency.
- Add more robust mechanisms for flushing the reassembly buffer.
The goal being to report all gaps and deliveries to file analyzers
regardless of the state of the reassembly buffer at the time it has to
be flushed.
file_analysis::Manager's dtor now doesn't assume any more analysis
progress can be made because too many of Bro's other subsystems
are shutdown by that point. Any file analysis requests made after
Terminate cannot be reliably processed.
- Improve or just remove some file magic signatures ported from libmagic
that were too general and matched incorrectly too often.
- Fix MHR script's use of fa_file$mime_type before checking if it's
initialized. It may be uninitialized if no signatures match.
- The "fa_file" record now contains a "mime_types" field that contains
all magic signatures that matched the file content (where the
"mime_type" field is just a shortcut for the strongest match).
Put some methods in file_analysis::Manager that can perform the
matching process and return MIME type results. Also helps to
centralize the management/re-use of a signature matcher object.
* origin/topic/jsiwek/http-file-id-caching:
Revert use of HTTP file ID caching for gaps range request content.
Extend file analysis API to allow file ID caching, adapt HTTP to use it.
BIT-1125 #merged
This allows an analyzer to either provide file IDs associated with some
file content or to cache a file ID that was already determined by
script-layer logic so that subsequent calls to the file analysis
interface can bypass costly detours through script-layer. This can
yield a decent performance improvement for analyzers that are able to
take advantage of it and deal with streaming content (like HTTP).
- The reassembly behavior can be modified per-file by enabling or
disabling the reassembler and/or modifying the size of the reassembly
buffer.
- Changed the file extraction analyzer to use the stream to avoid
issues with the chunk based approach not immediately triggering
the file_new event due to mime-type detection delay. Early chunks
frequently ended up lost before.
- Generally things are working now and I'd consider this in testing.
- Move more functionality into base class.
- Remove cctors and assignment operators (weren't actually needed anymore)
- Switch from const char* to std::string.