* origin/topic/timw/776-using-statements:
Remove 'using namespace std' from SerialTypes.h
Remove other using statements from headers
GH-776: Remove using statements added by PR 770
Includes small fixes in files that changed since the merge request was
made.
Also includes a few small indentation fixes.
This unfortunately cuases a ton of flow-down changes because a lot of other
code was depending on that definition existing. This has a fairly large chance
to break builds of external plugins, considering how many internal ones it broke.
* 'intrusive_ptr' of https://github.com/MaxKellermann/zeek: (32 commits)
Scope: store IntrusivePtr in `local`
Scope: pass IntrusivePtr to AddInit()
DNS_Mgr: use class IntrusivePtr
Scope: use class IntrusivePtr
Attr: use class IntrusivePtr
Expr: check_and_promote_expr() returns IntrusivePtr
Frame: use class IntrusivePtr
Val: RecordVal::LookupWithDefault() returns IntrusivePtr
Type: RecordType::FieldDefault() returns IntrusivePtr
Val: TableVal::Delete() returns IntrusivePtr
Type: base_type() returns IntrusivePtr
Type: init_type() returns IntrusivePtr
Type: merge_types() returns IntrusivePtr
Type: use class IntrusivePtr in VectorType
Type: use class IntrusivePtr in EnumType
Type: use class IntrusivePtr in FileType
Type: use class IntrusivePtr in TypeDecl
Type: make TypeDecl `final` and the dtor non-`virtual`
Type: use class IntrusivePtr in TypeType
Type: use class IntrusivePtr in FuncType
...
The Zeek code base has very inconsistent #includes. Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed. Another side effect was a lot of header
bloat which slows down the build.
First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.
After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations. In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.
This patch speeds up the build by 19%, because each compilation unit
gets smaller. Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):
Before this patch:
3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps
After this patch:
2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
* origin/topic/timw/deprecate-int-types:
Deprecate the internal int/uint types in favor of the cstdint types they were based on
Merge adjustments:
* A bpf type mistakenly got replaced (inside an unlikely #ifdef)
* Did a few substitutions that got missed (likely due to
pre-processing out of DEBUG macros)
Added ConnectionEventFast() and QueueEventFast() methods to avoid
redundant event handler existence checks.
It's common practice for caller to already check for event handler
existence before doing all the work of constructing the arguments, so
it's desirable to not have to check for existence again.
E.g. going through ConnectionEvent() means 3 existence checks:
one you do yourself before calling it, one in ConnectionEvent(), and then
another in QueueEvent().
The existence check itself can be more than a few operations sometimes
as it needs to check a few flags that determine if it's enabled, has
a local body, or has any remote receivers in the old comm. system or
has been flagged as something to publish in the new comm. system.
Majority of PLists are now created as automatic/stack objects,
rather than on heap and initialized either with the known-capacity
reserved upfront or directly from an initializer_list (so there's no
wasted slack in the memory that gets allocated for lists containing
a fixed/known number of elements).
Added versions of the ConnectionEvent/QueueEvent methods that take
a val_list by value.
Added a move ctor/assign-operator to Plists to allow passing them
around without having to copy the underlying array of pointers.
This changes many weird names to move non-static content from the
weird name into the "addl" field to help ensure the total number of
weird names is reasonably bounded. Note the net_weird and flow_weird
events do not have an "addl" parameter, so information may no longer
be available in those cases -- to make it available again we'd need
to either (1) define new events that contain such a parameter, or
(2) change net_weird/flow_weird event signature (which is a breaking
change for user-code at the moment).
Also, the generic handling of binpac exceptions for analyzers which
to not otherwise catch and handle them has been changed from a Weird
to a ProtocolViolation.
Finally, a new "file_weird" event has been added for reporting
weirdness found during file analysis.
This makes it much easier for protocols where the mime type is known in
advance like, for example, TLS. We now do no longer have to perform deep
script-level magic.
This undoes the changes applied in merge 9db27a6d60
and goes back to the state in the branch as of the merge 5ab3b86.
Getting rid of the additional layer of removing analyzers and just
keeping them in the set introduced subtle differences in behavior since
a few calls were still passed along. Skipping all of these with SetSkip
introduced yet other subtle behavioral differences.
* origin/topic/robin/file-analysis-fixes:
Adding test with command line that used to trigger a crash.
Cleaning up a couple of comments.
Fix delay in disabling file analyzers.
Fix file analyzer memory management.
The merge changes around functionality a bit again - instead of having
a list of done analyzers, analyzers are simply set to skipping when they
are removed, and cleaned up later on destruction of the AnalyzerSet.
BIT-1782 #merged
When a file analyzer signaled being done with data delivery, the
analyzer would only be scheduled for removal at that poing, meaning it
could still receive more data until that action actually took effect.
Now we make sure to not send any more data to an analyzer.
File analyzers got deleted immediately once the queue with the
corresponding removal operation got drained. That however can happen
while the analyzer is still doing stuff: the queue is drained whenever
any the "special" file analysis events needing immediate attention has
been executed. This fix now only schedules the analyzer for deletion
at that time, but postpones the actual operation until file object
itself is being destroyed.
* origin/topic/seth/more-file-type-ident-fixes:
File API updates complete.
Fixes for file type identification.
API changes to file analysis mime type detection.
Make HTTP 206 reassembly require ETags by default.
More file type identification improvements
Fix an issue with files having gaps before the bof_buffer is filled.
Fix an issue with packet loss in http file reporting.
Adding WOFF fonts to file type identification.
Extended JSON matching and added OCSP responses.
Another large signature update.
More signature updates.
Even more file type ident clean up.
Lots of fixes for file type identification.
BIT-1368 #merged
Removed "file_mime_type" and "file_mime_types" event, replacing them
with a new event called "file_metadata_inferred". It has a record
argument of type "inferred_file_metadata", which contains the mime type
information that the earlier events used to supply. The idea here is
that future extensions to the record with new metadata will be less
likely to break user code than the alternatives (adding new events or
new event parameters).
Addresses BIT-1368.
When files had gaps prior to the bof_buffer completely filling, the
file gap handling code was never sniffing and passing along as much
data as possible so file type identification wasn't working correctly.
- Plain text now identified with BOMs for UTF8,16,32
(even though 16 and 32 wouldn't get identified as plain text, oh-well)
- X.509 certificates are now populating files.log with
the mime type application/pkix-cert.
- File signatures are split apart into file types
to help group and organize signatures a bit better.
- Normalized some FILE_ANALYSIS debug messages.
- Improved Javascript detection.
- Improved HTML detection.
- Removed a bunch of bad signatures.
- Merged a bunch of signatures that ultimately detected
the same mime type.
- Added detection for MS LNK files.
- Added detection for cross-domain-policy XML files.
- Added detection for SOAP envelopes.
- Any files where the total size was below the size of the
default bof_buffer size couldn't have stream analyzers successfully
attached because the bof_buffer never reached the full size
and was never flushed. This branch explicitly marks the buf_buffer
as full and flushes it when the file is being removed.
- Re-arrange how some fa_file fields (e.g. source, connection info, mime
type) get updated/set for consistency.
- Add more robust mechanisms for flushing the reassembly buffer.
The goal being to report all gaps and deliveries to file analyzers
regardless of the state of the reassembly buffer at the time it has to
be flushed.
MIME entities buffered data and passed it along to protocol analyzers in
discrete amounts, but a gap is always passed along right away, so the
ordering of these "events" can cause incorrect file analysis. The
change here is to never leave any MIME data buffered -- it should now be
passed along line by line as it is seen, but may still temporarily make
use of a buffer allocated by the analyzer as it works on decoding
content.
The value of *bof_buffer_size* in the *fa_file* record was supposed to
always limit the amount of data used by the signature matching engine,
but some corner cases would cause matching to be performed on data
beyond that.
* origin/topic/jsiwek/file-signatures:
File type detection changes and fix https.log {orig,resp}_fuids fields.
Various minor changes related to file mime type detection.
Refactor common MIME magic matching code.
Replace libmagic w/ Bro signatures for file MIME type identification.
Conflicts:
scripts/base/init-default.bro
testing/btest/Baseline/coverage.bare-load-baseline/canonified_loaded_scripts.log
testing/btest/Baseline/coverage.default-load-baseline/canonified_loaded_scripts.log
BIT-1143 #merged
- Improve or just remove some file magic signatures ported from libmagic
that were too general and matched incorrectly too often.
- Fix MHR script's use of fa_file$mime_type before checking if it's
initialized. It may be uninitialized if no signatures match.
- The "fa_file" record now contains a "mime_types" field that contains
all magic signatures that matched the file content (where the
"mime_type" field is just a shortcut for the strongest match).
Put some methods in file_analysis::Manager that can perform the
matching process and return MIME type results. Also helps to
centralize the management/re-use of a signature matcher object.