* 'topic/xb-anssi/http_signature_body_end_match' of https://github.com/xb-anssi/zeek:
Let signature framework match HTTP body end
Test how the signature framework matches HTTP body
The HTTP analyzer never tells the signature framework when the body of a
request or a response ends, so any signature regex ending in a '$' used
in an 'http-request-body' or in an 'http-reply-body' condition will
never match.
This made it impossible to write a signature which could distinguish an
HTTP body consisting only of something from an HTTP body prefixed by
that same something.
- Fix:
The fix notifies the signature framework on EndOfData() that there will
be no further data to match for this body by giving it an empty buffer
of length 0 with the eol parameter set to true and all others set to
false. This lets it reach the '$' state in its DFA, and doesn't affect
other documented HTTP match behaviours.
- Limitation:
Since the signature framework doesn't appear to keep previously consumed
data on hand, any match of an http-*-body condition whose patterns ends
with a '$' will lead to an empty data parameter being passed to the
signature_match() event because the body data is no longer available
when EndOfData() happens.
Due to segmentation there is anyway no guarantee the data parameter
would have held the entire match even without the '$', since the data
parameter only receives the last chunk of data which completed the match
condition, as can be seen on prefix matches in the btest cases where the
matching data spans multiple segments (the event gives 'B' and not
'AB'), so this is only an extreme case of partial data being given to
that event.
This adds a signatures/http-body-match btest to verify how the signature
framework matches HTTP body in requests and responses.
It currently fails because the 'http-request-body' and 'http-reply-body'
clauses never match anything when there is a '$' in their regular
expressions.
The other pattern clauses such as the 'payload' clause do not suffer
from that restriction and it is not documented as a limitation of HTTP
body pattern clauses either, so it is probably a bug.
The "http-body-match" btest shows that without a fix any signatures
which ends with a '$' in a http-request-body or http-reply-body rule
will never raise a signature_match() event, and that signatures which do
not end with a '$' cannot distinguish an HTTP body prefixed by the
matching pattern (ex: ABCD) from an HTTP body consisting entirely of the
matching pattern (ex: AB).
Test cases by source port:
- 13579:
- GET without body, plain res body (CD, only)
- 13578:
- GET without body, plain res body (CDEF, prefix)
- 24680:
- POST plain req body (AB, only), plain res body (CD, only)
- 24681:
- POST plain req body (ABCD, prefix), plain res body (CDEF, prefix)
- 24682:
- POST gzipped req body (AB, only), gzipped res body (CD, only)
- POST plain req body (CD, only), plain res body (EF, only)
- 33210:
- POST multipart plain req body (AB;CD;EF, prefix)
- plain res body (CD, only)
- 33211:
- POST multipart plain req body (ABCD;EF, prefix)
- plain res body (CDEF, prefix)
- 34527:
- POST chunked gzipped req body (AB, only)
- chunked gzipped res body (CD, only)
- 34528:
- POST chunked gzipped req body (ABCD, prefix)
- chunked gzipped res body (CDEF, prefix)
The tests with source ports 24680, 24682 and 34527 should
match the signature http_request_body_AB_only and the signature
http_request_body_AB_prefix, but they only match the latter.
The tests with source ports 13579, 24680, 24682, 33210 and 34527 should
match the signature http_response_body_CD_only and the signature
http_response_body_CD_prefix, but they only match the latter.
The tests with source ports 24680, 24681, 33210 and 33211 show how the
http_request_body_AB_then_CD signature with two http-request-body
conditions match either on one or multiple requests (documented
behaviour).
The test cases with other source ports show where the
http_request_body_AB_only and http_response_body_CD_only signatures
should not match because their bodies include more than the searched
patterns.
Add a new overload to `copy_string` that takes the input characters plus
size. The new overload avoids inefficient scanning of the input for the
null terminator in cases where we know the size beforehand. Furthermore,
this overload *must* be used when dealing with input character sequences
that may have no null terminator, e.g., when the input is from a
`std::string_view` object.
* origin/topic/awelzel/3379-shared-ptr-and-micro-optimizations:
build_inner_connection: Use the outer packet's timestamp
build_inner_connection: Avoid one extra Init()
packet_analysis: Do not run DetectProtocol() on disabled analyzers
packet_analysis/Dispatcher: Do not index table twice
packet_analysis: Avoid shared_ptr copying for analyzer lookups
Packet::Init() is not so cheap as one might think: It computes a
timestamp from { 0, 0 } using double division. Just avoid this
by not initializing an empty Packet.
For deeply encapsulated connections (think AWS traffic mirroring format
like IP,UDP,GENEVE,IP,UDP,VXLAN,ETH,IP,TCP), the Dispatcher::Lookup()
method is fairly visible in profiles when running in bare mode.
This changes the Analyzer::Lookup() and Dispatcher::Lookup() return value
breaking the API in favor of the performance improvement.
Relates to zeek/zeek#3379.
This commit adds a multitude of new extension types that were added in
the last few years; it also adds grease values to extensions, curves,
and ciphersuites.
Furthermore, it adds a test that contains a encrypted-client-hello
key-exchange (which uses several extension types that we do not have in
our baseline so far).
We do not activate support for JavaScript at this time since most of our
JavaScript code is in BTest files to test zeekjs, but these files also
contain other languages which leads to largely misformated files.
This largely copies over Spicy's `.clang-format` configuration file. The
one place where we deviate is header include order since Zeek depends on
headers being included in a certain order.
A number of analyzers that we've been fuzzing with the generic-analyzer-fuzzer
setup do not implement DeliverStream() and instead only work with DeliverPacket()
(ntp, syslog, sip, radius, ...). Calling DeliverStream() on those is
pretty much a noop and fuzzing not effective.
This change adds support to fuzz DeliverPacket(). Whether to use packet
or stream fuzzing is configured through a define via CMake.
This is still a bit limited in that for analyzers that support both,
DeliverPacket() and DeliverStream(), only one code path is fuzzed.
Closed#3398