When Zeek flipped the roles of an HTTP connection after the HTTP
analyzer had been attached, that analyzer would not update its own
ContentLine analyzer state, resulting in the wrong ContentLine analyzer
being switched into plain delivery mode.
In debug builds, this would result in assertion failures; in production
builds, the HTTP analyzer would receive HTTP bodies as individual header
lines, or conversely, individual header lines would be delivered as one
large chunk from the ContentLine analyzer.
PCAPs were generated locally using tcprewrite to select well-known HTTP
ports for both endpoints, then editcap to drop the first SYN packet.
Kudos to @JordanBarnartt for keeping at it.
Closes #3789
Changing default_file_bof_buffer_size has a subtle impact on
MIME type detection and changed the zeek-testing baseline. Do
not load this new script via test-all-policy to avoid this.
The new test was mainly an aid to understand what is actually going on.
In short, if default_file_bof_buffer_size is larger than the file, MIME
detection only runs when the buffer is full or when the file is removed.
When a file transfer happens over multiple HTTP connections, only
some or one of the http.log entries will have a proper response MIME type.
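For illustration, assuming default_file_bof_buffer_size is the usual
redef-able global from base/init-bare.zeek, a sketch of shrinking the
buffer so that MIME detection triggers earlier for small files:

    # Sketch only: a smaller buffer-of-file (BOF) buffer fills up
    # sooner, so MIME detection runs earlier for short files.
    redef default_file_bof_buffer_size = 1024;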
PCAP extracted from 2009-M57-day11-18.trace.gz.
OSS-Fuzz managed to produce a MIME multipart message construction with
thousands of nested entities (or that's what Zeek makes of it, anyhow).
Prevent such deep analysis by capping the nesting depth at 100,
avoiding unnecessary resource usage. A new weird named
exceeded_mime_max_depth is reported when this limit is reached.
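A minimal sketch, assuming the stock Weird::actions table of the notice
framework, for escalating the new weird to a notice:

    # Raise a notice (instead of just a weird.log entry) when the
    # MIME nesting limit is hit.
    redef Weird::actions += {
        ["exceeded_mime_max_depth"] = Weird::ACTION_NOTICE
    };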
This change reduces the runtime of the OSS-Fuzz reproducer from ~45 seconds
to ~2.5 seconds.
The test PCAP was produced from a Python script using the email package
and sending the rendered version via POST to an HTTP server.
Closes #208
This adds a signatures/http-body-match btest to verify how the
signature framework matches HTTP bodies in requests and responses.
It currently fails because the 'http-request-body' and 'http-reply-body'
clauses never match anything when there is a '$' in their regular
expressions.
The other pattern clauses, such as the 'payload' clause, do not suffer
from that restriction, and it is not documented as a limitation of the
HTTP body pattern clauses either, so it is probably a bug.
The "http-body-match" btest shows that without a fix any signatures
which ends with a '$' in a http-request-body or http-reply-body rule
will never raise a signature_match() event, and that signatures which do
not end with a '$' cannot distinguish an HTTP body prefixed by the
matching pattern (ex: ABCD) from an HTTP body consisting entirely of the
matching pattern (ex: AB).
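For illustration, the two flavors of rule involved might look like the
following sketch (signature names taken from the test; rule bodies are
assumed):

    signature http_request_body_AB_only {
        # Anchored pattern: should match only a body that is exactly "AB".
        http-request-body /^AB$/
        event "http_request_body_AB_only"
    }

    signature http_request_body_AB_prefix {
        # Unanchored tail: matches any body starting with "AB".
        http-request-body /^AB/
        event "http_request_body_AB_prefix"
    }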
Test cases by source port:
- 13579:
- GET without body, plain res body (CD, only)
- 13578:
- GET without body, plain res body (CDEF, prefix)
- 24680:
- POST plain req body (AB, only), plain res body (CD, only)
- 24681:
- POST plain req body (ABCD, prefix), plain res body (CDEF, prefix)
- 24682:
- POST gzipped req body (AB, only), gzipped res body (CD, only)
- POST plain req body (CD, only), plain res body (EF, only)
- 33210:
- POST multipart plain req body (AB;CD;EF, prefix)
- plain res body (CD, only)
- 33211:
- POST multipart plain req body (ABCD;EF, prefix)
- plain res body (CDEF, prefix)
- 34527:
- POST chunked gzipped req body (AB, only)
- chunked gzipped res body (CD, only)
- 34528:
- POST chunked gzipped req body (ABCD, prefix)
- chunked gzipped res body (CDEF, prefix)
The tests with source ports 24680, 24682 and 34527 should
match the signature http_request_body_AB_only and the signature
http_request_body_AB_prefix, but they only match the latter.
The tests with source ports 13579, 24680, 24682, 33210 and 34527 should
match the signature http_response_body_CD_only and the signature
http_response_body_CD_prefix, but they only match the latter.
The tests with source ports 24680, 24681, 33210 and 33211 show how the
http_request_body_AB_then_CD signature with two http-request-body
conditions matches either on one or on multiple requests (documented
behaviour).
The test cases with other source ports show where the
http_request_body_AB_only and http_response_body_CD_only signatures
should not match because their bodies include more than the searched
patterns.
Setting this option to false does not count missing bytes in files
towards the extraction limits, and allows extracting data up to the
desired limit even when partial files are written.
When missing bytes are encountered, files are now written as sparse
files.
Using this option requires the underlying storage and utilities to support
sparse files.
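A sketch of how this could look in practice, assuming the option in
question is FileExtract::default_limit_includes_missing and using the
stock extraction analyzer:

    # Assumption: the option is FileExtract::default_limit_includes_missing.
    # Gap bytes then no longer count against the extraction limit and
    # show up as holes in a sparse output file.
    redef FileExtract::default_limit_includes_missing = F;

    event file_new(f: fa_file)
        {
        Files::add_analyzer(f, Files::ANALYZER_EXTRACT,
                            [$extract_filename=f$id]);
        }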
(cherry picked from commit afa6f3a0d3b8db1ec5b5e82d26225504c2891089)
When http_reply events are received before http_request events, either
through crafted traffic or possible re-ordering, it is possible to
trigger unbounded state growth because later http_requests are never
matched with responses again.
Prevent this by synchronizing the request/response counters when late
requests come in.
Also forcefully flush pending requests when http_replies are never
observed, either because the analyzer has been disabled or because of
half-duplex traffic.
Fixes #1705
This was exposed by OSS-Fuzz after the HTTP/0.9 changes in zeek/zeek#2851:
We did not check the result of parsing the "from" and "last" byte
positions of a Content-Range header and would reference uninitialized
values on the stack if these were not valid.
This isn't as bad as it sounds beyond yielding nonsensical values: if
the result was negative, we reported a weird and bailed. If the result
was positive, we already had to treat it with suspicion anyway, and the
SetPlainDelivery() logic accounts for that.
OSS-Fuzz tickled an assert when sending an HTTP response before an
HTTP/0.9 request. Avoid this by resetting reply_message upon seeing an
HTTP/0.9 request.
The PCAP was generated artificially: the server sends a reply providing
a Content-Length. Because HTTP/0.9 processing would remove the
ContentLine support analyzer, more data was delivered to the
HTTP_Message than expected, triggering an assert.
This is a follow-up for zeek/zeek#2851.
The seen/file-names script relies on f$info$filename being populated.
For HTTP and other network protocols, however, this field is only
populated during file_over_new_connection(), which runs after
file_new().
Use the file_new() event only for files without connections;
file_over_new_connection() implies that f$conns is populated anyway.
Special-case SMB to avoid finding files twice, because there's a
custom implementation in seen/smb-filenames.zeek.
Fixes #2647
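A minimal sketch of the resulting pattern (a hypothetical handler, not
the actual seen/file-names code):

    # For files carried over a connection, both f$conns and
    # f$info$filename are available by the time this event fires.
    event file_over_new_connection(f: fa_file, c: connection, is_orig: bool)
        {
        if ( f?$info && f$info?$filename )
            print fmt("file %s seen over %s", f$info$filename, c$uid);
        }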
* When a file is transferred over multiple connections, have
  create_file_info() just pick the first one instead of none.
* Do not unconditionally assume cid and cuid are set on a
  Notice::FileInfo object.
OSS-Fuzz generated "HTTP traffic" containing 250k+ sequences of
"T<space>\r\r", which Zeek then logged as individual HTTP requests. Add
a heuristic to bail on such request lines. It's a bit specific to the
test case, but should work.
There are more issues around handling HTTP/0.9, e.g. triggering
"not a http reply line" when HTTP/0.9 never had such a thing, but
I don't think that's worth fixing up.
Fixes #119
The current_entity tracking in HTTP assumes that client/server never
send HTTP entities at the same time. The attached pcap (generated
artificially) violates this and triggers:
1663698249.307259 expression error in <...>base/protocols/http/./entities.zeek, line 89: field value missing (HTTP::c$http$current_entity)
For the http-no-crlf test, include weird.log as a baseline. Now that
weird is @load'ed from http, the log is actually created, and it seems
to make sense to btest-diff it, too.
This is a script-only change that unrolls Files::Info records into
multiple files.log entries if the same file was seen over different
connections by a single worker. Consequently, the Files::Info record
gains the commonly used uid and id fields. These fields are
optional for Files::Info - a file may be analyzed without relation
to a network connection (e.g. by using Input::add_analysis()).
The existing tx_hosts, rx_hosts and conn_uids fields of Files::Info
are not meaningful after this change and are removed by default.
Therefore, files.log will have them removed, too.
The tx_hosts, rx_hosts and conn_uids fields can be revived by using the
policy script frameworks/files/deprecated-txhosts-rxhosts-connuids.zeek
included in the distribution. However, with v6.1 this script will be
removed.
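To keep the old columns for the time being, load the compatibility
script shipped with the distribution (path as given above):

    # Revives tx_hosts, rx_hosts and conn_uids in files.log until the
    # script's removal in v6.1.
    @load frameworks/files/deprecated-txhosts-rxhosts-connuids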
This is to avoid missing large sessions where a single side exceeds
the DPD buffer size. It comes with the trade-off that now the analyzer
can be triggered by anybody controlling one of the endpoints (instead
of both).
Test suite changes are minor, and nothing changed in "external".
Closes#343.
The body-lengths of sub-entities, like multipart messages, got counted
twice by mistake: once upon the end of the sub-entity and then again
upon the end of the top-level entity that contains all sub-entities.
The size of just the top-level entity is the correct one to use.
Switches from pcap_next() to pcap_next_ex() to better handle all error
conditions. This allows, for example, a non-zero exit code for a Zeek
process that fails to fully process all packets in a pcap file.
This adds a small patch to the HTTP analyzer, which recognizes when a
connection is upgraded to a different protocol (using a 101 reply with a
few specific headers being set).
In this case, the analyzer stops further processing of the connection
(which would otherwise result in DPD errors) and raises a new event:
event http_connection_upgrade(c: connection, protocol: string);
The protocol parameter contains the name of the protocol being upgraded
to, as specified in one of the header values.
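A minimal handler sketch for the new event:

    # React to protocol upgrades, e.g. to WebSocket or HTTP/2.
    event http_connection_upgrade(c: connection, protocol: string)
        {
        print fmt("%s upgraded to %s", c$uid, protocol);
        }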
The change from #49 made it an error to not have a URI. That, however,
then led requests with a URI yet no version to abort as well.
Instead, we now check if the token following the method is an "HTTP/"
version identifier. If so, we accept that the URI is empty (and trigger
a weird) but otherwise keep processing.
This adds test cases for HTTP requests both without a URI and without a
version.
The HTTP analyzer was propagating gaps to the files framework even when
a packet drop occurred immediately after the headers of an HTTP response
whose content length was declared to be zero (no file started, so no
loss).
Includes a passing test.
The logic for determining whether a gap was entirely within a MIME
entity body was not asking the current entity, which may be better able
to answer that question if it is using the Content-Range header and thus
knows whether the gap exceeds the length of the body that's still
expected.
Addresses BIT-1247
For example, if we have a connection between TCP endpoints "A" and "B",
"A" sends segments "1" and "2", we never see the first, and the next
acknowledgement from "B" covers everything up to and including "2", then
the gap would be reported as spanning both segments instead of reporting
just the first and then delivering the second. Put generally: any
segments that weren't yet delivered because they were waiting for an
earlier gap to be filled would be dropped when an ACK came in that
covered the gap as well as those pending segments. (If a distinct ACK
was seen for just the gap, that situation would have worked.)
Addresses BIT-1246.
As opposed to delaying until a certain-sized buffer fills, which is
problematic because the event then becomes out of sync with the "rest of
the world", e.g. content_gap handlers being called sooner than
expected.
Addresses BIT-1240.
If reading a trace file with only TCP control packets, a warning is
emitted to suggest the 'detect_filtered_traces' option if the user
doesn't want Bro to report missing TCP segments for such a trace file.
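For such traces, the option can simply be flipped (it is a redef-able
global):

    # Treat a control-packet-only trace as intentionally filtered and
    # suppress the missed-segment reporting.
    redef detect_filtered_traces = T;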