When Zeek flips the roles of an HTTP connection after the HTTP analyzer
being attached, that analyzer would not update its own ContentLine analyzer
state, resulting in the wrong ContentLine analyzer being switched into
plain delivery mode.
In debug builds, this would result in assertion failures; in production
builds, the HTTP analyzer would receive HTTP bodies as individual header
lines, or conversely, individual header lines would be delivered as a
large chunk from the ContentLine analyzer.
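A toy model of the failure mode, as a rough sketch only: the class and method names below are hypothetical and do not mirror Zeek's C++ API. The analyzer caches one line-splitter per direction, so a role flip has to swap the cached references, or a later "switch the originator to plain delivery" call hits the wrong endpoint.

```python
class ToyContentLine:
    """Hypothetical stand-in for a per-endpoint line-splitting analyzer."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.plain_delivery = False  # False means line-by-line delivery


class ToyHTTPAnalyzer:
    """Hypothetical analyzer caching one line-splitter per direction."""

    def __init__(self):
        # Endpoint A starts out as originator, endpoint B as responder.
        self.content_line_orig = ToyContentLine("A")
        self.content_line_resp = ToyContentLine("B")

    def flip_roles(self):
        # The step that was missing: swap the cached per-direction state.
        self.content_line_orig, self.content_line_resp = (
            self.content_line_resp,
            self.content_line_orig,
        )

    def set_plain_delivery(self, is_orig):
        # Switch the chosen direction to plain (non-line-based) delivery
        # and report which endpoint was affected.
        cl = self.content_line_orig if is_orig else self.content_line_resp
        cl.plain_delivery = True
        return cl.endpoint


analyzer = ToyHTTPAnalyzer()
analyzer.flip_roles()
affected = analyzer.set_plain_delivery(is_orig=True)
```

After the flip, endpoint B is the originator, so `affected` is `"B"`. Without the swap in `flip_roles()`, the same call would have switched endpoint A instead.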
PCAPs were generated locally using tcprewrite to select well-known-http ports
for both endpoints, then editcap to drop the first SYN packet.
Kudos to @JordanBarnartt for keeping at it.
Closes #3789
`FirstPacket()` so far supported only TCP. To extend this to UDP, we
move the method into the PIA base class; give it a protocol parameter
for the case that no actual packet is available; and add the
ability to create fake UDP packets as well, not just TCP.
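For illustration only, a minimal sketch of what a synthetic IPv4/UDP header can look like. This is not Zeek's implementation; the function name, field choices (TTL 64, zeroed checksums and ID) and addresses are assumptions made for the example.

```python
import socket
import struct


def fake_ipv4_udp_header(src, dst, sport, dport, payload_len):
    """Build a minimal IPv4 + UDP header pair (hypothetical helper).

    Checksums are left at zero; the point is only to have header fields
    that a header-based matcher could inspect.
    """
    udp_len = 8 + payload_len   # UDP header is 8 bytes
    total_len = 20 + udp_len    # IPv4 header without options is 20 bytes
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                   # version 4, header length 5 words
        0,                      # type of service
        total_len,
        0,                      # identification
        0,                      # flags and fragment offset
        64,                     # TTL (arbitrary choice)
        socket.IPPROTO_UDP,     # protocol number 17
        0,                      # header checksum, not computed here
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    udp_header = struct.pack("!HHHH", sport, dport, udp_len, 0)
    return ip_header + udp_header


header = fake_ipv4_udp_header("10.0.0.1", "10.0.0.2", 1234, 53, payload_len=5)
```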
This whole thing is pretty ugly to begin with, and this doesn't make
it nicer, but we need this extension so that we can feed UDP data that's
tunneled over other protocols into the signature engine. Without the
fake packets, DPD signatures in particular wouldn't have anything to
match on.
After an HTTP upgrade to another protocol, create a weird if the packet
that contains the HTTP reply *also* contains some additional data
already belonging to the upgraded-to protocol.
With configurability through script-land comes the drawback
that we actually need to execute event handlers in the middle
of the parsing process: This might not be the best model, but
the script-side configurability it enables is kind of nice.
This explicit call only matters here when the HTTP reply is
directly followed by some WebSocket message data within the
same network packet, otherwise the queue is drained once the
packet has been completely processed anyhow.
* 'topic/xb-anssi/http_signature_body_end_match' of https://github.com/xb-anssi/zeek:
Let signature framework match HTTP body end
Test how the signature framework matches HTTP body
The HTTP analyzer never tells the signature framework when the body of a
request or a response ends, so any signature regex ending in a '$' used
in an 'http-request-body' or in an 'http-reply-body' condition will
never match.
This made it impossible to write a signature which could distinguish an
HTTP body consisting only of something from an HTTP body prefixed by
that same something.
- Fix:
The fix notifies the signature framework on EndOfData() that there will
be no further data to match for this body by giving it an empty buffer
of length 0 with the eol parameter set to true and all others set to
false. This lets it reach the '$' state in its DFA, and doesn't affect
other documented HTTP match behaviours.
- Limitation:
Since the signature framework doesn't appear to keep previously consumed
data on hand, any match of an http-*-body condition whose pattern ends
with a '$' will lead to an empty data parameter being passed to the
signature_match() event because the body data is no longer available
when EndOfData() happens.
Due to segmentation, there is in any case no guarantee that the data
parameter would have held the entire match even without the '$', since the data
parameter only receives the last chunk of data which completed the match
condition, as can be seen on prefix matches in the btest cases where the
matching data spans multiple segments (the event gives 'B' and not
'AB'), so this is only an extreme case of partial data being given to
that event.
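Both the fix and the limitation can be sketched with a toy streaming matcher. This is a simplified model, not the signature framework's DFA: the class name and the `feed(data, eol)` interface are hypothetical, and only a literal pattern with an optional trailing '$' is supported.

```python
class ToyLiteralMatcher:
    """Streaming matcher for '^literal' or '^literal$' (hypothetical)."""

    def __init__(self, literal, anchored_end):
        self.literal = literal
        self.anchored_end = anchored_end
        self.pos = 0          # bytes of the literal matched so far
        self.extra = False    # saw body bytes beyond the literal
        self.failed = False
        self.done = False

    def feed(self, data, eol=False):
        """Return the chunk that completed the match, or None.

        Note that the returned chunk may be b"" (falsy), so callers
        should compare against None explicitly.
        """
        if self.done or self.failed:
            return None
        for byte in data:
            if self.pos < len(self.literal):
                if byte != self.literal[self.pos]:
                    self.failed = True
                    return None
                self.pos += 1
            else:
                self.extra = True
        if self.pos < len(self.literal):
            return None       # literal not fully seen yet
        if self.anchored_end:
            if self.extra:
                self.failed = True
                return None   # something followed the literal
            if not eol:
                return None   # cannot reach '$' before end of body
        self.done = True
        return data


prefix = ToyLiteralMatcher(b"AB", anchored_end=False)
prefix_results = [prefix.feed(b"A"), prefix.feed(b"B")]

exact = ToyLiteralMatcher(b"AB", anchored_end=True)
exact_results = [exact.feed(b"A"), exact.feed(b"B"), exact.feed(b"", eol=True)]

longer = ToyLiteralMatcher(b"AB", anchored_end=True)
longer_results = [longer.feed(b"A"), longer.feed(b"BC"), longer.feed(b"", eol=True)]
```

In this model the prefix match completes on the second segment and reports only `b"B"`, the anchored match completes only on the empty end-of-body call and reports `b""`, and a body of `ABC` never matches the anchored pattern.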
This largely copies over Spicy's `.clang-format` configuration file. The
one place where we deviate is header include order since Zeek depends on
headers being included in a certain order.
This was exposed by OSS-Fuzz after the HTTP/0.9 changes in zeek/zeek#2851:
We do not check the result of parsing the from and last bytes of a
Content-Range header and would reference uninitialized values on the stack
if these were not valid.
This doesn't seem as bad as it sounds outside of yielding nonsensical values:
If the result was negative, we weird/bailed. If the result was positive, we
already had to treat it with suspicion anyway and the SetPlainDelivery()
logic accounts for that.
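The general idea of checking the parse result before using the values can be sketched as follows. This is an illustrative helper, not the analyzer's code; the function name and the regular expression are assumptions, and the accepted syntax is limited to the common `bytes first-last/total` form.

```python
import re

# Assumed shape of the header value: "bytes <first>-<last>/<total or *>"
_CONTENT_RANGE_RE = re.compile(r"^bytes\s+(\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Return (first, last) if both parse and are consistent, else None."""
    match = _CONTENT_RANGE_RE.match(value.strip())
    if match is None:
        # Never hand back values that were not actually parsed.
        return None
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        return None
    return first, last
```

A caller would treat `None` as "weird and bail" instead of continuing with whatever happened to be in the variables.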
OSS-Fuzz tickled an assert when sending an HTTP response before an HTTP/0.9
request. Avoid this by resetting reply_message upon seeing an HTTP/0.9 request.
PCAP was generated artificially: Server sending a reply providing a
Content-Length. Because HTTP/0.9 processing would remove the ContentLine
support analyzer, more data was delivered to the HTTP_Message than
expected, triggering an assert.
This is a follow-up for zeek/zeek#2851.
Mostly, treat HTTP/0.9 completely separately. Because we're doing raw
delivery of a body directly, fake enough (connection_close=1, and finish
headers manually) so that the MIME infrastructure thinks it is seeing a
body.
This deals better with the body because it accounts for the first line.
Also, by fully bypassing the content line analyzer, it avoids having that
analyzer strip CRLF/LF and the HTTP analyzer then add CRLF
unconditionally.
Concretely, the vlan-mpls test case contains an HTTP response with LF only,
but the previous implementation would use CRLF, accounting for too many bytes.
Same for the http.no-version test which would previously report a body
length of 280 and now is at 323 (which agrees with Wireshark).
Further, the mime_type detection for the http-09 test case works because
it's now seeing the full body.
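The over-counting for LF-only bodies can be reproduced with a small arithmetic sketch. The two functions are hypothetical simplifications of "split into lines, then assume CRLF" versus "count the raw bytes"; they are not the analyzer's code.

```python
def body_length_via_lines(raw):
    """Line-based accounting: strip line endings, then assume CRLF each."""
    return sum(len(line) + 2 for line in raw.splitlines())


def body_length_raw(raw):
    """Raw accounting: count the bytes exactly as they were on the wire."""
    return len(raw)


# Two lines of 8 bytes each, terminated by LF only.
body = b"line one\nline two\n"
line_based = body_length_via_lines(body)
raw_based = body_length_raw(body)
```

Here `raw_based` is 18 while `line_based` is 20, one extra byte per LF-only line.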
Drawback: We don't extract headers when a server actually replies with
an HTTP/1.1 message, but grrr, something needs to give I guess.
The #124 PR introduced special treatment when HTTP version 0.9
was set. With #127, a reproducer that set HTTP/1.0 in the first
request was created and subsequent requests wouldn't reset to
HTTP version 0.9.
This is subtle, but things don't seem to fall apart.
Improves runtime from 20 seconds to 2 seconds for the given
reproducer.
Fixes #127.
oss-fuzz generated "HTTP traffic" containing 250k+ sequences of "T<space>\r\r"
which Zeek then logged as individual HTTP requests. Add a heuristic to bail
on such request lines. It's a bit specific to the test case, but should work.
There are more issues around handling HTTP/0.9, e.g. triggering
"not a http reply line" when HTTP/0.9 never had such a thing, but
I don't think that's worth fixing up.
Fixes #119
This enables locating the headers within the install-tree using the
dirs provided by `zeek-config --include_dir`.
To enable locating these headers within the build-tree, this change also
creates a 'build/src/include/zeek -> ..' symlink.
The body-lengths of sub-entities, like multipart messages, got counted
twice by mistake: once upon the end of the sub-entity and then again
upon the end of the top-level entity that contains all sub-entities.
The size of just the top-level entity is the correct one to use.
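A small sketch of the double counting, using a hypothetical `Entity` class in which the top-level entity's length already includes its sub-entities. The names and the example sizes are made up for illustration.

```python
class Entity:
    """Hypothetical MIME entity: its length includes all sub-entities."""

    def __init__(self, own_len=0, children=None):
        self.own_len = own_len
        self.children = children or []

    def body_length(self):
        return self.own_len + sum(c.body_length() for c in self.children)


def buggy_total(top):
    # Adds each sub-entity when it ends, then the top-level entity,
    # which already contains those same bytes.
    return sum(c.body_length() for c in top.children) + top.body_length()


def fixed_total(top):
    # Only the top-level entity's size is reported.
    return top.body_length()


# Multipart body: 10 bytes of boundaries plus parts of 100 and 50 bytes.
multipart = Entity(own_len=10, children=[Entity(100), Entity(50)])
buggy = buggy_total(multipart)
fixed = fixed_total(multipart)
```

With these numbers `fixed` is 160, while `buggy` is 310 because the 150 bytes of sub-entities are counted twice.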