zeek/doc/logs/files.rst
Tim Wojtulewicz ded98cd373 Copy docs into Zeek repo directly
This is based on commit 2731def9159247e6da8a3191783c89683363689c from the
zeek-docs repo.
2025-09-26 02:58:29 +00:00

371 lines
12 KiB
ReStructuredText
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

=========
files.log
=========
One of Zeeks powerful features is the ability to extract content from network
traffic and write it to disk as a file, via its
:ref:`File Analysis framework <file-analysis-framework>`. This is easiest to understand with a
protocol like File Transfer Protocol (FTP), a classic means to exchange files
over a channel separate from that used to exchange commands. Protocols like
HTTP are slightly more complicated, as it includes headers which must be
interpreted and not included in any file content transferred by the protocol.
Zeeks :file:`files.log` is a record of files that Zeek observed while
inspecting network traffic. The existence of an entry in :file:`files.log` does
not mean that Zeek necessarily extracted file content and wrote it to disk.
Analysts must configure Zeek to extract files by type in order to have them
written to disk.
In the following example, an analyst has configured Zeek to extract files of
MIME type ``application/x-dosexec`` and write them to disk. To understand the
chain of events that result in having a file on disk, we will start with the
:file:`conn.log`, progress to the :file:`http.log`, and conclude with the
:file:`files.log`.
The Zeek scripting manual, derived from the Zeek source code, completely
explains the meaning of each field in the :file:`files.log` (and other logs).
It would be duplicative to manually recreate that information in another format
here. Therefore, this entry seeks to show how an analyst would make use of the
information in the :file:`files.log`. Those interested in getting details on
every element of the :file:`files.log` should refer to :zeek:see:`Files::Info`.
Throughout the sections that follow, we will inspect Zeek logs in JSON format.
As we have shown how to access logs like this previously using the command
line, we will only show the log entries themselves.
Inspecting the :file:`conn.log`
===============================
The log with which we begin our analysis for this case is the :file:`conn.log`.
It contains the following entry of interest.
.. literal-emph::
{
"ts": 1596820191.94147,
**"uid": "CzoFRWTQ6YIzfFXHk"**,
**"id.orig_h": "192.168.4.37",**
"id.orig_p": 58264,
**"id.resp_h": "23.195.64.241",**
**"id.resp_p": 80,**
"proto": "tcp",
**"service": "http",**
"duration": 0.050640106201171875,
"orig_bytes": 211,
"resp_bytes": 179604,
"conn_state": "SF",
"missed_bytes": 0,
"history": "ShADadtFf",
"orig_pkts": 93,
"orig_ip_bytes": 5091,
"resp_pkts": 129,
"resp_ip_bytes": 186320
}
We see that ``192.168.4.37`` contacted ``23.195.64.241`` via HTTP and connected
to port 80 TCP. The responder sent 179604 bytes of data during the
conversation.
Because this conversation appears to have taken place using HTTP, a clear text
protocol, there is a good chance that we can directly inspect the HTTP headers
and the payloads that were exchanged.
We will use the UID, ``CzoFRWTQ6YIzfFXHk``, to find corresponding entries in
other log sources to better understand what happened during this conversation.
Inspecting the :file:`http.log`
===============================
We search our :file:`http.log` files for samples containing the UID of interest
and find the following entry:
.. literal-emph::
{
"ts": 1596820191.94812,
**"uid": "CzoFRWTQ6YIzfFXHk",**
"id.orig_h": "192.168.4.37",
"id.orig_p": 58264,
"id.resp_h": "23.195.64.241",
"id.resp_p": 80,
"trans_depth": 1,
**"method": "GET",**
**"host": "download.microsoft.com",**
**"uri": "/download/d/e/5/de5351d6-4463-4cc3-a27c-3e2274263c43/wfetch.exe",**
"version": "1.1",
**"user_agent": "Wget/1.19.4 (linux-gnu)",**
"request_body_len": 0,
"response_body_len": 179272,
**"status_code": 200,**
**"status_msg": "OK",**
"tags": [],
"resp_fuids": [
**"FBbQxG1GXLXgmWhbk9"**
],
"resp_mime_types": [
**"application/x-dosexec"**
]
}
The most interesting elements of this log entry include the following::
"method": "GET",
"host": "download.microsoft.com",
"uri": "/download/d/e/5/de5351d6-4463-4cc3-a27c-3e2274263c43/wfetch.exe",
This shows us what file the client was trying to retrieve, ``wfetch.exe``,
from what site, ``download.microsoft.com``.
The following element shows us the client that made the request::
"user_agent": "Wget/1.19.4 (linux-gnu)",
According to this log entry, the user agent was not a Microsoft product, but
was a Linux version of the :program:`wget` utility. User agent fields can be
manipulated, so we cannot trust that this was exactly what happened. It is
probable however that :program:`wget` was used in this case.
The following entry shows us that the Web server responding positively to the
request::
"status_code": 200,
"status_msg": "OK",
Based on this entry and the amount of bytes transferred, it is likely that the
client received the file it requested.
The final two entries of interest tell us something more about the content that
was transferred and how to locate it::
"resp_fuids": [
"FBbQxG1GXLXgmWhbk9"
],
"resp_mime_types": [
"application/x-dosexec"
The first entry provides a file identifier. This is similar to the connection
identifier in the :file:`conn.log`, except that we use the file identifier to
locate specific file contents when written to disk.
The second entry shows that Zeek recognized the file content as
``application/x-dosexec``, which likely means that the client retrieved a
Windows executable file.
Inspecting the :file:`files.log`
================================
Armed with the file identifier value, we can search any of our
:file:`files.log` repositories for matching values. By searching for the FUID
of ``FBbQxG1GXLXgmWhbk9`` we find the following entry.
.. literal-emph::
{
"ts": 1596820191.969902,
**"fuid": "FBbQxG1GXLXgmWhbk9",**
"uid": "CzoFRWTQ6YIzfFXHk",
"id.orig_h": "192.168.4.37",
"id.orig_p": 58264,
"id.resp_h": "23.195.64.241",
"id.resp_p": 80,
"source": "HTTP",
"depth": 0,
"analyzers": [
"EXTRACT",
"PE"
],
**"mime_type": "application/x-dosexec",**
"duration": 0.015498876571655273,
"is_orig": false,
"seen_bytes": 179272,
"total_bytes": 179272,
"missing_bytes": 0,
"overflow_bytes": 0,
"timedout": false,
**"extracted": "HTTP-FBbQxG1GXLXgmWhbk9.exe",**
"extracted_cutoff": false
}
Note that this :file:`files.log` entry also contains the UID we found in the
:file:`conn.log`, e.g., ``CzoFRWTQ6YIzfFXHk``. Theoretically we could have just
searched for that UID value and not bothered to locate the FUID in the
:file:`http.log`. However, I find that it makes sense to follow this sort of
progression, as we cannot rely on this same analytical workflow for all cases.
In this :file:`files.log` data, we see that the ``EXTRACT`` and ``PE`` analyzer
events were activated. Zeek saw 179272 bytes transferred and does not appear to
have missed any bytes. Zeek extracted the file it saw as
``HTTP-FBbQxG1GXLXgmWhbk9.exe``, which means we should be able to locate that
file on disk.
The ``is_orig`` field in a :file:`files.log` entry can be used to determine
which endpoint sent the file. When ``is_orig`` is ``false``, the responder of
the connection is sending the file. In the example above we can tell that
the HTTP server at ``23.195.64.241`` is sending the file and ``192.168.4.37``
is receiving it.
Inspecting the Extracted File
=============================
The location for extracted files will vary depending on your Zeek
configuration. In my example, Zeek wrote extracted files to a directory called
:file:`extract_files/`. Here is the file in question:
.. code-block:: console
$ ls -al HTTP-FBbQxG1GXLXgmWhbk9.exe
::
-rw-rw-r-- 1 zeek zeek 179272 Aug 7 17:23 HTTP-FBbQxG1GXLXgmWhbk9.exe
Note the byte count, 179272, matches the value in the :file:`files.log`.
Here is what the Linux file command thinks of this file.
.. code-block:: console
$ file HTTP-FBbQxG1GXLXgmWhbk9.exe
::
HTTP-FBbQxG1GXLXgmWhbk9.exe: PE32 executable (GUI) Intel 80386, for MS Windows, MS CAB-Installer self-extracting archive
This looks like a Windows executable. You can use the :program:`md5sum` utility to
generate a MD5 hash of the file.
.. code-block:: console
$ md5sum HTTP-FBbQxG1GXLXgmWhbk9.exe
::
6711727adf76599bf50c9426057a35fe HTTP-FBbQxG1GXLXgmWhbk9.exe
We can search by the hash value on VirusTotal using the :program:`vt` command
line tool, provided we have registered and initialized :program:`vt` with our
free API key.
.. code-block:: console
$ ./vt file 6711727adf76599bf50c9426057a35fe
::
- _id: "82f39086658ce80df4da6a49fef9d3062a00fd5795a4dd5042de32907bcb5b89"
_type: "file"
authentihash: "2a07d356273d32bf0c5aff83ea847351128fc3971b44052f92b6fb4f45c2272f"
creation_date: 1030609542 # 2002-08-29 08:25:42 +0000 UTC
first_submission_date: 1354191312 # 2012-11-29 12:15:12 +0000 UTC
last_analysis_date: 1592215708 # 2020-06-15 10:08:28 +0000 UTC
last_analysis_results:
ALYac:
category: "undetected"
engine_name: "ALYac"
engine_update: "20200615"
engine_version: "1.1.1.5"
method: "blacklist"
...edited…
last_analysis_stats:
confirmed-timeout: 0
failure: 0
harmless: 0
malicious: 0
suspicious: 0
timeout: 0
type-unsupported: 2
undetected: 74
last_modification_date: 1592220693 # 2020-06-15 11:31:33 +0000 UTC
last_submission_date: 1539056691 # 2018-10-09 03:44:51 +0000 UTC
magic: "PE32 executable for MS Windows (GUI) Intel 80386 32-bit"
md5: "6711727adf76599bf50c9426057a35fe"
meaningful_name: "WEXTRACT.EXE"
names:
- "Wextract"
- "WEXTRACT.EXE"
- "wfetch.exe"
- "583526"
packers:
F-PROT: "CAB, ZIP"
PEiD: "Microsoft Visual C++ v6.0 SPx"
pe_info:
entry_point: 23268
imphash: "1494de9b53e05fc1f40cb92afbdd6ce4"
import_list:
- imported_functions:
- "GetLastError"
- "IsDBCSLeadByte"
- "DosDateTimeToFileTime"
- "ReadFile"
- "GetStartupInfoA"
- "GetSystemInfo"
- "lstrlenA"
...edited...
size: 179272
ssdeep: "3072:BydJq5oyVzs+h0Jk5irDStDD5QOsP0CLRQq8ZZ3xlf/AQnFlFuKIUaKJH:UW2+AiDWOsPxQq8HHf/A07namH"
tags:
- "invalid-signature"
- "peexe"
- "signed"
- "overlay"
times_submitted: 33
total_votes:
harmless: 1
malicious: 0
trid:
- file_type: "Microsoft Update - Self Extracting Cabinet"
probability: 46.3
- file_type: "Win32 MS Cabinet Self-Extractor (WExtract stub)"
probability: 41.4
- file_type: "Win32 Executable MS Visual C++ (generic)"
probability: 4.2
- file_type: "Win64 Executable (generic)"
probability: 3.7
- file_type: "Win16 NE executable (generic)"
probability: 1.9
type_description: "Win32 EXE"
type_tag: "peexe"
unique_sources: 24
vhash: " size: 179272
ssdeep: "3072:BydJq5oyVzs+h0Jk5irDStDD5QOsP0CLRQq8ZZ3xlf/AQnFlFuKIUaKJH:UW2+AiDWOsPxQq8HHf/A07namH"
tags:
- "invalid-signature"
- "peexe"
- "signed"
- "overlay"
times_submitted: 33
total_votes:
harmless: 1
malicious: 0
trid:
- file_type: "Microsoft Update - Self Extracting Cabinet"
probability: 46.3
- file_type: "Win32 MS Cabinet Self-Extractor (WExtract stub)"
probability: 41.4
- file_type: "Win32 Executable MS Visual C++ (generic)"
probability: 4.2
- file_type: "Win64 Executable (generic)"
probability: 3.7
- file_type: "Win16 NE executable (generic)"
probability: 1.9
type_description: "Win32 EXE"
type_tag: "peexe"
unique_sources: 24
vhash: "0150366d1570e013z1004cmz1f03dz"
You can access the entire report `via the Web here
<https://www.virustotal.com/gui/file/82f39086658ce80df4da6a49fef9d3062a00fd5795a4dd5042de32907bcb5b89/detection>`_.
It appears this is a harmless Windows executable. However, by virtue of having
it extracted from network traffic, analysts have many options for investigation
when the file is not considered benign.
Conclusion
==========
Zeeks file extraction capabilities offer many advantages to analysts.
Administrators can configure Zeek to compute MD5 hashes of files that Zeek sees
in network traffic. Rather than computing a hash on a file written to disk,
Zeek could simply compute the hash as part of its inspection process. The
purpose of this section was to show some of the data in the :file:`files.log`,
how it relates to other Zeek logs, and how analysts might make use of it.