====================
Monitoring With Zeek
====================

Detection and Response Workflow
===============================

As noted in the previous sections, Zeek is optimized, more or less “out of the
box,” to provide two of the four types of network security monitoring data.
Without any major configuration, Zeek offers transaction data and extracted
content data, in the form of logs summarizing protocols and files seen
traversing the wire. Zeek can also provide some degree of alert data in the
form of notices, and analysts can modify Zeek to create custom alerts if
desired. A dedicated intrusion detection engine like Suricata or Snort might be
more appropriate, however. Finally, Zeek does not collect full content data in
pcap format, although other open source projects do provide that functionality.

Broadly speaking, incident detection and response begins with the collection of
security data, followed by its analysis. In the analysis phase, in the absence
of an explicit alert of malicious activity, investigators can work in two broad
categories: “matching” and “hunting.” Matching means querying and
reviewing security data for signs of known indicators of compromise. Hunting
means working without indicators of compromise, relying instead on creating a
hypothesis of how adversary activity might manifest in security data. Matching
is the sort of activity that can be easily automated. Hunting is difficult to
automate because it relies upon designing a security “experiment” to yield
results, and often upon a little human intuition.

In the common vernacular, some security teams believe hunting involves querying
data for indicators of compromise. That is really just a search function, i.e.,
looking for matches of “expected bad” in collected data. True hunting involves
more of a scientific method that requires formulating a hypothesis, testing the
hypothesis in sample and production data, and then refining the process until
it yields results or is disproved. Investigative methods which yield results
may then be automated.

Zeek data plays a role in both matching and hunting operations. Analysts may
query a
store of Zeek transaction logs for indicators of compromise, and begin a
security investigation when they see a match on an IP address, or username, or
HTTP user-agent string, or any single or combination of the hundreds of
elements Zeek derives from network traffic. Analysts can also pose a hypothesis
of how certain adversary behavior may appear in Zeek data, and then query that
data for signs that prove or disprove their hypothesis.
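
The matching workflow described above can be sketched in a few lines of
Python. This is a minimal illustration, not a Zeek feature: it assumes the
connection log was written as JSON (one object per line), and the indicator
addresses, the sample records, and the ``match_iocs`` helper are all
hypothetical.

.. code-block:: python

   import json

   # Hypothetical indicators of compromise (documentation-range addresses).
   IOC_ADDRESSES = {"203.0.113.50", "198.51.100.7"}


   def match_iocs(log_lines, indicators):
       """Return parsed records whose originator or responder is an indicator."""
       matches = []
       for line in log_lines:
           line = line.strip()
           if not line:
               continue
           record = json.loads(line)
           endpoints = {record.get("id.orig_h"), record.get("id.resp_h")}
           if endpoints & indicators:
               matches.append(record)
       return matches


   # Two made-up conn.log entries; only the first touches an indicator.
   SAMPLE_LOG = [
       json.dumps({"uid": "CAAA1", "id.orig_h": "192.168.1.10",
                   "id.resp_h": "203.0.113.50"}),
       json.dumps({"uid": "CBBB2", "id.orig_h": "192.168.1.11",
                   "id.resp_h": "192.0.2.20"}),
   ]

   for hit in match_iocs(SAMPLE_LOG, IOC_ADDRESSES):
       print(hit["uid"], hit["id.orig_h"], "->", hit["id.resp_h"])

The same pattern extends to other fields, such as user names or HTTP
user-agent strings, by comparing different keys of each record.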

Beyond the matching and hunting paradigms, analysts can use Zeek within an
“incident detection alert” workflow. In this scenario, an IDS creates an alert
that catches the attention of a security team member. Because IDS alerts are
often light on details, analysts require corroborating data to decide if the
alert represents normal, suspicious, or malicious activity. Analysts can
“pivot” from the IDS alert to a variety of logs generated by Zeek. If the IDS
alert provides the community identification (community ID) supported by Zeek,
the analyst can easily tie the IDS alert to specific Zeek logs. Based on the
data provided by Zeek, analysts may be able to resolve the incident. At the
very least, the analyst can accelerate the alert validation and verification
process by having access to data beyond the initial IDS notification.
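
To make the pivot concrete, the following sketch shows how a version 1
Community ID can be derived from a flow tuple, following the published
Community ID specification as we understand it. It is an illustration only
and handles just TCP and UDP; ICMP, which needs special port handling, is
left out. The ``community_id_v1`` function name is our own.

.. code-block:: python

   import base64
   import hashlib
   import ipaddress
   import struct

   PROTO_TCP = 6
   PROTO_UDP = 17


   def community_id_v1(src_ip, src_port, dst_ip, dst_port, proto, seed=0):
       """Compute a Community ID v1 string for a TCP or UDP flow."""
       src = ipaddress.ip_address(src_ip).packed
       dst = ipaddress.ip_address(dst_ip).packed
       # Order the endpoints so both directions of a flow hash identically.
       if (src, src_port) > (dst, dst_port):
           src, dst = dst, src
           src_port, dst_port = dst_port, src_port
       data = (
           struct.pack("!H", seed)
           + src
           + dst
           + struct.pack("!BB", proto, 0)  # protocol number, one padding byte
           + struct.pack("!HH", src_port, dst_port)
       )
       digest = hashlib.sha1(data).digest()
       return "1:" + base64.b64encode(digest).decode("ascii")


   forward = community_id_v1("192.168.1.10", 51000, "192.0.2.20", 443, PROTO_TCP)
   reverse = community_id_v1("192.0.2.20", 443, "192.168.1.10", 51000, PROTO_TCP)
   print(forward == reverse)

Because the identifier does not depend on direction, an IDS and Zeek that
both use the same seed should label a given flow identically, which is what
makes the pivot straightforward.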

Finally, analysts can use Zeek data to improve the validation process when
prompted by any other external stimulus. For example, an analyst might notice
an odd process running on a system, as reported by their endpoint detection and
response (EDR) or anti-virus agent. Alternatively, an analyst might receive a
report from a user or a peer involving suspicious activity on an
Internet-facing Web server. In either case, the analyst with access to Zeek
data can seek to learn all they can about the systems in question, simply by
querying the repository storing their Zeek logs. This security design pattern
has immense benefits, as it does not affect the end state of the suspicious
asset. Not touching a system that may be compromised has two benefits. First,
an intruder who has compromised the asset remains unaware that the security
team is investigating it. Second, the forensic integrity of the asset remains
intact, as the analyst is working with logs stored off-device.

Instrumentation and Collection
==============================

Zeek is designed to watch live network traffic. Although Zeek can process
packet captures saved in PCAP format, most users deploy Zeek to gain
near-real-time insights into network usage patterns. Administrators run Zeek
by telling it to “sniff” one or more network interfaces, generating
transaction logs, insights, and extracted file contents, based on the network
traffic seen on those network interfaces.
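
As a rough sketch of the two modes of operation, the helper below builds a
Zeek command line for either a live interface (``-i``) or a saved capture
(``-r``). The ``zeek_argv`` and ``run_zeek`` names, the interface name, and
the choice of loading the ``local`` script are assumptions for illustration;
sniffing a live interface typically also requires elevated privileges.

.. code-block:: python

   import shutil
   import subprocess


   def zeek_argv(interface=None, pcap=None, scripts=("local",)):
       """Build a Zeek command line for a live interface or a capture file."""
       if (interface is None) == (pcap is None):
           raise ValueError("specify exactly one of interface or pcap")
       source = ["-i", interface] if interface else ["-r", pcap]
       return ["zeek", *source, *scripts]


   def run_zeek(argv):
       """Run Zeek in the current directory, where it writes its logs."""
       if shutil.which(argv[0]) is None:
           raise FileNotFoundError("zeek not found on PATH")
       return subprocess.run(argv, check=True)


   # Only print the commands here; call run_zeek() to execute one.
   print(zeek_argv(interface="eth0"))
   print(zeek_argv(pcap="sample.pcap"))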

Some users may choose to run Zeek on a single computer used for general
computing purposes, watching network traffic to and from that single
computer. That system might be an office laptop used for business purposes,
chosen for experimentation with Zeek. This is a simple way to become familiar
with the logs that Zeek creates. This approach is similar to running Tcpdump
or Wireshark on one’s computer for the same educational purposes.

Most users, however, run Zeek on a computer selected solely for the purpose
of network security monitoring. Security personnel call that computer a
“sensor” and they select, configure, and deploy it specifically to watch
network traffic. They select a location in an environment that offers
visibility to multiple computers, and deploy the sensor with Zeek to
instrument that network segment.

When choosing a place to deploy a sensor, users will likely prioritize a
requirement like the following:

Identify a single location in the network to instrument with a network tap or
switch span port that provides the maximum visibility. This means seeing
traffic from all devices on the network, with a strong preference for
identifying devices by observing them with their original source IP address.

Users new to Zeek may choose to try Zeek in their home or in a small office
environment. Figure 1 depicts the standard SOHO network architecture. Letters
A-D are possible monitoring locations, to be discussed below.

.. figure:: /images/collection-figure1.png

   Figure 1: Standard SOHO Architecture

Most home users and many small office environments are connected to the
Internet via customer premises equipment (CPE) provided by their Internet
service provider (ISP). This box may or may not be available or visible to
the customer. In the context of a system like Verizon FIOS, for example, the
ISP CPE is the box attached to the outside of a residence, with a warning
that only Verizon technicians should open it. For fiber connectivity, the ISP
might call this device an Optical Network Terminal or ONT.

The ISP also provides a gateway device that provides routing and wireless
access point (WAP) functionality. This is the piece of equipment familiar to
most home and small office users. It typically has a gigabit copper Ethernet
connection that connects to the ISP CPE, on its wide area network (WAN) side,
and four gigabit copper Ethernet ports for devices on its local area network
(LAN) side. Customer devices gain network access via WiFi to the ISP WAP or
via copper Ethernet cables to the embedded switch on the same device.

On the WAN side of the router, the device usually has a public IP address
provided by the ISP. This may not necessarily be the case, however. On the
LAN side of the router, the device provides RFC 1918 private addresses, often
in the 192.168.0.0/16 block. The router acts as a gateway, using network
address translation (NAT), or for the more strictly minded, network address
port translation (NAPT), so that client devices share a single IP address
provided by the ISP. (Note that in some situations, multiple residences even
share the same public IP address, and differentiate between each other via
the port range. We'll not consider this further for now, as it is extraneous
to the discussion.)
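
When reviewing logs, it can help to tell at a glance whether an address
belongs to one of the RFC 1918 private blocks. The following is a small
sketch using Python's standard ``ipaddress`` module; the ``is_rfc1918``
helper is our own name, not part of Zeek.

.. code-block:: python

   import ipaddress

   # The three private IPv4 blocks defined by RFC 1918.
   RFC1918_NETWORKS = [
       ipaddress.ip_network("10.0.0.0/8"),
       ipaddress.ip_network("172.16.0.0/12"),
       ipaddress.ip_network("192.168.0.0/16"),
   ]


   def is_rfc1918(address):
       """Return True if the address falls inside an RFC 1918 block."""
       ip = ipaddress.ip_address(address)
       return any(ip in network for network in RFC1918_NETWORKS)


   for candidate in ("192.168.1.10", "172.32.0.1", "198.51.100.23"):
       print(candidate, is_rfc1918(candidate))

The explicit list is used instead of the module's ``is_private`` attribute
because the latter also covers other reserved ranges, such as loopback and
link-local addresses.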

Where does one monitor, given this architecture?

Location A is off limits to the customer. It is likely a cable exiting the
ISP CPE and entering the ground.

Location B is a possibility, assuming the cable between the ISP CPE and
router is a copper Ethernet cable. One could insert a reliable network tap
(typically outside the home user’s budget) or a decent small managed switch
with a span port (like a Netgear GS30Xe model).

However, and this is crucial: because of the NAT done by the router, all
traffic will appear to originate from a single IP address. Whether the
customer has 100 devices or 1 device, they will all share the single IP
address. This reality makes it much more difficult for a security analyst to
track down the originator of suspicious or malicious network traffic.
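
One way to see the effect of NAT on visibility is to count the distinct
originating addresses in the connection records a sensor produces. The
sketch below uses made-up records for the same three devices, as seen after
and before translation; the field name ``id.orig_h`` follows Zeek's
connection log, while the helper and sample data are hypothetical.

.. code-block:: python

   from collections import Counter


   def originator_counts(records):
       """Count connections per originating address."""
       return Counter(r["id.orig_h"] for r in records if "id.orig_h" in r)


   # Seen on the WAN side of the router: NAT has rewritten every source.
   POST_NAT_VIEW = [
       {"id.orig_h": "198.51.100.23", "id.resp_h": "192.0.2.20"},
       {"id.orig_h": "198.51.100.23", "id.resp_h": "192.0.2.21"},
       {"id.orig_h": "198.51.100.23", "id.resp_h": "192.0.2.22"},
   ]

   # Seen on the LAN side: each device keeps its own private address.
   PRE_NAT_VIEW = [
       {"id.orig_h": "192.168.1.10", "id.resp_h": "192.0.2.20"},
       {"id.orig_h": "192.168.1.11", "id.resp_h": "192.0.2.21"},
       {"id.orig_h": "192.168.1.12", "id.resp_h": "192.0.2.22"},
   ]

   print(len(originator_counts(POST_NAT_VIEW)), "originator(s) after NAT")
   print(len(originator_counts(PRE_NAT_VIEW)), "originator(s) before NAT")

A sensor that only ever reports a single originator for outbound traffic is
a hint that it sits on the wrong side of a translating device.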

Location C is essentially not possible. Yes, there are various penetration
testing tools and wireless network troubleshooting tools that can try to
access WiFi traffic. However, they do not expose the traffic in a form usable
to security analysts, assuming that the WiFi protocols in use are at all
modern.

Location D is a possibility, assuming that the user installs a network tap or
switch span port as in location B. However, monitoring only at location D
would ignore WiFi traffic.

In other words, the standard SOHO network architecture is not well-suited for
network security monitoring, because there isn’t a good place, by default, to
see the originating IP addresses, which are generally needed to investigate
suspicious and malicious activity.

In contrast, the Visible Network Architecture shown in Figure 2 depicts the
sort of setup one needs if visibility is designed into the architecture,
rather than added as an afterthought.

.. figure:: /images/collection-figure2.png

   Figure 2: Visible Network Architecture

The major changes include the following:

The ISP router is no longer also acting as a WAP. The WiFi capability is
disabled. No other changes are required on the router. Strictly speaking,
WiFi need not be disabled, so long as no one uses it.

The customer has purchased her own router. That device may or may not also
provide NAT.

The customer explicitly owns a switch, to which wired devices may connect.
That switch has a span port.

The customer explicitly owns her own wireless access point, acting as a
bridge, and not offering NAT.

Don’t be fooled into thinking that one need only buy a new combination
router/WAP. It’s essential to split these functions. Consumer-grade routers
generally do not offer span ports, whereas some inexpensive consumer-grade
network switches do. This architecture takes advantage of that fact to provide
suitable monitoring locations.

Let’s review the options.

Location A is still off-limits.

Location B is still a bad idea.

Location C is a good option, if one places a network tap here, or another
small switch with a span port, and neither the customer router nor customer
WAP is doing NAT.

Location D is a better option. Now one need only ensure that the customer WAP
is not doing NAT. In fact, one need not introduce another switch or tap here,
assuming one can span the uplink port on the customer switch.

Location E would only see wired devices, and is not a good option because it
ignores WiFi devices.

Location F would only see WiFi devices, and is not a good option because it
ignores wired devices.

Location G is essentially impossible, as with Figure 1.

The bottom line is that location D is the best monitoring location,
assuming that the customer WAP is not doing NAT. If the customer WAP is
acting as a router with NAT, then all of the wireless devices will have the
same source IP address as seen in location D.

In an architecture designed for visibility, introducing a network tap, or
simply spanning the uplink from the network switch, at point D, satisfies the
visibility requirement.

It is possible to simplify the architecture shown in Figure 2 to that which
follows:

.. figure:: /images/collection-figure3.png

   Figure 3: Simplified Visible Network Architecture

The customer router between monitoring points C and D is gone, as one can
rely upon the ISP router if so desired.

In summary, one could deploy a Zeek sensor at location D, or at location C if
the simplified architecture is in place, as C and D are then logically
similar. Going forward, we'll discuss monitoring at location D.

Gaining access to traffic at point D requires either a span port to be
enabled on the customer switch, or a network tap to be deployed at location
D. Professional Zeek users prefer high-quality, powered network taps wherever
possible, for a variety of reasons. When they are not available, as in the
case of a SOHO or test environment, then a span port on a managed switch is
an acceptable alternative.

Once the network tap or span port is providing network traffic to the Zeek
sensor, one can turn to matters beyond instrumentation and collection.

Storage and Review
==================

As Zeek ingests network traffic, either by monitoring one or more live
network interfaces or by processing stored traffic in a capture file, it
creates a variety of logs and other artifacts. By default Zeek writes that
data to a storage location designated via its configuration files. Zeek
possesses the capability to write the logs in several formats and perform
certain log management processes like compression and archiving.
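
For readers who want to work with the tab-separated format directly, the
following sketch turns such a log into a list of dictionaries. It assumes
the default tab separator and reads field names from the ``#fields`` header
line; the ``parse_zeek_tsv`` helper and the sample log content are
illustrative only.

.. code-block:: python

   def parse_zeek_tsv(lines, separator="\t"):
       """Parse a tab-separated Zeek log into dictionaries keyed by field name."""
       fields = []
       records = []
       for raw in lines:
           line = raw.rstrip("\n")
           if not line:
               continue
           if line.startswith("#"):
               # Header and footer lines; only the field names are needed here.
               if line.startswith("#fields"):
                   fields = line.split(separator)[1:]
               continue
           records.append(dict(zip(fields, line.split(separator))))
       return records


   # A made-up, abbreviated conn.log with one data row.
   SAMPLE_TSV = [
       "#separator \\x09\n",
       "#path\tconn\n",
       "#fields\tts\tuid\tid.orig_h\tid.resp_h\tid.resp_p\tproto\n",
       "#types\ttime\tstring\taddr\taddr\tport\tenum\n",
       "1.000000\tCAAA1\t192.168.1.10\t192.0.2.20\t443\ttcp\n",
       "#close\t2024-01-01-00-00-00\n",
   ]

   for record in parse_zeek_tsv(SAMPLE_TSV):
       print(record["uid"], record["id.resp_h"], record["id.resp_p"])

All values are returned as strings; a fuller parser would also use the
``#types`` line to convert them.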

Analysts make use of Zeek data by reviewing the logs it generates. Review
methods can be as simple as using text processing tools packaged with the
underlying operating system. Depending on the format of the logs, users may
apply more specialized processing tools, some of which are available with
Zeek. In many cases, Zeek administrators ship logs to specialized storage and
review applications. These are usually referred to collectively as Security
Information and Event Management (SIEM) platforms. Some of these log
management and SIEM platforms are available as open source offerings, while
others are commercially available.
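
Before adopting a dedicated platform, a short script is often enough for a
first review. The sketch below totals the bytes transferred per originating
host, assuming connection logs in JSON form with one object per line. The
``bytes_by_originator`` helper and the sample records are hypothetical.

.. code-block:: python

   import json
   from collections import defaultdict


   def bytes_by_originator(log_lines):
       """Sum originator and responder payload bytes per originating host."""
       totals = defaultdict(int)
       for line in log_lines:
           line = line.strip()
           if not line:
               continue
           record = json.loads(line)
           # Byte counts may be missing for some connections; treat as zero.
           sent = record.get("orig_bytes", 0)
           received = record.get("resp_bytes", 0)
           totals[record["id.orig_h"]] += sent + received
       return dict(totals)


   # Made-up conn.log entries for two hosts.
   SAMPLE_CONN = [
       json.dumps({"uid": "CAAA1", "id.orig_h": "192.168.1.10",
                   "orig_bytes": 500, "resp_bytes": 1500}),
       json.dumps({"uid": "CBBB2", "id.orig_h": "192.168.1.10",
                   "orig_bytes": 100, "resp_bytes": 400}),
       json.dumps({"uid": "CCCC3", "id.orig_h": "192.168.1.11"}),
   ]

   for host, total in sorted(bytes_by_originator(SAMPLE_CONN).items()):
       print(host, total)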