The PktSrc::Stats object works with 64-bit unsigned integers. Unfortunately,
libpcap's struct pcap_stat uses 32-bit values, and users have reported
these values visibly wrapping in their stats.log roughly every
7.5 hours (~160kpps).
This change moves tracking of the link and drop counters into the PktSrc::Stats
object (as is already done for received and bytes_received) and updates them
on each call to PcapSource::Statistics() with the difference from the
previous stats values, so the wrap is no longer visible in
script land.
This doesn't cover the case of the stats counters wrapping around multiple
times between two invocations of PktSrc::Statistics(). With the default
interval of 5 minutes for the stats script, this seems acceptable.
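
A minimal sketch of the delta-based update described above. The type and
member names (Stats64, WrapSafeCounters, Update) are hypothetical and are not
the actual PktSrc or PcapSource code; the 32-bit arguments stand in for the
fields of a pcap_stat-like struct.

```cpp
#include <cstdint>

// Hypothetical stand-in for the 64-bit totals kept on the packet source.
struct Stats64 {
    uint64_t link = 0;
    uint64_t dropped = 0;
};

// Remembers the previous 32-bit readings and folds deltas into 64-bit totals.
class WrapSafeCounters {
public:
    void Update(uint32_t recv32, uint32_t drop32) {
        // Unsigned subtraction is modulo 2^32, so a single wrap between two
        // calls still produces the correct delta. Multiple wraps are lost.
        totals.link += static_cast<uint32_t>(recv32 - prev_recv);
        totals.dropped += static_cast<uint32_t>(drop32 - prev_drop);
        prev_recv = recv32;
        prev_drop = drop32;
    }

    const Stats64& Totals() const { return totals; }

private:
    uint32_t prev_recv = 0;
    uint32_t prev_drop = 0;
    Stats64 totals;
};
```

The sketch has the same limitation as the change itself: only one wrap between
two updates can be recovered.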
Closes #2791.
Testing io_poll_interval_live tweaks with @dopheide-esnet on a Myricom based
system to reduce CPU usage showed no visible effect.
It turns out the pkt_src->IsLive() call used to update poll_interval is only
valid *after* calling ->Register() with the source. The conditional update
of poll_interval introduced in 4fa3e4b9b4
never worked as intended.
The fix ensures that
* we actually use a poll_interval of 10 in the live case
* changing io_poll_interval_live does have an effect
This is a fairly major change because it lowers the default poll_interval
by an order of magnitude, but that seems to have always been the intention. It's also
tunable via redef, so in the worst case it can be adapted via configuration.
For reference, with the default Pcap::non_fd_timeout of 20usec *and* a
poll_interval of 100, theoretically we'd be trying to ask a non-selectable
packet source 500000 times per second for a new packet. This is not a likely packet
rate that a single worker would currently observe or manage to process.
Increasing this value 10x has lowered CPU usage on a Myricom based
deployment significantly with reportedly no adverse side-effects.
After reviewing the Zeek 3 IO loop, my hunch is that previously when
no packets were available, we'd sleep 20usec every loop iteration after
calling ->Process() on the packet source. With current master ->Process()
is called 10 times on a packet source before going to sleep just once
for 20 usec. Likely this explains the increased CPU usage reported.
It's probably too risky to increase the current value, so introduce
a const &redef value for advanced users to tweak it. A middle ground
might be to lower ``io_poll_interval_live`` to 5 and increase the new
``Pcap::non_fd_timeout`` setting to 100usec.
While this doesn't really fix #2296, we now have enough knobs for tweaking.
Closes #2296.
This method will be used by the main loop to determine if an interface
has become idle. Initially this will be used to determine when it is
acceptable to update network_time to the current time (wallclock).
This also removes setting pseudo_realtime to 0.0 in the main loop
when the packet source has been closed. I tried to understand
the implications: it seems that if we shut down the iosource::Manager
anyway, it shouldn't matter, and it's just confusing.
This probably should not be changed by users, but it's useful for
testing and experimentation rather than needing to recompile.
Processing 100 packets without checking an FD based IO source can
actually mean that FD based sources are never checked during a read
of a very small pcap...
This would generally happen the next loop iteration around anyway, but
seems nice to ensure a zero timeout source will be processed at the same
time as sources with ready FDs.
Previously, if two iosources returned 0.0 as their timeout, only
one of them would be considered ready. An always ready source
therefore may starve other ready ones due to this and minimally
this behavior seems surprising.
Offline pcap sources are always ready and return 0.0 for
GetNextTimeout() (unless in pseudo-realtime), so we can
also remove the offline source special case.
One subtle side-effect of this change is that if an IO source
returns a 0.0 timeout *and* its file descriptor is ready in
the same loop iteration, it may be processed twice.
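
A minimal sketch of the readiness rule described above, in which every source
reporting a 0.0 timeout is collected rather than only one of them. Source and
CollectReady are hypothetical names; this is not the actual
iosource::Manager code, and file descriptor polling is left out entirely.

```cpp
#include <vector>

// Hypothetical minimal IO source: only its timeout matters for this sketch.
struct Source {
    int id;
    double next_timeout;  // seconds; 0.0 = ready now, negative = no timeout
};

// Returns the ids of all sources with a 0.0 timeout. `wait` receives 0.0 if
// any source is ready, otherwise the smallest positive timeout (or -1.0 if
// no source has one).
std::vector<int> CollectReady(const std::vector<Source>& sources, double& wait) {
    std::vector<int> ready;
    wait = -1.0;

    for (const auto& s : sources) {
        if (s.next_timeout == 0.0)
            ready.push_back(s.id);  // every zero-timeout source counts
        else if (s.next_timeout > 0.0 && (wait < 0.0 || s.next_timeout < wait))
            wait = s.next_timeout;
    }

    if (!ready.empty())
        wait = 0.0;  // do not block while something is ready

    return ready;
}
```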
This reworks 2aec7640dd (zeek/zeek#2039) to
avoid calling ExtractNextPacketInternal() within GetNextTimeout() for
the non-pseudo-realtime case. Also relates to zeek/zeek#2842.
The intention of the referenced change was to avoid a 0.00002 timeout when
a non-selectable packet source has more packets queued. This was implemented
by checking for a new packet within GetNextTimeout().
The proposed change switches to a predictive approach: Use the result of
the previous ExtractNextPacket() call (stored as had_packet) as an indication
whether more packets are to be expected.
Calling ExtractNextPacketInternal() within GetNextTimeout() may cause
surprising behavior, as some packet sources may block [1] or spend a significant
amount of time (e.g. applying BPF filters [2]) within ExtractNextPacket().
The result of GetNextTimeout() should be available immediately as guidance
for the main-loop and the actual work should happen within the ->Process()
method.
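
A minimal sketch of the predictive approach for the non-pseudo-realtime case.
SketchSource and kNonFdTimeout are hypothetical; only the had_packet idea and
the 20usec value come from the text, and the packet queue is simulated with a
counter.

```cpp
// 20 usec, the non-selectable source timeout mentioned in the text.
constexpr double kNonFdTimeout = 0.00002;

// Hypothetical non-selectable packet source; not the real PktSrc class.
class SketchSource {
public:
    explicit SketchSource(int queued_packets) : queued(queued_packets) {}

    // Cheap guidance for the main loop: no extraction happens here, only a
    // look at the outcome of the previous extraction attempt.
    double GetNextTimeout() const { return had_packet ? 0.0 : kNonFdTimeout; }

    // The actual work: try to pull a packet and remember whether it worked.
    void Process() { had_packet = ExtractNextPacket(); }

private:
    // Simulated extraction; a real source might block or run BPF filters.
    bool ExtractNextPacket() {
        if (queued == 0)
            return false;
        --queued;
        return true;
    }

    int queued;
    bool had_packet = false;
};
```

The prediction costs one extra zero-timeout iteration after the last queued
packet, in exchange for GetNextTimeout() never blocking.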
This change also attempts to separate the pseudo-realtime logic from the
non-pseudo-realtime logic in an attempt to show pseudo-realtime as special.
[1] 00c4d657e0/src/Napatech.cc (L116)
[2] 58b25c8eba/src/Myricom.cc (L250)
This reverts commit 957825441a, reversing
changes made to c8cdc75f2b.
Caused spurious CI failures in the external testing baselines. See zeek/zeek#2842.
* microsoft/master: (71 commits)
Clang formatting
Mask ports before inserting them into the map
Fix compiler warning from applied patch
Remove statistics plugin in favor of stats bif
Add EventHandler version of stats plugin
Mark a few EventHandler methods const
Changed implementation from std::map to std::unordered_map of Val.cc
Removed const, Windows build is now working
Added fixes suggested in PR
Update src/packet_analysis/protocol/ip/IP.cc
Apply suggestions from code review
Clang format again but now with v13.0.1
Rewrote usages of define(_MSC_VER) to ifdef _MSC_VER
Clang format it all
Fixed initial CR comments
Add NEWS entry about Windows port
Add a couple of extra unistd.h includes to fix a build failure
Use std::chrono instead of gettimeofday
Update libkqueue submodule [nomail]
Don't call tokenize_string if the input string is empty
...
- iosource_mgr can now track write events to file descriptors as well
as read events. This adds an argument to both RegisterFd() and
UnregisterFd() for setting the mode, defaulting to read.
- IOSources can now implement a ProcessFd() method that allows them to
handle events to single file descriptors instead of having to
loop through/track sets of them at processing time.
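
A toy sketch of the two ideas above: registering a descriptor with a mode that
defaults to read, and dispatching a single ready descriptor to its source.
FdMode, FdSource, FdRegistry and Dispatch are hypothetical, and the signatures
are assumptions for illustration, not the real iosource_mgr API.

```cpp
#include <map>
#include <utility>
#include <vector>

enum class FdMode { Read, Write };  // hypothetical mode flag

// Hypothetical source that records which descriptor events it was handed.
struct FdSource {
    std::vector<std::pair<int, FdMode>> handled;

    // Called once per ready descriptor, so no fd set has to be scanned here.
    void ProcessFd(int fd, FdMode mode) { handled.emplace_back(fd, mode); }
};

class FdRegistry {
public:
    // The mode defaults to read, as described in the text.
    void RegisterFd(int fd, FdSource* src, FdMode mode = FdMode::Read) {
        table[std::make_pair(fd, mode)] = src;
    }

    void UnregisterFd(int fd, FdMode mode = FdMode::Read) {
        table.erase(std::make_pair(fd, mode));
    }

    // Hands one readiness event to the registered source, if there is one.
    bool Dispatch(int fd, FdMode mode) {
        auto it = table.find(std::make_pair(fd, mode));
        if (it == table.end())
            return false;
        it->second->ProcessFd(fd, mode);
        return true;
    }

private:
    std::map<std::pair<int, FdMode>, FdSource*> table;
};
```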
PktSrc::GetNextTimeout always returned a fixed timeout of 20 microseconds for non-selectable packet sources regardless of whether they have packets available. This adds unnecessary delay every FindReadySources poll_interval when packets are available to be read.
Instead, for non-selectable packet sources, check whether packets are available and return a timeout of 0 to indicate data is available. This is closer to the behaviour of the old capture loop.
This was mitigated somewhat by the fact that the FindReadySources poll interval defaults to 100 packets, and live sources are added to the ready list regardless of whether they have packets available (unless it is time to force a poll).