speed up file analysis, remove IncrementByteCount

Avoid creating and recreating count objects for each chunk of file
analyzed.  This replaces counts inside of records with c++ uint64_ts.

On a pcap containing a 100GB file download this gives a 9% speedup

    Benchmark 1 (3 runs): zeek-master/bin/zeek -Cr http_100g_zeroes.pcap tuning/json-logs frameworks/files/hash-all-files
      measurement          mean ± σ            min … max           outliers         delta
      wall_time           102s  ± 1.23s      101s  …  103s           0 ( 0%)        0%
      peak_rss            108MB ±  632KB     107MB …  109MB          0 ( 0%)        0%
      cpu_cycles          381G  ±  862M      380G  …  382G           0 ( 0%)        0%
      instructions        663G  ± 5.16M      663G  …  663G           0 ( 0%)        0%
      cache_references   1.03G  ±  109M      927M  … 1.15G           0 ( 0%)        0%
      cache_misses       12.3M  ±  587K     11.7M  … 12.9M           0 ( 0%)        0%
      branch_misses      1.23G  ± 2.10M     1.22G  … 1.23G           0 ( 0%)        0%
    Benchmark 2 (3 runs): zeek-file_analysis_speedup/bin/zeek -Cr http_100g_zeroes.pcap tuning/json-logs frameworks/files/hash-all-files
      measurement          mean ± σ            min … max           outliers         delta
      wall_time          92.9s  ± 1.85s     91.8s  … 95.1s           0 ( 0%)        -  9.0% ±  3.5%
      peak_rss            108MB ±  393KB     108MB …  109MB          0 ( 0%)          +  0.1% ±  1.1%
      cpu_cycles          341G  ±  695M      341G  …  342G           0 ( 0%)        - 10.4% ±  0.5%
      instructions        605G  ±  626M      605G  …  606G           0 ( 0%)        -  8.7% ±  0.2%
      cache_references    831M  ± 16.9M      813M  …  846M           0 ( 0%)        - 19.6% ± 17.2%
      cache_misses       12.4M  ± 1.48M     11.4M  … 14.1M           0 ( 0%)          +  0.3% ± 20.8%
      branch_misses      1.02G  ± 3.45M     1.02G  … 1.02G           0 ( 0%)        - 16.8% ±  0.5%
This commit is contained in:
Justin Azoff 2025-04-18 14:01:47 -04:00 committed by Justin Azoff
parent 20ada619c5
commit 7f350587b0
3 changed files with 26 additions and 14 deletions

View file

@ -325,6 +325,9 @@ protected:
bool reassembly_enabled; /**< Whether file stream reassembly is needed. */
bool postpone_timeout; /**< Whether postponing timeout is requested. */
bool done; /**< If this object is about to be deleted. */
uint64_t seen_bytes; /**< Number of bytes processed for this file. */
uint64_t missing_bytes; /**< Number of bytes missed for this file. */
uint64_t overflow_bytes; /**< Number of bytes not delivered. */
detail::AnalyzerSet analyzers; /**< A set of attached file analyzers. */
std::list<Analyzer*> done_analyzers; /**< Analyzers we're done with, remembered here until they
can be safely deleted. */