Commit graph

405 commits

Author SHA1 Message Date
Arne Welzel
d7fbd49d9e Merge remote-tracking branch 'origin/topic/vern/zam-record-fields-fixes'
* origin/topic/vern/zam-record-fields-fixes:
  fixes for specialized ZAM operations needing to check whether record fields exist
2025-07-30 10:08:21 +02:00
Vern Paxson
47bf6af6a5 fixes for specialized ZAM operations needing to check whether record fields exist 2025-07-30 08:36:04 +02:00
Tim Wojtulewicz
414728cc71 Switch to using c++20 constraints instead of std::enable_if 2025-07-28 13:03:25 -07:00
Benjamin Bannier
b9eabbabba Bump pre-commit hooks 2025-07-01 10:39:47 +02:00
Arne Welzel
dad5ccd622 Val: Switch TablePatternMatcher to std::string_view
...and add TableVal::LookupPattern(std::string_view sv).
2025-06-30 13:22:31 +02:00
Tim Wojtulewicz
e84c99fb14 Fix clang-tidy cppcoreguidelines-macro-usage warnings in headers 2025-06-23 08:35:24 -07:00
Tim Wojtulewicz
451b25cfad Fix clang-tidy modernize-macro-to-enum warnings in headers 2025-06-23 08:35:24 -07:00
Tim Wojtulewicz
f386deba94 Fix clang-tidy performance-enum-size warnings in headers 2025-06-23 08:35:24 -07:00
Tim Wojtulewicz
975f24bde6 Fix clang-tidy bugprone-suspicious-stringview-data-usage warnings 2025-05-27 11:58:27 -07:00
Justin Azoff
7f350587b0 speed up file analysis, remove IncrementByteCount
Avoid creating and recreating count objects for each chunk of file
analyzed.  This replaces counts inside of records with c++ uint64_ts.

On a pcap containing a 100GB file download this gives a 9% speedup

    Benchmark 1 (3 runs): zeek-master/bin/zeek -Cr http_100g_zeroes.pcap tuning/json-logs frameworks/files/hash-all-files
      measurement          mean ± σ            min … max           outliers         delta
      wall_time           102s  ± 1.23s      101s  …  103s           0 ( 0%)        0%
      peak_rss            108MB ±  632KB     107MB …  109MB          0 ( 0%)        0%
      cpu_cycles          381G  ±  862M      380G  …  382G           0 ( 0%)        0%
      instructions        663G  ± 5.16M      663G  …  663G           0 ( 0%)        0%
      cache_references   1.03G  ±  109M      927M  … 1.15G           0 ( 0%)        0%
      cache_misses       12.3M  ±  587K     11.7M  … 12.9M           0 ( 0%)        0%
      branch_misses      1.23G  ± 2.10M     1.22G  … 1.23G           0 ( 0%)        0%
    Benchmark 2 (3 runs): zeek-file_analysis_speedup/bin/zeek -Cr http_100g_zeroes.pcap tuning/json-logs frameworks/files/hash-all-files
      measurement          mean ± σ            min … max           outliers         delta
      wall_time          92.9s  ± 1.85s     91.8s  … 95.1s           0 ( 0%)        -  9.0% ±  3.5%
      peak_rss            108MB ±  393KB     108MB …  109MB          0 ( 0%)          +  0.1% ±  1.1%
      cpu_cycles          341G  ±  695M      341G  …  342G           0 ( 0%)        - 10.4% ±  0.5%
      instructions        605G  ±  626M      605G  …  606G           0 ( 0%)        -  8.7% ±  0.2%
      cache_references    831M  ± 16.9M      813M  …  846M           0 ( 0%)        - 19.6% ± 17.2%
      cache_misses       12.4M  ± 1.48M     11.4M  … 14.1M           0 ( 0%)          +  0.3% ± 20.8%
      branch_misses      1.02G  ± 3.45M     1.02G  … 1.02G           0 ( 0%)        - 16.8% ±  0.5%
2025-05-09 10:50:04 -04:00
Tim Wojtulewicz
dbb3144e2d Remove unnecessary includes in Val.h 2025-04-14 10:11:13 -07:00
Tim Wojtulewicz
201d4508e6 Make ValFromJSON return zeek::expected instead of a variant 2025-04-14 10:02:35 -07:00
Tim Wojtulewicz
43e3de5c79 Add interval_as_double argument to control how intervals are converted to JSON 2024-12-03 09:26:08 -07:00
Arne Welzel
ec1088c3ef Merge remote-tracking branch 'origin/topic/vern/zam-regularization'
* origin/topic/vern/zam-regularization: (33 commits)
  simpler and more robust identification of function parameters for AST profiling
  fixes to limit AST traversal in the face of recursive types
  address some script optimization compiler warnings under Linux
  fix for -O C++ construction of variable names that use multiple module namespaces
  fix for script optimization of "opaque" values that are run-time constants
  fix for script optimization of nested switch statements
  script optimization fix for complex "in" expressions in conditionals
  updates to typos allow-list reflecting ZAM regularization changes
  BTest updates for ZAM regularization changes
  convert new ZAM operations to use typed operands
  complete migration of ZAM to use only public ZVal methods
  "-O validate-ZAM" option to validate generated ZAM instructions
  internal option to suppress control-flow optimization
  exposing some functionality for greater flexibility in structuring run-time execution
  rework ZAM compilation of type switches to leverage value switches
  add tracking of control flow information
  factoring of ZAM operation specifications into separate files
  updates to ZAM operations / gen-zam regularization, other than the operations themselves
  type-checking fix for vector-of-string operations
  ZVal constructor for booleans
  ...
2024-08-16 12:10:33 +02:00
Vern Paxson
3962810e4b ListVal method to clear the list to allow reusing w/o new construction 2024-08-16 11:18:54 +02:00
Vern Paxson
5d37e6bb5c accessor for smart-pointer version of FileVal's value 2024-08-05 09:12:36 +01:00
Christian Kreibich
a29f862f95 Document the field_escape_pattern in the to_json() BiF
This argument, and its corresponding use in Val.cc's BuildJSON(),
were never explained.
2024-07-02 14:46:16 -07:00
Vern Paxson
1f9fa4304d refine Val "footprint" to equate long strings with multiple objects 2024-04-29 12:39:36 -07:00
Vern Paxson
5311904bb1 removing vestigial same_val() function 2024-04-25 09:15:12 -07:00
Vern Paxson
91cab9931d ZAM optimizations for record creation
includes reworking of managing "auxiliary" information for ZAM instructions
2024-01-25 20:49:12 +01:00
Vern Paxson
bae87fb606 script optimization fixes for "concretizing" vector-of-any's 2024-01-15 15:03:56 +01:00
Vern Paxson
a11ee9038b streamlining of constructing script-level tables 2023-12-16 17:33:46 +01:00
Tim Wojtulewicz
ef5b169acd Add some uses of std::move in constructors and simple functions for pass-by-value arguments 2023-11-28 13:40:28 -07:00
Arne Welzel
cf9afd7b77 TableVal: Replace raw subnets/pattern_matcher with unique_ptr 2023-11-21 11:16:17 +01:00
Arne Welzel
c113b9b297 Expr/Val: Add support for in set[pattern] 2023-11-21 10:34:17 +01:00
Arne Welzel
e39f280e3d zeek.bif: Implement table_pattern_matcher_stats() bif for introspection
Provide a script accessible way to introspect the DFA stats that can be
leveraged to gather runtime statistics of the underlying DFA. This
re-uses the existing MatcherStats used by ``get_matcher_stats()``.
2023-11-21 10:34:17 +01:00
Arne Welzel
501b582bc7 TablePatternMatcher: Use const StringValPtr& instead of const StringVal* 2023-11-21 10:34:16 +01:00
Arne Welzel
c426304c27 Val: Move TablePatternMatcher into detail namespace
There's anyway only prototype in the headers, so detail seems better
than the public zeek namespace.
2023-11-21 10:34:16 +01:00
Vern Paxson
699549eb45 support for indexing "table[pattern] of T" with strings to get multi-matches 2023-11-21 10:34:15 +01:00
Dominik Charousset
cebb85b1e8 Fix unsafe and inefficient uses of copy_string
Add a new overload to `copy_string` that takes the input characters plus
size. The new overload avoids inefficient scanning of the input for the
null terminator in cases where we know the size beforehand. Furthermore,
this overload *must* be used when dealing with input character sequences
that may have no null terminator, e.g., when the input is from a
`std::string_view` object.
2023-11-03 15:25:38 +01:00
Benjamin Bannier
f5a76c1aed Reformat Zeek in Spicy style
This largely copies over Spicy's `.clang-format` configuration file. The
one place where we deviate is header include order since Zeek depends on
headers being included in a certain order.
2023-10-30 09:40:55 +01:00
Arne Welzel
cbaf43e8ea VectorVal: Embed vector_val
Similar motivation as for RecordVal, save an extra malloc/free
and pointer indirection.

This breaks the `auto& RawVec()` API which previously returned
a reference to the std::vector*. It now returns a reference
to the vector instead. It's commented as intended for internal
and compiled code, so even though it's public API,

The previous `std::vector<std::optional<ZVal>>*&` return type was also very
likely not intended (all consumers just dereference it anyhow). I'm certain
this API was never meant to modify the actual pointer value.

I've switched to explicit typing, too.
2023-09-22 21:52:52 +02:00
Arne Welzel
f362935a66 RecordVal: Embed record_val
This should remove one malloc/free per created and destroyed record instance
and avoid one extra pointer indirection to access fields.
2023-09-22 19:43:07 +02:00
Arne Welzel
aaa81cae5d CompositeHash: Skip record initialization when recovering vals
Initializing fields of recovered records caused running &default expression
of fields just so that they are re-assigned in the next step with the
recovered fields. The second test case still shows that the loop var
is initialized as well even though that's not needed.

Add tests for iterating over records with &default attributes for both,
tables and vectors.

Fixes #3267
2023-09-08 13:02:34 +02:00
Arne Welzel
73a7fdad95 TableVal: Unify &default and &default_insert lookups
Introduce DefaultAttr() helper to avoid a bit of duplicated code.
2023-08-04 12:31:27 +02:00
Arne Welzel
431767d04b Add &default_insert attribute for tables
This is based on the discussion in zeek/zeek#2668. Using &default with tables
can be confusing as the default value is not inserted. The following example
prints an empty table at the end even new Service records was instantiated.

    type Service: record {
        occurrences: count &default=0;
        last_seen: time &default=network_time();
    };

    global services: table[string] of Service &default=Service();

    event zeek_init()
        {
        services["http"]$occurrences += 1;
        services["http"]$last_seen = network_time();

        print services;
        }

Changing above &default to &default_insert will insert the newly created
default value upon a missed lookup and act less surprising.

Other examples that caused confusion previously revolved around table of sets
 or table of vectors and `add` or `+=` not working as expected.

    tbl_of_vector["http"] += 1
    add tbl_of_set["http"][1];
2023-08-04 12:30:36 +02:00
Vern Paxson
6b0d595dae avoid constructing TypeList's on-the-fly for ListVal's with fixed types 2023-07-17 16:31:30 -07:00
Tim Wojtulewicz
90d0bc64fa Replace empty destructor bodies with =default definitions 2023-07-07 09:17:05 -07:00
Vern Paxson
1505fd4aa1 added some class accessors/set-ers 2023-06-30 09:36:14 +02:00
Arne Welzel
480d52ca1f from_json: Support function to normalize key names
When a JSON document contains key names containing colons or other
special characters that are not valid in Zeek identifiers, from_json()
cannot be used to parse such input.

This change allows a customizable normalization function.

Closes #3142.
2023-06-29 15:57:49 +02:00
Arne Welzel
1facc34e09 Fixup Val.h/Val.cc: Actually move ValFromJSON into zeek::detail
Lost during merge..
2023-05-09 11:23:32 +02:00
Arne Welzel
264284150b Merge remote-tracking branch 'amazing-pp/topic/fupeng/from_json_bif'
* amazing-pp/topic/fupeng/from_json_bif:
  Implement from_json bif

Minor updates during merge: Moved ValFromJSON into zeek::detail for the
time being, removed gotos, normalized some error messages to lower case,
minimal test extension and added a raw reader input framework test reading
"json lines" as a demo, adding notes about the implicit type
conversions.
2023-05-09 10:36:58 +02:00
Fupeng Zhao
584e68434d Implement from_json bif 2023-05-06 00:42:46 +00:00
Vern Paxson
c19eba62d6 restored RecordType::Create, now marked as deprecated
tidying of namespaces and private class members
simplification of flagging record field initializations that should be skipped
address peculiar MSVC compilation complaint for IntrusivePtr's
2023-04-18 09:41:45 -07:00
Vern Paxson
ee358affda clarifications and tidying for record field initializations 2023-04-15 20:12:49 -07:00
Vern Paxson
0787c130d0 optimize record construction by deferring initializations of aggregates 2023-04-10 11:44:11 -07:00
Vern Paxson
4600ca41f6 logging speedup by switching to raw record access 2023-04-10 11:43:19 -07:00
Tim Wojtulewicz
4f902c0f39 Add configure option for preallocating PortVal objects 2023-03-15 10:12:32 -07:00
Vern Paxson
b7f7d32bf7 Fix for EnumVal's returning their underlying value
Change EnumVal()->AsEnum() to zeek_int_t.
2023-03-08 10:10:24 +01:00
Tim Wojtulewicz
a8fc63e182 Merge remote-tracking branch 'microsoft/master'
* microsoft/master: (71 commits)
  Clang formatting
  Mask ports before inserting them into the map
  Fix compiler warning from applied patch
  Remove statistics plugin in favor of stats bif
  Add EventHandler version of stats plugin
  Mark a few EventHandler methods const
  Changed implementation from std::map to std::unordered_map of Val.cc
  Removed const, Windows build is now working
  Added fixes suggested in PR
  Update src/packet_analysis/protocol/ip/IP.cc
  Apply suggestions from code review
  Clang format again but now with v13.0.1
  Rewrote usages of define(_MSC_VER) to ifdef _MSC_VER
  Clang format it all
  Fixed initial CR comments
  Add NEWS entry about Windows port
  Add a couple of extra unistd.h includes to fix a build failure
  Use std::chrono instead of gettimeofday
  Update libkqueue submodule [nomail]
  Don't call tokenize_string if the input string is empty
  ...
2022-11-11 15:23:21 -07:00