Commit graph

72 commits

Author SHA1 Message Date
Tim Wojtulewicz
0ec2161b04 Add options to filter at the stream level as well as globally 2025-08-12 17:31:28 -07:00
Tim Wojtulewicz
837fde1a08 Add metrics to track string and container fields limited by length 2025-08-12 17:31:28 -07:00
Tim Wojtulewicz
cd74a4e138 Replace unused stream argument from RecordToLogRecord with WriterInfo
This also adds a WriterInfo argument to ValToLogVal and passes the one from
RecordToLogRecord into it.
2025-08-12 17:31:28 -07:00
Tim Wojtulewicz
e2e7ab28da Implement string- and container-length filtering at the log record level 2025-08-12 17:31:28 -07:00
Tim Wojtulewicz
e458da944f Return weird if a log line is over a configurable size limit 2025-07-21 09:14:52 -07:00
Tim Wojtulewicz
f386deba94 Fix clang-tidy performance-enum-size warnings in headers 2025-06-23 08:35:24 -07:00
Arne Welzel
ab1d48c95a logging/Manager: Implement new WriteBatchFromRemote() 2024-12-04 12:40:35 +01:00
Arne Welzel
78999d147d logging/Manager: Extract another CreateWriter() helper
For other cluster backends, CreateWriter() will use a logger's filter
configuration rather than receiving all configuration through CreateLog.
Extract a helper out from WriteToFilters() for reuse.
2024-09-27 15:32:09 +02:00
Arne Welzel
0d925e935e logging: Dedicated log flush timer
Log flushing is currently triggered based on the threading heartbeat timer
of WriterBackends and the hard-coded WRITE_BUFFER_SIZE 1000.

This change introduces a separate timer that is managed by the logger
manager instead of piggy-backing on the heartbeat timer, as well as a
const &redef for the buffer size.

This allows to modify the log flush frequency and batch size independently
of the threading heartbeat interval. Later, this will allow to re-use the
buffering and flushing logic of writer frontends for non-Broker cluster
backends, too.

One change here is that even frontends that do not have a backend will
be flushed regularly. This is wanted for non-Broker backends and should be
very cheap. Possibly, Broker can piggy back on this timer down the road, too,
rather than using its own script-level timer (see Broker::log_flush()).
2024-09-27 15:30:35 +02:00
Arne Welzel
245fd0c94f broker/logging: Change threading::Value** usage std::vector instead
This allows to leverage automatic memory management, less allocations
and using move semantics for expressing ownership.

This breaks the existing logging and broker API, but keeps the plugin
DoWrite() and HookLogWrite() methods functioning.

It further changes ValToLogVal to return a threading::Value rather than
a threading::Value*. The vector_val and set_val fields unfortunately
use the same pointer-to-array-of-pointers approach. this can'tbe changed
as it'd break backwards compatibility for plugin provided input readers
and log writers.
2024-08-30 10:58:57 +02:00
Tim Wojtulewicz
46ff48c29a Change all instruments to only handle doubles 2024-05-31 13:36:37 -07:00
Tim Wojtulewicz
a0ae06b3cd Convert telemetry code to use prometheus-cpp 2024-05-31 13:30:31 -07:00
Arne Welzel
ee65623600 logging/Manager: Make LogDelayExpiredTimer an implementation detail
The only reason this was a private component of Manager was to access
the Stream's function. Use a generic callback and a lambda to avoid
that exposure.
2023-11-30 12:25:49 +01:00
Arne Welzel
2dbb467ba2 logging: Implement get_delay_queue_size()
Primarily for introspection given that re-delaying may exceed
queue sizes.
2023-11-29 11:53:11 +01:00
Arne Welzel
f0e67022fd logging: Introduce Log::delay() and Log::delay_finish()
This is a verbose, opinionated and fairly restrictive version of the log delay idea.
Main drivers are explicitly, foot-gun-avoidance and implementation simplicity.

Calling the new Log::delay() function is only allowed within the execution
of a Log::log_stream_policy() hook for the currently active log write.

Conceptually, the delay is placed between the execution of the global stream
policy hook and the individual filter policy hooks. A post delay callback
can be registered with every Log::delay() invocation. Post delay callbacks
can (1) modify a log record as they see fit, (2) veto the forwarding of the
log record to the log filters and (3) extend the delay duration by calling
Log::delay() again. The last point allows to delay a record by an indefinite
amount of time, rather than a fixed maximum amount. This should be rare and
is therefore explicit.

Log::delay() increases an internal reference count and returns an opaque
token value to be passed to Log::delay_finish() to release a delay reference.
Once all references are released, the record is forwarded to all filters
attached to a stream when the delay completes.

This functionality separates Log::log_stream_policy() and individual filter
policy hooks. One consequence is that a common use-case of filter policy hooks,
removing unproductive log records, may run after a record was delayed. Users
can lift their filtering logic to the stream level (or replicate the condition
before the delay decision). The main motivation here is that deciding on a
stream-level delay in per-filter hooks is too late. Attaching multiple filters
to a stream can additionally result in hard to understand behavior.

On the flip side, filter policy hooks are guaranteed to run after the delay
and can be used for further mangling or filtering of a delayed record.
2023-11-29 11:53:11 +01:00
Arne Welzel
3afd6242c7 logging/Manager: Split Write()
If we delay in the stream policy hook, we'll need to resume writing
to the attached filters later on. Prepare for that by splitting out
the filter processing.
2023-11-29 11:53:11 +01:00
Benjamin Bannier
f5a76c1aed Reformat Zeek in Spicy style
This largely copies over Spicy's `.clang-format` configuration file. The
one place where we deviate is header include order since Zeek depends on
headers being included in a certain order.
2023-10-30 09:40:55 +01:00
Vern Paxson
4600ca41f6 logging speedup by switching to raw record access 2023-04-10 11:43:19 -07:00
Arne Welzel
69a98e2cbb logging: Add telemetry for streams and log writers
This adds one metric per log stream and one metric per log writer (path based)
to track the number of writes on a stream level as well as on a writer level.

    $ curl -sSf localhost:8181/metrics | grep Conn
    zeek_log_writer_writes_total{endpoint="",filter-name="default",module="HTTP",path="http",stream="HTTP::LOG",writer="Log::WRITER_SQLITE"} 1 1677497572770
    zeek_log_stream_writes_total{endpoint="",module="HTTP",stream="HTTP::LOG"} 1 1677497572770

The initial version of this change also included metrics around log
write vetoes, but given no log policies exist in the default configuration
and they are mostly interesting for a few streams/writers only, skip this
for now. These can always be added by the script writer, too.

The difference between the stream level writes and concrete writers can
be used to deduce the number of vetoes (or errors) as a starting point.
2023-02-27 12:51:03 +01:00
Josh Soref
cd201aa24e Spelling src
These are non-functional changes.

* accounting
* activation
* actual
* added
* addresult
* aggregable
* aligned
* alternatively
* ambiguous
* analysis
* analyzer
* anticlimactic
* apparently
* application
* appropriate
* arithmetic
* assignment
* assigns
* associated
* authentication
* authoritative
* barrier
* boundary
* broccoli
* buffering
* caching
* called
* canonicalized
* capturing
* certificates
* ciphersuite
* columns
* communication
* comparison
* comparisons
* compilation
* component
* concatenating
* concatenation
* connection
* convenience
* correctly
* corresponding
* could
* counting
* data
* declared
* decryption
* defining
* dependent
* deprecated
* detached
* dictionary
* directional
* directly
* directory
* discarding
* disconnecting
* distinguishes
* documentation
* elsewhere
* emitted
* empty
* endianness
* endpoint
* enumerator
* essentially
* evaluated
* everything
* exactly
* execute
* explicit
* expressions
* facilitates
* fiddling
* filesystem
* flag
* flagged
* for
* fragments
* guarantee
* guaranteed
* happen
* happening
* hemisphere
* identifier
* identifies
* identify
* implementation
* implemented
* implementing
* including
* inconsistency
* indeterminate
* indices
* individual
* information
* initial
* initialization
* initialize
* initialized
* initializes
* instantiate
* instantiated
* instantiates
* interface
* internal
* interpreted
* interpreter
* into
* it
* iterators
* length
* likely
* log
* longer
* mainly
* mark
* maximum
* message
* minimum
* module
* must
* name
* namespace
* necessary
* nonexistent
* not
* notifications
* notifier
* number
* objects
* occurred
* operations
* original
* otherwise
* output
* overridden
* override
* overriding
* overwriting
* ownership
* parameters
* particular
* payload
* persistent
* potential
* precision
* preexisting
* preservation
* preserved
* primarily
* probably
* procedure
* proceed
* process
* processed
* processes
* processing
* propagate
* propagated
* prototype
* provides
* publishing
* purposes
* queue
* reached
* reason
* reassem
* reassemble
* reassembler
* recommend
* record
* reduction
* reference
* regularly
* representation
* request
* reserved
* retrieve
* returning
* separate
* should
* shouldn't
* significant
* signing
* simplified
* simultaneously
* single
* somebody
* sources
* specific
* specification
* specified
* specifies
* specify
* statement
* subdirectories
* succeeded
* successful
* successfully
* supplied
* synchronization
* tag
* temporarily
* terminating
* that
* the
* transmitted
* true
* truncated
* try
* understand
* unescaped
* unforwarding
* unknown
* unknowndata
* unspecified
* update
* usually
* which
* wildcard

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2022-11-09 12:08:15 -05:00
Vern Paxson
d758585e42 updated Bro->Zeek in comments in the source tree 2022-01-24 14:26:20 -08:00
Tim Wojtulewicz
d50dade24c GH-1768: Properly cleanup existing log stream when recreated on with the same ID 2021-12-03 13:46:28 -07:00
Tim Wojtulewicz
331161138a Unify all of the Tag types into one type
- Remove tag types for each component type (analyzer, etc)
- Add deprecated versions of the old types
- Remove unnecessary tag element from templates for TaggedComponent and ComponentManager
- Enable TaggedComponent to pass an EnumType when initializing Tag objects
- Update some tests that are affected by the tag enum values changing order
2021-11-23 19:36:49 -07:00
Tim Wojtulewicz
b2f171ec69 Reformat the world 2021-09-16 15:35:39 -07:00
Christian Kreibich
795a7ea98e Add a global log policy hook to the logging framework
This addresses the need for a central hook on any log write, which
wasn't previously doable without a lot of effort. The log manager
invokes the new Log::log_stream_policy hook prior to any filter-specific
hooks. Like filter-level hooks, it may veto a log write. Even when
it does, filter-level hooks still get invoked, but cannot "un-veto".

Includes test cases.
2021-07-02 12:42:45 -07:00
Tim Wojtulewicz
4ad08172d0 Remove obsolete ZEEK_FORWARD_DECLARE_NAMESPACED macros 2021-02-24 14:35:44 -07:00
Tim Wojtulewicz
0618be792f Remove all of the random single-file deprecations
These are the changes that don't require a ton of changes to other files outside
of the original removal.
2021-01-27 10:52:40 -07:00
Tim Wojtulewicz
96d9115360 GH-1079: Use full paths starting with zeek/ when including files 2020-11-12 12:15:26 -07:00
Tim Wojtulewicz
fe0c22c789 Base: Clean up explicit uses of namespaces in places where they're not necessary.
This commit covers all of the common and base classes.
2020-08-24 12:07:00 -07:00
Tim Wojtulewicz
4b61d60e80 Fix indentation of namespaced aliases 2020-08-20 16:11:46 -07:00
Tim Wojtulewicz
45b5c6e619 Move logging code to zeek namespaces 2020-08-20 15:55:17 -07:00
Tim Wojtulewicz
c9ab1f93e7 Move a few low-use classes to namespaces 2020-07-31 16:25:47 -04:00
Jon Siwek
a06ef66edc Add Log::rotation_format_func and Log::default_rotation_dir options
These may be redefined to customize log rotation path prefixes,
including use of a directory.  File extensions are still up to
individual log writers to add themselves during the actual rotation.

These new also allow for some simplication to the default
ASCII postprocessor function: it eliminates the need for it doing an
extra/awkward rename() operation that only changes the timestamp format.

This also teaches the supervisor framework to use these new options
to rotate ascii logs into a log-queue/ directory with a specific
file name format (intended for an external archiver process to
monitor separately).
2020-07-07 18:42:37 -07:00
Jon Siwek
11949ce37a Implement leftover log rotation/archival for supervised nodes
This helps prevent a node from being killed/crashing in the middle
of writing a log, restarting, and eventually clobbering that log
file that never underwent the rotation/archival process.

The old `archive-log` and `post-terminate` scripts as used by
ZeekControl previously implemented this behavior, but the new logic is
entirely in the ASCII writer.  It uses ".shadow" log files stored
alongside the real log to help detect such scenarios and rotate them
correctly upon the next startup of the Zeek process.
2020-07-07 18:39:23 -07:00
Tim Wojtulewicz
64332ca22c Move all Val classes to the zeek namespaces 2020-06-30 20:48:09 -07:00
Tim Wojtulewicz
137e416a03 Rename BroType to Type 2020-06-10 14:27:36 -07:00
Tim Wojtulewicz
ed13972924 Move Type types to zeek namespace 2020-06-09 17:20:45 -07:00
Johanna Amann
876c803d75 Merge remote-tracking branch 'origin/topic/timw/776-using-statements'
* origin/topic/timw/776-using-statements:
  Remove 'using namespace std' from SerialTypes.h
  Remove other using statements from headers
  GH-776: Remove using statements added by PR 770

Includes small fixes in files that changed since the merge request was
made.

Also includes a few small indentation fixes.
2020-04-09 13:31:07 -07:00
Tim Wojtulewicz
cb01e098df iosource/threading/input/logging: Replace nulls with nullptr 2020-04-07 16:08:34 -07:00
Tim Wojtulewicz
d53c1454c0 Remove 'using namespace std' from SerialTypes.h
This unfortunately cuases a ton of flow-down changes because a lot of other
code was depending on that definition existing. This has a fairly large chance
to break builds of external plugins, considering how many internal ones it broke.
2020-04-07 15:59:59 -07:00
Tim Wojtulewicz
5a237d3a3f Use const-references in lots of places (preformance-unnecessary-value-param) 2020-02-11 14:11:18 -08:00
Max Kellermann
0db61f3094 include cleanup
The Zeek code base has very inconsistent #includes.  Many sources
included a few headers, and those headers included other headers, and
in the end, nearly everything is included everywhere, so missing
#includes were never noticed.  Another side effect was a lot of header
bloat which slows down the build.

First step to fix it: in each source file, its own header should be
included first to verify that each header's includes are correct, and
none is missing.

After adding the missing #includes, I replaced lots of #includes
inside headers with class forward declarations.  In most headers,
object pointers are never referenced, so declaring the function
prototypes with forward-declared classes is just fine.

This patch speeds up the build by 19%, because each compilation unit
gets smaller.  Here are the "time" numbers for a fresh build (with a
warm page cache but without ccache):

Before this patch:

 3144.94user 161.63system 3:02.87elapsed 1808%CPU (0avgtext+0avgdata 2168608maxresident)k
 760inputs+12008400outputs (1511major+57747204minor)pagefaults 0swaps

After this patch:

 2565.17user 141.83system 2:25.46elapsed 1860%CPU (0avgtext+0avgdata 1489076maxresident)k
 72576inputs+9130920outputs (1667major+49400430minor)pagefaults 0swaps
2020-02-04 20:51:02 +01:00
Dominik Charousset
c1f3fe7829 Switch from header guards to pragma once 2019-09-17 14:10:30 +02:00
Johanna Amann
dcd6454530 Remove RemoteSerializer and related code/types.
Also removes broccoli from the source tree.
2019-05-03 15:00:13 -07:00
Robin Sommer
fe7e1ee7f0 Merge topic/actor-system throug a squashed commit. 2018-05-18 22:39:23 +00:00
Johanna Amann
1f2bf50b49 Remove unimplemented & unused functions from header files.
All of these functions were defined in header files without ever being
implemented or used.
2018-03-16 18:38:04 -07:00
Robin Sommer
5cf7803e68 Fix some minor issues.
From Daniel, thanks!
2017-02-23 17:18:43 -08:00
Robin Sommer
511ca9e043 Adding Broker ifdefs for new remote logging code. 2017-02-17 16:28:20 -08:00
Robin Sommer
a5e9a535a5 Changing semantics of Broker's remote logging to match old communication framework.
Broker had changed the semantics of remote logging: it sent over the
original Bro record containing the values to be logged, which on the
receiving side would then pass through the logging framework normally,
including triggering filters and events. The old communication system
however special-cases logs: it sends already processed log entries,
just as they go into the log files, and without any receiver-side
filtering etc. This more efficient as it short-cuts the processing
path, and also avoids the more expensive Val serialization. It also
lets the sender determine the specifics of what gets logged (and how).

This commit changes Broker over to now use the same semantics as the
old communication system.

TODOs:
     - The new Broker code doesn't have consistent #ifdefs yet.

     - Right now, when a new log receiver connects, all existing logs
     are broadcasted out again to all current clients. That doesn't so
     any harm, but is unncessary. Need to add a way to send the
     existing logs to just the new client.
2017-02-10 18:46:45 -08:00
Jon Siwek
b06d82cced broker integration: add API documentation (broxygen/doxygen)
Also changed asynchronous data store query code a bit; trying to make
memory management and handling of corner cases a bit clearer (former
maybe could still be better, but I need to lookup queries by memory
address to associate response cookies to them, and so wrapping pointers
kind of just gets in the way).
2015-02-17 10:50:57 -06:00