As a follow-up to 3bf8c8ceb6, which added the parse cache, add a small,
short-lived cache on the workers to debounce the Software::new events
sent up to the proxies.

User-Agents are highly repetitive: workers often see exact duplicate
user-agents on the same orig_h. Worse, due to NAT, virtualization, and
the proliferation of Electron-based applications, variations of the same
user-agent can be seen at the same time. For example:

    Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36
    Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.6613.18 Safari/537.36 Zoom/6.2.0 (1855)

When these two user-agents are seen concurrently, the software framework
logs each flip as a new user-agent. This can be fixed separately on
the proxy side, but a reduction in Software::new events is still needed
to reduce cluster communication overhead as well as the load on the
proxies.

A 10-minute cache on the workers should greatly reduce the number of
redundant user-agents logged in software.log.
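
The worker-side debounce described above could look roughly like the
following. This is a minimal sketch, not the code from this commit: the
module name UADebounce, the set recently_reported, and the function
should_report are hypothetical names chosen for illustration. Only the
10-minute lifetime and the (orig_h, user-agent) key come from the text
above.

```zeek
module UADebounce;

# Hypothetical cache: one entry per (host, user-agent) pair. Entries
# expire 10 minutes after creation, so a repeated pair is reported at
# most once per window.
global recently_reported: set[addr, string] &create_expire=10min;

# Returns T the first time a pair is seen within the window, F otherwise.
# A worker would only send the event up to the proxies when this is T.
function should_report(host: addr, user_agent: string): bool
    {
    if ( [host, user_agent] in recently_reported )
        return F;

    add recently_reported[host, user_agent];
    return T;
    }

event zeek_init()
    {
    local ua = "ExampleAgent/1.0";          # made-up user-agent
    print should_report(192.168.1.10, ua);  # T: first sighting
    print should_report(192.168.1.10, ua);  # F: debounced
    }
```

Because the key is the exact string, the two Chrome/Zoom variants shown
earlier each get their own entry, so each is reported once per window
instead of once per flip.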

There was some confusion around which value was used subsequent to a strip(),
but sub not respecting anchors made it appear to work. The `\(?` part also
seems to be redundant.

Add a small cache in front of the parse method. This cache should
avoid most of the calls to parse and ultimately save memory, because
redundant copies of the parsed strings will no longer be created.
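
As an illustration of caching in front of a parse function, here is a
minimal sketch. Everything in it is an assumption made for the example:
the Parsed record, expensive_parse, cached_parse, and the one-hour
expiry are hypothetical and stand in for the real parse method and its
return type, which this message does not spell out.

```zeek
module ParseCacheSketch;

# Hypothetical stand-in for the parsed result.
type Parsed: record {
    name: string;
    version: string;
};

# Cache keyed by the raw, unparsed string. The expiry interval is an
# assumption; entries are dropped when not read for that long.
global parse_cache: table[string] of Parsed &read_expire=1hr;

# Counts real parses so the effect of the cache is visible.
global parse_calls = 0;

# Hypothetical "expensive" parser: splits "name/version" once.
function expensive_parse(raw: string): Parsed
    {
    ++parse_calls;
    local parts = split_string1(raw, /\//);
    return Parsed($name=parts[0], $version=(|parts| > 1 ? parts[1] : ""));
    }

# Consult the cache first; parse and store only on a miss.
function cached_parse(raw: string): Parsed
    {
    if ( raw in parse_cache )
        return parse_cache[raw];

    local parsed = expensive_parse(raw);
    parse_cache[raw] = parsed;
    return parsed;
    }

event zeek_init()
    {
    cached_parse("ExampleAgent/1.0");
    cached_parse("ExampleAgent/1.0");
    print parse_calls;  # 1: the second call was served from the cache
    }
```

Repeated inputs get back the same cached record rather than a freshly
built copy, which is where the memory saving would come from.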
Move the parsing itself to the proxies where the caching can be more
efficient.
This adds a "policy" hook into the logging framework's streams and
filters to replace the existing log filter predicates. The hook
signature is as follows:

    hook(rec: any, id: Log::ID, filter: Log::Filter);

The logging manager invokes hooks on each log record. Hooks can veto
log records via a break, and modify them if necessary. Log filters
inherit the stream-level hook, but can override or remove the hook as
needed.
The distribution's existing log streams now come with pre-defined
hooks that users can add handlers to. Their names are standardized as
"log_policy" by convention, with additional suffixes when a module
provides multiple streams. The following adds a handler to the Conn
module's default log policy hook:

    hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
        {
        if ( some_veto_reason(rec) )
            break;
        }

By default, this handler will get invoked for any log filter
associated with the Conn::LOG stream.
The existing predicates are deprecated for removal in 4.1 but continue
to work.
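
For a filter that previously used a predicate, the equivalent
filter-level hook might look like the sketch below. It assumes the
filter record exposes the hook through a field named "policy"; the
module name PolicySketch, the hook name tcp_only, and the filter name
and path are hypothetical.

```zeek
@load base/protocols/conn

module PolicySketch;

# Declare the hook with the generic signature given above; the handler
# below can then use the stream's concrete record type.
global tcp_only: hook(rec: any, id: Log::ID, filter: Log::Filter);

# Veto every record that is not TCP. With a predicate this would have
# been a function returning rec$proto == tcp.
hook tcp_only(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
    {
    if ( rec$proto != tcp )
        break;
    }

event zeek_init()
    {
    # Assumption: "policy" is the Log::Filter field that overrides the
    # hook inherited from the stream.
    Log::add_filter(Conn::LOG, [$name="tcp-only", $path="conn-tcp",
                                $policy=tcp_only]);
    }
```

The default filter on Conn::LOG is left untouched here, so conn.log
keeps all records while the hypothetical conn-tcp log only receives the
ones that the hook does not veto.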
* All "Broxygen" usages have been replaced in
code, documentation, filenames, etc.
* Sphinx roles/directives like ":bro:see" are now ":zeek:see"
* The "--broxygen" command-line option is now "--zeexygen"