* origin/topic/dev/non-ascii-logging:
Removed Policy Script for UTF-8 Logs
Commented out UTF-8 Script in Test All Policy
Minor Style Tweak
Use getNumBytesForUTF8 method to determine number of bytes
Added Jon's test cases as unit tests
Prioritizes escaping predefined Escape Sequences over Unescaping UTF-8 Sequences
Added additional check to confirm anything unescaping is a multibyte UTF-8 sequence, addressing the test case Jon brought up
Added optional script and redef bool to enable utf-8 in ASCII logs
Initial Commit, removed std::isprint check to escape
Made minor code format and logic adjustments during merge.
* origin/topic/timw/cleaner-utf8:
GHI-486: Switch over to using LLVM utf8-checking code to better validate characters
I addressed a buffer over-read during the merge and added test-cases for
it.
* origin/topic/timw/150-to-json:
Update submodules for JSON work
Update unit tests for JSON logger to match new output
Modify JSON log writer to use the external JSON library
Update unit test output to match json.zeek being deprecated and slight format changes to JSON output
Add proper JSON serialization via C++, deprecate json.zeek
Add new method for escaping UTF8 strings for JSON output
Move do_sub method from zeek.bif to StringVal class method
Move record_fields method from zeek.bif to Val class method
Add ToStdString method for StringVal
By using a consistent timestamp. That avoids rare chances of sqlite
output from rounding the current time into such a form that happens
to bypass the timestamp canonifier script (whenever it happened to
land on a whole or tenth second).
For backward compatibility when reading values, we first check
the ZEEK-prefixed value, and if not set, then check the corresponding
BRO-prefixed value.
This also installs symlinks from "zeek" and "bro-config" to a wrapper
script that prints a deprecation warning.
The btests pass, but this is still WIP. broctl renaming is still
missing.
#239
Mostly trying to standardize the way tests sleep for arbitrary amounts
of time to make it easier to tell at which particular point the
unit test actually may need the timeout interval increased (or else
debugged further).
Occasionally a few lines in the first part of the output file were
not in the expected order (this seems to be caused by each line in the
output being created by a process that is run in the background but
bro doesn't wait for it to finish). Fixed by sorting the output.
get_filter_names(id: ID) : set[string] returns the names of the current
list of filters for a specified log stream.
Furthermore this commit makes a number of logging functions more robust
by checking existence of values before trying to modify them. This
commit also really implements (and tests) the enable_stream function.
This feature can be enabled globally for all logs by setting
LogAscii::gzip_level to a value greater than 0.
This feature can be enabled on a per-log basis by setting gzip-level in
$confic to a value greater than 0.
number of fields required.
Addresses BIT-1683
I do not think this quite fixes the underlying issue of BIT-1683 - it
should not be possible to get to this state in normal operations.
Also fixes a small memory leak for disabled writers.
* origin/topic/seth/log-framework-ext:
Log extensions: series of small fixes and new tests.
Change the function for log extension to take a path only and update tests.
Final changes to log framework ext code.
Add logging framework metadata mechanism.
Add unrolling separator & field name map to logging framework.
The extensions now work with optional types, as well with complex types
(like subrecords). Not returning a record in the ext_func no longer
crashes bro.
The default_ext_func was switched to return void in
cases where no extension revord is defined (was bool).
I also got rid of the offsets in the indices - with the rest of the
implementation, that was not really necessary and made the code more
complex.
The "metadata" functionality has been renamed to "ext" to
represent that the logs are being extended. The function that
returns the record which is used to extend the log now receives
a log filter as it's single argument.
The field name "unrolling" is now renamed to "scope" so the variables
names now look like this: "Log::default_scope_sep"
This allows all threads accessing the same database to share sqlite
objects. This, for example, fixes the issue with several threads
simultaneously writing to the same database file.
See https://www.sqlite.org/sharedcache.html
Addresses BIT-1325
- When a log record is being "unrolled" (sub-records flattened
out into a single record), it's now possible to choose the
character/string to separate the outer name from the inner
name. This can be used to work around the problems
with ElasticSearch 2.0 not supporting dots "." in field names.
This value can be provided per-filter as well as a global
default value.
- Log fields can be renamed by providing a table per-filter
(or a global default) to rename fields for any log writer.
The name translation is performed after unrolling so the
value in the field name table must match whatever is being
used to separate field names.
For example if the unrolling separator was set to "*":
redef Log::default_unrolling_sep = "*";
The field name map would need to reflect it:
redef Log::default_field_name_map = {
["id*orig_h"] = "src",
["id*orig_p"] = "src_port",
["id*resp_h"] = "dst",
["id*resp_p"] = "dst_port",
};
Cleaned up the surrounding code a bit and also added '[' as another
case (not sure that can happen, but doesn't hurt eihter).
* 'master' of https://github.com/aeppert/bro:
Whitespace
Remove
Remove.
Fix for JSON formatter
A fatal error, especially in DEBUG, should result in a core.
Seems to fix a case where an entry in the table may be null on insert.
With this patch the model is:
- "print" cleans the data so that non-printable characters get
escaped. This is not necessarily reversible.
- to print in a reversible way, one can go through
escape_string(); this escapes backslashes as well to make the
decoding non-ambigious.
- Logging always escapes similar to escape_string(), making it
reversible.
Compared to master, we also change the escaping as follows:
- We now only escape with "\xXX", no more "^X" or "\0". Exception:
backslashes.
- We escape backlashes as "\\".
- There's no "alternative" output style anymore, i.e., fmt() '%A'
qualifier is gone.
Baselines in testing/btest are updated, external tests not yet.
Addresses BIT-1333.
is present. Testcase crashes on unpatched versions of Bro.
Found by Aaron Eppert <aeppert@gmail.com>.
This (probably) fixes the crash issue with sqlite a few people have
reported on the mailing list in the past.
- It's not *exactly* ISO 8601 which doesn't seem to support
subseconds, but subseconds are very important to us and
most things that support ISO8601 seem to also support subseconds
in the way I'm implemented it.