Addig a new random seed for external tests.
I added a wrapper around the siphash() function to make calling it a
little bit safer at least.
BIT-1612 #merged
* origin/topic/johanna/bit-1612:
HLL: Fix missing typecast in test case.
Remove the -K/-J options for setting keys.
Add test checking the quality of HLL by adding a lot of elements.
Fix serializing probabilistic hashers.
Baseline updates after hash function change.
Also switch BloomFilters from H3 to siphash.
Change Hashing from H3 to Siphash.
HLL: Remove unnecessary comparison.
Hyperloglog: change calculation of Rho
The options were never really used and do not seem especially useful;
initialization with a seed file still works.
This also fixes a bug with the initialization of the siphash key.
The test adds 170,000 IP addresses. After the recent hashing changes,
HLL estimates 171,250 entries (completely stable). Before, HLL estimated,
depending on the initial seeds, ~700 to 300,000 entries.
This commit mostly changes the hash function that is used for Internal
hashing of data < 36 bytes from H3 to Siphash. This change is motivated
by the fact that it turns out that H3 apparently does not deliver a very
good source of data uniqueness; running HLL with H3 as a hashing
function results in quite poor results (up to of 75% off in my tests).
In difference, running HLL with Siphash (or HMAC-MD5) changes this
factor to ~2%.
This also fixes a long-standing bug in Hash.h which truncated our hash
values to 32 bit on most machines.
Furthermore, it once again fixes a problem with the Rank function in
HLL.
non-partial connections.
Before, if we saw a responder-side SYN/ACK, but had not seen the
initial orginator-side SYN, Bro would treat the connection as partial,
meaning that most application-layer analyzers would refuse to inspect
the payload. That was unfortunate because all payload data was
actually there (and even passed to the analyzers). This change make
Bro consider these connections as complete, so that analyzers will
just normally process them.
The leads to couple more connections in the test-suite to now being
analyzed.
Addresses #1492. (I used an HTTP trace for debugging instead of the
HTTPS trace from the ticket, as the clear-text makes it easier to
track the data flow).
One tweak: I made ts optional and set it to network_time() if not given.
BIT-1578 #merged
* origin/topic/johanna/bit-1578:
Weird: fix potential small issue when ignoring duplicates
Rewrite weird logging.
- SMTP protocol headers now do some minimal parsing to clean up
email addresses.
- New function named split_mime_email_addresses to take MIME headers
and get addresses split apart but including the display name.
- Update tests.
In all versions so far, the identifier string that was used for
comparisons might have been different from the identifier string that
was added (when certain notices are used).
This commit rewrites the way that weirds are logged and fixes a number
of issues on the way. Most prominently, flow weirds now actually log
information about the flow that they occur in (before this change, they
only logged the name of the weird, which is only marginally helpful).
Besides restructuring how weird logging works internally, weirds can now
also be generated by calling Weird::weird with the info record directly,
allowing more fine-granular passing of information. This is e.g. used
for DNS weirds, which do not have the connection record available any
more when they are generated (before data like the connection ID was
just not logged in these instances).
Addresses BIT-1578
We now extract email addresses in the fields that one would expect
to contain addresses. This makes further downstream processing of
these fields easier like log analysis or using these fields in the
Intel framework. The primary downside is that any other content
in these fields is no longer available such as full name and any
group information. I believe the simplification of the content in
these fields is worth the change.
Added "cc" to the script that feeds information from SMTP into the
Intel framework.
A new script for email handling utility functions has been created
as a side effect of these changes.
This changes the HTTP log format slightly but shouldn't mess
up anything that anyone was doing because the old "filename"
field was never actually filled out. Tests are updated as well.
The expiration attribute expression is now evaluated for every use. Thus
later adjustments of the value (e.g. by redefining a const) will now
take effect. Values less than 0 will disable expiration.
Added a new BIF haversine_distance that computes distance between two
geographic locations.
Added a new Bro script function haversine_distance_ip that does the same
but takes two IP addresses instead of latitude/longitude. This function
requires that Bro be built with libgeoip.
The link-layer addresses are now part of the connection endpoints
following the originator-responder-pattern. The addresses are printed
with leading zeros. Additionally link-layer addresses are also extracted
for 802.11 plus RadioTap.
The ascii reader now accepts \r\n newlines without complaining.
Furthermore, the reader was slightly rewritten in a more c++11-y way,
removing all raw pointers from the class.
Addresses BIT-1198