This is a script-only change that unrolls File::Info records into
multiple files.log entries if the same file was seen over different
connections by single worker. Consequently, the File::Info record
gets the commonly used uid and id fields added. These fields are
optional for File::Info - a file may be analyzed without relation
to a network connection (e.g by using Input::add_analysis()).
The existing tx_hosts, rx_hosts and conn_uids fields of Files::Info
are not meaningful after this change and removed by default. Therefore,
files.log will have them removed, too.
The tx_hosts, rx_hosts and conn_uids fields can be revived by using the
policy script frameworks/files/deprecated-txhosts-rxhosts-connuids.zeek
included in the distribution. However, with v6.1 this script will be
removed.
Due to different double precision on M1, file IDs for SMB could end up
changing on M1 because the access time of a file goes into their
computation. The real solution for this would be changing Zeek's
internal "time" representation to uint64; that's planned, but requires
major surgery. For now, this PR changes the SMB code to also pass SMB's
original time representation (which is a uint64) into script-land, and
then use that for computing the file ID.
Closes#1406
The modp_dtoa/modp_dtoa2 functions aren't capable of handling double
values larger than INT_MAX and fallback on using sprintf() in that
situation. Previously, the format string to that sprintf() was "%e",
defaulting to a precision of 6, which is already too few digits to
represent a number known to be larger than INT_MAX. Now, an sprintf()
is still performed for values larger than INT_MAX and still uses a
scientific notation format, but in a way that uses as many decimal
digits as needed to preserve information.
This commit switches UID hashing from md5 to a highway hash. It also
moves the salt value out of the file plugin - and makes it
installation-specific instead - it is moved to the global namespace.
There now are digest hash functions to make "static"
installation-specific hashes that are stable over workers available to
everyone; hashes can be 64, 128 or 256 bits in size.
Due to the fact that we switch the file hashing algorithm, all file
hashes change.
The underlyigng algorithm that is used for hashing is highwayhash-128,
which is significantly faster than md5.
SMB error handling improved. The analyzer isn't destroyed when a problem
is encoutered anymore. The flowbuffer in the parser is now flushed and
the analyzer is set to resync against an SMB command. This was needed
because there is some state about open files that is kept within the
parser itself which was being destroyed and that was causing analysis
after content gaps or parse errors to be faulty. The new mechanism
doesn't detroy the parser so parsing after gaps is improved.
DCE_RPC handling in SMB is improved in the edge case where a drive
mapping isn't seen. There is a new const named SMB::pipe_filenames
which is used as a heuristic for identifying "files" opened on named
pipe shares. If the share mapping type isn't known and a filename
in this set is found, the share type will change to "PIPE" by
generating an event named "smb_pipe_connect_heuristic". Reads and
writes to that file will be sent to the DCE_RPC analyzer instead of
to the files framework.
The concept of "unknown" share types has been removed due to the new
heuristic detection of share types.
Some general clean up of how the SMB cmd log is written and when.
There were some cases where the log would be missing a field
or data wouldn't get sent to file analysis. At least some of
this is fixed now and I get confused a bit less when I look
at the logs now.
Also, I made the default handling "FILE" so that things like
FILE_UNKNOWN wouldn't show up in the logs so regularly. It's
technically correct that way, but it doesn't look good and it's
correct as FILE often enough that it make sense to make it the
default I think.
- Actually get the path into the smb_files.log now.
- When a share root is having the "create" message used on it,
instead of giving a null file name, now give a special
indicator of "<share_root>".
- Update test baselines.