Add exact name of the Top-k algorithm.

I needed to figure out which exact algorithm we use for our probabilistic top-k measurements. It turns out that we do not mention this in our source tree at all so far.
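
For readers who do not know the algorithm, here is a minimal sketch of the Space-Saving idea from Metwally et al. (2005): keep at most k counters, and when an unmonitored element arrives while all counters are in use, replace the element with the smallest count and let the newcomer inherit that count plus one. The class name `SpaceSavingSketch` and its methods are hypothetical and exist only for illustration. This is not the code in `src/probabilistic/Topk.h`, and it finds the minimum with a plain linear scan instead of the stream-summary structure described in the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Illustrative sketch of Space-Saving (Metwally et al., 2005).
// Hypothetical names; NOT the implementation in src/probabilistic/Topk.h.
class SpaceSavingSketch
	{
public:
	struct Counter
		{
		std::uint64_t count;   // estimated frequency (never below the true count)
		std::uint64_t epsilon; // maximum possible overestimation of count
		};

	explicit SpaceSavingSketch(std::size_t capacity) : capacity_(capacity)
		{
		assert(capacity_ > 0);
		}

	void Observe(const std::string& item)
		{
		auto it = counters_.find(item);

		if ( it != counters_.end() )
			{
			// Already monitored: just count it.
			++it->second.count;
			return;
			}

		if ( counters_.size() < capacity_ )
			{
			// Free slot: start monitoring with an exact count.
			counters_[item] = Counter{1, 0};
			return;
			}

		// All slots in use: evict the element with the smallest count.
		// Linear scan for simplicity; ties are broken arbitrarily.
		auto victim = counters_.begin();
		for ( auto c = counters_.begin(); c != counters_.end(); ++c )
			if ( c->second.count < victim->second.count )
				victim = c;

		const std::uint64_t min_count = victim->second.count;
		counters_.erase(victim);

		// The newcomer inherits the evicted count; that inherited part is
		// the error bound on its estimate.
		counters_[item] = Counter{min_count + 1, min_count};
		}

	// Returns the counter for item, or nullptr if item is not monitored.
	const Counter* Find(const std::string& item) const
		{
		auto it = counters_.find(item);
		return it == counters_.end() ? nullptr : &it->second;
		}

	std::size_t Size() const { return counters_.size(); }

private:
	std::size_t capacity_;
	std::unordered_map<std::string, Counter> counters_;
	};
```

With a capacity of 2 and the stream a, a, a, b, c, the sketch ends up monitoring a (count 3, epsilon 0) and c (count 2, epsilon 1), while b has been evicted. The epsilon value is what makes the result probabilistic: the true count of c lies somewhere between count minus epsilon and count.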
2025-10-02 14:48:21 +00:00 · 2022-05-11 13:21:26 +01:00
commit 25c33d2a29
parent 0aafc8ae6c
2 changed files with 10 additions and 2 deletions
--- a/scripts/base/frameworks/sumstats/plugins/topk.zeek
+++ b/scripts/base/frameworks/sumstats/plugins/topk.zeek
@@ -1,4 +1,9 @@
 ##! Keep the top-k (i.e., most frequently occurring) observations.
+##!
+##! This plugin uses a probabilistic algorithm to count the top-k elements.
+##! The algorithm (called Space-Saving) is described in the paper "Efficient
+##! Computation of Frequent and Top-k Elements in Data Streams", by
+##! Metwally et al. (2005).
 @load base/frameworks/sumstats
--- a/src/probabilistic/Topk.h
+++ b/src/probabilistic/Topk.h
@@ -7,8 +7,11 @@
 #include "zeek/OpaqueVal.h"
 #include "zeek/Val.h"
-// This class implements the top-k algorithm. Or - to be more precise - an
-// interpretation of it.
+// This class implements the Space-Saving algorithm for counting the top-k elements
+// in a data stream, as presented in the paper "Efficient Computation of Frequent and
+// Top-k Elements in Data Streams", by Metwally et al. (2005).
+//
+// Or - to be more precise - it implements an interpretation of it.
 namespace zeek::detail
 	{