Management framework: tune request timeout granularity and interval

When the controller relays requests to agents, we want the agents to time out more quickly than the corresponding controller requests. This allows agents to respond with more meaningful errors, while the controller's timeout acts mostly as a last resort to ensure that the client actually gets a response.

This dials table_expire_interval down to 2 seconds in both agent and controller, for more predictable timeout behavior. It also dials the agent-side request expiration interval down to 5 seconds, compared to the controller's 10 seconds.

We may have to revisit this to allow custom expiration intervals per request/response message type.
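
For illustration only, the agent-side settings described above might look roughly like the following when written as redefs. This is a sketch and not the commit's actual agent diff: it assumes that Management::Request::timeout_interval is the redef-able option behind the request expiration interval, and that the request module can be loaded by the path shown.

```zeek
# Sketch under stated assumptions, not the literal change from this commit.
# Assumption: this script provides the Management::Request module.
@load policy/frameworks/management/request

# Check table expirations every 2 seconds, so an expired request is
# noticed soon after its timeout elapses.
redef table_expire_interval = 2 sec;

# Assumption: timeout_interval is the option governing request expiration.
# Agent requests expire after 5 seconds; the controller keeps 10 seconds.
redef Management::Request::timeout_interval = 5 sec;
```

With these values an agent request should expire after roughly 5 to 7 seconds, well before the controller's 10-second timeout, which leaves the agent time to report its own timeout error upstream.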
2025-10-02 06:38:20 +00:00 · 2022-05-29 22:10:03 -07:00 · 2022-05-29 22:10:03 -07:00 · 83c60fd8ac
commit 83c60fd8ac
parent 4371c17d4c
3 changed files with 21 additions and 5 deletions
--- a/scripts/policy/frameworks/management/controller/main.zeek
+++ b/scripts/policy/frameworks/management/controller/main.zeek
@@ -73,6 +73,10 @@ redef record Management::Request::Request += {
 # Tag our logs correctly
 redef Management::role = Management::CONTROLLER;

+# Conduct more frequent table expiration checks. This helps get more predictable
+# timing for request timeouts and only affects the controller, which is mostly idle.
+redef table_expire_interval = 2 sec;
+
 global check_instances_ready: function();
 global add_instance: function(inst: Management::Instance);
 global drop_instance: function(inst: Management::Instance);