Commit graph

4 commits

Author SHA1 Message Date
Tim Wojtulewicz
4e9d843cec Remove deprecated Cluster::Node::interface field 2024-08-07 11:58:22 -07:00
Tim Wojtulewicz
a716903f3a Remove deprecated time machine settings 2024-08-07 11:58:21 -07:00
Tim Wojtulewicz
e93e4cc26d Add a services.json endpoint for Prometheus service discovery 2024-05-31 13:30:31 -07:00
Arne Welzel
92565d4739 Supervisor: Handle EAGAIN error on stem pipe
util::safe_write() calls abort() in case of EAGAIN errors. This is
easily observed when starting clusters with 32 workers or more.

Add a custom write_message() function handling EAGAIN by retrying
after a small sleep. It's not clear a more complicated poll() would be
much better: The pipe might be ready for writing, but then our message
might not actually fit in, resulting in another EAGAIN error. And even
poll() would introduce blocking/sleeping code.

Take some precautions against the stem and the supervisor dead-locking
when both pipes are full by draining the other end on EAGAIN errors.

Closes #3043
2023-10-25 12:53:37 +02:00