Commit graph

1 commit

Author SHA1 Message Date
Arne Welzel
92565d4739 Supervisor: Handle EAGAIN error on stem pipe
util::safe_write() calls abort() in case of EAGAIN errors. This is
easily observed when starting clusters with 32 workers or more.

Add a custom write_message() function handling EAGAIN by retrying
after a small sleep. It's not clear a more complicated poll() would be
much better: The pipe might be ready for writing, but then our message
might not actually fit in, resulting in another EAGAIN error. And even
poll() would introduce blocking/sleeping code.

Take some precautions against the stem and the supervisor dead-locking
when both pipes are full by draining the other end on EAGAIN errors.

Closes #3043
2023-10-25 12:53:37 +02:00