Fix attempt for "internal error: unknown msg type 115 in Poll()"

Under remote communication overload conditions, the child->parent
chunked IO may start rejecting chunks if over the hard cap.  Some
messages are made of two chunks, accepting the first part, but rejecting
the second can put the parent in a bad state and the next two chunks it
reads are likely to cause the error.

This patch just removes the rejecting functionality completely and so
now relies solely on shutting down remote peer connections to help
alleviate temporary overload conditions. The
"chunked_io_buffer_soft_cap" script variable can now tune when this
shutting down starts happening and the default setting is now double
what it used to be.  For constant overload conditions, communication.log
should keep stating "queue to parent filling up; shutting down heaviest
connection".

An alternative to completely removing the hard cap rejection code could
be ensuring that messages that involve a pair of chunks can never have
the second chunk be rejected when attempting to write it.

Addresses BIT-1376
This commit is contained in:
Jon Siwek 2015-04-16 17:02:44 -05:00
parent 8789d7f527
commit effeaa5b13
6 changed files with 11 additions and 34 deletions

View file

@ -3207,6 +3207,11 @@ const forward_remote_events = F &redef;
## more sophisticated script-level communication framework.
const forward_remote_state_changes = F &redef;
## The number of IO chunks allowed to be buffered between the child
## and parent process of remote communication before Bro starts dropping
## connections to remote peers in an attempt to catch up.
const chunked_io_buffer_soft_cap = 800000 &redef;
## Place-holder constant indicating "no peer".
const PEER_ID_NONE = 0;