The controller's deployment request state now features a bit that indicates
whether the deployment was requested by a client, or triggered internally. This
affects logging and the transmission of deployment response events via Broker,
which are skipped when the deployment is internal.
This prepares for resilience features that apply when the controller (re-)boots.
This allows us to handle loss of Broker peerings, updating instance state as we
see instances go away. This also tweaks logging slightly to differentiate
between an instance checking in for the first time, and checking in when the
controller already knows it.
These callbacks are handy for stringing together codepaths separated by event
request/response transactions: when such a transaction completes, the callback
allows locating a parent request for the finished one, to continue its
processing.
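The callback mechanism described above can be modeled roughly as follows. This is a Python sketch rather than the framework's own Zeek code, and the `Request` class, its `finish` and `parent_id` fields, and the `complete` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    """Hypothetical request state; field names are illustrative only."""
    id: str
    parent_id: Optional[str] = None
    # Invoked when this request's transaction completes.
    finish: Optional[Callable[["Request"], None]] = None


requests: Dict[str, Request] = {}
log: List[str] = []


def complete(req_id: str) -> None:
    """Retire a finished request and run its callback, if it has one."""
    req = requests.pop(req_id)
    if req.finish is not None:
        req.finish(req)


def resume_parent(child: Request) -> None:
    """Locate the parent of the finished request and continue its processing."""
    parent = requests.get(child.parent_id)
    if parent is not None:
        log.append(f"resuming {parent.id} after {child.id}")


# A client-facing request spawns a child request toward an agent.
requests["deploy-1"] = Request(id="deploy-1")
requests["agent-1"] = Request(id="agent-1", parent_id="deploy-1",
                              finish=resume_parent)
complete("agent-1")
```

The child carries only the parent's ID, so the parent lookup happens at completion time and a parent that has meanwhile timed out is simply skipped.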
When an agent is already running the configuration it's asked to deploy,
it will now recognize this and by default do nothing. The requester can force
it if needed, via a new argument to the deploy_request event.
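As a rough model of this behavior, the following Python sketch skips redundant deployments unless forced. The `Agent` class, the `force` parameter name, and comparing configurations by ID are assumptions made for illustration; the actual event argument and comparison logic may differ.

```python
from typing import Optional


class Agent:
    """Hypothetical stand-in for an agent; not the Zeek implementation."""

    def __init__(self) -> None:
        self.running_config_id: Optional[str] = None
        self.deployments = 0

    def deploy(self, config_id: str, force: bool = False) -> bool:
        """Return True if a deployment actually took place."""
        if config_id == self.running_config_id and not force:
            # Already running this configuration: do nothing by default.
            return False
        self.running_config_id = config_id
        self.deployments += 1
        return True


agent = Agent()
first = agent.deploy("cfg-a")               # nothing running yet: deploys
repeat = agent.deploy("cfg-a")              # same config: skipped
forced = agent.deploy("cfg-a", force=True)  # same config, forced: deploys
```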
The agent's Broker::peer_added handler now recognizes the Supervisor and does
not trigger a notify_agent_hello event for it. The agent might still send such events
repeatedly as other things peer with the agent.
The controller now knows three states that a cluster configuration can be in:
- STAGED: as uploaded by the client
- READY: with needed tweaks applied, e.g. to fill in ports
- DEPLOYED: as sent off to agents for deployment
These states aren't exclusive; they represent checkpoints that a config goes
through from upload through deployment. A deployed configuration will also exist
in its STAGED and READY versions, unless a client has uploaded a new
configuration, which will overwrite the STAGED and READY ones.
The controller saves all of these in a table, which lets us use Broker to
persist all states to disk. We use &broker_allow_complex_type, since we only
ever store entire configurations.
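The checkpoint semantics can be illustrated with a small Python model. It is a sketch under stated assumptions: the `upload` and `deploy` helpers, the dictionary-based configuration, and the port value 2200 are hypothetical, and the real table is a Broker-backed Zeek table rather than an in-memory dict.

```python
from enum import Enum
from typing import Dict


class ConfigState(Enum):
    STAGED = "staged"      # as uploaded by the client
    READY = "ready"        # with needed tweaks applied
    DEPLOYED = "deployed"  # as sent off to agents


# One table keyed by state; each value is an entire configuration.
configs: Dict[ConfigState, dict] = {}


def upload(config: dict) -> None:
    """Overwrite the STAGED and READY versions; DEPLOYED is untouched."""
    configs[ConfigState.STAGED] = dict(config)
    ready = dict(config)
    ready.setdefault("port", 2200)  # hypothetical tweak: fill in a port
    configs[ConfigState.READY] = ready


def deploy() -> None:
    """Record the READY configuration as the one sent to agents."""
    configs[ConfigState.DEPLOYED] = dict(configs[ConfigState.READY])


upload({"id": "a"})
deploy()
upload({"id": "b"})  # a new upload replaces STAGED and READY only
```

After the second upload, the table holds configuration "b" as STAGED and READY while "a" remains DEPLOYED.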
This splits uploading a configuration and deploying it to the instances into
separate event transactions. set_configuration_request/response remains, but now
only conducts validation and storage of the new configuration (upon validation
success, and not yet persisted to disk). The response event indicates success or
the list of validation errors. Successful upload now returns the configuration's
ID in the result record's data struct.
The new deploy_request/response event takes a previously uploaded configuration
and deploys it to the agents.
The controller now tracks uploaded and deployed configurations
separately. Uploading assigns g_config_staged; deployment assigns
g_config_deployed. Deployment does not affect g_config_staged.
The get_config_request/response event pair now allows selecting the
configuration the caller would like to retrieve.
This renames the agent's functionality for setting a configuration to reflect
the controller's upcoming separation of set_configuration and deployment.
The instance and error fields are now optional instead of defaulting to empty
strings, which caused minor output deviations in the client.
Agents now ensure that any Result record they create has the instance field
filled in.
During `set_configuration_request` handling the controller now validates
received configurations, checking for a few common gotchas around naming and
port use. Validation continues after it finds a problem, resulting in a list
summarizing all identified problems.
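The collect-everything style of validation might look like the Python sketch below. The two checks shown (duplicate node names, and a port reused on the same instance) are assumed examples of "naming and port use" problems, and the `validate` function and dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple


def validate(nodes: List[Dict]) -> List[str]:
    """Return every problem found instead of stopping at the first one."""
    errors: List[str] = []
    seen_names = set()
    seen_ports: Dict[Tuple[str, int], str] = {}

    for node in nodes:
        name = node["name"]
        if name in seen_names:
            errors.append(f"duplicate node name '{name}'")
        seen_names.add(name)

        port = node.get("port")
        if port is not None:
            key = (node["instance"], port)
            if key in seen_ports:
                errors.append(
                    f"port {port} on '{node['instance']}' used by "
                    f"'{seen_ports[key]}' and '{name}'"
                )
            else:
                seen_ports[key] = name

    return errors


errors = validate([
    {"name": "manager", "instance": "agent1", "port": 5000},
    {"name": "worker", "instance": "agent1", "port": 5000},
    {"name": "worker", "instance": "agent2"},
])
```

An empty list signals success; otherwise the response can carry the whole list so the client sees all problems in one pass.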
The numbering process now accounts for the possibility of colliding with the
agent port, as well as with ports explicitly assigned in the configuration. It
also avoids nondeterminism that could result from traversal of sets.
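A simplified Python sketch of such a numbering scheme follows. It assumes a single host with one shared port space, and the `assign_ports` function, its parameters, and the node names are hypothetical; the controller's actual algorithm may order and group nodes differently.

```python
from typing import Dict, Optional, Set


def assign_ports(nodes: Dict[str, Optional[int]],
                 agent_port: int, start: int) -> Dict[str, int]:
    """Give every node a port, skipping the agent's and any explicit ones."""
    # Ports that automatic numbering must not hand out.
    reserved: Set[int] = {agent_port}
    reserved.update(p for p in nodes.values() if p is not None)

    result: Dict[str, int] = {}
    candidate = start
    # Sorting the names makes the outcome independent of input ordering.
    for name in sorted(nodes):
        port = nodes[name]
        if port is None:
            while candidate in reserved:
                candidate += 1
            port = candidate
            reserved.add(port)
        result[name] = port
    return result


ports = assign_ports(
    {"worker-2": None, "manager": None, "logger": 2201, "worker-1": None},
    agent_port=2200,
    start=2200,
)
```

Here numbering starts at the agent's own port to show the collision handling: 2200 and the explicitly assigned 2201 are both skipped, so the first automatically numbered node receives 2202.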
It helps during testing to be able to control whether the Supervisor process
also routes node output to the console, in addition to writing to output
files. Since the Supervisor runs as the main process in Docker containers, its
output becomes visible in "docker logs" that way, simplifying diagnostics.
When the controller receives a configuration with no instances (and thus no
nodes), it needs no roundtrip to agents and can send the response right away.
* origin/dependabot/github_actions/actions/download-artifact-3:
Bump actions/download-artifact from 2 to 3
* origin/dependabot/github_actions/docker/setup-buildx-action-2:
Bump docker/setup-buildx-action from 1 to 2
* origin/dependabot/github_actions/pre-commit/action-3.0.0:
Bump pre-commit/action from 2.0.3 to 3.0.0
* origin/dependabot/github_actions/docker/build-push-action-3:
Bump docker/build-push-action from 2 to 3
* origin/dependabot/github_actions/dawidd6/action-send-mail-3.6.1:
Fix pattern matching in Cirrus dependabot check
Bump dawidd6/action-send-mail from 3.4.1 to 3.6.1