We are seeing an issue where agents remain in the Creating state when there is a burst in the number of agents created by the autoscaler.
If agents are created gradually, 1-4 at a time, there is generally no issue, but if we go from our minimum to requesting 10+ agents within a parallelised build, we see that some agents remain in the Creating state indefinitely.
Logs on both the autoscaler and the agent don’t seem to show anything wrong. The VM in GCP gets created, but we can see there are no Docker processes running yet.
Any hints on where to look for issues would be great. Our current theory is that either the autoscaler is waiting for the agent to report some sort of status, or there is a concurrency issue when multiple agents are requested at once.
We’ve been looking into this to try to understand what response is expected for the status to update and mark the agent as ready to start taking builds: