I am using the kube-runner and am having some issues with very slow start times for builds due to a high number of pipelines being thrown into the job queue all at once. We have determined that there is a particular repo’s .drone.yml that is creating 67 parallel builds which is causing the job queue to fill up and significantly delay builds for all other repos.
We have tried increasing the DRONE_RUNNER_CAPACITY and DRONE_RUNNER_MAX_PROCS which seems to help the runner pull things off the queue faster, but this does not seem to help at all in terms of processing the builds and starting the underlying build containers.
I would appreciate any help with the following questions:
Is there a way to increase throughput with higher job concurrency for the kube runner and/or scale the kube runner horizontally?
Does anyone have any guidance about concurrency limits/expectations that should be documented or that I can potentially enforce through policy?
We are running the following:
- Server: drone/drone:1.9.2
- kube-runner: drone/drone-runner-kube:1.0.0-beta.9
- custom validation extension: GitHub - mongodb-labs/drone-validation: Drone validation extension