From time to time, I'm getting this:
This happens when drone-autoscaler has just scaled up because of additional build jobs.
I can get around this by implementing a custom userData block:
```yaml
# use firewall to disable access to docker until it has restarted and has been able to pull an image
runcmd:
  - ufw default allow outgoing
  - ufw default allow incoming
  - ufw deny 2376
  - echo activating firewall
  - ufw enable
  # custom docker config is already in place; these options make sure installing docker doesn't overwrite it
  - apt-get install -o Dpkg::Options::="--force-confold" --force-yes -y docker-ce
  - docker pull drone/drone-runner-docker
  - echo sleeping for 30 secs
  - sleep 30
  - echo opening firewall
  - ufw allow 2376
```
We inject this using the `DRONE_AMAZON_USERDATA_FILE` environment variable.
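For reference, this is roughly how we wire it up (a minimal sketch; the file path and service layout are placeholders, not our exact setup):

```yaml
# docker-compose.yml (illustrative excerpt)
services:
  autoscaler:
    image: drone/autoscaler
    volumes:
      # mount the cloud-init snippet shown above into the container
      - ./userdata.yml:/config/userdata.yml:ro
    environment:
      # point the autoscaler at the custom userdata file
      - DRONE_AMAZON_USERDATA_FILE=/config/userdata.yml
```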
This allows Docker to be installed without overwriting our custom config (daemon.json), to start, and to perform a `docker pull` before the instance becomes reachable by Drone.
It would be good if more robustness were built into drone-autoscaler instead, so that we didn't have to do this. For example: perform a `docker pull` (with appropriate retry logic) and only mark the runner as "ready for service" once that succeeds.
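Something along these lines (a rough Go sketch of the kind of readiness check I mean, not drone-autoscaler's actual code; the interface, names, and timings are made up for illustration):

```go
package readiness

import (
	"context"
	"fmt"
	"time"
)

// dockerClient is a stand-in for whatever client the autoscaler
// already uses to talk to the remote Docker daemon on port 2376.
type dockerClient interface {
	Ping(ctx context.Context) error
	Pull(ctx context.Context, image string) error
}

// waitUntilReady keeps probing the daemon and pulling the runner image
// until both succeed, and only then would the instance be marked healthy.
func waitUntilReady(ctx context.Context, c dockerClient, image string, attempts int, backoff time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		// Ping first so we don't attempt a pull against a daemon that
		// is still restarting after install/config changes.
		if err = c.Ping(ctx); err == nil {
			if err = c.Pull(ctx, image); err == nil {
				return nil // daemon is up and the runner image is present
			}
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
	}
	return fmt.Errorf("instance never became ready: %w", err)
}
```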
I think we discussed this in another thread and the maintainer said they'd never experienced issues with drone runners coming online, but we're seeing it quite frequently. It seems to me as if drone-autoscaler simply marks the node as healthy too soon. We'd very much like not to have to maintain a custom userdata config.