Drone

General question about the suitability of using drone

Hi

I was considering switching from Jenkins to Drone to build Apache MXNet. It’s a complex CI setup that involves builds across different instance families: there are separate pools of CPU instances, GPU instances, and Windows instances in AWS, and they are autoscaled dynamically depending on the number of queued jobs.

Would it be possible to cover this setup with Drone? Is it possible to have different pools of workers, for example CPU and GPU instances? Windows Docker support with GPUs might be a problem, so at a minimum I would like to understand whether it would fit this use case for Linux workloads.

This is our current system: http://jenkins.mxnet-ci.amazon-ml.com/

Thank you !

Would it be possible to cover this setup with Drone? Is it possible to have different pools of workers, for example CPU and GPU instances?

Yes, you can use labels to route pipelines to specific machines:
https://docker-runner.docs.drone.io/configuration/routing/
https://docker-runner.docs.drone.io/installation/reference/drone-runner-labels/
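
To make that concrete, here is a hedged sketch of label-based routing: the runner is started with a label, and the pipeline declares a matching `node` section. The label name and image below are illustrative, not from your setup:

```yaml
kind: pipeline
type: docker
name: gpu-build

# Routes this pipeline to runners started with
# DRONE_RUNNER_LABELS=gpu:true (label name "gpu" is illustrative).
node:
  gpu: "true"

steps:
- name: build
  image: nvidia/cuda:10.1-devel   # illustrative image
  commands:
  - make build
```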

Windows Docker support with GPUs might be a problem, so at a minimum I would like to understand whether it would fit this use case for Linux workloads.

Drone supports multiple runtime engines. The Docker runtime engine, which executes builds inside Docker containers [1], is the most popular option. We also have an exec runtime engine that runs builds directly on the host machine [2], and an SSH runtime engine that runs builds on remote machines over the SSH protocol [3].

[1] https://docker-runner.docs.drone.io
[2] https://exec-runner.docs.drone.io
[3] https://ssh-runner.docs.drone.io
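
For example, a pipeline for the exec runner (useful where Docker is awkward, e.g. Windows with GPUs) declares `type: exec` and targets a platform; the build command below is a placeholder, not a known-good MXNet invocation:

```yaml
kind: pipeline
type: exec
name: windows-build

platform:
  os: windows
  arch: amd64

steps:
- name: build
  commands:
  - msbuild mxnet.sln   # illustrative command only
```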

There’s separate pools of CPU instances, GPU instances and Windows instances in AWS, they are also autoscaled dynamically depending on the number of queued jobs.

Drone also has an autoscaler that supports AWS. Note the autoscaler currently supports the Docker runtime engine only. See https://github.com/drone/autoscaler

Wow, thanks for your reply. We do autoscaling using a Lambda function that examines the job queue. Does Drone have a way to do something similar using an HTTP API? In that case we could attach agents directly, as we do with Jenkins.

Yes, there is an /api/queue endpoint on the Drone server that returns a list of pending and running jobs. The payload includes the os, arch, kernel, labels, etc., which can all be used for autoscaling decisions. This is the same endpoint our autoscaler project uses.
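
As a sketch, a scaling function (like your Lambda) could poll that endpoint and count pending jobs per profile. The field names below mirror the ones mentioned above (status, os, arch, labels), but the exact payload shape is an assumption and should be verified against your Drone server:

```python
import json
from collections import Counter

def pending_by_profile(queue_json):
    """Count pending (not yet running) jobs per (os, arch, labels) profile.

    queue_json: the decoded /api/queue payload, assumed to be a list of
    job objects with "status", "os", "arch" and optional "labels" fields
    (field names are assumptions; verify against your server).
    """
    counts = Counter()
    for job in queue_json:
        if job.get("status") != "pending":
            continue
        # Normalize the labels dict into a hashable, order-independent key.
        labels = tuple(sorted((job.get("labels") or {}).items()))
        counts[(job.get("os"), job.get("arch"), labels)] += 1
    return counts

# Example with a hand-written sample payload (not real server output):
sample = json.loads("""[
  {"status": "pending", "os": "linux", "arch": "amd64", "labels": {"gpu": "true"}},
  {"status": "pending", "os": "linux", "arch": "amd64", "labels": {"gpu": "true"}},
  {"status": "running", "os": "windows", "arch": "amd64"}
]""")
print(pending_by_profile(sample))
```

The resulting counts could then drive scale-up decisions per instance pool, similar to how the Lambda inspects the Jenkins queue today.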

I should also point out that we have a DigitalOcean runner that spawns a new DigitalOcean VM for every pipeline execution. https://blog.drone.io/drone-digitalocean-runner/

We are planning something similar for AWS.

If you look at our setup, our pipelines are quite big, and we run a single pipeline execution across different hosts. I don’t think the assumption that one pipeline can’t span multiple hosts works for us.

Thanks a lot for your answers. I would be excited if we could use drone.io.

The YAML configuration file can define multiple pipelines as an execution graph [1]. Each pipeline is executed independently of the others and can be scheduled on any machine in the cluster that matches the pipeline’s requirements (os, arch, labels, etc.) [2].

Given the complexity of your pipelines, you might also want to consider defining them with Starlark scripting [3] instead of YAML.

[1] https://docs.drone.io/configure/pipeline/multiple/#graph-execution
[2] https://docs.drone.io/configure/pipeline/multiple/#multi-platform
[3] https://docs.drone.io/starlark/overview/
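
To make the graph idea concrete, here is a hedged sketch of two pipelines in one file, where a GPU test stage runs only after a CPU build stage and is routed to a GPU-labeled machine. Pipeline names, images, labels, and commands are illustrative:

```yaml
kind: pipeline
type: docker
name: build-cpu

steps:
- name: build
  image: ubuntu:18.04        # illustrative image
  commands:
  - make build

---
kind: pipeline
type: docker
name: test-gpu

# Runs only after build-cpu completes, on a gpu-labeled runner
# (label name is illustrative).
node:
  gpu: "true"

depends_on:
- build-cpu

steps:
- name: test
  image: nvidia/cuda:10.1-devel   # illustrative image
  commands:
  - make test
```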


What version do you suggest installing? The one in the Docker container seems very lightweight / demo-purpose. Is the Docker image the recommended way to run the master? I didn’t see any other binary packages provided. Do you recommend building it myself or using the Docker image?

The one in the Docker container seems very lightweight / demo-purpose

sorry, not sure I understand.

The recommended installation is the drone/drone:1 server image, which currently points to the 1.6.0 release. This is the full version of Drone.

Thanks for your answer, catching up with this again.

Does the autoscaler support autoscaling per label, or does it treat the pool uniformly?

Also, is there a possibility to clone recursively? I find that the submodules are not present in the repo.

For recursive cloning, you can do something like this:
https://docker-runner.docs.drone.io/configuration/cloning/#the-recursive-flag
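
If the linked flag doesn’t cover your case, a fallback sketch is to disable the default clone and run git yourself. The step below uses Drone’s standard `DRONE_GIT_HTTP_URL` and `DRONE_COMMIT` environment variables; the image and exact commands are assumptions to adapt:

```yaml
kind: pipeline
type: docker
name: default

clone:
  disable: true

steps:
- name: clone
  image: alpine/git        # illustrative image
  commands:
  - git clone --recursive "$DRONE_GIT_HTTP_URL" .
  - git checkout "$DRONE_COMMIT"
  - git submodule update --init --recursive
```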

Yes, the autoscaler can scale by label. More specifically, the autoscaler will only autoscale workloads that match its profile (os, arch, labels), so you would set up a separate autoscaler for each unique profile.