I have a step in my pipeline that cleans up resources, and it works nicely when a build finishes with a success or a failure. Problems arise when a build times out or is canceled, because the step is never executed. Is there a way to force a step to run in a case like this?
I’m also after a solution here.
Our pipelines start VMs on AWS, so, you see where this is going…
Sure, I could set up a script on AWS to kill running VMs, but which ones?
Sure, I can tag the VMs, but again, which ones were left behind and which are still part of a running build?!
This is a major setback in our switch to Drone, and we’re fully committed to the switch (enterprise license).
Is there an issue to follow, should one be opened?
@s0undt3ch I think I will need some more context. Drone does run a full cleanup when the build completes, destroying any networks, volumes and containers that it creates. This includes on cancel.
If you mount the host machine Docker socket and download or build images, or create containers on the host machine, these operations are happening outside of the Drone process, in which case you can certainly fill up the host machine hard disk without manual cleanup. Is this perhaps the case here?
EDIT: sorry, after re-reading I think I understand the problem. When a build is canceled it kills all pipeline steps, which, as a result, prevents you from being able to clean up AWS resources created as part of the pipeline. Is my understanding correct? In this case I recommend creating a feature request in the main issue tracker and we can brainstorm a way to support this use case.
I do not think we will be able to run additional steps on cancel due to design constraints. I believe there is some prior art here, where Travis was unable to accommodate a similar request, likely due to similar design constraints. With that being said, Drone supports multiple pipelines per build, and we may be able to support running a separate pipeline on cancel, which may also suffice for your use case.
I also think we can do something creative with system hooks. Today we provide system-level webhooks that can notify an endpoint when a build state changes, including when a build is canceled, which could allow you to ensure cleanup. We plan to expand webhooks to also launch jobs (Kubernetes jobs or Nomad jobs), which means you could have a simple cleanup job run on cancel. This is the workaround I’m leaning toward to support this action.
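To make the webhook idea concrete, here is a minimal sketch of the decision a receiving endpoint would have to make: given the body of a build-state notification, should cleanup run? The payload shape (a `build` object with a `status` field) and the status strings are assumptions for illustration only, not the documented Drone webhook format, and `needs_cleanup` is a hypothetical helper name.

```python
import json

# Assumption: status values that mean the build was interrupted before its
# own cleanup step could run. These strings are illustrative only.
INTERRUPTED_STATES = {"killed", "canceled", "cancelled", "timeout"}


def needs_cleanup(raw_body: bytes) -> bool:
    """Return True if the (hypothetical) payload reports an interrupted build."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Malformed or non-UTF-8 bodies are ignored rather than triggering cleanup.
        return False
    if not isinstance(payload, dict):
        return False
    build = payload.get("build")
    if not isinstance(build, dict):
        return False
    return build.get("status") in INTERRUPTED_STATES
```

An HTTP handler would call this on each incoming request body and start the cleanup job only when it returns `True`. A real endpoint should also authenticate the request before acting on it.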
Source code that performs cleanup: https://github.com/drone/drone-runtime/blob/master/engine/docker/docker.go#L219
Yes, your edit is correct.
Running a pipeline on cancel would likely do it; we would just have to tag the images with something unique to the build, like the build link (not the pipeline running link, the extra
Then, defining that pipeline to run on success/failure (just to make sure the regular build process destroyed its resources) and on cancel would likely do what we need.
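As a rough sketch of the tagging idea, the cleanup pipeline could select only the resources that carry the identifier of the build it is cleaning up after, which leaves VMs from other running builds alone. The tag key `ci-build`, the instance dictionary shape, and the `leftover_instances` helper are all hypothetical; in practice the instance list would come from the cloud provider's API.

```python
from typing import Dict, Iterable, List

# Hypothetical tag key under which each VM records the build that created it.
BUILD_TAG = "ci-build"


def leftover_instances(instances: Iterable[Dict], build_id: str) -> List[str]:
    """Return the IDs of instances tagged as belonging to the given build."""
    return [
        inst["id"]
        for inst in instances
        # Instances without tags, or tagged for another build, are skipped.
        if inst.get("tags", {}).get(BUILD_TAG) == build_id
    ]


if __name__ == "__main__":
    vms = [
        {"id": "i-1", "tags": {"ci-build": "repo/42"}},
        {"id": "i-2", "tags": {"ci-build": "repo/43"}},
    ]
    print(leftover_instances(vms, "repo/42"))
```

Because the selection depends only on the build identifier, running it again after a successful build that already destroyed its VMs simply returns an empty list.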
I’ll create the feature ticket.
We can expect that to land in 1.0, right?
I still need to assess the feasibility of launching a separate pipeline on cancelation; however, I will make sure there is some solution in place in the 1.0 release that enables you to clean up resources.
I’ll be watching out for it.