How to implement a LIFO queue of jobs waiting for a stage

I’m trying to see how to implement my pipeline in Drone CI. Basically, what I’m trying to implement can be best summarized by this:


(Source: relaxdiegio.com)

Some context:

  1. The top left box represents the source repo
  2. The following 3 boxes represent the various stages of my pipeline

So when I merge PR #1 in the above case, it will trigger the unit test + build + publish (to dev container registry) which will then trigger a deployment of that newly built image to the test (or dev if you prefer) environment. This stage involves executing long-running tests as shown by that long activation box labeled “1” in that stage’s lifeline.

At some point while build 1 is still being tested in the test/dev environment, builds 2 and 3 come in. Of course, they have to wait in the queue since 1 is still busy, but as soon as build 1 is done, I would like the pipeline to deploy and test build #3 in the test/dev environment and discard build #2, since the latter’s changes are already part of build #3 anyway.

How do I implement this behavior in Drone CI?

Is there any reason you need build 2 if it’s just gonna be cancelled, or does it run in the test/dev environment too?

Hi @hwittenborn. Well, in the above case, build 2 is just a natural occurrence in the day of a software development team, because there’s no reason for the team to pause all PR approvals just because something is still occupying the test/dev env. Likewise, I don’t see any reason for the build stage to pause just because the test/dev stage is occupied.

Sorry for my confusion, but I’m still not sure on the purpose of build 2, and/or why you’re wanting it cancelled.

Would merging #2 and not running its build at all work, or are there circumstances where #2 needs to build?

Hi @hwittenborn. No worries. The reason the merge of #2 would still trigger the build stage is that the build stage would be idle at that time anyway, so it makes sense to run it. The alternative is to not run the build stage at that time and wait for the test/dev stage (which is two stages down) to finish. In this alternative, though, we’d just be introducing a delay of N (where N is the amount of time #2 spends in the build stage).

Note that so far above, I’m assuming that #3 hasn’t been merged yet, as in the real world, where you really can’t tell when PRs get merged. So at the above point, where we don’t know if #3 is getting merged, it makes sense to build #2.

Only once #3 is merged and passes the build stage does it become clear that we have no need for #2 anymore. So it’s at that point, or at least the point just before the test/dev stage pulls a queued job, that we decide to cancel #2.

Did that clarify things or was there something I missed?

So the build stage on #2 is a fallback in case the build stage on #3 fails/doesn’t happen?

If so, just so I can start to get somewhere - are the three depending on each other in any way (i.e. #2 depending on #1)?

If I’m still lost, just let me know.

Let’s start from the beginning. First assume this is all happening from a single branch in git. Let’s call that branch “main.”

At 9:00 AM, PR #1 comes in, the team reviews it, and deems it acceptable, so they accept it and it gets merged to the main branch. This signals the pipeline to begin the build stage (second box in the diagram above), which includes unit testing, static analysis, and basically any short-running tests that don’t require a live deployment of the build. Eventually this stage produces a new build from main. Let’s call that build #1.

At 9:05 AM, the dev stage detects that a new build, build #1, has been queued up, so it picks that up, deploys it, and runs end-to-end tests against it. Because of the scope of the tests, this will take at least 10-15 minutes.

At 9:07 AM, another PR, PR #2, comes in, the team reviews it and (assume for the sake of simplicity that it’s a small change) accepts it, thereby merging it to the main branch. Again, this signals the pipeline to start the build and (unit) test stage, which produces build #2.

At this point, since the next stage, the dev stage, is still occupied with build #1, build #2 is queued up.

At 9:12 AM, another PR, PR #3, comes in, and the team accepts it (for the sake of brevity, let’s assume it’s also small enough), thereby merging it to the main branch. Again, this signals the pipeline to start the build and (unit) test stage, which produces build #3.

Again, at this point, since the next stage, the dev stage, is still occupied with build #1, build #3 is also queued up.

Note that at this point, because all builds come from the same branch, build #3 already contains the changes of build #2, so it’s safe and practical to consider build #2 stale.

Thus, when the dev stage is done with build #1, it should move on to build #3 and drop build #2, leaving it to the underlying package registry to expire it according to its policy.
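The queueing behavior described above can be modeled as a single "latest wins" slot in front of the dev stage. This is only a minimal sketch of the desired semantics, not anything Drone CI provides: the class `LatestWinsSlot` and its methods are hypothetical names, and it assumes that a higher build number always means a newer build from the same branch.

```python
from typing import List, Optional


class LatestWinsSlot:
    """Holds at most one waiting build; a newer arrival replaces the older one.

    Hypothetical model of the desired behavior. Assumes that a higher
    build number means a newer build from the same branch.
    """

    def __init__(self) -> None:
        self.waiting: Optional[int] = None
        self.discarded: List[int] = []

    def offer(self, build: int) -> None:
        # Called when a build leaves the build stage while dev is busy.
        if self.waiting is None:
            self.waiting = build
        elif build > self.waiting:
            # The newer build already contains the older build's changes,
            # so the older one is considered stale.
            self.discarded.append(self.waiting)
            self.waiting = build
        else:
            # An older build arrived late; it is stale on arrival.
            self.discarded.append(build)

    def take(self) -> Optional[int]:
        # Called when the dev stage becomes free.
        build, self.waiting = self.waiting, None
        return build


slot = LatestWinsSlot()
slot.offer(2)             # 9:07 AM, dev stage still busy with build 1
slot.offer(3)             # 9:12 AM, build 2 becomes stale
next_build = slot.take()  # dev stage finishes build 1
print(next_build, slot.discarded)  # prints: 3 [2]
```

The discarded builds are only recorded here; in the scenario above, cleaning up their images would be left to the package registry's expiry policy.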

That makes perfect sense now, thanks for the longer explanation :smiley:

Off of the top of my head, the most straightforward way of doing it would be the following:

  • Inside your pipeline, install Drone CLI
  • Using a shell script (or whatever you prefer), get a list of all running builds for the repository (with Drone CLI), remove the current running one (or whichever ones you’d like to keep running) from the output, and cancel/stop all builds in the remaining output (again, with Drone CLI).

Would something like that work? It’s not the most elegant solution, but I’m not sure if Drone CI has a native feature for what you’re wanting.
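The filtering step in the list above could be sketched roughly as follows. This is an illustration under assumptions, not a tested Drone integration: the function name `select_builds_to_cancel` is hypothetical, the build records are assumed to be dictionaries with `number` and `status` keys (for example, parsed from the CLI or API output), and the status names `pending` and `running` are assumed to be the ones that mark unfinished builds.

```python
from typing import Dict, Iterable, List

# Assumption: these status names mark builds that have not finished yet.
ACTIVE_STATUSES = {"pending", "running"}


def select_builds_to_cancel(
    builds: Iterable[Dict],
    current_number: int,
    keep: Iterable[int] = (),
) -> List[int]:
    """Return the numbers of active builds other than the current one.

    `keep` lists further build numbers that should be left alone, such as
    the build currently occupying the dev stage.
    """
    keep_set = set(keep) | {current_number}
    return sorted(
        b["number"]
        for b in builds
        if b["status"] in ACTIVE_STATUSES and b["number"] not in keep_set
    )


# Example data mirroring the scenario in this thread.
builds = [
    {"number": 0, "status": "success"},  # finished earlier, ignored
    {"number": 1, "status": "running"},  # occupying the dev stage
    {"number": 2, "status": "running"},  # stale once build 3 exists
    {"number": 3, "status": "running"},  # the build running this script
]
stale = select_builds_to_cancel(builds, current_number=3, keep=[1])
print(stale)  # prints: [2]
```

Each number in the result would then be handed to whatever cancels a build, for instance the Drone CLI's stop command or the equivalent API call.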

Thanks for the tip @hwittenborn! I guess there would be environment variables that I can use to tell the CLI command to limit the returned list of builds by pipeline and by stage (I’m just guessing for now). And I suppose I would need to run this build cancellation step at an additional intermediate stage between the “build” and “dev” stages(?).

I suppose you’re referring to drone build ls for getting that list of builds, although I’m not yet sure whether the --event or the --status option is the right one to use.

I think you’d have to use the --format option or use the API directly, possibly followed by string manipulation.

You should be able to run it anywhere before the dev stage - you just need to check somewhere if any running builds need to be stopped.
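For the "use the API directly" route, a rough sketch of the two requests involved might look like this. It rests on assumptions that should be checked against the Drone API documentation for your server version: that builds are listed with `GET /api/repos/{owner}/{repo}/builds`, that a build is stopped with `DELETE /api/repos/{owner}/{repo}/builds/{number}`, and that a personal token is sent as a bearer token. The server URL, repository, and token below are placeholders, and the helper names are hypothetical. The code only constructs the requests; nothing is sent.

```python
import urllib.request


def list_builds_request(server: str, owner: str, repo: str, token: str) -> urllib.request.Request:
    # Assumed endpoint for listing a repository's builds.
    url = f"{server.rstrip('/')}/api/repos/{owner}/{repo}/builds"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def stop_build_request(
    server: str, owner: str, repo: str, number: int, token: str
) -> urllib.request.Request:
    # Assumed endpoint for stopping (cancelling) a single build.
    url = f"{server.rstrip('/')}/api/repos/{owner}/{repo}/builds/{number}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="DELETE"
    )


# Placeholder values; in a pipeline these would come from secrets and
# from the environment variables Drone exposes to each step.
req = stop_build_request(
    "https://drone.example.com", "octocat", "hello-world", 2, "not-a-real-token"
)
print(req.get_method(), req.full_url)
# prints: DELETE https://drone.example.com/api/repos/octocat/hello-world/builds/2
```

Passing a request to `urllib.request.urlopen` would send it. The list response is expected to be JSON, which avoids the string manipulation that parsing CLI output would need.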

Makes sense. It would be nice to have something like GitHub Actions’ concurrency feature or Jenkins’ lockable resources + milestone features/plugins. Then again, this Drone CI approach wouldn’t be so different from the one I used for CodePipeline, where I actually wrote some logic to make it happen. Still, it would be good to have, if only to encourage teams to be more conscientious about WIP management in their pipelines.

Thanks again, @hwittenborn!

Just found this a second ago: How to limit build concurrency per project

No clue if it’ll kill jobs or just wait though.

Doesn’t look like it kills existing jobs as per this comment:

this feature focuses on limiting the number of pipelines that can execute concurrently. You are asking to automatically cancel older pending pipelines when a newer commit is received. These are two separate features.