We’ve had a number of problems with hanging builds: builds that still show a “running” status (in some parts of the UI at least) long after they should have timed out.
I’m looking at one now that started over 29 hours ago, and the build page shows it as running. It is not in the “recent builds” panel, but it is in
I suspect this happens when the runner process exits unexpectedly. I’ve found old containers on nodes from long-abandoned builds (usually service containers, I think).
My best way to find these is to look at /api/queue and filter for old builds. I then have to scan repos and builds for the correct ID, since the queue doesn’t give me the numbers the rest of the API endpoints need. Then I can look at them and press the cancel button. It’s a bit of an involved script, and I’d rather not need to depend on it.
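For illustration, here is a minimal sketch of just the "filter for old builds" step. It assumes the /api/queue response has already been fetched and parsed into a list of dicts, and that each entry carries a Unix-timestamp field; the field name `started`, the function name, and the sample entries are all hypothetical, so adjust them to whatever your server actually returns.

```python
from typing import Any, Dict, Iterable, List


def find_stale_queue_items(
    items: Iterable[Dict[str, Any]],
    now: int,
    max_age_seconds: int,
    started_key: str = "started",  # assumed field name, not confirmed
) -> List[Dict[str, Any]]:
    """Return queue entries that started more than max_age_seconds ago."""
    stale = []
    for item in items:
        started = item.get(started_key)
        # Skip entries with no start time (missing or zero): they have
        # not begun running, so they cannot be hung yet.
        if not started:
            continue
        if now - started > max_age_seconds:
            stale.append(item)
    return stale


# Hypothetical sample data: one entry 29 hours old, one 10 minutes old.
NOW = 1_700_000_000
SAMPLE_QUEUE = [
    {"id": 1, "started": NOW - 29 * 3600},
    {"id": 2, "started": NOW - 600},
]
STALE = find_stale_queue_items(SAMPLE_QUEUE, now=NOW, max_age_seconds=3600)
```

With a one-hour cutoff, only the 29-hour-old entry is selected. This does not cover the harder part, which is mapping a queue entry back to the repo and build number that the other endpoints need.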
Is this kind of zombie reaping something the Drone server should be doing?