We have a use case for a feature, and we are wondering whether the community would be open to it.
Several of our users have a step in their pipeline that fails for any number of reasons: maybe a timeout waiting for an external dependency, maybe a hiccup in network communication. Regardless of the reason, we were wondering how the community would feel about implementing a
retry feature for Drone. We think this is how it would appear:

    pipeline:
      foo:
        image: alpine
        retries: 5
        commands:
          - echo foo

By default, Drone does not retry any step (the native behavior), but when a
`retries` attribute is added, it will retry the step. We feel that, on the Drone side, sane limits should be set for the number of retries as well as for the wait time before each retry. However, a user would be able to override those (to an extent), like:

    pipeline:
      foo:
        image: alpine
        retries: 5
        retry_wait: 10
        commands:
          - echo foo

In the above example, we have explicitly set the step to be retried up to 5 times, and the agent will wait 10 seconds before each retry.
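
To make the intended semantics concrete, here is a minimal sketch of the retry loop, written in Python for brevity rather than in Drone's own Go. It assumes that `retries: 5` means up to 5 retries after the initial run (6 attempts at most) and that a step reports success with exit code 0. `run_with_retries`, `run_step`, and `make_flaky_step` are hypothetical names used only for illustration, not existing Drone APIs.

```python
import time


def run_with_retries(run_step, retries=0, retry_wait=0, sleep=time.sleep):
    """Run a step, retrying up to `retries` times after a failure.

    run_step:   hypothetical callable returning the step's exit code (0 = success).
    retries:    number of retries allowed after the initial run (assumed meaning).
    retry_wait: seconds to wait before each retry.
    sleep:      injectable so the loop can be exercised without real waiting.
    Returns a tuple (exit_code, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        exit_code = run_step()
        # Stop on success, or once the initial run plus all retries are used up.
        if exit_code == 0 or attempts > retries:
            return exit_code, attempts
        sleep(retry_wait)


def make_flaky_step(failures):
    """Build a fake step that fails `failures` times, then succeeds."""
    remaining = [failures]

    def step():
        if remaining[0] > 0:
            remaining[0] -= 1
            return 1  # simulated failure
        return 0  # simulated success

    return step
```

With the default `retries=0` the step runs exactly once, which matches the native no-retry behavior described above.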
To make Drone more configurable for administrators, we feel that each limit could be sourced from an environment variable like
`DRONE_RETRY_WAIT_LIMIT`. However, the Drone server would ship with its own default limits, so if the server is spun up without those variables set, it is still protected.
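
As a rough illustration of the fallback behavior, the sketch below (again in Python, not Drone's actual Go code) reads a limit from the environment and falls back to a built-in default when the variable is unset or unusable. `DRONE_RETRY_WAIT_LIMIT` is the name suggested above; `DRONE_RETRY_LIMIT`, the default values, and `limit_from_env` are assumptions made up for this example.

```python
import os

# Hypothetical built-in defaults; the proposal does not specify actual values.
DEFAULT_RETRY_LIMIT = 5
DEFAULT_RETRY_WAIT_LIMIT = 60  # seconds


def limit_from_env(name, default, environ=os.environ):
    """Return the integer limit stored in `name`, or `default` if unusable.

    The default is used when the variable is unset, empty, not an integer,
    or negative, so a misconfigured server is still protected.
    """
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= 0 else default


# DRONE_RETRY_LIMIT is an assumed name; only DRONE_RETRY_WAIT_LIMIT is proposed above.
retry_limit = limit_from_env("DRONE_RETRY_LIMIT", DEFAULT_RETRY_LIMIT)
retry_wait_limit = limit_from_env("DRONE_RETRY_WAIT_LIMIT", DEFAULT_RETRY_WAIT_LIMIT)
```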
Another consideration is what should happen if a user sets the retry count or retry wait period above the limits. We see two possible outcomes:
- The step would error out with a message notifying the user that they set an invalid value
- The step would just inherit the defaults from the environment variables
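
The two outcomes can be compared side by side in a short Python sketch. The `strict` flag, `RetryConfigError`, and `resolve_setting` are hypothetical names for illustration only, and the second outcome is interpreted here as falling back to the server-side limit.

```python
class RetryConfigError(ValueError):
    """Raised when a requested value exceeds the server-side limit."""


def resolve_setting(name, requested, limit, strict):
    """Resolve a user-requested retry setting against the server limit.

    strict=True  -> outcome 1: fail the step with an explanatory message.
    strict=False -> outcome 2: silently fall back to the server-side limit.
    """
    if requested <= limit:
        return requested
    if strict:
        raise RetryConfigError(
            f"{name}: {requested} exceeds the server limit of {limit}"
        )
    return limit
```

For example, `resolve_setting("retry_wait", 120, 60, strict=False)` returns `60`, while the same call with `strict=True` raises `RetryConfigError`. The first option is louder but easier to debug; the second keeps pipelines running at the cost of silently changing what the user asked for.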
Looking for comments, questions, concerns etc…
We’d be willing to do the PRs for the work! /cc @jmccann