I’m not sure whether this is a problem with the runner or something larger in my cluster, but over the last month or two these failures have become more frequent. Using the drone-kube-runner, a step will sometimes fail during a pipeline run, usually with no output at all, or with the only output being something like:
unable to retrieve container logs for docker://226be60fecae14fa720572ccae3062f530f1d71c9a1dbf67d9b10029b9458c0d
A restart will usually pass, but lately it has taken several attempts, because on a retry a different step will sometimes fail in the same way. I have logging turned up on both the main Drone process and the drone-kube-runner, but I don’t see any notable errors in the logs. I was using
drone/drone-runner-kube:latest until today when I switched to
drone/drone-runner-kube:1.0.0-beta.3, but I still experienced the issue with that version. The main Drone version is 1.7.0.
I’m not seeing anything helpful in the kube logs (kube-apiserver, kube-scheduler, kube-controller-manager, or even just docker), but it’s a bit like trying to find a needle in the proverbial haystack. I’m not really sure where to go from here. This problem only started occurring after switching from the legacy Kubernetes runner to the drone-kube-runner. It’s running on the same Kubernetes cluster.
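To make the haystack a bit smaller, one option is to pull the container ID out of the error line and filter the collected logs for it. The following is only a rough sketch: the function names are hypothetical, it assumes the logs have already been dumped to plain text lines, and it is not part of Drone or the runner.

```python
import re

# Matches the container reference in the error line, e.g. docker://<hex id>.
CONTAINER_REF = re.compile(r"docker://([0-9a-f]+)")


def extract_container_id(error_line):
    """Return the container ID found in the error line, or None if absent."""
    match = CONTAINER_REF.search(error_line)
    return match.group(1) if match else None


def find_mentions(log_lines, container_id, prefix_len=12):
    """Return the log lines that mention the container ID.

    Some tools print only a short prefix of the ID, so the search uses
    the first prefix_len characters rather than the full ID.
    """
    short_id = container_id[:prefix_len]
    return [line for line in log_lines if short_id in line]
```

For example, passing the error line above to `extract_container_id` and then the result to `find_mentions` along with the lines of a dumped docker or kubelet log would return only the lines that refer to that container.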