I do remember reports of a race condition where, if the agent starts before Docker, the connection is never established. This happens if you use a Docker-in-Docker container in the same pod. So that is something worth looking into …
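If that race is the cause, one workaround is to hold the agent back until the Docker daemon accepts connections. Here is a minimal sketch of the idea, not something from Drone itself: the helper names `docker_ready` and `wait_until` are hypothetical, and it assumes the Docker-in-Docker sidecar listens on TCP port 2375 on localhost, which may not match your setup. An open port only shows that something is listening, so treat it as a rough readiness signal.

```python
import socket
import time


def docker_ready(host="localhost", port=2375, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Assumption: the Docker-in-Docker daemon is exposed over plain TCP here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check, attempts=30, delay=1.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        # No need to sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Exit non-zero if Docker never comes up, so a wrapper can refuse to start the agent.
    raise SystemExit(0 if wait_until(docker_ready) else 1)
```

Running this as a wrapper step before the agent starts would at least tell you whether startup ordering is the problem.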
At this point the debugging is pretty much just to scratch my own itch of getting it working. The Kubernetes integration works well, so I can run with that for now and look at getting some standalone agents set up later.
For more context, the reason I am not a huge fan of running agents on Kubernetes is that the Drone resource scheduler and the Kubernetes resource scheduler conflict with each other. Also, Kubernetes does weird things with networking and does not always work well with user-defined Docker networks. I have seen people waste a lot of time dealing with these issues.
The native Kubernetes runtime avoids many of these issues, but it is still very experimental. It runs quite well on some Kubernetes distros (DigitalOcean, Minikube), quite poorly on others (EKS, OpenShift), and in some cases (GKE) it varies depending on version and container runtime (containerd, Docker, etc.). Eventually this will be the recommended way to run pipelines on Kubernetes, and one day it may be the only way. I could definitely see disabling agents from running on Kubernetes entirely in favor of the native Kubernetes runtime.
On a side note, do you know if there is a way to set a TTL on the Kubernetes jobs so that they get cleaned up automatically after finishing?
Yes, you have to enable this feature gate: