Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

#1

One root cause of this error is the Docker socket not being properly mounted into the agent. Here are some things to check:

  1. check that the agent is started with --volume=/var/run/docker.sock:/var/run/docker.sock
  2. make sure the mapping is correct
  3. make sure the socket exists on the host machine. Some Linux distributions, such as CentOS and CoreOS, may write the socket to a different location.
  4. make sure Drone does not start before the Docker socket is initialized
  5. if Docker restarts, you must restart the agent. Note that Docker can restart unexpectedly due to a panic (there are documented issues in the moby issue tracker)
  6. make sure you are running the latest Docker version (and the latest Drone version, for that matter). This avoids troubleshooting issues that have already been resolved.

Another common root cause is trying to use agents without properly configuring the server for multi-machine mode. You must explicitly configure the Drone server to use agents, otherwise the Drone server defaults to single-machine mode. When running in single-machine mode, the Drone server attempts to connect to the host machine's Docker socket to launch builds instead of delegating to agents. If you are trying to configure Drone with agents, please double-check your configuration and ensure you are passing DRONE_AGENTS_ENABLED=true to the server.
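
If you want to verify this from inside the server container, here is a rough sketch of a preflight check. It is a hypothetical helper, not Drone code: the function name agentsEnabled is made up, and parsing the value with strconv.ParseBool is an assumption about what counts as "true". It needs Go 1.18 or later for strings.Cut.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// agentsEnabled reports whether env (entries in KEY=VALUE form) sets
// DRONE_AGENTS_ENABLED to a true value. Using strconv.ParseBool is an
// assumption of this sketch, not taken from the Drone source.
func agentsEnabled(env []string) bool {
	for _, kv := range env {
		key, value, ok := strings.Cut(kv, "=")
		if !ok || key != "DRONE_AGENTS_ENABLED" {
			continue
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(value))
		return err == nil && enabled
	}
	return false
}

func main() {
	if !agentsEnabled(os.Environ()) {
		fmt.Fprintln(os.Stderr, "DRONE_AGENTS_ENABLED is not true: the server would default to single-machine mode")
		os.Exit(1)
	}
	fmt.Println("DRONE_AGENTS_ENABLED is true: builds are delegated to agents")
}
```

Remember that a running container keeps the environment it was created with, so the check only reflects a config change after the container has been recreated.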



#2

Thanks to all contributors for Drone.
I have drone/drone:1.0.0 and drone/agent:1.0.0 running via docker-compose on a Google Cloud n1-standard-1 (1 vCPU, 3.75 GB memory) Instance, 20GB free disk.
I’m looking for suggestions on how to diagnose the following:

  • The instance has been semi-usable for me for several days. On each GitHub repo push, a task is correctly created, but about 6 out of 8 times the task stops immediately with the “default: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?” message.
  • I click restart up to 10 times, until I see that the cloning begins. It seems I’m always able to make the [clone, postgres service, and the build steps] run, as expected, provided I’m patient enough with restarting.

Facts:

  • volume is mounted
  • socket is present:
    srw-rw---- 1 root docker 0 Mar 22 11:25 /var/run/docker.sock
  • uname -a:
    linux drone-runner-1 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux

Because I can always eventually get a successful build, configuration seems to be fine. So what causes the unreliability?

Does anyone see the same behaviour? What should I be looking at, to diagnose and resolve this?


#3

This error comes directly from the Docker Client source code. The client will throw the error if any of the following happens:

  1. there is a request timeout to the Docker daemon
  2. there is a connection refused error when trying to connect to the daemon (I think this error only occurs when using TCP)
  3. there is a “dial unix: no such file or directory” error

You can trace the error in the Docker client code here and here.

If we look at the Drone source code we see there is very little surface area. For example, below is the code used to connect to Docker [1]. At just 4 lines of code there is very little opportunity for error within the Drone codebase.

cli, err := docker.NewEnvClient()
if err != nil {
  return nil, err
}

I do recall one individual solved this problem by upgrading Docker to the very latest version. There are documented problems in the moby issue tracker with people getting this error [2]. And there are documented issues in the GitLab issue tracker where they have seen the same error with the GitLab runner [3].

So given that a) I cannot reproduce this problem locally or at cloud.drone.io and b) users of other systems (e.g. gitlab) are experiencing the problem, I am operating under the assumption that this is most likely an issue with Docker or perhaps a host machine configuration issue. I am ready to help if there are actionable improvements we can make to Drone, however, unless we identify an issue with Drone there is unfortunately little I can do on my end.

[1] https://github.com/drone/drone-runtime/blob/master/engine/docker/docker.go#L40:L46
[2] https://github.com/moby/moby/search?q=Cannot+connect+to+the+Docker+daemon+at&type=Issues
[3] https://gitlab.com/gitlab-org/gitlab-runner/issues/1986


#4

Thanks, Brad, for the precise and valued response.
I will follow your suggestions, and report progress here.
–r


#5

@topiaruss also I updated the original post to include this second common root cause. You might want to check to see if this applies to your installation.


#6

Bingo!
I did NOT have that flag set.
Adding it seems to have made an improvement. I’ll confirm later, after some more experience.
I needed to run docker-compose down, then up, so that the new environment variables were picked up. Then it seemed fine.
Thanks for coming back to this!
–r.


#7

Yes! My intermittent problem has gone, since I added:

DRONE_AGENTS_ENABLED=true

to the server environment.

Thanks again. Really enjoying Drone.
