Broken builds with Docker 19.03.13

Since this morning we have been seeing broken builds when using the gcr plugin:

latest: Pulling from plugins/gcr
Digest: sha256:0c2a1fc28da459cf6cb5a61e724750a5c56eac9d7c92cb1f68cc419cac6615e6
Status: Image is up to date for plugins/gcr:latest
+ /usr/local/bin/dockerd --data-root /var/lib/docker --host=unix:///var/run/docker.sock
Detected registry credentials
+ /usr/local/bin/docker version
Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:22:56 2020
 OS/Arch:           linux/amd64
 Experimental:      false
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
time="2020-11-18T11:16:30Z" level=fatal msg="exit status 1"

We are puzzled because other builds that do not rely on it are working fine.
I have ruled out the nodes as a cause (I re-created them), and we have not upgraded Drone recently.

Any suggestions on how to go about debugging this?


Looking more closely, the issue seems to be related to the docker version step, which fails to get the server version.
Running the plugin locally gives the same error:

$ docker run --rm -it  plugins/gcr
+ /usr/local/bin/dockerd --data-root /var/lib/docker --host=unix:///var/run/docker.sock
Registry credentials or Docker config not provided. Guest mode enabled.
+ /usr/local/bin/docker version
Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:22:56 2020
 OS/Arch:           linux/amd64
 Experimental:      false
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
FATA[0015] exit status 1    

So the cause is a new release of Docker that happened today:

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:30 2020
  OS/Arch:          linux/amd64
  Experimental:     false

The last successful run we had was on the previous version:

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:30:32 2020
  OS/Arch:          linux/amd64
  Experimental:     false

I’m also having this issue. Strangely, it doesn’t affect all projects, so I’ve been trying to troubleshoot the affected ones in particular. What changed in this latest Docker release that could cause this issue?

We reverted to the previous docker and containerd versions, but it is still not working.
Perhaps the upgrade brought in more changes, so simply reverting is not enough.

Strangely, I have noticed that the “plugins/gcr” plugin doesn’t seem to have a Dockerfile or a GitHub project in https://github.com/drone-plugins?q=gcr&type=&language=

@josh-stableprice which plugin is failing for you?

For us it only affects projects that use the plugins/gcr plugin, so you might have the same setup, and perhaps it’s working for the projects that don’t use it.

Well, the weird issue I’m having is that all four of my projects use plugins/ecr, which I’m led to believe uses plugins/docker under the hood. Two of them are continuing to work correctly, and the other two are not.

I believe this needs to be updated! It references the old version of Docker.

@bradrydzewski would someone be able to review this PR and help us unblock our builds?

I have tested my patch by building my own image, but I still can’t get it working.
I think we will be moving away from the DinD approach and back to explicit docker commands.
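For anyone considering the same switch, a pipeline step that runs explicit docker commands against the host daemon might look roughly like this. This is only a sketch: the image tag, registry path, and volume names are placeholders, and mounting the host socket requires the repository to be marked as trusted and has its own security trade-offs.

```yaml
kind: pipeline
type: docker
name: default

steps:
- name: build-and-push
  image: docker:19.03
  volumes:
  - name: docker-sock
    path: /var/run/docker.sock   # talk to the host daemon instead of DinD
  commands:
  - docker build -t gcr.io/my-project/my-image:${DRONE_COMMIT_SHA} .
  - docker push gcr.io/my-project/my-image:${DRONE_COMMIT_SHA}

volumes:
- name: docker-sock
  host:
    path: /var/run/docker.sock
```

Because the build talks to the host daemon, this sidesteps the embedded dockerd (and its iptables/privilege requirements) entirely, at the cost of sharing the host's image cache and daemon state across builds.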

Hey there :wave:

Did you try running the plugin with debug: true?
In our case, we had a lot of issues related to concurrent iptables access at docker daemon startup, which left the docker daemon unable to launch.
Running in debug mode might give you (and other people in this thread) a bit more insight :slight_smile:
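For reference, enabling that extra output on the plugin step looks roughly like this in the pipeline. This is a sketch: the repo and secret names are placeholders, and debug / launch_debug are the two flags discussed in this thread for verbose plugin and dockerd startup logging.

```yaml
steps:
- name: publish
  image: plugins/gcr
  settings:
    repo: my-project/my-image      # placeholder
    json_key:
      from_secret: gcr_json_key    # placeholder secret name
    debug: true                    # verbose plugin output
    launch_debug: true             # log the embedded dockerd startup
```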

I didn’t know we could do that, so I’ve just run with debug and launch_debug enabled:

+ /usr/local/bin/dockerd --data-root /var/lib/docker --host=unix:///var/run/docker.sock
time="2020-11-18T14:25:16.835823089Z" level=info msg="Starting up"
time="2020-11-18T14:25:16.837361836Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
time="2020-11-18T14:25:16.838277236Z" level=info msg="libcontainerd: started new containerd process" pid=41
time="2020-11-18T14:25:16.838394167Z" level=info msg="parsed scheme: \"unix\"" module=grpc
time="2020-11-18T14:25:16.838542708Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
time="2020-11-18T14:25:16.838587499Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
time="2020-11-18T14:25:16.838651109Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2020-11-18T14:25:16.855560640Z" level=info msg="starting containerd" revision=7ad184331fa3e55e52b890ea95e65ba581ae3429 version=v1.2.13 
time="2020-11-18T14:25:16.855798793Z" level=error msg="failed to change OOM score to -500" error="write /proc/41/oom_score_adj: permission denied" 
time="2020-11-18T14:25:16.856244167Z" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1 
time="2020-11-18T14:25:16.856445949Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1 
time="2020-11-18T14:25:16.856765173Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" 
time="2020-11-18T14:25:16.856796483Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." type=io.containerd.snapshotter.v1 
time="2020-11-18T14:25:16.859609004Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "ip: can't find device 'aufs'\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n": exit status 1" 
time="2020-11-18T14:25:16.859641884Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1 
time="2020-11-18T14:25:16.859802666Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1 
time="2020-11-18T14:25:16.859993317Z" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1 
time="2020-11-18T14:25:16.860306441Z" level=info msg="skip loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1 
time="2020-11-18T14:25:16.860336711Z" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." type=io.containerd.metadata.v1 
time="2020-11-18T14:25:16.860413642Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" 
time="2020-11-18T14:25:16.860460063Z" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" 
time="2020-11-18T14:25:16.860471413Z" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "ip: can't find device 'aufs'\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n": exit status 1" 
time="2020-11-18T14:25:16.864093361Z" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." type=io.containerd.differ.v1 
time="2020-11-18T14:25:16.864130681Z" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." type=io.containerd.gc.v1 
time="2020-11-18T14:25:16.864246923Z" level=info msg="loading plugin "io.containerd.service.v1.containers-service"..." type=io.containerd.service.v1 
time="2020-11-18T14:25:16.864270933Z" level=info msg="loading plugin "io.containerd.service.v1.content-service"..." type=io.containerd.service.v1 
time="2020-11-18T14:25:16.864338464Z" level=info msg="loading plugin "io.containerd.service.v1.diff-service"..." type=io.containerd.service.v1 
time="2020-11-18T14:25:16.864377634Z" level=info msg="loading plugin "io.containerd.service.v1.images-service"..." type=io.containerd.service.v1 
time="2020-11-18T14:25:16.864397144Z" level=info msg="loading plugin "io.containerd.service.v1.leases-service"..." type=io.containerd.service.v1 
time="2020-11-18T14:25:16.864491935Z" level=info msg="loading plugin "io.containerd.service.v1.namespaces-service"..." type=io.containerd.service.v1 
time="2020-11-18T14:25:16.864511906Z" level=info msg="loading plugin "io.containerd.service.v1.snapshots-service"..." type=io.containerd.service.v1 
time="2020-11-18T14:25:16.864597057Z" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." type=io.containerd.runtime.v1 
time="2020-11-18T14:25:16.864793759Z" level=info msg="loading plugin "io.containerd.runtime.v2.task"..." type=io.containerd.runtime.v2 
time="2020-11-18T14:25:16.864970750Z" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." type=io.containerd.monitor.v1 
time="2020-11-18T14:25:16.865439735Z" level=info msg="loading plugin "io.containerd.service.v1.tasks-service"..." type=io.containerd.service.v1 
time="2020-11-18T14:25:16.865483066Z" level=info msg="loading plugin "io.containerd.internal.v1.restart"..." type=io.containerd.internal.v1 
time="2020-11-18T14:25:16.865588917Z" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.865610567Z" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.865627207Z" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.865707848Z" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.865781689Z" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.865815430Z" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.865833049Z" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.865942381Z" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.866019402Z" level=info msg="loading plugin "io.containerd.internal.v1.opt"..." type=io.containerd.internal.v1 
time="2020-11-18T14:25:16.866321715Z" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.866363645Z" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.866442166Z" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.866471257Z" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." type=io.containerd.grpc.v1 
time="2020-11-18T14:25:16.866729760Z" level=info msg=serving... address="/var/run/docker/containerd/containerd-debug.sock" 
time="2020-11-18T14:25:16.866892470Z" level=info msg=serving... address="/var/run/docker/containerd/containerd.sock" 
time="2020-11-18T14:25:16.866917761Z" level=info msg="containerd successfully booted in 0.012009s" 
time="2020-11-18T14:25:16.874363311Z" level=info msg="parsed scheme: \"unix\"" module=grpc
time="2020-11-18T14:25:16.874400711Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
time="2020-11-18T14:25:16.874426581Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
time="2020-11-18T14:25:16.874644454Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2020-11-18T14:25:16.875551583Z" level=info msg="parsed scheme: \"unix\"" module=grpc
time="2020-11-18T14:25:16.875581073Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
time="2020-11-18T14:25:16.875602574Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
time="2020-11-18T14:25:16.875713505Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
time="2020-11-18T14:25:16.879748238Z" level=error msg="failed to mount overlay: operation not permitted" storage-driver=overlay2
time="2020-11-18T14:25:16.888010497Z" level=error msg="AUFS was not found in /proc/filesystems" storage-driver=aufs
time="2020-11-18T14:25:16.888856555Z" level=error msg="failed to mount overlay: operation not permitted" storage-driver=overlay
time="2020-11-18T14:25:16.888886695Z" level=error msg="Failed to built-in GetDriver graph devicemapper /var/lib/docker"
time="2020-11-18T14:25:16.907542584Z" level=info msg="Loading containers: start."
time="2020-11-18T14:25:16.912073582Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'\nbridge                172032  1 br_netfilter\nstp                    16384  1 bridge\nllc                    16384  2 bridge,stp\nipv6                  524288 1230 nf_conntrack_ipv6,nf_nat_ipv6,ip_vs,bridge,[permanent]\nip: can't find device 'br_netfilter'\nbr_netfilter           24576  0 \nbridge                172032  1 br_netfilter\nmodprobe: can't change directory to '/lib/modules': No such file or directory\n, error: exit status 1"
time="2020-11-18T14:25:16.912968823Z" level=warning msg="Running iptables --wait -t nat -L -n failed with message: `iptables v1.8.3 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)\nPerhaps iptables or your kernel needs to be upgraded.`, error: exit status 3"
time="2020-11-18T14:25:16.926797910Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
time="2020-11-18T14:25:16.927293215Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
time="2020-11-18T14:25:16.927298935Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.3 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.
 (exit status 3)
Detected registry credentials
+ /usr/local/bin/docker version
Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:22:56 2020
 OS/Arch:           linux/amd64
 Experimental:      false
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
time="2020-11-18T14:25:32Z" level=fatal msg="exit status 1"

Thanks for sharing the debug logs.

failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.3 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)

Are you sure you are running your plugin in privileged mode? :thinking:
I think plugins/gcr is not whitelisted as a privileged image by default, but you might be able to configure that on your runner, or just use privileged: true on your Drone step.
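In pipeline terms, that suggestion would look roughly like this (a sketch; the repo setting is a placeholder, and note that privileged: true only takes effect on repositories marked as trusted):

```yaml
steps:
- name: publish
  image: plugins/gcr
  privileged: true                 # lets the embedded dockerd create iptables chains
  settings:
    repo: my-project/my-image      # placeholder
```

Without the privileged flag, the embedded dockerd cannot manipulate the nat table, which matches the "Permission denied (you must be root)" iptables error in the debug logs above.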

This isn’t plugins/gcr, it’s plugins/ecr, which is whitelisted if I remember correctly.

Edit: So I can’t talk anymore in this thread because I’ve hit my “reply limit”… wtf…

Anyway, I’m running Drone on Kubernetes with the Kubernetes runner, and my base image for the Docker nodes is from October 7th, so I’m starting to wonder if that needs to be updated instead.


I’ve been running the gcr plugin for months and never had to whitelist it.
Do you think the new Docker upgrade changed things?

@josh-stableprice we changed our autoscaler to use

DRONE_AGENT_IMAGE: drone/drone-runner-docker:1.5.3

and everything started to work.

we believe it has something to do with:

FYI, I run everything in privileged mode and unfortunately it didn’t fix the issue for me.

Well… interestingly, it seems I’ll have to wait for this to be resolved upstream, as drone-runner-kube doesn’t appear to have had a versioned release in the last month that I can pin to.