Steps failing in Kubernetes Native


#4

I get this error (kill signal) only when I use a .drone.yaml containing two pipelines

Can you provide an example of a pipeline that I can use to try and reproduce? Thanks!

About the network issue you mentioned, I’m not sure; I’ve been using Jenkins in “k8s native mode” and never seen any issues.

I’m not sure if or how Jenkins does internal RPC. Perhaps they do not use long-lived connections, so it may or may not be comparable to Drone.


#5

The Jenkins kubernetes plugin allows users to define pods, rather than jobs. One of the containers in the pod is an agent. Agents communicate with the primary using a protocol they developed, called JNLP. Agents can also be made to communicate over ssh, but this is discouraged with containers. I believe that JNLP/ssh connections remain active until the agent terminates or some other network event terminates the connection.
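
For anyone unfamiliar with the plugin, here is a rough sketch of the kind of pod it creates; the jnlp container name is a plugin convention, while the pod name, image tags, and build container below are purely illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: jenkins-agent           # illustrative name
spec:
  containers:
  # the plugin injects a JNLP agent container that dials back to the primary
  - name: jnlp
    image: jenkins/jnlp-slave   # illustrative; the plugin's default agent image
  # user-defined build containers run alongside the agent in the same pod
  - name: build
    image: node
    command: ["cat"]
    tty: true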


#6

The pipelines are very simple.

Two pipelines in .drone.yaml

One pipeline in .drone.yaml

I can now confirm that the server + agent model works quite well on my k8s.

I will be using this approach for now; once drone v1 gets open sourced I will debug it further.

Thanks


#7

Same thing here with Drone version 1.0.0-rc.2

chdir to cwd ("/drone/src") set in config.json failed: no such file or directory

Vinicius, which Drone version are you using now?

Is it the same .drone.yaml for server + agent?


#8

Hi,

I’m still using version 1.0.0-rc.2, as I need the DRONE_YAML_ENDPOINT setting for my projects.
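
For context, this is roughly how I pass that setting to the server; the endpoint URL below is a placeholder, not my real configuration:

# excerpt from my drone server deployment spec
env:
- name: DRONE_YAML_ENDPOINT
  value: http://drone-config-plugin:3000   # placeholder URL for the config extension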

Yes, I tested the same .drone.yaml for server+agent.

Googling this error shows other projects having the same problem, and it usually seems to be a permissions issue, though I can’t figure it out.


#9

I will be using this approach for now; once drone v1 gets open sourced I will debug it further.

The Kubernetes source code is available, and I have provided a guide for testing, debugging and contributing here: Contributing to Drone for Kubernetes. Note that I’m testing with Digital Ocean’s Managed Kubernetes and with Docker for Mac Kubernetes, and am unable to reproduce the errors mentioned in this thread. Perhaps it is distro specific? It would be helpful to include details about your cluster (GKE, EKS, version, etc.).

chdir to cwd ("/drone/src") set in config.json failed: no such file or directory

The /drone/src directory is a host volume. Perhaps there is a problem mounting host machine volumes in your cluster?


#10

I wanted to quickly post some notes from the gitter channel (so that they don’t get lost)

Initially I thought this might be a network issue (based on the error message), but the error message itself seems like a symptom of a different issue: builds are immediately failing because the working directory is not being created automatically on some clusters (more on how I came to this conclusion below). The result is the following error:

chdir to cwd ("/drone/src") set in config.json failed: no such file or directory


So first of all, I want to note that this does not impact all Drone users or clusters. I am testing with Digital Ocean’s hosted Kubernetes offering and with Kubernetes on Docker for Mac, and I am not able to reproduce the issue. This leads me to believe it may be a vendor-specific problem (so be sure to post details about your cluster, including version and hosting provider).

Initially I thought perhaps it was a permission issue and that host machine volumes were not being mounted. One of the community members in the gitter channel, who is getting the same error described in this thread, was able to confirm that volumes are in fact being created and mounted:

@brandom: I took a look and my nodes have successfully created folders under /tmp/drone on the host, like /tmp/drone/70bpibaxq8slzssqnlh9yrlyijq561pk/70bpibaxq8slzssqnlh9yrlyijq561pk/

The main difference we found was that in my installation the src directory is automatically created in the volume (below). For individuals getting errors, the src directory was not automatically being created.

$ ls -la /tmp/drone/b7b6svgx50xgg0zc383jead2s0hm8uym/b7b6svgx50xgg0zc383jead2s0hm8uym
total 0
drwxr-xr-x   3 bradrydzewski  wheel  102 Dec 12 15:54 .
drwxr-xr-x   3 bradrydzewski  wheel  102 Dec 12 15:54 ..
drwxr-xr-x  21 bradrydzewski  wheel  714 Dec 12 15:54 src

This does not make sense, because Docker automatically creates the working directory if it does not exist. We can verify this behavior with the below command.

$ docker run -t -i -w /foo/bar/baz alpine:3.8 /bin/sh
/foo/bar/baz # pwd
/foo/bar/baz

This has me wondering why it would behave differently for some clusters but not others. The only thing I can think of is that some Kubernetes clusters use containerd as the runtime engine instead of Docker. Maybe this could explain the difference in behavior? Can anyone confirm or disprove this theory?
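
For anyone who wants to check: kubectl can show the runtime per node, either via the wide output or a jsonpath query that prints just the runtime version:

$ kubectl get nodes -o wide
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'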

Unfortunately, because I cannot reproduce this problem, it is very difficult to triage. I therefore will need assistance from those experiencing the issue to dig a bit deeper and help me understand why the working directory is not being automatically created inside the volume.

Hopefully we can get to the bottom of this soon 🙂


#11

I will leave here my setup

Provider: Joyent Triton
Kubernetes: 1.10.0
Docker: 17.03.2-ce

I’ve run the command docker run -t -i -w /foo/bar/baz alpine:3.8 /bin/sh on one of the nodes, and it did indeed create the directory.


#12

Hey,

I ran into the same problem on Google Kubernetes Engine and found the cause (or so I think).

When a job is created for step execution, a hostPath volume is generated:

volumes:
- hostPath:
    path: /tmp/drone/hpgos05dyccgek67jss7c9dhxzat5n0d/hpgos05dyccgek67jss7c9dhxzat5n0d
    type: DirectoryOrCreate
  name: zl45xudziwd0abhxcs2azu3p7tyjn0sg

and mounted:

volumeMounts:
- mountPath: /tmp
  name: zl45xudziwd0abhxcs2azu3p7tyjn0sg
workingDir: /tmp/src

The workingDir is also set to workspace.base/workspace.path from the .drone.yml (or the default value).
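
To illustrate the mapping (the /drone/src default here is inferred from the error messages earlier in this thread):

workspace:
  base: /drone   # mounted into the step container as a volume
  path: src      # appended to base
# resulting container workingDir: /drone/src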

The problem is that some container engines (like the one on GKE) do not create the workingDir path if it doesn’t exist. The following error is thrown in that case:
Error: failed to start container "test": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "chdir to cwd (\"/tmp/src\") set in config.json failed: no such file or directory".

As a short-term workaround, you can change the .drone.yml to use only the base, like this (with path ".", the workingDir equals the base, which the volume mount already creates):

kind: pipeline
name: default
workspace:
  base: /tmp
  path: "."
steps:
- name: build
  image: node
  commands:
  - npm -v

To test whether a runtime has this problem, I created the following pod spec:

apiVersion: v1
kind: Pod
metadata: 
  name: "test"
spec: 
  containers:
  - name: test
    image: node
    volumeMounts:
    - mountPath: /tmp
      name: test-volume
    workingDir: "/tmp/src"
    command: 
    - "ls"
    - "/tmp"
    - "-R"
  volumes:
  - hostPath:
      path: /tmp/drone/6r3esqa72x2oksehxaen0iol53gme7nl/6r3esqa72x2oksehxaen0iol53gme7nl
      type: DirectoryOrCreate
    name: test-volume

On a local cluster it works without problems:

NAME                 STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION     CONTAINER-RUNTIME
docker-for-desktop   Ready     master    57d       v1.10.3   <none>        Docker for Mac   4.9.125-linuxkit   docker://18.9.0

On the following GKE cluster it produces the observed error:

NAME                                      STATUS    ROLES     AGE       VERSION         EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
gke-drone-io-default-pool-e6f774f8-7dc3   Ready     <none>    2h        v1.10.9-gke.5   <redacted>       Container-Optimized OS from Google   4.14.65+         docker://17.3.2
gke-drone-io-default-pool-e6f774f8-r7rk   Ready     <none>    2h        v1.10.9-gke.5   <redacted>       Container-Optimized OS from Google   4.14.65+         docker://17.3.2
gke-drone-io-default-pool-e6f774f8-rgq2   Ready     <none>    2h        v1.10.9-gke.5   <redacted>       Container-Optimized OS from Google   4.14.65+         docker://17.3.2

If you need anything more, please let me know.

Interestingly, docker run -t -i -w /foo/bar/baz alpine:3.8 /bin/sh works on both setups.


#13

@Tobias is this because GKE is using CRI-O instead of Docker? I’m trying to figure out why this works on some engines but not others.


#14

As far as I know (and as kubectl get nodes -o wide shows), GKE is using Docker as the runtime as well. The only thing I can imagine is that GKE is using a custom Kubernetes version (v1.10.9-gke.5) which handles the provisioning of this kind of volume somehow differently.
Another idea is that maybe the container engine doesn’t have enough permissions to create the path, thus resulting in the error.

Tomorrow I want to check a few things and try to narrow the problem down, if possible.


#15

@Tobias thanks for helping debug! I am also going to try setting the base to /drone and the path to / by default, and see if that causes any breaking changes or compatibility problems. If not, we can use this as the default configuration, which should work with Kubernetes out-of-the-box.
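
In .drone.yml terms, my reading of that proposed default would be something like this (a sketch of the suggestion, not a committed change):

workspace:
  base: /drone   # the mounted volume itself
  path: /        # workingDir equals base, so no extra directory needs creating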


#16

I was able to narrow it down.
TL;DR: it is fixed in Docker 18.09.x, at least.

As GKE deploys K8s with Docker 17.03.2-ce (build f5ec1e2), the problem occurs there. I assume all the bigger KaaS providers will have this issue.

The problem is that, in these versions, Docker does not create the workingDir inside a mounted volume, so the chdir into it fails.
To simulate this, the following command can be used:
docker run -t -i -w /foo/bar/baz -v /tmp/test:/foo/bar alpine:3.8 /bin/sh
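
On an affected Docker version this fails with an error along these lines (exact wording varies between versions; this mirrors the errors quoted earlier in the thread):

docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "chdir to cwd (\"/foo/bar/baz\") set in config.json failed: no such file or directory".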

I started looking through the Docker source but haven’t found the cause yet.

My three options to solve this problem would be:

  • Use a preflight container which creates the path (may cost too much time)

  • Require users to run newer Docker versions (maybe not feasible for KaaS solutions; on GKE you can’t get a newer version)

  • Double mount the host volume, like this: docker run -t -i -w /foo/bar/baz -v /tmp/test:/foo/bar -v /tmp/test/baz:/foo/bar/baz alpine:3.8 /bin/sh This forces Docker to create both paths before switching to the workingDir. The same can be done in K8s with DirectoryOrCreate and two volumes and mounts (maybe you don’t even need the second mount, just a hostPath volume).

Anyway, I hope this helps you resolve it; I’m looking forward to a fix.
Unfortunately I don’t know Go / drone-runtime well enough to make an informed choice here.


#17

Awesome! It is great to know the root cause of the problem. I will try to patch the default behavior of drone in the short term, and hopefully as the hosted platforms (GKE, etc.) upgrade their Docker versions the problem will go away completely.


#18

I also tried adding the “double mount” to a pod spec file, and now it works as intended.
For reference (note the two host paths):

apiVersion: v1
kind: Pod
metadata: 
  name: "test"
spec: 
  containers:
  - name: test
    image: node
    volumeMounts:
    - mountPath: /var/tmp/tobias
      name: test-volume
      readOnly: false
    workingDir: "/var/tmp/tobias/src"
    command: 
    - "bash"
    - "-c"
    args: ["ls /var/tmp/tobias -Rlh && sleep 10000"]
    securityContext: 
      privileged: true
  volumes:
  - hostPath:
      path: /var/tmp/tobias/drone/6r3esqa72x2oksehxaen0iol53gme7nl/6r3esqa72x2oksehxaen0iol53gme7nl
      type: DirectoryOrCreate
    name: test-volume
  - hostPath:
      path: /var/tmp/tobias/drone/6r3esqa72x2oksehxaen0iol53gme7nl/6r3esqa72x2oksehxaen0iol53gme7nl/src
      type: DirectoryOrCreate
    name: dummy-volume-to-ensure-workspace-path-exists

#19

Thanks Tobias


#20

Thanks Tobias, this is great. I’m experiencing this on GKE 1.10.7-gke.6, failing on the repo clone step:

container_linux.go:247: starting container process caused "chdir to cwd (\"/drone/src\") set in config.json failed: no such file or directory"

Going to upgrade the cluster to 1.10.7-gke.11 and will report back.


#21

1.10.7-gke.11 did not work


#22

Tobias mentions this is fixed in more recent versions of Docker. He tested with 18.09 and I tested locally with 18.06.1. What version of Docker is 1.10.7-gke.1 running? I think at this point it would be helpful to narrow down the minimum version of Docker that is supported.


#23

Yeah, correct. I just upgraded again to v1.11.5-gke.5, which is docker://17.3.2. No result. I’m going to attempt to change the node image type from Container-Optimized to Ubuntu… long shot.
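
For anyone else trying this, the image type is set per node pool, so it means creating a new pool; the pool and cluster names below are placeholders:

$ gcloud container node-pools create ubuntu-pool \
    --cluster my-cluster \
    --image-type=UBUNTU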