So I just got Drone 1.0 working in Kubernetes-native mode in our cluster. I’m porting one of our pipelines from Drone 0.5. Everything seems to be working, except we are experiencing auth failures about 10-20% of the time on the publish step. Here is the publish step config:
- name: publish
  image: plugins/ecr
  settings:
    access_key:
      from_secret: aws-iam-access-key
    secret_key:
      from_secret: aws-iam-secret-key
    region: us-west-2
    repo: <repo>
    tags:
      - latest
  when:
    branch: master
As I mentioned, this works a majority of the time. However, 10-20% of the time we get auth failures from this publish step:
2019/03/29 18:19:32 error getting ECR auth: AccessDeniedException: User: arn:aws:sts::<account>:assumed-role/no-op/ca574b3c-no-op is not authorized to perform: ecr:GetAuthorizationToken on resource: * status code: 400, request id: 33039ea1-524f-11e9-82a0-9bba045bfb43
Note that this is trying to get credentials from the instance role. We use kube2iam in our cluster, which allows us to attach AWS IAM roles to pods. The default role is the no-op role mentioned in the error. That means that, for whatever reason, 10-20% of the time the ECR plugin ignores the access key and secret key specified in the config and instead falls back to the instance role, which doesn’t have ECR permissions (it has no permissions at all).
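One way to narrow this down might be a throwaway debug step that prints which AWS identity a step actually resolves when it is given the same two secrets. This is only a sketch: the step name and the image placeholder are assumptions (any image that ships the AWS CLI should do), and it assumes the secrets are exposed through the standard AWS environment variables rather than through plugin settings.

```yaml
# Hypothetical debug step, not part of the real pipeline.
- name: whoami
  image: <image-with-aws-cli>   # placeholder: any image that includes the AWS CLI
  environment:
    AWS_ACCESS_KEY_ID:
      from_secret: aws-iam-access-key
    AWS_SECRET_ACCESS_KEY:
      from_secret: aws-iam-secret-key
    AWS_DEFAULT_REGION: us-west-2
  commands:
    # Prints the ARN of whichever credentials the SDK chain picked up.
    - aws sts get-caller-identity
```

If this step also reports the no-op assumed role on some runs, that would suggest the secrets are intermittently not reaching the step at all, rather than the ECR plugin ignoring them.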
First, I’m wondering whether anybody knows what could be causing this to happen intermittently. Second, I’m wondering what kind of control we have over the pipeline pods that Drone creates. Normally in our Kubernetes cluster we try to solve AWS permissions issues like this with kube2iam. But since Drone creates the jobs, namespaces, and pods in the build pipeline, we don’t have access to those pods to give them the kube2iam annotations for IAM roles. As far as I’m aware, we can’t give them Kubernetes service accounts either. Or is there functionality I’m unaware of?
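For context, this is roughly what the kube2iam annotation looks like on a pod we control ourselves. It is a minimal sketch: the pod name, role name, and image are hypothetical placeholders, and only the `iam.amazonaws.com/role` annotation key comes from kube2iam itself.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                            # hypothetical name
  annotations:
    # kube2iam reads this annotation and serves credentials for the named role
    iam.amazonaws.com/role: ecr-publish-role   # hypothetical role name
spec:
  containers:
    - name: app
      image: <image>                           # placeholder
```

This annotation is what I don’t see a way to set on the pods Drone creates for each pipeline step.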