I’m running Drone (kubernetes-native) on AWS EKS, with an autoscaler running in the cluster. The hope is that when CPU utilization rises, the autoscaler will trigger new nodes to be added, and jobs will run on the new nodes. At first glance, adding resources with CPU requests would get me what I need. However, some interrelated things seem to be thwarting me:
- pipelines are created with node affinity. Drone ‘sticks’ the pipeline steps to the same node as their services, which I read as the same node as their ‘drone-job-*’ pod
- job controllers are created without resource requests. The scheduler doesn’t see any CPU requests up front, so it places these pods seemingly anywhere, even on CPU-starved nodes.
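To make the second point concrete, here is a toy model of a request-based fit check. It is only a sketch of the general idea that placement looks at declared CPU requests rather than live usage; it is not the real kube-scheduler code, and the class names, node name, pod names, and numbers are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pod:
    name: str
    cpu_request_m: int = 0  # millicores; 0 models a pod created without requests


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int
    actual_cpu_usage_pct: int  # real load; deliberately ignored by fits() below
    pods: List[Pod] = field(default_factory=list)

    def requested_cpu_m(self) -> int:
        # Sum of declared requests of the pods already placed on this node.
        return sum(p.cpu_request_m for p in self.pods)


def fits(pod: Pod, node: Node) -> bool:
    """Request-based fit check: compares declared requests, never actual usage."""
    return node.requested_cpu_m() + pod.cpu_request_m <= node.allocatable_cpu_m


# Hypothetical node that is nearly saturated in practice but only half "requested".
busy = Node("node-a", allocatable_cpu_m=2000, actual_cpu_usage_pct=98)
busy.pods.append(Pod("existing", cpu_request_m=1000))

job_without_requests = Pod("drone-job-example")
job_with_requests = Pod("drone-job-example", cpu_request_m=1500)

print(fits(job_without_requests, busy))  # the request-less job still fits
print(fits(job_with_requests, busy))     # the job with a 1500m request does not
```

In this model the job pod without requests lands on the busy node regardless of its real load, which matches what I’m seeing.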
So the autoscaler does recognize that a step requires more CPU than is available, but it can’t scale up because the step is pinned to the same node by node affinity (I think):
Scale-up predicate failed: GeneralPredicates predicate mismatch, cannot put [...] on [...], reason: node(s) didn't match node selector
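My reading of that message, as a toy model: before adding a node, the autoscaler checks whether the pending pod could run on a fresh node from the node group, and a pod pinned to one existing node never can. This is a sketch under assumptions, not the cluster-autoscaler's actual code. In particular I am assuming the pin is a selector on the `kubernetes.io/hostname` label, which I have not confirmed, and every name and number below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

HOSTNAME_LABEL = "kubernetes.io/hostname"  # assumed pinning label, not confirmed


@dataclass
class PendingPod:
    name: str
    cpu_request_m: int
    node_selector: Dict[str, str] = field(default_factory=dict)


@dataclass
class NodeTemplate:
    """Stand-in for the node a node group would add on scale-up."""
    labels: Dict[str, str]
    allocatable_cpu_m: int


def scale_up_would_help(pod: PendingPod, template: NodeTemplate) -> Tuple[bool, str]:
    # Every selector entry must match the labels of the would-be new node.
    for key, wanted in pod.node_selector.items():
        if template.labels.get(key) != wanted:
            return False, "node(s) didn't match node selector"
    # Only then does the CPU request matter.
    if pod.cpu_request_m > template.allocatable_cpu_m:
        return False, "insufficient cpu"
    return True, ""


# A new node always gets its own hostname, so it cannot equal "node-a".
template = NodeTemplate(labels={HOSTNAME_LABEL: "new-node-from-template"},
                        allocatable_cpu_m=4000)
pinned = PendingPod("step-pinned", 1500, {HOSTNAME_LABEL: "node-a"})
unpinned = PendingPod("step-unpinned", 1500)

print(scale_up_would_help(pinned, template))
print(scale_up_would_help(unpinned, template))
```

If this model is right, the CPU request on the step is doing its job, and it is the pin to the original node that makes scale-up pointless from the autoscaler's point of view.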
Any guidance here is welcome. I’m going to continue to experiment but I’m running out of ideas.
Essentially, right now it feels like I want to get CPU requests on these