I have a bit of a unique use case where I want to run a large number (thousands to tens of thousands) of Kubernetes Jobs at once. Each Job consists of a single container, with parallelism 1 and completions 1, and no sidecar or agent. My cluster has plenty of capacity for the resources I'm requesting.
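For reference, each Job is roughly this shape (a minimal sketch; the names, image, and resource requests are placeholders, not my actual workload):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: worker-        # placeholder name
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: example.com/worker:latest   # placeholder image
        resources:
          requests:
            cpu: "1"
            memory: 512Mi
```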

My problem is that the Job status is not transitioning to Complete for a significant period of time when I run many jobs concurrently.

My application submits Jobs and has a watcher on the namespace - as soon as a Job's status transitions to 'succeeded 1', we delete the Job and send information back to the application. The application needs this to happen as soon as possible in order to define and submit subsequent Jobs.
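The completion check itself is trivial; here is a sketch of the predicate we apply to each watch event, assuming status fields shaped like the batch/v1 JobStatus API (the dict form is an illustration, not our actual client code):

```python
def job_is_complete(status: dict) -> bool:
    """Return True once a Job's status shows one successful completion.

    Checks both the succeeded counter and the Complete condition,
    since the condition can lag behind the counter while the
    controller works through its backlog.
    """
    if status.get("succeeded", 0) >= 1:
        return True
    for cond in status.get("conditions") or []:
        if cond.get("type") == "Complete" and cond.get("status") == "True":
            return True
    return False
```

As soon as this returns True for a Job, we delete it and notify the application.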

I'm able to submit new Job requests as fast as I want, and Pod scheduling happens without delay, but beyond about one or two hundred concurrent Jobs I get significant delay between a Job's Pod completing and the Job's status updating to Complete. At only around 1,000 jobs in the cluster, it can easily take 5-10 minutes for a Job status to update.

This tells me there is some process in the Kubernetes Control Plane that needs more resources to process Pod completion events more rapidly, or a configuration option that enables it to process more tasks in parallel. However, my system monitoring tools have not yet been able to identify any Control Plane services that are maxing out their available resources while the cluster processes the backlog, and all other operations on the cluster appear to be normal.

My question is: where should I look for system resource or configuration bottlenecks? I don't know enough about Kubernetes internals to know which components are responsible for updating a Job's status.

1 Answer


After digging into the system for a while, I was able to resolve this issue by tuning the kube-controller-manager CLI flags to let it do more work in parallel.

As a correction to my original post: new Jobs also saw a delay before their Pod object was created. The scheduler was responsive, but it could take up to 90s for the Pod object to even exist so that it could be scheduled. The kube-controller-manager is responsible for creating the Pod object when you create a Job, and for updating the Job's status when the Pod completes.
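One way to see this delay is to compare a Job's creationTimestamp against its Pod's. A small sketch of the comparison, assuming RFC 3339 timestamps as the API server emits them:

```python
from datetime import datetime

def creation_lag_seconds(job_created: str, pod_created: str) -> float:
    """Seconds between a Job's creationTimestamp and its Pod's.

    Both arguments are RFC 3339 strings as returned by the API
    server, e.g. "2023-01-01T00:00:00Z".
    """
    fmt = "%Y-%m-%dT%H:%M:%S%z"
    job_ts = datetime.strptime(job_created.replace("Z", "+0000"), fmt)
    pod_ts = datetime.strptime(pod_created.replace("Z", "+0000"), fmt)
    return (pod_ts - job_ts).total_seconds()
```

In a healthy cluster this lag is sub-second; under my backlog it approached 90 seconds.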

I found docs on the flags here: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/

Specifically, I set:

--kube-api-qps=500
--kube-api-burst=1000
--concurrent-deployment-syncs=50
--concurrent-gc-syncs=50
--concurrent-rc-syncs=100
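On a kubeadm-provisioned cluster these go into the kube-controller-manager static Pod manifest (the path and surrounding fields shown here are the kubeadm defaults; adjust for your distribution):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (kubeadm default path)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --kube-api-qps=500
    - --kube-api-burst=1000
    - --concurrent-deployment-syncs=50
    - --concurrent-gc-syncs=50
    - --concurrent-rc-syncs=100
    # ...keep the existing flags unchanged...
```

The kubelet restarts the static Pod automatically when the manifest file changes.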

With those settings, I was able to handle 1,000 concurrent Jobs.