Description
What happened?
The static CPU manager can reserve exclusive full CPU cores for the init containers of (Guaranteed, integer-CPU) pods.
It supports re-using those reserved cores for the main containers, but it doesn't enforce (or even favour) that re-use, so a core can stay reserved for an init container while the main containers run on a different CPU set.
The example deployment provided below can lead to a cpu_manager_state like this:
$ jq . /var/lib/kubelet/cpu_manager_state
{
  "policyName": "static",
  "defaultCpuSet": "0",
  "entries": {
    "3ba83abb-7ceb-45dc-96ec-556fe1640954": {
      "init": "4",    # reserved yet not re-used (not in "1,5"), i.e. leaked CPU
      "main": "1,5"
    },
    "f2ee3f83-3839-4d19-aa2d-f1b0775b18ca": {
      "init": "2",
      "main": "2,6"
    },
    "f8436276-f7e3-4b90-8824-ccef1313df16": {
      "init": "3",
      "main": "3,7"
    }
  },
  "checksum": 3949393128
}
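In that state, CPUs 1-7 are exclusively reserved (7 cores), while the scheduler only accounts for 3 × 2 = 6 requested CPUs. A quick way to count the reserved set (a sketch assuming plain comma-separated cpusets, no ranges):

$ jq '[.entries[][] | split(",")[]] | unique | length' /var/lib/kubelet/cpu_manager_state
7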
The kubelet and the Kubernetes scheduler then disagree about the node's remaining CPU capacity. Pods may get scheduled onto that node and then be rejected with an UnexpectedAdmissionError (Pod Allocate failed due to not enough cpus available to satisfy request, which is unexpected).
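Such pods end up Failed with reason UnexpectedAdmissionError; assuming your kubectl version supports jsonpath filter expressions, something like this lists them:

$ kubectl get pods -o jsonpath='{.items[?(@.status.reason=="UnexpectedAdmissionError")].metadata.name}'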
What did you expect to happen?
The static CPU manager should reserve at most max(sum of the main containers' CPU requests, max over the init containers' CPU requests) per pod, matching the scheduler's accounting, so that subsequently scheduled pods can start.
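For the deployment below, that is max(2, 1) = 2 exclusive CPUs per pod, i.e. 6 CPUs for the three replicas, rather than the 7 shown above.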
How can we reproduce it (as minimally and precisely as possible)?
Update the deployment's nodeSelector below to target a node with cpuManagerPolicy: static, then inspect /var/lib/kubelet/cpu_manager_state for leaked "init" CPU cores (or schedule more pods on that node and watch them get rejected with an UnexpectedAdmissionError).
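One way to flag the leak directly, as a sketch that assumes plain comma-separated cpusets (no ranges) and the init/main container names from the manifest below:

$ jq -r '.entries | to_entries[]
    | select(.value.init and .value.main)
    | {pod: .key, leaked: ((.value.init | split(",")) - (.value.main | split(",")))}
    | select(.leaked | length > 0)
    | "\(.pod): leaked init CPUs \(.leaked | join(","))"' /var/lib/kubelet/cpu_manager_state

For the state above, this prints: 3ba83abb-7ceb-45dc-96ec-556fe1640954: leaked init CPUs 4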
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tdeploy
  name: tdeploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tdeploy
  template:
    metadata:
      labels:
        app: tdeploy
    spec:
      nodeSelector:
        kubernetes.io/hostname: TEST-NODE-NAME-GOES-HERE
      initContainers:
      - name: init
        image: kubernetes/pause:go
        resources:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 1
            memory: 1Gi
      containers:
      - name: main
        image: kubernetes/pause:go
        resources:
          limits:
            cpu: 2
            memory: 1Gi
          requests:
            cpu: 2
            memory: 1Gi
Anything else we need to know?
No response
Kubernetes version
$ kubectl version # also tested with a 1.25 kubelet
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.9-dd.1", GitCommit:"d9705166e190927de148edae148bf46471c7f8d5", GitTreeState:"clean", BuildDate:"2022-03-07T11:53:47Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)