Description
What happened?
The static CPU manager can reserve exclusive full CPU cores for the init containers of (Guaranteed, integer-CPU) pods.
It supports re-using those reserved cores for the main containers, but it doesn't enforce (or even favour) that re-use, so a core can stay reserved for an init container while the main containers run on a different CPU set.
The example deployment provided below can lead to a cpu_manager_state like this:
$ jq . /var/lib/kubelet/cpu_manager_state
{
  "policyName": "static",
  "defaultCpuSet": "0",
  "entries": {
    "3ba83abb-7ceb-45dc-96ec-556fe1640954": {
      "init": "4",    # reserved yet not re-used (not in "1,5"), i.e. leaked CPU
      "main": "1,5"
    },
    "f2ee3f83-3839-4d19-aa2d-f1b0775b18ca": {
      "init": "2",
      "main": "2,6"
    },
    "f8436276-f7e3-4b90-8824-ccef1313df16": {
      "init": "3",
      "main": "3,7"
    }
  },
  "checksum": 3949393128
}
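In that state, CPUs 1-7 are exclusively reserved (7 cores), while the scheduler only accounts for 3 × 2 = 6 requested CPUs. A quick way to count the reserved set (a sketch assuming plain comma-separated cpusets, no ranges):

$ jq '[.entries[][] | split(",")[]] | unique | length' /var/lib/kubelet/cpu_manager_state
7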
The kubelet and the Kubernetes scheduler then disagree about the node's remaining CPU capacity. Pods may get scheduled onto that node and then be rejected with an UnexpectedAdmissionError (Pod Allocate failed due to not enough cpus available to satisfy request, which is unexpected).
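Such pods end up Failed with reason UnexpectedAdmissionError; assuming your kubectl version supports jsonpath filter expressions, something like this lists them:

$ kubectl get pods -o jsonpath='{.items[?(@.status.reason=="UnexpectedAdmissionError")].metadata.name}'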
What did you expect to happen?
The static CPU manager should reserve at most max(sum of the main containers' CPU requests, max over the init containers' CPU requests) per pod, matching the scheduler's accounting, so that subsequently scheduled pods can start.
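For the deployment below, that is max(2, 1) = 2 exclusive CPUs per pod, i.e. 6 CPUs for the three replicas, rather than the 7 shown above.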
How can we reproduce it (as minimally and precisely as possible)?
Update the deployment's nodeSelector below to target a node with cpuManagerPolicy: static, then inspect /var/lib/kubelet/cpu_manager_state for leaked "init" CPU cores (or schedule more pods on that node and watch them get rejected with an UnexpectedAdmissionError).
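One way to flag the leak directly, as a sketch that assumes plain comma-separated cpusets (no ranges) and the init/main container names from the manifest below:

$ jq -r '.entries | to_entries[]
    | select(.value.init and .value.main)
    | {pod: .key, leaked: ((.value.init | split(",")) - (.value.main | split(",")))}
    | select(.leaked | length > 0)
    | "\(.pod): leaked init CPUs \(.leaked | join(","))"' /var/lib/kubelet/cpu_manager_state

For the state above, this prints: 3ba83abb-7ceb-45dc-96ec-556fe1640954: leaked init CPUs 4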
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tdeploy
  name: tdeploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tdeploy
  template:
    metadata:
      labels:
        app: tdeploy
    spec:
      nodeSelector:
        kubernetes.io/hostname: TEST-NODE-NAME-GOES-HERE
      initContainers:
      - name: init
        image: kubernetes/pause:go
        resources:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 1
            memory: 1Gi
      containers:
      - name: main
        image: kubernetes/pause:go
        resources:
          limits:
            cpu: 2
            memory: 1Gi
          requests:
            cpu: 2
            memory: 1Gi
Anything else we need to know?
No response
Kubernetes version
$ kubectl version # also tested with a 1.25 kubelet
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.9-dd.1", GitCommit:"d9705166e190927de148edae148bf46471c7f8d5", GitTreeState:"clean", BuildDate:"2022-03-07T11:53:47Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)