Soft eviction of pods with long grace periods blocks hard evictions when under resource pressure #123872

Open
@olyazavr

What happened?

When kubelet detects that it is under resource pressure, it first attempts soft evictions until the hard eviction threshold is reached. A soft-evicted pod is given up to the configured maximum pod grace period (in seconds) to shut down, and until that pod has shut down, kubelet will not attempt to soft- or hard-evict another pod, even if the hard eviction threshold is reached.

As a result, one pod taking a long time to shut down can cause kubelet to run out of resources. Judging from this comment and this comment, this behavior seems to be by design.
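To make the reported behavior concrete, here is a toy model of it. It is a sketch under stated assumptions, not kubelet's actual code: the names `Pod` and `simulate_blocking`, the tick-based clock, and all threshold numbers are hypothetical. The only property it encodes is the one described above: while one eviction is in flight, no other eviction (soft or hard) is started.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    shutdown_ticks: int  # ticks the pod needs to terminate once evicted (hypothetical unit)


def simulate_blocking(pods, growth, soft, hard, capacity, ticks):
    """Toy model: at most one eviction in flight, and it blocks all others.

    Usage only grows here; the model tracks who gets evicted and when,
    not how much is reclaimed afterwards.
    """
    usage = 0
    candidates = list(pods)
    in_flight = None       # pod currently shutting down, if any
    ticks_left = 0
    started = []           # names of pods whose eviction was started
    blocked_at_hard = 0    # ticks spent at/above the hard threshold while blocked
    for _ in range(ticks):
        usage = min(capacity, usage + growth)
        if in_flight is not None:
            ticks_left -= 1
            if ticks_left > 0:
                # Still waiting for the in-flight pod: nothing else is evicted,
                # even if the hard threshold has been crossed.
                if usage >= hard:
                    blocked_at_hard += 1
                continue
            in_flight = None
        if usage >= soft and candidates:
            in_flight = candidates.pop(0)
            ticks_left = in_flight.shutdown_ticks
            started.append(in_flight.name)
    return usage, started, blocked_at_hard


pods = [Pod("pod-a", shutdown_ticks=1000), Pod("pod-b", shutdown_ticks=1)]
usage, started, blocked = simulate_blocking(
    pods, growth=10, soft=50, hard=80, capacity=100, ticks=20
)
print(usage, started, blocked)
```

In this run only `pod-a` is ever evicted, and usage reaches capacity while `pod-b` is left untouched, which mirrors the situation described in this report.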

In our case, one soft eviction took 7 hours to complete. Meanwhile, resource usage kept climbing without any automation trying to save the node. Had other pods been soft-evicted while this pod shut down, this would not have been an issue. Manual intervention kept the node from reaching the hard eviction threshold; without it, the node would have been entirely exhausted with no automated action taken.

What did you expect to happen?

I would expect kubelet to keep trying to soft-evict other pods if one is taking a long time to shut down, or at the very least to start hard-evicting pods once the hard eviction threshold is reached. It could also hard-evict the pod that was soft-evicted but is taking a long time to shut down.
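The expectations above could be sketched as the following toy policy. This is only an illustration of the requested behavior, not a proposed kubelet patch: `simulate_expected`, the tick-based clock, and the thresholds are all hypothetical. It keeps soft-evicting further pods while above the soft threshold, and once the hard threshold is crossed it hard-evicts both any pod stuck in graceful shutdown and the next candidate.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    shutdown_ticks: int  # ticks the pod needs for a graceful shutdown (hypothetical unit)


def simulate_expected(pods, growth, soft, hard, capacity, ticks):
    """Toy model of the behavior requested in this issue."""
    usage = 0
    candidates = list(pods)
    terminating = {}   # pod name -> ticks left in graceful shutdown
    soft_evicted = []
    hard_evicted = []
    for _ in range(ticks):
        usage = min(capacity, usage + growth)
        # Advance graceful shutdowns that are already in progress.
        for name in list(terminating):
            terminating[name] -= 1
            if terminating[name] <= 0:
                del terminating[name]
        if usage >= hard:
            # Hard-evict pods that were soft-evicted but are still shutting down.
            for name in list(terminating):
                hard_evicted.append(name)
                del terminating[name]
            # Hard-evict the next candidate as well, if there is one.
            if candidates:
                hard_evicted.append(candidates.pop(0).name)
        elif usage >= soft and candidates:
            # A slow shutdown does not block soft-evicting another pod.
            pod = candidates.pop(0)
            terminating[pod.name] = pod.shutdown_ticks
            soft_evicted.append(pod.name)
    return soft_evicted, hard_evicted


pods = [Pod("pod-a", shutdown_ticks=1000), Pod("pod-b", shutdown_ticks=1)]
soft_evicted, hard_evicted = simulate_expected(
    pods, growth=10, soft=50, hard=80, capacity=100, ticks=20
)
print(soft_evicted, hard_evicted)
```

With the same two pods, `pod-b` is soft-evicted while `pod-a` is still shutting down, and `pod-a` is hard-evicted once the hard threshold is reached.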

How can we reproduce it (as minimally and precisely as possible)?

  1. Create two pods, scheduled to the same node, that each have an emptyDir volume and a preStop hook that sleeps forever.
  2. Fill the emptyDir volumes with dd until the soft eviction threshold is reached.
  3. Watch kubelet soft-evict one pod.
  4. Continue filling the emptyDir volumes with dd.
  5. Kubelet does not evict any other pod (hard or soft), even as the resource is totally exhausted.
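The pods from step 1 could be generated with a small script like the one below. It is a minimal sketch, not the exact manifests used in this report: the pod names, the node name `my-node`, the `busybox` image, the `/scratch` mount path, and the 24-hour `terminationGracePeriodSeconds` are all assumptions. The node's soft eviction thresholds and maximum eviction grace period are configured separately on kubelet and are not covered here.

```python
import json

# Shell loop used both as the main process and as the preStop hook,
# so the hook never returns on its own.
SLEEP_FOREVER = ["sh", "-c", "while true; do sleep 3600; done"]


def stuck_pod_manifest(name, node_name, grace_seconds=86400):
    """Build a Pod with an emptyDir volume and a preStop hook that never returns."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeName": node_name,  # pin both pods to the same node (assumed name)
            "terminationGracePeriodSeconds": grace_seconds,  # assumed value
            "containers": [
                {
                    "name": "filler",
                    "image": "busybox",  # assumed image; any image with sh and dd works
                    "command": SLEEP_FOREVER,
                    "lifecycle": {"preStop": {"exec": {"command": SLEEP_FOREVER}}},
                    "volumeMounts": [{"name": "scratch", "mountPath": "/scratch"}],
                }
            ],
            "volumes": [{"name": "scratch", "emptyDir": {}}],
        },
    }


manifests = [stuck_pod_manifest(f"stuck-{i}", "my-node") for i in (1, 2)]
# kubectl accepts JSON as well as YAML, so the output can be applied directly.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The output can be piped into `kubectl apply -f -`. The volumes can then be filled from inside each pod, for example with `kubectl exec stuck-1 -- dd if=/dev/zero of=/scratch/fill bs=1M count=1024`, where the file name and size are placeholders to adjust to the node's thresholds.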

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: v1.27.11
Kustomize Version: v5.0.1
Server Version: v1.27.11

Cloud provider

aws

OS version

AlmaLinux9/CentOS Stream 8

Install tools

Container runtime (CRI) and version (if applicable)

cri-o 1.27.0 and containerd 1.6.21 (we use a mix of both)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
    sig/node: Categorizes an issue or PR as relevant to SIG Node.
    triage/accepted: Indicates an issue or PR is ready to be actively worked on.

    Type

    No type

    Projects

    Status

    Triaged

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
