Description
What happened?
When kubelet detects that it is under resource pressure, it first attempts soft evictions until the hard eviction threshold is reached. A soft-evicted pod is given up to the configured eviction max pod grace period to shut down, and until that pod has terminated, kubelet will not attempt to soft OR hard evict another pod, even if the hard eviction threshold is reached.
As a result, one pod taking a long time to shut down can cause kubelet to run out of resources. From this comment and this comment, this behavior seems to be by design.
In our case, we saw one soft eviction take 7 hours to complete, and meanwhile resource usage kept climbing without any automation trying to save the node. Had other pods been soft-evicted while this pod shut down, this would not have been an issue. Manual intervention prevented the node from reaching the hard eviction threshold, but had that not happened, the node would have been entirely exhausted with no automated action taken.
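For context, the soft and hard thresholds and the grace periods described above are set in the kubelet configuration. The fragment below is only an illustrative sketch: the field names come from the KubeletConfiguration API, but every value is an assumption and is not the configuration of the affected cluster.

```yaml
# Illustrative values only (assumptions), not the reporter's settings.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionSoft:
  nodefs.available: "15%"      # soft threshold: eviction starts after the grace period below
evictionSoftGracePeriod:
  nodefs.available: "1m"       # how long the soft threshold must hold before evicting
evictionHard:
  nodefs.available: "10%"      # hard threshold: eviction is expected without a grace period
evictionMaxPodGracePeriod: 60  # seconds; upper bound on pod grace period during soft eviction
```

With settings like these, the behavior reported here would mean that once one pod is being soft-evicted, crossing the 10% hard threshold triggers no further eviction until that pod has finished terminating.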
What did you expect to happen?
I would expect that kubelet would keep trying to soft evict other pods if one is taking a long time to shut down. Or at the very least, start hard evicting pods if the hard eviction threshold is reached. It could also hard-evict the pod that was soft-evicted but is taking a long time to shut down.
How can we reproduce it (as minimally and precisely as possible)?
- Create two pods that get scheduled to the same node that have emptyDir volumes and a prestop hook that just sleeps forever
- Start filling up those emptyDir volumes with dd until soft eviction threshold is reached
- Watch as kubelet soft-evicts one pod
- Continue filling up the emptyDir volumes with dd
- Kubelet will not evict (hard or soft) even as the resource is totally exhausted
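The steps above could be set up with a pod along the following lines. This is a minimal sketch, not the manifest used in the original incident: the pod name, node name, image, mount path, and grace period are all hypothetical and should be adapted to the cluster under test.

```yaml
# Hypothetical reproduction pod; create a second copy with a different name
# (for example evict-repro-b) pinned to the same node.
apiVersion: v1
kind: Pod
metadata:
  name: evict-repro-a
spec:
  nodeName: worker-1                    # assumption: replace with a node in your cluster
  terminationGracePeriodSeconds: 86400  # assumption: long enough that the preStop hook is not cut short
  containers:
  - name: filler
    image: busybox:1.36
    # Keep the container alive so dd can be run inside it later.
    command: ["sh", "-c", "while true; do sleep 3600; done"]
    lifecycle:
      preStop:
        exec:
          # preStop hook that never returns, so shutdown takes as long as the grace period allows.
          command: ["sh", "-c", "while true; do sleep 3600; done"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}
```

The emptyDir volume can then be filled from outside the pod, for example with `kubectl exec evict-repro-a -- dd if=/dev/zero of=/scratch/fill bs=1M count=1024`, repeated with different output file names and sizes until the node crosses its soft eviction threshold.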
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
Client Version: v1.27.11
Kustomize Version: v5.0.1
Server Version: v1.27.11
Cloud provider
OS version
AlmaLinux9/CentOS Stream 8
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)