[FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop #131612
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: natasha41575
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
There's a lot of async code and cross-component dependencies here, so I'm being extra careful in reviewing this and how we're approaching it.
pkg/kubelet/kubelet.go
Outdated
// updatePodResizeConditions checks if a pod resize is currently in progress, and sets
// the PodResizeInProgress condition accordingly. This returns the allocated pod.
func (kl *Kubelet) updatePodResizeConditions(pod *v1.Pod, podStatus *kubecontainer.PodStatus) *v1.Pod {
	allocatedPod, _ := kl.allocationManager.UpdatePodFromAllocation(pod)
I'm wondering if we should move this out of the sync loop entirely. Basically, the idea would be that the pod worker routine is only ever aware of the allocated pod. Ideally, most of the Kubelet should only ever operate on the allocated pod. Logically, there would be a component that ingests updates from the apiserver, queues resizes as needed, but otherwise swaps the pod to the allocated pod, and everything after that point just operates on the allocated pod. This is part of a much larger refactoring I'm working on, so I'm not sure how feasible this is in the short term. One simple option is to do the update in podWorkers.UpdatePod, which is responsible for queuing work.
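To make that simple option concrete, here is a rough sketch of what doing the swap in podWorkers.UpdatePod could look like; the `allocation` field and its wiring are assumptions for illustration, not existing code:

```go
// Hypothetical interface standing in for whatever would give podWorkers
// access to the allocation manager's UpdatePodFromAllocation.
type allocationUpdater interface {
	UpdatePodFromAllocation(pod *v1.Pod) (*v1.Pod, bool)
}

// Sketch: substitute the allocated pod at the point where work is queued,
// so the sync loop and everything downstream only ever see the allocated pod.
func (p *podWorkers) UpdatePod(options UpdatePodOptions) {
	if options.Pod != nil {
		// Returns a copy of the pod with allocated resources substituted in,
		// plus whether anything actually changed.
		if allocatedPod, updated := p.allocation.UpdatePodFromAllocation(options.Pod); updated {
			options.Pod = allocatedPod
		}
	}
	// ... existing logic: record options as the pending update and signal the pod worker ...
}
```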
I moved the call to UpdatePodFromAllocation to podWorkers.UpdatePod, to modify the pod in status.pendingUpdate.
I also took out the check for whether the pod resize is in progress from here - is it sufficient to only ever update the PodResizeInProgress condition in these two places:
- right after allocation in handlePodResourcesResize
- here (kubernetes/pkg/kubelet/kubelet.go, lines 2048 to 2054 in b98b86b):

if r.Action == kubecontainer.ResizePodInPlace {
	if r.Error == nil {
		// The pod was resized successfully, clear any pod resize errors in the PodResizeInProgress condition.
		kl.statusManager.SetPodResizeInProgressCondition(pod.UID, "", "", true)
	} else {
		kl.statusManager.SetPodResizeInProgressCondition(pod.UID, v1.PodReasonError, r.Message, false)
	}

Basically it would mean that the PodResizeInProgress condition gets set to true immediately after allocation (assuming allocated != actuated; otherwise we just leave it empty), and then after that will only be re-checked / re-updated after the allocation is attempted?
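For the first of those two places, a rough sketch of what setting the condition immediately after allocation could look like; the comparison helper and the final boolean argument are assumptions rather than the PR's actual code:

```go
// Hypothetical sketch inside handlePodResourcesResize, right after the
// resize has been allocated.
if !allocatedMatchesActuated(allocatedPod, podStatus) { // assumed helper comparing allocated vs. actuated resources
	// Empty reason and message: the resize is allocated and now in progress,
	// with nothing to report yet. The last argument mirrors the error-clearing
	// flag used in the snippet above (assumed false here, since there is no
	// prior error to clear).
	kl.statusManager.SetPodResizeInProgressCondition(pod.UID, "", "", false)
}
// If allocated == actuated, nothing is in flight and the condition is left
// unset, matching "otherwise we just leave it empty".
```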
func NewManager(
	checkpointDirectory string,
	statusManager status.Manager,
	podResizeMutex *sync.Mutex,
Passing mutexes is a code smell. The fact that this is necessary makes me think we might have the wrong abstraction here.
The other place this is needed is in HandlePodAdditions, but that's because adding a pod is actually an allocation action. I'm thinking that rather than sharing a mutex, we should just move the pod addition allocation logic into the allocation manager. In other words, add a method like AllocatePod (or AddPod, if you prefer), which does:
- lock mutex
- run admission check (or just the fit check, if we want to split that out from admission)
- update the pod allocation, or return a failure if the admission check failed
We might want to discuss this approach more offline.
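A minimal, self-contained sketch of that shape, using placeholder types for the admission check and the allocation state (the names are illustrative, not the actual allocation manager API):

```go
package allocation

import (
	"fmt"
	"sync"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// admitFunc stands in for the admission (or just the fit) check.
type admitFunc func(pod *v1.Pod) error

// manager is a stripped-down stand-in for the real allocation manager.
type manager struct {
	mu        sync.Mutex            // owned by the manager, so no mutex is passed around
	admit     admitFunc             // admission / fit check
	allocated map[types.UID]*v1.Pod // recorded allocations, keyed by pod UID
}

// AllocatePod runs the admission check and records the allocation under a
// single lock, so callers like HandlePodAdditions no longer need to share
// a mutex with the resize path.
func (m *manager) AllocatePod(pod *v1.Pod) error {
	m.mu.Lock()
	defer m.mu.Unlock()

	if err := m.admit(pod); err != nil {
		return fmt.Errorf("pod %q failed admission: %w", pod.Name, err)
	}
	m.allocated[pod.UID] = pod.DeepCopy()
	return nil
}
```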
@@ -2516,7 +2524,7 @@ func TestPodResourceAllocationReset(t *testing.T) {
 	expectedPodResourceInfoMap: state.PodResourceInfoMap{
 		"3": state.PodResourceInfo{
 			ContainerResources: map[string]v1.ResourceRequirements{
-				cpu800mMem800MPodSpec.Containers[0].Name: cpu800mMem800MPodSpec.Containers[0].Resources,
+				cpu800mMem800MPodSpec.Containers[0].Name: cpu500mMem500MPodSpec.Containers[0].Resources,
This unit test is changing because HandlePodAdditions now attempts the resize allocation (whereas it previously didn't).
@@ -71,6 +79,10 @@ type Manager interface {
	// TODO: See if we can remove this and just add them in the allocation manager constructor.
	AddPodAdmitHandlers(handlers lifecycle.PodAdmitHandlers)

	// SetContainerRuntime sets the allocation manager's container runtime.
	// TODO: See if we can remove this and just add it in the allocation manager constructor.
	SetContainerRuntime(runtime kubecontainer.Runtime)
same energy as #131801 (comment)
@natasha41575: The following test failed.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
pod = allocatedPod
pendingResizes = append(pendingResizes, pod)
Suggested change:
- pod = allocatedPod
- pendingResizes = append(pendingResizes, pod)
+ pendingResizes = append(pendingResizes, pod)
+ pod = allocatedPod
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This moves the in-place pod resize allocation logic out of the sync loop. This PR is organized into the following 4 commits:
- HandlePodResourcesResize unit tests and move them into the allocation package
- Move IsPodResizeInfeasible and IsPodResizeDeferred to the status_manager

The intention of this PR is to reattempt pending resizes:
Special notes for your reviewer
Intended follow-ups:
Which issue(s) this PR fixes:
Does not yet fix it, but this is part of #116971.
Does this PR introduce a user-facing change?
/sig node
/priority important-soon
/triage accepted
/cc @tallclair
TODO:
- retry deferred resizes in HandlePodCleanups (the pending resizes are retried against the pods from kl.podManager.GetPods(), and the pod manager is not updated in HandlePodCleanups, so I don't think retrying the pending resizes here is necessary)