[FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop #131612


Open · wants to merge 5 commits into base: master
Conversation

@natasha41575 (Contributor) commented May 5, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This moves the in-place pod resize allocation logic out of the sync loop. This PR is organized into the following 5 commits:

  1. All the changes in #131801 (move pod admission and resize logic into the allocation manager); this PR is intended to be rebased once #131801 merges.
  2. Untangle the HandlePodResourcesResize unit tests and move them into the allocation package
  3. Add helper methods IsPodResizeInfeasible and IsPodResizeDeferred to the status_manager.
  4. Update the allocation_manager methods to hold all the control logic required for handling pod resize allocation + update the kubelet to no longer attempt to allocate pod resizes in the sync loop (and update unit tests accordingly).
  5. Update the allocation manager's unit tests to cover PushPendingResizes and RetryPendingResizes

The intention of this PR is to reattempt pending resizes:

  • whenever HandlePodAdditions or HandlePodUpdates receives a resize request that it didn't already have,
  • upon deletion of another pod,
  • upon the successful actuation of another resize,
  • or periodically. This PR sets a timer for every 3 minutes, but we should probably think about whether that is the right interval.

Special notes for your reviewer

Intended follow-ups:

  1. This PR is required for, but does not include, the implementation of prioritized resizes. The PR was already getting a bit too large to review, and the design for prioritized resizes is still pending (KEP-1287: Priority of Resize Requests, kubernetes/enhancements#5266). This change is also useful as a standalone improvement without prioritized resizes, but I left a TODO for it.
  2. Some cleanup (moving some unit tests around, unexporting functions that no longer need to be exported, removing code that is no longer needed, etc.). I left these out of this PR to keep the size down.

Which issue(s) this PR fixes:

Does not yet fix it, but this is part of #116971.

Does this PR introduce a user-facing change?

NONE

/sig node
/priority important-soon
/triage accepted
/cc @tallclair

TODO:

  • retry deferred resizes in HandlePodCleanups
    • I don't think anything in HandlePodCleanups affects the admission decision (though I could be wrong). The admission decision depends on the pod manager as the source of truth (via kl.podManager.GetPods), and the pod manager is not updated in HandlePodCleanups, so retrying the pending resizes here shouldn't be necessary.
  • double-check that the logic in HandlePodAdditions and HandlePodUpdates is correct (maybe add unit tests covering resize cases)
  • allocation manager unit tests
  • need to fix an issue where, even when the resize is deferred and not allocated or actuated, the pod status shows updated allocated and actual resources
  • sanity-check by running this e2e locally
  • there seems to be more latency than necessary in accepting a pending resize after another pod is scaled down to make room; I want to investigate this (though it doesn't necessarily have to be blocking)
  • rebase on #131801 (move pod admission and resize logic into the allocation manager)

@k8s-ci-robot k8s-ci-robot requested a review from tallclair May 5, 2025 16:33
@k8s-ci-robot (Contributor)

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/node Categorizes an issue or PR as relevant to SIG Node. labels May 5, 2025
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 5, 2025
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 5, 2025
@natasha41575 natasha41575 changed the title Move resize logic [FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop May 5, 2025
@natasha41575 natasha41575 moved this from Triage to Work in progress in SIG Node: code and documentation PRs May 5, 2025
@natasha41575 natasha41575 moved this from Triage to Archive-it in SIG Node CI/Test Board May 5, 2025
@natasha41575 natasha41575 force-pushed the move-resize-logic branch 3 times, most recently from 433aa61 to d52210d Compare May 6, 2025 21:19
@natasha41575 natasha41575 force-pushed the move-resize-logic branch 2 times, most recently from 9b16320 to 42ececf Compare May 7, 2025 20:33
@natasha41575 natasha41575 marked this pull request as ready for review May 7, 2025 21:30
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: natasha41575
Once this PR has been reviewed and has the lgtm label, please ask for approval from tallclair. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2025
@natasha41575 natasha41575 force-pushed the move-resize-logic branch 5 times, most recently from f58302c to 131ef41 Compare May 10, 2025 04:37
@natasha41575 natasha41575 changed the title [FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop [FG:InPlacePodVerticalScaling] WIP: Move resize allocation logic out of the sync loop May 12, 2025
@tallclair (Member) left a comment


There's a lot of async code & cross component dependencies here, so I'm being extra careful in reviewing this, and how we're approaching it.

    // updatePodResizeConditions checks if a pod resize is currently in progress, and sets
    // the PodResizeInProgress condition accordingly. This returns the allocated pod.
    func (kl *Kubelet) updatePodResizeConditions(pod *v1.Pod, podStatus *kubecontainer.PodStatus) *v1.Pod {
        allocatedPod, _ := kl.allocationManager.UpdatePodFromAllocation(pod)
Member

I'm wondering if we should move this out of the sync loop entirely. Basically the idea would be that the pod worker routine is only ever aware of the allocated pod. Ideally, most of the Kubelet should only ever operate on the allocated pod. Logically, there would be a component that ingests updates from the apiserver, queues resizes as needed, but otherwise swaps the pod to the allocated pod, and everything after that point just operates on the allocated pod. This is part of a much larger refactoring I'm working on, so I'm not sure how feasible this is in the short term. One simple option is to do the update in podWorkers.UpdatePod, which is responsible for queuing work.

@natasha41575 (Contributor, Author) commented May 20, 2025

I moved the call to UpdatePodFromAllocation to podWorkers.UpdatePod, to modify the pod in status.pendingUpdate.

I also took out the check for whether the pod resize is in progress from here - is it sufficient to only ever update the PodResizeInProgress condition in these two places:

  • right after allocation in handlePodResourcesResize
  • here:

        if r.Action == kubecontainer.ResizePodInPlace {
            if r.Error == nil {
                // The pod was resized successfully, clear any pod resize errors in the PodResizeInProgress condition.
                kl.statusManager.SetPodResizeInProgressCondition(pod.UID, "", "", true)
            } else {
                kl.statusManager.SetPodResizeInProgressCondition(pod.UID, v1.PodReasonError, r.Message, false)
            }
        }

Basically it would mean that the PodResizeInProgress condition gets set to true immediately after allocation (assuming allocated != actuated; otherwise we just leave it empty), and then after that will only be re-checked / re-updated after the allocation is attempted?

    func NewManager(
        checkpointDirectory string,
        statusManager status.Manager,
        podResizeMutex *sync.Mutex,
Member

Passing mutexes is a code smell. The fact that this is necessary makes me think we might have the wrong abstraction here.

The other place this is needed is in HandlePodAdditions, but that's because adding a pod is actually an allocation action. I'm thinking that rather than sharing a mutex, we should just move the pod addition allocation logic into the allocation manager. In other words, add a method like AllocatePod (or AddPod, if you prefer), which does:

  1. lock mutex
  2. run admission check (or just the fit check, if we want to split that out from admission)
  3. update the pod allocation, or return a failure if the admission check failed

We might want to discuss this approach more offline.

Contributor Author

Thanks, the offline discussion helped me understand what we're going for.

Moving the admission check into the allocation manager turned into its own little project, so I broke it off into another PR: #131801.

If we take #131801, I'll rebase this one on top of it.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 21, 2025

    @@ -2516,7 +2524,7 @@ func TestPodResourceAllocationReset(t *testing.T) {
        expectedPodResourceInfoMap: state.PodResourceInfoMap{
            "3": state.PodResourceInfo{
                ContainerResources: map[string]v1.ResourceRequirements{
    -               cpu800mMem800MPodSpec.Containers[0].Name: cpu800mMem800MPodSpec.Containers[0].Resources,
    +               cpu800mMem800MPodSpec.Containers[0].Name: cpu500mMem500MPodSpec.Containers[0].Resources,
Contributor Author

this unit test is changing because HandlePodAdditions now attempts the resize allocation (whereas it previously didn't)

    @@ -71,6 +79,10 @@ type Manager interface {
        // TODO: See if we can remove this and just add them in the allocation manager constructor.
        AddPodAdmitHandlers(handlers lifecycle.PodAdmitHandlers)

    +   // SetContainerRuntime sets the allocation manager's container runtime.
    +   // TODO: See if we can remove this and just add it in the allocation manager constructor.
    +   SetContainerRuntime(runtime kubecontainer.Runtime)
Contributor Author

same energy as #131801 (comment)

@natasha41575 natasha41575 requested a review from tallclair May 30, 2025 16:09
@natasha41575 natasha41575 changed the title [FG:InPlacePodVerticalScaling] WIP: Move resize allocation logic out of the sync loop [FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop May 30, 2025
@k8s-ci-robot (Contributor)

@natasha41575: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                           | Commit  | Details | Required | Rerun command
pull-kubernetes-unit-windows-master | 501626e | link    | false    | /test pull-kubernetes-unit-windows-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +2654 to +2655:

    pod = allocatedPod
    pendingResizes = append(pendingResizes, pod)

Member

Suggested change:

    -pod = allocatedPod
    -pendingResizes = append(pendingResizes, pod)
    +pendingResizes = append(pendingResizes, pod)
    +pod = allocatedPod
@SergeyKanzhelev SergeyKanzhelev moved this from Needs Approver to Needs Reviewer in SIG Node: code and documentation PRs May 31, 2025