[FG:InPlacePodVerticalScaling] Move resize allocation logic out of the sync loop #131612
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: natasha41575
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
There's a lot of async code and cross-component dependencies here, so I'm being extra careful in reviewing this and how we're approaching it.
pkg/kubelet/kubelet.go
Outdated
// updatePodResizeConditions checks if a pod resize is currently in progress, and sets
// the PodResizeInProgress condition accordingly. This returns the allocated pod.
func (kl *Kubelet) updatePodResizeConditions(pod *v1.Pod, podStatus *kubecontainer.PodStatus) *v1.Pod {
	allocatedPod, _ := kl.allocationManager.UpdatePodFromAllocation(pod)
I'm wondering if we should move this out of the sync loop entirely. Basically, the idea would be that the pod worker routine is only ever aware of the allocated pod. Ideally, most of the Kubelet should only ever operate on the allocated pod. Logically, there would be a component that ingests updates from the apiserver, queues resizes as needed, but otherwise swaps the pod to the allocated pod, and everything after that point just operates on the allocated pod. This is part of a much larger refactoring I'm working on, so I'm not sure how feasible this is in the short term. One simple option is to do the update in podWorkers.UpdatePod, which is responsible for queuing work.
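To make that simple option concrete, here is a rough sketch of what doing the swap in podWorkers.UpdatePod could look like; the `allocation` field and its wiring are assumptions for illustration, not existing code:

```go
// Hypothetical interface standing in for whatever would give podWorkers
// access to the allocation manager's UpdatePodFromAllocation.
type allocationUpdater interface {
	UpdatePodFromAllocation(pod *v1.Pod) (*v1.Pod, bool)
}

// Sketch: substitute the allocated pod at the point where work is queued,
// so the sync loop and everything downstream only ever see the allocated pod.
func (p *podWorkers) UpdatePod(options UpdatePodOptions) {
	if options.Pod != nil {
		// Returns a copy of the pod with allocated resources substituted in,
		// plus whether anything actually changed.
		if allocatedPod, updated := p.allocation.UpdatePodFromAllocation(options.Pod); updated {
			options.Pod = allocatedPod
		}
	}
	// ... existing logic: record options as the pending update and signal the pod worker ...
}
```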
I moved the call to UpdatePodFromAllocation to podWorkers.UpdatePod, to modify the pod in status.pendingUpdate.
I also took out the check for whether the pod resize is in progress from here - is it sufficient to only ever update the PodResizeInProgress condition in these two places:
- right after allocation in handlePodResourcesResize
- here (kubernetes/pkg/kubelet/kubelet.go, lines 2048 to 2054 in b98b86b):

if r.Action == kubecontainer.ResizePodInPlace {
	if r.Error == nil {
		// The pod was resized successfully, clear any pod resize errors in the PodResizeInProgress condition.
		kl.statusManager.SetPodResizeInProgressCondition(pod.UID, "", "", true)
	} else {
		kl.statusManager.SetPodResizeInProgressCondition(pod.UID, v1.PodReasonError, r.Message, false)
	}

Basically it would mean that the PodResizeInProgress condition gets set to true immediately after allocation (assuming allocated != actuated; otherwise we just leave it empty), and then after that will only be re-checked / re-updated after the allocation is attempted?
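For the first of those two places, a rough sketch of what setting the condition immediately after allocation could look like; the comparison helper and the final boolean argument are assumptions rather than the PR's actual code:

```go
// Hypothetical sketch inside handlePodResourcesResize, right after the
// resize has been allocated.
if !allocatedMatchesActuated(allocatedPod, podStatus) { // assumed helper comparing allocated vs. actuated resources
	// Empty reason and message: the resize is allocated and now in progress,
	// with nothing to report yet. The last argument mirrors the error-clearing
	// flag used in the snippet above (assumed false here, since there is no
	// prior error to clear).
	kl.statusManager.SetPodResizeInProgressCondition(pod.UID, "", "", false)
}
// If allocated == actuated, nothing is in flight and the condition is left
// unset, matching "otherwise we just leave it empty".
```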
func NewManager(
	checkpointDirectory string,
	statusManager status.Manager,
	podResizeMutex *sync.Mutex,
Passing mutexes is a code smell. The fact that this is necessary makes me think we might have the wrong abstraction here.
The other place this is needed is in HandlePodAdditions, but that's because adding a pod is actually an allocation action. I'm thinking that rather than sharing a mutex, we should just move the pod addition allocation logic into the allocation manager. In other words, add a method like AllocatePod (or AddPod, if you prefer), which does:
- lock mutex
- run admission check (or just the fit check, if we want to split that out from admission)
- update the pod allocation, or return a failure if the admission check failed
We might want to discuss this approach more offline.
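A minimal, self-contained sketch of that shape, using placeholder types for the admission check and the allocation state (the names are illustrative, not the actual allocation manager API):

```go
package allocation

import (
	"fmt"
	"sync"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// admitFunc stands in for the admission (or just the fit) check.
type admitFunc func(pod *v1.Pod) error

// manager is a stripped-down stand-in for the real allocation manager.
type manager struct {
	mu        sync.Mutex            // owned by the manager, so no mutex is passed around
	admit     admitFunc             // admission / fit check
	allocated map[types.UID]*v1.Pod // recorded allocations, keyed by pod UID
}

// AllocatePod runs the admission check and records the allocation under a
// single lock, so callers like HandlePodAdditions no longer need to share
// a mutex with the resize path.
func (m *manager) AllocatePod(pod *v1.Pod) error {
	m.mu.Lock()
	defer m.mu.Unlock()

	if err := m.admit(pod); err != nil {
		return fmt.Errorf("pod %q failed admission: %w", pod.Name, err)
	}
	m.allocated[pod.UID] = pod.DeepCopy()
	return nil
}
```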
@@ -2516,7 +2524,7 @@ func TestPodResourceAllocationReset(t *testing.T) {
 	expectedPodResourceInfoMap: state.PodResourceInfoMap{
 		"3": state.PodResourceInfo{
 			ContainerResources: map[string]v1.ResourceRequirements{
-				cpu800mMem800MPodSpec.Containers[0].Name: cpu800mMem800MPodSpec.Containers[0].Resources,
+				cpu800mMem800MPodSpec.Containers[0].Name: cpu500mMem500MPodSpec.Containers[0].Resources,
This unit test is changing because HandlePodAdditions now attempts the resize allocation (whereas it previously didn't).
@@ -71,6 +79,10 @@ type Manager interface {
	// TODO: See if we can remove this and just add them in the allocation manager constructor.
	AddPodAdmitHandlers(handlers lifecycle.PodAdmitHandlers)

	// SetContainerRuntime sets the allocation manager's container runtime.
	// TODO: See if we can remove this and just add it in the allocation manager constructor.
	SetContainerRuntime(runtime kubecontainer.Runtime)
same energy as #131801 (comment)
@natasha41575: The following test failed.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
pod = allocatedPod
pendingResizes = append(pendingResizes, pod)
Suggested change:
- pod = allocatedPod
- pendingResizes = append(pendingResizes, pod)
+ pendingResizes = append(pendingResizes, pod)
+ pod = allocatedPod
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This moves the in-place pod resize allocation logic out of the sync loop. This PR is organized into the following 4 commits:
- HandlePodResourcesResize unit tests and move them into the allocation package
- Move IsPodResizeInfeasible and IsPodResizeDeferred to the status_manager

The intention of this PR is to reattempt pending resizes:
Special notes for your reviewer
Intended follow-ups:
Which issue(s) this PR fixes:
Does not yet fix it, but this is part of #116971.
Does this PR introduce a user-facing change?
/sig node
/priority important-soon
/triage accepted
/cc @tallclair
TODO:
- retry deferred resizes in HandlePodCleanups (the pending resizes are retried against the pods from kl.podManager.GetPods(), and the pod manager is not updated in HandlePodCleanups, so I don't think retrying the pending resizes here is necessary)