[WIP][FG:InPlacePodVerticalScaling] Fix Static CPU management policy alongside InPlacePodVerticalScaling #129719


Open — wants to merge 8 commits into master from policy_static
Conversation


@esotsal (Contributor) commented Jan 20, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

This change is needed to resize the allocated CPUs of a Guaranteed QoS class Pod with integer CPU requests, without a restart, when the static CPU management policy is used alongside InPlacePodVerticalScaling.

The current PR retains the CPUs "promised" to the container when it started.

Which issue(s) this PR fixes:

The issue is mentioned in the last bullet of https://kubernetes.io/blog/2023/05/12/in-place-pod-resize-alpha/#known-issues.
I wasn't able to find a Kubernetes issue, so I am posting this PR following the "I found it, I fix it" ethos after a discussion in Slack.

Update: Attempt to fix #127262

Special notes for your reviewer:

[Update] Latest demo screencast (14th April 2025):

Screencast.from.2025-04-14.10-02-45.webm

This PR replaces #123319; this was needed due to a company transfer.

This PR also includes a merge from esotsal#9, to test the proposals and review in one common PR. Thanks @Chunxia202410 for the contributions.

  • A proposal for CPU allocated strategy when Pod scale up and down
  • ~~Add mustKeepCPUs interface~~ (Update 14th April 2025: replaced with a local checkpoint solution, as per sig-node meeting decisions)

Does this PR introduce a user-facing change?

Fixed an issue where static CPU management policy would not work alongside in-place
vertical scaling of a Pod.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 20, 2025
@k8s-ci-robot k8s-ci-robot added area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 20, 2025

esotsal commented Jan 20, 2025

@Chunxia202410, @hshiina, @ffromani, @AnishShah, @tallclair, @SergeyKanzhelev, @vinaykul: I moved the old PR here due to a company transfer; now I can continue working on this. I will prioritize updating the commit with the recent changes in InPlacePodVerticalScaling and the proposals shared by Chunxia202410.


linux-foundation-easycla bot commented Feb 5, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 5, 2025

esotsal commented Feb 5, 2025

Thanks @Chunxia202410 for the contributions; I merged your two PRs to check the tests. I have some questions about the second PR regarding the strategy; I thought it best to discuss them here. I will ask later this week, as I need to do some more tests.

Thanks for the API PR; it seems to be one of the options discussed.

I will try to update the tests tomorrow or by the end of this week, to make sure the test coverage is in place and passing before the Beta in v1.33, so we have a point of reference.

We will need to raise the final solution (or solutions) in SIG Node after v1.33 is finished and this PR has been refactored against the ongoing InPlacePodVerticalScaling KEP changes.

Thanks again for your PRs.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 5, 2025

esotsal commented May 2, 2025

/retest

@esotsal esotsal force-pushed the policy_static branch 3 times, most recently from e6f5d5a to dc25a5a Compare May 5, 2025 13:09

esotsal commented May 5, 2025

> how much of this work is intrinsically tied to VPA vs how much can be extracted into a separate PR (even a KEP?), for example:
>
> 1. changing the cpumanager logic to extend allocation, taking the existing cpuset into account as a (pseudo?) affinity hint
> 2. changing the checkpoint layout to take into account a base set + extension set(s) and compute an "effective cpu set"
> 3. redesigning the cpu accumulator in general

Hi @ffromani and @kad, please take another look at this PR. I have followed the promised local-solution approach, as requested during the sig-node meeting. Francesco, I don't think it adds value to split this into the three issues listed above, and I am not sure whether a KEP update is needed for this fix, but I leave that to you and kad to decide.

PS: I have kept the non-API parts of @Chunxia202410's 14df31c commit for the moment, but depending on the #129719 (comment) decision I might revert it fully; in that case I will revert and then squash the commits together to make rebase and review easier.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2025
@@ -438,7 +562,7 @@ func (p *staticPolicy) allocateCPUs(s state.State, numCPUs int, numaAffinity bit
 		numAlignedToAlloc = numCPUs
 	}

-	allocatedCPUs, err := p.takeByTopology(alignedCPUs, numAlignedToAlloc)
+	allocatedCPUs, err := p.takeByTopology(alignedCPUs, numAlignedToAlloc, reusableCPUsForResize, mustKeepCPUsForResize)

Is there a possibility that mustKeepCPUsForResize (and reusableCPUsForResize?) is not a subset of alignedCPUs? It may be worth adding a check and throwing an error if it is not; otherwise we might end up leaking CPUs.
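A sketch of the guard suggested here, with hypothetical helper names and plain int slices standing in for cpuset.CPUSet (the real code could simply use cpuset.CPUSet.IsSubsetOf):

```go
package main

import "fmt"

// subset reports whether every CPU in a is also in b; a tiny stand-in
// for cpuset.CPUSet.IsSubsetOf.
func subset(a, b []int) bool {
	in := map[int]bool{}
	for _, c := range b {
		in[c] = true
	}
	for _, c := range a {
		if !in[c] {
			return false
		}
	}
	return true
}

// validateResizeSets is a hypothetical guard for the concern above: fail
// fast if mustKeepCPUsForResize or reusableCPUsForResize strays outside
// alignedCPUs, instead of silently leaking CPUs later.
func validateResizeSets(aligned, mustKeep, reusable []int) error {
	if !subset(mustKeep, aligned) {
		return fmt.Errorf("mustKeepCPUsForResize %v is not a subset of aligned CPUs %v", mustKeep, aligned)
	}
	if !subset(reusable, aligned) {
		return fmt.Errorf("reusableCPUsForResize %v is not a subset of aligned CPUs %v", reusable, aligned)
	}
	return nil
}

func main() {
	fmt.Println(validateResizeSets([]int{0, 1, 2, 3}, []int{1, 2}, []int{3})) // → <nil>
	fmt.Println(validateResizeSets([]int{0, 1}, []int{5}, nil))               // → error
}
```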

 	klog.InfoS("AllocateCPUs", "numCPUs", numCPUs, "socket", numaAffinity)

-	allocatableCPUs := p.GetAvailableCPUs(s).Union(reusableCPUs)
+	allocatableCPUs := cpuset.New()

Maybe worth exploring an alternate approach here.

Instead of combining everything (reusableCPUsForResize, available CPUs) into allocatableCPUs and then attempting allocation, we could take a more tiered approach:

  1. Attempt allocation from mustKeepCPUsForResize.
  2. Any remainder is allocated from reusableCPUsForResize.
  3. Then from NUMA-aligned CPUs.
  4. Then from other NUMA nodes.

With this approach, would the takeByTopology() function not have to change?

klog.InfoS("Regenerating TopologyHints for CPUs already allocated", "pod", klog.KObj(pod), "containerName", container.Name)
return map[string][]topologymanager.TopologyHint{
string(v1.ResourceCPU): p.generateCPUTopologyHints(allocated, cpuset.CPUSet{}, requested),
}
}

I am still not sure this is handled.
Do we throw an error if mustKeepCPUsForResize is not used during Allocate()? mustKeepCPUsForResize might not align with the new NUMA hint we generate here.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 13, 2025
esotsal and others added 8 commits May 16, 2025 15:05
Use new topology.Allocation struct (a CPU set plus
alignment metadata) instead of CPU set, due to rebase.

Remove a duplicate, unnecessary SetDefaultCPUSet call, as per
review comment.
- Revert introduction of API env mustKeepCPUs
- Replace mustKeepCPUs with a local checkpoint "promised"
- Introduce "promised" in the CPUManagerCheckpointV3 format
- Add logic, refactor with Beta candidate
- Fix lint issues
- TODO improve/align resize tests; go through testing, corner cases
- TODO improve CPUManagerCheckpointV3 tests
- TODO address code review/feedback
- TODO check init containers
- TODO check migration from v2 to v3 CPU Manager checkpoint
- TODO check kubectl failure when prohibited; can this be done earlier?
- TODO update CPU Manager tests to use the refactored cpu_manager_test
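A hedged sketch of what the v2-to-v3 checkpoint migration mentioned in the TODOs could look like. The struct fields are simplified stand-ins for the real CPUManagerCheckpoint types in pkg/kubelet/cm/cpumanager/state (which also carry PolicyName and a Checksum); the "promised" field follows the commit message above, and the seeding strategy is an assumption.

```go
package main

import "fmt"

// checkpointV2 is a simplified stand-in for the v2 CPU Manager checkpoint.
type checkpointV2 struct {
	DefaultCPUSet string
	Entries       map[string]map[string]string // pod UID -> container -> cpuset string
}

// checkpointV3 adds a "promised" map so the kubelet can restore the CPUs
// promised at container start after a restart (field name assumed).
type checkpointV3 struct {
	DefaultCPUSet string
	Entries       map[string]map[string]string
	Promised      map[string]map[string]string
}

// migrateV2toV3 seeds "promised" from the current assignments, since a v2
// checkpoint has no record of what was originally promised.
func migrateV2toV3(v2 checkpointV2) checkpointV3 {
	promised := make(map[string]map[string]string, len(v2.Entries))
	for pod, ctrs := range v2.Entries {
		promised[pod] = make(map[string]string, len(ctrs))
		for ctr, cpus := range ctrs {
			promised[pod][ctr] = cpus
		}
	}
	return checkpointV3{DefaultCPUSet: v2.DefaultCPUSet, Entries: v2.Entries, Promised: promised}
}

func main() {
	v2 := checkpointV2{
		DefaultCPUSet: "0-1,4-15",
		Entries:       map[string]map[string]string{"pod-a": {"ctr": "2-3"}},
	}
	v3 := migrateV2toV3(v2)
	fmt.Println(v3.Promised["pod-a"]["ctr"]) // → 2-3
}
```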

k8s-ci-robot commented May 16, 2025

@esotsal: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-e2e-capz-windows-master | 9c411d0 | link | false | /test pull-kubernetes-e2e-capz-windows-master |
| pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers | 9c411d0 | link | false | /test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers |
| pull-kubernetes-node-kubelet-serial-crio-cgroupv2 | 9c411d0 | link | false | /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2 |
| pull-kubernetes-node-kubelet-serial-podresources | 9c411d0 | link | false | /test pull-kubernetes-node-kubelet-serial-podresources |
| pull-kubernetes-node-kubelet-serial-containerd-alpha-features | 9c411d0 | link | false | /test pull-kubernetes-node-kubelet-serial-containerd-alpha-features |
| pull-kubernetes-node-kubelet-serial-containerd | 9c411d0 | link | false | /test pull-kubernetes-node-kubelet-serial-containerd |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: Needs Triage
Status: Issues - In progress
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[FG:InPlacePodVerticalScaling] Static CPU management policy alongside InPlacePodVerticalScaling
7 participants