feat: enable GPU resource overcommit for virtualized GPUs #132045
base: master
Modify IsOvercommitAllowed function to enable overcommitting NVIDIA GPU resources by allowing limits to exceed requests. This enables better utilization of GPU resources in clusters that use virtualization technologies like MPS, vGPU, MIG, or custom device plugins. Fixes kubernetes#132044 Addresses kubernetes#52757
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages. The list of commits with invalid commit messages:
@Kevinz857: The label(s) In response to this:
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the … label.
Welcome @Kevinz857!
Hi @Kevinz857. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with … Once the patch is verified, the new status will be reflected by the … label.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Kevinz857 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Support GPU resource overcommit for virtualized GPUs
What type of PR is this?
/kind feature
/area gpu
/area hw-accelerators
/sig node
What this PR does / why we need it
This PR enables GPU resource overcommit by modifying the `IsOvercommitAllowed` function to allow `nvidia.com/gpu` resources to be overcommitted, that is, to let a container's GPU limit exceed its request. Currently, GPU resources must have equal requests and limits, which prevents clusters from leveraging GPU virtualization technologies such as NVIDIA MPS, vGPU, and MIG, or device plugins like HAMi.
These technologies make it common to share a single physical GPU among multiple containers, especially for inference workloads. This change makes it possible to overcommit GPU resources similar to how CPU and memory resources can be overcommitted.
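The diff itself is not reproduced in this description, so the following is only a minimal sketch of what such a change to the helper might look like. The lowercase helper names, the `gpuResource` constant, and the exact native-resource and hugepages checks are assumptions written from scratch for illustration; they are not copied from the Kubernetes source tree or from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// gpuResource is the extended resource name this PR targets.
const gpuResource = "nvidia.com/gpu"

// isNativeResource is a hypothetical stand-in: it treats unqualified names
// (no "/") and names under "kubernetes.io/" as native resources.
func isNativeResource(name string) bool {
	return !strings.Contains(name, "/") || strings.Contains(name, "kubernetes.io/")
}

// isHugePageResource is a hypothetical stand-in for the hugepages check.
func isHugePageResource(name string) bool {
	return strings.HasPrefix(name, "hugepages-")
}

// isOvercommitAllowed reports whether a limit may exceed the request for the
// named resource. The first branch is the assumed new behavior: GPUs are
// special-cased, and every other resource keeps the existing rule.
func isOvercommitAllowed(name string) bool {
	if name == gpuResource {
		return true
	}
	return isNativeResource(name) && !isHugePageResource(name)
}

func main() {
	names := []string{"cpu", "memory", "hugepages-2Mi", "nvidia.com/gpu", "example.com/foo"}
	for _, name := range names {
		fmt.Printf("%-16s overcommit allowed: %v\n", name, isOvercommitAllowed(name))
	}
}
```

In this sketch only `nvidia.com/gpu` gains the new behavior; a different extended resource such as the made-up `example.com/foo` is still rejected.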
Which issue(s) this PR fixes
Fixes #132044 and addresses the long-standing feature request in #52757
Special notes for your reviewer
This is a targeted change that only affects the validation logic for GPU resources, allowing them to be overcommitted while maintaining the existing behavior for other resources. The implementation pattern follows the existing pattern for native resources.
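To illustrate the effect on validation, here is a simplified sketch of a request/limit check that consults an overcommit predicate. The `resources` type, `validateRequirements`, and the two policy functions are hypothetical and use plain integers rather than Kubernetes quantities; the real validation code is more involved.

```go
package main

import "fmt"

// resources maps a resource name to an integer quantity (simplified).
type resources map[string]int64

// validateRequirements rejects a request above its limit, and rejects a
// request that differs from its limit when overcommit is not allowed.
func validateRequirements(requests, limits resources, overcommitAllowed func(string) bool) error {
	for name, req := range requests {
		lim, ok := limits[name]
		if !ok {
			continue // no limit set for this resource; nothing to compare
		}
		if req > lim {
			return fmt.Errorf("%s: request %d must not exceed limit %d", name, req, lim)
		}
		if !overcommitAllowed(name) && req != lim {
			return fmt.Errorf("%s: request %d must equal limit %d", name, req, lim)
		}
	}
	return nil
}

func main() {
	requests := resources{"nvidia.com/gpu": 1}
	limits := resources{"nvidia.com/gpu": 2}

	// Assumed policy before the change: GPUs may not be overcommitted.
	before := func(name string) bool { return false }
	// Assumed policy after the change: only nvidia.com/gpu is permitted.
	after := func(name string) bool { return name == "nvidia.com/gpu" }

	fmt.Println("before:", validateRequirements(requests, limits, before))
	fmt.Println("after: ", validateRequirements(requests, limits, after))
}
```

Under these assumptions, a container requesting one GPU with a limit of two fails the check with the old predicate and passes with the new one, while a request larger than its limit is rejected either way.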
Does this PR introduce a user-facing change?