Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Remove LRU cache from ResourceQuota admission plugin #129998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
Loading
from

Conversation

AwesomePatrol
Copy link
Contributor

@AwesomePatrol AwesomePatrol commented Feb 6, 2025

/kind bug
/kind regression

What this PR does / why we need it:

This PR mitigates the problem of high latency caused by consistent reads made in Admission/Validation of ResourceQuota. LISTs made with Informer provide sufficient consistency guarantees. More details are available in the linked bug report.

Which issue(s) this PR fixes:

Issue #129931

Special notes for your reviewer:

Tests in resource_access_test focused on caching and avoiding duplicated LIST requests. As I deleted the code, the tests became obsolete. GetQuotas is still covered by other tests in the module.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/cc @jpbetz @serathius @liggitt @deads2k

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Feb 6, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 6, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @AwesomePatrol. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 6, 2025
@liggitt
Copy link
Member

liggitt commented Feb 6, 2025

/ok-to-test

Update the release note to indicate this resolves a performance regression of write requests related to the ConsistentListFromCache feature gate, enabled in 1.31+ by default.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. release-note-none Denotes a PR that doesn't merit a release note. labels Feb 6, 2025
@liggitt
Copy link
Member

liggitt commented Feb 6, 2025

/lgtm
/approve

/hold for ack from second api-machinery reviewer (@deads2k?)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 6, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 6, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 9d0008aa89cc2d2ff081a4fe7307992a4b66601a

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2025
@deads2k
Copy link
Contributor

deads2k commented Feb 6, 2025

/hold for ack from second api-machinery reviewer (@deads2k?)

lgtm

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 7, 2025
@AwesomePatrol AwesomePatrol marked this pull request as draft February 12, 2025 12:33
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 12, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 12, 2025
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 13, 2025
@AwesomePatrol AwesomePatrol changed the title Remove LRU cache from ResourceQuota admission plugin Remove LRU caches from ResourceQuota/LimitRanger admission plugins Feb 13, 2025
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Feb 13, 2025
@AwesomePatrol AwesomePatrol marked this pull request as ready for review February 14, 2025 07:52
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 14, 2025
plugin/pkg/admission/limitranger/admission.go Outdated Show resolved Hide resolved
@@ -164,6 +154,10 @@ func (a *QuotaAdmission) Validate(ctx context.Context, attr admission.Attributes
if attr.GetNamespace() == "" || isNamespaceCreation(attr) {
return nil
}
// we need to wait for our caches to warm
if !a.WaitForReady() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing this here, before calling Evaluate(), means that in unsynced cases, we'll reject resources we would otherwise have ignored:

// is this resource ignored?
gvr := a.GetResource()
gr := gvr.GroupResource()
if _, ok := e.ignoredResources[gr]; ok {
return nil
}

or skipped:

// for this kind, check if the operation could mutate any quota resources
// if no resources tracked by quota are impacted, then just return
if !evaluator.Handles(a) {
return nil
}

I think the WaitForReady call should happen after both of those, so we only block on sync for things quota is actually interested in. That will mean plumbing the wait function down into the evaluator to invoke.

Copy link
Contributor Author

@AwesomePatrol AwesomePatrol Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to the evaluator, but I am not sure if this is the best place. init can't be used as it doesn't allow to retry errors. Moving to quotaAccessor may work, but it would likely need a custom implementation of waitForReady:

func (a *quotaAccessor) waitForReady() error {
  if a.hasSynced() {
    return nil
  }
  if err := wait.PollUntilContext(context.Background, 100 * time.Millisecond, time.Second,
    func(_ context.Context) (bool, error) { return a.hasSynced(), nil }); err != nil {
    
    return fmt.Errorf("not ready")
  }
  return nil
}

I also hope that compiler manages to optimize handler's WaitForReady as otherwise it creates a timer only to be GCed right after (in cases when readyFunc returns always true, i.e. post sync)

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 17, 2025
@AwesomePatrol AwesomePatrol changed the title Remove LRU caches from ResourceQuota/LimitRanger admission plugins Remove LRU cache from ResourceQuota admission plugin Feb 17, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 15, 2025
@dims dims added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 4, 2025
The informer keeps track of the data already so there is no need for an
additional layer of caching and LISTing. We add wait to ResourceQuota
admission hook to make sure it is synchronized before processing
requests.
ResourceQuota's Admission tests now start an informer that fetches
resources from the fake kubeClient.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.