add exponential backoff with cap and reset in reflector during retrying ListWatch #87795


Merged: 1 commit merged on Feb 12, 2020

Conversation

zhan849
Contributor

@zhan849 zhan849 commented Feb 3, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently the reflector's ListWatch retries with a fixed 1-second backoff, which can cause a global QPS spike when the API server crashes (e.g. is OOMKilled). This PR adds jittered exponential backoff to the ListWatch retry. The exponential backoff has a cap, and resets after no backoff has been triggered for a while.
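
For context, the pre-change behaviour is essentially a fixed-period retry loop. Below is a small, self-contained sketch (not the actual reflector code) built on wait.Until from k8s.io/apimachinery/pkg/util/wait, illustrating why every client keeps retrying at roughly 1 QPS for as long as the API server is unhealthy:

package main

import (
    "fmt"
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
)

func main() {
    stopCh := make(chan struct{})
    go func() {
        time.Sleep(5 * time.Second)
        close(stopCh)
    }()

    // Fixed-period retry in spirit: the loop body re-runs every second,
    // regardless of how long the API server has been unhealthy.
    wait.Until(func() {
        fmt.Println("ListAndWatch attempt (would fail while the apiserver is down)")
    }, time.Second, stopCh)
}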

Which issue(s) this PR fixes:

Fixes #87794

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/assign @wojtek-t
/cc @liggitt @smarterclayton

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 3, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 3, 2020
@k8s-ci-robot
Contributor

Hi @zhan849. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 3, 2020
@liggitt
Member

liggitt commented Feb 4, 2020

I'm not opposed to jittering this, though I'm skeptical that jittering by O(1 second) will make much of a difference at scale. I'd be more inclined to do something like period=1s, jitter factor more like 5-10, but would like feedback from @lavalamp / @smarterclayton / @wojtek-t
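
For concreteness, a jitter-only variant along those lines would draw each delay from [1s, (1 + factor) * 1s) using wait.Jitter; a hedged sketch (not actual reflector code), assuming a factor of 5:

package main

import (
    "fmt"
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
)

func main() {
    // wait.Jitter(d, maxFactor) returns a duration drawn uniformly from
    // [d, d + maxFactor*d), so a 1s period with factor 5 yields [1s, 6s).
    for i := 0; i < 3; i++ {
        fmt.Println(wait.Jitter(time.Second, 5))
    }
}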

@lavalamp
Member

lavalamp commented Feb 4, 2020

In this case I think exponential backoff makes more sense. This jitter still averages out to 1 qps per lister when apiserver is unhealthy, or constant load in the unhealthy state. What we actually want is decreasing load in the unhealthy state.

@liggitt
Member

liggitt commented Feb 4, 2020

good point… I guess the question is over what period we would want the retry to reset to the min delay. If I recall correctly, this loop is exercised every time we list, which happens under normal circumstances occasionally. Do we have an existing helper that will 1) back off exponentially to a max delay when called frequently, then 2) reset to min delay after not being called for "a while"?

@zhan849
Contributor Author

zhan849 commented Feb 4, 2020

Thanks @lavalamp and @liggitt for the input; jittered exponential backoff is definitely what we need here.

I checked and we don't currently have such a helper in staging, but I can implement something inside the reflector, leveraging existing building blocks, that:

  1. generates exponential backoff with jitter with max delay
  2. resets after a configurable duration

I will update the PR shortly.

@zhan849 zhan849 force-pushed the harry/reflector-backoff branch from 7116807 to 5f488e5 Compare February 4, 2020 05:55
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 4, 2020
@zhan849 zhan849 force-pushed the harry/reflector-backoff branch 2 times, most recently from 5a3ee04 to 9921061 Compare February 4, 2020 06:20
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2020
@zhan849
Contributor Author

zhan849 commented Feb 4, 2020

@lavalamp @liggitt I've proposed an implementation leveraging some existing building blocks, and chose empirical values for the backoff parameters. Please take a look, thanks!

@zhan849 zhan849 changed the title add 0.1~1.8 sec jitter for reflector to retry ListWatch add exponential backoff with cap and reset in reflector during retrying ListWatch Feb 4, 2020
@zhan849 zhan849 force-pushed the harry/reflector-backoff branch from 9921061 to 23e1b2e Compare February 4, 2020 06:31

// backoffManager manages jittered exponential back off with a cap and a reset period. Exponential back off stops
// increasing at the given max, and the timer resets after not backing off for a while
type backoffManager struct {
Member

This seems useful enough that it should be moved to a public place imho.
client-go/util/flowcontrol seems to be a good place

Probably also a separate PR as this will be universally useful.

Member

Agree, let's get it merged here and move it in a second step.

Contributor Author

Thanks for the feedback! I will create a separate PR for the backoff manager in the client-go/util/flowcontrol package and get that merged first. There will be a second PR updating the reflector.

Member

@lavalamp lavalamp left a comment

Structure looks good, have some implementation requests. Thanks!

@@ -161,11 +165,28 @@ var internalPackages = []string{"client-go/tools/cache/"}
// Run will exit when stopCh is closed.
func (r *Reflector) Run(stopCh <-chan struct{}) {
klog.V(3).Infof("Starting reflector %v (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
wait.Until(func() {
defer func() {
klog.V(2).Infof("Stopping reflector %v (%s) from %s", r.expectedType, r.resyncPeriod, r.name)
Member

V(3) to match the above?

Contributor Author

Actually, on second thought, I changed both to V(2) for debuggability: most open-source Kubernetes provisioners set components to v2, and these logs provide useful debugging information about component startup.


// backoffManager manages jittered exponential back off with a cap and a reset period. Exponential back off stops
// increasing at the given max, and the timer resets after not backing off for a while
type backoffManager struct {
Member

Agree, let's get it merged here and move it in a second step.

// this will result in a 0.2sec, 0.4sec, 0.8sec, 1.6sec, 3.2sec, 6.4sec, 10sec, 10sec, ... backoff; each backoff
// gets a 0.5 jitter. If no backoff is triggered within 30sec, we assume the connection to the API server
// is stable and reset the backoff
backoff: newBackoffManager(200*time.Millisecond, 10*time.Second, 30*time.Second, 2.0, 0.5),
Member

I vote for at least 20s max wait (95% traffic reduction in broken state vs current 1s), and I'd go as high as 100s (99% reduction).

Member

I think 100s is a bit too much, but I would go higher than 10s - maybe 30s as a somewhat round number?


for {
// check stopCh before each iteration in case r.backoff.Backoff() gets selected before stopCh,
// so we avoid one extra invocation
Member

shouldn't the call to ListAndWatch return immediately in that case? I don't think this is necessary? The current code doesn't seem to feel a need to do this.


Member

The current code has the same problem. Even this code still has a race, the channel could close immediately after we pass this line.

When context is wired in, there will be a more obvious way to make this correct.

Contributor Author

I see. Will fix this in the next PR; I'd like to keep this PR focused on getting the BackoffManager correct.

}
}

func (b *backoffManager) Backoff() *time.Timer {
Member

For testability, I suggest you split this into two functions, one that computes the duration (can be private) and one that sets and returns the timer.
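
A hedged sketch of that split, with hypothetical field and method names (not the PR's final code): the private method computes the next jittered delay, including the cap and the quiet-period reset, and is the part worth unit-testing; Backoff() only wraps it in a timer.

package cache

import (
    "math/rand"
    "time"
)

type backoffManager struct {
    backoff              time.Duration // current un-jittered delay
    initialBackoff       time.Duration
    maxBackoff           time.Duration
    backoffResetDuration time.Duration
    backoffFactor        float64
    jitter               float64
    lastBackoffStart     time.Time
}

// nextBackoffDuration returns the next jittered delay, growing the base delay
// up to the cap and resetting it after a quiet period without backoffs.
func (b *backoffManager) nextBackoffDuration() time.Duration {
    now := time.Now()
    if b.backoff == 0 || (!b.lastBackoffStart.IsZero() && now.Sub(b.lastBackoffStart) > b.backoffResetDuration) {
        b.backoff = b.initialBackoff // first use, or quiet period passed: start over
    }
    b.lastBackoffStart = now
    delay := b.backoff + time.Duration(rand.Float64()*b.jitter*float64(b.backoff))
    b.backoff = time.Duration(float64(b.backoff) * b.backoffFactor)
    if b.backoff > b.maxBackoff {
        b.backoff = b.maxBackoff
    }
    return delay
}

// Backoff sets and returns the timer for the next retry.
func (b *backoffManager) Backoff() *time.Timer {
    return time.NewTimer(b.nextBackoffDuration())
}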

}

func (b *backoffManager) Backoff() *time.Timer {
if time.Now().Sub(b.lastBackoffStart) > b.backoffResetDuration {
Member

Use the clock package to make this testable. (Test code uses the fake).
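
A minimal sketch of the suggested clock injection (field names hypothetical; the clock package was k8s.io/apimachinery/pkg/util/clock at the time and has since moved): the manager holds a clock.Clock and calls clock.Now() instead of time.Now(), so tests can pass a fake clock and advance time deterministically.

package cache

import (
    "time"

    "k8s.io/apimachinery/pkg/util/clock"
)

type backoffManagerWithClock struct {
    clock                clock.Clock // &clock.RealClock{} in production, clock.NewFakeClock(...) in tests
    lastBackoffStart     time.Time
    backoffResetDuration time.Duration
}

// pastResetWindow reports whether enough quiet time has passed to reset the backoff.
func (b *backoffManagerWithClock) pastResetWindow() bool {
    return b.clock.Now().Sub(b.lastBackoffStart) > b.backoffResetDuration
}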

}
}

func TestBackoffManagerResetTimer(t *testing.T) {
Member

Please test it all the way up to the max time and verify that it stops there.
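
A hedged sketch of the requested test, assuming the split and the clock injection sketched in the comments above plus a hypothetical constructor newBackoffManager(initial, max, reset, factor, jitter, clock): the jitter is set to 0 so values are deterministic, the loop runs well past the point where the delay should stop growing, and a final step past the reset duration checks the reset.

package cache

import (
    "testing"
    "time"

    "k8s.io/apimachinery/pkg/util/clock"
)

func TestBackoffManagerCapsAndResets(t *testing.T) {
    fc := clock.NewFakeClock(time.Now())
    b := newBackoffManager(800*time.Millisecond, 30*time.Second, 2*time.Minute, 2.0, 0.0, fc)

    var last time.Duration
    for i := 0; i < 20; i++ {
        last = b.nextBackoffDuration()
        if last > 30*time.Second {
            t.Fatalf("backoff %v exceeded the 30s cap", last)
        }
        fc.Step(last) // pretend we waited out the backoff before retrying
    }
    if last != 30*time.Second {
        t.Fatalf("expected backoff to settle at the 30s cap, got %v", last)
    }

    // After a quiet period longer than the reset duration, the delay resets.
    fc.Step(3 * time.Minute)
    if got := b.nextBackoffDuration(); got != 800*time.Millisecond {
        t.Fatalf("expected reset to 800ms after the quiet period, got %v", got)
    }
}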

@zhan849 zhan849 force-pushed the harry/reflector-backoff branch 2 times, most recently from 419f50d to 04679e9 Compare February 7, 2020 06:14
clock: &clock.RealClock{},
// this will result in a 0.1sec, 0.2sec, 0.4sec, 0.8sec, 1.6sec, 3.2sec, 6.4sec, 12.8sec, 25.6sec,
// 30sec, 30sec, ... backoff. With such backoff, we have ~95% traffic reduction (0.05 QPS) after 1min,
// and ~98% traffic reduction (0.02 QPS) after ~1.5min.
Member

I guess I don't fully understand this math - after 1 min, we will be sending 1 request per 30s, so ~0.03 QPS.

The traffic-reduction argument was more about comparing to the current state (where we send a request every 1s), so I don't think this % reduction makes much sense in the code comment.

Contributor Author

Seems like I made a bad estimate, and the jitter makes the exact math more complicated. For the comment I will keep it simple:
we used to make the call every 1 sec; the goal is a ~98% traffic reduction (~0.02 QPS) by stopping at a 30 sec backoff with 1.0 jitter (a delay drawn from [30sec, 60sec), so ~45sec on average).

@zhan849 zhan849 force-pushed the harry/reflector-backoff branch from 04679e9 to 4548c79 Compare February 7, 2020 16:44
// We used to make the call every 1sec (1 QPS); the goal here is to achieve ~98% traffic reduction when the
// API server is not healthy. With these parameters, backoff will stop at a [30,60) sec interval, which is
// ~0.02 QPS.
backoffManager: wait.NewExponentialBackoffManager(100*time.Millisecond, 30*time.Second, 2*time.Minute, 2.0, 1.0, realClock),
Member

Hm, that 100ms initial (vs 1s) might make the first couple of seconds worse than before. What do you think about putting it at 800ms?

Contributor Author

@zhan849 zhan849 Feb 7, 2020

800ms helps in the API-server-crash case, but it makes reconnecting less responsive when a connection is simply dropped (in production at Pinterest, a normal re-connect, i.e. the connection being closed by the API server, is more common at scale).

100ms is more likely to hurt in the API-server-crash case (e.g. OOM killed: when it gets restarted, all API calls can squeeze in), but with exponential backoff the effect is roughly to delay the API server's recovery by tens of seconds to minutes.

But I'm OK with making it 800ms, i.e. [800ms, 1600ms) with jitter, which is no worse than the original behavior.

Contributor Author

Any input, @wojtek-t @liggitt?

Member

If your clients are reconnecting often enough that this 1s is a problem, something else is wrong: a watch cache configuration problem, an etcd size problem, or something similar.

I think 800ms is a safe improvement over 1s (that is, it's mostly compensated by the jitter you added).

Contributor Author

Sounds good. Let's go with 800ms; I updated the PR.
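
For reference, a hedged usage sketch of the backoff manager with the values settled on above, roughly how it is wired into Reflector.Run via wait.BackoffUntil (exact import paths and signatures may differ between Kubernetes versions):

package main

import (
    "fmt"
    "time"

    "k8s.io/apimachinery/pkg/util/clock"
    "k8s.io/apimachinery/pkg/util/wait"
)

func main() {
    stopCh := make(chan struct{})
    go func() {
        time.Sleep(10 * time.Second)
        close(stopCh)
    }()

    // 800ms initial delay, 30s cap, reset after 2 minutes without a backoff,
    // growth factor 2.0, jitter 1.0 (each delay is drawn from [d, 2d)).
    bm := wait.NewExponentialBackoffManager(
        800*time.Millisecond, 30*time.Second, 2*time.Minute, 2.0, 1.0, &clock.RealClock{})

    // BackoffUntil runs the function, then waits for the manager's next
    // interval, and repeats until stopCh is closed.
    wait.BackoffUntil(func() {
        fmt.Println("ListAndWatch attempt at", time.Now().Format(time.RFC3339))
    }, bm, true, stopCh)
}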

@zhan849 zhan849 force-pushed the harry/reflector-backoff branch from 4548c79 to f124816 Compare February 8, 2020 02:06
@wojtek-t
Member

Agree with Daniel - the current version looks fine to me.

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 10, 2020
@zhan849
Contributor Author

zhan849 commented Feb 10, 2020

/retest

@wojtek-t
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 11, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wojtek-t, zhan849

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 11, 2020
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment

@zhan849
Contributor Author

zhan849 commented Feb 11, 2020

/retest

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@@ -168,12 +172,13 @@ var internalPackages = []string{"client-go/tools/cache/"}
// objects and subsequent deltas.
// Run will exit when stopCh is closed.
func (r *Reflector) Run(stopCh <-chan struct{}) {
klog.V(3).Infof("Starting reflector %v (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
wait.Until(func() {
klog.V(2).Infof("Starting reflector %s (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
Contributor

Was it intentional that the log level got bumped up from 3 to 2?

I just updated to the client-go from Kubernetes 1.21.1 and these messages started to pop up quite a lot in my client (an e2e test suite).

Member

I don't think it really was - we should fix that and backport it.
Would you be able to send a PR for it? (Happy to approve.)

Contributor

See #102788

Contributor Author

#87795 (comment)

I guess that was my intention. In practice we found that reflector startup is an important debugging point, and it only prints once when the reflector is initially created, so it won't be too verbose.

Contributor Author

@pohly if this log line is polluting your log file, does your test create a lot of reflectors?

Contributor

@pohly pohly Jun 10, 2021

Yes. I'm running an e2e.test suite where essentially each test is independent and therefore instantiates its own reflectors.

Contributor Author

Would this log help debugging then? IMO it makes sense for v2 or even v1 logs to include critical time points such as component start/finish in general.
@wojtek-t thoughts?

Contributor

Would this log help debugging then?

No. For me it is just additional noise. That's why I investigated.

IMO it makes sense for v2 or even v1 logs to include critical time points such as component start/finish in general.

But client-go as a library cannot make assumptions about what constitutes component start/finish. In particular one cannot assume that this code is only called at the start and end of a program run. E2E testing is one counter example, but there might very well be others.

A library that gets embedded in command line tools should be very cautious about logging at a level that is visible by default. I would not expect any output from client-go at V < 3 unless that output is relevant for the end-user of that command line tool.

Contributor

I don't think it really was - we should fix that and backport.

Here's the backport: #102800

Member

Would this log help debugging then? IMO it makes sense for v2 or even v1 logs to include critical time points such as component start/finish in general.
@wojtek-t thoughts?

Reflector starts may happen not only on component start/finish; they can also happen any time a watch cannot be resumed. If you really want those logs, you can always configure verbosity per file. I don't think it should be v=2 by default.
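
As a hedged illustration of "configure verbosity per file" (the flags are klog's; exactly how they are wired up depends on how your program initializes klog): the -vmodule flag raises verbosity only for matching source files, so you can keep the global level low and still opt in to the reflector messages.

package main

import (
    "flag"

    "k8s.io/klog/v2"
)

func main() {
    klog.InitFlags(nil) // registers -v and -vmodule on the default flag set
    _ = flag.Set("v", "1")                 // keep global verbosity low
    _ = flag.Set("vmodule", "reflector=3") // but show V(<=3) messages from reflector.go
    flag.Parse()

    klog.V(3).Info("global V(3) messages stay hidden at -v=1")
    klog.Flush()
}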

pohly added a commit to pohly/kubernetes that referenced this pull request Jun 10, 2021
kubernetes#87795 most likely
unintentionally increased the log level of "Starting reflector" and
"Stopping reflector", with the result that since Kubernetes 1.21
clients have printed that message by default. This is undesirable, we
should use the original level 3 again.
k8s-publishing-bot pushed a commit to kubernetes/client-go that referenced this pull request Jun 11, 2021
kubernetes/kubernetes#87795 most likely
unintentionally increased the log level of "Starting reflector" and
"Stopping reflector", with the result that since Kubernetes 1.21
clients have printed that message by default. This is undesirable, we
should use the original level 3 again.

Kubernetes-commit: fd972934e4916879b04508686302659ce82cfa75
pohly added a commit to pohly/kubernetes that referenced this pull request Jun 11, 2021
kubernetes#87795 most likely
unintentionally increased the log level of "Starting reflector" and
"Stopping reflector", with the result that since Kubernetes 1.21
clients have printed that message by default. This is undesirable, we
should use the original level 3 again.
k8s-publishing-bot pushed a commit to kubernetes/client-go that referenced this pull request Jun 11, 2021
kubernetes/kubernetes#87795 most likely
unintentionally increased the log level of "Starting reflector" and
"Stopping reflector", with the result that since Kubernetes 1.21
clients have printed that message by default. This is undesirable, we
should use the original level 3 again.

Kubernetes-commit: ef0beb2cf7091f3c1c837af63120cb8ce9f084bc
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reflector should backoff with Jitter to avoid global synchronization during API server OOM
7 participants