add exponential backoff with cap and reset in reflector during retrying ListWatch #87795
Conversation
Hi @zhan849. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'm not opposed to jittering this, though I'm skeptical that jittering by O(1 second) will make much of a difference at scale. I'd be more inclined to do something like period=1s, jitter factor more like 5-10, but would like feedback from @lavalamp / @smarterclayton / @wojtek-t
In this case I think exponential backoff makes more sense. This jitter still averages out to 1 qps per lister when apiserver is unhealthy, or constant load in the unhealthy state. What we actually want is decreasing load in the unhealthy state.
good point… I guess the question is over what period we would want the retry to reset to the min delay. If I recall correctly, this loop is exercised every time we list, which happens under normal circumstances occasionally. Do we have an existing helper that will 1) back off exponentially to a max delay when called frequently, then 2) reset to min delay after not being called for "a while"?
Thanks @lavalamp and @liggitt for the input - jittered exponential backoff is definitely what we need here. I checked that we don't currently have anything in staging, but I can try to implement something inside reflector leveraging the existing building blocks.
Will update the PR shortly.
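For reference, here is a minimal sketch of the kind of helper being asked for: the delay grows exponentially up to a cap while the helper is being used, and resets to the minimum after it has sat idle for a while. All names are made up for illustration - this is not an existing client-go/apimachinery API - and it omits the jitter and clock injection a real implementation would want.

```go
package backoffsketch

import "time"

// idleResettingBackoff grows the delay exponentially up to max on every call,
// and falls back to min if it has not been asked for a delay within resetAfter.
type idleResettingBackoff struct {
	min, max   time.Duration
	resetAfter time.Duration
	current    time.Duration
	lastCall   time.Time
}

func newIdleResettingBackoff(min, max, resetAfter time.Duration) *idleResettingBackoff {
	return &idleResettingBackoff{min: min, max: max, resetAfter: resetAfter, current: min}
}

// Next returns how long the caller should wait before the next retry.
func (b *idleResettingBackoff) Next() time.Duration {
	now := time.Now()
	if !b.lastCall.IsZero() && now.Sub(b.lastCall) > b.resetAfter {
		// Idle long enough: assume the connection recovered and start over.
		b.current = b.min
	}
	b.lastCall = now

	d := b.current
	if b.current < b.max {
		b.current *= 2
		if b.current > b.max {
			b.current = b.max
		}
	}
	return d
}
```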
// backoffManager manages jittered exponential back off with a cap and a reset period. Exponential back off will stop
// to increase at the given max, and timer will reset after not backing off for a while
type backoffManager struct {
This seems useful enough that it should be moved to a public place imho.
client-go/util/flowcontrol seems to be a good place
Probably also a separate PR as this will be universally useful.
Agree, let's get it merged here and move it in a second step.
thanks for the feedback! I will create a separate PR for backoff manager in client-go/util/flowcontrol
package, and get that merged first. There will be a second PR updating reflector
Structure looks good, have some implementation requests. Thanks!
@@ -161,11 +165,28 @@ var internalPackages = []string{"client-go/tools/cache/"}
// Run will exit when stopCh is closed.
func (r *Reflector) Run(stopCh <-chan struct{}) {
	klog.V(3).Infof("Starting reflector %v (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
	wait.Until(func() {
		defer func() {
			klog.V(2).Infof("Stopping reflector %v (%s) from %s", r.expectedType, r.resyncPeriod, r.name)
V(3) to match the above?
Actually, after a second thought, I changed both to v2 for debuggability - most open source Kubernetes provisioners set components to v2, and these logs can provide useful debug information about component startup.
// this will result in a 0.2sec, 0.4sec, 0.8sec, 1.6sec, 3.2sec, 6.4sec, 10sec, 10sec, ... each backoff is
// added with a 0.5 jitter. If the backoff is not triggered within 30sec, assume connection to api server
// is stable and we reset backoff
backoff: newBackoffManager(200*time.Millisecond, 10*time.Second, 30*time.Second, 2.0, 0.5),
I vote for at least 20s max wait (95% traffic reduction in broken state vs current 1s), and I'd go as high as 100s (99% reduction).
I think 100s is a bit too much, but I would go higher than 10s - maybe 30s as a somewhat round number?
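(For rough intuition, ignoring jitter: once a lister has hit the cap it retries roughly once per cap interval, so relative to the current 1 QPS per lister a 10s cap leaves ~0.1 QPS, 20s ~0.05 QPS (95% reduction), 30s ~0.033 QPS, and 100s ~0.01 QPS (99% reduction).)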
for {
	// check stopCh before each loop in case r.backoff.Backoff() gets selected before stopCh,
	// so we avoid one extra invocation
shouldn't the call to ListAndWatch return immediately in that case? I don't think this is necessary? The current code doesn't seem to feel a need to do this.
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/reflector.go#L219 - List() is executed asynchronously, so it might still go through.
The current code has the same problem. Even this code still has a race, the channel could close immediately after we pass this line.
When context is wired in, there will be a more obvious way to make this correct.
I see. Will fix this in the next PR - going to keep this PR focused on getting the BackoffManager correct.
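For illustration only, here is the loop shape being discussed, written as a standalone helper rather than against the reflector's real fields (nextBackoff stands in for r.backoff.Backoff): check the stop channel before starting another iteration, then wait on either the stop channel or the backoff timer. As noted above, the small race right after the first check remains; wiring in a context would close it more cleanly.

```go
package backoffsketch

import "time"

// runWithBackoff runs work repeatedly, waiting on the backoff timer between
// iterations, and returns as soon as stop is closed.
func runWithBackoff(work func(), nextBackoff func() *time.Timer, stop <-chan struct{}) {
	for {
		// Check stop first so that a closed channel does not still trigger
		// one extra invocation of work (the point of the check in the diff above).
		select {
		case <-stop:
			return
		default:
		}

		work()

		// Wait for either shutdown or the next backoff tick.
		t := nextBackoff()
		select {
		case <-stop:
			t.Stop() // avoid leaking the timer on shutdown
			return
		case <-t.C:
		}
	}
}
```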
}
}

func (b *backoffManager) Backoff() *time.Timer {
For testability, I suggest you split this into two functions, one that computes the duration (can be private) and one that sets and returns the timer.
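A sketch of that split against the fields visible in this diff (lastBackoffStart, backoffResetDuration); the other field names and the nextBackoffDuration name are assumptions made up for illustration, and wait.Jitter is the existing apimachinery helper.

```go
// nextBackoffDuration computes the next delay without touching timers, so it
// can be unit-tested directly.
func (b *backoffManager) nextBackoffDuration() time.Duration {
	if time.Now().Sub(b.lastBackoffStart) > b.backoffResetDuration {
		// No backoff for a while: assume the connection is healthy again.
		b.duration = b.initialDuration
	}
	b.lastBackoffStart = time.Now()

	d := b.duration
	if b.jitter > 0 {
		d = wait.Jitter(d, b.jitter)
	}
	if b.duration < b.maxDuration {
		b.duration = time.Duration(float64(b.duration) * b.factor)
		if b.duration > b.maxDuration {
			b.duration = b.maxDuration
		}
	}
	return d
}

// Backoff stays trivial: it only wraps the computed duration in a timer.
func (b *backoffManager) Backoff() *time.Timer {
	return time.NewTimer(b.nextBackoffDuration())
}
```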
}

func (b *backoffManager) Backoff() *time.Timer {
	if time.Now().Sub(b.lastBackoffStart) > b.backoffResetDuration {
}
}

func TestBackoffManagerResetTimer(t *testing.T) {
Please test it all the way up to the max time and verify that it stops there.
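A sketch of such a test, leaning on the hypothetical nextBackoffDuration split suggested above and the newBackoffManager constructor from this PR's diff, with zero jitter so the durations are deterministic (the real test may need to account for jitter or use a fake clock).

```go
// TestBackoffManagerCapsAtMax walks the backoff all the way to the cap and
// verifies it stops growing there. Assumes newBackoffManager(initial, max,
// reset, factor, jitter) and the nextBackoffDuration helper sketched earlier.
func TestBackoffManagerCapsAtMax(t *testing.T) {
	b := newBackoffManager(200*time.Millisecond, 10*time.Second, 30*time.Second, 2.0, 0.0)

	var last time.Duration
	for i := 0; i < 20; i++ {
		d := b.nextBackoffDuration()
		if d > 10*time.Second {
			t.Fatalf("step %d: backoff %v exceeded the 10s cap", i, d)
		}
		if d < last {
			t.Fatalf("step %d: backoff %v decreased from %v before any reset", i, d, last)
		}
		last = d
	}
	if last != 10*time.Second {
		t.Errorf("expected backoff to settle at the 10s cap, got %v", last)
	}
}
```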
clock: &clock.RealClock{},
// this will result in a 0.1sec, 0.2sec, 0.4sec, 0.8sec, 1.6sec, 3.2sec, 6.4sec, 12.8sec, 25.6sec,
// 30sec, 30sec, ... backoff. With such backoff, we have ~95% traffic reduction (0.05 QPS) after 1min,
// and ~98% traffic reduction (0.02 QPS) after ~1.5min.
I guess I don't fully understand this math - after 1 min, we will be sending 1 request per 30s, so ~0.03 QPS.
The argument about traffic reduction was comparing to the current state (where we're sending a request every 1s), so I think this % reduction doesn't make much sense in the code comment.
Seems like I made a bad estimation, and the jitter would make the exact math even more complicated here. For the comment, I will keep it simple:
we used to make the call every 1sec; the goal is a ~98% traffic reduction (~0.02 QPS) by stopping at a 30sec backoff with 1.0 jitter (random between 30sec and 60sec).
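(Worked out: with a 30s cap and a 1.0 jitter factor, each capped wait is uniform in [30s, 60s), about 45s on average, i.e. roughly 1/45 ≈ 0.022 QPS per lister - about a 98% reduction from the old 1 QPS. Even the worst case of exactly 30s is ~0.033 QPS, still a ~97% reduction.)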
// We used to make the call every 1sec (1 QPS), the goal here is to achieve ~98% traffic reduction when
// API server is not healthy. With these parameters, backoff will stop at [30,60) sec interval which is
// 0.22 QPS.
backoffManager: wait.NewExponentialBackoffManager(100*time.Millisecond, 30*time.Second, 2*time.Minute, 2.0, 1.0, realClock),
hm, that 100ms initial (vs 1s) might make the first couple seconds worse than before-- what do you think about putting that at 800ms?
800ms can help in the API server crash case, but would make reconnecting less agile after a normal connection break (in production at Pinterest we are seeing that normal re-connects, i.e. connections closed by the API server, are more common at scale).
100ms is more likely to hurt in the API server crash case (e.g. OOM killed: when it gets restarted, all API calls can squeeze in), but since there is exponential backoff, the effect is just to (sort of) delay the recovery of the API server by tens of seconds to minutes.
But I'm OK with making it 800ms - [800ms, 1600ms) with jitter - which is no worse than the original behavior.
If your clients are reconnecting often enough that this 1s is a problem, something else is wrong, you have a watch cache configuration problem or etcd size problem or something.
I think 800ms is a safe improvement over 1s (that is, it's mostly compensated by the jitter you added).
Sounds good. Let's go with 800ms. I updated the PR.
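(With that change, the call in the diff above presumably becomes wait.NewExponentialBackoffManager(800*time.Millisecond, 30*time.Second, 2*time.Minute, 2.0, 1.0, realClock), so the first retry lands somewhere in [800ms, 1600ms) once the 1.0 jitter factor is applied.)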
Agree with Daniel - the current version looks fine to me. /ok-to-test
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: wojtek-t, zhan849
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest
/retest
/retest
/retest
@@ -168,12 +172,13 @@ var internalPackages = []string{"client-go/tools/cache/"}
// objects and subsequent deltas.
// Run will exit when stopCh is closed.
func (r *Reflector) Run(stopCh <-chan struct{}) {
	klog.V(3).Infof("Starting reflector %v (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
	wait.Until(func() {
	klog.V(2).Infof("Starting reflector %s (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
Was it intentional that the log level got bumped up from 3 to 2?
I just updated to client-go from 1.21.1 and these messages started to pop up quite a lot in my client (an e2e test suite).
I don't think it really was - we should fix that and backport.
Would you be able to send a PR for it (happy to approve)?
See #102788
I guess that was my intention. In practice we found that reflector startup is an important debugging point, and it only prints once when the reflector is initially created, so it won't be too verbose.
@pohly if this log line is polluting your log file, does your test create a lot of reflectors?
Yes. I'm running an e2e.test suite where essentially each test is independent and therefore instantiates its own reflectors.
Would this log help debug then? I.e. IMO it makes sense for v2 log or even v1 log to include critical time points such as component start / finish etc. in general.
@wojtek-t thoughts?
Would this log help debug then?
No. For me it is just additional noise. That's why I investigated.
I.e. IMO it makes sense for v2 log or even v1 log to include critical time points such as component start / finish etc. in general.
But client-go as a library cannot make assumptions about what constitutes component start/finish. In particular one cannot assume that this code is only called at the start and end of a program run. E2E testing is one counter example, but there might very well be others.
A library that gets embedded in command line tools should be very cautious about logging at a level that is visible by default. I would not expect any output from client-go at V < 3 unless that output is relevant for the end-user of that command line tool.
I don't think it really was - we should fix that and backport.
Here's the backport: #102800
Would this log help debug then? I.e. IMO it makes sense for v2 log or even v1 log to include critical time points such as component start / finish etc. in general.
@wojtek-t thoughts?
Reflector starts may happen not only on component start/finish. Any time watch is not able to resume it may also happen. If you really want those logs, you can always configure verbosity per file. I don't think it should be v=2 by default.
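(For example, with klog's per-file override you could run with something like -v=2 --vmodule=reflector=3 once the messages are back at V(3), to surface them for reflector.go alone without raising global verbosity; the exact flag spelling assumes the standard klog flags.)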
kubernetes#87795 most likely unintentionally increased the log level of "Starting reflector" and "Stopping reflector", with the result that since Kubernetes 1.21 clients have printed that message by default. This is undesirable, we should use the original level 3 again.
kubernetes/kubernetes#87795 most likely unintentionally increased the log level of "Starting reflector" and "Stopping reflector", with the result that since Kubernetes 1.21 clients have printed that message by default. This is undesirable, we should use the original level 3 again. Kubernetes-commit: fd972934e4916879b04508686302659ce82cfa75
What type of PR is this?
/kind bug

What this PR does / why we need it:
Currently the reflector retries ListWatch with a fixed 1sec backoff, which can cause a global QPS spike during an API server crash (e.g. OOMKilled). This PR adds a jittered exponential backoff to the ListWatch retry; the backoff has a cap, and resets after not being backed off for a while. A condensed sketch of the resulting control flow appears at the end of this description.
Which issue(s) this PR fixes:
Fixes #87794
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/assign @wojtek-t
/cc @liggitt @smarterclayton
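For readers skimming the thread, here is a condensed sketch of the control flow after this PR, pieced together from the diffs quoted above - not a verbatim copy of the merged reflector.go, and note that the log level ended up back at V(3) per the follow-up PRs referenced earlier.

```go
// Sketch only: Run retries ListAndWatch under the jittered exponential backoff
// manager instead of the old fixed 1s wait.Until loop.
func (r *Reflector) Run(stopCh <-chan struct{}) {
	klog.V(3).Infof("Starting reflector %v (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
	wait.BackoffUntil(func() {
		if err := r.ListAndWatch(stopCh); err != nil {
			utilruntime.HandleError(err)
		}
	}, r.backoffManager, true, stopCh)
	klog.V(3).Infof("Stopping reflector %v (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
}
```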