Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

add etcd server overrides to etcd probe factory for healthz and readyz #129438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
Loading
from

Conversation

pacoxu
Copy link
Member

@pacoxu pacoxu commented Dec 31, 2024

What type of PR is this?

/kind bug feature

What this PR does / why we need it:

status, err := t.client.Status(ctx, t.endpoints[rand.Int()%len(t.endpoints)])
if err != nil {

The monitor random one of the etcd to check.

Which issue(s) this PR fixes:

Fixes #129417

Special notes for your reviewer:

Then the healthz?verbose is like below:

(base) ➜  ~ kubectl get --raw='/healthz?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]etcd-override-0 ok
...

...

(base) ➜  ~ kubectl get --raw='/readyz?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]etcd-readiness ok
[+]etcd-override-0 ok
[+]etcd-override-readiness-0 ok
[+]informer-sync ok


cc @mengqiy
/cc @liggitt

Does this PR introduce a user-facing change?

kube-apiserver: include etcd server overrides to health check

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 31, 2024
@k8s-ci-robot k8s-ci-robot requested review from deads2k and dims December 31, 2024 08:31
@dims
Copy link
Member

dims commented Dec 31, 2024

cc @serathius

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 2, 2025
@pacoxu pacoxu marked this pull request as ready for review January 2, 2025 09:46
@pacoxu pacoxu force-pushed the apiserver-probe-etcd branch from 9ff8d88 to 6f3b982 Compare January 2, 2025 09:46
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 2, 2025
@k8s-ci-robot k8s-ci-robot requested a review from wojtek-t January 2, 2025 09:46
staging/src/k8s.io/apiserver/pkg/server/options/etcd.go Outdated Show resolved Hide resolved
@fedebongio
Copy link
Contributor

/assign @serathius @siyuanfoundation
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 2, 2025
@pacoxu pacoxu force-pushed the apiserver-probe-etcd branch from 6f3b982 to bca086c Compare January 3, 2025 01:14
staging/src/k8s.io/apiserver/pkg/server/options/etcd.go Outdated Show resolved Hide resolved
staging/src/k8s.io/apiserver/pkg/server/options/etcd.go Outdated Show resolved Hide resolved
staging/src/k8s.io/apiserver/pkg/server/options/etcd.go Outdated Show resolved Hide resolved
staging/src/k8s.io/apiserver/pkg/server/options/etcd.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 11, 2025
staging/src/k8s.io/apiserver/pkg/server/options/etcd.go Outdated Show resolved Hide resolved
@mengqiy
Copy link
Member

mengqiy commented Apr 12, 2025

The e2e test failure seems to be unrelated.

@mengqiy
Copy link
Member

mengqiy commented Apr 12, 2025

cc: @natherz97

staging/src/k8s.io/apiserver/pkg/server/options/etcd.go Outdated Show resolved Hide resolved
@pacoxu pacoxu force-pushed the apiserver-probe-etcd branch from 44f7728 to 7be81fe Compare April 16, 2025 03:52
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 16, 2025
@pacoxu pacoxu force-pushed the apiserver-probe-etcd branch from 7be81fe to c2ae17b Compare April 16, 2025 06:18
@natherz97
Copy link

@pacoxu Thanks for the revision! These changes lgtm. I wrote these unit tests in staging/src/k8s.io/apiserver/pkg/server/options/etcd_test.go while reviewing the PR if you would like to add them.

func TestReadinessCheck(t *testing.T) {
	testCases := []struct {
		name                 string
		wantReadyzChecks     []string
		wantHealthzChecks    []string
		wantLivezChecks      []string
		skipHealth           bool
		etcdServersOverrides []string
	}{
		{
			name:              "Readyz should have etcd-readiness check",
			wantReadyzChecks:  []string{"etcd", "etcd-readiness"},
			wantHealthzChecks: []string{"etcd"},
			wantLivezChecks:   []string{"etcd"},
		},
		{
			name:              "skip health, Readyz should not have etcd-readiness check",
			wantReadyzChecks:  nil,
			wantHealthzChecks: nil,
			wantLivezChecks:   nil,
			skipHealth:        true,
		},
		{
			name:                 "Health checks should not have duplicated servers from etcd-servers-overrides",
			wantReadyzChecks:     []string{"etcd", "etcd-readiness", "etcd-override-0", "etcd-override-readiness-0"},
			wantHealthzChecks:    []string{"etcd", "etcd-override-0"},
			wantLivezChecks:      []string{"etcd", "etcd-override-0"},
			etcdServersOverrides: []string{"r1#s1.com;s2.com", "r2#s1.com;s2.com"},
		},
		{
			name: "Health checks should not have duplicated servers from etcd-servers-overrides " +
				"if servers are provided in different orders",
			wantReadyzChecks:     []string{"etcd", "etcd-readiness", "etcd-override-0", "etcd-override-readiness-0"},
			wantHealthzChecks:    []string{"etcd", "etcd-override-0"},
			wantLivezChecks:      []string{"etcd", "etcd-override-0"},
			etcdServersOverrides: []string{"r1#s1.com;s2.com", "r2#s2.com;s1.com"},
		},
		{
			name: "Health checks should allow multiple overrides in etcd-servers-overrides",
			wantReadyzChecks: []string{"etcd", "etcd-readiness", "etcd-override-0", "etcd-override-readiness-0",
				"etcd-override-1", "etcd-override-readiness-1"},
			wantHealthzChecks:    []string{"etcd", "etcd-override-0", "etcd-override-1"},
			wantLivezChecks:      []string{"etcd", "etcd-override-0", "etcd-override-1"},
			etcdServersOverrides: []string{"r1#s1.com;s2.com", "r2#s3.com;s4.com"},
		},
		{
			name: "Health checks should allow multiple overrides in etcd-servers-overrides if servers overlap between overrides",
			wantReadyzChecks: []string{"etcd", "etcd-readiness", "etcd-override-0", "etcd-override-readiness-0",
				"etcd-override-1", "etcd-override-readiness-1"},
			wantHealthzChecks:    []string{"etcd", "etcd-override-0", "etcd-override-1"},
			wantLivezChecks:      []string{"etcd", "etcd-override-0", "etcd-override-1"},
			etcdServersOverrides: []string{"r1#s1.com;s2.com", "r2#s2.com;s3.com"},
		},
	}

	scheme := runtime.NewScheme()
	codecs := serializer.NewCodecFactory(scheme)

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			serverConfig := server.NewConfig(codecs)
			etcdOptions := &EtcdOptions{SkipHealthEndpoints: tc.skipHealth, EtcdServersOverrides: tc.etcdServersOverrides}
			if err := etcdOptions.ApplyTo(serverConfig); err != nil {
				t.Fatalf("Failed to add healthz error: %v", err)
			}

			healthChecksAreEqual(t, tc.wantReadyzChecks, serverConfig.ReadyzChecks, "readyz")
			healthChecksAreEqual(t, tc.wantHealthzChecks, serverConfig.HealthzChecks, "healthz")
			healthChecksAreEqual(t, tc.wantLivezChecks, serverConfig.LivezChecks, "livez")
		})
	}
}

@pacoxu
Copy link
Member Author

pacoxu commented Apr 17, 2025

@pacoxu Thanks for the revision! These changes lgtm. I wrote these unit tests in staging/src/k8s.io/apiserver/pkg/server/options/etcd_test.go while reviewing the PR if you would like to add them.

UT test caess ware added now. Thanks.

@pacoxu pacoxu force-pushed the apiserver-probe-etcd branch from c41b4df to 30afd4b Compare April 18, 2025 03:51
staging/src/k8s.io/apiserver/pkg/server/healthz/healthz.go Outdated Show resolved Hide resolved
staging/src/k8s.io/apiserver/pkg/server/options/etcd.go Outdated Show resolved Hide resolved
sortedServers := make([]string, len(servers))
copy(sortedServers, servers)
sort.Strings(sortedServers)
sortedKey := strings.Join(sortedServers, ";")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's strange to use the sorted set here to determine uniqueness, but then use the unsorted set below to initialize the client ... if the order matters to the client, it should matter for uniqueness as well

I don't know enough to know if sort order matters, but we should at least be consistent ... use the same thing to determine uniqueness and client construction

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to

  1. sort servers in ParseEtcdServersOverrides
  2. check if overrides are dup in the next loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • sort servers in ParseEtcdServersOverrides

Does changing the specified order change client behavior? If so, I don't think we should sort before passing to the client.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/kubernetes/kubernetes/compare/406faa2db0b5ab6f2b6aac72a15175c3c9d37f15..6364526efb5072d71632556ac401d8f4f7b2d201

How about this?

  • keep the order in ParseEtcdServersOverrides
  • make a copy for sorting and keep the order for the client

@pacoxu pacoxu force-pushed the apiserver-probe-etcd branch 4 times, most recently from 6364526 to b9ffb7e Compare April 23, 2025 03:15
@pacoxu
Copy link
Member Author

pacoxu commented May 12, 2025

ping @liggitt
Do you have time to take a look again?

@dims
Copy link
Member

dims commented May 13, 2025

/assign @liggitt

@dims
Copy link
Member

dims commented May 13, 2025

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 13, 2025
Comment on lines 369 to 371
// multi overrides servers may in different order
// example: ["apps/deployments#s2.example.com;s1.example.com","apps/replicasets#s1.example.com;s2.example.com"]
// ParseEtcdServersOverrides will sort the servers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop this comment, ParseEtcdServersOverrides no longer sorts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped.

staging/src/k8s.io/apiserver/pkg/server/options/etcd.go Outdated Show resolved Hide resolved
- grouping health checks for exclusion purposes & add exclude integration test

Signed-off-by: Paco Xu <paco.xu@daocloud.io>
@pacoxu pacoxu force-pushed the apiserver-probe-etcd branch from b9ffb7e to 8cec148 Compare May 22, 2025 06:04
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pacoxu
Once this PR has been reviewed and has the lgtm label, please ask for approval from liggitt. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

apiserver healthz should check etcd override endpoints
9 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.