Description
What happened?
During periods of webhook failures, the apiserver can crash while generating the audit log entry for the affected request. Specifically, when a large number of pods are deleted and those deletions are intercepted by a mutating webhook that is timing out, the following fatal error can be encountered:
E0905 18:47:26.898105 11 dispatcher.go:184] failed calling webhook "webhook.com": failed to call webhook: Post "https://webhook:443/mutate?timeout=12s": context deadline exceeded
fatal error: concurrent map iteration and map write
goroutine 295461 [running]:
reflect.mapiterinit(0x451512?, 0x200?, 0x48da4c0?)
/usr/local/go/src/runtime/map.go:1375 +0x19
reflect.(*MapIter).Next(0x4528ac0?)
/usr/local/go/src/reflect/value.go:1919 +0x65
encoding/json.mapEncoder.encode({0xc05cea2cf8?}, 0xc06363ae00, {0x4528ac0?, 0xc0b2bdf170?, 0xc0b2bdf0b0?}, {0xe?, 0x0?})
/usr/local/go/src/encoding/json/encode.go:797 +0x33e
encoding/json.structEncoder.encode({{{0xc0208ea000?, 0x3dfdbc4?, 0x5?}, 0xc041e42b40?}}, 0xc06363ae00, {0x4e6fa60?, 0xc0b2bdf040?, 0x3dfdbc4?}, {0x0, 0x1})
/usr/local/go/src/encoding/json/encode.go:759 +0x1f4
encoding/json.ptrEncoder.encode({0x0?}, 0xc06363ae00, {0x4e16cc0?, 0xc0b2bdf040?, 0x4e16cc0?}, {0xb6?, 0x68?})
/usr/local/go/src/encoding/json/encode.go:943 +0x21c
encoding/json.(*encodeState).reflectValue(0xc05cea2fb0?, {0x4e16cc0?, 0xc0b2bdf040?, 0xc0ab3181a0?}, {0x60?, 0x2c?})
/usr/local/go/src/encoding/json/encode.go:358 +0x78
encoding/json.(*encodeState).marshal(0x58d8a20?, {0x4e16cc0?, 0xc0b2bdf040?}, {0xb9?, 0xe?})
/usr/local/go/src/encoding/json/encode.go:330 +0xfa
encoding/json.(*Encoder).Encode(0xc05cea30d0, {0x4e16cc0, 0xc0b2bdf040})
/usr/local/go/src/encoding/json/stream.go:209 +0xf3
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).doEncode(0x9cc5cf?, {0x58e09e0?, 0xc0b2bdf040?}, {0x58c99e0, 0xc0b2c2fc20})
vendor/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go:246 +0x19a
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).Encode(0xc000344c30, {0x58e09e0, 0xc0b2bdf040}, {0x58c99e0, 0xc0b2c2fc20})
vendor/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go:220 +0xfc
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning.(*codec).doEncode(0xc00093dc20, {0x58e0940, 0xc04f459a40}, {0x58c99e0, 0xc0b2c2fc20}, {0x0?, 0x0?})
vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning/versioning.go:284 +0xa12
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning.(*codec).encode(0xc00093dc20, {0x58e0940, 0xc04f459a40}, {0x58c99e0, 0xc0b2c2fc20}, {0x0?, 0x0?})
vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning/versioning.go:214 +0x167
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning.(*codec).Encode(0xc05cb67d10?, {0x58e0940?, 0xc04f459a40?}, {0x58c99e0?, 0xc0b2c2fc20?})
vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning/versioning.go:207 +0x33
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime.Encode({0x7f407ed689c8, 0xc00093dc20}, {0x58e0940, 0xc04f459a40})
vendor/k8s.io/apimachinery/pkg/runtime/codec.go:50 +0x64
k8s.io/kubernetes/vendor/k8s.io/apiserver/plugin/pkg/audit/log.(*backend).logEvent(0xc000676750, 0xc04f459a40)
vendor/k8s.io/apiserver/plugin/pkg/audit/log/backend.go:76 +0x73
k8s.io/kubernetes/vendor/k8s.io/apiserver/plugin/pkg/audit/log.(*backend).ProcessEvents(0x40e167?, {0xc09db7fcf8, 0x1, 0xc0b2bc6d01?})
vendor/k8s.io/apiserver/plugin/pkg/audit/log/backend.go:65 +0x45
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/options.(*ignoreErrorsBackend).ProcessEvents(0x59013c0?, {0xc09db7fcf8?, 0x40e167?, 0x98?})
vendor/k8s.io/apiserver/pkg/server/options/audit.go:395 +0x26
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.processAuditEvent({0x59013c0, 0xc05cb67d10}, {0x7f407ed6fd98, 0xc0012a1f40}, 0xc04f459a40, {0xc0012a1ed0?, 0x1, 0x8337e00?})
vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:202 +0x24f
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1()
vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:115 +0x41d
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1({0x5900650?, 0xc0ab318160}, 0xc098e8e800)
vendor/k8s.io/apiserver/pkg/endpoints/filters/audit.go:118 +0x5c5
net/http.HandlerFunc.ServeHTTP(0xa?, {0x5900650?, 0xc0ab318160?}, 0xc0612f1a80?)
/usr/local/go/src/net/http/server.go:2122 +0x2f
This cluster is running v1.24.16. The panic occurs on this line while the audit log entry is being encoded: https://github.com/kubernetes/kubernetes/blob/v1.24.16/staging/src/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go#L246
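For context (this is a standalone illustration, not apiserver code): "concurrent map iteration and map write" is a fatal runtime throw, not a recoverable panic, so recover() in the handler chain cannot catch it. A minimal sketch that reproduces the same failure mode, where one goroutine writes to a map while another iterates it via json.Marshal (the mapiterinit frame in the trace above):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Shared map, analogous to a map field on the audit event that is still
	// being mutated while the log backend serializes the event.
	annotations := map[string]string{"seed": "value"}

	// Writer goroutine keeps mutating the map.
	go func() {
		for i := 0; ; i++ {
			annotations[fmt.Sprintf("key-%d", i)] = "v"
		}
	}()

	// Reader keeps JSON-encoding it; encoding/json iterates the map through
	// reflection, just like the reflect.MapIter frames in the stack trace.
	for {
		_, _ = json.Marshal(annotations)
	}
	// The runtime eventually detects the race and aborts the process with:
	//   fatal error: concurrent map iteration and map write
	// This cannot be recovered with recover().
}
```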
What did you expect to happen?
A panic should not be possible from the audit handler.
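We have not confirmed which field is being written concurrently; assuming it is a map on the audit event (for example, annotations still being added while the log backend encodes the event), one possible mitigation would be to snapshot the map before encoding. A hedged sketch with hypothetical names, not a confirmed fix:

```go
package audit

import (
	"encoding/json"
	"sync"
)

// event stands in for the audit Event type; only the map field matters here.
// This is an illustrative type, not the real audit.Event.
type event struct {
	Annotations map[string]string `json:"annotations,omitempty"`
}

// safeEncode is a hypothetical helper: it snapshots the Annotations map
// under a lock before handing the event to the JSON encoder, so the encoder
// never iterates a map that another goroutine may still be writing to.
func safeEncode(mu *sync.Mutex, ev *event) ([]byte, error) {
	mu.Lock()
	snapshot := make(map[string]string, len(ev.Annotations))
	for k, v := range ev.Annotations {
		snapshot[k] = v
	}
	mu.Unlock()

	copyEv := *ev
	copyEv.Annotations = snapshot
	return json.Marshal(&copyEv)
}
```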
How can we reproduce it (as minimally and precisely as possible)?
We do not have a way to reproduce this deterministically. It has occurred during periods in which 1000+ pods are deleted and those deletions are intercepted by a mutating webhook that is timing out.
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.16-eks-2d98532", GitCommit:"af930c12e26ef9d1e8fac7e3532ff4bcc1b2b509", GitTreeState:"clean", BuildDate:"2023-07-28T16:56:16Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.16-eks-2d98532", GitCommit:"af930c12e26ef9d1e8fac7e3532ff4bcc1b2b509", GitTreeState:"clean", BuildDate:"2023-07-28T16:52:47Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
AWS EKS (per the -eks-2d98532 build metadata in the version output above)
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here