Description
What happened?
Hello,
We noticed that when one of our CoreDNS pods is deleted, some client pods experience latency on their DNS queries.
This happens when the pod is completely deleted from Kubernetes, i.e. after the terminating phase. When it happens, all DNS requests from some pods (not all of them, it seems random) are "stuck" for a few seconds. The duration matches the timeout value in the pod's resolv.conf file: 5 seconds by default, but if I set a timeout of 3 seconds in the pod spec via dnsConfig.options, the stall is at most 3 seconds.
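For reference, the resolver timeout can be lowered in the pod spec roughly like this (a minimal sketch; the pod name and image are placeholders, only the dnsConfig block matters):

apiVersion: v1
kind: Pod
metadata:
  name: dns-timeout-example
spec:
  containers:
    - name: app
      image: busybox
  dnsConfig:
    options:
      # Written into the pod's /etc/resolv.conf as "options timeout:3"
      - name: timeout
        value: "3"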
You can see on this screenshot what it looks like on the application side (traces are generated using OpenTelemetry + httptrace): when the CoreDNS pod is removed (not in the terminating phase, completely removed, so after the lameduck period; we even tried 17 seconds for lameduck), all requests wait for 5 seconds. We can see span durations decreasing because new requests all wait until the system can send requests again:
We ran tcpdump (tcpdump -w capture.pcap udp port 53) in the pod's network namespace (using nsenter) and we can indeed see that for 5 seconds no DNS requests are visible (the traces and the Wireshark timestamps match):
We're using Karpenter on our Kubernetes clusters, so CoreDNS pods are destroyed regularly. To mitigate the issue, we moved the CoreDNS pods to stable nodes, but the problem still occurs at every node upgrade, so it's not a good long-term solution (it is also more expensive for us to have dedicated nodes for CoreDNS).
What did you expect to happen?
We didn't expect any latency during CoreDNS rollouts.
How can we reproduce it (as minimally and precisely as possible)?
On AWS EKS
A simple kubectl rollout restart -n kube-system deployment coredns is enough to impact our applications.
On Exoscale SKS
I created a 1.31.4 cluster (and also reproduced with kube-proxy 1.32.0 on it) with 5 CoreDNS replicas, and then deployed an application generating DNS traffic on the cluster (it's the only app running on the cluster):
package main

import (
    "context"
    "errors"
    "fmt"
    "net"
    "os"
    "strconv"
    "time"
)

// resolve performs a DNS lookup for domain using the default resolver.
func resolve(ctx context.Context, domain string) ([]net.IP, error) {
    addrs, err := net.DefaultResolver.LookupIPAddr(ctx, domain)
    if err != nil {
        return nil, err
    }
    result := make([]net.IP, len(addrs))
    for i, ia := range addrs {
        result[i] = ia.IP
    }
    return result, nil
}

func main() {
    domain := os.Getenv("DOMAIN")
    if domain == "" {
        panic(errors.New("DOMAIN env var is empty"))
    }
    parallelism, err := strconv.Atoi(os.Getenv("PARALLELISM"))
    if err != nil {
        panic(err)
    }
    interval, err := strconv.Atoi(os.Getenv("INTERVAL"))
    if err != nil {
        panic(err)
    }
    // Start PARALLELISM goroutines, each resolving DOMAIN every INTERVAL milliseconds
    // and logging the duration of every lookup.
    for i := 0; i < parallelism; i++ {
        ticker := time.NewTicker(time.Duration(interval) * time.Millisecond)
        go func() {
            for {
                select {
                case <-ticker.C:
                    ctx, cancel := context.WithTimeout(context.Background(), 7*time.Second)
                    start := time.Now().UnixMilli()
                    _, err := resolve(ctx, domain)
                    cancel()
                    end := time.Now().UnixMilli()
                    duration := end - start
                    if err != nil {
                        fmt.Printf("%d: resolved in %d milliseconds with error: %s\n", start, duration, err.Error())
                    } else {
                        fmt.Printf("%d: resolved in %d milliseconds\n", start, duration)
                    }
                }
            }
        }()
    }
    // Keep the main goroutine alive so the workers keep running.
    time.Sleep(24000 * time.Second)
}
I then deploy this code using this deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dns-test
  template:
    metadata:
      labels:
        app: dns-test
    spec:
      containers:
        - name: dns
          image: mcorbin/dnstest:0.0.3
          resources:
            limits:
              memory: "300Mi"
            requests:
              cpu: "0.5"
              memory: "300Mi"
          env:
            - name: DOMAIN
              value: "metrics-server.kube-system.svc.cluster.local."
            - name: PARALLELISM
              value: "4"
            - name: INTERVAL
              value: "50"
From time to time I can see slow DNS queries after a rollout, similar to what I see on EKS:
1736868482815: resolved in 5003 milliseconds
1736868482815: resolved in 5003 milliseconds
1736868482815: resolved in 5003 milliseconds
1736868482815: resolved in 5003 milliseconds
Anything else we need to know?
We already investigated a lot of things:
- Increased the lameduck option on CoreDNS to 17 seconds: no change (see the Corefile sketch after this list).
- It's not a CoreDNS performance issue: metrics are good and show no latency at all, which we verified by enabling debug logs.
- It's not a kube-proxy reconciliation latency issue: kube-proxy logs/metrics are good and endpoints are correctly updated.
- We're mostly AWS EKS users, but we're also able to reproduce the issue on the Exoscale SKS offering.
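For reference, the lameduck setting goes through the CoreDNS health plugin; our Corefile looked roughly like this (a sketch based on the default Kubernetes Corefile, not an exact copy of our configuration):

.:53 {
    errors
    health {
        lameduck 17s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}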
I suspect a conntrack issue when conntrack entries are removed by kube-proxy. I indeed noticed that flushing the conntrack entries for the CoreDNS pod IPs manually causes the same symptoms.
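This is roughly what the manual cleanup looked like (a sketch; the pod IP is a placeholder, and the conntrack command has to be run on the node hosting the client pod):

# Get the CoreDNS pod IPs (the k8s-app=kube-dns label is the usual default for CoreDNS)
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Delete the UDP conntrack entries whose original destination is one of those pod IPs
conntrack -D -d 10.0.1.23 -p udp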
Kubernetes version
We reproduced the issue on several Kubernetes versions/cloud providers:
On AWS EKS:
Client Version: v1.30.7
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.8-eks-2d5f260
On Exoscale SKS:
Client Version: v1.30.7
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.3
I also reproduced on Exoscale SKS with server v1.31.3 and kube-proxy v1.32.0 (upgraded to get this fix).
The AWS EKS service team also told us that they can reproduce the issue on their side on v1.32.0 (not yet released to users).
Cloud provider
AWS EKS and Exoscale SKS (see above).
OS version
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"
Install tools
Both cases use kube-proxy with iptables mode.
Container runtime (CRI) and version (if applicable)
No response
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response