DNS latency when a CoreDNS pod is deleted #129617

Open
@mcorbin

Description


What happened?

Hello,

We noticed that when one of our CoreDNS pods is deleted, some client pods experience latency on their DNS queries.

This happens when the pod is completely deleted from Kubernetes, after the terminating phase. When it happens, all DNS requests from some client pods (not all of them; it seems random) are "stuck" for a few seconds. The stall matches the timeout value in the pod's resolv.conf: 5 seconds by default, and if I set a timeout of 3 seconds in dnsConfig.options in the pod spec, it is at most 3 seconds.
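For reference, the 3-second timeout mentioned above was set on the client pods roughly like this (a sketch against the dns-test deployment used later in this report; any pod spec works the same way):

# Sketch: shortening the client-side resolver timeout to 3 seconds via dnsConfig
# (deployment name and value are only illustrative):
kubectl patch deployment dns-test --type merge -p '
spec:
  template:
    spec:
      dnsConfig:
        options:
        - name: timeout
          value: "3"
'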

You can see on the screenshot below what it looks like on the application side (traces are generated using OpenTelemetry + httptrace). When the CoreDNS pod is removed (not in the terminating phase, but completely removed, so after the lameduck period; we even tried 17 seconds for lameduck), all requests wait for 5 seconds. We can see span durations decreasing because the new requests all wait until the system can send requests again:

[Screenshot: application traces showing requests stalled for 5 seconds after the CoreDNS pod is removed]

We ran tcpdump (tcpdump -w capture.pcap udp port 53) in the pod's network namespace (using nsenter), and we can indeed see that for 5 seconds no DNS requests are visible (the trace timestamps and the Wireshark timestamps match):

[Screenshot: Wireshark capture showing no DNS traffic for 5 seconds, matching the trace timestamps]
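The capture was taken roughly along these lines (the container ID is a placeholder and a containerd runtime is assumed; any way of entering the pod's network namespace works):

# Find the PID of a container in the affected pod, then run tcpdump inside
# that pod's network namespace (<container-id> is a placeholder):
PID=$(crictl inspect --output go-template --template '{{.info.pid}}' <container-id>)
nsenter --target "$PID" --net -- tcpdump -w capture.pcap udp port 53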

We're using Karpenter on our Kubernetes clusters, so CoreDNS pods are destroyed regularly. To mitigate the issue we moved the CoreDNS pods to stable nodes, but the problem occurs again at every node upgrade, so it's not a good long-term solution (it is also more expensive for us to have dedicated nodes for CoreDNS).

What did you expect to happen?

We didn't expect any latency during CoreDNS rollouts.

How can we reproduce it (as minimally and precisely as possible)?

On AWS EKS

A simple kubectl rollout restart -n kube-system deployment coredns is enough to impact our applications.

On Exoscale SKS

I created a 1.31.4 cluster (and also reproduced with kube-proxy 1.32.0 on it) with 5 CoreDNS replicas, then deployed an application that generates DNS traffic (it's the only app running on the cluster):

package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
	"strconv"
	"time"
)

// resolve looks up the IP addresses for domain using the Go default resolver.
func resolve(ctx context.Context, domain string) ([]net.IP, error) {
	addrs, err := net.DefaultResolver.LookupIPAddr(ctx, domain)
	if err != nil {
		return nil, err
	}
	result := make([]net.IP, len(addrs))
	for i, ia := range addrs {
		result[i] = ia.IP
	}
	return result, nil
}

func main() {
	domain := os.Getenv("DOMAIN")
	if domain == "" {
		panic(errors.New("DOMAIN env var is empty"))
	}
	parallelism, err := strconv.Atoi(os.Getenv("PARALLELISM"))
	if err != nil {
		panic(err)
	}
	interval, err := strconv.Atoi(os.Getenv("INTERVAL"))
	if err != nil {
		panic(err)
	}

	// Start PARALLELISM workers; each one resolves DOMAIN every INTERVAL milliseconds and logs the duration.
	for i := 0; i < parallelism; i++ {
		ticker := time.NewTicker(time.Duration(interval) * time.Millisecond)
		go func() {
			for {
				select {
				case <-ticker.C:
					ctx, cancel := context.WithTimeout(context.Background(), 7*time.Second)
					start := time.Now().UnixMilli()
					_, err := resolve(ctx, domain)
					cancel()
					end := time.Now().UnixMilli()
					duration := end - start
					if err != nil {
						fmt.Printf("%d: resolved in %d milliseconds with error: %s\n", start, duration, err.Error())
					} else {
						fmt.Printf("%d: resolved in %d milliseconds\n", start, duration)
					}

				}
			}
		}()
	}
	time.Sleep(24000 * time.Second)
}

I then deploy this program using the following Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dns-test
  template:
    metadata:
      labels:
        app: dns-test
    spec:
      containers:
      - name: dns
        image: mcorbin/dnstest:0.0.3
        resources:
          limits:
            memory: "300Mi"
          requests:
            cpu: "0.5"
            memory: "300Mi"
        env:
          - name: DOMAIN
            value: "metrics-server.kube-system.svc.cluster.local."
          - name: PARALLELISM
            value: "4"
          - name: INTERVAL
            value: "50"

From time to time I can see slow DNS queries after a rollout, similar to what I see on EKS:

1736868482815: resolved in 5003 milliseconds
1736868482815: resolved in 5003 milliseconds
1736868482815: resolved in 5003 milliseconds
1736868482815: resolved in 5003 milliseconds

Anything else we need to know?

We already investigated a lot of things:

  • Increased the lameduck option on CoreDNS to 17 seconds: no change.
  • It's not a CoreDNS performance issue: metrics are good and there is no latency at all, which we verified by enabling debug logs.
  • It's not a kube-proxy reconciliation latency issue: kube-proxy logs/metrics are good and endpoints are correctly updated (one way to watch this is shown after this list).
  • We're mostly AWS EKS users, but it seems we're also able to reproduce the issue on the Exoscale SKS offering.
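One way to watch the endpoint updates during a rollout, as mentioned in the list above (assuming the usual kube-dns service name for CoreDNS):

# EndpointSlices carry the service-name label; addresses of deleted CoreDNS pods
# should disappear from here almost immediately during a rollout:
kubectl -n kube-system get endpointslices -l kubernetes.io/service-name=kube-dns -w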

I suspect a conntrack issue when conntrack entries are removed by kube-proxy. I indeed noticed that manually cleaning the conntrack entries for CoreDNS IPs caused the same symptoms.
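The manual cleanup was something along these lines (a sketch; run as root on a node, with a placeholder CoreDNS pod IP):

# List the UDP conntrack entries whose original destination is a CoreDNS pod IP:
conntrack -L -p udp --orig-dst 10.0.1.23

# Delete them; clients whose DNS flows were pinned to that pod then stall for the
# resolv.conf timeout, just like after a real pod deletion:
conntrack -D -p udp --orig-dst 10.0.1.23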

Kubernetes version

We reproduced the issue on several Kubernetes versions/cloud providers:

On AWS EKS:

Client Version: v1.30.7
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.8-eks-2d5f260

On Exoscale SKS:

Client Version: v1.30.7
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.3

I also reproduced on Exoscale SKS with server v1.31.3 and kube-proxy v1.32.0 to get this fix.

The AWS EKS Service Team also told us that they can reproduce the issue on their side on v1.32.0 (not yet released to users).

Cloud provider

AWS EKS, Exoscale SKS

OS version

NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"

Install tools

Both cases use kube-proxy with iptables mode.
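For completeness, the active proxy mode can be checked directly on a node through kube-proxy's metrics port (10249 by default):

# Run on a node (or from a host-network debug pod); this should print "iptables":
curl -s http://localhost:10249/proxyMode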

Container runtime (CRI) and version (if applicable)

No response

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

Metadata

Assignees: No one assigned

Labels: kind/bug, needs-sig, needs-triage
