Description
Hi everyone,
We have a Docker installation (version 19.03.15) running on Red Hat Enterprise Linux 8.9.
We have a strange issue with Docker on some of our systems: image pulls are very slow. At each layer, the pull hangs for many seconds after extraction appears to have completed, before moving on to the next layer:
acc5fb5d9486: Extracting [==================================================>] 79MB/79MB
The pull stays stuck here for many seconds, even though download and extraction themselves progress very quickly.
While it is stuck, all docker inspect operations also hang for many seconds (> 20 seconds).
Below is our Docker setup.
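The stall could be quantified by timing docker inspect repeatedly while a pull is in progress. The following is only a minimal sketch of that idea, not something taken from the affected systems: the helper names `time_command` and `sample_latency` are made up for illustration, and `CONTAINER` in the usage comment is a placeholder for any existing container name or ID.

```python
import subprocess
import time


def time_command(argv, timeout=120.0):
    """Run argv once and return (elapsed_seconds, return_code).

    Raises subprocess.TimeoutExpired if the command runs longer than timeout.
    """
    start = time.monotonic()
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return time.monotonic() - start, proc.returncode


def sample_latency(argv, samples=5, interval=1.0, timer=time_command):
    """Time argv `samples` times, sleeping `interval` seconds between runs."""
    results = []
    for i in range(samples):
        results.append(timer(argv))
        if i < samples - 1:
            time.sleep(interval)
    return results


# Hypothetical usage, run in a second terminal while `docker pull` is stuck:
#   for elapsed, rc in sample_latency(["docker", "inspect", "CONTAINER"]):
#       print(f"{elapsed:.2f}s rc={rc}")
```

Comparing the samples taken during a pull with samples taken on an idle daemon would show whether the inspect delay lines up with the per-layer hang.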
docker info output:
Client:
 Debug Mode: false
Server:
 Containers: 18
  Running: 10
  Paused: 0
  Stopped: 8
 Images: 69
 Server Version: 19.03.15
 Storage Driver: devicemapper
  Pool Name: lvmdata-docker--thinpool
  Pool Blocksize: 65.54kB
  Base Device Size: 3.221GB
  Backing Filesystem: xfs
  Udev Sync Supported: true
  Data Space Used: 32.43GB
  Data Space Total: 48.1GB
  Data Space Available: 15.67GB
  Metadata Space Used: 58.72MB
  Metadata Space Total: 163.6MB
  Metadata Space Available: 104.9MB
  Thin Pool Minimum Free Space: 4.81GB
  Deferred Removal Enabled: true
  Deferred Deletion Enabled: true
  Deferred Deleted Device Count: 0
  Library Version: 1.02.181-RHEL8 (2021-10-20)
 Logging Driver: syslog
 Cgroup Driver: systemd
 Plugins:
  Volume: local lvm
  Network: bridge host ipvlan macvlan null overlay xvp_macvlan
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: mosaix-runtime runc
 Default Runtime: mosaix-runtime
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.18.0-513.9.1.el8_9.x86_64
 Operating System: Red Hat Enterprise Linux 8.9 (Ootpa)
 OSType: linux
 Architecture: x86_64
 CPUs: 10
 Total Memory: 30.8GiB
 Name: ekbi-ekch-mgt-01.ops.naviair.frq
 ID: W4YH:T5XG:QXBX:I7BO:KG74:YEUO:GC6P:QN24:3LZH:UMGW:H5WD:WQVU
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
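As a sanity check on the thin pool figures above, the usage ratios can be computed directly from the docker info text. This is a minimal sketch: `parse_size` and `pool_usage` are illustrative names, the parser assumes the decimal kB/MB/GB/TB suffixes shown in the output, and `SAMPLE` simply repeats the devicemapper lines from this report.

```python
import re

# Decimal multipliers, assuming the suffixes printed by docker info.
UNITS = {"kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def parse_size(text):
    """Convert a string such as '32.43GB' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)\s*(kB|MB|GB|TB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def pool_usage(info_text):
    """Summarise thin pool usage from docker info output."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value
    data_used = parse_size(fields["Data Space Used"])
    data_total = parse_size(fields["Data Space Total"])
    data_free = parse_size(fields["Data Space Available"])
    meta_used = parse_size(fields["Metadata Space Used"])
    meta_total = parse_size(fields["Metadata Space Total"])
    min_free = parse_size(fields["Thin Pool Minimum Free Space"])
    return {
        "data_used_pct": 100.0 * data_used / data_total,
        "metadata_used_pct": 100.0 * meta_used / meta_total,
        "above_min_free": data_free > min_free,
    }


SAMPLE = """\
Data Space Used: 32.43GB
Data Space Total: 48.1GB
Data Space Available: 15.67GB
Metadata Space Used: 58.72MB
Metadata Space Total: 163.6MB
Thin Pool Minimum Free Space: 4.81GB
"""

usage = pool_usage(SAMPLE)
print(f"data: {usage['data_used_pct']:.1f}%  "
      f"metadata: {usage['metadata_used_pct']:.1f}%  "
      f"above minimum free space: {usage['above_min_free']}")
```

With the values reported here, data usage comes out at roughly 67% and metadata usage at roughly 36%, with available space well above the minimum free space threshold, so the pool does not appear to be close to exhaustion.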
Docker goroutine stacks at the time of event:
docker-goroutines.txt
Strangely, many environments with exactly this setup work fine, while some others show the problem. We suspect that lock contention might be causing this.
We have also ruled out CPU, network, and disk I/O bottlenecks.
Any ideas on how to debug this further would be much appreciated.
Thanks,
Radu.