Description
What happened?
ref: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/configmap/configmap.go#L230
I was running a Pod that mounts a ConfigMap which is modified regularly. Yesterday the root disk of one node filled up, and today our SRE reported that the volume directory in this Pod had become empty.
After some digging, it seems that the ConfigMap volume directory on the host is torn down if writer.Write fails. Because the volume is shared between the host and the Pod through a bind mount, once the inode of the host directory changes, the directory inside the Pod would stop working.
defer func() {
	// Clean up directories if setup fails
	if !setupSuccess {
		unmounter, unmountCreateErr := b.plugin.NewUnmounter(b.volName, b.podUID)
		if unmountCreateErr != nil {
			klog.Errorf("error cleaning up mount %s after failure. Create unmounter failed with %v", b.volName, unmountCreateErr)
			return
		}
		tearDownErr := unmounter.TearDown()
		if tearDownErr != nil {
			klog.Errorf("Error tearing down volume %s with : %v", b.volName, tearDownErr)
		}
	}
}()

writerContext := fmt.Sprintf("pod %v/%v volume %v", b.pod.Namespace, b.pod.Name, b.volName)
writer, err := volumeutil.NewAtomicWriter(dir, writerContext)
if err != nil {
	klog.Errorf("Error creating atomic writer: %v", err)
	return err
}

err = writer.Write(payload)
if err != nil {
	klog.Errorf("Error writing payload to dir: %v", err)
	return err
}

err = volume.SetVolumeOwnership(b, mounterArgs.FsGroup, nil /*fsGroupChangePolicy*/, volumeutil.FSGroupCompleteHook(b.plugin, nil))
if err != nil {
	klog.Errorf("Error applying volume ownership settings for group: %v", mounterArgs.FsGroup)
	return err
}
setupSuccess = true
return nil
Is my suspicion correct?
What did you expect to happen?
The content of the directory in the Pod should keep following the ConfigMap as usual, but instead I get an empty directory.
How can we reproduce it (as minimally and precisely as possible)?
- Remove the ..data link in the volume directory and create a directory with the same name; writer.Write will then fail.
cd /var/lib/kubelet/pods/6aab8ebd-c747-4efd-930b-592cccf43ccc/volumes/kubernetes.io~configmap/xxx/
rm -rf ..data
mkdir ..data
- Wait 1-2 minutes.
- Enter the Pod; the volume should now be empty.
Anything else we need to know?
No response
Kubernetes version
Client Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.11-alicloud.base-tianji-9", GitCommit:"2cee452b78d", GitTreeState:"", BuildDate:"2022-09-23T08:42:15Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.11-alicloud.base-tianji-9", GitCommit:"2cee452b78d", GitTreeState:"", BuildDate:"2022-09-23T08:42:15Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
NAME="Alibaba Group Enterprise Linux Server"
VERSION="7.2 (Paladin)"
ID="alios"
ID_LIKE="fedora"
VERSION_ID="7.2"
PRETTY_NAME="Alibaba Group Enterprise Linux Server 7.2 (Paladin)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:alibaba:enterprise_linux:7.2:GA:server"
HOME_URL="https://os.alibaba-inc.com/"
BUG_REPORT_URL="https://os.alibaba-inc.com/"
ALIBABA_BUGZILLA_PRODUCT="Alibaba Group Enterprise Linux 7"
ALIBABA_BUGZILLA_PRODUCT_VERSION=7.2
ALIBABA_SUPPORT_PRODUCT="Alibaba Group Enterprise Linux"
ALIBABA_SUPPORT_PRODUCT_VERSION=7.2
Linux c23b02006.cloud.b02.amtest64a 4.19.91-007.ali4000.alios7.x86_64 #1 SMP Wed Apr 8 16:17:43 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)