Description
What happened:
I'm running a k8s cluster on AWS EKS, using spot instances for the node groups. Randomly, and not on all clusters, the pod that manages the CSI NFS controller goes into CrashLoopBackOff and reports these logs:
csi-snapshotter E1029 09:35:37.115611 1 leaderelection.go:340] Failed to update lock optimistically: Operation cannot be fulfilled on leases.coordination.k8s.io "external-snapshotter-leader-nfs-csi-k8s-io": the object has been modified; please apply your changes to the latest version and try again, falling back to slow path
If I delete the pod, everything starts again without any issue:
nfs Compiler: gc
nfs Driver Name: nfs.csi.k8s.io
nfs Driver Version: v4.9.0
nfs Git Commit: ""
nfs Go Version: go1.22.3
nfs Platform: linux/amd64
It seems that almost every time an EC2 instance is retired and swapped for another one, csi-nfs-controller ends up with a stuck leader-election lock that can only be resolved by a brutal pod delete (see the commands below).
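For reference, this is roughly the manual recovery I apply today. The lease name comes from the log above; the namespace (kube-system) and the pod label (app=csi-nfs-controller) are assumptions based on a default Helm install of csi-driver-nfs and may differ in other setups.

```shell
# Inspect the lease the csi-snapshotter sidecar is fighting over
# (namespace assumed to be kube-system, the Helm chart default).
kubectl -n kube-system get lease external-snapshotter-leader-nfs-csi-k8s-io -o yaml

# Check which controller pod is crash-looping and how often it restarts
# (controller pod label assumed from the default Helm install).
kubectl -n kube-system get pods -l app=csi-nfs-controller

# Brutal but effective workaround: delete the stuck controller pod so the
# Deployment recreates it and leader election starts from a clean state.
kubectl -n kube-system delete pod -l app=csi-nfs-controller
```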
What you expected to happen:
No CrashLoopBackOff status on the controller pod.
How to reproduce it:
Deploy a cluster with spot instances, install the NFS CSI controller, and observe whether and when this happens. There is no deterministic reproducer.
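A rough way to mimic a spot interruption (an approximation on my side, not what AWS actually does internally) is to drain the node currently hosting the controller pod and watch whether the recreated pod gets stuck on leader election. The pod label and namespace below are the same assumptions as above.

```shell
# Find the node running the csi-nfs-controller pod ...
NODE=$(kubectl -n kube-system get pod -l app=csi-nfs-controller \
  -o jsonpath='{.items[0].spec.nodeName}')

# ... drain it to simulate the node being retired, then watch the
# replacement pod for CrashLoopBackOff / leader-election errors.
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
kubectl -n kube-system get pods -l app=csi-nfs-controller -w
```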
Anything else we need to know?:
Environment:
- CSI Driver version: 4.9.0
- Kubernetes version (use kubectl version): 1.31
- OS (e.g. from /etc/os-release): AWS Bottlerocket
- Install tools: Terraform + Helm
- Others: