Re: VMs not migrated after node failure until node…

Thank you for the detailed explanation. I understand that KubeVirt deliberately avoids automatic migration during a node outage to prevent data corruption or split-brain scenarios, and that the virt-launcher pod must fully terminate before a replacement can be scheduled.

However, in our environment, we have a Disaster Recovery (DR) requirement: whenever a node goes down, the VM should come up on another healthy node automatically without waiting for the failed node to recover. This is critical to meet our uptime and availability SLAs.

Could you advise on the recommended approach or configuration in KubeVirt/OpenShift Virtualization to achieve automatic failover of VMs across nodes during node failures while still ensuring data integrity? For example, should we consider the RerunOnFailure runStrategy combined with live migration, anti-affinity rules, or some other high-availability setup within KubeVirt?

We want to implement this in a way that ensures the VM is immediately available on a healthy node while avoiding split-brain scenarios.
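For reference, here is a minimal sketch of the kind of VirtualMachine spec we have been considering (the VM name and disk/volume names are placeholders, and this assumes the VM's disks live on shared RWX storage so another node can attach them). As I understand it, RerunOnFailure only reschedules once the old virt-launcher pod is actually gone, so on an unreachable node it would still need to be paired with some node-remediation mechanism that fences the failed node and force-deletes its pods:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm            # hypothetical name
spec:
  runStrategy: RerunOnFailure # restart the VM elsewhere if its virt-launcher pod fails
  template:
    spec:
      evictionStrategy: LiveMigrate # live-migrate on planned drains/evictions
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          persistentVolumeClaim:
            claimName: example-vm-root # assumed to be an RWX PVC on shared storage
```

Is this roughly the right direction, or is there a supported pattern we are missing?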

Read more here: Source link