vMotion fails to migrate VMs between ESXi hosts that have the same configuration

As a rule, vMotion requires the same CPU family across all hosts involved, which ensures that the same feature set is presented to the VMs.

This is an obvious statement: if you have, for example, both Intel Xeon v3 and v4 CPU generations in your cluster, you need EVC to make vMotion work. But recently I came across an issue where vMotion was failing to migrate VMs between hosts with identical configuration. To be more precise, these were Dell R730s with Intel v3 CPUs.

The error message stated as follows:

The target host does not support the virtual machine’s current hardware requirements.
To resolve CPU incompatibilities, use a cluster with Enhanced vMotion Compatibility (EVC) enabled. See KB article 1003212.
com.vmware.vim.vmfeature.cpuid.stibp
com.vmware.vim.vmfeature.cpuid.ibrs
com.vmware.vim.vmfeature.cpuid.ibpb

Turn on EVC, it says. But wait a minute – EVC for identical CPUs? That sounded ridiculous, since there were three exactly identical hosts in the cluster. To make it more unusual, I was unable to migrate VMs only from host 02 to the others, while online migration between 01 and 03 worked fine. So it was definitely related to host 02 itself.

So I did additional tests, which revealed even weirder behaviour, for example:

  • I was able to cold migrate a VM from host 02 to 01 and then back from 01 to 02, this time online.
  • I was able to migrate VMs without any issues between 02 and 03.
  • All configuration, communication and so on were correct.
  • I was not able to migrate VMs using Shared-Nothing vMotion.
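The asymmetric behaviour above can be modelled with a short sketch (my own illustration, not VMware code): a running VM carries the CPU feature set of the host it powered on with, and a live-migration target must present at least those features. A cold migration re-evaluates the features at the next power-on, which is why it worked in both directions.

```python
# Illustrative model of vMotion CPU feature checks (not VMware code).
# A running VM snapshots the feature set of the host it powered on with;
# the live-migration target must present at least those features.

PATCHED = {"cpuid.IBRS", "cpuid.IBPB", "cpuid.STIBP", "cpuid.SSE2"}
UNPATCHED = {"cpuid.SSE2"}  # hosts without the withdrawn Spectre patch

def can_vmotion(vm_features, target_features):
    """Live migration succeeds only if the target offers every
    feature the running VM is currently allowed to use."""
    return vm_features <= target_features  # subset check

# A VM powered on on the patched host carries the Spectre/Meltdown bits,
# so it cannot move live to an unpatched host:
print(can_vmotion(PATCHED, UNPATCHED))  # False

# A VM powered on on an unpatched host can move to the patched one:
print(can_vmotion(UNPATCHED, PATCHED))  # True
```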

But then, after a few such attempts, I realized that host 02 had a different build than the others. A small difference, but it was the key thing here.

The build number of host 02 was 7526125, whilst the others had 7388607. Not a big deal at first glance; as long as vCenter has a higher build, this should not be an issue.
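Spotting the odd host out is straightforward once you collect the build numbers, for example from `vmware -vl` over SSH. A small sketch (the host names and version strings are illustrative; only the build numbers are from this post):

```python
import re
from collections import Counter

# Hypothetical `vmware -vl` output per host; the build numbers are the
# ones from this post, the rest of each string is illustrative.
host_versions = {
    "host01": "VMware ESXi 6.5.0 build-7388607",
    "host02": "VMware ESXi 6.5.0 build-7526125",
    "host03": "VMware ESXi 6.5.0 build-7388607",
}

def build_number(version_line):
    """Extract the numeric build from a line like
    'VMware ESXi 6.5.0 build-7388607'."""
    match = re.search(r"build-(\d+)", version_line)
    return int(match.group(1)) if match else None

builds = {host: build_number(line) for host, line in host_versions.items()}
majority = Counter(builds.values()).most_common(1)[0][0]
outliers = {host: b for host, b in builds.items() if b != majority}
print(outliers)  # {'host02': 7526125}
```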

The clue here is that 7526125 is the build number containing the Spectre/Meltdown fixes, which were later withdrawn, so they were not installed on the rest of the hosts in the cluster. This resulted in a different capability set being presented by ESXi, namely:

  • “Capability Found: cpuid.IBRS”
  • “Capability Found: cpuid.IBPB”
  • “Capability Found: cpuid.STIBP”
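These capability messages are logged by the VMkernel at boot, so you can confirm which hosts expose the Spectre-related CPUID bits by searching the log. A sketch against an illustrative excerpt (the exact log-line prefix varies by build, so treat it as an assumption):

```python
import re

# Illustrative excerpt of /var/log/vmkernel.log lines from a patched
# host; the "cpu0:65536)" prefix is an assumption about the format.
log_excerpt = """\
cpu0:65536)Capability Found: cpuid.IBRS
cpu0:65536)Capability Found: cpuid.IBPB
cpu0:65536)Capability Found: cpuid.STIBP
cpu0:65536)Capability Found: cpuid.SSE2
"""

SPECTRE_CAPS = {"cpuid.IBRS", "cpuid.IBPB", "cpuid.STIBP"}

# Collect every "Capability Found" entry, then intersect with the
# Spectre-related set to see whether this host has the withdrawn patch.
found = set(re.findall(r"Capability Found: (cpuid\.\S+)", log_excerpt))
print(sorted(found & SPECTRE_CAPS))
# ['cpuid.IBPB', 'cpuid.IBRS', 'cpuid.STIBP']
```

A host without the withdrawn patch would simply show an empty intersection here.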

There are currently two ways to deal with this issue:

  1. Cold migrate your VMs if you need to, or simply wait for new patches from VMware.
  2. Reinstall that single host to ensure the same capabilities. That is the way I chose, because in my case that server had some additional hardware issues that had to be addressed anyway.

For additional information, take a look at:

  • https://kb.vmware.com/s/article/52085
  • https://kb.vmware.com/s/article/52345
  • https://kb.vmware.com/s/article/52245


12 thoughts on “vMotion fails to migrate VMs between ESXi hosts that have the same configuration”

  1. Excellent article. Just ran into the exact same issue last night and found your article. Describes the same situation to a tee. Dell R620’s in a cluster, identical systems, same build revs on the servers, same issue.

    1. Thank you, Chris. Fortunately this issue should be gone by now, as new, consistent patches have since been released 🙂

  2. Thanks for the article. I just ran into this exact issue tonight on Dell 730s. This saved me some time. Thanks.

  3. Thank you! I ran into this as well. VMware support didn’t even catch this, they just said enable EVC mode, but I couldn’t b/c of the different ESXi versions.

  4. In my scenario, I have 4 hosts in the cluster and all have the same build version too.

    However, I am unable to migrate VMs to host no. 3, though some of the VMs can migrate.

    So I am confused whether this is an issue with the host or with a particular VM.

  5. Shutting down the VM allows the migration to reconfigure the VM for the target host’s hardware at the next power-on, instead of dynamically.
    So try migrating while the VM is shut down.

  6. In this case it was from 10.0.28.37 to 10.0.28.48 (all ESXi hosts from the same cluster where the VMs should migrate). As we can see in the IP display board, the MTU is 9000, which means Jumbo Frames are enabled. So I tried the same vmkping command with Jumbo Frames to make sure all ESXi hosts were able to ping each other (even using Jumbo Frames), so the issue was not at the network level.

  7. What if I do actually have EVC enabled and I’m running into the same issue? I checked build numbers and CPU architectures. I DO actually have a mixed-node cluster, but EVC is enabled across them all: 3 nodes are identical, then I have 2 more nodes that are also identical. This is Nutanix HCI I’m using. One of the 3 identical nodes is not allowing me to vMotion a VM to it, but I can to the non-uniform nodes. I’m wondering if my situation is due to the fact that I enabled the ‘Expose hardware assisted virtualization to the guest OS’ and ‘Expose the NX/XD flag to guest’ options. For the hardware virtualization option, I learned that if I wanted that enabled in a mixed-node cluster, all I had to do was make sure the VM came online and booted up seeing the new CPU hardware on the nodes that had different CPU instruction sets/microarchitecture. As long as we could see the different CPUs in Device Manager, no matter where it vMotioned or failed over to, it wouldn’t have an issue.
