As a rule, vMotion requires the same CPU family across the servers involved, so that the same feature set is presented on each of them, in order to succeed.
This is an obvious statement: if you have, for example, Intel Xeon v3 and v4 CPU generations in your cluster, you need EVC to make it work. But recently I came across an issue where vMotion was failing to migrate VMs between hosts with identical configuration. Those were Dell R730s with Intel v3 CPUs, to be more precise.
The error message read as follows:
The target host does not support the virtual machine’s current hardware requirements.
To resolve CPU incompatibilities, use a cluster with Enhanced vMotion Compatibility (EVC) enabled. See KB article 1003212.
Turn on EVC, it says, but wait a minute: EVC for the same CPUs? That sounds ridiculous, given that there were 3 identical hosts in the cluster. To make it more unusual, I was unable to migrate VMs only from host 02 to the others, but was able to migrate VMs online between 01 and 03 and so on. So it was definitely related to host 02 itself.
So I ran additional tests, which revealed even stranger behaviour, for example:
- I was able to cold migrate a VM from host02 to 01 and then back from 01 to 02, this time online.
- I was able to migrate VMs without any issues between 02 and 03.
- All configuration, communication and so on were correct.
- I was not able to migrate VMs using Shared-Nothing vMotion.
But then, after a few such attempts, I realized that host02 had a different build than the others. It was a small difference, but it was the key thing here.
The build number of host02 was 7526125, whilst the others had 7388607. Normally that is not a big deal: as long as vCenter has a higher build, it should not be an issue.
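A quick way to spot this kind of mismatch is to compare the build numbers of all hosts in the cluster and flag the odd one out. The snippet below is only a minimal sketch: the `host_builds` dictionary and the `find_build_outliers` helper are hypothetical names, and the data is typed in by hand from the values above rather than pulled from vCenter.

```python
from collections import Counter

# Hypothetical inventory: host name -> ESXi build number.
# The build numbers are the ones from this post; the dictionary itself
# is filled in by hand, not queried from vCenter.
host_builds = {
    "host01": 7388607,
    "host02": 7526125,
    "host03": 7388607,
}


def find_build_outliers(builds):
    """Return the hosts whose build differs from the most common build."""
    most_common_build, _ = Counter(builds.values()).most_common(1)[0]
    return {host: build for host, build in builds.items()
            if build != most_common_build}


print(find_build_outliers(host_builds))
```

With the values above, only host02 is reported.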
The clue here is that 7526125 is the build number of the Spectre/Meltdown fixes, which were withdrawn, so they were not installed on the rest of the hosts in the cluster. This resulted in a different capability set being presented to ESXi on that host, namely:
- “Capability Found: cpuid.IBRS”
- “Capability Found: cpuid.IBPB”
- “Capability Found: cpuid.STIBP”
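The asymmetry in the tests above can be illustrated with a simplified model, assuming that a powered-on VM requires every capability it saw on the host where it was powered on. This is only a sketch, not how vCenter actually performs its compatibility check: the `can_live_migrate` helper is hypothetical, and the two entries in `COMMON` are illustrative placeholders standing in for the features all three hosts share. Only the IBRS, IBPB and STIBP names come from the messages above.

```python
# Illustrative placeholders for features shared by all hosts (assumption).
COMMON = frozenset({"cpuid.AES", "cpuid.AVX2"})

# The three extra capabilities exposed by the patched host (from the post).
SPECTRE_CAPS = frozenset({"cpuid.IBRS", "cpuid.IBPB", "cpuid.STIBP"})

host_caps = {
    "host01": COMMON,
    "host02": COMMON | SPECTRE_CAPS,  # the host with build 7526125
    "host03": COMMON,
}


def can_live_migrate(powered_on_at, target, caps=host_caps):
    """Simplified check: the target must offer everything the VM saw at power-on.

    Returns (ok, missing), where missing is the sorted list of capabilities
    the target host lacks.
    """
    missing = caps[powered_on_at] - caps[target]
    return (not missing, sorted(missing))


print(can_live_migrate("host02", "host01"))  # VM powered on at the patched host
print(can_live_migrate("host01", "host02"))  # VM powered on at an unpatched host
```

Under this model, a VM powered on at host02 cannot move to host01 because three capabilities are missing there, while a VM that was cold migrated and powered on at host01 can move online to host02, which matches the behaviour seen in the first test.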
There are currently 2 ways to deal with such an issue:
- Cold migrate your VMs if you need to, or simply wait for new patches from VMware.
- Reinstall that single host to ensure the same capabilities. That’s the way I chose, because in my case that server had some additional hardware issues that had to be addressed.
For additional information take a look at: