During a recent Horizon deployment with GPUs for a customer, I encountered an issue related to ECC on an NVIDIA P40 GPU card. According to the documentation, it applies to all cards based on the Pascal architecture.
NVIDIA's documentation states that, apart from installing NVIDIA's VIB, you need to disable ECC. That part was clear to me. However, when I ran the nvidia-smi command after installing the VIB, the status looked as follows:
ECC status is listed as N/A, which I initially treated as disabled. That was wrong.
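Since the summary view can be misleading, the detailed query output of nvidia-smi is a safer place to check the ECC mode. The snippet below is a minimal sketch, not an official NVIDIA procedure: the helper name `ecc_current_mode` is made up for illustration, and it assumes the detailed output contains an "Ecc Mode" heading followed by a "Current : ..." line, which may differ between driver versions. It also assumes awk is available in the shell you run it from.

```shell
# Hypothetical helper: reads `nvidia-smi -q` style text on stdin and prints
# the "Current" ECC mode found under each "Ecc Mode" heading.
# Assumption: the labels "Ecc Mode" and "Current" match your driver's output.
ecc_current_mode() {
  awk '
    /Ecc Mode|ECC Mode/ { in_ecc = 1; next }   # start of the ECC mode block
    in_ecc && /Current/ {
      sub(/^[^:]*:[ \t]*/, "")                 # keep only the value after ":"
      print
      in_ecc = 0                               # skip the "Pending" line
    }
  '
}

# Usage on the host (assumes nvidia-smi from the NVIDIA VIB is on the PATH):
#   nvidia-smi -q | ecc_current_mode
```

If this prints Enabled for a card, ECC is still on, regardless of what the summary table suggests.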
When I tried to power on a VM with the PCI device added, I received the following error:
Could not initialize plugin ‘/usr/lib64/vmware/plugin/libnvidia-vgx.so’ for vGPU “profile_name”
Of course, after searching for it on Google I found this KB, which clearly indicates the root cause: ECC!
With that in mind, I went back to the SSH console and issued the following command to make sure ECC was disabled:
nvidia-smi -i ID -e 0
After restarting the ESXi host I was able to power on all VMs 🙂
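On a host with several cards, the disable step has to be repeated per GPU. Here is a small sketch of how that could be scripted, using the `-e 0` option of nvidia-smi, which turns ECC off for the GPU selected with `-i`. The function name `disable_ecc` and the `DRY_RUN` switch are my own additions for illustration, not part of any NVIDIA or VMware tooling, and the GPU indexes in the usage line are examples only.

```shell
# Hypothetical helper: disable ECC on every GPU index passed as an argument.
# Set DRY_RUN=1 to only print the commands instead of running them.
disable_ecc() {
  for id in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "nvidia-smi -i $id -e 0"
    else
      # -i selects the GPU, -e 0 requests ECC off; a reboot is still needed
      nvidia-smi -i "$id" -e 0
    fi
  done
}

# Usage (example indexes, check yours in the nvidia-smi output first):
#   DRY_RUN=1 disable_ecc 0 1   # preview
#   disable_ecc 0 1             # apply, then restart the host
```

The dry-run mode is there so you can review the exact commands before touching a production host.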