VLAN Discovery Failed
While watching vmkernel.log you may sometimes see plenty of strange FCoE-related entries. That is not unusual if you use FCoE. If you don’t, however, you may be a little curious or even worried: timeouts and “link down” entries are not something most vSphere admins consider normal.
The problem can appear when you use certain converged network adapters. In my case it was an HPE C7000 with Virtual Connect modules and a QLogic 57840. This is a 10 Gb/s NIC that is also capable of FCoE and iSCSI offload. FCoE, however, isn’t used in any part of this infrastructure, so the following entries looked a little strange to me:
<3>bnx2fc:vmhba32:0000:87:00.0: bnx2fc_vlan_disc_timeout:218 VLAN 1002 failed. Trying VLAN Discovery.
<3>bnx2fc:vmhba32:0000:87:00.0: bnx2fc_start_disc:3260 Entered bnx2fc_start_disc
<3>bnx2fc:vmhba32:0000:87:00.0: bnx2fc_vlan_disc_timeout:193 VLAN Discovery Failed. Trying default VLAN 1002
<6>host4: fip: link down.
<6>host4: libfc: Link down on port ( 0)
<3>bnx2fc:vmhba32:0000:87:00.0: bnx2fc_vlan_disc_cmpl:266 vmnic2: vlan_disc_cmpl: hba is on vlan_id 1002
Furthermore, I noticed that the HBA statistics listed two unfamiliar adapters, vmhba32 and vmhba33. What’s more, they were listed with a different driver and had passed no traffic.
The name bnx2fc indicates that this is a driver for my network card, which means the driver is loaded even if you do not use FCoE. The driver actually used by my network card is bnx2x, but bnx2fc, bnx2i, bnx2 and cnic are also available and installed. I was determined to keep my vmkernel log as clean as possible, so I decided to turn FCoE off.
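To see which members of this driver family are installed on a host, the VIB listing can be filtered by name. The sketch below is only an illustration: the function name bnx2_family_vibs is made up here, and it assumes the VIB name is the first column of the output of `esxcli software vib list`.

```shell
# Hypothetical helper: reads a VIB listing on stdin and prints only the
# names (first column) that belong to the bnx2/cnic driver family.
bnx2_family_vibs() {
    awk '$1 ~ /bnx2|cnic/ { print $1 }'
}

# Possible usage on an ESXi host:
#   esxcli software vib list | bnx2_family_vibs
```

Because the function only filters text, it can also be run against a saved copy of the listing.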
After some investigation and testing I managed to do it and get rid of these junk entries.
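To judge how noisy the log is before and after such a change, the VLAN discovery messages can simply be counted. This is a minimal sketch: the function name count_fcoe_vlan_failures is hypothetical, the patterns are taken from the log lines quoted above, and the default path assumes the usual /var/log/vmkernel.log location.

```shell
# Hypothetical helper: counts bnx2fc VLAN discovery failure lines in a log.
# $1 is the log file; the default path is an assumption about the host.
count_fcoe_vlan_failures() {
    log_file="${1:-/var/log/vmkernel.log}"
    # -c prints the number of matching lines, -E enables the alternation.
    grep -c -E 'bnx2fc.*(VLAN Discovery Failed|VLAN [0-9]+ failed)' "$log_file"
}

# Possible usage:
#   count_fcoe_vlan_failures /var/log/vmkernel.log
```

Note that the log may still hold entries written before the change, so only the growth of the count over time is meaningful.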
To turn off FCoE when you do not use it, perform the following steps:
1. Remove the bnx2fc vib
# esxcli software vib remove --vibname=scsi-bnx2fc
2. Go to /etc/rc.local.d and remove the script called 99bnx2fc.sh, which is responsible for loading the driver when the host boots.
3. Disable the FCoE on all network cards involved:
# esxcli fcoe nic disable -n vmnicX
4. Reboot the host and check that the errors aren’t present anymore in the logs.
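When several NICs are involved in step 3, it may be convenient to generate the disable commands first and review them before running anything. The sketch below only prints the commands; the function name fcoe_disable_cmds is hypothetical, and vmnic3 in the usage line is a placeholder alongside the vmnic2 seen in the log above.

```shell
# Hypothetical helper: prints one "esxcli fcoe nic disable" command per
# NIC name given as an argument. Nothing is executed by this function.
fcoe_disable_cmds() {
    for nic in "$@"; do
        printf 'esxcli fcoe nic disable -n %s\n' "$nic"
    done
}

# Possible usage (review the output, then pipe it to sh to apply it):
#   fcoe_disable_cmds vmnic2 vmnic3
#   fcoe_disable_cmds vmnic2 vmnic3 | sh
```

Printing rather than executing keeps the step reversible until the list of NICs has been checked.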
According to the release notes, which can be found here, the problem should be resolved in driver version 2.713.10.v60.4; in my case, however, it wasn’t.