ESXi host connection lost due to CDP/LLDP protocol
You may observe random, intermittent loss of connection to ESXi 6.0 hosts running on Dell servers (both rack and blade). It is caused by a bug related to Cisco Discovery Protocol (CDP) / Link Layer Discovery Protocol (LLDP). The same loss of connection can be seen while generating a VMware support log bundle, because these protocols are also used during that process to collect information about the network.
What are these protocols for? Both of them perform similar roles in the local area network. Network devices use them to advertise their identity and capabilities to directly connected neighbors. The main difference is that CDP is a Cisco proprietary protocol, while LLDP is vendor-neutral. There are also other niche protocols such as Nortel Discovery Protocol, Foundry Discovery Protocol and Link Layer Topology Discovery.
CDP and LLDP are also supported by VMware virtual switches, which can therefore gather and display information about the physical switches they are connected to. CDP is available for both standard and distributed switches, whilst LLDP is available only for distributed virtual switches, since vSphere 5.0.
Cisco Discovery Protocol information displayed on vSwitch level.
There is currently no resolution for this bug, but thanks to VMware Technical Support the workaround described below is available.
Turn off CDP for each vSwitch:
# esxcfg-vswitch -B down vSwitchX
You can also verify the current status of CDP using the following command:
# esxcfg-vswitch -b vSwitchX
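If a host has several standard vSwitches, the two commands above can be wrapped in a small loop. The following is only a sketch: the function name disable_cdp and the ESXCFG override variable are hypothetical helpers introduced here, and the vSwitch names have to be supplied by you.

```shell
#!/bin/sh
# Sketch: disable CDP on several standard vSwitches and read the mode back.
# ESXCFG is a hypothetical override so the loop can be dry-run off-host;
# on a real ESXi host it defaults to esxcfg-vswitch.
ESXCFG="${ESXCFG:-esxcfg-vswitch}"

disable_cdp() {
    for sw in "$@"; do
        # -B sets the CDP mode, -b reads it back (same flags as shown above)
        "$ESXCFG" -B down "$sw" || return 1
        printf '%s: %s\n' "$sw" "$("$ESXCFG" -b "$sw")"
    done
}
```

It could then be called as: disable_cdp vSwitch0 vSwitch1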
This simple task resolves the random connection loss of ESXi hosts. However, it does not solve the loss of connection during generation of the log bundle.
To confirm that the problem exists, you can simply run the following command:
# vm-support -w /vmfs/volumes/datastore_name
Even though CDP has been turned off, ESXi still uses these discovery protocols during log bundle generation to gather information about the network topology.
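To see whether the host really drops off the network while the bundle is being generated, you can watch it from another machine. This is a rough sketch, not a VMware tool: watch_host and the PING_CMD override are hypothetical names, and it assumes a ping that accepts -c 1.

```shell
#!/bin/sh
# Sketch: log once per interval whether a host answers ping.
# PING_CMD is a hypothetical override (defaults to the system ping).
PING_CMD="${PING_CMD:-ping}"

# usage: watch_host <host> <number of probes> [interval in seconds]
watch_host() {
    host="$1"
    count="$2"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$count" ]; do
        if "$PING_CMD" -c 1 "$host" >/dev/null 2>&1; then
            state=up
        else
            state=down
        fi
        # timestamp, host name and result of this probe
        printf '%s %s %s\n' "$(date '+%H:%M:%S')" "$host" "$state"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Start it just before running vm-support on the host, for example: watch_host esxi01.example.local 300 (the host name is a placeholder). Lines ending in "down" during the bundle generation indicate the problem.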
To fix it, download the script called disablelldp2.py and perform the steps below:
- Copy the script to a datastore that is shared with all hosts.
- Open an SSH session to an ESXi host.
- Change to the directory where you copied the script.
- Make the script executable: # chmod 555 disablelldp2.py
- Run the script: # ./disablelldp2.py
- After the script has run, go to /etc/rc.local.d and check the local.sh file. It should look like this:
#!/bin/sh
# local configuration options
# Note: modify at your own risk! If you do/use anything in this
# script that is not part of a stable API (relying on files to be in
# specific places, specific tools, specific output, etc) there is a
# possibility you will end up with a broken system after patching or
# upgrading. Changes are not supported unless under direction of
# VMware support.
ORIGINAL_FILE=/sbin/lldpnetmap
MODIFIED_FILE=/sbin/lldpnetmap.original
if test -e "$MODIFIED_FILE"
then
    echo "$MODIFIED_FILE already exists."
else
    mv "$ORIGINAL_FILE" "$MODIFIED_FILE"
    echo "Omitting LLDP Script." > "$ORIGINAL_FILE"
    chmod 555 "$ORIGINAL_FILE"
fi
exit 0
- Restart the ESXi server and run the vm-support command again to confirm that the problem is solved.
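The local.sh logic simply moves /sbin/lldpnetmap aside and leaves a harmless placeholder in its place. If you want to try that behaviour on a scratch file first, or check after the reboot that the placeholder is there, something like the sketch below may help. The function names stub_out and is_stubbed are hypothetical and not part of ESXi or of disablelldp2.py.

```shell
#!/bin/sh
# Sketch: the same move-and-replace logic as local.sh, but taking the
# file path as an argument so it can be tested on a scratch copy.
stub_out() {
    original="$1"
    backup="$1.original"
    if [ -e "$backup" ]; then
        echo "$backup already exists."
    else
        mv "$original" "$backup" || return 1
        echo "Omitting LLDP Script." > "$original"
        chmod 555 "$original"
    fi
}

# Returns success if the backup exists and the file holds the placeholder text.
is_stubbed() {
    [ -e "$1.original" ] && grep -q "Omitting LLDP Script." "$1"
}
```

On the host, is_stubbed /sbin/lldpnetmap && echo "workaround in place" would report whether the workaround survived the reboot.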