
General vSAN Error

vSAN is a wonderful shared storage option in a vSphere cluster, but it requires an administrator with deep product knowledge and overall awareness to manage it with an understanding of its quirks and gotchas. I’ve worked with several multi-node vSAN clusters for a few years now, but it still surprises me sometimes. I recently spent a couple of hours troubleshooting a “General vSAN Error” to figure out why I couldn’t put a host into Maintenance Mode, only to find out that the behavior was by design. I decided to describe my experience to help others resolve their own vSAN issues.

Usually, if I want to check a scenario as quickly as possible, I use one of the VMware Hands On Labs environments, which I reconfigure just as I need it. This time I used “HOL-2008-01-HCI – vSAN – Getting Started”, which is based on vSAN 6.7. I know it’s not the current vSAN version, but it is mature enough for testing. I wanted to check how a three-node cluster would behave if I put one of the nodes into Maintenance Mode with “Full data migration” selected as the data evacuation option. The VM running in the cluster used the “vSAN Default Storage Policy”. The task failed shortly after it started, with the error message “General vSAN error”. I immediately checked whether there was enough storage space left on the disks of the remaining nodes, and there was. The “CORE-A” VM was consuming just 492.1 MB of an almost 60 GB vSAN datastore, so even with one host in Maintenance Mode, the remaining two nodes would have had enough storage space. I decided to confirm this conclusion, so I opened an SSH session to the vCenter Server Appliance (vCSA) and ran these commands:

rvc administrator@corp.local@vcsa-01a.corp.local
vsan.whatif_host_failures -s 1/RegionA01/computers/RegionA01-COMP01/

It showed what percentage of storage space was used on each node and how those numbers would change after a simulated failure of one node. Nothing looked suspicious.
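For a complementary check at the disk level, RVC also offers a per-disk capacity view. A hedged sketch, using the same cluster path as above (vsan.disks_stats is the command name as I remember it from RVC’s vsan namespace, so verify it in your RVC session):

vsan.disks_stats 1/RegionA01/computers/RegionA01-COMP01/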

Next, I checked the “Task Console” in the vSphere Client to find any clues. The description appended to the error message confused me: “Evacuation precheck failed – Retry operation after adding 1 nodes with each node having 1 GB worth of capacity.” Without thinking it through, I ignored it and dived into kb.vmware.com to look for clues there.
I quickly found this article: “out of resources” error when entering maintenance mode on vSAN hosts with large vSAN objects (2149615).
This drew my attention to vSAN’s clomd service, so I decided to check /var/log/clomd.log. I opened an SSH session to the ESXi host and found, in the last few consecutive entries, that the decommission operation had started and then changed its state as shown below:

DECOM_STATE_NONE 
DECOM_STATE_ACTIVE
DECOM_STATE_AUDIT
DECOM_STATE_FAILED
DECOM_STATE_NONE
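To pull those state transitions out of the log quickly, a minimal grep sketch (assuming the default clomd log location mentioned above and that the state names appear in the log verbatim, as in the excerpt):

grep -i decom /var/log/clomd.log | tail -20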

I also decided to check whether there were any known problems with decommissioning nodes from vSAN clusters. I quickly found another article, “vSAN Host Maintenance Mode is in sync with vSAN Node Decommission State (51464)”, and used the recommended command to check whether the vSAN database recorded any problems with node decommissioning:

cmmds-tool find -t NODE_DECOM_STATE -f json | grep 'uuid\|decomState'

The results showed that the values for the decomState key were all zero, which indicated there was no frozen background decommission operation.

Then I looked for traces in VMware’s community resources and quickly found that my issue was well known and had some workarounds.
In the post titled “A general system error occurred: Operation failed due to a VSAN error. Another host in the cluster is already entering maintenance mode” I learned that I should first cancel any pending Maintenance Mode operation using this command:

localcli vsan maintenancemode cancel

In order to put a host into Maintenance Mode I should use this command:

localcli system maintenanceMode set -e true -m noAction

I found it useful, but putting a host into Maintenance Mode without data evacuation wasn’t what I was looking for.
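For the record, the -m (vsanmode) flag of that command also accepts data-evacuation values. A hedged sketch; the mode names noAction, ensureObjectAccessibility and evacuateAllData are the ones I know of, so treat them as an assumption to verify against your build:

localcli system maintenanceMode set -e true -m evacuateAllData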

Finally, in desperation, I decided to search the product documentation for answers, and my life got easier with the first hit. In the vSAN documentation, in the article titled “Place a Member of vSAN Cluster in Maintenance Mode”, I found this definition of the available data evacuation options:

Ensure accessibility – “This is the default option. When you power off or remove the host from the cluster, vSAN ensures that all accessible virtual machines on this host remain accessible. Select this option if you want to take the host out of the cluster temporarily, for example, to install upgrades, and plan to have the host back in the cluster. This option is not appropriate if you want to remove the host from the cluster permanently.
Typically, only partial data evacuation is required. However, the virtual machine might no longer be fully compliant to a VM storage policy during evacuation. That means, it might not have access to all its replicas. If a failure occurs while the host is in maintenance mode and the Primary level of failures to tolerate is set to 1, you might experience data loss in the cluster.”

And finally the most important note was this one:

“This is the only evacuation mode available if you are working with a three-host cluster or a vSAN cluster configured with three fault domains.”

You can read the rest of the definitions there, but this was the explanation I had been looking for.

If you use a three-node vSAN cluster and want to put a host into Maintenance Mode for any service activities, there is no option that fully protects the hosted VMs. That is only possible with at least four nodes in the cluster.

Remember folks, the old rule “RTFM” still counts!

Part 1 – PVRDMA and how to test it in a home lab.

One of the members of the VMware User Community (VMTN) inspired me to build a configuration where two VMs use PVRDMA network adapters to communicate. The goal was to establish communication between the VMs without Host Channel Adapter (HCA) cards installed in the hosts. This is possible, as stated in the VMware vSphere documentation:

For virtual machines on the same ESXi hosts or virtual machines using the TCP-based fallback, the HCA is not required.

For this task, I prepared one ESXi host (6.7U1) managed by vCSA (6.7U1). One of the requirements for PVRDMA is a vSphere Distributed Switch (vDS), so first I configured a dedicated vDS for RDMA communication. I simply set up a basic vDS configuration (DSwitch-DVUplinks-34) with a default port group (DPortGroup) and equipped it with just one uplink.

In vSphere, a virtual machine can use a PVRDMA network adapter to communicate with other virtual machines that have PVRDMA devices. The virtual machines must be connected to the same vSphere Distributed Switch.

Second, I created a VMkernel port (vmk1) dedicated to RDMA traffic in this DPortGroup without assigning an IP address to this port (No IPv4 settings).

Third, I set the advanced system setting Net.PVRDMAVmknic on the ESXi host and pointed it at the VMkernel port (vmk1), as described in “Tag a VMkernel Adapter for PVRDMA”.
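The same setting can also be applied from the ESXi shell instead of the GUI; a hedged sketch of how I would expect the esxcli call to look (treat the option path as an assumption to verify on your host):

esxcli system settings advanced set -o /Net/PVRDMAVmknic -s vmk1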

Then I enabled the “pvrdma” firewall rule on the host in the Edit Security Profile window, as described in “Enable the Firewall Rule for PVRDMA”.
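Here, too, a hedged CLI alternative to the GUI step, reusing the ruleset name from above:

esxcli network firewall ruleset set -e true -r pvrdma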

The next steps are related to the configuration of the VMs. First, I created a new virtual machine. Then I added another network adapter to it and connected it to the DPortGroup on the vDS. For the Adapter Type of this network adapter I chose PVRDMA, with Device Protocol RoCE v2, as described in “Assign a PVRDMA Adapter to a Virtual Machine”.

Then I installed Fedora 29 on the first VM. I chose it because there are many tools for easily testing RDMA communication. After the OS installation, the second network interface showed up in the VM, and I addressed it in a different IP subnet. Each VM uses two network interfaces: the first for SSH access and the second for testing RDMA communication.
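For the in-guest addressing, something like the following nmcli sketch works on Fedora. The interface name ens224 is an assumption (check yours with “ip a”), and the 192.168.0.0/24 subnet simply matches the rping examples later in this post:

nmcli connection add type ethernet ifname ens224 con-name rdma ipv4.method manual ipv4.addresses 192.168.0.100/24
nmcli connection up rdma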

Then I set “Reserve all guest memory (All locked)” in VM’s Edit Settings window.

The VMs were now configured well enough, at the infrastructure layer, to communicate using RDMA.

To test it, I needed the appropriate tools, which I found in the rdma-core project on GitHub. I installed them using the procedure described on that project’s page, starting with the build dependencies:

dnf install cmake gcc libnl3-devel libudev-devel pkgconfig valgrind-devel ninja-build python3-devel python3-Cython	

Next, I installed the git client using the following command.

yum install git

Then I cloned the git project to a local directory.

mkdir /home/rdma
git clone https://github.com/linux-rdma/rdma-core.git /home/rdma

I built it.

cd /home/rdma
bash build.sh

Afterwards, I cloned a VM to have a communication partner for the first one. After cloning, I reconfigured an appropriate IP address in the cloned VM.

Finally I could test the communication using RDMA.

On the VM that functioned as a server I ran a listener service on the interface mapped to PVRDMA virtual adapter:

cd /home/rdma/build/bin
./rping -s -a 192.168.0.200 -P

Then, on the client VM, I ran this command to connect to the server VM:

./rping -c -I 192.168.0.100 -a 192.168.0.200 -v

It was working beautifully!


dcli and how to shutdown vCSA

Sometimes you want to shut down a vCSA or PSC gracefully, but you don’t have access to the GUI through the vSphere Client or the VAMI.

How do you do it from the CLI? I’m going to show you right now using dcli, because I’m exploring the potential of this tool and can’t get enough.

  1. Open an SSH session to vCSA and log in as root user.
  2. Run dcli command in an interactive mode.

    dcli +i
  3. Use the shutdown API call to power off the appliance, giving a delay value (0 means now) and a reason for the task.

    com vmware appliance shutdown poweroff --delay 0 --reason 'Shutdown now'
  4. Enter an appropriate administrator user name, e.g. administrator@vsphere.local, and its password.
  5. Decide whether you want to save the credentials in the credstore; you can enter ‘y’ for yes.

Wait until the appliance goes down. 🙂
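The same API call also works without entering the interactive shell; a hedged one-liner sketch:

dcli com vmware appliance shutdown poweroff --delay 0 --reason 'Shutdown now'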

I know that it’s not the quickest way, but the point is to have fun.

dcli and orphaned VMs in vCenter Server inventory

Orphaned VMs in the vCenter inventory are an unusual sight in an experienced administrator’s Web/vSphere Client window. But in large environments, where many people manage hosts and VMs, it happens sometimes.

You probably know how to get rid of them using the traditional methods described in VMware KB articles and by other well-known bloggers, but there is a quite elegant newer method using dcli.
This handy tool is available in the vCLI package, in the 6.5/6.7 vCSA shell, and at the vCenter Server on Windows command prompt. dcli uses the vSphere Automation APIs to give an administrator an interface for calling methods to perform or automate certain tasks.

How to use it to remove orphaned VMs from vCenter inventory?

  1. Open an SSH session to vCSA and log in as root user.
  2. Run dcli command in an interactive mode.

    dcli +i
  3. Get a list of VMs registered in vCenter’s inventory. Log in as an administrator user in your SSO domain; you can save the credentials in the credstore for future use.

    com vmware vcenter vm list
  4. From the displayed list, get the MoID (Managed Object ID) of the affected VM, e.g. vm-103.
  5. Run this command to delete the record of the affected VM using its MoID from vCenter’s database.

    com vmware vcenter vm delete --vm vm-103
  6. Using the Web/vSphere Client, check the vCenter inventory to confirm that the affected VM is now gone.

It’s working!
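The same two calls can also be scripted without entering the interactive shell; a hedged sketch (vm-103 is just the example MoID from above):

dcli com vmware vcenter vm list
dcli com vmware vcenter vm delete --vm vm-103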

Part 2 – How to list vSwitch “MAC Address table” on ESXi host?

Another way to list the MAC addresses of open ports on vSwitches on an ESXi host is based on the net-stats tool.

Use this one-liner.

for VSWITCH in $(vsish -e ls /net/portsets/ | cut -c 1-8); do net-stats -S $VSWITCH | grep \{\"name | sed 's/[{,"]//g' | awk '{$9=$10=$11=$12=""; print $0}'; done		
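The same pipeline split across lines, with comments, purely for readability; functionally it is identical to the one-liner above:

for VSWITCH in $(vsish -e ls /net/portsets/ | cut -c 1-8)
do
        # per-port entries are the lines containing the "name" field
        net-stats -S $VSWITCH | grep '{"name' \
                | sed 's/[{,"]//g' \
                | awk '{$9=$10=$11=$12=""; print $0}'   # blank out the trailing stat columns
done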

This is not a final word. 🙂

Part 1 – How to list vSwitch “MAC Address table” on ESXi host?

Sometimes you need to list the MAC addresses logged on a host’s vSwitches, for example to eliminate duplicate VM MAC addresses.

  1. Create a shell script:

    vi mac_address_list.sh
  2. Copy and paste the code listed below:

    #!/bin/sh
    #vmrale
    for VSWITCH in `vsish -e ls /net/portsets/ | cut -c 1-8`
    do
            echo $VSWITCH
            for PORT in `vsish -e ls /net/portsets/$VSWITCH/ports | cut -c 1-8`
            do
                    CLIENT_NAME=`vsish -e get /net/portsets/$VSWITCH/ports/$PORT/status | grep clientName | uniq`
                    ADDRESS=`vsish -e get /net/portsets/$VSWITCH/ports/$PORT/status | grep unicastAdd | uniq`
                    echo -e "\t$PORT\t$CLIENT_NAME\t$ADDRESS"
            done
    done
  3. Change the file’s permissions:

    chmod 755 mac_address_list.sh
  4. Run the script:

    ./mac_address_list.sh

Simple, but useful! 🙂

… but this is not the only possible method 🙂

Alternative methods to create a virtual switch.

Creating a virtual switch through the GUI is well described in the documentation and pretty intuitive. However, sometimes it is useful to know how to do it with the CLI or PowerShell, so the process can become part of a script that automates the initial configuration of ESXi after installation.

Here you will find the commands necessary to create and configure a standard virtual switch using the CLI and PowerCLI. The examples describe creating a vSwitch for vMotion traffic, which involves creating a VMkernel adapter.

I. vSwitch configuration through CLI

  1. Create a vSwitch named “vMotion”

esxcli network vswitch standard add -v vMotion

  2. Check whether your newly created vSwitch was configured and is available on the list.

esxcli network vswitch standard list

  3. Add physical uplink (vmnic) to your vSwitch

esxcli network vswitch standard uplink add -u vmnic4 -v vMotion

  4. Designate an uplink to be used as active.

esxcli network vswitch standard policy failover set -a vmnic4 -v vMotion

  5. Add a port group named “vMotion-PG” to previously created vSwitch

esxcli network vswitch standard portgroup add -v vMotion -p vMotion-PG

  6. Add a VMkernel interface to a port group (Optional – not necessary if you are creating a vSwitch just for VM traffic)

esxcli network ip interface add -p vMotion-PG -i vmk9

  7. Configure IP settings of a VMkernel adapter.

esxcli network ip interface ipv4 set -i vmk9 -t static -I 172.20.14.11 -N 255.255.255.0

  8. Tag VMkernel adapter for a vMotion service. NOTE – service tag is case sensitive.

esxcli network ip interface tag add -i vmk9 -t VMotion

Done, your vSwitch is configured and ready to service vMotion traffic.
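To double-check the result from the shell, a hedged sketch of the read-only counterparts of the commands above (I believe the ipv4 get and tag get sub-commands exist in this form, but verify them on your build):

esxcli network vswitch standard list
esxcli network ip interface ipv4 get -i vmk9
esxcli network ip interface tag get -i vmk9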

 

II. vSwitch configuration through PowerCLI

  1. First thing is to connect to vCenter server.

Connect-VIServer -Server vcsa.vclass.local -User administrator@vsphere.local -Password VMware1!

  2. Indicate specific host and create new virtual switch, assigning vmnic at the same time.

$vswitch1 = New-VirtualSwitch -VMHost sa-esx01.vclass.local -Name vMotion -NIC vmnic4

  3. Create port group and add it to new virtual switch.

New-VirtualPortGroup -VirtualSwitch $vswitch1 -Name vMotion-PG

  4. Create and configure VMkernel adapter.

New-VMHostNetworkAdapter -VMHost sa-esx01.vclass.local -PortGroup vMotion-PG -VirtualSwitch vMotion -IP 172.20.11.11 -SubnetMask 255.255.255.0 -VMotionEnabled $true
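And a hedged PowerCLI sketch to verify the new adapter; the property names are the ones I recall from the VMkernel adapter object, so treat them as assumptions:

Get-VMHostNetworkAdapter -VMHost sa-esx01.vclass.local -VMKernel | Select-Object Name, IP, SubnetMask, VMotionEnabled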

 

VMware Virtual SAN 6.6 what’s new


vSAN 6.6 is the 6th generation of the product, and there are more than 20 new features and enhancements in this release, such as:

  • Native encryption for data-at-rest
  • Compliance certifications
  • Resilient management independent of vCenter
  • Degraded Disk Handling v2.0 (DDHv2)
  • Smart repairs and enhanced rebalancing
  • Intelligent rebuilds using partial repairs
  • Certified file service & data protection solutions
  • Stretched clusters with local failure protection
  • Site affinity for stretched clusters
  • 1-click witness change for Stretched Cluster
  • vSAN Management Pack for vRealize
  • Enhanced vSAN SDK and PowerCLI
  • Simple networking with Unicast
  • vSAN Cloud Analytics with real-time support notification and recommendations
  • vSAN Config Assist with 1-click hardware lifecycle management
  • Extended vSAN Health Services
  • vSAN Easy Install with 1-click fixes
  • Up to 50% greater IOPS for all-flash with optimized checksum and dedupe
  • Support for new next-gen workloads
  • vSAN for Photon in Photon Platform 1.1
  • Day 0 support for latest flash technologies
  • Expanded caching tier choice
  • Docker Volume Driver 1.1

 

… OK, now let’s review the main enhancements:

vSAN 6.6 introduces the industry’s first native HCI security solution. vSAN now offers data-at-rest encryption that is completely hardware-agnostic. No more worrying about someone walking off with a drive or breaking into a less secure edge IT location and stealing hardware. Encryption is applied at the cluster level, and any data written to a vSAN storage device, at both the cache and persistence layers, can now be fully encrypted. vSAN 6.6 also supports 2-factor authentication, including SecurID and CAC.


Certified file services and data protection solutions are available from 3rd-party partners in the VMware Ready for vSAN Program to enable customers to extend and complement their vSAN environment with proven, industry-leading solutions. These solutions provide customers with detailed guidance on how to complement vSAN. (EMC NetWorker is available today, with new solutions coming soon.)


vSAN stretched clusters were released in Q3 2015 to provide an active-active solution. vSAN 6.6 adds a major new capability that delivers a highly available stretched cluster addressing the highest resiliency requirements of data centers: support for local failure protection, which provides resiliency against both site failures and local component failures.


PowerCLI Updates: Full featured vSAN PowerCLI cmdlets enable full automation that includes all the latest features. SDK/API updates also enable enterprise-class automation that brings cloud management flexibility to storage by supporting REST APIs.

The VMware vRealize Operations Management Pack for vSAN, released recently, provides customers with native integration for simplified management and monitoring. The vSAN management pack is specifically designed to accelerate time to production with vSAN, optimize application performance for workloads running on vSAN, and provide unified management for the Software-Defined Data Center (SDDC). It provides additional options for monitoring, managing and troubleshooting vSAN along with the end-to-end infrastructure solutions.


Finally, vSAN 6.6 is well suited for next-generation applications. Performance improvements, especially when combined with new flash technologies for write-intensive applications, enable vSAN to address more emerging applications like Big Data. The vSAN team has also tested and released numerous reference architectures for these types of solutions, including Big Data, Splunk and InterSystems Cache.

RESOURCES:

  • Splunk Reference Architecture: http://www.emc.com/collateral/service-overviews/h15699-splunk-vxrail-sg.pdf
  • Citrix XenDesktop/XenApp Blog: https://blogs.vmware.com/virtualblocks/2017/02/27/citrix-xenapp-xendesktop-7-12-vmware-vsan-6-5-flash/
  • vSAN, VxRail and Pivotal Cloud Foundry RA: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-pcf-vxrail-reference-architeture.pdf
  • vSAN and InterSystems Blog: https://community.intersystems.com/post/intersystems-data-platforms-and-performance-%E2%80%93-part-8-hyper-converged-infrastructure-capacity
  • Intel, vSAN and Big Data Hadoop: https://builders.intel.com/docs/storagebuilders/Hyper-Converged_big_data_using_Hadoop_with_All-Flash_VMware_vSAN.pdf

 

 

vCenter 6.5 DSN permissions

Recently we had some strange problems with our 6.5 lab vCenter (Windows version with an MS SQL Server database), which frequently crashed. After some digging in the vpxd logs, it seemed to be related to vCenter database permissions:

17-05-28T19:36:53.443+02:00 error vpxd[05420] [Originator@6876 sub=Default] [VdbStatement] SQLError was thrown: "ODBC error: (42000) - [Microsoft][SQL Server Native Client 11.0][SQL Server]VIEW SERVER STATE permission was denied on object 'server', database 'master'." is returned when executing SQL statement "SELECT DB_NAME(mf.DATABASE_ID) Db_Name, CASE mf.FILE_ID WHEN 1 THEN 'DATA' WHEN 2 THEN 'LOG' END File_Type, vol.VOLUME_MOUNT_POINT AS Drive, CONVERT(INT,vol.AVAILABLE_BYTES/1048576.0) FreeSpaceInMB, (mf.SIZE*8)/1024 VCDB_Space_Mb, mf.PHYSICAL_NAME Physical_Name, SERVERPROPERTY('edition') Sql_Server_Edition, SERVERPROPERTY('productversion') Sql_Server_Version FROM SYS.M" action.

The SQL execution fails because the vCenter Server database user has no permissions on the 'master' database. To resolve this issue, grant additional privileges to the vCenter Server database user:

use master
go
grant VIEW SERVER STATE to [vCenter_database_user]
go
GRANT VIEW ANY DEFINITION TO [vCenter_database_user]
go
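To confirm the grant took effect, a hedged T-SQL sketch that impersonates the vCenter database user and lists its server-level permissions (replace vCenter_database_user with your actual login, just like above):

use master
go
EXECUTE AS LOGIN = 'vCenter_database_user'
SELECT permission_name FROM sys.fn_my_permissions(NULL, 'SERVER')
REVERT
go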