
VSAN real capacity utilization

There are a few caveats that make the calculation and planning of VSAN capacity tough, and it gets even harder when you try to map it to the real consumption at the VSAN datastore level.

  1. VSAN disk objects are thin provisioned by default.
  2. Configuring a full reservation of storage space through the Object Space Reservation rule in a Storage Policy does not mean the disk object will be inflated on the datastore. It only means the space will be reserved and shown as used in the VSAN Datastore Capacity pane, which makes it even harder to figure out why the size of “files” on this datastore does not match other capacity-related information.
  3. In order to plan capacity you need to include the overhead of Storage Policies. Policies in the plural, as I haven’t met an environment which would use only one for all kinds of workloads. This means that planning should start with dividing workloads into groups which might require different levels of protection.
  4. Apart from disk objects there are other objects, especially SWAP, which are not displayed in the GUI and can easily be forgotten. Depending on the size of the environment, they might consume a considerable amount of storage space.
  5. A VM SWAP object does not adhere to the Storage Policy assigned to the VM. What does that mean? Even if you configure your VM’s disks with PFTT=0, SWAP will always use PFTT=1, unless you configure the advanced option (SwapThickProvisionDisabled) to disable it, as sketched below.
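For reference, checking and changing this behaviour is typically done through an esxcli advanced setting. A minimal sketch, assuming the option path /VSAN/SwapThickProvisionDisabled used by recent vSAN releases (verify the exact name on your build):

    # Check the current value of the vSAN swap thick-provisioning setting (assumed option path)
    esxcli system settings advanced list -o /VSAN/SwapThickProvisionDisabled

    # Set it to 1 so that VM swap objects are created thin instead of thick with PFTT=1
    esxcli system settings advanced set -o /VSAN/SwapThickProvisionDisabled -i 1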

I ran a test to check how much space my empty VM would consume (empty meaning here without even an operating system installed).

In order to see that, a VM called Prod-01 was created with 1 GB of memory, a 2 GB hard disk and the default storage policy assigned (PFTT=1).

Based on the Edit Settings window, the VM disk size on the datastore is 4 GB (the maximum size based on the disk size and the policy). However, the used storage space is 8 MB, which means there are 2 replicas of 4 MB each – which is fine, as there is no OS installed at all.

(Screenshot: the powered-off VM)

However, when you open the datastore files you will notice that the size of the Virtual Disk object is 36,864 KB, which gives us 36 MB. So it is neither the 4 GB nor the 8 MB displayed in the Edit Settings consumption.

(Screenshot: vSAN datastore files)

Meanwhile, the datastore provisioned space is listed as 5.07 GB.

(Screenshot: VM with a 2 GB disk, default policy and 1 GB RAM – powered off)

 

So let’s power on that VM.

Now the disk size remains intact, but other files appear: for instance, SWAP has been created, as well as log and other temporary files.

(Screenshot: the powered-on VM)

 

Looking at the datastore provisioned space now, it shows 5.9 GB, which again is confusing. Even setting aside the previous findings, powering on the VM triggers SWAP creation which, according to theory, should be protected with PFTT=1 and be thick provisioned. But if that is the case, then the provisioned storage consumption should have increased by 2 GB, not 0.83 GB (of which some space is consumed by logs and other small files included in the VM Home namespace object).

 

(Screenshot: VM with a 2 GB disk, default policy and 1 GB RAM – powered on)

Moreover, during those observations I noticed that during the VM boot process the provisioned space peaks at 7.11 GB for a very short period of time, and after a few seconds this value decreases to 5.07 GB. Even after a few reboots those values stay consistent.

(Screenshot: VM with a 2 GB disk, default policy and 1 GB RAM – during boot)

The question is why this information is not consistent, and what happens during the VM boot process that causes the peak in provisioned space?

That’s the quest for now: to figure it out 🙂

 

 

Part 2 – How to list vSwitch “MAC Address table” on ESXi host?

Another way to list the MAC addresses of ports open on the vSwitches of an ESXi host is based on the net-stats tool.

Use this one-liner:

		
# For each portset (vSwitch), dump its port entries via net-stats, strip the JSON punctuation
# and blank a few trailing columns to keep the output readable
for VSWITCH in $(vsish -e ls /net/portsets/ | cut -c 1-8); do net-stats -S $VSWITCH | grep \{\"name | sed 's/[{,"]//g' | awk '{$9=$10=$11=$12=""; print $0}'; done

This is not the final word. 🙂

Part 1 – How to list vSwitch “MAC Address table” on ESXi host?

Sometimes you need to list the MAC addresses logged on a host’s vSwitches, for example to eliminate duplicate VM MAC addresses.

  1. Create a shell script:
  2. vi mac_address_list.sh
  3. Copy and paste the code listed below:
  4. 
    #!/bin/sh
    #vmrale
    # Iterate over all portsets (vSwitches) on the host
    for VSWITCH in `vsish -e ls /net/portsets/ | cut -c 1-8`
    do
            echo $VSWITCH
            # Iterate over all ports connected to this vSwitch
            for PORT in `vsish -e ls /net/portsets/$VSWITCH/ports | cut -c 1-8`
            do
                    # Extract the port's client name and unicast MAC address
                    CLIENT_NAME=`vsish -e get /net/portsets/$VSWITCH/ports/$PORT/status | grep clientName | uniq`
                    ADDRESS=`vsish -e get /net/portsets/$VSWITCH/ports/$PORT/status | grep unicastAdd | uniq`
                    echo -e "\t$PORT\t$CLIENT_NAME\t$ADDRESS"
            done
    done
  5. Change the file’s permissions
  6. chmod 755 mac_address_list.sh
  7. Run the script
  8. ./mac_address_list.sh

Simple, but useful! 🙂

… but this is not the only possible method 🙂

Alternative methods to create a virtual switch

Creating a virtual switch through the GUI is well described in the documentation and pretty intuitive. However, sometimes it might be useful to know how to do it with the CLI or PowerShell, thus making the process part of a script that automates the initial configuration of an ESXi host after installation.

Here you will find the commands which are necessary to create and configure a standard virtual switch using the CLI and PowerShell. The examples describe the process of creating a vSwitch for vMotion traffic, which involves VMkernel adapter creation.

I. vSwitch configuration through CLI

  1. Create a vSwitch named “vMotion”

esxcli network vswitch standard add -v vMotion

  2. Check whether your newly created vSwitch was configured and is available on the list.

esxcli network vswitch standard list

  3. Add a physical uplink (vmnic) to your vSwitch.

esxcli network vswitch standard uplink add -u vmnic4 -v vMotion

  4. Designate an uplink to be used as active.

esxcli network vswitch standard policy failover set -a vmnic4 -v vMotion

  5. Add a port group named “vMotion-PG” to the previously created vSwitch.

esxcli network vswitch standard portgroup add -v vMotion -p vMotion-PG

  6. Add a VMkernel interface to the port group (optional – not necessary if you are creating a vSwitch just for VM traffic).

esxcli network ip interface add -p vMotion-PG -i vmk9

  7. Configure the IP settings of the VMkernel adapter.

esxcli network ip interface ipv4 set -i vmk9 -t static -I 172.20.14.11 -N 255.255.255.0

  8. Tag the VMkernel adapter for the vMotion service. NOTE – the service tag is case sensitive.

esxcli network ip interface tag add -i vmk9 -t vmotion

Done, your vSwitch is configured and ready to service vMotion traffic.
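If you want to script the whole thing, the same commands can be put into a single shell snippet executed on the host. A minimal sketch using the values from the steps above (the vSwitch, port group, uplink, vmk number and IP addressing are examples to adapt to your environment):

    #!/bin/sh
    # Minimal sketch: create a vMotion vSwitch with one active uplink,
    # a port group and a tagged VMkernel adapter (values are examples)
    VSWITCH=vMotion
    PORTGROUP=vMotion-PG
    UPLINK=vmnic4
    VMK=vmk9
    IP=172.20.14.11
    MASK=255.255.255.0

    esxcli network vswitch standard add -v $VSWITCH
    esxcli network vswitch standard uplink add -u $UPLINK -v $VSWITCH
    esxcli network vswitch standard policy failover set -a $UPLINK -v $VSWITCH
    esxcli network vswitch standard portgroup add -v $VSWITCH -p $PORTGROUP
    esxcli network ip interface add -p $PORTGROUP -i $VMK
    esxcli network ip interface ipv4 set -i $VMK -t static -I $IP -N $MASK
    esxcli network ip interface tag add -i $VMK -t vmotion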

 

II. vSwitch configuration through PowerCLI

  1. The first thing is to connect to the vCenter Server.

Connect-VIServer -Server vcsa.vclass.local -User administrator@vsphere.local -Password VMware1!

  2. Indicate the specific host and create a new virtual switch, assigning a vmnic at the same time.

$vswitch1 = New-VirtualSwitch -VMHost sa-esx01.vclass.local -Name vMotion -NIC vmnic4

  3. Create a port group and add it to the new virtual switch.

New-VirtualPortGroup -VirtualSwitch $vswitch1 -Name vMotion-PG

  4. Create and configure a VMkernel adapter.

New-VMHostNetworkAdapter -VMHost sa-esx01.vclass.local -PortGroup vMotion-PG -VirtualSwitch vMotion -IP 172.20.11.11 -SubnetMask 255.255.255.0 -VMotionEnabled $true

 

vMotion fails to migrate VMs between ESXi hosts which have the same configuration

As a rule, vMotion requires the same CPU family across the involved servers, which ensures the same feature set is presented, in order to succeed.

This is an obvious statement: if you have, for example, Intel Xeon v3 and v4 CPU generations in your cluster, you need EVC in order to make it work. But recently I came across an issue where vMotion was failing to migrate VMs between hosts with identical configurations. Those were Dell R730s with Intel v3 CPUs, to be more precise.

The error message stated as follows:

The target host does not support the virtual machine’s current hardware requirements.
To resolve CPU incompatibilities, use a cluster with Enhanced vMotion Compatibility (EVC) enabled. See KB article 1003212.
com.vmware.vim.vmfeature.cpuid.stibp
com.vmware.vim.vmfeature.cpuid.ibrs
com.vmware.vim.vmfeature.cpuid.ibpb

Turn on EVC, it says – but wait a minute, EVC for the same CPUs? That sounds ridiculous, given that there were 3 exactly identical hosts in the cluster. To make it even more unusual, I was unable to migrate VMs online only from host 02 to the others, but I was able to migrate VMs online between 01 and 03 and so on. So it was definitely related to host 02 itself.

So I did additional tests, which revealed even more weird behaviour, for example:

  • I was able to cold migrate a VM from host02 to 01 and then migrate it back from 01 to 02, this time online.
  • I was able to migrate VMs without any issues between 02 and 03.
  • all configuration, communication and so on were correct.
  • I was not able to migrate VMs using Shared-Nothing vMotion either.

But then, after a few such attempts, I realized that host02 had a different build than the others – a small difference, but it was the key thing here.

The build number of host02 was 7526125, whilst the others had 7388607. Not a big deal – as long as vCenter has a higher build, it should not be an issue.
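To quickly compare builds, you can check each host directly from the shell using standard ESXi commands:

    # Print the ESXi version and build number of the host
    vmware -v
    esxcli system version get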

The clue here is that 7526125 is the build number of the Spectre/Meltdown fixes which were later withdrawn, so they were not installed on the rest of the hosts in the cluster, resulting in a different capability set being presented by that ESXi host:

  • “Capability Found: cpuid.IBRS”
  • “Capability Found: cpuid.IBPB”
  • “Capability Found: cpuid.STIBP”

There are currently 2 ways to deal with such an issue:

  1. Cold migrate your VMs if you need to, or simply wait for new patches from VMware.
  2. Reinstall that single host to ensure the same capabilities. That’s the way I have chosen, because in my case that server had some additional hardware issues that had to be addressed.

For additional information take a look at:

  • https://kb.vmware.com/s/article/52085
  • https://kb.vmware.com/s/article/52345
  • https://kb.vmware.com/s/article/52245

 

Perennial reservations – weird behaviour whilst not configured correctly

Whilst using RDM disks in your environment you might notice long (even extremely long) boot times of your ESXi hosts. That’s because the ESXi host uses a different technique to determine if Raw Device Mapped (RDM) LUNs are used for MSCS cluster devices, introducing a configuration flag to mark each device participating in an MSCS cluster as perennially reserved. During the start of an ESXi host, the storage mid-layer attempts to discover all devices presented to the host during the device claiming phase. However, MSCS LUNs that have a permanent SCSI reservation cause the start process to lengthen, as the ESXi host cannot interrogate the LUN due to the persistent SCSI reservation placed on the device by an active MSCS node hosted on another ESXi host.

Configuring the device to be perennially reserved is local to each ESXi host, and must be performed on every ESXi host that has visibility to each device participating in an MSCS cluster. This improves the start time for all ESXi hosts that have visibility to the devices.

The process is described in this KB and requires issuing the following command on each ESXi host:

esxcli storage core device setconfig -d naa.id --perennially-reserved=true

You can check the status using the following command:

esxcli storage core device list -d naa.id

In the output of the esxcli command, search for the entry Is Perennially Reserved: true. This shows that the device is marked as perennially reserved.
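To review the flag across all devices at once, a quick filter of the same command's output can help (a rough sketch; the field names come from the standard esxcli output):

    # Show device names together with their perennially-reserved status
    esxcli storage core device list | grep -E "Display Name|Is Perennially Reserved"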

However, I recently came across a problem with snapshot consolidation; even Storage vMotion was not possible for a particular VM.

Whilst checking the VM settings, one of the disks was locked and indicated that it was running on a delta disk, which means there is a snapshot. However, Snapshot Manager didn’t show any snapshot at all. Moreover, creating a new snapshot and then deleting all snapshots, which in most cases solves the consolidation problem, didn’t help either.


In vmkernel.log, while trying to consolidate the VM, lots of perennial reservation entries were present. Initially I ignored them, because there were RDMs in the environment which were intentionally configured as perennially reserved to prevent long ESXi boot times.


However, after digging deeper and checking a few things, I returned to the perennial reservations and decided to check which LUN generates these warnings and why it creates these entries, especially while attempting consolidation or Storage vMotion of a VM.

To my surprise, I realised that the datastore on which the VM’s disks reside was configured as perennially reserved! It was due to a mistake when the PowerCLI script was prepared: someone accidentally configured all available LUNs as perennially reserved. Changing the value back to false happily solved the problem.
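In other words, the fix was simply to flip the flag back to false on the wrongly configured device (naa.id again being the placeholder for the real identifier):

    # Clear the perennially-reserved flag that was set by mistake
    esxcli storage core device setconfig -d naa.id --perennially-reserved=false

    # Verify - "Is Perennially Reserved" should now report false
    esxcli storage core device list -d naa.id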

The moral of the story is simple – logs are not issued to be ignored 🙂

vCenter Appliance 6.0 U3 email notifications are not sent when multiple email addresses are defined in an alarm action

Recently I tried to configure email notifications on my lab vCenter Server Appliance (6.0 U3), but experienced an issue:

 “Diagnostic-Code: SMTP;550 5.7.60 SMTP; Client does not have permissions to send as this sender”

I tried to use the solution from KB https://kb.vmware.com/kb/2075153 but apparently the solution does not work with the latest 6.0.x appliance!

After some research and digging deeper (header analysis), it seems that the root cause was an invalid return path in the email header. To resolve this you need to edit a few system files:

1. SSH to the VCSA and enable the shell:

Command> shell.set --enabled True

Command> shell

2. Go to the /etc/sysconfig directory:


3. Edit the “mail” file using vi and make the change shown in the screenshot below:

# vi mail

(screenshot)

  • You can simply verify the change using cat:

(screenshot)

4. In the same directory, edit the “sendmail” file, adding a domain name to “SENDMAIL_GENERICS_DOMAIN=”:

(screenshot)

5. Subsequently, go to the /etc/mail directory and add a user to mask root in “genericstable”, for example as sketched below:
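An entry in genericstable simply maps a local user to the outgoing sender address. A hypothetical example (replace the address with one your SMTP server accepts):

    # Hypothetical example: map local root to an address permitted by the SMTP server
    echo "root    vcenter@vclass.local" >> /etc/mail/genericstable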

(screenshot)

6. Regenerate the table:

# makemap -r hash /etc/mail/genericstable.db < /etc/mail/genericstable

7. Create the sendmail.mc file:

#/sbin/conf.d/SuSEconfig.sendmail -m4 > /sendmail.mc

Note: do not edit the “sendmail” file itself as in the above procedure.

8. Double-check whether a “sendmail.cf” file already exists in /etc; if it does, rename it:

   #mv /etc/sendmail.cf /etc/sendmail.cf.orig

9. Create a new config file:

#m4 /sendmail.mc > /etc/sendmail.cf

10. Open the “sendmail.cf” config file (with vi) and add the IP of the SMTP/Exchange server (DS[xxx.xxx.xxx.xxx]) to the environment:

(screenshot)

11. Restart the sendmail service:

# /etc/init.d/sendmail restart

 

Now it should work fine!

ESXi Net.ReversePathFwdCheckPromisc advanced setting

During the deployment of a Cisco proxy appliance, we discovered a problem. According to Cisco, to resolve this problem “Net.ReversePathFwdCheckPromisc” should be set to “1” on the ESXi hosts.

The question is: do you know of any negative effects which such a change could cause? We believe there must be a reason why this option is set to 0 by default. That’s why I decided to figure out what it is used for.

After some research I was able to find an answer:

Setting Net.ReversePathFwdCheckPromisc = 1 is used when you expect the reverse path filters to filter mirrored packets, to prevent multicast packets from getting duplicated.

Note: if the value of the Net.ReversePathFwdCheckPromisc configuration option is changed while the ESXi instance is running, you need to enable or re-enable promiscuous mode for the configuration change to take effect.
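For reference, a minimal sketch of checking and changing the option from the ESXi shell, assuming the usual /Net/ReversePathFwdCheckPromisc path for this advanced setting:

    # Check the current value of the reverse path filter check for promiscuous mode (assumed path)
    esxcli system settings advanced list -o /Net/ReversePathFwdCheckPromisc

    # Enable it as recommended by the appliance vendor
    esxcli system settings advanced set -o /Net/ReversePathFwdCheckPromisc -i 1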

The reason you would use promiscuous mode depends on your requirements and configuration. Please check the KB article below:

http://kb.vmware.com/kb/1004099

  • This option is not enabled by default because VMware is not aware of your vSwitch configuration and can’t predict what it could be, as it has configurable options.

VMware does not advise enabling this option unless you have a use case with teamed uplinks and, ideally, monitoring software running on the VMs. When promiscuous mode is enabled at the port group level, objects defined within that port group have the option of receiving all incoming traffic on the vSwitch. Interfaces and virtual machines within the port group will be able to see all traffic passing on the vSwitch, which can impact VM performance.

Should the ESXi server be rebooted for this change to take effect? The answer is yes – and yes, you can enable this option with the VMs running on the existing port group.

Do you have any interesting virtualization-related question?

 

SAP application on vSphere platform

This is a mini article to start our Q&A set – a set of real-life questions whose answers are not easy to find 😉
Recently I received a question related to advanced settings for an SAP application on the vSphere platform:
“One of our customers asked us to set the following option on their virtual system: Misc.GuestLibAllowHostInfo. This is according to SAP note 1606643, where SAP requires reconfiguring the virtual system’s default configuration. I can’t find detailed information on which host data would be exposed to the virtual system. Could you please point me to documentation or describe which information is being transferred from the host to the virtual systems?“

  • After some research I was able to find an answer:

“Misc.GuestLibAllowHostInfo” and “tools.guestlib.enableHostInfo” – these settings, if enabled, allow the guest OS to access some of the ESXi host’s configuration, mainly performance metrics, e.g. how many CPU cores the host has, their utilization and contention, etc. There is no confidential information from other customers which would be visible; however, it may give the users of those SAP VMs access to performance and resource information which you may not want to share.

The following document outlines the effect of the changes as I have described above.

I believe the “might use the information to perform further attacks on the host” wording could only apply to other vulnerabilities which may exist for the particular hardware information that the guest OS can gather from the ESXi host.
Other than that, I am not sure there is any other concern to worry about.
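For reference, a rough sketch of how the two settings are usually applied – the host-level option via esxcli (the option path is assumed from the setting name) and the per-VM parameter in the VM's advanced configuration:

    # Host level: allow guests to read host information (assumed path for Misc.GuestLibAllowHostInfo)
    esxcli system settings advanced set -o /Misc/GuestLibAllowHostInfo -i 1

    # VM level: add the following line to the VM's .vmx / advanced configuration parameters
    # tools.guestlib.enableHostInfo = "TRUE"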

Do you have any interesting virtualization-related question?

VMware vSphere tags limit – is it known?

Recently I received quite an interesting question: what is the supported maximum number of tags in vCenter 6.0 U2?

The mischievous author of the question is a good friend of mine and a VMware administrator in one person. He asked about the tags limit because he wants to use tags to provide more information about each of his production VMs – roughly speaking, he needs to create about 20,000 tags.

I thought, OK, give me a couple of seconds to verify this, and quickly looked in the VMware configuration maximums… a couple of minutes later it was clear that this is not an easy question 😉

Furthermore, after some additional research (there is no clear statement in the official documentation), we decided to perform tests in a lab environment!

We used a simple PowerCLI script to create 20,000 tags in a test vCenter appliance (6.0 U2); below is our script:

# Assumes a tag category named "test" already exists
for ($i = 1; $i -le 20000; $i++) {
    New-Tag -Name $i -Description $i -Category test
}

The script worked like a charm without any issue – so far so good 🙂 – but when we tried to assign one tag to the first VM, we encountered Web Client error 1009 – very strange!

We decided to perform additional tests and found out that the limit is below 10,000. At this stage we decided to clarify the issue with VMware support, and after some time received very interesting feedback:

  1. NGC (the vSphere Web Client) has an upper bound of retrieving 10,000 objects max.
  2. Even with fewer than 10,000 tags, the data service times out after 120 seconds (the default data service timeout is 120 seconds).
  3. Decreasing the count to 9,994 tags and increasing the data service timeout makes all the tags show up in the Assign dialog.

As a temporary workaround for now:
1. Keep the total number of created tags below 10,000.
2. Increase the data service timeout to 600 seconds (10 minutes).

VMware GSS states that engineering is now working to remove the tag limit boundary in the next vSphere 6.x releases.