
vCloud Director Network considerations

One of the trickiest parts of vCD – networks. It took me some time to digest how the relations between the different types of networks in vCD work. Just to remind you, we distinguish:

  • External Networks
  • VDC Organization Networks
  • vApp Networks

Moreover, for both VDC Organization and vApp networks we distinguish the following types:

  • Directly connected to upper layer network
  • Routed network
  • Isolated Network

To complicate things even further, a directly connected vApp network can be fenced 🙂

All networks apart from directly connected ones will create an ESG (yes, even an isolated network requires an ESG!). Just don't be fooled during testing that they are not visible in vSphere as soon as you create a new vApp/Org VDC network. The ESG, as well as the port group on the DVS, will be created not at the time of vCD network creation but when you connect a VM to this network and power it on for the first time.

To understand how we can mix and match these networks, I've created a diagram as a reference, mostly for myself, but maybe it will be helpful for you as well, as I didn't find any diagram covering all the options. So here we have a vCD network diagram starting from an external network and combining all the options (apart from the fenced one).

[Diagram 1: vCD networks]

 

Plus another diagram including an ESG as an Org perimeter interconnected with a DLR.

[Diagram 2: vCD networks with ESG and DLR]

 

 

I hope you will find it informative. If you have any comments or questions, don't hesitate to leave a comment!

NVIDIA P40 – unable to power on a VM

During a recent Horizon deployment with GPUs for a customer I encountered an issue related to ECC on the NVIDIA P40 GPU card. As per the documentation, it applies to all cards based on the Pascal architecture.

In the NVIDIA documentation you can find information that apart from installing NVIDIA's VIB you need to disable ECC, which is clear to me. However, when using the nvidia-smi command after installing that VIB, the status looks as follows:

nvidia-smi

ECC status is listed as N/A, which I initially treated as disabled. That was wrong.

When I tried to power on a VM with a PCI device added, I received the following error:

Could not initialize plugin ‘/usr/lib64/vmware/plugin/libnvidia-vgx.so’ for vGPU “profile_name”

Of course, after typing it into Google I found this KB, which clearly indicates the root cause – ECC!

With that in mind I went back to the SSH console and issued the following commands to make sure ECC is really disabled (ID is the GPU index reported by nvidia-smi):

nvidia-smi -i ID

nvidia-smi -i ID -e 0
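
To double-check the ECC mode before and after the host reboot, the nvidia-smi query output can be limited to the ECC section (a sketch; the exact layout differs between driver versions):

nvidia-smi -q -d ECC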

After ESXi host restart I was able to power on all VMs 🙂

 

NSX-V VTEP, MAC, ARP Tables content mapping

It took me a while to figure out what information I was seeing while displaying the VTEP, MAC and ARP tables on the Controller Cluster in NSX. In the documentation you can find what information is included in those tables, but it might not be immediately obvious which field contains what kind of data. That's why I decided to make a short reference for myself, but maybe it will help someone else as well.

To understand those tables I started with the Central CLI to display the content of each table, which was as follows:

[Screenshot: VTEP, MAC and ARP table output from the Central CLI]
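
For reference, the Central CLI commands issued from the NSX Manager were along these lines (a sketch, assuming VNI 6502 and querying the controller that is master for that VNI):

show logical-switch controller master vni 6502 vtep
show logical-switch controller master vni 6502 mac
show logical-switch controller master vni 6502 arp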

Now let’s consider what kind of information we’ve got in each table and how they map to particular components in the environment.

VTEP Table – segment to VTEP IP bindings:

VNI – Logical Switch ID based on configured Segment pool

IP – VTEP IP (VMkernel IP) of host on which VM in VNI 6502 is running

Segment – VTEP segment; in my case there is only one L3 network in use

MAC – MAC address of physical NIC configured for VTEP

MAC Table – VM MAC address to VTEP IP (host) mapping:

VNI – Logical Switch ID based on configured Segment pool

MAC – MAC address of VM accessible through VTEP IP displayed in column on the right.

VTEP-IP – IP of a host VTEP on which VM with MAC address from previous column is running.

ARP Table – Virtual Machine MAC to IP mapping:

VNI – Logical Switch ID based on configured Segment pool

IP – IP address of a Virtual Machine connected to the Logical Switch with the given VNI

MAC – MAC address of Virtual Machine

 

To make it even easier, here is a summary diagram with those mappings.

[Diagram: VTEP, MAC and ARP table mappings]

If you want to dig deeper into the details of how those tables are populated, I strongly recommend watching this video from VMworld 2017, which clearly explains it step by step.

VSAN real capacity utilization

There are a few caveats that make the calculation and planning of VSAN capacity tough, and it gets even harder when you try to map it to the real consumption at the VSAN datastore level.

  1. VSAN disk objects are thin provisioned by default.
  2. Configuring a full reservation of storage space through the Object Space Reservation rule in a Storage Policy does not mean the disk object's blocks will be inflated on the datastore. It only means the space will be reserved and shown as used in the VSAN Datastore Capacity pane.

This makes it even harder to figure out why the size of “files” on this datastore does not match other capacity-related information.

  3. In order to plan capacity you need to include the overhead of Storage Policies (plural, as I haven't met an environment which uses only one policy for all kinds of workloads). This means that planning should start with dividing workloads into groups which might require different levels of protection.
  4. Apart from disk objects there are other objects, especially SWAP, which are not displayed in the GUI and can easily be forgotten. However, depending on the size of the environment, they might consume a considerable amount of storage space.
  5. The VM SWAP object does not adhere to the Storage Policy assigned to the VM. What does that mean? Even if you configure your VM's disks with PFTT=0, SWAP will always use PFTT=1, unless you disable that behaviour with the advanced host option SwapThickProvisionDisabled (see the command sketched below).
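
A minimal sketch of how that option can be set per host with esxcli (assuming the advanced option path /VSAN/SwapThickProvisionDisabled; a value of 1 makes swap objects thin and unreserved, 0 restores the default behaviour):

esxcli system settings advanced set -o /VSAN/SwapThickProvisionDisabled -i 1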

I made a test to check how much space my empty VM would consume (empty here means without even an operating system).

To check that, a VM called Prod-01 was created with 1 GB of memory, a 2 GB hard disk and the default storage policy assigned (PFTT=1).

Based on the Edit Settings window, the VM disk size on the datastore is 4 GB (the maximum size based on the disk size and policy). However, the used storage space is 8 MB, which means there are 2 replicas of 4 MB each, which is fine as there is no OS installed at all.

[Screenshot: Edit Settings of the powered-off Prod-01 VM]

However, when you browse the datastore files and look at the Virtual Disk object, you will notice that its size is 36 864 KB, which gives us 36 MB. So it's neither the 4 GB nor the 8 MB displayed in Edit Settings.

[Screenshot: datastore file browser showing the VM's files]

Meanwhile, the datastore provisioned space is listed as 5.07 GB.

[Screenshot: VM with 2 GB disk, default policy and 1 GB RAM, powered off]
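
To see which objects and components actually back those numbers, the Ruby vSphere Console (RVC) can break them down per object; a rough sketch, assuming RVC is connected to the vCenter and <Datacenter> is replaced with your datacenter name:

cd /localhost/<Datacenter>/vms
vsan.vm_object_info Prod-01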

 

So let’s power on that VM.

Now the disk size remains intact, but other files appear: for instance, the SWAP object has been created, as well as logs and other temporary files.

[Screenshot: datastore files of the powered-on VM]

 

Looking at the datastore provisioned space, it now shows 5.9 GB, which again is confusing. Even if we put the previous findings aside, powering on the VM triggers SWAP creation, which according to the theory should be protected with PFTT=1 and be thick provisioned. But if that were the case, the provisioned storage consumption should increase by 2 GB, not 0.83 GB (where some of that space is consumed by logs and other small files included in the VM Home namespace object).

 

[Screenshot: VM with 2 GB disk, default policy and 1 GB RAM, powered on]

Moreover, during those observations I noticed that during the VM boot process the provisioned space peaks at 7.11 GB for a very short period of time, and after a few seconds this value decreases to 5.07 GB. Even after a few reboots those values stay consistent.

[Screenshot: VM with 2 GB disk, default policy and 1 GB RAM, during boot]

The question is why this information is not consistent, and what happens during the boot of the VM that causes the peak in provisioned space.

That's the next quest: to figure it out 🙂

 

 

Alternative methods to create virtual switch.

Creating a virtual switch through the GUI is well described in the documentation and pretty intuitive. However, sometimes it might be useful to know how to do it with the CLI or PowerShell, thus making the process part of a script to automate the initial configuration of an ESXi host after installation.

Here you will find the commands necessary to create and configure a standard virtual switch using the CLI and PowerCLI. These examples describe the process of vSwitch creation for vMotion traffic, which involves VMkernel adapter creation.

I. vSwitch configuration through CLI

  1. Create a vSwitch named “vMotion”

esxcli network vswitch standard add -v vMotion

  2. Check whether your newly created vSwitch was configured and is available on the list.

esxcli network vswitch standard list

  3. Add a physical uplink (vmnic) to your vSwitch.

esxcli network vswitch standard uplink add -u vmnic4 -v vMotion

  4. Designate an uplink to be used as active.

esxcli network vswitch standard policy failover set -a vmnic4 -v vMotion

  5. Add a port group named “vMotion-PG” to the previously created vSwitch.

esxcli network vswitch standard portgroup add -v vMotion -p vMotion-PG

  6. Add a VMkernel interface to the port group (optional – not necessary if you are creating a vSwitch just for VM traffic).

esxcli network ip interface add -p vMotion-PG -i vmk9

  7. Configure IP settings of the VMkernel adapter.

esxcli network ip interface ipv4 set -i vmk9 -t static -I 172.20.14.11 -N 255.255.255.0

  8. Tag the VMkernel adapter for the vMotion service. NOTE – the service tag is case sensitive.

esxcli network ip interface tag add -i vmk9 -t VMotion

Done, your vSwitch is configured and ready to service vMotion traffic.
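
As a quick sanity check (a sketch; 172.20.14.12 is just a hypothetical peer VMkernel IP on the vMotion subnet used above), you can review the interface settings and test connectivity through the new VMkernel port:

esxcli network ip interface ipv4 get -i vmk9

vmkping -I vmk9 172.20.14.12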

 

II. vSwitch configuration through PowerCLI

  1. First thing is to connect to vCenter server.

Connect-VIServer -Server vcsa.vclass.local -User administrator@vsphere.local -Password VMware1!

  2. Indicate a specific host and create a new virtual switch, assigning a vmnic at the same time.

$vswitch1 = New-VirtualSwitch -VMHost sa-esx01.vclass.local -Name vMotion -NIC vmnic4

  3. Create a port group and add it to the new virtual switch.

New-VirtualPortGroup -VirtualSwitch $vswitch1 -Name vMotion-PG

  4. Create and configure a VMkernel adapter (a quick verification is sketched after these steps).

New-VMHostNetworkAdapter -VMHost sa-esx01.vclass.local -PortGroup vMotion-PG -VirtualSwitch vMotion -IP 172.20.11.11 -SubnetMask 255.255.255.0 -VMotionEnabled $true
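
To verify the result from PowerCLI as well (a sketch, reusing the same host name as above), you can list the VMkernel adapters and confirm the vMotion flag:

Get-VMHostNetworkAdapter -VMHost sa-esx01.vclass.local -VMKernel | Select-Object Name, IP, PortGroupName, VMotionEnabled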

 

vMotion fails to migrate VMs between ESXi hosts which have the same configuration

As a rule, vMotion requires the same CPU family across the involved servers, which ensures that the same feature set is presented, in order to succeed.

This is an obvious statement: if you have, for example, Intel Xeon v3 and v4 CPU generations in your cluster, you need EVC to make it work. But recently I came across an issue where vMotion was failing to migrate VMs between hosts with identical configurations, Dell R730s with Intel Xeon v3 CPUs to be more precise.

The error message stated as follows:

The target host does not support the virtual machine’s current hardware requirements.
To resolve CPU incompatibilities, use a cluster with Enhanced vMotion Compatibility (EVC) enabled. See KB article 1003212.
com.vmware.vim.vmfeature.cpuid.stibp
com.vmware.vim.vmfeature.cpuid.ibrs
com.vmware.vim.vmfeature.cpuid.ibpb

Turn on EVC, it says, but wait a minute: EVC for the same CPUs? That sounds ridiculous, given that there were 3 identical hosts in the cluster. To make it more unusual, I was not able to migrate VMs only from host 02 to the others, but I was able to migrate VMs online between 01 and 03 and so on. So it was definitely related to host 02 itself.

So I did additional tests which revealed even more weird behaviour for example:

  • I was able to cold migrate a VM from host 02 to 01 and then back from 01 to 02, this time online.
  • I was able to migrate VMs without any issues between 02 and 03.
  • All configuration, communication and so on were correct.
  • I was not able to migrate VMs using Shared-Nothing vMotion.

But after a few such attempts I realized that host 02 had a different build than the others; a small difference, but it was the key thing here.

The build number of host 02 was 7526125, whilst the others had 7388607. Not a big deal; as long as vCenter has a higher build, it should not be an issue.
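
A quick way to spot such a mismatch is to compare the build numbers of all hosts in the cluster, for example with PowerCLI (a sketch; the cluster name is just an example):

Get-Cluster Cluster01 | Get-VMHost | Select-Object Name, Version, Build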

The clue here is that 7526125 is the build number of the Spectre/Meltdown fixes which were later withdrawn, so they were not installed on the rest of the hosts in the cluster, resulting in a different capability set being presented by that host, namely:

  • “Capability Found: cpuid.IBRS”
  • “Capability Found: cpuid.IBPB”
  • “Capability Found: cpuid.STIBP”

There are currently two ways to deal with such an issue:

  1. Cold migrate your VMs if you need to, or simply wait for new patches from VMware.
  2. Reinstall that single host to ensure the same capabilities. That's the way I chose, because in my case that server had some additional hardware issues that had to be addressed.

For additional information take a look at:

  • https://kb.vmware.com/s/article/52085
  • https://kb.vmware.com/s/article/52345
  • https://kb.vmware.com/s/article/52245

 

Perennial reservations – weird behaviour when not configured correctly

Whilst using RDM disks in your environment you might notice long (even extremely long) boot times of your ESXi hosts. That's because the ESXi host uses a different technique to determine whether Raw Device Mapped (RDM) LUNs are used for MSCS cluster devices, by introducing a configuration flag that marks each device participating in an MSCS cluster as perennially reserved. During the start of an ESXi host, the storage mid-layer attempts to discover all devices presented to the host during the device claiming phase. However, MSCS LUNs that have a permanent SCSI reservation cause the start process to lengthen, as the ESXi host cannot interrogate the LUN due to the persistent SCSI reservation placed on the device by an active MSCS node hosted on another ESXi host.

Configuring the device to be perennially reserved is local to each ESXi host, and must be performed on every ESXi host that has visibility to each device participating in an MSCS cluster. This improves the start time for all ESXi hosts that have visibility to the devices.

The process is described in this KB and requires issuing the following command on each ESXi host:

esxcli storage core device setconfig -d naa.id --perennially-reserved=true

You can check the status using the following command:

esxcli storage core device list -d naa.id

In the output of the esxcli command, search for the entry Is Perennially Reserved: true. This shows that the device is marked as perennially reserved.

However, recently I came across a problem with snapshot consolidation; even Storage vMotion was not possible for a particular VM.

Whilst checking the VM settings, one of the disks was locked and indicated that it was running on a delta disk, which means there is a snapshot. However, Snapshot Manager didn't show any snapshot at all. Moreover, creating a new snapshot and then deleting all snapshots, which in most cases solves the consolidation problem, didn't help either.

[Screenshot: VM settings showing the locked disk running on a delta disk]

In vmkernel.log, while trying to consolidate the VM, lots of perennial reservation entries were present, which I initially ignored because there were RDMs intentionally configured as perennially reserved to prevent long ESXi boot times.

[Screenshot: vmkernel.log entries with perennial reservation warnings]

However, after digging deeper and checking a few things, I returned to the perennial reservations and decided to check which LUN was generating these warnings and why it created these entries, especially while attempting consolidation or Storage vMotion of the VM.

To my surprise, I realised that the datastore on which the VM's disks reside was configured as perennially reserved! It was due to a mistake when the PowerCLI script was prepared: someone accidentally configured all available LUNs as perennially reserved. Changing the value to false happily solved the problem.
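
For reference, a quick way to audit which devices a host has flagged, together with the command to revert a wrongly flagged one, could look like this (a sketch; naa.id is a placeholder for the real device identifier):

esxcli storage core device list | grep -E "^naa\.|Is Perennially Reserved"

esxcli storage core device setconfig -d naa.id --perennially-reserved=false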

The moral of the story is simple – logs are not there to be ignored 🙂

vCloud Director 9 – Released!

Today a new version of VMware vCloud Director for Service Providers was released.

There are plenty of new features and enhancements like:

  • vVols support
  • Support for increased vCD-vCenter latency, up to 100 ms
  • Multisite feature, which lets service providers offer a single point of entry to tenants having multiple Virtual Data Centers (Org vDCs) in different instances of vCD
  • Ability to manage routing between two or more Org vDC networks with NSX DLR
  • PostgreSQL support as an external database

There are a few more, as well as a list of resolved known issues.

Release notes for the product can be found here.

The complete list of new features and enhancements can be found here.

VMUG VIRTUAL EMEA 2017 – 28 September

Tomorrow VMUG Virtual EMEA 2017 starts – it is a great opportunity for all of those who missed VMworld or were not able to participate in person or even online. It is a huge opportunity to learn about the newest technology from VMware and supporting companies, play around with dedicated Hands-on Labs and so on.

You can register for the event here.

According to the VMUG website, it is a FREE day-long event meant to empower you through education, training, and collaboration – all with the goal of improving your projects and impacting your career.

 

I highly recommend attending it 🙂

Configuring the Dukes Bank Sample Application Blueprint

In the previous part the import steps of the Dukes Bank Sample Application Blueprint were described. Now it's time to perform additional configuration steps to make it work. (If you thought that you would be able to request the sample three-tier app out of the box right after importing it, you were wrong! Don't worry, I made the same assumption when I first saw it during a training a long time ago ;))

But going back to the vRA Dukes Bank App – after a successful import you have to configure the blueprint.

First of all you must prepare a CentOS template for the blueprint. The prerequisites are as follows:

  1. Install Guest Agent.
    • The guest agent can be downloaded from https://your_vra_FQDN:5480/software. You can download it to your management station and then transfer it to the template machine, or download it directly from the template using the following command: # wget --no-check-certificate https://your_vra_FQDN:5480/software/download/prepare_vra_template.sh. After that it has to be made executable, e.g. # chmod u+x prepare_vra_template.sh, and simply run. A few pieces of information must be provided during the script execution.
    • SELinux has to be disabled; without disabling it you can expect an error during deployment. To disable SELinux from the command line, edit the /etc/sysconfig/selinux file. This file is a symlink to /etc/selinux/config. Changing the value of SELINUX or SELINUXTYPE changes the state of SELinux and the name of the policy to be used the next time the system boots. Simply change it to disabled and save the file (a quick sed one-liner for this is sketched after this list).
      [root@host2a ~]# cat /etc/sysconfig/selinux
      # This file controls the state of SELinux on the system.
      # SELINUX= can take one of these three values:
      # enforcing - SELinux security policy is enforced.
      # permissive - SELinux prints warnings instead of enforcing.
      # disabled - SELinux is fully disabled.
      SELINUX=permissive
      # SELINUXTYPE= type of policy in use. Possible values are:
      # targeted - Only targeted network daemons are protected.
      # strict - Full SELinux protection.
      SELINUXTYPE=targeted
  2. When your template is up and ready you have to make additional changes in the blueprint. (Do not forget to run data collection to see the current state of your template/snapshot.)
    • Modify the blueprint machine specs for each node:
      • Template Name / Customization Spec
      • Reservation Policy
      • Machine Prefix
      • Edit the property http_node_ips in the Apache Load Balancer node and set Binding = Yes
    • In case you use DHCP address allocation you must add a dependency from the Load Balancer node to the App Server node. Simply draw an arrow to connect them.
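
As mentioned in the SELinux prerequisite above, a non-interactive way to flip that setting could look like this (a sketch; it assumes the standard CentOS config path, and a reboot is required for the change to take effect):

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config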

 

That's it, now you are ready to request and test your sample Dukes Bank Application.