One of the members of the VMware User Community (VMTN) inspired me to build a configuration in which two VMs communicate using PVRDMA network adapters. The goal I wanted to achieve was to establish communication between VMs without Host Channel Adapter (HCA) cards installed in the hosts. It is possible to configure this, as stated in the VMware vSphere documentation:
For virtual machines on the same ESXi hosts or virtual machines using the TCP-based fallback, the HCA is not required.
For this task, I prepared one ESXi host (6.7 U1) managed by a vCSA (6.7 U1). One of the requirements for PVRDMA is a vSphere Distributed Switch (vDS). First, I configured a dedicated vDS for RDMA communication. I simply set up a basic vDS configuration (DSwitch-DVUplinks-34) with a default port group (DPortGroup) and equipped it with just one uplink.
In vSphere, a virtual machine can use a PVRDMA network adapter to communicate with other virtual machines that have PVRDMA devices. The virtual machines must be connected to the same vSphere Distributed Switch.
Second, I created a VMkernel port (vmk1) dedicated to RDMA traffic in this DPortGroup without assigning an IP address to this port (No IPv4 settings).
Third, I set the advanced system setting Net.PVRDMAVmknic on the ESXi host and gave it a value pointing to the VMkernel port (vmk1), as described in the documentation section "Tag a VMkernel Adapter for PVRDMA".
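For reference, the same tagging can also be done from the command line – a sketch, assuming SSH access to the host and vmk1 as the RDMA-dedicated VMkernel port:
esxcli system settings advanced set -o /Net/PVRDMAVmknic -s vmk1
esxcli system settings advanced list -o /Net/PVRDMAVmknic    # verify the new value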
Then, I enabled the "pvrdma" firewall rule on the host in the Edit Security Profile window, following the documentation section "Enable the Firewall Rule for PVRDMA".
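Again, this can be done with esxcli instead of the GUI – a sketch, assuming the standard pvrdma ruleset name:
esxcli network firewall ruleset set -e true -r pvrdma
esxcli network firewall ruleset list | grep pvrdma    # should now show the ruleset as enabled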
The next steps are related to the configuration of the VMs. First, I created a new virtual machine. Then, I added another network adapter to it and connected it to DPortGroup on the vDS. For the Adapter Type of this network adapter I chose PVRDMA, with RoCE v2 as the Device Protocol (see the documentation section "Assign a PVRDMA Adapter to a Virtual Machine").
Then, I installed Fedora 29 on the first VM. I chose it because there are many tools available to easily test RDMA communication. After the OS installation, the additional network interface showed up in the VM, and I addressed it in a separate IP subnet. Each VM thus used two network interfaces: the first for SSH access and the second for testing RDMA communication.
Then I set "Reserve all guest memory (All locked)" in the VM's Edit Settings window.
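As a side note, this checkbox corresponds to a VMX option; a minimal sketch of the line you would expect to find in the VM's .vmx file afterwards (an assumption on my part, so verify on your own VM):
sched.mem.pin = "TRUE"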
At this point the VMs were configured sufficiently – at the infrastructure layer – to communicate using RDMA.
To test it, I had to install the appropriate tools. I found them on GitHub, in the rdma-core project. To use them, I first had to build them, which I did using the procedure described on that project page. First, the build dependencies:
dnf install cmake gcc libnl3-devel libudev-devel pkgconfig valgrind-devel ninja-build python3-devel python3-Cython
Next, I installed the git client using the following command.
yum install git
Then I cloned the git project to a local directory.
mkdir /home/rdma
git clone https://github.com/linux-rdma/rdma-core.git /home/rdma
I built it.
cd /home/rdma
bash build.sh
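Before moving on, it may be worth verifying that the guest actually sees the PVRDMA device. A sketch, assuming the libibverbs utilities ended up in build/bin along with rping (the device should be exposed by the vmw_pvrdma driver):
cd /home/rdma/build/bin
./ibv_devices     # should list the PVRDMA device
./ibv_devinfo     # shows port state, GIDs and other device details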
Afterwards, I cloned the VM to have a communication partner for the first one. After cloning, I reconfigured the IP address in the cloned VM appropriately.
Finally I could test the communication using RDMA.
On the VM that functioned as the server, I ran a listener on the interface mapped to the PVRDMA virtual adapter:
cd /home/rdma/build/bin
./rping -s -a 192.168.0.200 -P
Then, on the client VM, I ran this command to connect to the server:
./rping -c -I 192.168.0.100 -a 192.168.0.200 -v
It was working beautifully!
It took me a while to figure out what information is shown when displaying the VTEP, MAC and ARP tables on the Controller Cluster in NSX. The documentation lists what information those tables contain, but it is not always obvious which field holds which kind of data, so I decided to make a short reference for myself – and maybe it will help someone else too.
To understand those tables, I started with the Central CLI to display the contents of each table, which looked as follows:
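Since the CLI output itself is not reproduced here, these are the kinds of commands that display the three tables – a sketch, assuming the NSX-v Central CLI on the NSX Manager and the VNI 6502 used in my example (the equivalent show control-cluster logical-switches commands can be run directly on a controller node):
show logical-switch controller master vni 6502 vtep
show logical-switch controller master vni 6502 mac
show logical-switch controller master vni 6502 arp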
Now let’s consider what kind of information we’ve got in each table and how they map to particular components in the environment.
VTEP Table – segment to VTEP IP bindings:
VNI – Logical Switch ID based on configured Segment pool
IP – VTEP IP (VMkernel IP) of the host on which a VM in VNI 6502 is running
Segment – VTEP segment; in my case there is only one L3 network in use
MAC – MAC address of the physical NIC configured for the VTEP
MAC Table – VM MAC address to VTEP IP (host) mapping:
VNI – Logical Switch ID based on configured Segment pool
MAC – MAC address of a VM reachable through the VTEP IP shown in the next column
VTEP-IP – VTEP IP of the host on which the VM with the MAC address from the previous column is running
ARP Table – Virtual Machine MAC to IP mapping:
VNI – Logical Switch ID based on configured Segment pool
IP – IP address of a Virtual Machine connected to the Logical Switch with the given VNI
MAC – MAC address of the Virtual Machine
To make it even easier, here is a summary diagram with those mappings.
If you want to dig deeper into how those tables are populated, I strongly recommend watching this video from VMworld 2017, which clearly explains it step by step:
In my last post about Infinio Accelerator we introduced the product and covered the basics. Now it is time to go deeper – how does this server-side cache work?
Infinio’s cache inserts server RAM (and optionally, flash devices) transparently into the I/O stream. By dynamically populating server-side media with the hottest data, Infinio’s software reduces storage requirements to a small fraction of the workload size. Infinio is built on VMware’s vSphere APIs for I/O Filtering (VAIO) framework. This enables administrators to use VMware’s Storage Policy Based Management to apply Infinio’s storage acceleration filter to VMs, VMDKs, or groups of VMs transparently.
An Infinio cluster seamlessly supports typical cluster-wide VMware operations, such as vMotion, HA, and DRS. Introduction of Infinio doesn’t require any changes to the environment. Datastore configuration, snapshot and replication setup, backup scripts, and integration with VMware features like VAAI and vMotion all remain the same.
Infinio's core engine is a content-based memory cache that scales out to accommodate expanding workloads and additional nodes. Deduplication enables the memory-first design, which can be complemented with flash devices for large working sets. In a tiered configuration such as this, the cache is persistent, enabling fast warming after either planned or unplanned downtime.
Note: Infinio's transparent server-side cache doesn't require any changes to the environment!
Let's move on to the installation – it is easy and entirely non-disruptive, with no reboots or downtime. It can be completed in just a few steps via an automated installation wizard. The wizard collects vCenter credentials and location, plus the desired Management Console information, then automatically deploys the console:
1. Run the Infinio setup and agree to the license terms
2. Add the vCenter FQDN and user credentials (in this example we go with the SSO admin)
3. Select the destination ESXi host and other parameters for deploying the Management Console OVF (datastore and network)
4. Set the Management Console hostname and network information (IP address, DNS)
5. Create an admin user for the Management Console
6. Set up auto-support (in our trial scenario we skip this step)
7. Preview the configuration and deploy the Management Console
8. Log in to the Management Console
In the next article we will provide some real performance results from our lab tests – so stay tuned 🙂
Shared storage performance and characteristics (IOPS, latency) are crucial for overall vSphere platform performance and user satisfaction. With the advent of SSD and memory cache solutions we have many options to choose from for storage acceleration (local SSD, array-side SSD, server-side SSD). Let's discuss server-side caching further – the act of caching data on the server.
Data can be cached anywhere and at any point on the server where it makes sense. It is common to cache frequently used data from the database to avoid hitting it every time the data is required – for example, caching computed results that are expensive in terms of both processor and database usage. It is also common to cache pages or page fragments so that they don't need to be generated for every visitor.
In this article I would like to introduce one of the commercial server-side caching solutions – Infinio Accelerator 3 from INFINIO.
Infinio Accelerator increases IOPS and decreases latency by caching a copy of the hottest data on server-side resources such as RAM and flash devices. Native inline deduplication ensures that all local storage resources are used as efficiently as possible, reducing the cost of performance. Infinio is built on VMware's VAIO (vSphere APIs for I/O Filtering) framework, which is the fastest and most secure way to intercept I/O coming from a virtual machine. Its benefits can be realized on any storage that VMware supports; in addition, integration with VMware features like DRS, SDRS, VAAI and vMotion all continues to function the same way once Infinio is installed. Finally, future storage innovations that VMware releases will be available immediately through I/O Filter integration.
The I/O Filter is the most direct path to storage for capabilities like caching and replication that need to intercept the data path. (Image courtesy of VMware)
Infinio is licensed per ESXi host in an Infinio cluster. Software may be purchased for perpetual or term use:
- A perpetual license allows the use of the licensed software indefinitely with an annual cost for support and maintenance.
- A term license allows the use of software for one year, including support and maintenance.
For more information on licensing and pricing, contact email@example.com.
Infinio Accelerator requires at minimum VMware vSphere ESXi 6.0 U2 (Standard, Enterprise, or Enterprise Plus) and VMware vCenter 6.0 U2.
Note: vSphere 6.5 is supported and on the VMware HCL!
Infinio works with any VMware supported datastore, including a variety of SAN, NAS, and DAS hardware supporting VMFS, Virtual Volumes (VVOLs), and Virtual SAN (vSAN).
- Infinio’s cluster size mirrors that of VMware vSphere’s, scaling out to 64 nodes.
- Infinio’s Management Console VM requires 1 vCPU, 8GB RAM, and 80GB of HDD space.
I'm very happy to announce that we received a very friendly response from Infinio support and got the option to download a trial version of the software – the next articles will describe the product in more depth and show "real life" examples of its use in our lab environment.
Please, stay tuned 🙂
The architecture of Auto Deploy has changed in vSphere 6.5. One of the main differences is Image Builder being built into vCenter, and the fact that you can create image profiles through the GUI instead of PowerCLI. That is really good news for those who are not keen on PowerCLI. But let's go through the new configuration process of Auto Deploy. Below I have gathered all the necessary steps to configure Auto Deploy in your environment.
- Enable the Auto Deploy services on vCenter Server. Go to Administration -> System Configuration -> Related Objects, then look for and start the following services:
- Auto Deploy
- ImageBuilder Service
You can change the startup type to start them with the vCenter server automatically as well.
To start them from the command line, log in to vCenter Server through SSH and use the following commands:
# service-control --status          // to verify the status of these services
# service-control --start vmonapi vmware-sca          // to start the services
Next, go back to Web Client and refresh the page.
- Prepare the DHCP server and configure a DHCP scope, including the default gateway. A Dynamic Host Configuration Protocol (DHCP) scope is the consecutive range of possible IP addresses that the DHCP server can lease to clients on a subnet. Scopes typically define a single physical subnet on your network to which DHCP services are offered. Scopes are the primary way for the DHCP server to manage distribution and assignment of IP addresses and any related configuration parameters to DHCP clients on the network.
When the basic DHCP scope settings are ready, you need to configure two additional options (an example configuration sketch follows the list):
- Option 066 – with the Boot Server Host Name
- Option 067 – with the Bootfile Name (the file name shown on the Auto Deploy Configuration tab on vCenter Server – kpxe.vmw-hardwired)
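If you run a Linux-based ISC DHCP server instead of a Windows one, the same two options map to the next-server and filename directives. A minimal sketch, assuming a hypothetical 192.168.10.0/24 subnet with the TFTP server at 192.168.10.5:
subnet 192.168.10.0 netmask 255.255.255.0 {
  range 192.168.10.100 192.168.10.200;
  option routers 192.168.10.1;      # default gateway for the scope
  next-server 192.168.10.5;         # option 066 – Boot Server Host Name
  filename "kpxe.vmw-hardwired";    # option 067 – Bootfile Name
}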
- Configure the TFTP server. For lab purposes I nearly always use the SolarWinds TFTP server, as it is very easy to manage. You need to copy the files from the TFTP Boot Zip, available on the Auto Deploy Configuration page seen in step 2, into the TFTP server file folder and start the TFTP service.
At this stage, when you try to boot your fresh server, it should get an IP address and connect to the TFTP server. In the Discovered Hosts tab of the Auto Deploy Configuration you will be able to see those hosts which received IP addresses and boot information from the TFTP server, but have no Deploy Rule assigned to them yet.
- Create an Image Profile.
Go to the Auto Deploy Configuration page -> Software Depots tab and import a Software Depot.
Click on Image Profiles to see the image profiles that are defined in this Software Depot.
The ESXi software depot contains the image profiles and software packages (VIBs) that are used to run ESXi. An image profile is a list of VIBs.
Image profiles define the set of VIBs to boot ESXi hosts with. VMware and VMware partners make image profiles and VIBs available in public depots. Use the Image Builder PowerCLI to examine the depot and the Auto Deploy rule engine to specify which image profile to assign to which host. VMware customers can create a custom image profile based on the public image profiles and VIBs in the depot and apply that image profile to the host.
- Add Software Depot.
Click the Add Software Depot icon and add a custom depot.
Next, in the newly created custom software depot, select Image Profiles and click New Image Profile.
I selected the minimum VIBs required to boot an ESXi host, which are:
- esx-base 6.5.0-0.0.4073352 VMware ESXi is a thin hypervisor integrated into server hardware.
- misc-drivers 6.5.0-0.0.4073352 This package contains miscellaneous vmklinux drivers
- net-vmxnet3 1.1.3.0-3vmw.650.0.0.4073352 VMware vmxnet3
- scsi-mptspi 4.23.01.00-10vmw.650.0.0.4073352 LSI Logic Fusion MPT SPI driver
- shim-vmklinux-9-2-2-0 6.5.0-0.0.4073352 Package for driver vmklinux_9_2_2_0
- shim-vmklinux-9-2-3-0 6.5.0-0.0.4073352 Package for driver vmklinux_9_2_3_0
- vmkplexer-vmkplexer 6.5.0-0.0.4073352 Package for driver vmkplexer
- vsan 6.5.0-0.0.4073352 VSAN for ESXi.
- vsanhealth 6.5.0-0.0.4073352 VSAN Health for ESXi.
- ehci-ehci-hcd 1.0-3vmw.650.0.0.4073352 USB 2.0 ehci host driver
- xhci-xhci 1.0-3vmw.650.0.0.4073352 USB 3.0 xhci host driver
- usbcore-usb 1.0-3vmw.650.0.0.4073352 USB core driver
- vmkusb 0.1-1vmw.650.0.0.4073352 USB Native Driver for VMware
But the list could be different for you.
- Create a Deploy Rule.
- Activate the Deploy Rule.
- That's it – now you can restart your host and it should boot and install according to your configuration.
According to the VMware definition, vSphere Auto Deploy can provision hundreds of physical hosts with ESXi software. You can specify the image to deploy and the hosts to provision with the image. Optionally, you can specify host profiles to apply to the hosts, a vCenter Server location (datacenter, folder or cluster), and assign a script bundle to each host. In short, it is the tool to automate your ESXi deployments or upgrades.
As far as I know, particularly on the Polish market, it is not a widely used tool. However, it can help integrator companies make deployments of new environments much faster. Furthermore, VMware claims that scripted or automated deployments should be used for every deployment with 5 or more hosts. Nonetheless, even if you work as a System Engineer or in another implementation position, I believe you are not installing new deployments every week. If it is every month – lucky you.
Well, is it really worth preparing the Auto Deploy environment to deploy, for instance, 8 new hosts? It depends.
IMHO, for such small deployments, if you are really keen on making them a little bit faster, the better way is to use kickstart scripts. They can be much faster, especially if you use them at least from time to time and have prepared a good template. (With vSphere 6.5 I'm changing my mind a little, due to the changes that make Auto Deploy preparation much quicker.)
However, Auto Deploy is not only about deployment. It can also be a kind of environment and change management, in a specific kind of infrastructure where you use Auto Deploy to boot ESXi hosts instead of booting them from local hard drives or SD cards.
Nevertheless, in Poland it is easier to come across classic PXE deployments booting from SAN than Auto Deploy. Is the same trend seen around the world?
I am looking forward to hearing about your experiences with Auto Deploy.
We have received a few questions about our lab, which is rather extraordinary 🙂 Some of you wanted us to publish a picture of it. Unfortunately, I've got only an old one (nowadays the cables are better organised, so it looks far better). I'm sorry for the quality of the picture as well.
Anyway, at this moment we are in the implementation phase – the management cluster is going to be expanded to a four-host cluster. We are planning to implement NSX in the physical environment to expand our basic knowledge about the product. Unfortunately, these kinds of toys for big boys aren't cheap, and we are looking for some price cuts or good offers on refurbished components. However, the CPU is already waiting, so it shouldn't take much time.
When the upgrade of the environment is finished, I'll post a new picture of the whole lab 🙂
Taking advantage of the occasion of the last day of 2016, I wish you a Happy New Year and a remarkable party! It's high time to begin the preparations 😉
It is possible to learn about VMware products using just books, official trainings, blogs, etc. However, we believe that real knowledge comes only from practice, and not everything can be tested or verified using production environments 🙂
Again, you can test a lot just using Workstation on your notebook (provided it is powerful enough), but these days there are more and more virtual infrastructure components which require a lot of resources. Furthermore, having real servers and a storage array is also a little different from deploying a few small virtual machines running on a notebook.
That is why, a few years ago, we decided to join forces and build a real laboratory where we are able to test even the most sophisticated deployments – not only with VMware products – without being constrained by resources.
The main hardware components of our lab infrastructure are included in the table below.
| Component | Qty | Specification | Role |
|---|---|---|---|
| Server Fujitsu TX200 S7 | 2 | 2x CPU E5-4220, 128 GB RAM | Payload Cluster |
| Server Fujitsu TX100 S1 | 2 | | Router/Firewall and Backup |
| Server Fujitsu TX100 S3 | 3 | 1x CPU E3-1240, 32 GB RAM | Management Cluster |
| NAS Synology DS2413+ | 1 | 12x 1 TB SATA 7.2K | Gold Storage |
| NAS Synology RS3617+ | 1 | 12x 600 GB SAS 15K | Silver Storage |
| NAS QNAP T410 | 1 | 4x 1 TB SATA 5.4K | Bronze Storage (ISO) |
| Switch HPE 1910 | 1 | 48x 1 Gbps | Connectivity |
Of course, we didn't buy it all at once. The environment evolves with increasing needs. (In the near future we are going to expand the management cluster with a fourth host and deploy NSX.)
The logical topology looks like this:
Despite the fact that most of our servers use tower cases, we installed them in a self-made 42U rack. Unfortunately, especially during the summer, it cannot go without air conditioning (which is one of the most power-consuming parts of the lab).
Later, either I or Daniel will describe the software layer of our lab. I hope it will provide inspiration to anyone who is thinking about building their own lab.
Before we start talking about NetFlow configuration on VMware vSphere, let's go back to basics and review the protocol itself. NetFlow was originally developed by Cisco and has become a reasonably standard mechanism for performing network analysis. NetFlow collects network traffic statistics on designated interfaces. It is commonly used in the physical world to gain visibility into traffic and to understand just who is sending what, and where.
NetFlow comes in a variety of versions, from v1 to v10. VMware uses the IPFIX version of NetFlow, which is version 10. Each NetFlow monitoring environment needs an exporter (the device emitting NetFlow flows), a collector (the main component), and of course some network to monitor and analyze 😉
Below You can see basic environment diagram:
We can describe a flow as a sequence of TCP/IP packets (in a single direction) that share the following attributes:
- Input interface
- Source IP
- Destination IP
- TCP/IP Protocol
- Source Port (TCP/UDP)
- Destination Port (TCP/UDP)
- IP Type of Service (ToS)
Note: vSphere 5.0 uses NetFlow version 5, while vSphere 5.1 and later use IPFIX (version 10).
OK, we know that a distributed virtual switch is needed to configure NetFlow on vSphere, but what about the main component, the NetFlow collector? As usual, we have a couple of options, which we can simply divide into commercial software with fancy graphical interfaces and open-source stuff for admins who still like the good old CLI 😉
Below I will show simple implementation steps with examples of both approaches:
I) ManageEngine NetFlow Analyzer v12.2 – more about the software at https://www.manageengine.com/products/netflow/. My lab VM setup:
- Guest OS: Windows 2008 R2
- 4GB RAM
- 60 GB HDD
- vNIC interface connected to ESXi management network
Installation (using the embedded database, just for demo purposes) is really simple and straightforward. Let's start by launching the installer:
- accept the license agreements
- choose the installation folder on the VM's HDD
- choose the installation component option – for this demo we go with a simple environment with only one collector server; central reporting is not necessary
- choose the TCP/IP ports for the web server and collector services
- provide the communication details – again, in this demo we have all components on one server, so we can simply go with localhost
- optionally, configure proxy server details
- select the database – for this demo I used the embedded PostgreSQL, but if you choose an MS SQL database, remember about the ODBC configuration
- the installation is quite fast – a couple more minutes and the solution will be ready to start work:
… the web client, like in VMware, needs a couple of CPU cycles to start 😉
… and finally we can see the fancy ManageEngine NetFlow collector
II) Open-source nfdump tool – nfdump is distributed under the BSD license and can be downloaded at http://sourceforge.net/projects/nfdump/. My lab VM setup:
- Guest OS: Debian 8.6
- 4 GB RAM
- 60 GB HDD
- vNIC interface connected to ESXi management network
- We need to start by adding some sources to our Debian distribution:
- Install the nfdump package from the CLI:
- Run a simple flow capture to verify that the collector is running and creating output flow statistics files (you can see that I use the same port 9995 and a folder on my desktop as the output destination); see the sketch after this list:
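As the screenshots are not reproduced here, this is a sketch of the commands in question, assuming Debian's packaged nfdump and a hypothetical /root/Desktop/flows output directory:
apt-get update
apt-get install nfdump                       # installs nfcapd (the capture daemon) and nfdump (the query tool)
nfcapd -E -p 9995 -l /root/Desktop/flows     # listen on port 9995 and write flow files; -E also prints records to stdout
nfdump -R /root/Desktop/flows                # read and display the collected flow statistics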
OK, now it is time to go back to vSphere and configure the DVS to send flow records to the collector:
- IP Address: the IP of the NetFlow collector.
- Port: the port used by the NetFlow collector.
- Switch IP Address: this one can be confusing – by assigning an IP address here, the NetFlow collector will treat the VDS as one single entity. It does not need to be a valid, routable IP; it is merely used as an identifier.
- Active flow export timeout in seconds: the amount of time that must pass before the switch fragments the flow and ships it off to the collector.
- Idle flow export timeout in seconds: similar to the active flow timeout, but for flows that have entered an idle state.
- Sampling rate: determines the sampling interval. By default the value is 0, meaning all packets are collected. If you set the value to something other than 0, every Xth packet is collected.
- Process internal flows only: enabling this ensures that the only flows collected are those that occur between VMs on the same host.
And enable it at the designated port group level:
Finally, we can create a simple lab scenario and capture some FTP flow statistics between two VMs on different ESXi hosts:
The VMs are running in a dedicated VLAN on the same DVS port group, and the collector is running on the management network to communicate with vCenter and the ESXi hosts. I used an FTP connection to generate traffic between the VMs. Below are example outputs from the two collectors (the tests were run separately, as the collectors share the same IP):
FTP client on the first VM:
FTP server on the second VM:
Flow statistics example from nfdump:
Flow statistics from ManageEngine: