vRSLCM deployment is not available

During one of my recent VCF 3.9.1 deployments I ran into an interesting challenge while trying to deploy the vRealize Suite components.

Initially I integrated with My VMware and downloaded vRealize Suite Lifecycle Manager version 2.1 (the only one available to download), without paying much attention to the build number, which turned out to be part of the problem.

Even though my vRSLCM was downloaded and properly listed in the download history on the vRealize Suite tab, I was still getting a message that my vRSLCM is not available, as follows:

My first suspicion was a build mismatch, but after double-checking the repository I found no vRSLCM bundle with build number (BN) 14062628 at all. BN 16154511 is a newer build, so I thought the older one might have been overridden, and I decided to remove the bundle I had and check whether the expected one would appear.

To remove a bundle from the existing repository you need to use the cleanup_bundle.py script available on SDDC Manager; the procedure will be described later in this blog post.

However, it didn’t help: the only bundle available still pointed to BN 16154511. Then I accidentally bumped into a KB, https://kb.vmware.com/s/article/76869, which doesn’t seem to be related but describes how to change the expected vRSLCM version in the application-prod.properties file to another one.

Following that procedure I changed the version to include BN 16154511 and restarted the LCM service, and that fixed the problem!

Hope it’s going to help not only me 🙂

Cloud builder 4.2 – 502 bad gateway

Cloud Builder in VCF 4.2 has enhanced password requirements.

So if you deploy the appliance with a password which doesn’t meet those requirements, it will be deployed with an error that makes it unusable.

When you try to access the GUI you get “502 Bad Gateway”, which is not very meaningful.

Trying to access it over SSH fails as well. However, the direct console gives clear information about the problem.

At that point there is nothing left to do but redeploy the appliance.

VCDX BoM

As part of my VCDX preparation I’ve read lots of books, blogs and other documents to expand my knowledge and fill the gaps. That was probably one of the most valuable parts of the journey, and I learned a lot from it. Without it I probably wouldn’t have read even half of those great books.

I can’t say this is something that every candidate must read, but I decided to share this list as a VCDX BoM as I found it valuable for my VCDX-DCV prep.
At first I used the list to define priorities and also to track the time spent, but I lost count at some point 🙂
The list presented here is in pretty much random order.

  • IT Architect: Foundation In the Art of Infrastructure Design: A Practical Guide For IT Architects
  • IT Architect: Designing Risk in IT Infrastructure
  • IT Architect: The Journey
  • IT Architect: Stories from the field vol.1
  • VCDX Bootcamp by John Arrasjid, Ben Lin, Mostafa Khalil
  • Storage Design and Implementation in vSphere 6: A Technology Deep Dive, 2nd edition
  • VMware vCloud Architecture Toolkit 3.1
  • VMware vCloud Architecture Toolkit 5.0
  • Clustering Deep Dive
  • vSAN 6.7 U1 Deep Dive
  • vSphere 6.5 Host Resources Deep Dive
  • Mastering VMware vSphere 6 by Nick Marshall
  • Mastering VMware vSphere 6.5
  • VMware vSphere Design 2nd edition by Forbes Guthrie and Scott Lowe
  • Networking for VMware Administrators by Christopher Wahl, Steven Pantol
  • Managing and Optimizing VMware vSphere Deployments by Harley Stagner, Sean Crookston
  • vSphere High Availability deep dive 6.0 by Duncan Epping
  • VCAP5-DCD Official Cert Guide by Paul McSharry
  • Site Reliability Engineering – O’Reilly
  • Disaster Recovery Planning – Preparing for the Unthinkable by Jon William Toigo
  • Enterprise Systems Backup and Recovery: A Corporate Insurance Policy by Preston de Guise
  • Designing a Storage Performance Platform by Frank Denneman, Todd Mace, Tom Queen
  • Virtualizing SQL Server with VMware – Doing IT Right by Michael Corey, Jeff Szastak, Michael Webster
  • Mission critical applications on VMware PDFs – Oracle, SAP HANA, SQL, Exchange, etc.

I also attended a tremendous amount of training; here are a few courses whose ebooks I studied quite extensively.

  • vSphere Optimize & Scale 6.7 & 7.0
  • vSphere Design Workshop 6.5 & 7.0
  • Optimize and Scale 6.5
  • SRM, Install Configure Manage 8.2
  • NSX-T ICM 3.0

A few useful general links:

  • https://thesaffageek.co.uk/2017/08/21/vcdx-design-scenario-tips/
  • https://en.wikipedia.org/wiki/Service-level_agreement
  • https://en.wikipedia.org/wiki/Data_availability
  • https://en.wikipedia.org/wiki/High_availability
  • http://searchitchannel.techtarget.com/definition/service-level-agreement
  • http://searchdisasterrecovery.techtarget.com/Free-service-level-agreement-template-for-disaster-recovery-programs
  • https://landing.google.com/sre/book/chapters/service-level-objectives.html
  • https://uptimeinstitute.com/tier-certification/design

And can’t forget about Rene’s spectacular list of resources:

  • https://vcdx133.com/2015/01/27/vcdx-series/

If you have any other books that are worth adding to the list, please put them in the comments!

A wrap-up of my own VCDX Journey #291

This is a post I meant to write two months ago as a summary of 2020, as it’s important to wrap up the long time I spent preparing for VCDX and also to say thank you. However, due to other tasks and engagements, I only managed to complete it now!

So first things first: after a very long and extremely informative journey, on 15 Dec 2020 I finally received probably the best e-mail ever.

Hi Pawel,
Congratulations! You passed! It gives me great pleasure to welcome you to the VMware Certified Design Expert community.
Your VCDX number is 291.

Initially I was about to write a long story here about my VCDX journey, but then I decided to just make a few points that can be treated as advice, that hopefully might help someone.

  1.  Don’t blindly trust everything you read on the internet about the VCDX process -> it’s like an assumption, and if you have already started the process then you know what’s wrong with assumptions 😉
    -> E.g. for me, back in 2017, after passing both VCAPs, I decided to set VCDX as the next goal.
    My colleague and I had just completed a very good project which at that time seemed to be a perfect fit for the VCDX. I had also read some stories about VCDX journeys taking 150-200 hours in total to prepare for the exam. That seemed like almost nothing after the dozens of overtime hours we had spent getting that project done on time. So the decision was made: 200 hours is not much after all. We took the docs we had prepared for the customer, translated them into English and submitted them, and we were invited to the defense, during which we immediately and brutally realized we were not even close to making the cut.
  2.  Don’t try to be a hidden hero; brute force doesn’t work here.
    -> At the beginning I completely ignored the community aspect. I wanted to make it on my own and then just shine with the number. I didn’t participate in even a single mock session, didn’t have a mentor, nothing. I had read lots of blog posts stressing the importance of it, but I just ignored them. Everything I knew about the defence came from blogs, without any real experience from mocks. I did work with customers at that time, so I believed that was enough. I was wrong, though. If you plan to go that way, just don’t. The VCDX community is amazing, and you don’t need to be an expert in each area. The community can help you, and so can peers at work. If there is someone who can explain some networking, storage, backup or other aspects to you, just ask for help. There is no need to re-invent the wheel! And this is still an exam with a formula that you must be aware of.
  3.  Be realistic and honest with yourself about the time and schedule.
    -> The truth is that my peer and I went to the first defence completely unprepared. It was back in 2017 and our defence was scheduled just 2 days after my own wedding. That was definitely wrong. Consequently, we started working on our slide deck for the defense the evening before, in the hotel room after landing in the UK. It was an unnecessary waste of money, of our time and of the panelists’ time, and not good for life balance either!

It was a long journey and I could find many more examples like those, but in my opinion these were the key mistakes I made.
However, there is always a positive side. Thanks to those failures (I know it’s a cliche 🙂) I went through a huge learning curve that helped me not only to pass the exam eventually but, most importantly, to learn the methodology and how to be methodical, to build soft skills and to be comfortable in uncomfortable situations. That’s something I can use right now on a daily basis while working with customers.

Kudos

I also owe public kudos to everyone I met who directly or indirectly helped me during this long journey! (The following list is in alphabetical order by first name :)).

Abdullah Abdullah
Asaf Blubshtein
Bilal Ahmed
Chris Noon
Chris Porter
Daniel Zuthof
Fouad El Akkad
Gerard Murphy
Gregg Robertson
Igor Zecevic
Inder S
Jason Grierson
Faisal Rahman
Miroslaw Karwasz
Paul Cradduck
Paul McSharry (before becoming a panelist)
Paul Meehan
Pawel Omiotek
Phoebe Kim
Rene van den Bedem
Shady ElMalatawey
Szymon Ziółkowski
Wesley Geelhoed

Everyone who attended my mocks to provide feedback and whom I forgot to mention above!

+ All panelists of course 🙂

Hope I didn’t miss anyone! If I did, sorry, it wasn’t intentional. I do appreciate all the help and hints that helped me expand my knowledge and develop.

vCloud Director Network considerations

One of the trickiest parts of vCD is networks. It took me some time to digest how the relations between the different types of network in vCD work. Just to remind you, we distinguish:

  • External Networks
  • VDC Organization Networks
  • vApp Networks

Moreover, for both VDC Organization and vApp networks we distinguish the following types:

  • Directly connected to upper layer network
  • Routed network
  • Isolated Network

To complicate things even further, a directly connected vApp network can be fenced 🙂

All networks apart from directly connected ones will create an ESG (yes, even an isolated network requires an ESG!). Just don’t be fooled during a test when they are not visible in vSphere as soon as you create a new vApp or Org VDC network. The ESG, as well as the port group on the DVS, is created not at the time the vCD network is created but when you connect a VM to this network and power it on for the first time.

To understand how we can mix and match these networks I’ve created a diagram as a reference mostly for myself but maybe it will be helpful for you as well as I didn’t find any diagram covering all options. So here we have a vCD network diagram starting from an external network combining all (apart from fenced one) options.

[Diagram 1: vCD networks]

Plus another diagram including an ESG as the Org perimeter interconnected with a DLR.

[Diagram 2: vCD networks]

Hope it will be informative, if you have any comments or questions, don’t hesitate to write a comment!

NVIDIA P40 – unable to power on a VM

During a recent Horizon deployment with GPUs for a customer I encountered an issue related to ECC on the NVIDIA P40 GPU card. By the way, as per the documentation it applies to all cards with the Pascal architecture.

In the NVIDIA documentation you can find information that, apart from installing NVIDIA’s VIB, you need to disable ECC, which is clear to me. However, after installing the VIB, the status reported by the nvidia-smi command looks as follows:

[Screenshot: nvidia-smi output]

ECC status is listed as N/A, which I initially treated as disabled. That was wrong.

When I tried to power on a VM with the PCI device added I received the following error:

Could not initialize plugin ‘/usr/lib64/vmware/plugin/libnvidia-vgx.so’ for vGPU “profile_name”

Of course, after typing it into Google I found this KB, which clearly indicates the root cause: ECC!

With that in mind I went back to the SSH console and issued the following command to make sure it’s disabled (ID is the index of the GPU as listed by nvidia-smi, and -e 0 disables ECC):

nvidia-smi -i ID -e 0

After ESXi host restart I was able to power on all VMs 🙂

 

NSX-V VTEP, MAC, ARP Tables content mapping

It took me a while to figure out what information I see when displaying the VTEP, MAC and ARP tables on the Controller Cluster in NSX. The documentation tells you what information is included in those tables, but it might not be immediately obvious which field contains what kind of data. That’s why I decided to make a short reference for myself, and maybe it will help someone else too.

To understand those tables I started with Central CLI to display the content of each table, which was as follows:

[Screenshot: VTEP and other tables]

Now let’s consider what kind of information we’ve got in each table and how they map to particular components in the environment.

VTEP Table – segment to VTEP IP bindings:

VNI – Logical Switch ID based on configured Segment pool

IP – VTEP IP (VMkernel IP) of host on which VM in VNI 6502 is running

Segment – VTEP Segment – in my case that’s only one L3 network which is used

MAC – MAC address of physical NIC configured for VTEP

MAC Table – VM MAC address to VTEP IP (host) mapping:

VNI – Logical Switch ID based on configured Segment pool

MAC – MAC address of VM accessible through VTEP IP displayed in column on the right.

VTEP-IP – IP of a host VTEP on which VM with MAC address from previous column is running.

ARP Table – Virtual Machine MAC to IP mapping:

VNI – Logical Switch ID based on configured Segment pool

IP – IP address of a Virtual Machine connected to the Logical Switch with that VNI

MAC – MAC address of Virtual Machine
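
To show how the three tables chain together, here is a minimal Python sketch. It is only an illustration of the mapping described above, not how the controllers store the data: the dictionaries, the function name locate_vm and all IP and MAC values are made up for the example (only VNI 6502 comes from the text above).

```python
from typing import Optional

# Illustrative data only: all IP and MAC values below are invented.
# ARP table: (VNI, VM IP) -> VM MAC
arp_table = {(6502, "10.0.0.11"): "00:50:56:aa:bb:01"}
# MAC table: (VNI, VM MAC) -> VTEP IP of the host on which the VM runs
mac_table = {(6502, "00:50:56:aa:bb:01"): "192.168.50.21"}
# VTEP table: (VNI, VTEP IP) -> (VTEP segment, MAC of the VTEP)
vtep_table = {(6502, "192.168.50.21"): ("192.168.50.0", "00:50:56:cc:dd:01")}


def locate_vm(vni: int, vm_ip: str) -> Optional[dict]:
    """Follow ARP -> MAC -> VTEP tables to find the host VTEP behind a VM IP."""
    vm_mac = arp_table.get((vni, vm_ip))
    if vm_mac is None:
        return None  # the VM IP is not known on this Logical Switch
    vtep_ip = mac_table.get((vni, vm_mac))
    if vtep_ip is None:
        return None  # the VM MAC has not been reported by any host
    vtep = vtep_table.get((vni, vtep_ip))
    if vtep is None:
        return None  # the host VTEP has not joined this VNI
    segment, vtep_mac = vtep
    return {
        "vm_mac": vm_mac,
        "vtep_ip": vtep_ip,
        "vtep_segment": segment,
        "vtep_mac": vtep_mac,
    }


result = locate_vm(6502, "10.0.0.11")
```

Every lookup is keyed by the VNI, which is why the same VNI column shows up in all three tables.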

 

To make it even easier here you have got a summary diagram with those mappings.

[Diagram: summary of the table mappings]

If you want to dig deeper into details how those tables are populated I strongly recommend watching this video from VMworld 2017 which clearly explains it step by step:

VSAN real capacity utilization

There are a few caveats that make calculating and planning VSAN capacity tough, and it gets even harder when you try to map it to real consumption at the VSAN datastore level.

  1. VSAN disk objects are thin provisioned by default.
  2. Configuring full reservation of storage space through the Object Space Reservation rule in a Storage Policy does not mean the disk object blocks will be inflated on the datastore. It only means the space will be reserved and shown as used in the VSAN Datastore Capacity pane. This makes it even harder to figure out why the size of “files” on the datastore does not match other capacity-related information.
  3. In order to plan capacity you need to include the overhead of Storage Policies. Policies, plural, as I haven’t met an environment that would use only one for all kinds of workloads. This means planning should start with dividing workloads into groups which might require different levels of protection.
  4. Apart from disk objects there are other objects, especially SWAP, which are not displayed in the GUI and can easily be forgotten. However, depending on the size of the environment, they might consume a considerable amount of storage space.
  5. The VM SWAP object does not adhere to the Storage Policy assigned to the VM. What does that mean? Even if you configure your VM’s disks with PFTT=0, SWAP will always utilize PFTT=1, unless you configure the advanced option (SwapThickProvisionDisabled) to disable it.

I ran a test to check how much space an empty VM consumes. (Empty here means without even an operating system.)

For that test I created a VM called Prod-01 with 1 GB of memory, a 2 GB hard disk and the default storage policy assigned (PFTT=1).

Based on the Edit Settings window, the VM disk size on the datastore is 4 GB (the maximum size based on disk size and policy). However, used storage space is 8 MB, which means there are 2 replicas of 4 MB each; that is fine as there is no OS installed at all.

[Screenshot: VM powered off]

However, when you open the datastore files and look at the Virtual Disk object in the list, you will notice that its size is 36 864 KB, which gives us 36 MB. So it’s neither 4 GB nor 8 MB as displayed in Edit Settings.

[Screenshot: VSAN files]

Meanwhile datastore provisioned space is listed as 5.07 GB.

[Screenshot: VM with 2 GB disk, default policy and 1 GB RAM, powered off]

So let’s power on that VM.

Now the disk size remains intact, but other files appear: for instance, SWAP has been created, as well as logs and other temporary files.

[Screenshot: VSAN VM powered on]

Looking at datastore provisioned space now, it shows 5.9 GB, which again is confusing. Even if we forget about the previous findings, powering on the VM triggers SWAP creation, and according to the theory SWAP should be protected with PFTT=1 and be thick provisioned. If that were the case, provisioned storage consumption should have increased by 2 GB, not 0.83 GB (some of which is consumed by logs and other small files included in the Home namespace object).

[Screenshot: VM with 2 GB disk, default policy and 1 GB RAM, powered on]
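
The arithmetic behind that expectation can be written down as a short Python sketch. It assumes plain RAID-1 mirroring (PFTT+1 full replicas, witness components ignored) and a SWAP object equal to the configured memory because no memory reservation is set; the function name mirrored_footprint_gb is made up for this example, and the 5.07 GB and 5.9 GB figures are the ones observed above.

```python
def mirrored_footprint_gb(size_gb: float, pftt: int) -> float:
    """Raw capacity of a mirrored object: one full replica per tolerated failure, plus one."""
    return size_gb * (pftt + 1)


disk_gb = 2.0    # virtual disk size of Prod-01
memory_gb = 1.0  # configured memory, assumed to be the SWAP size (no reservation)

disk_max_gb = mirrored_footprint_gb(disk_gb, pftt=1)         # maximum disk footprint
swap_expected_gb = mirrored_footprint_gb(memory_gb, pftt=1)  # thick SWAP with PFTT=1

provisioned_off_gb = 5.07  # observed with the VM powered off
provisioned_on_gb = 5.9    # observed with the VM powered on
observed_delta_gb = round(provisioned_on_gb - provisioned_off_gb, 2)

# Difference between what the theory predicts for SWAP and what was observed
unexplained_gb = round(swap_expected_gb - observed_delta_gb, 2)

print(f"disk max: {disk_max_gb} GB, expected SWAP: {swap_expected_gb} GB")
print(f"observed increase: {observed_delta_gb} GB, gap: {unexplained_gb} GB")
```

The 4 GB disk maximum matches the Edit Settings window, while the power-on increase falls more than 1 GB short of the 2 GB the SWAP theory predicts.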

Moreover, during those observations I noticed that while the VM is booting, provisioned space peaks at 7.11 GB for a very short period of time, and after a few seconds this value decreases to 5.07 GB. Even after a few reboots those values stay consistent.

[Screenshot: VM with 2 GB disk, default policy and 1 GB RAM, during boot]

The question is why this information is not consistent, and what happens during VM boot that causes the peak in provisioned space.

That’s the quest for now: to figure it out 🙂

 

 

Perennial reservations – weird behaviour when not configured correctly

Whilst using RDM disks in your environment you might notice long (even extremely long) boot time of your ESXi hosts. That’s because ESXi host uses a different technique to determine if Raw Device Mapped (RDM) LUNs are used for MSCS cluster devices, by introducing a configuration flag to mark each device as perennially reserved that is participating in an MSCS cluster. During the start of an ESXi host, the storage mid-layer attempts to discover all devices presented to an ESXi host during the device claiming phase. However, MSCS LUNs that have a permanent SCSI reservation cause the start process to lengthen as the ESXi host cannot interrogate the LUN due to the persistent SCSI reservation placed on a device by an active MSCS Node hosted on another ESXi host.

Configuring the device to be perennially reserved is local to each ESXi host, and must be performed on every ESXi host that has visibility to each device participating in an MSCS cluster. This improves the start time for all ESXi hosts that have visibility to the devices.

The process is described in this KB and requires issuing the following command on each ESXi host:

esxcli storage core device setconfig -d naa.id --perennially-reserved=true

You can check the status using the following command:

esxcli storage core device list -d naa.id

In the output of the esxcli command, search for the entry Is Perennially Reserved: true. This shows that the device is marked as perennially reserved.
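
When there are many devices, that check can be scripted. Below is a minimal Python sketch that parses text shaped like the output of esxcli storage core device list and reports the flag per device. It assumes each device block starts with an unindented device ID followed by indented "Key: Value" lines; the function name and the sample text (including the naa IDs) are made up for illustration and trimmed to the two relevant fields.

```python
def perennially_reserved_flags(esxcli_output: str) -> dict:
    """Map each device ID to the value of its 'Is Perennially Reserved' field."""
    flags = {}
    current_device = None
    for line in esxcli_output.splitlines():
        if not line.strip():
            continue  # skip blank lines between device blocks
        if not line[0].isspace():
            # An unindented line starts a new device block (the device ID)
            current_device = line.strip()
        elif current_device and line.strip().startswith("Is Perennially Reserved:"):
            value = line.split(":", 1)[1].strip().lower()
            flags[current_device] = value == "true"
    return flags


# Invented sample, shortened to the fields that matter here
SAMPLE_OUTPUT = """\
naa.600000000000000000000000000000a1
   Display Name: Example RDM LUN
   Is Perennially Reserved: true

naa.600000000000000000000000000000b2
   Display Name: Example datastore LUN
   Is Perennially Reserved: false
"""

flags = perennially_reserved_flags(SAMPLE_OUTPUT)
reserved = sorted(dev for dev, is_reserved in flags.items() if is_reserved)
print(reserved)
```

Comparing the resulting list against the LUNs that are really used as MSCS RDMs shows quickly whether anything else was flagged by mistake.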

However, recently I came across a problem with snapshot consolidation; even Storage vMotion was not possible for a particular VM.

Whilst checking the VM settings I saw that one of the disks was locked and running on a delta disk, which means there is a snapshot. However, Snapshot Manager didn’t show any snapshot at all. Moreover, creating a new snapshot and then deleting all snapshots, which in most cases solves the consolidation problem, didn’t help either.

[Screenshot: locked VM disk running on a delta disk]

In vmkernel.log, lots of perennial reservation entries were present while trying to consolidate the VM. Initially I ignored them, because there were RDMs which were intentionally configured as perennially reserved to prevent long ESXi boot times.

[Screenshot: vmkernel.log entries]

However, after digging deeper and checking a few things, I returned to the perennial reservations and decided to check which LUN was generating these warnings and why it created these entries, especially during consolidation or Storage vMotion of a VM.

To my surprise I realised that the datastore on which the VM’s disks reside was configured as perennially reserved! It was due to a mistake: when the PowerCLI script was prepared, someone accidentally configured all available LUNs as perennially reserved. Changing the value to false happily solved the problem.
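
One way to avoid that kind of mistake is to generate the commands only from an explicit list of RDM devices instead of looping over every LUN a host can see. The following is a minimal Python sketch of that idea, not the PowerCLI script that was actually used; the function name and the naa IDs are made up for illustration.

```python
def build_setconfig_commands(rdm_device_ids, visible_device_ids):
    """Build esxcli commands only for RDM devices that the host can actually see."""
    visible = set(visible_device_ids)
    commands = []
    for device_id in sorted(set(rdm_device_ids)):
        if device_id not in visible:
            continue  # this host has no visibility to the device, nothing to flag
        commands.append(
            "esxcli storage core device setconfig "
            f"-d {device_id} --perennially-reserved=true"
        )
    return commands


# Invented example: one MSCS RDM LUN and one datastore LUN visible to the host
rdm_luns = ["naa.600000000000000000000000000000a1"]
visible_luns = [
    "naa.600000000000000000000000000000a1",  # RDM used by the MSCS cluster
    "naa.600000000000000000000000000000b2",  # VMFS datastore, must stay untouched
]

commands = build_setconfig_commands(rdm_luns, visible_luns)
for command in commands:
    print(command)
```

Because the datastore LUN is not on the RDM list, no command is generated for it.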

The moral of the story is simple – logs are not issued to be ignored 🙂

vCloud Director 9 – Released!

Today a new version of VMware vCloud Director for Service Providers was released.

There are plenty of new features and enhancements like:

  • vVols support
  • Increased vCD-vCenter latency tolerance, up to 100 ms
  • Multisite feature which lets service providers offer a single point of entry to Tenants having multiple Virtual Data Centers (Org vDCs) in different instances of vCD
  • Ability to manage routing between two or more Org vDC Networks with NSX DLR
  • PostgreSQL database support as an external database

There are a few more as well as a list of known issues resolved.

Release notes for the product can be found here.

The complete list of new features and enhancements can be found here.