Browsed by
Tag: HA

vCenter Server HA – changes in vSphere 6.5

vCenter Server HA – changes in vSphere 6.5

In vSphere 6.5 vCenter has a new native high availability solution that is available exclusively for the vCenter Server Appliance. This solution consists of Active, Passive, and Witness nodes which are cloned from the existing vCenter Server. The vCenter HA cluster can be enabled, disabled, or destroyed at any time. There is also a maintenance mode so planned maintenance does not cause an unwanted failover.

vcenter10

vCenter HA supports both an external PSC as well as an embedded PSC. Note, however, that in vSphere 6.5 at GA an embedded PSC cannot be used to replicate to any other PSC. Thus, if using an embedded PSC the vCenter Server cannot participate in Enhanced Linked Mode.

vCenter HA has some basic network requirements. A vCenter HA network must be established be and separate from the currently used subnet of the primary network interface of the vCenter Server Appliance (eth0). If using the Basic workflow a new interface, eth1, will be added to the appliance automatically prior to the cloning process. eth1 will be attached to the vCenter HA private network. The port group connecting to this network may reside on either a VMware Virtual Standard Switch (VSS) or a VMware Virtual Distributed Switch (VDS). There are no specific TCP/IP requirements for the vCenter HA network other than latency within the prescribed 10 ms RTT. Layer 2 connectivity is not required.

Failover can occur when an entire node is lost (host failure for example) or when certain key services fail. For the initial release of vCenter HA an RTO of about 5 minutes is expected but may vary slightly depending on load, size, and capabilities of the underlying hardware. During a failover event a temporary web page will be displaying indicating that a failover is in progress. That page will then refresh to the vSphere Web Client login page once vCenter Server is back online. In the case where a user is not active during the failover they may not be prompted to re-login. When compared to other high availability solutions, vCenter HA has several advantages:

vcenter11

PSC High Availability

After making vCenter Server highly available we also need to consider the availability options for the Platform Services Controller.

As you remember in vSphere 6.0 to provide HA for the PSC a supported load balancer was required –. If automated failover is not required we got option to manually repoint a vCenter Server between PSCs within an SSO site.vcenter12

In vSphere 6.5 VMware is  providing PSC HA solution that doesn’t require a load balancer but there is some integration work to be completed with other products in the SDDC portfolio before native PSC HA can be enabled.

I plan to test new vC and PSC HA  features in our lab environment – will provide separate article with my configuration details. At this moment let me point you to VMware KB as additional  reference:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1024051

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2147672

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2147018

vSphere 6.5 – vSphere HA Orchestrated Restart

vSphere 6.5 – vSphere HA Orchestrated Restart

VMware announced a new feature in vSphere 6.5 called HA Orchestrated Restart. But wait a minute – it was already available in previous version where you were able to set the restart priority for specific VMs or group of VMs. So what’s going on with this “new feature” ? As always, the devil is in the details 🙂

Let’s start from the old behavior. Using VM overrides in previous version of vSphere, we could set one of three available priorities – High, Medium (default) and Low. However it doesn’t guarantee that the restart order will be successful for our three-tier apps because HA is only really concerned about the resources to the VM, and once the VM had received the resources, HA’s job was done. The restart priority defined the order in which VMs would secure their resources. But if there was plenty of resources for everyone, then the VMs would receive their allocations in pretty quick succession and could start powering on. For example if DB server takes longer to boot than the App server for example, the App will not be able to access the DB and may fail.

vSphere 6.5 now allows you to create VM to VM dependency chains.  These dependency rules are also enforced if when vSphere HA is used to restart VMs from failed hosts.  That’s gives you the ability to configure the right chain of dependency where App server will wait for DB until it boots up. The VM to VM rules must also be created that complies with the Restart Priority level.  In this example, if the app server depended on the database server, the database server would need to be configured in a priority level higher or equal to the app server.orchestrated-restart

Validation checks are also automatically done when this feature is configured to ensure circular dependencies or conflicting rules are not unknowingly created.

There are number of conditions that HA can check for to determine the readiness of a VM which can be chosen by the administrator as the acceptable readiness state for orchestrated restarts.

Conditions:

  1. VM has resources secured (same as old behavior)
  2. VM is powered on
  3. VMware Tools heartbeat detected
  4. VMware Tools Application heartbeat detected

Post condition delays:

  1. User-configurable delay – e.g. wait 10 minutes after power on

The configuration of the dependency chain is very simple.  In the Cluster configuration of the Web Client, you would first create the VM groups under VM/Host Groups.  For each group, you would include only a single VM.

orchestrated-restart2-jpg

The next thing to configure is the VM Rules in VM/Host Rules section.  This is where you can define the dependency between the VM Groups.  Since each group only contains a single VM, you are essentially creating a VM to VM rule.

orchestrated-restart3

In previous releases we were able to manage such behavior using e.g. SRM during failover to recovery site. However there are plenty of use cases where it’s necessary to provide the correct order of restarts during single site and HA cluster. Fortunately, now it’s possible 🙂

What’s New in vSphere 6.5 – ProactiveHA

What’s New in vSphere 6.5 – ProactiveHA

Proactive HA is a new feature Available in vSphere 6.5 released recently. It’s a kind of feature which will even better help you to protect you environment in case of hardware failure.

Currently all of the hardware components are redundant including power supplies, fans, network cards etc. However the most possible cause of whole server failure occurs while one of these theoretically redundant components fails. To better imagine that let’s think about power supply fail. There is still the second one but during there is only one it is much more loaded. (Similar things you can observe with hard disks in a RAID group – the biggest possibility of a disk fail is during RAID re-building).

ProactiveHA will help you protect the environment in such situations. It will detect hardware conditions of a host and allow you to evacuate the VMs before the trivial issue causes the serious outage.  For this feature to function, the hardware vendor must participate.  Their hardware monitoring solution will advertise the health of the hardware, and vCenter will query that system to get a status of the hardware components such as the fans, memory, and power supplies.  vSphere can then be configured to respond according to the failure.

 

To let it functions there is a new ESXi host state in vSphere 6.5 – Quarantine mode. It’s similar to maintenance mode but it is not as severe as maintenance mode. It’s mean that DRS will attempt to evacuate all VMs from the host, but only if:

  • No performance impact on any virtual machine in the cluster
  • None of the business rules is disregarded
  • Additionally, any soft affinity or-anti-affinity rules will not be overridden by the evacuation. However, DRS will seek to avoid placing any new VMs.

To set the Proactive HA features, find the Partial Failures and Responses section and set how vSphere should respond to partial failures.  The options are to place a degraded host into Quarantine Mode, Maintenance Mode, or Mixed Mode.

Mixed mode means that for moderate degradation, the host will be placed into Quarantine Mode.  For Severe failures, it will be placed into Maintenance Mode.

proactiveha

For the moment of writing and availibility of vSphere 6.5 GA the supported failure condition types are:

  • Memory
  • Power
  • Fan
  • Network
  • Storage