Cluster Reserve and VMware HA

How a miscalculated design can have a huge downtime impact.

How a small design mistake can have a huge downtime impact: VMware HA or extended unexpected downtime.

It happens more often than you’d think, and across all sorts of environments, from the very smallest shops to massive enterprises.  When it does happen, it creates huge problems – the very issues its existence was supposed to prevent.

What is this horrific thing?  Some call it VMware HA.  Those who haven’t planned appropriately for its services call it extended unexpected downtime.

That is perhaps an extreme introduction for a feature most people enjoy quite a bit.  VMware HA does a lot of good for a VMware vSphere environment.  Properly configured, HA will automatically restart the virtual machines from a failed host on the surviving cluster hosts.  When correctly configured with enough resources, the failure of a cluster host might incur scant minutes (sometimes only seconds) of virtual machine downtime.

While HA indeed provides a valuable service for a VMware vSphere virtual environment, there are situations where well-meaning administrators enable it and inadvertently set themselves up for an extended problem.

At the center of this misconfiguration is a concept known generically as cluster reserve, although a simpler term might be “not having enough cluster hosts”.

Here’s the problem:  For VMware HA to function in a cluster, there must be enough resources available in that cluster to support VMs in the case of a host failure.  These extra resources – generally considered to be processing and memory – must exist somewhere within the hosts in a VMware HA cluster.

This first requirement is generally well understood by most virtual administrators.  Resources are indeed required for VMs to run.  However, what many virtual administrators commonly miss is the more costly requirement that these resources be set aside as unused.  Wasted.  Reserved.

At fault in this disconnect is perhaps VMware itself, which over the years has spent much of its time, effort, and marketing dollars highlighting its impressive resource overcommitment features.  In a VMware vSphere environment, memory resources can be overcommitted, whereby more virtual memory can be assigned to VMs than actually exists on a host.  Other resources like processing and network enjoy similar overcommitment capabilities.

This focus on overcommitting resources perhaps inadvertently causes some inexperienced virtual administrators to believe that overcommitment is the proverbial IT panacea:  A solution to every problem, including the resource constraint problems that happen during an HA event.

The reality is that a fully prepared cluster must set aside one full server’s contribution of resources as reserved in the case of an HA event.  Thus, for example, a four-node cluster composed of equally sized hosts must set aside as unused the resources of one full server, or 25 percent of the cluster’s total capacity.
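The arithmetic behind that reserve is simple enough to sketch.  The snippet below is only an illustration, assuming equally sized hosts; the function name `usable_fraction` and its parameters are hypothetical and are not part of any VMware tool or API.

```python
def usable_fraction(num_hosts: int, failures_to_tolerate: int = 1) -> float:
    """Fraction of total cluster capacity left for VMs when the capacity of
    `failures_to_tolerate` equally sized hosts is held back as reserve.

    Assumption: every host contributes the same CPU and memory.
    """
    if num_hosts <= failures_to_tolerate:
        raise ValueError("cluster needs more hosts than the failures it must tolerate")
    return (num_hosts - failures_to_tolerate) / num_hosts


# Show how the reserved share shrinks as the cluster grows.
for n in (2, 3, 4, 8):
    usable = usable_fraction(n)
    print(f"{n} hosts: {usable:.1%} usable, {1 - usable:.1%} reserved")
```

Under that equal-host assumption, the reserved share falls as hosts are added: half of a two-node cluster sits idle, but only one eighth of an eight-node cluster does.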

Not doing so could mean that an HA event inadvertently and automatically creates a much larger problem the moment VMs begin restarting on alternate hosts.  In this not-enough-resources situation, that compression of VMs onto surviving hosts could push resource consumption above the total amount of physical resources available on those hosts.  What results is swapping (of various forms), which represents a very painful and performance-draining last-ditch effort by hosts to keep each VM running.
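A rough way to test for this situation ahead of time is to ask whether total VM demand would still fit if any one host disappeared.  The sketch below does exactly that and nothing more.  It is not how VMware HA admission control works internally; the `Host` class, the function name, and the sample figures are all hypothetical, and it ignores per-VM placement, reservations, and virtualization overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cpu_ghz: float
    mem_gb: float


def fits_after_any_single_failure(hosts, demand_cpu_ghz, demand_mem_gb):
    """Return True if aggregate VM demand fits within the physical capacity
    that remains after the loss of any one host.

    Assumption: demand is treated as one pool, so fragmentation across
    individual hosts is not modeled.
    """
    total_cpu = sum(h.cpu_ghz for h in hosts)
    total_mem = sum(h.mem_gb for h in hosts)
    for failed in hosts:
        if demand_cpu_ghz > total_cpu - failed.cpu_ghz:
            return False
        if demand_mem_gb > total_mem - failed.mem_gb:
            return False
    return True


# Hypothetical four-node cluster of identical hosts.
cluster = [Host(f"esx{i}", cpu_ghz=40.0, mem_gb=256.0) for i in range(1, 5)]

print(fits_after_any_single_failure(cluster, 100.0, 700.0))
print(fits_after_any_single_failure(cluster, 100.0, 900.0))
```

In this made-up cluster, 900 GB of memory demand fits comfortably while all four hosts are up (1,024 GB in total) but not in the 768 GB that remain after a failure, which is exactly the condition that leads to swapping.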

Avoid this problem by avoiding three key mistakes before you ever turn on VMware HA in your vSphere cluster of ESXi hosts.  First, always design clusters with at least an N+1 configuration.  Make sure cluster hosts can support both the short- and long-term needs of VMs plus one entire host’s contribution that gets set aside as unused.

Greg Shields is a Microsoft MVP and VMware vExpert. He is a technology author, speaker and IT consultant, as well as a Partner and Principal Technologist with Concentrated Technology, with extensive experience in systems administration, engineering, and architecture specializing in Microsoft OS, remote application, systems management, and virtualization technologies.

See here for all of Greg's Tom's IT Pro articles.