0% found this document useful (0 votes)

192 views31 pages

Vmware Vsphere Metro Storage Cluster Recommended Practices

antimate

Uploaded by

hiehie272

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

192 views31 pages

Vmware Vsphere Metro Storage Cluster Recommended Practices

antimate

Uploaded by

hiehie272

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 31

VMware vSphere

Metro Storage Cluster

Recommended Practices
VMware vSphere 6.0
T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N

V 3 . 0 M AY 2 0 1 6
VMware vSphere Metro Storage Cluster
Recommended Practices

Table of Contents

Purpose and Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Target Audience. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Interpreting This Document. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Technical Requirements and Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Uniform Versus Nonuniform vMSC Configurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5
Infrastructure Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Infrastructure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
vSphere Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
vSphere HA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Permanent Device Loss and All Paths Down Scenarios. . . . . . . . . . . . . . . . . . . . . . . . 11
vSphere DRS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Correcting Affinity Rule Violation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
vSphere Storage DRS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Failure Scenarios. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Single-Host Failure in Frimley Data Center. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Single-Host Isolation in Frimley Data Center. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Storage Partition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Data Center Partition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Disk Shelf Failure in Frimley Data Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Full Storage Failure in Frimley Data Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Permanent Device Loss. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Full Compute Failure in Frimley Data Center. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Loss of Frimley Data Center. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
About the Authors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2
VMware vSphere Metro Storage Cluster
Recommended Practices

Purpose and Overview

VMware vSphere Metro Storage Cluster (vMSC) is a specific configuration within the VMware Hardware
Compatibility List (HCL). These configurations are commonly referred to as stretched storage clusters or metro
storage clusters and are implemented in environments where disaster and downtime avoidance is a key
requirement. This best practices document was developed to provide additional insight and information for
operation of a vMSC infrastructure in conjunction with VMware vSphere. This paper explains how vSphere
handles specific failure scenarios, and it discusses various design considerations and operational procedures.
For detailed information about storage implementations, refer to documentation provided by the appropriate
VMware storage partner.

Target Audience
This document is intended for individuals with a technical background who design, deploy, or manage a
vSphere Metro Storage Cluster infrastructure. This includes but is not limited to technical consultants,
infrastructure architects, IT managers, implementation engineers, partner engineers, sales engineers,
and customer staff. This solution brief is not intended to replace or override existing certified designs for
vSphere Metro Storage Cluster solutions; it instead is meant to supplement knowledge and provide
additional information.

Interpreting This Document

The overall structure of this case study is largely self-explanatory. Throughout this document, however, there are
key points of particular importance that are highlighted for the reader. These points are identified with the
following label and example:
NOTE: This is a general point of importance or a further explanation to a particular section.
The authors of this document assume that the reader is familiar with vSphere, VMware vCenter Server,
VMware vSphere High Availability (vSphere HA), VMware vSphere Distributed Resource Scheduler (vSphere DRS),
VMware vSphere Storage DRS, and replication and storage clustering technology and terminology.

Introduction
A VMware vSphere Metro Storage Cluster configuration is a vSphere certified solution that combines replication
with array-based clustering. These solutions are typically deployed in environments where the distance between
data centers is limited, often metropolitan or campus environments.
vMSC infrastructures are implemented with a goal of reaping the same benefits that high-availability clusters
provide to a local site, in a geographically dispersed model with two data centers in different locations.
A vMSC infrastructure is essentially a stretched cluster. The architecture is built on the premise of extending
what is defined as local in terms of network and storage to enable these subsystems to span geographies,
presenting a single and common base infrastructure set of resources to the vSphere cluster at both sites.
It in essence stretches storage and the network between sites.
The primary benefit of a stretched cluster model is that it enables fully active and workload-balanced data
centers to be used to their full potential while gaining the capability to migrate virtual machines (VMs) with
VMware vSphere vMotion, and VMware vSphere Storage vMotion, between sites to enable on-demand and
nonintrusive mobility of workloads. The capability of a stretched cluster to provide this active balancing of
resources should always be the primary design and implementation goal. Although often associated with
disaster recovery, vMSC infrastructures are not recommended as primary solutions for pure disaster recovery.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 3
VMware vSphere Metro Storage Cluster
Recommended Practices

This document does not explain the difference between a disaster recovery and a downtime- or
disaster-avoidance solution. For more details on this distinction, refer to Stretched Clusters and VMware vCenter
Site Recovery Manager: Understanding the Options and Goals, located in the VMware Technical Resource Center
at http://www.vmware.com/resources/techresources/10262.
Stretched cluster solutions offer the following benefits:
Workload mobility
Cross-site automated load balancing
Enhanced downtime avoidance
Disaster avoidance

Technical Requirements and Constraints

Due to the technical constraints of an online migration of VMs, the following specific requirements, which are
listed in the VMware Compatibility Guide, must be met prior to consideration of a stretched cluster
implementation:
Storage connectivity using Fibre Channel, iSCSI, NFS, and FCoE is supported.
The maximum supported network latency between sites for the VMware ESXi management networks is 10ms
round-trip time (RTT).
vSphere vMotion, and vSphere Storage vMotion, supports a maximum of 150ms latency as of vSphere 6.0,
but this is not intended for stretched clustering usage.
The maximum supported latency for synchronous storage replication links is 10ms RTT. Refer to
documentation from the storage vendor because the maximum tolerated latency is lower in most cases.
The most commonly supported maximum RTT is 5ms.
The ESXi vSphere vMotion network has a redundant network link minimum of 250Mbps.
The storage requirements are slightly more complex. A vSphere Metro Storage Cluster requires what is in effect
a single storage subsystem that spans both sites. In this design, a given datastore must be accessiblethat is,
be able to be read and be written tosimultaneously from both sites. Further, when problems occur, the ESXi
hosts must be able to continue to access datastores from either array transparently and with no impact to
ongoing storage operations.
This precludes traditional synchronous replication solutions because they create a primarysecondary
relationship between the active (primary) LUN where data is being accessed and the secondary LUN that is
receiving replication. To access the secondary LUN, replication is stopped, or reversed, and the LUN is made
visible to hosts. This promoted secondary LUN has a completely different LUN ID and is essentially a newly
available copy of a former primary LUN. This type of solution works for traditional disaster recoverytype
configurations because it is expected that VMs must be started up on the secondary site. The vMSC configuration
requires simultaneous, uninterrupted access to enable live migration of running VMs between sites.
The storage subsystem for a vMSC must be able to be read from and write to both locations simultaneously.
All disk writes are committed synchronously at both locations to ensure that data is always consistent regardless
of the location from which it is being read. This storage architecture requires significant bandwidth and very low
latency between the sites in the cluster. Increased distances or latencies cause delays in writing to disk and a
dramatic decline in performance. They also preclude successful vMotion migration between cluster nodes that
reside in different locations.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 4
VMware vSphere Metro Storage Cluster
Recommended Practices

Uniform Versus Nonuniform vMSC Configurations

vMSC solutions are classified into two distinct categories. These categories are based on a fundamental
difference in how hosts access storage. It is important to understand the different types of stretched storage
solutions because this influences design considerations. The following two main categories are as described
on the VMware Hardware Compatibility List:
Uniform host access configuration ESXi hosts from both sites are all connected to a storage node in the
storage cluster across all sites. Paths presented to ESXi hosts are stretched across a distance.
Nonuniform host access configuration ESXi hosts at each site are connected only to storage node(s) at
the same site. Paths presented to ESXi hosts from storage nodes are limited to the local site.
The following in-depth descriptions of both categories clearly define them from architectural and
implementation perspectives.
With uniform host access configuration, hosts in data center A and data center B have access to the storage
systems in both data centers. In effect, the storage area network is stretched between the sites, and all hosts can
access all LUNs. NetApp MetroCluster software is an example of uniform storage. In this configuration, read/write
access to a LUN takes place on one of the two arrays, and a synchronous mirror is maintained in a hidden,
read-only state on the second array. For example, if a LUN containing a datastore is read/write on the array in
data center A, all ESXi hosts access that datastore via the array in data center A. For ESXi hosts in data center A,
this is local access. ESXi hosts in data center B that are running VMs hosted on this datastore send read/write
traffic across the network between data centers. In case of an outage or an operator-controlled shift of control
of the LUN to data center B, all ESXi hosts continue to detect the identical LUN being presented, but it is now
being accessed via the array in data center B.
The ideal situation is one in which VMs access a datastore that is controlled (read/write) by the array in the same
data center. This minimizes traffic between data centers to avoid the performance impact of reads traversing
the interconnect.
The notion of site affinity for a VM is dictated by the read/write copy of the datastore. Site affinity is also
sometimes referred to as site bias or LUN locality. This means that when a VM has site affinity with data
center A, its read/write copy of the datastore is located in data center A. This is explained in more detail in the
vSphere DRS subsection of this paper.

Data Center A Data Center B

ESXi-01 ESXi-02 Stretched Cluster ESXi-03 ESXi-04

Storage A Storage B

LUN 1 LUN 1
(Read/Write) (Read-Only)
Replication
LUN 2 LUN 2
(Read-Only) (Read/Write)

Figure 1. Uniform Configuration

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 5
VMware vSphere Metro Storage Cluster
Recommended Practices

With nonuniform host access configuration, hosts in data center A have access only to the array within the local
data center; the array, as well as its peer array in the opposite data center, is responsible for providing access to
datastores in one data center to ESXi hosts in the opposite data center. EMC VPLEX is an example of a storage
system that can be deployed as a nonuniform storage cluster, although it can also be configured in a uniform
manner. VPLEX provides the concept of a virtual LUN, which enables ESXi hosts in each data center to read
and write to the same datastore or LUN. VPLEX technology maintains the cache state on each array so ESXi
hosts in either data center detect the LUN as local. EMC calls this solution write anywhere. Even when two VMs
reside on the same datastore but are located in different data centers, they write locally without any performance
impact on either VM. A key point with this configuration is that each LUN or datastore has site affinity, also
sometimes referred to as site bias or LUN locality. In other words, if anything happens to the link between
the sites, the storage system on the preferred site for a given datastore will be the only one remaining with
read/write access to it. This prevents any data corruption in case of a failure scenario.

Data Center A Data Center B

ESXi-01 ESXi-02 Stretched Cluster ESXi-03 ESXi-04

Distributed VMware vSphere

VMFS Datastore

Storage Storage
Cluster A Cluster B

Physical Physical
Storage A Storage B

LUN LUN
(Read/Write) Replication (Read/Write)

Figure 2. Nonuniform Configuration

Our examples use uniform storage because these configurations are currently the most commonly deployed.
Many of the design considerations, however, also apply to nonuniform configurations. We point out exceptions
when this is not the case

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 6
VMware vSphere Metro Storage Cluster
Recommended Practices

Infrastructure Architecture
In this section, we describe the basic architecture referenced in this document. We also discuss basic
configuration and performance of the various vSphere features. For an in-depth explanation of each feature,
refer to the vSphere 6.0 Availability Guide and the vSphere 6.0 Resource Management Guide. We make specific
recommendations based on VMware best practices and provide operational guidance where applicable. It is
explained in our failure scenarios how these best practices prevent or limit downtime.

Infrastructure
The described infrastructure consists of a single vSphere 6.0 cluster with four ESXi 6.0 hosts. These hosts are
managed by a VMware vCenter Server Appliance 6.0 instance. The first site is called Frimley; the second site is
called Bluefin. The network between Frimley data center and Bluefin data center is a stretched layer 2 network.
There is a minimal distance between the sites, as is typical in campus cluster scenarios.
Each site has two ESXi hosts, and the vCenter Server instance is configured with vSphere DRS affinity to the
hosts in Bluefin data center. In a stretched cluster environment, only a single vCenter Server instance is used.
This is different from a traditional VMware Site Recovery Manager configuration in which a dual vCenter Server
configuration is required. The configuration of VM-to-host affinity rules is discussed in more detail in the
vSphere DRS subsection of this document.
Eight LUNs are depicted in Figure 3. Four of these are accessed through the virtual IP address active on the
iSCSI storage system in the Frimley data center; four are accessed through the virtual IP address active on the
iSCSI storage system in the Bluefin data center.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B

Bluefin01 Frimley01 Frimley01 Bluefin01

Read-Only Read/Write Read-Only Read/Write

Bluefin02 Frimley02 Frimley02 Bluefin02

Read-Only Read/Write Read-Only Read/Write

Bluefin03 Frimley03 Frimley03 Bluefin03

Read-Only Read/Write Read-Only Read/Write

Bluefin04 Frimley04 Frimley04 Bluefin04

Read-Only Read/Write Read-Only Read/Write

Replication

Figure 3. Test Environment

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 7
VMware vSphere Metro Storage Cluster
Recommended Practices

LO C AT I O N HOSTS DATA S TO R E S LO C A L I S O L AT I O N
ADDRESS

Bluefin 172.16.103.184 Bluefin01 172.16.103.10

172.16.103.185 Bluefin02

Bluefin03

Bluefin04

Frimley 172.16.103.182 Frimley01 172.16.103.11

172.16.103.183 Frimley02

Frimley03

Frimley04

Table 1. Infrastructure Details

The vSphere cluster is connected to a stretched storage system in a fabric configuration with a uniform device
access model. This means that every host in the cluster is connected to both storage heads. Each of the heads is
connected to two switches, which are connected to two similar switches in the secondary location. For any given
LUN, one of the two storage heads presents the LUN as read/write via iSCSI. The other storage head maintains
the replicated, read-only copy that is effectively hidden from the ESXi hosts.

vSphere Configuration
Our focus in this document is on vSphere HA, vSphere DRS, and vSphere Storage DRS in relation to stretched
cluster environments. Design and operational considerations regarding vSphere are commonly overlooked and
underestimated. Much emphasis has traditionally been placed on the storage layer, but little attention has been
applied to how workloads are provisioned and managed.
One of the key drivers for using a stretched cluster is workload balance and disaster avoidance. How do we ensure
that our environment is properly balanced without impacting availability or severely increasing the operational
expenditure? How do we build the requirements into our provisioning process and validate periodically that we
still meet them? Ignoring the requirements makes the environment confusing to administrate and less
predictable during the various failure scenarios for which it should be of help.
Each of these three vSphere features has very specific configuration requirements and can enhance environment
resiliency and workload availability. Architectural recommendations based on our findings during the testing of
the various failure scenarios are given throughout this section.
vSphere HA
The environment has four hosts and a uniform stretched storage solution. A full site failure is one scenario that
must be taken into account in a resilient architecture. VMware recommends enabling vSphere HA admission
control. Workload availability is the primary driver for most stretched cluster environments, so providing
sufficient capacity for a full site failure is recommended. Such hosts are equally divided across both sites.
To ensure that all workloads can be restarted by vSphere HA on just one site, configuring the admission control
policy to 50 percent for both memory and CPU is recommended.
VMware recommends using a percentage-based policy because it offers the most flexibility and reduces
operational overhead. Even when new hosts are introduced to the environment, there is no need to change the
percentage and no risk of a skewed consolidation ratio due to possible use of VM-level reservations. For more
details about admission control policies and the associated algorithms, refer to the vSphere 6.0 Availability Guide.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 8
VMware vSphere Metro Storage Cluster
Recommended Practices

Figure 4 shows a vSphere HA cluster configured with admission control enabled and with the percentage-based
policy set to 50 percent.

Figure 4. vSphere HA Configuration

vSphere HA uses heartbeat mechanisms to validate the state of a host. There are two such mechanisms:
network heartbeating and datastore heartbeating. Network heartbeating is the primary mechanism for
vSphere HA to validate availability of the hosts. Datastore heartbeating is the secondary mechanism used by
vSphere HA; it determines the exact state of the host after network heartbeating has failed.
If a host is not receiving any heartbeats, it uses a fail-safe mechanism to detect if it is merely isolated from its
master node or completely isolated from the network. It does this by pinging the default gateway. In addition to
this mechanism, one or more isolation addresses can be specified manually to enhance reliability of isolation
validation. VMware recommends specifying a minimum of two additional isolation addresses, with each address
site local.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 9
VMware vSphere Metro Storage Cluster
Recommended Practices

In our scenario, one of these addresses physically resides in the Frimley data center; the other physically resides
in the Bluefin data center. This enables vSphere HA validation for complete network isolation, even in case of a
connection failure between sites. Figure 5 shows an example of how to configure multiple isolation addresses.
The vSphere HA advanced setting used is das.isolationaddress. More details on how to configure this can be
found in VMware Knowledge Base article 1002117.
The minimum number of heartbeat datastores is two and the maximum is five. For vSphere HA datastore
heartbeating to function correctly in any type of failure scenario, VMware recommends increasing the number of
heartbeat datastores from two to four in a stretched cluster environment. This provides full redundancy for both
data center locations. Defining four specific datastores as preferred heartbeat datastores is also recommended,
selecting two from one site and two from the other. This enables vSphere HA to heartbeat to a datastore even in
the case of a connection failure between sites. Subsequently, it enables vSphere HA to determine the state of a
host in any scenario.
Adding an advanced setting called das.heartbeatDsPerHost can increase the number of heartbeat datastores.
This is shown in Figure 5.

Figure 5. vSphere HA Advanced Settings

To designate specific datastores as heartbeat devices, VMware recommends using Select any of the cluster
datastores taking into account my preferences. This enables vSphere HA to select any other datastore if the
four designated datastores that have been manually selected become unavailable. VMware recommends
selecting two datastores in each location to ensure that datastores are available at each site in the case of a
site partition.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 0
VMware vSphere Metro Storage Cluster
Recommended Practices

Figure 6. Datastore Heartbeating

Permanent Device Loss and All Paths Down Scenarios

As of vSphere 6.0, enhancements have been introduced to enable an automated failover of VMs residing on a
datastore that has either an all paths down (APD) or a permanent device loss (PDL) condition. PDL is applicable
only to block storage devices.
A PDL condition, as is discussed in one of our failure scenarios, is a condition that is communicated by the array
controller to the ESXi host via a SCSI sense code. This condition indicates that a device (LUN) has become
unavailable and is likely permanently unavailable. An example scenario in which this condition is communicated
by the array is when a LUN is set offline. This condition is used in nonuniform models during a failure scenario to
ensure that the ESXi host takes appropriate action when access to a LUN is revoked. When a full storage failure
occurs, it is impossible to generate the PDL condition because there is no communication possible between the
array and the ESXi host. This state is identified by the ESXi host as an APD condition. Another example of an
APD condition is where the storage network has failed completely. In this scenario, the ESXi host also does not
detect what has happened with the storage and declares an APD.
To enable vSphere HA to respond to both an APD and a PDL condition, vSphere HA must be configured in a
specific way. VMware recommends enabling VM Component Protection (VMCP). After the creation of the
cluster, VMCP must be enabled, as is shown in Figure 7.

Figure 7. VM Component Protection

The configuration screen can be found as follows:

Log in to VMware vSphere Web Client.
Click Hosts and Clusters.
Click the cluster object.
Click the Manage tab.
Click vSphere HA and then Edit.
Select Protect against Storage Connectivity Loss.
Select individual functionality, as described in the following, by opening Failure conditions and VM response.
The configuration for PDL is basic. In the Failure conditions and VM response section, the response following
detection of a PDL condition can be configured. VMware recommends setting this to Power off and restart VMs.
When this condition is detected, a VM is restarted instantly on a healthy host within the vSphere HA cluster.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 1
VMware vSphere Metro Storage Cluster
Recommended Practices

For an APD scenario, configuration must occur in the same section, as is shown in Figure 8. Besides defining the
response to an APD condition, it is also possible to alter the timing and to configure the behavior when the
failure is restored before the APD timeout has passed.

Figure 8. VMCP Detailed Configuration

When an APD condition is detected, a timer is started. After 140 seconds, the APD condition is officially declared
and the device is marked as APD timeout. When 140 seconds have passed, vSphere HA starts counting. The
default vSphere HA timeout is 3 minutes. When the 3 minutes have passed, vSphere HA restarts the impacted
VMs, but VMCP can be configured to respond differently if preferred. VMware recommends configuring it to
Power off and restart VMs (conservative).
Conservative refers to the likelihood that vSphere HA will be able to restart VMs. When set to conservative,
vSphere HA restarts only the VM that is impacted by the APD if it detects that a host in the cluster can access
the datastore on which the VM resides. In the case of aggressive, vSphere HA attempts to restart the VM even if
it doesnt detect the state of the other hosts. This can lead to a situation in which a VM is not restarted because
there is no host that has access to the datastore on which the VM is located.
If the APD is lifted and access to the storage is restored before the timeout has passed, vSphere HA does not
unnecessarily restart the VM unless explicitly configured to do so. If a response is chosen even when the
environment has recovered from the APD condition, Response for APD recovery after APD timeout can be
configured to Reset VMs. VMware recommends leaving this setting disabled.
Starting with vSphere 5.5, an advanced setting called Disk.AutoremoveOnPDL was introduced, and is
implemented by default. This functionality enables vSphere to remove devices marked as PDL and helps
prevent reaching, for example, the 256-device limit for an ESXi host. However, if the PDL scenario is resolved
and the device returns, the ESXi hosts storage system must be rescanned before the device will appear.
VMware recommends disabling Disk.AutoremoveOnPDL for vSphere 5.5 hosts by setting the host advanced
settings value to 0. For vSphere 6.0 hosts, this advanced setting is no longer required to be changed from
the default configuration to properly recover the devices marked as PDL.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 2
VMware vSphere Metro Storage Cluster
Recommended Practices

vSphere DRS
vSphere DRS is used in many environments to distribute load within a cluster. It offers many other features that
can be very helpful in stretched cluster environments. VMware recommends enabling vSphere DRS to facilitate
load balancing across hosts in the cluster. The vSphere DRS load-balancing calculation is based on CPU and
memory use. Care should be taken with regard to both storage and networking resources as well as to traffic
flow. To avoid storage and network traffic overhead in a stretched cluster environment, VMware recommends
implementing vSphere DRS affinity rules to enable a logical separation of VMs. This subsequently helps improve
availability. For VMs that are responsible for infrastructure services, such as Microsoft Active Directory and DNS,
it assists by ensuring separation of these services across sites.
vSphere DRS affinity rules also help prevent unnecessary downtime, and storage and network traffic flow
overhead, by enforcing preferred site affinity. VMware recommends aligning vSphere VM-to-host affinity rules
with the storage configurationthat is, setting VM-to-host affinity rules with a preference that a VM run on a
host at the same site as the array that is configured as the primary read/write node for a given datastore.
For example, in our test configuration, VMs stored on the Frimley01 datastore are set with VM-to-host affinity
with a preference for hosts in the Frimley data center. This ensures that in the case of a network connection
failure between sites, VMs do not lose connection with the storage system that is primary for their datastore.
VM-to-host affinity rules aim to ensure that VMs stay local to the storage primary for that datastore. This
coincidentally also results in all read I/Os staying local.
NOTE: Different storage vendors use different terminology to describe the relationship of a LUN to a particular
array or controller. For the purposes of this document, we use the generic term storage site affinity, which
refers to the preferred location for access to a given LUN.
VMware recommends implementing should rules because these are violated by vSphere HA in the case of a
full site failure. Availability of services should always prevail. In the case of must rules, vSphere HA does not
violate the rule set, and this can potentially lead to service outages. In the scenario where a full data center fails,
must rules do not allow vSphere HA to restart the VMs, because they do not have the required affinity to start
on the hosts in the other data center. This necessitates the recommendation to implement should rules.
vSphere DRS communicates these rules to vSphere HA, and these are stored in a compatibility list governing
allowed start-up. If a single host fails, VM-to-host should rules are ignored by default. VMware recommends
configuring vSphere HA rule settings to respect VM-to-host affinity rules where possible. With a full site failure,
vSphere HA can restart the VMs on hosts that violate the rules. Availability takes preference in this scenario.

Figure 9. vSphere HA Affinity Rule Settings

Under certain circumstances, such as massive host saturation coupled with aggressive recommendation
settings, vSphere DRS can also violate should rules. Although this is very rare, we recommend monitoring for
violation of these rules because a violation might impact availability and workload performance.
VMware recommends manually defining sites by creating a group of hosts that belong to a site and then
adding VMs to these sites based on the affinity of the datastore on which they are provisioned. In our scenario,
only a limited number of VMs were provisioned. VMware recommends automating the process of defining site
affinity by using tools such as VMware vCenter Orchestrator or VMware vSphere PowerCLI. If automating the
process is not an option, use of a generic naming convention is recommended to simplify the creation of these
groups. VMware recommends that these groups be validated on a regular basis to ensure that all VMs belong to
the group with the correct site affinity.
Figures 11 through 14 depict the configuration used for this white paper. In the first screenshot, all VMs that
should remain local to the Bluefin data center are added to the Bluefin VM group.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 3
VMware vSphere Metro Storage Cluster
Recommended Practices

Figure 10. VM Group

Next, a Bluefin host group is created that contains all hosts residing in this location.

Figure 11. Host Group

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 4
VMware vSphere Metro Storage Cluster
Recommended Practices

Next, a new rule is created that is defined as a should run on rule. It links the host group and the VM group for
the Bluefin location.

Figure 12. Rule Definition

This should be done for both locations, which should result in two rules.

Figure 13. VM/Host Rules

Correcting Affinity Rule Violation

vSphere DRS assigns a high priority to correcting affinity rule violations. During invocation, the primary goal of
vSphere DRS is to correct any violations and generate recommendations to migrate VMs to the hosts listed in
the host group. These migrations have a higher priority than load-balancing moves and are started before them.
vSphere DRS is invoked every 5 minutes by default, but it is also triggered if the cluster detects changes.
For instance, when a host reconnects to the cluster, vSphere DRS is invoked and generates recommendations
to correct the violation. Our testing has shown that vSphere DRS generates recommendations to correct affinity
rules violations within 30 seconds after a host reconnects to the cluster. vSphere DRS is limited by the overall
capacity of the vSphere vMotion network, so it might take multiple invocations before all affinity rule violations
are corrected.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 5
VMware vSphere Metro Storage Cluster
Recommended Practices

vSphere Storage DRS

vSphere Storage DRS enables aggregation of datastores to a single unit of consumption from an administrative
perspective, and it balances VM disks when defined thresholds are exceeded. It ensures that sufficient disk
resources are available to a workload. VMware recommends enabling vSphere Storage DRS with I/O Metric
disabled. The use of I/O Metric or VMware vSphere Storage I/O Control is not supported in a vMSC configuration,
as is described in VMware Knowledge Base article 2042596.

Figure 14. Storage DRS Configuration

vSphere Storage DRS uses vSphere Storage vMotion to migrate VM disks between datastores within a datastore
cluster. Because the underlying stretched storage systems use synchronous replication, a migration or series of
migrations have an impact on replication traffic and might cause the VMs to become temporarily unavailable
due to contention for network resources during the movement of disks. Migration to random datastores can also
potentially lead to additional I/O latency in uniform host access configurations if VMs are not migrated along
with their virtual disks. For example, if a VM residing on a host at site A has its disk migrated to a datastore at site
B, it continues operating but with potentially degraded performance. The VMs disk reads now are subject to the
increased latency associated with reading from the virtual iSCSI IP at site B. Reads are subject to intersite latency
rather than being satisfied by a local target.
To control if and when migrations occur, VMware recommends configuring vSphere Storage DRS in manual
mode. This enables human validation per recommendation as well as recommendations to be applied during
off-peak hours, while gaining the operational benefit and efficiency of the initial placement functionality.
VMware recommends creating datastore clusters based on the storage configuration with respect to storage
site affinity. Datastores with a site affinity for site A should not be mixed in datastore clusters with datastores
with a site affinity for site B. This enables operational consistency and eases the creation and ongoing
management of vSphere DRS VM-to-host affinity rules. Ensure that all vSphere DRS VM-to-host affinity rules
are updated accordingly when VMs are migrated via vSphere Storage vMotion between datastore clusters and
when crossing defined storage site affinity boundaries. To simplify the provisioning process, VMware
recommends aligning naming conventions for datastore clusters and VM-to-host affinity rules.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 6
VMware vSphere Metro Storage Cluster
Recommended Practices

Figure 15. Datastore Clusters

The naming convention used in our testing gives both datastores and datastore clusters a site-specific name to
provide ease of alignment of vSphere DRS host affinity with VM deployment in the correlate site.

Failure Scenarios
There are many failures that can be introduced in clustered systems. But in a properly architected environment,
vSphere HA, vSphere DRS, and the storage subsystem do not detect many of these. We do not address the
zero-impact failures, such as the failure of a single network cable, because they are explained in depth in the
documentation provided by the storage vendor of the various solutions. We discuss the following common
failure scenarios:
Single-host failure in Frimley data center
Single-host isolation in Frimley data center
Storage partition
Data center partition
Disk shelf failure in Frimley data center
Full storage failure in Frimley data center
Full compute failure in Frimley data center
Full compute failure in Frimley data center and full storage failure in Bluefin data center
Loss of complete Frimley data center
We also examine scenarios in which specific settings are incorrectly configured. These settings determine the
availability and recoverability of VMs in a failure scenario. It is important to understand the impact of
misconfigurations such as the following:
Incorrectly configured VM-to-host affinity rules
Incorrectly configured heartbeat datastores
Incorrectly configured isolation address
Incorrectly configured PDL handling
vCenter Server split-brain scenario

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 7
VMware vSphere Metro Storage Cluster
Recommended Practices

Single-Host Failure in Frimley Data Center

In this scenario, we describe the complete failure of a host in Frimley data center. This scenario is depicted in
Figure 17.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B

Bluefin01 Frimley01 Frimley01 Bluefin01

Read-Only Read/Write Read-Only Read/Write

Bluefin02 Frimley02 Frimley02 Bluefin02

Read-Only Read/Write Read-Only Read/Write

Bluefin03 Frimley03 Frimley03 Bluefin03

Read-Only Read/Write Read-Only Read/Write

Bluefin04 Frimley04 Frimley04 Bluefin04

Read-Only Read/Write Read-Only Read/Write

Replication

Figure 16. Single-Host Failure Scenario

Result
vSphere HA successfully restarted all VMs in accordance with VM-to-host affinity rules.
Explanation
If a host fails, the clusters vSphere HA master node detects the failure because it no longer is receiving network
heartbeats from the host. Then the master starts monitoring for datastore heartbeats. Because the host has
failed completely, it cannot generate datastore heartbeats; these too are detected as missing by the vSphere HA
master node. During this time, a third availability checkpinging the management addresses of the failed
hostsis conducted. If all of these checks return as unsuccessful, the master declares the missing host as dead
and attempts to restart all the protected VMs that had been running on the host before the master lost contact
with the host.
The vSphere VM-to-host affinity rules defined on a cluster level are should rules. vSphere HA VM-to-host
affinity rules should be respected so all VMs are restarted within the correct site.
However, if the host elements of the VM-to-host group are temporarily without resources, or if they are
unavailable for restarts for any other reason, vSphere HA can disregard the rules and restart the remaining VMs
on any of the remaining hosts in the cluster, regardless of location and rules. If this occurs, vSphere DRS attempts
to correct any violated affinity rules at the first invocation and automatically migrates VMs in accordance with
their affinity rules to bring VM placement in alignment. VMware recommends manually invoking vSphere DRS
after the cause for the failure has been identified and resolved. This ensures that all VMs are placed on hosts in
the correct location to avoid possible performance degradation due to misplacement.
T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 8
VMware vSphere Metro Storage Cluster
Recommended Practices

Single-Host Isolation in Frimley Data Center

In this scenario, we describe the response to isolation of a single host in Frimley data center from the rest of
the network.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B
Figure 17. Single-Host Isolation Scenario

Result
VMs remain running because isolation response is configured to leave powered on.
Explanation
When a host is isolated, the vSphere HA master node detects the isolation because it no longer is receiving
network heartbeats from the host. Then the master starts monitoring for datastore heartbeats. Because the host
is isolated, it generates datastore heartbeats for the secondary vSphere HA detection mechanism. Detection of
valid host heartbeats enables the vSphere HA master node to determine that the host is running but is isolated
from the network. Depending on the isolation response configured, the impacted host can power off or shut
down VMs or can leave them powered on. The isolation response is triggered 30 seconds after the host has
detected that it is isolated.
VMware recommends aligning the isolation response to business requirements and physical constraints.
From a best practices perspective, leave powered on is the recommended isolation response setting for the
majority of environments. Isolated hosts are rare in a properly architected environment, given the built-in
redundancy of most modern designs. In environments that use network-based storage protocols, such as iSCSI
and NFS, and where networks are converged, the recommended isolation response is power off. In these
environments, it is more likely that a network outage that causes a host to become isolated also affects the
hosts ability to communicate to the datastores.
If an isolation response different from the recommended leave powered on is selected and a power off or
shut down response is triggered, the vSphere HA master restarts VMs on the remaining nodes in the cluster.
The vSphere VM-to-host affinity rules defined on a cluster level are should rules. However, because the
vSphere HA rule settings specify that the vSphere HA VM-to-host affinity rules should be respected, all VMs
are restarted within the correct site under normal circumstances.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 1 9
VMware vSphere Metro Storage Cluster
Recommended Practices

Storage Partition
In this scenario, a failure has occurred on the storage network between data centers, as is depicted in Figure 19.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B

Bluefin01 Frimley01 Frimley01 Bluefin01

Read-Only Read/Write Read-Only Read/Write

Bluefin02 Frimley02 Frimley02 Bluefin02

Read-Only Read/Write Read-Only Read/Write

Bluefin03 Frimley03 Frimley03 Bluefin03

Read-Only Read/Write Read-Only Read/Write

Bluefin04 Frimley04 Frimley04 Bluefin04

Read-Only Read/Write Read-Only Read/Write

Figure 18. Storage Partition Scenario

Result
VMs remain running with no impact.
Explanation
Storage site affinity is defined for each LUN, and vSphere DRS rules align with this affinity. Therefore, because
storage remains available within the site, no VM is impacted.
If for any reason the affinity rule for a VM has been violated and the VM is running on a host in Frimley data
center while its disk resides on a datastore that has affinity with Bluefin data center, it cannot successfully issue
I/O following an intersite storage partition. This is because the datastore is in an APD condition. In this scenario,
the VM can be restarted because vSphere HA is configured to respond to APD conditions. The response occurs
after the 3-minute grace period has passed. This 3-minute period starts after the APD timeout of 140 seconds
has passed and the APD condition has been declared.
To avoid unnecessary downtime in an APD scenario, VMware recommends monitoring compliance of vSphere
DRS rules. Although vSphere DRS is invoked every 5 minutes, this does not guarantee resolution of all affinity
rule violations. Therefore, to prevent unnecessary downtime, rigid monitoring is recommended that enables
quick identification of anomalies such as a VMs computes residing in one site while its storage resides in the
other site.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 0
VMware vSphere Metro Storage Cluster
Recommended Practices

Data Center Partition

In this scenario, the Frimley data center is isolated from the Bluefin data center, as is depicted in Figure 20.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B

Bluefin01 Frimley01 Frimley01 Bluefin01

Read-Only Read/Write Read-Only Read/Write

Bluefin02 Frimley02 Frimley02 Bluefin02

Read-Only Read/Write Read-Only Read/Write

Bluefin03 Frimley03 Frimley03 Bluefin03

Read-Only Read/Write Read-Only Read/Write

Bluefin04 Frimley04 Frimley04 Bluefin04

Read-Only Read/Write Read-Only Read/Write

Figure 19. Data Center Partition Scenario

Result
VMs remain running with no impact.
Explanation
In this scenario, the two data centers are fully isolated from each other. This scenario is similar to both the
storage partition and the host isolation scenario. VMs are not impacted by this failure because vSphere DRS
rules were correctly implemented and no rules were violated.
vSphere HA follows this logical process to determine which VMs require restarting during a cluster partition:
The vSphere HA master node running in Frimley data center detects that all hosts in Bluefin data center are
unreachable. It first detects that no network heartbeats are being received. It then determines whether any
storage heartbeats are being generated. This check does not detect storage heartbeats because the storage
connection between sites also has failed, and the heartbeat datastores are updated only locally. Because the
VMs with affinity to the remaining hosts are still running, no action is needed for them. Next, vSphere HA
determines whether a restart can be attempted. However, the read/write version of the datastores located in
Bluefin data center are not accessible by the hosts in Frimley data center. Therefore, no attempt is made to start
the missing VMs.
Similarly, the ESXi hosts in Bluefin data center detect that there is no master available, and they initiate a master
election process. After the master has been elected, it tries to determine which VMs had been running before
the failure and it attempts to restart them. Because all VMs with affinity to Bluefin data center are still running
there, there is no need for a restart. Only the VMs with affinity to Frimley data center are unavailable, and
vSphere HA cannot restart them because the datastores on which they are stored have affinity with Frimley data
center and are unavailable in Bluefin data center.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 1
VMware vSphere Metro Storage Cluster
Recommended Practices

If VM-to-host affinity rules have been violatedthat is, VMs have been running at a location where their storage
is not defined as read/write by defaultthe behavior changes. The following sequence describes what would
happen in that case:
1. The VM with affinity to Frimley data center but residing in Bluefin data center is unable to reach its
datastore. This results in the VMs being unable to write to or read from disk.
2. In Frimley data center, this VM is restarted by vSphere HA because the hosts in Frimley data center do not
detect the instances running in Bluefin data center.
3. Because the datastore is available only to Frimley data center, one of the hosts in Frimley data center
acquires a lock on the VMDK and is able to power on this VM.
4. This can result in a scenario in which the same VM is powered on and running in both data centers.

Frimley Bluefin

VM-1 VM-1

2 Power-On Operation

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch 1 I/O Failure

Switch Switch

Storage A Storage B

Frimley01 Frimley01
Read/Write Read-Only

VM-1-VMDK
3 Acquire Lock

Figure 20. Ghost VM

5. If the APD response is configured to Power off and restart VMs (aggressive), the VM is powered off after
the APD timeout and the grace period have passed. This behavior is new in vSphere 6.0.
If the APD response is not correctly configured, two VMs will be running, for the following possible reasons:
The network heartbeat from the host that is running this VM is missing because there is no connection to
that site.
The datastore heartbeat is missing because there is no connection to that site.
A ping to the management address of the host that is running the VM fails because there is no connection to
that site.
The master located in Frimley data center detects that the VM had been powered on before the failure.
Because it is unable to communicate with the VMs host in Bluefin data center after the failure, it attempts to
restart the VM because it cannot detect the actual state.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 2
VMware vSphere Metro Storage Cluster
Recommended Practices

If the connection between sites is restored, a classic VM split-brain scenario will exist. For a short period of
time, two copies of the VM will be active on the network, with both having the same MAC address. Only one
copy, however, will have access to the VM files, and vSphere HA will detect this. As soon as this is detected, all
processes belonging to the VM copy that has no access to the VM files will be killed, as is depicted in Figure 22.

Figure 21. Tasks and Events

In this example, the downtime equates to a VMs having to be restarted. Proper maintenance of site affinity can
prevent this. To avoid unnecessary downtime, VMware recommends close monitoring to ensure that vSphere DRS
rules align with datastore site affinity.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 3
VMware vSphere Metro Storage Cluster
Recommended Practices

Disk Shelf Failure in Frimley Data Center

In this scenario, one of the disk shelves in Frimley data center has failed. Both Frimley01 and Frimley02 on
storage A are impacted.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B

Bluefin01 Frimley01 Frimley01 Bluefin01

Read-Only Read/Write Read/Write Read/Write

Bluefin02 Frimley02 Frimley02 Bluefin02

Read-Only Read/Write Read/Write Read/Write

Bluefin03 Frimley03 Frimley03 Bluefin03

Read-Only Read/Write Read-Only Read/Write

Bluefin04 Frimley04 Frimley04 Bluefin04

Read-Only Read/Write Read-Only Read/Write

Replication

Figure 22. Disk Shelf Failure Scenario

Result
VMs remain running with no impact.
Explanation
In this scenario, only a disk shelf in Frimley data center has failed. The storage processor has detected the failure
and has instantly switched from the primary disk shelf in Frimley data center to the mirror copy in Bluefin data
center. There is no noticeable impact to any of the VMs except for a typical short spike in I/O response time. The
storage solution fully detects and handles this scenario. There is no need for a rescan of the datastores or the
HBAs because the switchover is seamless and the LUNs are identical from the ESXi perspective.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 4
VMware vSphere Metro Storage Cluster
Recommended Practices

Full Storage Failure in Frimley Data Center

In this scenario, a full storage system failure has occurred in Frimley data center.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B

Bluefin01 Frimley01 Frimley01 Bluefin01

Read-Only Read/Write Read-Only Read/Write

Bluefin02 Frimley02 Frimley02 Bluefin02

Read-Only Read/Write Read-Only Read/Write

Bluefin03 Frimley03 Frimley03 Bluefin03

Read-Only Read/Write Read-Only Read/Write

Bluefin04 Frimley04 Frimley04 Bluefin04

Read-Only Read/Write Read-Only Read/Write

Figure 23. Full Storage Failure Scenario

Result
VMs remain running with no impact.
Explanation
When the full storage system fails in Frimley data center, a take over command must be initiated manually.
As described previously, we used a NetApp MetroCluster configuration to describe this behavior. This take over
command is particular to NetApp environments; depending on the implemented storage system, the required
procedure can differ. After the command has been initiated, the mirrored, read-only copy of each of the failed
datastores is set to read/write and is instantly accessible. We have described this process on an extremely high
level. For more details, refer to the storage vendors documentation.
From the VM perspective, this failover is seamless: The storage controllers handle this, and no action is required
from either the vSphere or storage administrator. All I/O now passes across the intrasite connection to the other
data center because VMs remain running in Frimley data center while their datastores are accessible only in
Bluefin data center.
vSphere HA does not detect this type of failure. Although the datastore heartbeat might be lost briefly,
vSphere HA does not take action because the vSphere HA master agent checks for the datastore heartbeat only
when the network heartbeat is not received for 3 seconds. Because the network heartbeat remains available
throughout the storage failure, vSphere HA is not required to initiate any restarts.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 5
VMware vSphere Metro Storage Cluster
Recommended Practices

Permanent Device Loss

In the scenario shown in Figure 25, a permanent device loss (PDL) condition occurs because datastore Frimley01
has been taken offline for ESXi-01 and ESXi-02. PDL scenarios are uncommon in uniform configurations and are
more likely to occur in a nonuniform vMSC configuration. However, a PDL scenario can, for instance, occur when
the configuration of a storage group changes as in the case of this described scenario.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B

Bluefin01 Frimley01 Frimley01 Bluefin01

Read-Only Offline Read-Only Read/Write

Bluefin02 Frimley02 Frimley02 Bluefin02

Read-Only Read/Write Read-Only Read/Write

Bluefin03 Frimley03 Frimley03 Bluefin03

Read-Only Read/Write Read-Only Read/Write

Bluefin04 Frimley04 Frimley04 Bluefin04

Read-Only Read/Write Read-Only Read/Write

Replication

Figure 24. Permanent Device Loss Scenario

Result
VMs are restarted by vSphere HA on ESXi-03 and ESXi-04.
Explanation
When the PDL condition occurs, VMs running on datastore Frimley01 on hosts ESXi-01 and ESXi-02 are killed
instantly. They then are restarted by vSphere HA on hosts within the cluster that have access to the datastore,
ESXi-03 and ESXi-04 in this scenario. The PDL and killing of the VM world group leader can be witnessed by
following the entries in the vmkernel.log file located in /var/log/ on the ESXi hosts. The following is an outtake
of the vmkernel.log file where a PDL is recognized and appropriate action is taken.
2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: handle
8198(vscsi4:0):opened by wid 4499 (vmm0:fri-iscsi-02) has Permanent
Device Loss. Killing world group leader 4491

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 6
VMware vSphere Metro Storage Cluster
Recommended Practices

VMware recommends configuring Response for Datastore with Permanent Device Loss (PDL) to Power off
and restart VMs. This setting ensures that appropriate action is taken when a PDL condition exists. The correct
configuration is shown in Figure 26.

Figure 25. APD/PDL Configuration

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 7
VMware vSphere Metro Storage Cluster
Recommended Practices

Full Compute Failure in Frimley Data Center

In this scenario, a full compute failure has occurred in Frimley data center.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B
Figure 26. Full Compute Failure Scenario

Result
All VMs are successfully restarted in Bluefin data center.
Explanation
The vSphere HA master was located in Frimley data center at the time of the full compute failure at that location.
After the hosts in Bluefin data center detected that no network heartbeats had been received, an election
process was started. Within approximately 20 seconds, a new vSphere HA master was elected from the
remaining hosts. Then the new master determined which hosts had failed and which VMs had been impacted by
this failure. Because all hosts at the other site had failed and all VMs residing on them had been impacted,
vSphere HA initiated the restart of all of these VMs. vSphere HA can initiate 32 concurrent restarts on a single
host, providing a low restart latency for most environments. The only sequencing of start order comes from the
broad high, medium, and low categories for vSphere HA. This policy must be set on a per-VM basis. These
policies were determined to have been adhered to; high-priority VMs started first, followed by medium-priority
and low-priority VMs.
As part of the test, the hosts at the Frimley data center were again powered on. As soon as vSphere DRS
detected that these hosts were available, a vSphere DRS run was invoked. Because the initial vSphere DRS run
corrects only the vSphere DRS affinity rule violations, resource imbalance was not correct until the next full
invocation of vSphere DRS. vSphere DRS is invoked by default every 5 minutes or when VMs are powered off or
on through the use of vSphere Web Client.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 8
VMware vSphere Metro Storage Cluster
Recommended Practices

Loss of Frimley Data Center

In this scenario, a full failure of Frimley data center is simulated.

Frimley Bluefin
Switch Switch
Management
Network
Switch Switch

ESXi-01 ESXi-02 ESXi-03 ESXi-04

172.16.103.182 172.16.103.183 Stretched Cluster 172.16.103.184 172.16.103.185

Switch Switch

Storage
Network Switch Switch

Storage A Storage B

Bluefin01 Frimley01 Frimley01 Bluefin01

Read-Only Read/Write Read-Only Read/Write

Bluefin02 Frimley02 Frimley02 Bluefin02

Read-Only Read/Write Read-Only Read/Write

Bluefin03 Frimley03 Frimley03 Bluefin03

Read-Only Read/Write Read-Only Read/Write

Bluefin04 Frimley04 Frimley04 Bluefin04

Read-Only Read/Write Read-Only Read/Write

Figure 27. Full Data Center Failure Scenario

Result
All VMs were successfully restarted in Bluefin data center.
Explanation
In this scenario, the hosts in Bluefin data center lost contact with the vSphere HA master and elected a new
vSphere HA master. Because the storage system had failed, a take over command had to be initiated on the
surviving site, again due to the NetApp-specific process. After the take over command had been initiated, the
new vSphere HA master accessed the per-datastore files that vSphere HA uses to record the set of protected
VMs. The vSphere HA master then attempted to restart the VMs that were not running on the surviving hosts in
Bluefin data center. In our scenario, all VMs were restarted within 2 minutes after failure and were fully accessible
and functional again.
NOTE: By default, vSphere HA stops attempting to start a VM after 30 minutes. If the storage team does not
issue a takeover command within that time frame, the vSphere administrator must manually start up VMs after
the storage becomes available.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 2 9
VMware vSphere Metro Storage Cluster
Recommended Practices

Summary
When properly operated and architected, stretched clusters are an excellent solution to increase resiliency and
offer intersite workload mobility. There has always been, however, confusion regarding failure scenarios and the
various types of responses from both the vSphere layer and the storage layer. In this white paper, we have tried
to explain how vSphere HA and vSphere DRS respond to certain failures in a stretched cluster environment and
to offer recommendations for configuration of a vSphere cluster in this type of environment. This paper
highlights the importance of site affinity, the role played by vSphere DRS rules and groups, how vSphere HA
interacts with those rules and groups, and how users must ensure that the logic enforced by those rules and
groups is maintained over time to provide the reliability and predictability of the cluster.

About the Authors

Duncan Epping is a chief technologist in the office of the CTO at VMware and is focused on cloud infrastructure
management architecture. He was among the first VMware certified design experts (VCDX 007). He is the
co-author of several books, including the VMware vSphere Clustering Deepdive series and Essential Virtual SAN.
He is the owner and main author of leading virtualization blog yellow-bricks.com.
Follow Duncans blogs at http://www.yellow-bricks.com.
Follow Duncan on Twitter: @DuncanYB.
Lee Dilworth is a principal systems engineer focused on availability and disaster recovery solutions. He has been
with VMware since 2005.
Follow Lee on Twitter: @LeeDilworth.

Acknowledgments
The authors would like to thank Cormac Hogan, Mustafa Uysal, Manoj Krishnan, Matthew Meyer, and
Charu Chaubal for their help.

T E C H N I C A L M A R K E T I N G D O C U M E N TAT I O N / 3 0
VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com
Copyright 2016 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed
at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be
trademarks of their respective companies. Item No: VMW-TMD-vSphr-Mtro Strge Clster-USLET-1.3 Docsource: OIC-FP-1521

VSFT51 Lecture Volume 3 PDF
No ratings yet
VSFT51 Lecture Volume 3 PDF
266 pages
Zerto Virtual Manager Administration Guide PDF
No ratings yet
Zerto Virtual Manager Administration Guide PDF
336 pages
CDOT 8.2.1 NAS Command Layout
No ratings yet
CDOT 8.2.1 NAS Command Layout
1 page
ONTAP 90 SVM Disaster Recovery Express Guide
No ratings yet
ONTAP 90 SVM Disaster Recovery Express Guide
24 pages
INF BCO1436 Vmware Replication Best Practices
No ratings yet
INF BCO1436 Vmware Replication Best Practices
46 pages
Isilon OneFS 7.1 - GUI
No ratings yet
Isilon OneFS 7.1 - GUI
434 pages
Vsphere Vcenter 803 Installation Guide
No ratings yet
Vsphere Vcenter 803 Installation Guide
89 pages
Using EMC VNX Storage With VMware Vsphere
No ratings yet
Using EMC VNX Storage With VMware Vsphere
284 pages
VI Io Guide
No ratings yet
VI Io Guide
413 pages
VMware Virtual SAN PSO Quick Guide
No ratings yet
VMware Virtual SAN PSO Quick Guide
11 pages
Module 9. Deploying and Configuring VSphere Clusters
No ratings yet
Module 9. Deploying and Configuring VSphere Clusters
97 pages
XenServer Storage Deep Dive
No ratings yet
XenServer Storage Deep Dive
44 pages
Setup & Configure VMware VSphere Replication With VMware Site Recovery Manager (SRM) 8.6
No ratings yet
Setup & Configure VMware VSphere Replication With VMware Site Recovery Manager (SRM) 8.6
20 pages
VCF 39 Upgrade
No ratings yet
VCF 39 Upgrade
45 pages
ExaGrid Detailed Product Description - DS 10
No ratings yet
ExaGrid Detailed Product Description - DS 10
35 pages
VMware vSphere 6.0 Overview
100% (1)
VMware vSphere 6.0 Overview
85 pages
Vsphere Esxi Vcenter Server 703 Resource Management Guide
No ratings yet
Vsphere Esxi Vcenter Server 703 Resource Management Guide
176 pages
(VMAX) What Is New With VMAX?: Print Email
No ratings yet
(VMAX) What Is New With VMAX?: Print Email
2 pages
AW - WS1DM - Lab Manual Days 3 - 4
No ratings yet
AW - WS1DM - Lab Manual Days 3 - 4
118 pages
Vmware Backup Presentation
No ratings yet
Vmware Backup Presentation
19 pages
Vmware Vsan Deploy Service: 1. Overview
No ratings yet
Vmware Vsan Deploy Service: 1. Overview
7 pages
NetApp ONTAP 9.8 Configuration Guide
No ratings yet
NetApp ONTAP 9.8 Configuration Guide
85 pages
Deep Security 96 SP1 Install Guide Basic en
No ratings yet
Deep Security 96 SP1 Install Guide Basic en
88 pages
vSphereInstructorCertificationWorkshop V5x1
No ratings yet
vSphereInstructorCertificationWorkshop V5x1
2 pages
Zerto Virtual Replication Installation
No ratings yet
Zerto Virtual Replication Installation
51 pages
Module 1 - Avamar Fundamentals
No ratings yet
Module 1 - Avamar Fundamentals
52 pages
Nimblestorage SMB Solution Guide
No ratings yet
Nimblestorage SMB Solution Guide
23 pages
BRM Support Day - March 2010-Vmware
No ratings yet
BRM Support Day - March 2010-Vmware
120 pages
NexentaConnect VMware VSAN QuickStart InstallGuide 1.0.2 FP2 GA
No ratings yet
NexentaConnect VMware VSAN QuickStart InstallGuide 1.0.2 FP2 GA
15 pages
Vmware Cloud Foundation Edge (Vcfe) : Specific Program Documentation ("SPD")
No ratings yet
Vmware Cloud Foundation Edge (Vcfe) : Specific Program Documentation ("SPD")
4 pages
CISv2 Student Guide - Final PDF
No ratings yet
CISv2 Student Guide - Final PDF
545 pages
Unstructured Data in The Cloud With ECS: Mikhail Vladimirov Senior Sales Engineer
No ratings yet
Unstructured Data in The Cloud With ECS: Mikhail Vladimirov Senior Sales Engineer
32 pages
cr4 2x PDF
No ratings yet
cr4 2x PDF
2,092 pages
h14556 Dell Emc Powermax Vmax All Flash SRDF Metro Overview Best Practices
No ratings yet
h14556 Dell Emc Powermax Vmax All Flash SRDF Metro Overview Best Practices
120 pages
Barracuda Load Balancer Administrator S Guide
No ratings yet
Barracuda Load Balancer Administrator S Guide
108 pages
Module 7. Deploying Virtual Machines
No ratings yet
Module 7. Deploying Virtual Machines
104 pages
Vsphere 8.0 Configuration - Maximums
No ratings yet
Vsphere 8.0 Configuration - Maximums
15 pages
Vsan 703 Administration
No ratings yet
Vsan 703 Administration
126 pages
VMware Daily Checklist
No ratings yet
VMware Daily Checklist
1 page
Nimble Best Practices For VMware
No ratings yet
Nimble Best Practices For VMware
15 pages
Zerto Virtual Replication VSphere Live DR Testing
No ratings yet
Zerto Virtual Replication VSphere Live DR Testing
32 pages
BPG Data Domain Commvault
No ratings yet
BPG Data Domain Commvault
86 pages
VMware Cloud On AWS - Deploy Configure Manage 2021 Lab Manual
No ratings yet
VMware Cloud On AWS - Deploy Configure Manage 2021 Lab Manual
124 pages
Sam Administrator Guide
No ratings yet
Sam Administrator Guide
516 pages
VMware ESX 6.5
100% (1)
VMware ESX 6.5
56 pages
VMW VCP NV Exam Preparation Guide
No ratings yet
VMW VCP NV Exam Preparation Guide
7 pages
Portfolio Comparison - Nutanix Vs VMware by Broadcom
No ratings yet
Portfolio Comparison - Nutanix Vs VMware by Broadcom
5 pages
VMware Optimization Guide: VCF Insights
No ratings yet
VMware Optimization Guide: VCF Insights
12 pages
Dell EMC Avamar For VMware
No ratings yet
Dell EMC Avamar For VMware
164 pages
V Sphere Security@nettrain
No ratings yet
V Sphere Security@nettrain
428 pages
Zerto Virtual Replication Installation Guide For Hyper-V
No ratings yet
Zerto Virtual Replication Installation Guide For Hyper-V
26 pages
VMware CVC Lab 01
No ratings yet
VMware CVC Lab 01
31 pages
VMware Vsphere Storage Appliance Evaluation Guide
No ratings yet
VMware Vsphere Storage Appliance Evaluation Guide
46 pages
VMware NFS Best Practices WP en New
No ratings yet
VMware NFS Best Practices WP en New
23 pages
Whats New VMware Vsphere 50 Storage Technical Whitepaper
No ratings yet
Whats New VMware Vsphere 50 Storage Technical Whitepaper
15 pages
Zerto Virtual Manager VSphere Administration Guide
No ratings yet
Zerto Virtual Manager VSphere Administration Guide
307 pages
ONTAP System Manager Classic Documentation
No ratings yet
ONTAP System Manager Classic Documentation
639 pages
Vmware v2d foundationRA
No ratings yet
Vmware v2d foundationRA
23 pages
Vmware Horizon 7 Enterprise Edition Reference Architecture Multi Site
100% (1)
Vmware Horizon 7 Enterprise Edition Reference Architecture Multi Site
136 pages
CompTIA A Practice Questions Core 1 220 1001 and Core 2 220 1002
100% (2)
CompTIA A Practice Questions Core 1 220 1001 and Core 2 220 1002
18 pages
Top Python Frameworks 2022
No ratings yet
Top Python Frameworks 2022
6 pages
44 Python Questions and Answers Guide
No ratings yet
44 Python Questions and Answers Guide
12 pages
Ics Security Offerings Fact Sheet S508C
No ratings yet
Ics Security Offerings Fact Sheet S508C
5 pages
102-500 LPIC-1 Exam Dumps Download
No ratings yet
102-500 LPIC-1 Exam Dumps Download
10 pages
Lpi Passleader 101 500 Exam Question 2021 Dec 28 by Chisel 176q
No ratings yet
Lpi Passleader 101 500 Exam Question 2021 Dec 28 by Chisel 176q
11 pages
41 Dot Net NET Framework Interview Questions and Answers
No ratings yet
41 Dot Net NET Framework Interview Questions and Answers
8 pages
Maxim Magazine Australia - October 2011
40% (5)
Maxim Magazine Australia - October 2011
132 pages
LPIC-1 101-500 Exam Dumps PDF & VCE
No ratings yet
LPIC-1 101-500 Exam Dumps PDF & VCE
13 pages
Storage Virtualization: IBM SAN Volume Controller
No ratings yet
Storage Virtualization: IBM SAN Volume Controller
41 pages
(IBM Security) IBM Security QRadar Installation Guide
No ratings yet
(IBM Security) IBM Security QRadar Installation Guide
54 pages
Nuts Magazine Sexiest Calendars - 16 September 2011
100% (1)
Nuts Magazine Sexiest Calendars - 16 September 2011
92 pages
(Kaspersky) Kaspersky Anti-Virus 8.0 For Windows Servers Enterprise Edition
No ratings yet
(Kaspersky) Kaspersky Anti-Virus 8.0 For Windows Servers Enterprise Edition
159 pages
(ForeSout) 8.2 Test Drive Demo Guide - FINAL
0% (1)
(ForeSout) 8.2 Test Drive Demo Guide - FINAL
46 pages
(IBM) System Storage SAN Volume Controller Version 6.4.0
No ratings yet
(IBM) System Storage SAN Volume Controller Version 6.4.0
350 pages
(IBM) FlashSystem V9000 Model AC3 With Flash Enclosure Model AE3 Product Guide
No ratings yet
(IBM) FlashSystem V9000 Model AC3 With Flash Enclosure Model AE3 Product Guide
64 pages
(McAfee) McAfee Epolicy Orchestrator 5.10.0 Product Guide 5-3-2020
No ratings yet
(McAfee) McAfee Epolicy Orchestrator 5.10.0 Product Guide 5-3-2020
271 pages
(Kaspersky) Kaspersky Security Center 10.0 Implementation Guide
No ratings yet
(Kaspersky) Kaspersky Security Center 10.0 Implementation Guide
91 pages
Deploying Kali Linux On VMWare Workstation 11
No ratings yet
Deploying Kali Linux On VMWare Workstation 11
9 pages
VMware Vcenter Converter Standalone 5.0 User's Guide
No ratings yet
VMware Vcenter Converter Standalone 5.0 User's Guide
96 pages
Thinapp51 Manual
No ratings yet
Thinapp51 Manual
104 pages
Pivotal Lifecycle Matrix
No ratings yet
Pivotal Lifecycle Matrix
12 pages
Ve6120/Ve6125 Virtual Appliances Installation Guide Vmware® Platform
No ratings yet
Ve6120/Ve6125 Virtual Appliances Installation Guide Vmware® Platform
56 pages
Vsan 703 Planning Deployment Guide
100% (1)
Vsan 703 Planning Deployment Guide
87 pages
Virtual Machines Provisioning and Migration Services
100% (5)
Virtual Machines Provisioning and Migration Services
64 pages
Horizon Manually Creates Optimization Templates
No ratings yet
Horizon Manually Creates Optimization Templates
114 pages
Vi3 Vec 10 Admin Guide
No ratings yet
Vi3 Vec 10 Admin Guide
83 pages
Vmware NSX: Install, Configure, Manage Lab Topology: © 2018 Vmware Inc. All Rights Reserved
No ratings yet
Vmware NSX: Install, Configure, Manage Lab Topology: © 2018 Vmware Inc. All Rights Reserved
32 pages
Vmware Training 06-09-21 Guruprasanth.s
No ratings yet
Vmware Training 06-09-21 Guruprasanth.s
9 pages
Quick Start Guide: Esx Server 3.5 and Virtualcenter 2.5
No ratings yet
Quick Start Guide: Esx Server 3.5 and Virtualcenter 2.5
86 pages
Ise Upgrade Guide Chapter 01
No ratings yet
Ise Upgrade Guide Chapter 01
16 pages
VSICM55 LabConfigSheet Student1A v2
No ratings yet
VSICM55 LabConfigSheet Student1A v2
4 pages
VMware Virtualisation
No ratings yet
VMware Virtualisation
32 pages
Hawkeye 2.4-EA-SP3 Release Notes
No ratings yet
Hawkeye 2.4-EA-SP3 Release Notes
14 pages
Dell EMC Unity Implementation and Administration
No ratings yet
Dell EMC Unity Implementation and Administration
2 pages
NetWorker 8.2 and Service Packs Release Notes
No ratings yet
NetWorker 8.2 and Service Packs Release Notes
167 pages
Introduction To Virtualization and VMware Solutions
No ratings yet
Introduction To Virtualization and VMware Solutions
21 pages
Dell EMC Unity: Unisphere Overview
No ratings yet
Dell EMC Unity: Unisphere Overview
41 pages
RVTools 3.8.6 User Guide
No ratings yet
RVTools 3.8.6 User Guide
101 pages
100 REAL TIME VMWARE Interview Questions and Answers 2017
50% (24)
100 REAL TIME VMWARE Interview Questions and Answers 2017
38 pages
Mastering VMware Vsphere Storage - Sample Chapter
No ratings yet
Mastering VMware Vsphere Storage - Sample Chapter
28 pages
Dell EMC PowerVault ME4 Series and VMware Vsphere
No ratings yet
Dell EMC PowerVault ME4 Series and VMware Vsphere
24 pages
C-27 - CCL Experiment No 2B - Virtual Box
No ratings yet
C-27 - CCL Experiment No 2B - Virtual Box
11 pages
Installation Instructions For BMW DIS in VMware
No ratings yet
Installation Instructions For BMW DIS in VMware
6 pages
VMware Cloud Foundation Planning and Preparation Guide
No ratings yet
VMware Cloud Foundation Planning and Preparation Guide
50 pages
CheckPoint R80.10 ReleaseNotes
No ratings yet
CheckPoint R80.10 ReleaseNotes
27 pages
VMware Dumps 1V0-605 v2015-08-18 by Exampass 37q PDF
No ratings yet
VMware Dumps 1V0-605 v2015-08-18 by Exampass 37q PDF
13 pages
VERINT
No ratings yet
VERINT
14 pages