VMware Virtual SAN
Hyper-converged infrastructure software
Duncan Epping
Chief Technologist
Office of the CTO
Storage & Availability
Agenda
1 Introduction
2 Virtual SAN, what is it?
3 Virtual SAN, a bit of a deeper dive
4 Virtual SAN Recent Enhancements
5 Wrapping up
2
The Software Defined Data Center
• All infrastructure services virtualized: compute, networking, storage
• Underlying hardware abstracted, resources are pooled
• Control of data center automated by software (management, security)
• Virtual Machines are first class citizens of the SDDC
• Today’s session will focus on one aspect of the SDDC: storage
Management
Compute Networking Storage
3
Hardware evolution started the storage revolution
Simplicity: Operational / Management
5
The Hypervisor is the Strategic High Ground
VMware vSphere
x86 - HCI SAN/NAS Object Storage
Cloud Storage
6
Storage Policy-Based Management – App centric automation
Overview
• Intelligent placement
• Fine control of services at VM level
• Automation at scale through policy
• Need new services for VM?
• Change current policy on-the-fly
• Attach new policy on-the-fly
Virtual Machine Storage policy
Reserve Capacity: 40GB
Availability: 2 Failures to tolerate
Read Cache: 50%
Stripe Width: 6
vSphere
Storage Policy-Based Management
Virtual Datastore
Virtual SAN Virtual Volumes
7
Storage Policy-Based Management – What does it look like?
If the storage can satisfy the VM
Storage Policy, the VM Summary tab
in the vSphere client will display the
VM as compliant.
If not, either due to failures, lack of
resources or other reasons, the VM
will be shown as non-compliant.
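The compliance idea can be sketched in a few lines. This is a minimal illustration, not how vCenter evaluates policies: the `StoragePolicy` and `ClusterState` classes and the `is_compliant` function are hypothetical names, the values come from the example policy in this deck, and the check assumes RAID-1 mirroring with the "2n+1 hosts, n+1 copies" rule stated later in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical model of a VM storage policy (names are illustrative)."""
    failures_to_tolerate: int
    stripe_width: int
    read_cache_pct: int
    reserve_capacity_gb: int


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical snapshot of what the storage can currently offer."""
    healthy_hosts: int
    free_capacity_gb: int


def is_compliant(policy: StoragePolicy, cluster: ClusterState) -> bool:
    # Assumption: RAID-1 mirroring, which needs 2n+1 hosts to tolerate n failures.
    enough_hosts = cluster.healthy_hosts >= 2 * policy.failures_to_tolerate + 1
    # Assumption: the reservation must be available for each of the n+1 copies.
    copies = policy.failures_to_tolerate + 1
    enough_space = cluster.free_capacity_gb >= copies * policy.reserve_capacity_gb
    return enough_hosts and enough_space


# The example policy from the slide: 40GB reserved, FTT=2, 50% read cache, stripe width 6.
policy = StoragePolicy(failures_to_tolerate=2, stripe_width=6,
                       read_cache_pct=50, reserve_capacity_gb=40)
print(is_compliant(policy, ClusterState(healthy_hosts=5, free_capacity_gb=120)))
print(is_compliant(policy, ClusterState(healthy_hosts=4, free_capacity_gb=500)))
```

A host failure or a shortage of free capacity flips the result to non-compliant, which mirrors the two outcomes described above.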
8
Virtual SAN,
what is it?
9
Virtual SAN, what is it?
Hyper-Converged Infrastructure
Software-Defined Storage
Distributed, Scale-out Architecture
Integrated with vSphere platform
vSphere & Virtual SAN
Ready for today’s vSphere use cases
10
But what does that really mean?
Exposing a single shared datastore
Leveraging local storage resources
Integrated with your Hypervisor
Virtual SAN
VMware vSphere & Virtual SAN
Generic x86 hardware
VSAN network
11
VSAN is the Most Widely Adopted HCI Product
Simplicity is key: on an oil platform there are no virtualization, storage or network admins.
The infrastructure is managed over a satellite link via a centralized vCenter Server.
Reliability, availability and predictability are key.
12
Virtual SAN Use Cases
Business Critical Apps
End User Computing
DR / DA
Test/Dev
DMZ
Management
Staging
ROBO
VMware vSphere + Virtual SAN
13
Broadest Deployment Options from HCI to SDDC
Built on Industry-Leading VMware Hyper-Converged Software (HCS)
Certified Solutions: Virtual SAN Ready Nodes on Certified Partner Hardware, running VMware HCS (Virtual SAN + vSphere + vCenter)
Engineered Appliances: HCI Appliance from the EMC Federation, running VMware HCS (Virtual SAN + vSphere + vCenter)
EVO SDDC: vRealize and NSX on VMware HCS (Virtual SAN + vSphere + vCenter), with Lifecycle Management through EVO SDDC Manager
14
Tiered Hybrid vs All-Flash
Hybrid
Caching tier: SSD, PCIe, Ultra DIMM
Read and Write Cache
Capacity Tier (data persistence): SAS / NL-SAS / SATA
40K IOPS per Host
All-Flash
Caching tier: SSD, PCIe, Ultra DIMM
Writes cached first, reads from capacity tier
Capacity Tier (data persistence): Flash Devices
Reads go directly to capacity tier
100K IOPS per Host + sub-millisecond latency
15
Really Simple Setup
Manual or Automatic Disk Claiming
Deduplication and Compression: Enable?
Fault Domains, 2 node or stretched cluster?
16
Virtual SAN,
a bit of a deeper dive
17
Virtual Machine as a set of Objects on VSAN
• VM Home Namespace
• VM Swap Object
• Virtual Disk (VMDK) Object
• Snapshot (delta) Object
• Snapshot (delta) Memory Object
VM Home, VM Swap, VMDK, Snapshot (Snap delta, Snap memory)
18
Define a policy first…
Virtual SAN currently surfaces multiple storage capabilities to vCenter Server
What If APIs
New Capabilities in VSAN 6.2
19
Virtual SAN Objects and Components
VSAN is an object store!
• Object Tree with Branches
• Each Object has multiple Components
– This allows you to meet availability and performance requirements
• Here is one example of “Distributed RAID” using 2 techniques:
– Striping (RAID-0)
– Mirroring (RAID-1)
• Data is distributed based on VM Storage Policy
VMDK Object: RAID-1 across two Mirror Copies
Mirror Copy (RAID-0, ESXi Host): stripe-1a, stripe-1b
Mirror Copy (RAID-0, ESXi Host): stripe-2a, stripe-2b
witness (ESXi Host)
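The object tree in this example can be sketched as a nested structure. This is an illustration only: `build_object_tree` is a hypothetical helper, the component names follow the labels on the slide, and the single witness is a simplification of whatever the real placement logic decides.

```python
def build_object_tree(failures_to_tolerate: int, stripe_width: int) -> dict:
    """Build a RAID-1 tree whose branches are RAID-0 stripe sets (illustrative only)."""
    mirrors = []
    # n failures to tolerate means n+1 mirror copies of the object.
    for copy in range(1, failures_to_tolerate + 2):
        # Each mirror copy is striped across `stripe_width` components.
        stripes = [f"stripe-{copy}{chr(ord('a') + i)}" for i in range(stripe_width)]
        mirrors.append({"type": "RAID-0", "components": stripes})
    # Simplification: one witness component acts as the tie-breaker.
    return {"type": "RAID-1", "mirrors": mirrors, "witness": "witness"}


tree = build_object_tree(failures_to_tolerate=1, stripe_width=2)
for mirror in tree["mirrors"]:
    print(mirror["type"], mirror["components"])
```

With one failure to tolerate and a stripe width of 2, the sketch yields the same four stripe components shown in the diagram.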
20
Number of Failures to Tolerate/Failure Tolerance Method
• Defines the number of host, disk or network failures a storage object can tolerate.
• RAID-1 mirroring is used when Failure Tolerance Method is set to Performance (default).
• For “n” failures tolerated, “n+1” copies of the object are created and “2n+1” hosts contributing storage are required!
esxi-01 esxi-02 esxi-03 esxi-04
~50% of I/O ~50% of I/O
RAID-1
vmdk vmdk witness
Virtual SAN Policy: “Number of failures to tolerate = 1”
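The "n+1 copies, 2n+1 hosts" rule is simple arithmetic, sketched here for RAID-1 mirroring. The function name `mirroring_requirements` is hypothetical and only restates the rule from the slide.

```python
def mirroring_requirements(failures_to_tolerate: int) -> dict:
    """Copies and hosts needed for RAID-1 mirroring, per the n+1 / 2n+1 rule."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    n = failures_to_tolerate
    return {"copies": n + 1, "hosts": 2 * n + 1}


for n in (1, 2, 3):
    req = mirroring_requirements(n)
    print(f"FTT={n}: {req['copies']} copies, {req['hosts']} hosts")
```

For the policy on this slide (FTT=1) this gives 2 copies on 3 hosts, which matches the two vmdk replicas plus witness in the diagram.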
21
Assign it to a new or existing VM
When the policy is selected, Virtual SAN uses it to place / distribute the VM to guarantee
availability and performance
22
Fault Domains, increasing availability through rack awareness
• Create fault domains to increase availability
• 8 node cluster with 4 defined fault domains (2 nodes in each)
FD1 = esxi-01, esxi-02
FD2 = esxi-03, esxi-04
FD3 = esxi-05, esxi-06
FD4 = esxi-07, esxi-08
• To protect against one rack failure, only 2 replicas and a witness are required, spread across 3 fault domains!
FD1 FD2 FD3 FD4
RAID-1
esxi-01 esxi-03 esxi-05 esxi-07
esxi-02 esxi-04 esxi-06 esxi-08
vmdk vmdk witness
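A rough sketch of rack-aware placement for this 8 node example follows. It is not the VSAN placement algorithm: `place_ftt1` and `survives_failure` are hypothetical helpers, the choice of the first three fault domains is arbitrary, and the "more than half of the components plus one replica" survival test is a simplifying assumption.

```python
FAULT_DOMAINS = {
    "FD1": ["esxi-01", "esxi-02"],
    "FD2": ["esxi-03", "esxi-04"],
    "FD3": ["esxi-05", "esxi-06"],
    "FD4": ["esxi-07", "esxi-08"],
}


def place_ftt1(fault_domains: dict) -> list:
    """Put 2 replicas and 1 witness in 3 different fault domains (illustrative)."""
    if len(fault_domains) < 3:
        raise ValueError("FTT=1 needs at least 3 fault domains")
    chosen = sorted(fault_domains)[:3]  # arbitrary pick for the sketch
    roles = ["vmdk", "vmdk", "witness"]
    # Each entry is (role, fault domain, host); the first host of each domain is used.
    return [(role, fd, fault_domains[fd][0]) for role, fd in zip(roles, chosen)]


def survives_failure(placement: list, failed_fd: str) -> bool:
    """Assumed rule: object stays available with >50% of components and one replica."""
    remaining = [p for p in placement if p[1] != failed_fd]
    has_replica = any(role == "vmdk" for role, _, _ in remaining)
    return has_replica and 2 * len(remaining) > len(placement)


placement = place_ftt1(FAULT_DOMAINS)
for role, fd, host in placement:
    print(role, fd, host)
print(all(survives_failure(placement, fd) for fd in FAULT_DOMAINS))
```

Because no two components share a rack, losing any single fault domain leaves a replica and a majority of components in place.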
23
Virtual SAN,
Recent Enhancements
24
Today
VSAN 6.2 (March 2016)
Deduplication and Compression
RAID 5/6 support
Software Checksum
QoS via IOPS Limits
IPv6
Performance Service
Enhanced Capacity Views
VSAN 6.1 (September 2015)
Stretched Cluster
Replication - 5 Minutes RPO
2-node ROBO
Health Monitoring & Remediation
VSAN 6.0 (March 2015)
All Flash Configuration
64 node VSAN cluster
x2 Hybrid Performance
VSAN Snapshots/Clones
Health UI
Rack Awareness
VSAN 5.5 (March 2014)
25
Virtual SAN – Stretched Cluster
Active-Active data centers
• Virtual SAN cluster split across 2 sites!
• Each site is a Fault Domain (FD)
• Site-level protection with zero data loss and near-instantaneous recovery
• Support for up to 5ms RTT latency between data sites
– 10Gbps bandwidth expectation
• Witness VM can reside anywhere
– 200ms RTT latency
– 100Mbps bandwidth required at most
• Automated failover
VMware vSphere & Virtual SAN
vmdk vmdk witness
5ms RTT, 10GbE
Site Recovery Manager: any distance, >5ms
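The network figures on this slide lend themselves to a quick pre-flight check. The sketch below only encodes the numbers quoted above; `check_stretched_links` and its parameters are hypothetical names, and it is not part of any VMware tool.

```python
from typing import List

# Limits taken from the slide.
MAX_DATA_SITE_RTT_MS = 5.0
EXPECTED_DATA_SITE_GBPS = 10.0
MAX_WITNESS_RTT_MS = 200.0
WITNESS_MBPS_SUFFICIENT = 100.0  # "100Mbps required at most": a 100Mbps link is enough


def check_stretched_links(data_rtt_ms: float, data_gbps: float,
                          witness_rtt_ms: float, witness_mbps: float) -> List[str]:
    """Return a list of problems; an empty list means the links fit the slide's figures."""
    problems = []
    if data_rtt_ms > MAX_DATA_SITE_RTT_MS:
        problems.append(f"data-site RTT {data_rtt_ms}ms exceeds {MAX_DATA_SITE_RTT_MS}ms")
    if data_gbps < EXPECTED_DATA_SITE_GBPS:
        problems.append(f"data-site bandwidth {data_gbps}Gbps is below {EXPECTED_DATA_SITE_GBPS}Gbps")
    if witness_rtt_ms > MAX_WITNESS_RTT_MS:
        problems.append(f"witness RTT {witness_rtt_ms}ms exceeds {MAX_WITNESS_RTT_MS}ms")
    if witness_mbps < WITNESS_MBPS_SUFFICIENT:
        # Less than 100Mbps may still work for small clusters; flag it for review.
        problems.append(f"witness bandwidth {witness_mbps}Mbps is below {WITNESS_MBPS_SUFFICIENT}Mbps")
    return problems


print(check_stretched_links(data_rtt_ms=4.0, data_gbps=10, witness_rtt_ms=150, witness_mbps=100))
print(check_stretched_links(data_rtt_ms=8.0, data_gbps=10, witness_rtt_ms=250, witness_mbps=100))
```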
26
Advanced Troubleshooting with VSAN Health Check
• Cluster Health
• Network Health
• Data Health
• Limits Health
• Physical Disk Health
• Stretched Cluster
• Proactive Tests
27
Virtual SAN 6.2
specifics
28
New in 6.2
All Flash Only
Beta
Deduplication and Compression for Space Efficiency
• Nearline deduplication and compression at the disk group level
– Enabled at the cluster level
– Data is deduplicated when de-staged from cache tier to capacity tier
– Fixed block length deduplication (4KB blocks)
• Compression after deduplication
– If a block compresses to <= 2KB, the compressed block is stored
– Otherwise the full 4KB block is stored
Significant space savings achievable, making the economics of an all-flash VSAN very attractive
vSphere & Virtual SAN
esxi-01 esxi-02 esxi-03
vmdk vmdk vmdk
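The de-stage path described above can be sketched as follows. The slide does not name the hashing or compression algorithms, so SHA-1 fingerprints and zlib are stand-ins chosen for this sketch, and `destage` / `read_back` are hypothetical helpers operating on an in-memory dictionary rather than a capacity tier.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096           # fixed 4KB deduplication block
COMPRESS_THRESHOLD = 2048   # keep the compressed form only if it fits in 2KB


def destage(data: bytes, store: dict) -> list:
    """Deduplicate, then compress, each 4KB block; return the block fingerprints."""
    refs = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        fingerprint = hashlib.sha1(block).hexdigest()  # stand-in hash
        if fingerprint not in store:                   # new block: store it once
            packed = zlib.compress(block)              # stand-in compressor
            if len(packed) <= COMPRESS_THRESHOLD:
                store[fingerprint] = ("compressed", packed)
            else:
                store[fingerprint] = ("raw", block)
        refs.append(fingerprint)
    return refs


def read_back(refs: list, store: dict) -> bytes:
    """Reassemble the original data from the fingerprints."""
    out = bytearray()
    for fingerprint in refs:
        kind, payload = store[fingerprint]
        out += zlib.decompress(payload) if kind == "compressed" else payload
    return bytes(out)


# Three identical, highly compressible blocks plus one block of hash output.
noisy = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
data = b"A" * (3 * BLOCK_SIZE) + noisy
store = {}
refs = destage(data, store)
print(len(refs), "blocks written,", len(store), "blocks stored")
```

Deduplicating before compressing means each unique block is compressed only once, regardless of how many times it is referenced.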
29
New in 6.2
All Flash Only
RAID-5/6 (Inline Erasure Coding)
• When Number of Failures to Tolerate = 1 and Failure Tolerance Method = Capacity: RAID-5
– 3+1 (4 host minimum)
– 1.33x overhead for RAID-5 instead of 2x for FTT=1 with RAID-1
• When Number of Failures to Tolerate = 2 and Failure Tolerance Method = Capacity: RAID-6
– 4+2 (6 host minimum)
– 1.5x overhead for RAID-6 instead of 3x for FTT=2 with RAID-1
RAID-5
parity data data data
data parity data data
data data parity data
data data data parity
ESXi Host ESXi Host ESXi Host ESXi Host
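The overhead figures follow directly from the layouts: a 3+1 stripe stores 4 units for every 3 units of data, and a 4+2 stripe stores 6 for every 4. A small sketch of that arithmetic, with `raw_capacity_gb` as a hypothetical helper name:

```python
# (data, parity) components per stripe for Failure Tolerance Method = Capacity.
ERASURE_LAYOUTS = {1: (3, 1), 2: (4, 2)}


def raw_capacity_gb(vmdk_gb: float, failures_to_tolerate: int, method: str) -> float:
    """Raw capacity consumed by a VMDK under the given policy settings."""
    if method == "performance":   # RAID-1: n+1 full copies
        return vmdk_gb * (failures_to_tolerate + 1)
    if method == "capacity":      # RAID-5 (FTT=1) or RAID-6 (FTT=2)
        data, parity = ERASURE_LAYOUTS[failures_to_tolerate]
        return vmdk_gb * (data + parity) / data
    raise ValueError(f"unknown failure tolerance method: {method}")


for ftt in (1, 2):
    mirror = raw_capacity_gb(100, ftt, "performance")
    erasure = raw_capacity_gb(100, ftt, "capacity")
    print(f"FTT={ftt}: RAID-1 uses {mirror:.0f}GB, RAID-5/6 uses {erasure:.1f}GB")
```

For a 100GB VMDK this works out to 200GB versus roughly 133GB at FTT=1, and 300GB versus 150GB at FTT=2.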
30
New in 6.2
Software Checksum and disk scrubbing
Overview
• End-to-end checksum of the data to detect and resolve silent disk errors due to faulty hardware/firmware
• Checksum is enabled by default (policy driven)
• If checksum verification fails on a read:
– VSAN fetches the data from another copy in RAID-1
– VSAN recreates the data from other components in the RAID-5/6 stripe
• Disk scrubbing is run in the background
Virtual SAN Datastore
Benefits
• Provide additional level of data integrity
• Automatic detection and resolution of silent disk errors
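The RAID-1 read path with checksum verification can be sketched as below. The slide does not specify the checksum algorithm, so `zlib.crc32` is used as a stand-in, and `write_block` / `read_with_repair` are hypothetical helpers that model replicas as in-memory dictionaries with a trusted stored checksum.

```python
import zlib


def write_block(data: bytes) -> dict:
    """Store the data together with its checksum (CRC-32 as a stand-in)."""
    return {"data": bytearray(data), "crc": zlib.crc32(data)}


def is_valid(replica: dict) -> bool:
    return zlib.crc32(bytes(replica["data"])) == replica["crc"]


def read_with_repair(replicas: list) -> bytes:
    """Return data from the first replica that verifies and repair the others."""
    for replica in replicas:
        if is_valid(replica):
            good = bytes(replica["data"])
            for other in replicas:
                if not is_valid(other):
                    # Overwrite the silently corrupted copy with verified data.
                    other["data"] = bytearray(good)
            return good
    raise IOError("no replica passed checksum verification")


payload = b"virtual machine data"
mirror = [write_block(payload), write_block(payload)]
mirror[0]["data"][0] ^= 0xFF           # simulate a silent disk error on one copy
print(read_with_repair(mirror) == payload, is_valid(mirror[0]))
```

A background scrubber could run the same verification over every block instead of waiting for a read to hit the bad copy.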
31
New in 6.2
Other new improvements
Client Cache
• Write-through, in-memory read cache
– 0.4% of total host memory, up to 1GB per host
• “Local” to the virtual machine
• Low overhead, big impact!
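The sizing rule is easy to work through. The sketch below assumes "1GB" means 2^30 bytes and uses the hypothetical helper name `client_cache_bytes`; the rule itself (0.4% of host memory, capped at 1GB) is the one stated above.

```python
ONE_GB = 1 << 30  # assumption: the 1GB cap is interpreted as 2**30 bytes


def client_cache_bytes(host_memory_bytes: int) -> int:
    """0.4% of total host memory, capped at 1GB per host."""
    return min(host_memory_bytes * 4 // 1000, ONE_GB)


for host_gb in (64, 128, 256, 512):
    size_mb = client_cache_bytes(host_gb * ONE_GB) / (1 << 20)
    print(f"{host_gb}GB host -> {size_mb:.0f}MB client cache")
```

Under this reading the cap is reached at 250GB of host memory, so larger hosts all get the same 1GB cache.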
Sparse Swap
• Reclaim Space used by memory swap
• Host advanced option enables setting policy for swap to
no space reservation
IOPS limit on object
• Policy driven capability
• Limit IOPS per VM/Virtual Disk
• Eliminate noisy neighbor issues
• Manage performance SLAs
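To illustrate what a per-object IOPS limit does, here is a generic token-bucket limiter. This is a common rate-limiting technique swapped in for illustration, not a description of how VSAN enforces the limit internally; `IopsLimiter` is a hypothetical class and the clock is passed in explicitly so the behaviour is deterministic.

```python
class IopsLimiter:
    """Generic token bucket: at most `iops_limit` I/Os per second for one object."""

    def __init__(self, iops_limit: int, now: float = 0.0):
        self.rate = float(iops_limit)
        self.tokens = float(iops_limit)  # start with one second's worth of budget
        self.last = now

    def try_io(self, now: float) -> bool:
        """Return True if an I/O issued at time `now` (seconds) is admitted."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond one second's budget.
        self.tokens = min(self.rate, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a noisy neighbor gets throttled here


limiter = IopsLimiter(iops_limit=100)
burst = sum(limiter.try_io(now=0.0) for _ in range(250))
later = sum(limiter.try_io(now=0.5) for _ in range(100))
print(burst, later)
```

A virtual disk that tries to issue 250 I/Os at once is held to its budget, which leaves capacity for its neighbors on the same host.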
32
New in 6.2
Enhanced Virtual SAN Management with New Health Service
Performance Monitoring Capacity Monitoring
Built-in performance monitoring
Health and performance APIs and SDK
Storage capacity reporting
And many more health checks…
33
New in 6.2
Performance, Scale and Availability for Any Application
BUSINESS-CRITICAL APPLICATIONS
SAP
SAP Core Ready
Tested and validated deployments
Horizon
Tightly integrated cloud management
Bundles Virtual SAN licenses for lowest cost VDI storage
Oracle
Oracle RAC supported
Tested and validated deployments
34
Wrapping up
35
36
Vision
VMware Virtual SAN: Generic Object Storage Platform
VMFS Block File REST
Virtual SAN
VMware vSphere
37