Data Efficiency
Copyright
Copyright 2019 Nutanix, Inc.
Nutanix, Inc.
1740 Technology Drive, Suite 150
San Jose, CA 95110
All rights reserved. This product is protected by U.S. and international copyright and intellectual
property laws.
Nutanix is a trademark of Nutanix, Inc. in the United States and/or other jurisdictions. All other
marks and names mentioned herein may be trademarks of their respective companies.
Contents
1. Executive Summary
2. Introduction
   2.1. Audience
   2.2. Purpose
5. Conclusion
Appendix
   About Nutanix
1. Executive Summary
Nutanix designed the Enterprise Cloud from the ground up to provide best-in-class reliability,
consumer-grade simplicity, and peerless efficiency. In this technical note, we discuss how
the Acropolis Distributed Storage Fabric (DSF) achieves data avoidance and efficiency using
techniques such as thin provisioning, intelligent cloning, compression, deduplication, and erasure
coding. These techniques accelerate application performance and optimize storage capacity.
They are intelligent and adaptive, requiring little or no fine-tuning in most cases, which reduces
operating expenses and frees your IT staff to focus on growth and innovation. Unlike traditional
storage architectures, the Nutanix web-scale design ensures that data efficiency techniques
scale as the cluster grows—node by node, with no bottlenecks, single points of failure, or
expensive proprietary hardware and software.
2. Introduction
2.1. Audience
This tech note is part of the Nutanix Solutions Library. We wrote it for IT architects and
administrators responsible for designing, managing, and supporting Nutanix infrastructures.
Readers should already be familiar with the Nutanix Enterprise Cloud.
2.2. Purpose
This document discusses how the Nutanix Enterprise Cloud provides data efficiency, including an
introduction to the following features:
• Data transformations and utilization.
• Thin provisioning.
• Intelligent cloning.
• Compression.
• Deduplication.
• Erasure coding.
Table 1: Document Version History

Version Number | Published     | Notes
1.0            | November 2017 | Original publication.
1.1            | July 2019     | Updated Nutanix overview.
Data Transformations and Utilization
In the following figure, we provide an example of the data reduction achieved using compression,
deduplication, and erasure coding, as well as the overall efficiency resulting from additional data
avoidance techniques. Capacity optimization metrics are available via Prism on the Storage
Overview page and for individual containers on their detail pages.
Thin Provisioning
Thin provisioning is a simple and broadly adopted technology for increasing data capacity
utilization by overcommitting resources. The DSF enables this feature in all containers by default.
In deployments using the VMware ESXi hypervisor, containers are presented to hosts as natively
thin-provisioned NFS datastores. Although it is a widely accepted method for increasing capacity
utilization, thin provisioning traditionally has been associated with reduced storage performance.
However, on Nutanix, thin provisioning outperforms thick provisioning and is recommended for all
workloads.
Some applications, such as Oracle RAC and vSphere Fault Tolerance, require thick provisioning.
The DSF supports thick-provisioned (eager-zeroed or lazy-zeroed thick) VMDKs via the VMware API
for Array Integration (VAAI) NAS reserve space primitive. Eager-zeroed thick VMDKs guarantee
space reservations in the DSF but do not actually write data. Instead, the DSF acknowledges
every I/O and performs a simple metadata update.
Calculations for the Overall Efficiency metric account for savings from thin provisioning.
Administrators can view current thick provisioned capacity via Prism on the Storage Container
Details page.
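
To make the overcommitment model concrete, the following minimal Python sketch (purely illustrative, not the DSF implementation) models a thin-provisioned container: the sum of logical vDisk sizes can exceed physical capacity, and physical space is consumed only when data is actually written.

class ThinContainer:
    def __init__(self, physical_capacity_gb):
        self.physical_capacity_gb = physical_capacity_gb
        self.provisioned_gb = 0   # sum of logical vDisk sizes; may exceed physical capacity
        self.used_gb = 0          # physical space actually consumed by writes

    def provision_vdisk(self, logical_size_gb):
        # Thin provisioning is a metadata operation: no physical space is consumed.
        self.provisioned_gb += logical_size_gb

    def write(self, gb):
        # Physical capacity is allocated only as data actually lands on disk.
        if self.used_gb + gb > self.physical_capacity_gb:
            raise RuntimeError("physical capacity exhausted")
        self.used_gb += gb

container = ThinContainer(physical_capacity_gb=1000)
for _ in range(5):
    container.provision_vdisk(400)       # 2,000 GB provisioned against 1,000 GB of disk
container.write(300)
print(container.provisioned_gb, container.used_gb)   # 2000 300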
Intelligent Cloning
The DSF provides native support for space-efficient, offloaded VM clones, which you can choose
to provision automatically via VAAI, View Composer for Array Integration (VCAI), and Microsoft
Offloaded Data Transfer (ODX), or interactively via nCLI, REST, or Prism. Clones take advantage
of the redirect-on-write algorithm, which is the most effective and efficient implicit virtual disk
sharing technique.
On the Nutanix platform, VMs store data as virtual disk files (vDisks). Each vDisk is composed
of logically contiguous chunks of data called extents. These extents are stored in physically
contiguous groups as files on storage devices. When you clone a VM, the system marks the base
vDisk read-only and creates another vDisk as read and write. At this point, both vDisks have the
same block map, which is a metadata mapping of the vDisk to its corresponding extents.
The following figure shows what happens when you clone a VM.
The system uses the same method for multiple clones of a VM or vDisk. Clone operations are
metadata only, so no I/O takes place. You can also create clones of clones the same way;
essentially, the previously cloned VM acts as the base vDisk. On cloning, the system locks the
base vDisk’s block map and creates two clones, with one block map for the previously cloned VM
and another block map for the new clone. There is no maximum number of clones.
In the following figure, both clones inherit the prior block map, and the system creates new
individual block maps.
Any new metadata write or update occurs in the new individual block maps.
Any subsequent clones lock the original block map and create a new one for read and write
access.
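
The following minimal Python sketch illustrates the shared-block-map idea described above (a simplification for clarity, not Nutanix source code): clones read through the base vDisk's locked, read-only block map until they write, at which point new extents land only in their own maps.

class VDisk:
    def __init__(self, parent=None):
        self.parent = parent    # base vDisk, if this vDisk is a clone
        self.block_map = {}     # offset -> extent reference (metadata only)

    def read_extent(self, offset):
        # Walk up the clone chain until a mapping is found.
        vdisk = self
        while vdisk is not None:
            if offset in vdisk.block_map:
                return vdisk.block_map[offset]
            vdisk = vdisk.parent
        return None

    def write_extent(self, offset, extent_ref):
        # New writes land only in this vDisk's own block map; the base
        # vDisk's locked, read-only map is never modified.
        self.block_map[offset] = extent_ref

base = VDisk()
base.write_extent(0, "extent-A")

clone1 = VDisk(parent=base)   # metadata-only clone: no data is copied
clone2 = VDisk(parent=base)

clone1.write_extent(0, "extent-B")   # redirect-on-write
print(clone1.read_extent(0))         # extent-B (clone's own data)
print(clone2.read_extent(0))         # extent-A (still shared with the base vDisk)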
Calculations for the Overall Efficiency metric account for intelligent cloning savings. You can view
this metric via Prism on the Storage Container Details page.
Compression
The DSF provides both inline and post-process compression to suit the customer’s data types
and application usage patterns. The inline method compresses sequential streams of data or
large I/O sizes (more than 64 KB) as they’re written to the capacity store (extent store), while
post-process compression initially writes the data in an uncompressed state and then uses the
Curator framework to compress it cluster-wide. A Nutanix system compresses all incoming write
I/O operations over 4 KB inline in the persistent write buffer (oplog). This approach enables you
to use oplog capacity more efficiently and helps drive sustained performance. From AOS 5.1
onward, post-process compression is enabled by default for newly created containers.
The DSF uses LZ4 and LZ4HC algorithms for data compression in AOS 5.0 and beyond. On
initial ingest, regular data is compressed using LZ4, which provides a very good balance between
compression and performance. As data cools, LZ4HC further compresses it to improve the
compression ratio.
Cold data falls into two main categories:
• Regular data: No read or write access for three days.
• Immutable data (snapshots): No read or write access for one day.
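
The following Python sketch summarizes this tiered policy as simple pseudologic. The size and age thresholds come from the text above; the compressor functions are stand-ins rather than the actual LZ4 and LZ4HC codecs.

INLINE_THRESHOLD = 64 * 1024    # sequential or large writes are compressed inline
OPLOG_THRESHOLD = 4 * 1024      # oplog compresses incoming writes over 4 KB
REGULAR_COLD_DAYS = 3           # regular data: no read or write access for three days
IMMUTABLE_COLD_DAYS = 1         # snapshots: no read or write access for one day

def compress_lz4(data):         # stand-in for the fast LZ4 codec
    return data

def compress_lz4hc(data):       # stand-in for the higher-ratio LZ4HC codec
    return data

def on_write(data, sequential):
    if sequential or len(data) > INLINE_THRESHOLD:
        return compress_lz4(data)    # inline, on the way to the extent store
    if len(data) > OPLOG_THRESHOLD:
        return compress_lz4(data)    # compressed inline in the oplog
    return data                      # left for post-process (Curator) compression

def curator_recompress(data, age_days, is_snapshot):
    # As data cools, Curator recompresses it with LZ4HC for a better ratio.
    threshold = IMMUTABLE_COLD_DAYS if is_snapshot else REGULAR_COLD_DAYS
    return compress_lz4hc(data) if age_days >= threshold else data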
The following figure shows how inline compression interacts with the DSF write I/O path.
Tip: Almost always use inline compression (compression delay=0) because this
setting only compresses larger or sequential writes and does not impact random write
performance. In fact, inline compression typically increases effective performance by
increasing the usable size of the SSD tier. In addition, when the system replicates
larger or sequential data for protection, it can send compressed data, further
increasing performance because less data crosses the wire. Inline compression also
pairs perfectly with erasure coding.
When you enable post-process compression, all new write I/O follows the normal DSF I/O
path without compression. After the data meets the compression delay setting (which you can
configure in Prism), it is eligible for compression in the extent store. Post-process compression
tasks use the Curator MapReduce framework to distribute work across all nodes. Automatic
resource adjustments between front-end or user-driven and back-end operations ensure that
post-process compression tasks do not limit the performance of user applications.
The following figure shows how post-process compression interacts with the DSF write I/O path.
For read I/O operations on compressed data, the system first decompresses the data in memory,
then serves the I/O. Heavily accessed data is decompressed in the extent store and then moves
to the cache.
The following figure shows how decompression interacts with the DSF I/O path during read
operations.
To view current compression rates in Prism, hover over the Compression setting on the Storage
Container Details page.
Erasure Coding
The Nutanix platform relies on a replication factor for data protection and availability. This method
provides the highest degree of availability because it does not require data recomputation on
failure or reading from more than one storage location. However, because this type of data
protection requires full copies, it uses additional storage resources.
To provide a balance between availability and storage capacity consumption, the DSF offers the
ability to encode data using erasure codes (EC-X). Similar to the concept of RAID (levels 4, 5, 6,
and so on), EC-X encodes a strip of data blocks on different nodes and calculates parity. In the
event of a host or disk failure, the system can use the parity to decode any missing data blocks.
In the DSF, the data block is an extent group, and each data block must be on a different node
and belong to a different vDisk.
You can configure the number of data and parity blocks in a strip based on how many failures
you need to tolerate. The configuration is expressed as <number of data blocks>/<number of
parity blocks>. For example, "replication factor 2–like" availability (n + 1) could consist of three
or four data blocks and one parity block in a strip (in other words, 3:1 or 4:1). "Replication factor
3–like" availability (n + 2) could consist of three or four data blocks and two parity blocks in a
strip (in other words, 3:2 or 4:2).
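
To illustrate the principle, the following Python sketch implements the simplest possible strip encoding: single parity (n + 1), computed as the bytewise XOR of the data blocks. EC-X itself is more sophisticated, and n + 2 protection requires a second, independently computed parity, but the recovery idea is the same: any one missing block can be rebuilt from the survivors.

from functools import reduce

def encode_parity(data_blocks):
    # Parity is the bytewise XOR of all data blocks in the strip.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_blocks))

def reconstruct(surviving_blocks, parity):
    # XOR of the parity with the surviving blocks recovers the missing block.
    return encode_parity(surviving_blocks + [parity])

strip = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # a 4:1 strip (n + 1)
parity = encode_parity(strip)

lost = strip.pop(1)                    # simulate losing one data block
recovered = reconstruct(strip, parity)
assert recovered == lost               # b"BBBB"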
You can calculate maximum usable capacity as <number of data blocks> / <number of total
blocks>. For example, a 4:1 strip has 80 percent usable capacity, and a 4:2 strip has 66 percent
usable capacity. As the total strip size increases, the resulting usable capacity benefits diminish.
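
The arithmetic behind these figures is straightforward, as this short snippet shows:

def usable_fraction(data_blocks, parity_blocks):
    # usable capacity = data blocks / total blocks in the strip
    return data_blocks / (data_blocks + parity_blocks)

print(usable_fraction(4, 1))  # 0.8  -> a 4:1 strip yields 80% usable capacity
print(usable_fraction(4, 2))  # 0.67 -> a 4:2 strip yields ~66% usable capacity
print(usable_fraction(2, 1))  # 0.67 -> ~53 TB of 80 TB raw, matching the table below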
The following table characterizes usable capacity using different replication factors and encoded
strip sizes.
Cluster Size (Nodes) | RAW    | Replication Factor 2 | EC-X 2:1 | EC-X 3:1 | EC-X 4:1 | Replication Factor 3 | EC-X 3:2 | EC-X 4:2
4                    | 80 TB  | 40 TB                | 53 TB    | N/A      | N/A      | 27 TB                | N/A      | N/A
5                    | 100 TB | 50 TB                | 67 TB    | 75 TB    | N/A      | 33 TB                | N/A      | N/A
6                    | 120 TB | 60 TB                | 80 TB    | 90 TB    | 96 TB    | 40 TB                | 72 TB    | N/A
7                    | 140 TB | 70 TB                | 93 TB    | 105 TB   | 112 TB   | 47 TB                | 84 TB    | 93 TB

The Replication Factor 2 and EC-X 2:1, 3:1, and 4:1 columns provide "replication factor 2–like"
availability; the Replication Factor 3 and EC-X 3:2 and 4:2 columns provide "replication factor
3–like" availability.
Tip: Nutanix recommends that you always maintain a cluster size that is at least
one node greater than the combined strip size (data + parity) to allow space for
rebuilding the strips if a node fails. This sizing eliminates any computation overhead
on reads once the strips have been rebuilt (a process that the Nutanix Curator
service automates). For example, a cluster with a 4:1 strip should have at least six
nodes. The table above follows this best practice.
The Nutanix platform invokes EC-X post-process on write-cold data, using the Curator
MapReduce framework for task distribution. Because MapReduce is a post-process framework, it
does not affect the traditional write I/O path, and automatic resource adjustments between front-
end or user-driven and back-end operations ensure that EC-X tasks do not limit the performance
of user applications.
Administrators can change the replication factor for containers with EC-X enabled. This option
gives customers greater flexibility to achieve their desired level of data protection during the
application life cycle. Containers can transition efficiently between replication factor
2 and replication factor 3, enabling the system to create or remove a resulting parity block
automatically without additional read-modify-write operations.
As of AOS 5.1, the Nutanix platform automatically increases or decreases the size of existing
erasure-coded strips during node additions and removals. This automation ensures that
protection overhead remains limited after node removals and that capacity utilization continues
to be optimized as cluster size increases. For example, 53 TB usable capacity across four nodes
using 2:1 strips increases to 64 TB when the system dynamically increases the strip size to 4:1
after meeting the six-node minimum requirement.
Tip: You can override the default strip size (4:1 for “replication factor 2–like” or 4:2
for “replication factor 3–like”) using the nCLI, where N is the number of data blocks
and K is the number of parity blocks.
ctr [create/edit] … erasure-code=<N>/<K>
The following figure illustrates the data layout of a normal environment using a replication factor.
In this scenario, we have a mix of both replication factor 2 and replication factor 3 data whose
primary copies are local and whose replicas are distributed to other nodes throughout the cluster.
When Curator runs a full scan, it finds eligible write-cold extent groups (data not written or
overwritten for seven days) that are available for encoding. After Curator finds the eligible
candidates, Chronos distributes and throttles the encoding tasks.
The following figure shows an example 4:1 and 3:2 strip.
Once the system has created the strips and calculated parity to encode data, it removes the
replica extent groups.
The following figure shows the environment and storage savings after EC-X is complete.
To view current EC-X savings via Prism, hover over the Erasure Coding setting on the Storage
Container Details page.
Tip: EC-X pairs perfectly with inline compression; you can safely enable them
together for maximum efficiency.
Deduplication
Fingerprinting occurs during ingest of data with an I/O size of 64 KB or greater (either initial I/O or
when draining from the oplog). The Elastic Deduplication Engine uses Intel acceleration for the SHA-1 computation to
reduce CPU overhead. In scenarios where fingerprinting does not occur at ingest (for example,
with smaller I/O sizes), it is completed as a background process. As the Curator MapReduce
framework identifies duplicate data (multiple copies of the same fingerprints), it removes the
duplicates. Automatic resource adjustments between front-end or user-driven and back-end
operations ensure that deduplication tasks do not limit the performance of user applications.
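
The following Python sketch shows the core idea in miniature (illustrative only, not the Elastic Deduplication Engine itself): identical data produces identical SHA-1 fingerprints, so the system keeps a single physical copy and tracks logical references to it.

import hashlib

store = {}        # fingerprint -> single stored copy of the data
refcounts = {}    # fingerprint -> number of logical references to that copy

def write(data):
    # In the DSF, fingerprinting happens at ingest for I/O of 64 KB or greater
    # and as a background process otherwise; here we always fingerprint inline.
    fp = hashlib.sha1(data).hexdigest()
    if fp in store:
        refcounts[fp] += 1    # duplicate: add a reference, store nothing new
    else:
        store[fp] = data      # first copy: store the data exactly once
        refcounts[fp] = 1
    return fp

first = write(b"x" * 65536)
second = write(b"x" * 65536)  # identical payload deduplicates to a single copy
assert first == second and refcounts[first] == 2 and len(store) == 1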
Cache deduplication increases storage efficiency and can also improve performance by enabling
larger effective cache sizes. With deduplication enabled, initial read requests enter the DSF
unified cache at a 4 KB granularity. Any subsequent requests for data with the same fingerprint
pull directly from the cache.
The following figure illustrates how the Elastic Deduplication Engine interacts with the DSF I/O
path.
To view the current deduplication savings via Prism, hover over the Capacity and Cache
Deduplication setting on the Storage Container Details page.
5. Conclusion
The Nutanix Enterprise Cloud incorporates an array of powerful data-efficiency techniques that
are agile and scalable. Thin provisioning and intelligent cloning, which are native data-avoidance
technologies, provide significant space savings and very fast snapshots and replicas. Data-
reduction features such as deduplication and compression generate dramatic performance
improvements while also increasing the cluster’s effective storage capacity—all with extremely
low computational overhead. The Nutanix erasure coding (EC-X) algorithm delivers predictable
storage efficiency across all data types and workloads, with no impact on performance. The
Nutanix data-efficiency features are intelligent, adaptive, and above all simple to use, extending
the Nutanix commitment to ushering in the era of invisible infrastructure.
For more information about data efficiency with Nutanix or to review other Nutanix technical
documents, please visit the Nutanix website.
Appendix
About Nutanix
Nutanix makes infrastructure invisible, elevating IT to focus on the applications and services that
power their business. The Nutanix Enterprise Cloud OS leverages web-scale engineering and
consumer-grade design to natively converge compute, virtualization, and storage into a resilient,
software-defined solution with rich machine intelligence. The result is predictable performance,
cloud-like infrastructure consumption, robust security, and seamless application mobility for a
broad range of enterprise applications. Learn more at www.nutanix.com or follow us on Twitter
@nutanix.