Build A High-Performance Object Storage-as-a-Service Platform With Minio
What You’ll Find in This Solution
Reference Architecture: This solution provides a starting point for developing a
storage-as-a-service (STaaS) platform based on Minio*.
If you are responsible for:
• Investment decisions and business strategy: You’ll learn how Minio-based STaaS
can help solve the pressing storage challenges facing cloud service providers
(CSPs) today.
• Figuring out how to implement STaaS and Minio: You’ll learn about the
architecture components and how they work together to create a cohesive
business solution.

Executive Summary
Emerging cloud service providers (CSPs) have an opportunity to build or expand
their storage-as-a-service (STaaS) capabilities and tap into one of today’s fastest-
growing markets. However, CSPs who serve this market face a substantial
challenge: how to cost-effectively store an exponentially growing amount of
data while exposing the data as a service with high performance, scalability
and security.

File and block protocols are complex, have legacy architectures that hold back
storage innovation, and are limited in their ability to scale. Object storage, which
was born in the cloud, solves these issues with a reduced set of storage APIs that
are accessed over HTTP RESTful services. Hyperscalers examined many options
when building the foundation of their cloud storage infrastructure, and they all
adopted object storage as their primary storage service.

In this paper, we take a deeper look into an Amazon S3*-compatible object storage
service architecture—a STaaS platform based on Minio* Object Storage and
optimized for Intel® technology. An object storage solution should handle a broad
spectrum of use cases including archival, application data, big data and machine
learning. Unlike other object storage solutions that are built for archival-only use
cases, the Minio platform is built for CSPs to deliver a high-performance cloud
storage alternative to the hyperscalers. The Minio platform offers several benefits:
• Hyperscale architecture that enables multi-data center expansion through
federation
• High-performance object store to serve the most demanding cloud-native
workloads
• Ease of use with non-disruptive upgrades, no tuning knobs and simple support
requirements
• High availability so objects continue to be available despite multiple disk and
node failures
• Security-enabled storage by encrypting each object with a unique key
Minio provides a compelling STaaS object storage platform when combined with
Intel’s broad selection of products and technologies, such as Intel® Solid State
Drive Data Center Family for NVM Express* (NVMe*), Intel® Ethernet products and
Intel® Xeon® Scalable processors, augmented by Intel® Advanced Vector Extensions
512 (Intel® AVX-512) single instruction multiple data (SIMD) instructions for x86
architecture. A collaborative open source developer community is also available
for Minio.

Using the information and reference architecture (see Figure 1) presented here,
you can take the next step toward unleashing the power of STaaS.

Table of Contents
Executive Summary
High-Performance Object Storage
Solution Architecture
Minio* Overview and Benefits
Minio Object Storage Architecture
Linear Scaling
Erasure Code
Minio — The Perfect Fit for Open Source STaaS
Better Together: Intel® Technology Accelerates Minio Performance
Latest Intel Technology Drives Performance Advantage for Minio
Read and Write Performance Testing
Linear Scalability Testing
Architecture Design Considerations
General Guidelines
Recommended Configuration
Obtaining Support and Fixing Bugs
Summary
Appendix A - System Tuning Details
Network and Kernel Tuning Parameters in /etc/sysctl.conf
Setting IRQ Affinity
Solutions Proven by Your Peers
Learn More

[Figure: client nodes connect over the network to Minio* server nodes, each
populated with NVMe* (Non-Volatile Memory Express) drives.]
Figure 1. Our tests were run on an eight-node Minio* cluster.

High-Performance Object Storage
STaaS is the second-fastest growing cloud workload worldwide, representing a
USD 4.8 billion annual market.[1] And yet a mere handful of CSPs control the
majority of that market. With data growing exponentially every year—by 2025,
experts predict that the world will create and replicate 163 zettabytes (ZB) of
data[2]—there is tremendous potential for emerging CSPs to benefit from the
30-percent annual growth of STaaS.[3]
Fueling the growth is an increasing focus on big data applications, Internet of
Things (IoT) and artificial intelligence (AI) workloads. Object storage is the primary
medium for storing big data because it is designed to provide high rates of
throughput, offers excellent data integrity, and has a cost-effective deployment
model. Although file and block storage solutions are being shoehorned into use
with big data workloads, these solutions are limited in their ability to provide what
is required. They were designed for enterprise applications like databases and
file shares. High-performance object storage has a completely different design
goal, which is to provide the extreme rates of throughput required by big data
workloads, along with namespaces that span data centers. Neither file nor block
storage solutions can perform as fast as, nor scale as large as, object storage.
Minio is an object storage solution that delivers performance and scalability
without the compromises of file and block storage. By following the methods and
design philosophy of the hyperscale computing providers, Minio provides both
high performance and massive scalability. With Minio, which is optimized for
Intel® architecture, you can build a
fast, reliable STaaS platform with the scalability and flexibility you need to thrive
in a data-centric world.
Solution Architecture
Minio consists of a server, an optional client, and an optional software development kit (SDK):
• Minio Server. Minio is a distributed object storage server released under Apache* License v2.0. It is
compatible with the Amazon S3 API. Minio is feature-complete, providing enterprise-grade encryption,
identity management, access control and data protection capabilities, including erasure code and bitrot
protection.
• Minio Client. Called mc, the Minio Client is a modern and cloud-native alternative to the familiar
UNIX* commands like ls, cat, cp, mirror, diff, find and mv. The client provides advanced
functionality suitable for web-scale object storage deployments, such as powerful object-mirroring
tools that synchronize objects between multiple sites and tools for generating shared, time-bound
links to objects.
• Minio SDKs. The Minio Client SDKs provide simple APIs to access any Amazon S3-compatible object
storage (see the sketch after this list). Minio repositories on GitHub offer SDKs for popular
development languages such as Golang*, JavaScript*, .Net*, Python* and Java*.
• Metadata architecture. Minio has no separate metadata store. All operations are performed atomically
at object-level granularity. This approach isolates any failures, containing them within an object, and
prevents spillover to larger system failures. Each object is strongly protected with erasure code and
bitrot hash. You can crash a cluster in the middle of a busy workload and still not lose any data. Another
advantage of this design is strict consistency, which is important for distributed machine-learning and
big-data workloads.
• Geographic namespace. Multi-data center scaling is no longer limited to hyperscalers. Minio enables
enterprises to adopt a scaling model that starts small and keeps expanding a cluster across racks and
data centers around the globe, using Minio’s federation feature. Minio is deployed in units of scale
where the size is limited by the failure domain.
• Cloud-native design. The multi-instance, multi-tenant design of Minio enables Kubernetes*-like
orchestration platforms to seamlessly manage storage resources just like compute resources. Each
instance of Minio is provisioned on demand through self-service registration. Traditional storage
systems are monolithic and compete with Kubernetes resource management. Minio is lightweight and
container-friendly so you can pack many tenants simultaneously on the same shared infrastructure.
• Lambda* function support. Minio supports Amazon-compatible Lambda event notifications, which
enables applications to be notified of individual object actions such as access, creation and deletion.
The events can be delivered using industry-standard messaging platforms like Kafka*, NATS*, AMQP*,
MQTT*, Webhooks* or a database such as Elasticsearch*, Redis, Postgres* and MySQL*.
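To illustrate the SDK path, the following is a minimal sketch that creates a bucket and uploads one
object through the Go SDK (minio-go v7). The endpoint, credentials, bucket name and file path are
placeholders, not part of the tested reference configuration:

package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Connect to any S3-compatible endpoint; the values here are placeholders.
	client, err := minio.New("minio.example.net:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS-KEY", "SECRET-KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatalln(err)
	}

	ctx := context.Background()

	// Create the bucket, tolerating the case where it already exists.
	if err := client.MakeBucket(ctx, "reports", minio.MakeBucketOptions{}); err != nil {
		exists, errCheck := client.BucketExists(ctx, "reports")
		if errCheck != nil || !exists {
			log.Fatalln(err)
		}
	}

	// Upload a local file as an object; the server erasure-codes it on write.
	info, err := client.FPutObject(ctx, "reports", "q1.csv", "/tmp/q1.csv",
		minio.PutObjectOptions{ContentType: "text/csv"})
	if err != nil {
		log.Fatalln(err)
	}
	log.Printf("uploaded %s (%d bytes)", info.Key, info.Size)
}

Because the API is S3-compatible, the same calls work unchanged against Amazon S3, which is what
makes Minio a drop-in STaaS target for existing S3 applications.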
Linear Scaling
The designers of the Minio platform believe that building large-scale architectures is best done by
combining simple building blocks. They learned this approach by studying the hyperscalers and their
methods of scaling. A single Minio cluster can be deployed with anywhere from four to 32 nodes and
Minio federation allows many clusters to be joined into a single global namespace (see Figure 2).
There are multiple benefits to Minio’s cluster and federation architecture:
• Each node is an equal member of a Minio cluster. There is no master node.
• Each node can serve requests for any object in the cluster, even concurrently.
• Each cluster uses a Distributed Locking Manager (DLM) to manage updates and deletes to objects.
• The performance of an individual cluster remains constant as you add more clusters to the federation.
• Failure domains are kept within the cluster. An issue with one cluster does not affect the entire
federation.
When deploying a cluster, it is recommended that you use a programmable domain name service (DNS),
such as coreDNS*, to route HTTP(S) requests to the appropriate cluster. Also, use a load balancer to
balance the load across the servers in a cluster. Global configuration parameters can be stored and
managed in etcd (an open-source distributed key-value store).
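As a concrete illustration of the unit-of-scale model, every node of a distributed cluster runs the
same command, and Minio's built-in {n...m} ellipsis expansion enumerates the hosts and drives. This
is a hedged sketch; the hostnames, mount points and credentials are placeholders:

# Run identically on each of the eight nodes; Minio forms the cluster itself.
export MINIO_ACCESS_KEY=<access-key>
export MINIO_SECRET_KEY=<secret-key>
minio server http://minio{1...8}.example.net/mnt/nvme{1...24}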
Figure 2. Minio* federation provides heterogeneous scalability and a planet-scale namespace; each
failure domain is limited to a maximum of 32 servers.
Erasure Code
Minio protects the integrity of object data with erasure coding and bitrot protection checksums (see
Figure 3). Erasure code is a mathematical algorithm used to reconstruct missing or corrupted data. By
applying a Reed-Solomon code to shard objects into data and parity blocks, and hashing algorithms to
help protect individual shards, Minio is able to guard against hardware failures and silent data corruption.
Erasure code helps protect data without the high storage overhead of using RAID configurations or data
replicas. For example, RAID-6 helps protect only against a two-drive failure whereas erasure code allows
Minio to continue to serve data even with the loss of up to 50 percent of the drives and 50 percent of
the servers. Minio applies erasure code to individual objects, which allows the healing of one object at a
time. For RAID-protected storage solutions, healing is done at the RAID volume level, which impacts the
performance of every file stored on the volume until the healing is completed.
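As a concrete illustration, assume a 16-drive erasure set with the default N/2 parity: each object is
sharded into 8 data blocks and 8 parity blocks (a 2x storage overhead), and the object remains
readable as long as any 8 of its 16 blocks survive, whichever drives or servers they sit on.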
[Figure 3: objects are sharded into data and parity (P) blocks distributed across drives and servers.]
Minio — The Perfect Fit for Open Source STaaS
The Minio engineering team thoroughly reviews contributions from the open source community before deciding whether to commit each change
into the master branch. The large size of the Minio community also helps the Minio engineering team to
quickly detect source code errors and potential intrusion vulnerabilities.
Better Together: Intel® Technology Accelerates Minio Performance
• Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) software optimizes core
storage functions, and the Storage Performance Development Kit (SPDK) is used to optimize
application I/O. Additional software: Intel® Cache Acceleration Software (Intel® CAS),
Intel® Memory Drive Technology and Intel® Virtual RAID on CPU (Intel® VROC).
• Intel® networking products offer remote direct memory access (RDMA) for data
access across multiple systems at speeds very close to local access.
• Processors are optimized for data throughput, with increases in memory, I/O and
instructions applicable to storage environments.
Figure 4. From hardware to software, Intel offers many technologies that can benefit storage-as-a-
service (STaaS) solutions.
Read and Write Performance Testing
The tests used object sizes of 10 MB, 20 MB, 32 MB and 64 MB.
The read performance of the eight-node Minio cluster shows a sustained level of approximately
20 GB/s for all object sizes (see Figure 5). This performance is approximately half of the aggregated
bandwidth of all eight servers.[4] Based on these results, it is clear that Minio is able to fully utilize all
available bi-directional bandwidth on the single 40 GbE NIC port in the servers.
[Chart: read throughput in GB/s (0 to 20) versus object size: 10 MB, 20 MB, 32 MB and 64 MB.]
Figure 5. The Minio* cluster achieved almost 20 GB/s, regardless of object size (10 MB, 20 MB, 32 MB
and 64 MB).
During object writes, the Minio server calculates erasure code parity blocks and bitrot hashes, then
distributes this data across multiple disks and across multiple servers. Because parity at the default
N/2 level roughly doubles the bytes written for each object, write throughput is expected to be about
half of read throughput. When the COSBench* test was run with large object sizes of 32 MB and 64 MB,
the write performance approached 10 GB/s (see Figure 6). With smaller object sizes, the performance
achieved was 5 GB/s (10 MB objects) and 7 GB/s (20 MB objects).
[Figure 6: write throughput in GB/s (0 to 10) versus object size: 10 MB, 20 MB, 32 MB and 64 MB.]
Linear Scalability Testing
These test results demonstrate that Minio clusters increase read and write performance linearly as
the number of servers in the Minio cluster is increased (see Figure 7). One reason for this linear scaling
capability is a design that stores metadata with the object itself instead of using a metadata database
server. Minio was purposely designed to avoid the bottleneck that a central metadata database can
create. As a result, there is no need to consider the performance, or lack of performance, of a central
metadata database when scaling Minio object storage.
Scalability beyond a single cluster is achieved by federating multiple clusters together to create a global
namespace. Expansion of the federated namespace is possible simply by adding another Minio cluster,
whether the servers are located in the same data center, or in another data center across the globe.
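As an illustration of federated expansion, a hedged sketch follows. The variable names assume
Minio's etcd-backed federation feature of this era; the etcd endpoints, domain and hosts are
placeholders:

# Run on every node of each member cluster so all clusters share one namespace.
export MINIO_ETCD_ENDPOINTS=http://etcd1.example.net:2379,http://etcd2.example.net:2379
export MINIO_DOMAIN=objects.example.net
export MINIO_PUBLIC_IPS=<this cluster's server IPs>
minio server http://minio{1...8}.example.net/mnt/nvme{1...24}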
[Chart: read and write throughput in GB/s (0 to 20) versus cluster size: 2, 4, 6, 8 and 10 nodes.]
Figure 7. Minio* read and write performance increases linearly as the cluster size increases.
Architecture Design Considerations
General Guidelines
A high-performance Minio object storage cluster is achieved only when the underlying components are
capable of providing high performance. Using Intel Xeon processors, Intel SSD Data Center Family for
NVMe and high-speed Intel® networking products makes delivering high performance possible. Before
describing the specific reference architecture for Minio, here are several concepts common to Minio
deployments:
• Server architecture. The foundation of a Minio cluster starts with a high-performance server
architecture. Intel Xeon processor-based platforms combined with a high-bandwidth PCIe* bus
provide a strong foundation.
• Memory. 128 to 192 GB of RAM per server provides sufficient memory for Minio.
• Disk. Minio servers can take advantage of as many disk drives as can be deployed in a server.
The choice of disk type depends on the type of Minio cluster you want to create. For the
highest-throughput Minio cluster, choose NVMe-based SSDs. In most cases, SSDs provide the best
balance of performance and cost compared to lower-performing hard disk drives (HDDs).
• Network. The Minio servers communicate between servers and clients using Ethernet. For this
reference architecture we used a single 40 GbE Intel Ethernet Network Adapter per server. However,
Minio can take advantage of multiple NICs and network speeds up to 100 GbE. Some deployments
use a separate NIC for internode communication. Note that the top-of-rack switch should be sized
appropriately to support the maximum throughput speeds that you expect to achieve.
• Failure domain. Minio clusters can include up to 32 servers. This means it is possible to build a large
and dense Minio cluster with tens of petabytes (PBs) of object storage in a single rack. However, we
recommend starting with a cluster that keeps the failure domain small, and then scale by federating
clusters into a global namespace. A moderately sized cluster is in the range of 8 to 16 nodes.
• Server chassis. Special care should be taken to ensure that the server chassis layout is balanced and
sufficient PCIe* bandwidth is allocated for the devices.
• Hardware support and personnel costs. Minio uses erasure code parity calculations to provide
data durability. The number of drives used for parity writes for each object can be as high as
N/2 (see the sketch after this list). At this parity level you can lose up to 50 percent of the drives
and continue to serve data. Therefore, you can balance hardware support and personnel costs by
changing your operational model and electing to ignore individual disk failures. This approach can
reduce your operational costs and avoid human error when replacing failed disks.
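For illustration, parity can be tuned per storage class rather than left at the N/2 default. This is a
hedged sketch assuming Minio's storage-class environment variable:

# Reduce parity to 4 blocks per object for the standard storage class.
export MINIO_STORAGE_CLASS_STANDARD=EC:4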
Although we did not test them, there are several ways that performance could be increased:
• Add an additional dedicated 40 GbE NIC for the Minio internode communications.
• Keep the existing network configuration but add additional 40 GbE NIC ports to a multi-chassis link
aggregation (MLAG) pair.
• Or, if 100 GbE is desired, up to 4x25 GbE NIC ports could be combined into a link aggregation group (LAG).
• Cache remote S3 objects for faster access using an Intel® Optane™ SSD. Once caching is configured,
a fetched remote object is cached locally, and each subsequent request for that object is served
directly from the Minio cache until it expires (see the sketch after this list).
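If you experiment with that caching option, a hedged sketch of the disk-cache settings follows. The
variable names assume Minio's disk-cache feature of this era; the mount point, expiry and remote
endpoint are placeholders:

# Serve repeat reads of remote S3 objects from a local Intel® Optane™ SSD cache.
export MINIO_CACHE_DRIVES="/mnt/optane"
export MINIO_CACHE_EXPIRY=90
minio gateway s3 https://remote-s3.example.net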
Recommended Configuration
This performance test used an eight-node Minio cluster (see Figure 1). Although we used eight servers,
Minio distributed clusters can be deployed on anywhere from four to 32 servers. The configuration
details for our eight-node cluster are provided in Table 3. See Appendix A for tuning details.
[Table 3: Component / Details]
Summary
STaaS offerings can help you expand your customer base and achieve economies of scale.
Software-defined storage (SDS) solutions such as Minio, running on Intel architecture and
technology, provide many benefits, including:
• Improved resource utilization and performance
• Shortened application time to market
• Efficiencies associated with automated deployment and operations
Minio is easy to use and fast enough to support transactional workloads such as
big data analytics, streaming workloads, AI and machine learning. The reference
architecture shown here will enable you to confidently offer a simple yet powerful
STaaS solution using Minio. As a result, you can successfully compete against much
larger CSPs and tap into a marketplace worth billions of dollars.
Appendix A - System Tuning Details
Network and Kernel Tuning Parameters in /etc/sysctl.conf
###---------------------------------------------------------###
### Settings to tune 40Gb NICs and system perf.
###---------------------------------------------------------###
kernel.pid_max=4194303
fs.file-max=4194303
vm.swappiness=1
vm.vfs_cache_pressure=10
net.core.rmem_max=268435456
net.core.wmem_max=268435456
net.core.rmem_default=67108864
net.core.wmem_default=67108864
net.core.netdev_budget=1200
net.core.optmem_max=134217728
net.ipv4.tcp_rmem=67108864 134217728 268435456
net.ipv4.tcp_wmem=67108864 134217728 268435456
net.ipv4.tcp_low_latency=1
net.ipv4.tcp_adv_win_scale=1
net.core.somaxconn=65535
net.core.netdev_max_backlog=250000
net.ipv4.tcp_max_syn_backlog=30000
net.ipv4.tcp_max_tw_buckets=2000000
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_fin_timeout=5
net.ipv4.udp_rmem_min=8192
net.ipv4.udp_wmem_min=8192
net.ipv4.conf.all.send_redirects=0
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.all.accept_source_route=0
net.ipv4.tcp_mtu_probing=1
vm.min_free_kbytes=1000000
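These settings take effect at boot. To apply them immediately on a running node, reload the file:

sysctl -p /etc/sysctl.conf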
Solutions Proven by Your Peers
An STaaS platform, powered by Intel technology, provides high performance and
easy manageability. This and other solutions are based on real-world experience
gathered from customers who have successfully tested, piloted, and/or deployed
the solutions in specific use cases. The solutions architects and technology experts
for this solution reference architecture include:
• Daniel Ferber, Solutions Architect, Intel Sales & Marketing Group
• Karl Vietmeier, Solutions Architect, Intel Sales & Marketing Group
Intel Solutions Architects are technology experts who work with the world’s largest
and most successful companies to design business solutions that solve pressing
business challenges. These solutions are based on real-world experience gathered
from customers who have successfully tested, piloted, and/or deployed these
solutions in specific business use cases.
Intel® technologies featured in this solution:
• Intel® Solid State Drives
• Intel® Converged Network Adapters
• Intel® Rack Scale Design
Learn More
Find the solution that is right for your organization. Contact your Intel
representative or visit intel.com/CSP.
[1] IDC, 2017 H1, “Worldwide Semiannual Public Cloud Services Tracker.”
https://www.idc.com/tracker/showproductinfo.jsp?prod_id=881
[2] Seagate, April 2017, “Data Age 2025.”
https://www.seagate.com/files/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf
[3] BusinessWire, 2016, “Global Storage as a Service Market 2016-2020 - Market to Grow at a CAGR of 29.59% - Research and Markets.”
https://www.businesswire.com/news/home/20160715005348/en/Global-Storage-Service-Market-2016-2020---Market
[4] Calculation: 8 servers × 40 Gbit/s × 50% = 160 Gbit/s = 20 GB/s.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation.
Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system
manufacturer or retailer or learn more at intel.com
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and
functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to
assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
For more complete information visit www.intel.com/benchmarks.
Configurations: See Table 3 for details
Performance results are based on Minio and Intel testing as of February 14th, 2019 and may not reflect all publicly available security updates.
See configuration disclosure for details. No product or component can be absolutely secure.
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that
are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations.
Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific
to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more
information regarding the specific instruction sets covered by this notice.
Notice Revision #20110804
Intel does not control or audit third-party data. You should review this content, consult other sources, and confirm whether referenced data
are accurate.
All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product
specifications and roadmaps.
Intel, the Intel logo, Xeon, and Optane are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
© Intel Corporation 0319/JS/CAT/PDF 338821-001EN