The Bathtub Curve and Data Center Equipment Reliability
datacenterfrontier.com/bathtub-curve-data-center-equipment/
Voices of the Industry
When it’s a question of spending tens of thousands of dollars on a refresh, you should
evaluate your needs and assess the facts to make the right decision for your
environment. (Photo: Service Express)
Jake Blough, Chief Technology Officer for Service Express, explores the Bathtub Curve
theory, its limitations, and data center equipment reliability and maintenance.
When digging into reliability engineering theories, you will quickly find the widely
used Bathtub Curve. According to this theory, when a product is new to the market,
there are substantial rates of early failures, which commonly result from errors in
handling or installation. These early failures taper off into a long middle period of
low, roughly constant failure rates. As the end of product life approaches, the rate
increases again with a second and final wave of wear-out failures. Although the
Bathtub Curve, pictured below, accurately reflects the failure behavior of many
products, we have found it does not universally apply to data center equipment.
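As a rough illustration of the theory itself (not of the article's data), the bathtub-shaped hazard rate is often modeled as the sum of a decreasing Weibull hazard (infant mortality), a constant rate (useful life) and an increasing Weibull hazard (wear-out). All parameters below are hypothetical:

```python
# Illustrative bathtub-shaped hazard rate. Parameters are made up for
# the sketch; they do not come from Service Express data.

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    infant = weibull_hazard(t, shape=0.5, scale=2.0)    # falls as t grows
    constant = 0.02                                     # flat useful-life rate
    wearout = weibull_hazard(t, shape=4.0, scale=12.0)  # rises as t grows
    return infant + constant + wearout

for years in (0.25, 1, 5, 10, 14):
    print(f"year {years:>5}: hazard = {bathtub_hazard(years):.3f}")
```

With a shape parameter below 1 the first Weibull term is front-loaded and a shape above 1 is back-loaded, which is exactly the high-low-high profile the curve's name describes.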
Examining Reliability Data
At Service Express, we’ve collected more than 15 years of equipment data from over
half a million devices. The data tracks when equipment breaks, how it breaks and how
often it breaks. The common assumption is that these devices should have a higher
failure rate in their infancy and then again toward end of life. However, looking at
non-critical and critical server and storage failures, our data shows that equipment
failure rates do not follow the Bathtub Curve as expected.
Critical Server Failures
A critical failure occurs when something like a CPU or system board fails. Critical
server failures result in the loss of access to applications or data, impacting
business productivity. In the graph below, you will see that most machines exhibit a
failure rate between 0% and 0.2%, with one outlier at 0.3% due to an early production
issue. These rates stay almost identical over a 10-15-year life span.
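To put those rates in perspective, here is a quick survival sketch. It assumes the percentages are monthly rates, an assumption, though the storage discussion later in the article quotes failures "per month":

```python
# With a steady monthly critical-failure rate r, the probability that a
# given server goes n months without a critical failure is (1 - r)**n.
# Reading the article's percentages as monthly rates is an assumption.

def p_no_failure(monthly_rate, months):
    return (1 - monthly_rate) ** months

for rate in (0.001, 0.002):
    for years in (10, 15):
        p = p_no_failure(rate, years * 12)
        print(f"{rate:.1%}/month over {years} years: "
              f"{p:.1%} of servers see no critical failure")
```

Under that reading, even at the top of the 0-0.2% band, most servers would run a full decade without a single critical failure.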
Non-Critical Server Failures
A non-critical failure occurs when a component like a disk drive or power supply fails.
Modern data center equipment has built-in redundancy for these components, so no loss
of data or access occurs in these instances. In the graph below, you will see a data
set tracking non-critical server failures across several models over 13 years.
You can see that non-critical failures barely increase over time, with a failure rate
of less than 0.5%; this is consistent with the number of components installed in the
system. The more components in a system, the more chances for a part to fail. The
slight increase in failures toward end of life seen here is attributable to component
count rather than to the wear-out factor associated with the Bathtub Curve. Systems in
a blade form factor show far fewer non-critical incidents than large 4U, four-CPU
systems.
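The component-count explanation can be made concrete: if each of n components fails independently with some small probability p in a given period, the chance that at least one fails grows with n. The per-component probability and the component counts below are hypothetical stand-ins for blade versus large 4U systems:

```python
# More components means more chances for some part to fail, even when
# each individual part is equally reliable. Numbers are hypothetical.

def p_any_component_fails(p, n):
    """Probability at least one of n independent components fails."""
    return 1 - (1 - p) ** n

p = 0.001  # assumed per-component failure probability per period
for n in (8, 32, 128):  # e.g. small blade vs. large 4U parts counts
    print(f"{n:>3} components: {p_any_component_fails(p, n):.2%}")
```

This is the same observation the data makes: the larger chassis fails more often simply because it contains more parts, not because any part is wearing out faster.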
Critical & Non-Critical Storage Failures
Storage devices comprise three types of components: critical, non-critical and disk
drives. Critical parts typically include storage processors, whereas non-critical
parts include cache batteries, power supplies and fans.
Storage systems are built to be incredibly resilient, tolerating multiple failures
before data is impacted. We consider storage processors the most critical components,
as the loss of one affects overall performance. In the graph above, you can see
critical, non-critical and drive failures for a popular OEM storage system. Note that
over five years, critical storage failures occur at between 0.1% and 0.2%, resulting
in about one failure out of 1,000 systems per month. Non-critical faults are typically
caused by cache battery sets, which must be replaced every 3-5 years.
Disk Drive Failures
The graph above represents data for all disk drive failures over six years. You can
see that disk drives experience a failure rate between 0.2% and 0.3%. This means that,
over time, disk drives are far more resilient than “common knowledge” would have you
believe.
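Read as monthly rates, which is an assumption, though it is consistent with the "per month" phrasing in the storage section, those drive figures annualize like this:

```python
# Converting an assumed monthly failure rate to an annualized failure
# rate (AFR): a drive survives the year only if it survives all 12
# months, so AFR = 1 - (1 - monthly_rate)**12.

def annualized_failure_rate(monthly_rate):
    return 1 - (1 - monthly_rate) ** 12

for r in (0.002, 0.003):
    print(f"{r:.1%}/month -> AFR {annualized_failure_rate(r):.2%}")
```

That works out to an AFR in the low single digits, roughly 2.4% to 3.5% under this reading.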
The long-term equipment reliability illustrated by the data is good news for IT
departments. This failure data counters the traditional recommendation for a hardware
refresh based on the expectation of increased failures as equipment ages. You can
factor longer equipment reliability, and the cost savings it enables, into the timing
of your refresh.
Your Next Data Center Refresh
Of course, there are valid reasons for taking on the cost and
time of a hardware refresh. Primary factors that should
determine when a hardware upgrade is needed include:
Software compatibility
Hardware compatibility between devices
Exceeded performance capacity
If your equipment is meeting your immediate needs,
consider delaying your refresh instead. Delaying an unneeded refresh can help you
reduce your CapEx spend and improve the value of your original investment.
When it’s a question of spending tens of thousands of dollars on a refresh, you should
evaluate your needs and assess the facts to make the right decision for your
environment. Based on our reliability data, which shows stable failure rates over time
for server and storage equipment, we recommend a refresh every 7-10 years. Your
refresh cycle should always be driven by compatibility, capacity and reliability.
Jake Blough is the Chief Technology Officer for Service Express.