Reliability Culture
Cover
Title Page
Copyright Page
Series Editor’s Foreword by Dr. Andre Kleyner
Acknowledgements
Introduction
1 The Product Development Challenge
Key Players
Follow the Carrot or Get Out of the Race
It’s Not That I'm Lazy, It's That I Just Don't Care
Product‐specification Profiles
Product Drivers
Bounding Factors
Reliability Discipline
References
2 Balancing Business Goals and Reliability
Return on Investment
Program Accounting
Rule of 10s
Design for Reliability
Reliability Engineer's Responsibility to Connect to the Business
Case
Role of the Reliability Professional
Summary
References
3 Directed Product Development Culture
The Past, Present, and Future of Reliability Engineering
Reliability Is No Longer a Luxury
Understand the Intent
Levels of Awareness
Summary
References
4 Awakening
Stage 1
Stage 2
Stage 3
Stage 4
The Ownership Chart
Communicating Clearly
Summary
5 Goals and Intentions
Testing Intent
Transferring Ownership
What Transferred Ownership Looks Like
Guided by All the Goals All the Time
Summary
References
6 New Roles
Role of Change Agents
Reliability Czar
Role of Facilitators
Role of Reliability Professionals
Summary
7 Program Assessment
Measurements
Using Reliability Testing as Program Guidance
Reliability Maturity Assessments
Summary
References
8 Reliability Culture Tools
Advancing Culture
Reliability Bounding
Strategy Bounding
Bounding ROI
Anchoring
Focus Rotation
Working in Freedom and with Ownership
Summary
9 Guiding the Program in Motion
Guidance Bounding
Guidance Bounding ROI
Using Bounding
Program Risk Effects Analysis
Summary
10 Risk Analysis Guided Project Management
Failure Mode Effects Analysis Methodology
Design Failure Mode Effects Analysis
Reliability Design Risk Summary
Process Failure Mode Effects Analysis
Use Failure Mode Effects Analysis
Failure Reporting and Corrective Action System
Root Cause Analysis
Brainstorming
Summary
References
11 The Reliability Program
Reliability Program Plan
Common Reliability Program Plan Pitfalls
Major Elements of a Reliability Program Plan
Summary
12 Sustained Culture
Lasting Change
The Seven‐stage Process
Summary
Index
End User License Agreement
List of Illustrations
Chapter 1
Figure 1.1 Grandfather's tools vs my tools.
Figure 1.2 Reliability timeline.
Chapter 2
Figure 2.1 Reliability ROI.
Figure 2.2 The rule of 10s.
Figure 2.3 Traditional design process.
Figure 2.4 Improved design process.
Figure 2.5 Comparing design processes.
Chapter 4
Figure 4.1 Ownership chart.
Figure 4.2 Accountability notation.
Figure 4.3 Accountability chart.
Chapter 6
Figure 6.1 Reliability czar.
Figure 6.2 Shirt and phone ranked factors.
Chapter 7
Figure 7.1 Reliability bathtub curve.
Figure 7.2 Stress strain overlap.
Figure 7.3 Increasing margin.
Figure 7.4 Bathtub curve.
Figure 7.5 Real bathtub curve.
Figure 7.6 Critical bathtub curve elements.
Figure 7.7 Maturity matrix (part 1 of 2).
Figure 7.8 Maturity matrix (part 2 of 2).
Chapter 8
Figure 8.1 Bounding return and investment tables.
Figure 8.2 RG Bounding table.
Figure 8.3 Tools Bounding tables.
Chapter 9
Figure 9.1 PREA tables.
Figure 9.2 PREA balance equation.
Figure 9.3 Warranty table.
Figure 9.4 Sales table.
Figure 9.5 Features table.
Figure 9.6 Time to market table.
Figure 9.7 Wear‐out percentage.
Figure 9.8 Piezo tables.
Figure 9.9 Piezo balance equation.
Chapter 11
Figure 11.1 Uptime stack.
Figure 11.2 DFMEA table.
Figure 11.3 Allocation model structure.
Figure 11.4 Allocation model.
Figure 11.5 Test type by improve and measure ratio.
Figure 11.6 HALT testing stepped stress.
Figure 11.7 Stress margins.
Figure 11.8 Wear‐out distribution.
Figure 11.9 Tashiro chart.
Figure 11.10 Reliability growth plot.
Wiley Series in Quality & Reliability Engineering
Dr. Andre Kleyner
Series Editor
The Wiley Series in Quality & Reliability Engineering aims to provide a
solid educational foundation for both practitioners and researchers in the
Q&R field and to expand the reader’s knowledge base to include the latest
developments in this field. The series will provide a lasting and positive
contribution to the teaching and practice of engineering. The series
coverage will include, but is not limited to:
Statistical methods
Physics of failure
Reliability modeling
Functional safety
Six‐sigma methods
Lead‐free electronics
Warranty analysis/management
Risk and safety analysis
Adam P. Bahret
This edition first published 2021
© 2021 John Wiley & Sons Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from
this title is available at http://www.wiley.com/go/permissions.
The right of Adam P. Bahret to be identified as the author of this work has been asserted in
accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley
products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some
content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
In view of ongoing research, equipment modifications, changes in governmental regulations, and the
constant flow of information relating to the use of experimental reagents, equipment, and devices, the
reader is urged to review and evaluate the information provided in the package insert or instructions
for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the
instructions or indication of usage and for added warnings and precautions. While the publisher and
authors have used their best efforts in preparing this work, they make no representations or
warranties with respect to the accuracy or completeness of the contents of this work and specifically
disclaim all warranties, including without limitation any implied warranties of merchantability or
fitness for a particular purpose. No warranty may be created or extended by sales representatives,
written sales materials or promotional statements for this work. The fact that an organization,
website, or product is referred to in this work as a citation and/or potential source of further
information does not mean that the publisher and authors endorse the information or services the
organization, website, or product may provide or recommendations it may make. This work is sold
with the understanding that the publisher is not engaged in rendering professional services. The
advice and strategies contained herein may not be suitable for your situation. You should consult with
a specialist where appropriate. Further, readers should be aware that websites listed in this work may
have changed or disappeared between when this work was written and when it is read. Neither the
publisher nor authors shall be liable for any loss of profit or any other commercial damages,
including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication Data
Names: Bahret, Adam P., 1973– author.
Title: Reliability culture : how leaders can create organizations that create reliable products / Adam P.
Bahret, Apex Ridge Reliability, Massachusetts, USA.
Description: Hoboken, NJ, USA : Wiley, 2021. | Series: Quality and reliability engineering series |
Includes bibliographical references and index.
Identifiers: LCCN 2020032930 (print) | LCCN 2020032931 (ebook) | ISBN 9781119612438 (cloth) |
ISBN 9781119612445 (adobe pdf) | ISBN 9781119612452 (epub)
Subjects: LCSH: Quality control. | Reliability (Engineering) | Corporate culture. | Leadership.
Classification: LCC TS156 B335 2021 (print) | LCC TS156 (ebook) | DDC 620/.00452–dc23
LC record available at https://lccn.loc.gov/2020032930
LC ebook record available at https://lccn.loc.gov/2020032931
Cover Design: Ana Faustino
Cover Image: drawn by Adam Bahret
Series Editor’s Foreword by Dr. Andre
Kleyner
The Wiley Series in Quality & Reliability Engineering was launched 25
years ago and has since grown into a valuable resource of theoretical and
practical knowledge in the field of quality and reliability engineering,
continuously evolving and expanding to include the latest developments in
these disciplines.
Each year, engineering systems become more and more complex,
with added functions, capabilities, and longer expected service lives;
however, the reliability requirements remain the same or even grow more
stringent due to the rising expectations on the part of the product end user.
With the new generation of transportation systems, such as autonomous
vehicles, the expectations have grown even further. It will require the
highest degree of reliability to convince people to entrust their lives into the
hands of a driverless vehicle. Only with new visions, methods, and
approaches to product development will this become a reality.
The book you are about to read is written by an expert in the field of
product development and reliability and provides the methodology,
guidance, and suggestions on how an organization should evolve to make a
transition to the next level of maturity with regard to a reliability‐focused
culture. It is my pleasure to introduce the author, Adam Bahret, a reliability
consultant, who during his professional career wore a number of
engineering and management hats, giving him the perfect opportunity to see
the “big picture” of how product reliability is handled in various
organizations. He has built upon this experience to produce a recipe of how
to develop a reliability‐focused corporate culture and emphasizes the
important role management and leadership play in this process. It is
important that product reliability becomes the objective of the whole
organization and not just an afterthought of a design process. Even though
the ultimate goal of any organization is to deliver a fully functional, reliable
product to the hands of the consumer, the intermediate goals and objectives
may vary between the different parts of the organization, and it is the role of
the leadership to align these objectives to achieve this ultimate goal.
However, despite its obvious importance, quality and reliability education is
paradoxically lacking in today’s engineering curriculum. Very few
engineering schools offer degree programs or even a sufficient variety of
courses in quality and reliability methods. The reason for this is hard to
explain. Perhaps engineering schools prefer educating their students on how
the systems work, but not how they fail? Therefore, the majority of
reliability professionals receive their training from colleagues, professional
seminars, and technical publications. This book is another step in closing
this educational gap and providing additional learning opportunities for a
wide range of readers with the emphasis on decision‐makers.
We are confident that this book, as well as this entire series, will continue
Wiley’s tradition of excellence in technical publishing and provide a lasting
and positive contribution to the teaching and practice of reliability and
quality engineering.
Acknowledgements
I would like to thank my wife, Beth, and daughters, Katie and Natalie, for
being so supportive during the creation of this book. I know many authors
thank their family. I suspect many do it from guilt of having used so much
family time to work on the book. But in my case, it was not just that. They
really helped me when I was frustrated or needed feedback and a sounding
board. It's love if someone listens to reliability engineering content and isn't
in the field. So thank you for so much love.
My sister, Abigail, was a great virtual coffee pot colleague. We live on
separate coasts but would meet for coffee online and commiserate about
writing. She is a script writer and I'm an engineer pretending to be a writer.
She knew all the tips and tricks for getting through the hard times. You're
the best, kid sister.
My good friend and writing coach Mark Levy. Mark, you are a great friend
and I am absolutely sure no executive in the world would have read this
book had you not put your magic into it. The best way to describe our edit is
to simply say, “That is the first time this manuscript was in English, and not
Engineer.” You really saved this from being a book that couldn't
communicate what it wanted to say.
I would like to also thank all of my colleagues and customers. I am
extremely grateful to the executive leaders that allowed me to experiment
and develop the techniques covered in this book with their teams and
organizations. Your willingness to explore made this possible.
Introduction
When it comes to product development, most technology companies
understand the importance of reliability. In particular, the engineering teams
usually have everything they need to design a reliable product, including
the right testing tools and analysis methods.
At times, though, there can be problems: a product doesn't ship on time or,
if it ships on time, it fails in the field. Usually, and perhaps surprisingly,
these problems aren't caused by engineering. The engineers have done what
they're supposed to do, given the circumstances.
The problems come from higher up. They're generated by the organization's
hierarchy. That is, the leaders caused the product breakdowns through their
leadership decisions. Or, more precisely, they caused problems through bad
communication of their decisions, and sometimes simply by making the
wrong decisions.
For instance, when it comes to talking to their product team, a leader might
give each role on the team a very different goal. They might tell the project
manager that their goal is a launch at the end of Q1… and the R&D
engineer's goal is a particular new killer feature… and reliability's goal is a
reliability of 99.99%.
What happens then? Conflict! The project becomes a scramble.
Instead of the team collaborating on the overall product, they're competing
with one another in narrow, limited ways. Each part of the team needs to
achieve their particular goal at the expense of the other program goals.
(After all, it's personal. Each member thinks, “I've got to win. My job is on
the line!”)
The result is a product launch that's ridiculously unbalanced. The product
may hit the market on time, but the important new feature fails in the
customer's hands.
What's more, the company has lost the opportunity to have a product that,
originally, had promise, and the leaders have lost the opportunity to become
the owners of a successful product program.
What I've done, then, is to try to right these wrongs.
I've written this book specifically for senior leaders. It's for those of you
who want to achieve the goals of a highly reliable product, released on time,
with the best new features. It'll show you how to build a culture that can
generate impressive, product‐based profits – while that culture is
simultaneously centered on reliability. (If that sounds paradoxical or
impossible, read on!)
When it comes to creating and releasing products that are innovative,
popular, and financially successful, there's an odd belief out there. Leaders
think that when it comes to weighing factors – such as cost point, time to
market, and product features – there have to be harsh compromises.
Fortunately, that's not true.
Often, the reliability advancements that will help you produce a home‐run
product are “free.” In other words, if you more effectively connect the
engineering tools being used in the program with your company's business
goals, the reliability initiative will pay for itself. No additional cost, no
added time. Hence, “free” in investment and profitable in return.
Before we go further, I'd like to tell you why this work is so important to
me.
Consider the following scenario: imagine you're responsible for developing
not just any product but one with life‐and‐death consequences. Let's say
your product is a surgical device that cannot be allowed to fail. No way, no
how. Failure means human death.
Yet what happens? The product development program shortcuts the
reliability process, knowingly. On top of that, every announced schedule
compression and budget cut strikes the reliability team first.
The consequences of such behavior are obvious, foolhardy, and painful. I
mean, the FDA has walked in and shut down the operation before. History,
it seems, is about to repeat itself. But why? Why are we headed into this
horrible situation again?
In real life, I've experienced similar scenarios many, many times before,
from nearly every side. As a design engineer… a reliability engineer… a
reliability manager… and a leader who built entire reliability departments
from scratch.
I've also seen it throughout the years as an independent reliability
engineering consultant working on numerous projects in parallel in multiple
industries.
We so often do things that just don't get good results. But we do them in
that same way over and over again. The reason? We don't know why what
we're doing isn't working, so we don't know how to do it differently.
Change, however, is possible. We can understand why we're messing up,
and we can learn to do things not just differently but correctly.
In this book I'll show you the “whys” behind common reliability mistakes,
and I'll also show you a better way. Or even many better ways.
I created most of the tools and techniques you'll read about here, but they
didn't come from inspiration. As my wife will tell you, I'm not some genius.
Instead, I created them simply by having walked more miles in more types
of shoes than most. I also keep my eyes open, tend to obsess, and am
tenacious. These factors have yielded solutions that pack a formidable
punch.
As a reliability consultant, I have the unique position of being a “trusted
advisor,” not only to an organization's management but to its executive
leadership, too. Many of these leaders have been willing to give me latitude
on the methods and strategies that have yielded great results. I'm
appreciative for their willingness to go on these adventures with me.
What you'll find in this book are simply the methods I've seen work to
ensure the product you develop is the product you planned.
“Reliability culture” is a study of how strong reliability and profitable
business connect. More specifically, it will teach you how reliability plays a
role in product success in the field.
The culture required to develop the Mars Rover's robotic assembly is very
different from the culture needed to develop a toy robot for this year's
holiday push.
When the Mars rover program began, the design team determined that each
one of the vehicle's DC motors must have a reliability of 99.99999%. Why
that extreme? Because if those motors made it to Mars and even one failed,
the entire $2.1 billion mission would be a washout.
From a reliability standpoint, what that meant was that every single design
decision the team made had to be done with the reliability of those motors
in mind. Nothing could compromise those motors – not scheduling, not
budget, not extra features.
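To see why the goal had to be that extreme, it helps to run the series‐system arithmetic: if every motor must work for the mission to work, the per‐motor reliabilities multiply. Here is a minimal sketch in Python, assuming a hypothetical count of 30 motors (the rover's actual motor count isn't given here):

```python
# Series-system reliability: the mission needs every motor to work,
# so per-motor reliabilities multiply. The motor count is a hypothetical
# illustration, not the rover's actual bill of materials.
n_motors = 30
r_motor = 0.9999999  # 99.99999% reliability per motor

print(f"Mission-level motor reliability: {r_motor ** n_motors:.5%}")  # ~99.99970%
print(f"At 99.9% per motor:              {0.999 ** n_motors:.1%}")    # ~97.0%
# Seven nines per motor keep the chance of losing the $2.1 billion
# mission to a motor failure down in the parts-per-million range;
# a merely "very good" 99.9% motor would mean roughly a 3% chance
# of a washout.
```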
That toy robot, on the other hand, had better make it to the store shelves by
the third week of November or sales will be cut in half. If the reliability isn't perfect,
so what? Most kids will have forgotten about the toy by Easter.
Until now, the reliability engineering discipline has been heavily focused on
improving design processes. The problem is that design processes are only
half the story.
The methods in this book come from having the ability to take a step back
and connect the pieces. In reading how they work, you'll be able to sit down
with your team and discuss the ones that will create the product you
intended, are in line with your brand, and gain the company its greatest
market share.
This is the beginning of an exploration into a new type of product
development process. One that will allow your products to meet their full
potential.
Let's look at the book's flow, so you’ll know where we're headed.
Chapter 1, The Product Development Challenge, is an overview of common
program difficulties. These include budget and schedule compressions that
force leaders to cut necessary program steps. Such omissions leave
management to make decisions blindly. The major factors that typically
drive program decisions are outlined in detail. These product factors and
how they interact with the program are critical for a successful project
execution.
Chapter 2, Balancing Business Goals and Reliability, is about a major
conflict: the fight between long‐term and short‐term business gains. The
relationship between the modern business model and the reliability toolset
is complex. Modern business methods want returns that happen fast, while
reliability methods go for strong performance over the long term. See the
conflict? These two don't match in strategy or execution. What can we do?
Is there a way to bring the short‐term and long‐term thinking together, and
make the result work for everyone? There is. You do it by cutting out some
gut decisions and replacing them with decisions made quantitatively.
Chapter 3, Directed Product Development Culture, is about what drives
organizational behavior. By exploring culture both inside and outside of an
industry, we can dissect how it works and how it can be changed
effectively. Just as important as change, is ensuring that the change takes
root, so it can't be displaced by the “normalizing” forces of the daily
operation.
Chapter 4, Awakening: The Stages to Mature Product Development, is
about identifying where ownership and accountability lie for specific
functions. The chapter discusses language and how teams communicate. By
identifying the intent of language and opening the paths to direct
communication, tremendous jumps in effectiveness can occur immediately.
Chapter 5, Goals and Intentions, identifies the real reasons we do what we
do. Without knowing why we are doing the program activities we are, we
can't clearly connect value. Many activities have clear justification for the
significant investment they require when we start a program, but the
reasons get a bit foggy as we progress through a program. Little can be
achieved without goals. Even with defined goals, they can still fail to direct
us if they are not well documented and in an accessible format.
Chapter 6, New Roles, outlines three necessary roles for a product
development program. These roles, “reliability czar,” “facilitators,” and
“change agents” are necessary if the correct accountabilities and paths of
information flow are going to exist. These roles are not necessarily new
hires for a department but an additional “hat” existing team members can
wear at given times.
Chapter 7, Program Assessment, is about methods for evaluating your
product development program's performance. By defining and measuring
key success factors, we create a closed feedback loop to manage our
program actions.
Chapter 8, Reliability Culture Tools, introduces the fundamental tools that
ensure your product reliability culture is connected to the business's goals.
Methods like “Bounding” and “Focus Rotation” are described in terms that
allow the reader to implement them immediately.
Chapter 9, Guiding the Program in Motion, provides an overview of tools
that can assist with keeping a well‐directed program on track. “Guidance
Bounding,” “return on investment (ROI),” and “Program Risk Effects
Analysis” enable the closed‐loop feedback process necessary to apply
regular, small course corrections, so the program runs smoothly and gains
the maximum return on investment.
Chapter 10, Risk Analysis Guided Project Management, provides an
overview of risk analysis tools and handling failure data. The different
flavors of Failure Mode Effects Analysis (FMEAs) are reviewed with an
emphasis on how they integrate into programs. Controlling failure data and
root cause methods is fundamental to ensuring all the “free” knowledge
floating around both in‐house and in the field is used to the greatest extent.
Chapter 11, The Reliability Program, discusses the strategies and
implementation tactics you'll be using with your reliability tools. It'll guide
you through the product and business factors that shape the reliability
initiative. The major elements of a complete program plan are covered.
Chapter 12, Sustained Culture, is all about how to make your new reliability
changes permanent. Change is good. Lasting change is how we win.
1
The Product Development Challenge
Rather than looking at concepts in the abstract, let's get down and dirty. I
want to share a story with you about how product development goes wrong.
In uncovering these traps, we'll then be better set up to talk about fixes.
What follows, then, is a kind of case study that highlights problems. And
it’s not based on a single company. Instead, the particulars are drawn from
dozens of real life situations, which I've disguised.
Key Players
You and I work for Amalgamated Mechanical Incorporated. The company
is hot about creating a new medical procedure robot. We're on the reliability
team.
It's critical that our robot get out there quickly, because our #1 competitor is
on the verge of introducing a similar product. To get a jump on them, our
product needs to hit the market in 10 months’ time.
According to the program plan, the Accelerated Life Testing (ALT) for the
robot's arm should start next week. But as far as we know, the arm samples
won't arrive for another three months. If the ALT testing can't begin until
then, there's little chance we'll have accurate predictions on how and
when the product may wear out once it hits the market.
For all practical purposes, even if we provide that test‐based prediction for
the arm's wear‐out failure rate a few months before release, it will be of
little value. There's no alarm big enough to sound that would delay release,
based on premature wear‐out. Even if we discovered that the product wore
out, not in the promised five years of normal use but in one solitary year,
the leaders would still release on schedule. (After all, they'd reason, we
have to beat the competition to market.)
It's been said before by our VP: “We'll release it now and get a fix out there
quickly. We already have a punch list for version 2.0.”
In many ways, the project has been designed to fail. For me, it feels like
trying to stop a freight train that's built up a head of steam. Stepping in front
of it creates a mess, and the train still pulls into the station on time.
Half of the program's reliability testing was intended to provide input for
program decisions:
“How confident are we that the arm will reach its life‐and‐reliability
goals?”
“What's the robot's end‐of‐life failure mode?”
“Should we create a preventative maintenance cycle or shorten the
robot's promised life to customers?”
From Engineer #2's perspective, when all was said and done her proposed
fix would save $1.85 million, and she should have been rewarded for her
thoughtful solution. But you already know where I'm going with this, and
so did the leadership team reviewing her suggestion.
The problem with big companies, especially those with impersonal
management teams sitting far away from engineering, is that the boss's
directive of on budget and on schedule seems frighteningly rigid by the
time it reaches the worker bees. The human resources department already
has performance appraisal processes drawn up, based on budget and
schedule. And because the directive came from so high up, there's not much
scope for change at the lower levels.
So Engineer #1, who played ball with the leadership when the program was
originally scoped, wins the bonus and a good performance appraisal. And
they set an example for the rest of the workforce. They've become a symbol
of good behavior.
What also happened, however, is that leadership told Engineer #2 (and
everyone else watching) that this reliability stuff is unnecessary, even if it
saves the company $1.85 million.
Engineer #2 will now disappear. If she's really good and prioritizes her
professional happiness, she will simply leave and go to another company. If
her value system doesn't align with the leadership's value system, and she
has options elsewhere (and all the very best engineers do), she'll just
disappear. That, or she'll simply turn into Engineer #1.
Anyway, that's what I put into my report (hey, they had paid me up front).
These leaders, I wrote, had created a culture “allergic to reliability.” If I
were forced to be an employee at this company, I'd drop product reliability
goals as a way of ensuring I received a paycheck.
The leadership had filled the company with people like Engineer #1, and all
the opportunities to save $1.85 million were never taken.
Engineer #1 got promoted, but profits decreased and times grew tougher.
And the organization is now conditioned to fixate on cost and schedule,
reinforcing even more strongly the things that don't work. The culture gets
worse. The talent leaves. Frustration reigns supreme. This classic exchange
from the movie Office Space sums it up:
Peter Gibbons:
It's a problem of motivation, all right? Now if I work my ass off and
Initech ships a few extra units, I don't see another dime; so where's the
motivation? And here's something else, Bob: I have eight different
bosses right now.
Bob Slydell:
I beg your pardon?
Peter Gibbons:
Eight bosses.
Bob Slydell:
Eight?
Peter Gibbons:
Eight, Bob. So that means that when I make a mistake I have eight
different people coming by to tell me about it. That's my only real
motivation: not to be hassled; that, and the fear of losing my job. But
you know, Bob, that will only make someone work just hard enough not
to get fired.
You're smiling right now, I know it.
Product‐specification Profiles
When the program begins, we formulate the product objectives, which
sound like this: “faster robot X‐axis motion of 10 mm/s, a retail cost
point of $23 500, and a first‐year reliability of 99.9%.”
Then, we feed these objectives into what ultimately becomes what's called a
“product‐specification profile.” Until the product hits the market, this
specification profile is supposed to be the project's touchstone. It drives all
the decisions.
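One way to picture the profile is as a small, frozen record that every later decision gets checked against. Here is a minimal sketch using the example objectives above; the structure itself is my illustration, not a standard format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductSpecProfile:
    """The project's touchstone: objectives derived from business goals."""
    x_axis_speed_mm_per_s: float    # performance objective
    retail_cost_point_usd: float    # cost objective
    first_year_reliability: float   # reliability objective

profile = ProductSpecProfile(
    x_axis_speed_mm_per_s=10.0,
    retail_cost_point_usd=23_500.0,
    first_year_reliability=0.999,
)
```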
But how were the product objectives that went into the profile formulated in
the first place?
Why, those came from the organization's business goals!
Really, everything we do at work is driven by the business goals. It might
be one degree or five degrees of separation, but at the end of the day all the
activity has to line up with the organization's business goals.
Your product's reliability, then, needs to be an adhered‐to part of your
business goals.
Do you see what I mean? Reliability is a part of your brand. What happens
to the sales of all your product lines if your brand's reputation becomes
tarnished?
It's not that every product needs to be 100% reliable, or that reliability is the
sole factor in developing your product. You have to be clear about what
you've decided, and stay true to it. After all, in many markets you're only as
good as your last product.
As I've mentioned, a culture built on reliability is one where business and
reliability connect. People often think of reliability as being part of the
design process only. That's not true. It's only half of the story.
Executives at all levels understand how the reliability discipline affects
market share. If the new coffee machine their company designed fails, they
suffer personally. They're the ones held accountable.
Let’s say this is the first coffee machine to market with Wi‐Fi
connectivity, meaning it can email you when your cup of coffee is ready. This
should mean it will grab significant market share and make the brand a
household name.
But none of this happens, because it immediately has issues when
customers try to get that desperately needed caffeine fix.
Uncaffeinated people who are denied coffee are quick to leave bad online
reviews. Who cares about a coffee machine that can email you if the
machine's water pump breaks when it is three days old? “Thanks for
nothing.”
So how is it even possible that a smart leader would blindly cut back on
reliability initiatives when the schedule is tight or a new feature isn't ready
for prime time? The reason is simple: when it comes to reliability, the direct
effects of the investment on product or program performance aren't apparent – in
the short term.
Businesses work on a quarterly system and individuals move to new
programs and roles in short timeframes. Very few long‐lead actions make it
to the top of the daily action list. Most companies can't keep this high‐level
perspective when the day's urgent matters arise. Many of the companies I
have seen that hold this far‐sighted perspective are privately held by a
founder who was an engineer.
This lines up with McGregor's Theory Y, in that people want to do good
work. But this type of situation occurs when “good work” is measured only
in short, three‐month increments.
An engineer who is also left holding long‐term business accountability is
the person we should look to for best practices when long‐term
growth is the objective. This is one of the positions that seem to find a great
balance for all the program and product factors. Let me repeat it to be clear.
An engineer with an accountability for long‐term business success. I have
proof of this. Some of the most successful companies in the world were
founded by engineers who kept the business private. Some of the tools I
propose aim to put the leaders in this same mindset.
Executives and owners of companies make decisions based on a significant
amount of input from their counsel: their mid‐ and high‐level management. If
the executives' and mid‐level leadership's goals are not aligned, then this
counsel becomes muddied. This becomes compounded when the executives
evaluate the mid‐level leaders on the short‐term delivery of time to market,
cost point, and new features creation. The mid‐level managers can't equally
support reliability when times get tight and the results of an investment are
far down the road.
So how do we fix this? We look at how executives make decisions. The first
question is: how do they get the input and information for their decisions? Many
executives work in a dashboard summary manner. The controls and
readouts are very similar company to company, which is why it is easy for
executives to move between organizations and industries. Reliability simply
needs to find a way to format what is important so it can be included on this
dashboard. And once created, this connection to reliability functions for
executives will be similar to the other critical inputs executives receive day
to day. Total Quality Management, The Toyota Way, and Lean Six Sigma
have all had a significant impact on how we do business and product
programs. They all have found ways to get on that dashboard.
Product Drivers
It was my first house, and it was a project like many first houses. Originally
constructed in 1855, it had seen many caretakers. I was confident I would
just be another name on the list and the house would be standing long after I
was gone.
Being a first‐time homeowner, I needed to begin that tried‐and‐true ritual of
depositing my paychecks directly with Home Depot. Within a short time,
I'd be meandering the aisles of the warehouse‐size hardware store, with
other homeowners, looking confused under the store's fluorescent lights,
trying to figure out solutions to the problems that brought us here with little
knowledge of how to solve them.
Before long, the cashiers knew me by name and welcomed me the same
way the bar staff welcome the barfly Norm in the classic sitcom Cheers.
“Adam! How are the kids? Were you able to finish the ceiling?”
In a large hardware store like that, there are two types of people walking
around: homeowners and contractors. From a product development
perspective, these are two very different types of customers with very
different needs.
As a “mortgage poor” person who was working on my own house, not
because it was fun but because I was trying to save money, I wanted to buy
tools that'd get the job done at the lowest possible price. Why lowest? Most
of the tools I'd buy would be used a few times, and then would sit in my
garage. My project was severely underfunded.
The other type of buyer, the contractor, was running a crew on a job site,
where the highest operational expense was hourly wage. Having a tool fail
would be devastating to the bottom line. It would leave workers sitting
around. What that contractor needed to know was that the tool was going to
work and not leave them paying idle workers.
So the power drill I'd buy and the power drill the contractor would buy had
little in common other than both tools turned drill bits.
A product is defined by many factors. I find that the four primary ones are
time to market, features, cost point, and reliability.
Imagine what I was thinking as I stood in front of the power drill display.
There were at least a dozen models, and I had a clear ranking in mind as to
what was most important to me: #1 was cost, #2 was cost, #3 was
reliability.
That contractor standing next to me with the pro‐wrestler‐size biceps was
evaluating these drills using totally different requirements: #1 was
reliability, #2 was “This better not break!”, #3 was “That quick‐release
chuck could save some time.” Reliability was #1. Time‐saving features
were #2. Cost point was the last thing on his mind, because the expense of
the tool failing was far more costly than the savings he'd enjoy in buying
the lowest‐priced drill.
Cost was my priority, because every dollar I spent on the drill hurt my bank
account. Reliability was at the bottom of my priority list because if the drill
broke I could work on something else and replace it the next day or even
borrow a neighbor's. Reliability also fell low on my priority list because my
duty cycle was a fraction of the contractor's. A design with low reliability
used on a low duty cycle produces a similar failure rate as a design with
high reliability and a high duty cycle.
I was doing many roles in my “job,” and used the drill a total of one to two
hours a day, every couple of days. The contractor may have a guy doing
drywall construction all day – every day. In one month his drill might be
used for 160 hours, while mine was used for 12. My project would be
completed in three months, while the contractor works consecutive jobs
throughout the year. That takes my yearly usage to 36 hours. The
contractor's yearly usage could be over 2000 hours. That's why I can have a
cheap drill and experience the same or even better reliability than the
contractor purchasing the top‐end model.
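The arithmetic behind that claim is simple enough to sketch. Here is a minimal example assuming constant per‐hour failure rates (a simplification); the rates below are invented for illustration, while the usage hours come from the story:

```python
# Expected failures per year = failure rate per hour of use x hours used.
# Failure rates are hypothetical; usage hours are from the story above.
cheap_drill_rate = 1 / 500    # one failure per 500 hours of use
pro_drill_rate = 1 / 5000     # one failure per 5000 hours of use

homeowner_hours = 36          # my yearly usage
contractor_hours = 2000       # the contractor's yearly usage

print(f"Homeowner, cheap drill: {cheap_drill_rate * homeowner_hours:.3f} failures/yr")
print(f"Contractor, pro drill:  {pro_drill_rate * contractor_hours:.3f} failures/yr")
# 0.072 vs 0.400: in light service, the cheap drill can actually fail
# less often per year than the premium drill does in heavy service.
```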
My whole point here is that a drill is a drill, except that it's not. Each drill is
designed differently, based on the factors that are most important to its
particular market. Success isn't making the drill perfectly reliable. Success
is making the right drill for the right customer, even if that means
manufacturing a particular drill that has lower reliability.
The degree of a product's reliability should always be a conscious decision.
Bounding Factors
The design parameters I just talked about I call “Bounding factors.” In a
sense, they're the measurable factors that guide both your product and your
program.
What are the Bounding factors for a product? A design feature or a cost
point.
What are the Bounding factors for a program? Time to market or
development cost.
The Bounding factors for a product and a program have to be related,
because there are tradeoff decisions that affect both.
Here are some common Bounding factors we see in programs today:
New technology/features
Cost point
Time to market
Reliability
Serviceability
Manufacturability with a contract manufacturer
Marketability.
The four that share the primary balance in most programs are those from the
drill example: cost point, time to market, features, and reliability.
It's not possible to turn all of these up to a level of 10 (11, if you're a Spinal
Tap fan), because they compete for resources. Extremely high reliability is
not in line with quick time to market or ultralow product‐development cost.
New cutting‐edge technology is not in line with low cost point or quick
time to market, either. For those combinations, there's a give and take to get
to a budget of time and money that works.
When this negotiation between keeping the schedule and investigating a
reliability issue is done mid‐program without program tools, it is often
reduced to who makes the best argument at the time. I can think back to many programs
where the decision was in favor of the person who was most in favor with
the decision maker. “Who do I know best?” “Who do I trust?”
I don't believe the decision maker did this out of favoritism. That would
be insane, because if the program fails they fail. They do this because they
don't have factual information at their fingertips. Without any quantitative
information any individual is left with making a decision on counsel and
trust.
Unless we can incorporate tools like the Bounding methodologies that will
be shared in this book, there is no way to expect a leader to make decisions
on anything other than listening to trusted counsel.
The Bounding methodology derived its name from the base principle that
each factor should “bound” resource and schedule changes to ensure no
specific factor is compromised beyond the original product specifications'
margins. The result is that the factors consistently steer the
program toward the goals set in the product‐specification document.
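As a preview of the Bounding tools developed later in the book, the core mechanic can be sketched in a few lines: each factor carries a margin taken from the product‐specification document, and any proposed mid‐program tradeoff is checked against every bound, not just the one being argued for. The factor names and margins below are invented for illustration:

```python
# A minimal sketch of a Bounding check, with invented numbers.
# Each factor maps to the worst value the original spec's margin allows.
limits = {
    "time_to_market_months": 12,      # upper bound
    "cost_point_usd": 24_500,         # upper bound
    "first_year_reliability": 0.995,  # lower bound: may not drop below
}
lower_is_worse = {"first_year_reliability"}

def violated_bounds(proposed: dict) -> list:
    """Return the factors a proposed tradeoff pushes past spec margins."""
    broken = []
    for factor, value in proposed.items():
        limit = limits[factor]
        worse = value < limit if factor in lower_is_worse else value > limit
        if worse:
            broken.append(factor)
    return broken

# A schedule slip traded against skipping a reliability test:
print(violated_bounds({"time_to_market_months": 11,
                       "first_year_reliability": 0.990}))
# -> ['first_year_reliability']: one factor is compromised beyond its
#    original margin, so the tradeoff should be renegotiated.
```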
Reliability Discipline
Why is reliability done in product development programs? This is where we
need to start. Without a clearly defined “why,” we don't have a foundation to
work on. The value of this early discussion lies in understanding it. Why are
reliability test and analysis done? They are done to measure and improve the
reliability of the product. We usually stop there.
But why are we measuring and improving? This is the question that is left
unanswered and we are without a way to make the critical decisions during
a program. A great quote on “why” comes from the Merovingian, a
character in the movie The Matrix Reloaded:
“Causality. There is no escape from it, we are forever slaves to it. Our
only hope, our only peace is to understand it, to understand the why.
‘Why’ is what separates us from them, you from me. ‘Why’ is the only
real social power, without it you are powerless.”
Let's change a few words in that quote to make it specific to our mission:
“Causality. There is no escape from it, we are forever slaves to it. Our
only hope, our only chance at program success is to understand it, to
understand the why. ‘Why’ is what separates the good program choices
from the bad program choices. ‘Why’ is the force that directs product
programs, without it we are powerless. Making choices that are reactive,
based on fear and blind trust are the ‘why's’ that make us powerless.”
OK, that got intense. Let's go to the deepest “why” we can identify first.
The reliability discipline has progressed through several phases of maturity.
It was originally approached as a method of identifying areas of risk and
used “over design” as a mitigation. Look at any tool made from 1000 BCE
to 1950 CE. I have many of my grandfather's tools, and they still work fine.
More than fine, his tin snips could cut 0.5 in. (13 mm) steel cable. The
wrench could be used as a hammer, and the drill could be pulled from the
rubble of a house fire and used in the reconstruction. Take a look at the
photos of his tools next to my modern‐day equivalents (Figure 1.1).
But those product designs from many decades ago would be unlikely to
survive in today's market. Looking at the modern‐day equivalents next to
them, the differences are evident even to a non‐tool enthusiast.
With rapidly advancing technologies affecting all of our lives, weight and
cost quickly became critical to maintaining a competitive product in the
market. The reliability engineering discipline took a more formal shape at
this point due to a need to have a counterforce that ensured cost and weight
didn't take the product design to a point of being, well, unreliable. The
forces for product balance emerged: light weight, low cost to manufacture,
and a low cost point are now balanced against reliability and the development
of new technology.
Through these needs, the methods of reliability analysis, test, and design
techniques took shape. It was the military that led this initiative initially,
simply because it is far worse to have a piece of equipment that will save
your life not work than to have a vacuum or toaster quit at an inconvenient time.
Military customers still often measure reliability in terms of “risk of lost
life.” Even when a piece of equipment that is not intended directly for
assault or defense (rifles, missiles, shielding, etc.) fails, it may increase the
risk of loss of life. Something as simple as a surveillance camera on a tower
in a hostile area can result in death if it fails and so needs to be replaced. A
soldier doing maintenance on the top of a tower is a target. This places a
whole new complexion on designers and engineers talking about a 30%
failure rate of a camera in a meeting back home, when that failure means
not just that the camera stops working but that this, in turn, could lead to the
death of a soldier.
Figure 1.1 Grandfather's tools vs my tools.
I won't ever forget that sobering moment at that conference table when a
DARPA general made that statement about a project I was working on. The
clarity of what was at stake if I didn't succeed at the role I was brought there
to do was chilling. When I had walked into the meeting I thought the
consequence of a higher failure rate for the security camera was warranty
expense and future lost sales. People's lives were at stake if a circuit board
joint cracked.
The period for reliability from the 1940s to the 1970s was heavily analytical
in nature. It encompassed tools like reliability predictions and specialized
tests (Figure 1.2). The predictions were based on historical failure rates of
individual components. The specialized testing aimed to predict the wear‐
out of specific failure modes or to identify the margin of failure of a
primary stress. In the 1980s and early 1990s, more advanced techniques of
testing like the Highly Accelerated Life Test (HALT) and Accelerated Life
Test (ALT) became prevalent. These methods permitted more specific
statements of reliability prediction or design improvement input to be made
in very compressed timeframes early in the design process.
The 1990s to 2010 were very much characterized by the Design for Reliability
(DfR) initiative. Its principle is that reliability is “designed in,” not “tested in.” A fundamental
shift with DfR is that reliability practice is intertwined with the full team
and design process from start to finish. This was a big shift from the
mindset that mechanical engineers do mechanical design, electrical
engineers do electrical design, and reliability engineers then make it reliable
when they are done.
Figure 1.2 Reliability timeline.
The next phase, today's phase of advancement for the reliability discipline,
is reliability culture. This will be the connection of reliability tools,
techniques, and philosophy to the highest levels of business and market
objectives in conjunction with DfR. Companies that embrace this next
phase of reliability evolution will quickly emerge as the leaders in their
markets.
The bottom line in most of our businesses is dollars. The metric that will be
used to measure an organization's level of reliability cultural maturity will
be dollars. The return on investment of applied reliability tools will be
measured in dollars. Companies that do not embrace a culture of product
reliability will be ill‐equipped to compete with those that do – just as it
became impossible to compete without a Total Quality Management
process two decades ago. We are on the cusp of placing reliability at the
heart not just of the engineering process but of corporate culture. For this to
happen, it will be necessary for business leaders to create the correct
organizational dynamics and align reliability objectives with a business’s
financial goals.
Know your target. Make goals and make compromises. Don't commit to
high reliability without selecting the sacrifices. Something has to give, be it
schedule or new technology development or cost point or very high product
development cost. There are no worse words than a leader saying “and it
must be highly reliable” or “it will never fail” without discussing the cost of
pursuing that reliability goal.
You have to know beforehand whether you are willing to trade reliability
for growth of technology or time to market, and by how much. The Mars
rover took many years and billions of dollars to create. From a technology
standpoint the Mars rover is the equivalent of a high‐school robotics science
project. It has off‐the‐shelf small digital cameras and small DC motors, like
those in radio‐controlled cars, driving little wheels. There are servomotor‐driven
arms based on decades‐old technology. I build stuff like this in my
workshop with my kids.
But what was special about it is that it could never fail. It truly was a case
of “this design cannot fail or we are wasting billions of dollars.” That was a
quantifiable statement. The many years and billions of dollars spent to
accomplish that perfect reliability was the cost. It would be a mistake to
create your commercial or consumer product with that type of reliability
goal. Nobody wants your perfectly reliable flip phone 10 years after the
market has moved on to smartphones. So let's figure out what goals you
should have for each program and how to correctly structure a program to
accomplish them.
References
1. Deming, W. Edwards (2000). Out of the Crisis. Cambridge, MA: MIT
Press.
2. McGregor, D. (1960). The Human Side of Enterprise. New York:
McGraw‐Hill.
2
Balancing Business Goals and Reliability
Return on Investment
It's difficult for reliability advocates to negotiate resources. Why?
Everyone knows that reliability affects sales, marketing, warranty expense,
and future program resources. But those things happen downstream.
How can you compare them to arguments regarding return on investment
for “time to market, “new features,” and “cost point,” which have such
definitive returns sooner rather than later?
Arguments for investment in these types of factors are based on a short‐
term return that is immediately tangible: “For $25 000, we will reduce the
cost to manufacture each unit by $4.25, a 12% cost reduction. This
reduction initiative will be completed in 10 weeks, leading to a saving in
manufacturing cost over two years of $2.3 million.”
That's a clear request with an explicit return.
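It's worth noticing how tightly those numbers hang together, which is much of what makes the pitch persuasive. A quick check (the unit volume is implied by the figures rather than stated):

```python
investment = 25_000          # cost of the reduction initiative
saving_per_unit = 4.25       # manufacturing cost removed per unit
two_year_saving = 2_300_000  # claimed saving over two years

print(f"Implied volume:  {two_year_saving / saving_per_unit:,.0f} units")  # ~541,176
print(f"Payback volume:  {investment / saving_per_unit:,.0f} units")       # ~5,882
print(f"Return multiple: {two_year_saving / investment:.0f}x")             # 92x
# The $25,000 pays for itself within the first few thousand units --
# a short-term, countable return.
```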
Compare that to the promise attached to investment in a reliability
initiative: “We request $25 000 for testing, which in eight months will tell
us we're 60% sure that we won't have a high failure rate.”
“Huh? I get what for what? If I give you $25 000, I get to find out at the end
of the project you're kinda sure things will be OK in the far future? Sign me
up! No, just kidding. I'm giving that $25 000 to the cost reduction guy.”
And that is what reliability faces on a daily basis. We offer “probability” in
return for investment, not guaranteed cost savings, or a new feature that will
increase sales, or getting the product to market six months faster.
Reliability engineers should start a support group that includes life
insurance salesmen and the guys with the signs on the street corner that say,
“The end is near.” We all have the same problem. We could call it Ambiguity
Anonymous.
Ambiguity makes it difficult to bring reliability into the conversation. The
reliability work can potentially save you $15 million in lost warranty costs,
missed market share, and brand tarnish. But I can't guarantee that or even
give you a really accurate projection.
It's akin to selling seatbelts in 1940. “This new device is a restraint. It will
cost you an extra $15 per seat, is uncomfortable to wear, and forces you to
remain stationary inside the car. But it may save you from injury or death at
some later date.” In 1940, a lot of people would have opted to use the extra
cash to put toward a two‐tone paint job, thinking, “I've never used a seat
belt and I'm fine.” (Well, that's only the case because we don't often get
the opportunity to suggest this option to people who've already died in a
crash.)
But sometimes the person who has had the horrible experience of a loved
one dying in a car crash will see the value in the $15 investment. Sound
familiar? How many times do we see investment in reliability increase
dramatically after a major product disaster?
Here, our mission is to get people to understand the value without
experiencing the disaster first.
Change didn't happen until the Department of Transportation created a voice
for all those dead people. They did it through marketing campaigns, based
on the statistics of death and injuries in auto accidents. That approach
worked, and today we know riding without wearing a seatbelt is crazy.
In any product and program, four factors are always present. They are: cost
point, time to market, features, and reliability.
The people who lose out from this lack of investment more than anyone else
are the individuals who set up the program and process. These are also the
ones who pay the highest price. It's the executives that have to face the
music when the products released are driving high warranty cost or
customer dissatisfaction.
But somehow they're the ones that unknowingly structured a program that
guarantees the customer doesn't get the product balance that was so
carefully planned when the project began. We have to conclude that they're
not getting the information they need. No one consciously self‐sabotages.
This is the problem to be solved with our reliability culture initiative. “Why
can't we understand the meaning of the information being presented at
moments of critical decision?”
Program Accounting
One of the best ways to understand reliability's true cost is to study the
practice of activity‐based costing. It's accounting that doesn't look at a
product's materials and operational costs alone. Instead, it studies a
product's complete cost.
Figure 2.1 Reliability ROI.
I'll explain what I mean by citing one of my favorite things: donuts.
When you're 12 years old, a donut costs $1.75. When you're 45 years old,
the real cost includes spending two hours on the treadmill to burn off the
empty calories. Which also means I have to buy a gym membership.
The activity‐based costing of the donut is now $1.75 + two hours of my
time + a portion of my $125‐a‐month gym membership. Seen this way,
donuts don't cost $1.75. They're expensive. I'll pass (or more likely I'll eat a
few and pretend it doesn't matter).
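For fun, here's that donut math in one place; the value of my treadmill time and the share of the gym membership charged to donuts are numbers I'm inventing for illustration:

```python
# Activity-based cost of one donut at age 45, per the story above.
sticker_price = 1.75
treadmill_hours = 2
hourly_value_of_my_time = 40.0   # hypothetical
gym_membership = 125.0           # per month
donuts_per_month = 4             # hypothetical share of the membership

true_cost = (sticker_price
             + treadmill_hours * hourly_value_of_my_time
             + gym_membership / donuts_per_month)
print(f"Activity-based donut cost: ${true_cost:.2f}")  # $113.00
```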
Seriously, why do we humans do this? Let’s look at a different cost path:
bad health instead of the cost of exercising. Bad eating habits, like regularly
bingeing on donuts, put me at a higher percentile risk for diabetes, heart
attack, and knee and back problems from being overweight.
What is the difference in thinking between someone who knows all this
“probabilistic” information and doesn't get the donut vs the one who does
anyway?
This is exactly the same as the person who doesn't invest in reliability
design robustness or reliability measurement because the outcomes are
probabilistic.
Programs often budget based on what it costs to do something. “The cost of
building four prototypes is $230 000.” “The cost of developing this new
technology is $2 million.” Activity‐based costing will measure the time a
piece of equipment is not being used. That could be a significant addition to
both of those numbers. For a product development program this would be
estimating the cost of having to do a redesign late in the program in
comparison to doing that same redesign early. The initiative to include such
assessments is often abandoned because of the difficulty of making it
quantitative. “How much does it cost to do a redesign?” Not to mention this
is usually occurring after the fact, and what's the point of counting how
much money you have lost and can do nothing about? We also often don't
do this evaluation early because “It's not going to happen to us.”
Rule of 10s
The “rule of 10s” says simply: “To fix any design issue, you have to spend
10 times more than you would have if you had fixed it at the previous
stage.” In other words, the longer you wait, the more it's going to cost you
(Figure 2.2). And the cost isn't going to be pennies or dollars. It's going to
jump exponentially. If you wait long enough – through level after level after
level of development – the fix will eventually cost you your shirt.
If it costs $100 to fix an issue in an early design, you can expect that fix to
cost you $1000 at “design freeze,” $10 000 at “first prototype,” $100 000 at
“product release,” and $1 000 000 as a “field failure.” That's a tremendous
difference in cost to fix the same issue. These numbers aren't made up;
they've been proven again and again. In fact, those of us who have been
down this road are painfully aware that the rule of 10s is actually quite
conservative.
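In symbols (a rough model, nothing more): if C0 is the cost of the fix at the
earliest design stage and k counts the development stages that pass before
the issue is caught, then

cost to fix ≈ C0 × 10^k.

With C0 = $100 and k = 4 (a field failure), that's $100 × 10^4 = $1 000 000 –
the same ladder as above.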
How about a real-life example? If a team identifies a poor gear design early
in the design phase – perhaps in a design review or a sub-assembly test – it's
a quick fix. In this case the engineer might, with some slight embarrassment, go
back to his standard gear design practices handbook and make the
correction. We can say that the fix took approximately a half hour of work,
and it cost the program, say, $100.
Now, if this same gear issue drives a 10% field failure rate in customers'
hands, we're in a situation that's painfully different. To get things back on
track, you're going to have to call in many people. The whole process will
start with a root cause analysis, so you can get a handle on what's happening
in the field. It'll involve the quality team, design team, and field
representatives, at a minimum.
Summary
Early reliability integration into the product development process has
consistently been shown to be the most efficient and profitable
configuration. Yet many programs still aren't run that way. Why would
companies not implement a proven method?
We have to look at the roles and how they relate to the program. The roles
are where the actions are executed. Now that you have read this chapter,
take a moment and think about your organization. Does the company hold
reliability in a higher regard than individual roles do? How do the roles
align with the company's objectives? Why would they differ?
This exercise only takes a few moments before a few “ah-has” come to the
surface. Share them candidly with the team. More than one leader has found
they have a team desperate to “do it right” that hasn't been able to because
of role constraints. How wonderful would that be? A short exercise, a few
candid discussions, and products begin to be developed better, quicker, and
for less. I've seen it happen, more than once.
References
1. Rajagopal, K. (2005). An industry perspective on the role of
nondeterministic technologies in mechanical design. In: Engineering
Design Reliability Handbook (eds. E. Nikolaidis, D.M. Ghiocel and S.
Singhal). New York: CRC Press.
2. Liker, J.K. (2004). The Toyota Way: Fourteen Management Principles
from the World’s Greatest Manufacturer. New York: McGraw‐Hill.
3
Directed Product Development Culture
I have two goals for you. I want you to: (i) advance how you manage your
reliability process, so it's efficient, and (ii) capture the highest possible
return on investment (ROI) by creating a product that's balanced for your
market.
With the way reliability is practiced today, however, reaching either goal is
difficult. The question is “Why?” Why do well‐intentioned reliability teams
frequently fail in every type and size of industry?
The answer falls to a hidden force: your organization's culture.
Culture defines how an organization acts and works to maintain its status
quo. Culture is like the organization's subconscious: it directs our thinking
and actions. It implants in us the norms and what's expected of us. It's not
about overt rewards and punishments. Those are directives. Directives are
inputs for conscious consideration.
Again, what we're talking about here is how the environment teaches us
how to behave in a way that happens below the level of consciousness.
There's a theory that highlights how even slight changes in our surroundings
give us unconscious clues about how to behave. That theory is called the
Broken Window Theory. It goes like this: imagine you're in a crime‐ridden
area, and you see a series of buildings. They all look similar, except for one
building. It's the only one sporting a broken window. The theory suggests
that this single broken window will act as a trigger, signaling to criminals
that the building itself is a target. It's more likely to have its other windows
broken. It may even prompt more serious crimes.
That building's appearance is part of the block's culture.
Do you believe something as simple as a broken window can act as a
catalyst for deviant behavior? I'm convinced, because I've seen the theory in
action. More than once, I've been that deterred vandal.
It looks like this: it's nearly eight in the morning, and I've just eaten a piece
of leftover pie. I need to rush out the door to my office, and I intend on just
dropping the plate into the sink. I've told myself that, given my hurry,
leaving the dish dirty makes sense.
When I walk up to the sink, however, and see it spic‐and‐span clean,
without a single dirty dish, guess what I do? I rinse my plate and place it in
the dishwasher. I even wipe down the sink to rid it of residual crumbs. The
vandal has been deterred.
We're so easily influenced by our surroundings and their projected
expectations.
By contrast, what do you think the result would have been if someone had
demanded that I put my plate in the dishwasher? I might have said, “I'm in a
rush. I'll get to it later” and then dropped the dish in the sink and run out the
door.
For reliability to be an important part of the design process, the desire has to
come from within each individual. Having reliability forced on people as a
REQUIREMENT creates resistance. It's just human nature.
Now that we know what culture is, let's go deeper.
Influences
This transformation is definitely worth analyzing, because the recovery was
so quick. What were the influences? What was the culture?
The first question we should ask is this: “Why does the Japanese culture, as
well as other Asian cultures, have the core ability to perfect skills and
maintain high quality?” Or specifically: “Why do Asian cultures perfect
designs for cost, quality, and reliability?”
To answer this question, I'll pose a theory I first read in Malcolm Gladwell's
thought-provoking essay “Rice Paddies and Math Tests” [2]. Says
Gladwell: “If you look at Eastern cultures, a staple crop is rice. If you look
at Western cultures, a staple crop is wheat. This difference in staple crops
may explain why Easterners take time for perfection while Westerners find
it easier to create ideas, designs, and technology that are brand new.”
Gladwell drew a correlation between the behaviors needed for rice farming
and the habits behind good math grades. We're all familiar with the notion that
Asian students are better at math than Western students, right? It's a
stereotype. But the data does in fact show that, in general, students raised in
Asian households have higher math grades than their Western counterparts.
How might we explain the correlation? Both rice farming and math mastery
require diligence and a “never stop trying” mindset. Both talents demand
similar skills.
The following experiment was carried out to prove that, in mathematics, the
quality that directly correlates to success is tenacity.
A math test was given to a group of students from mixed backgrounds. The
test had several questions of varying difficulty. The time it took each
student varied, based on their abilities. If you plotted the times for the
group as a whole, you'd get a distribution. So now we had each student's
time to complete the test and their grade.
At a separate time the same group was given a new test. It had a single
problem. But this single-problem test was a different kettle of fish. Solving
it wasn't just hard; it was impossible. The problem had no answer – only the
test-takers weren't told that. What happened?
Secretly, the observers were measuring how long each student would work
before they quit. In other words, they weren't really measuring people's
ability to solve the problem. They were measuring their stick‐to‐it‐iveness.
The conclusion: there was a direct correlation between tenacity and actual
math ability. Those who worked at the impossible problem longer did in
fact have higher math grades on the real test.
We could conclude that tenacity correlates with being good at math, and
that rice farming demands tenacity. We can then conclude that good rice
farmers are going to be good at math.
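As a sketch of what the researchers were measuring – with invented numbers standing in for the study's data, which I don't have – the analysis is just a correlation between persistence time and test grade:

# Illustrative only: invented data, not the study's actual numbers.
# Pearson correlation between minutes spent on the unsolvable problem
# (tenacity) and grade on the real math test.
minutes = [4, 7, 9, 12, 15, 18, 22, 26, 30, 35]
grades = [61, 66, 70, 72, 78, 80, 84, 88, 90, 95]

n = len(minutes)
mean_m = sum(minutes) / n
mean_g = sum(grades) / n
cov = sum((m - mean_m) * (g - mean_g) for m, g in zip(minutes, grades))
sd_m = sum((m - mean_m) ** 2 for m in minutes) ** 0.5
sd_g = sum((g - mean_g) ** 2 for g in grades) ** 0.5
print(f"Pearson r = {cov / (sd_m * sd_g):.2f}")  # near 1.0: more persistence, higher grade

A strong positive r is exactly the “direct correlation” described above.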
Many of our cultural behaviors are rooted in practices handed down from
our ancestors. These behaviors were passed along unintentionally, yet they
profoundly shape how we operate in the here and now.
Levels of Awareness
Awareness is everything. It's the first element of control. How important is
awareness?
I'd say it's so important that I'd prefer to be in a bad place and know why I
am in it than be in a good place and not know why.
If I'm in a bad place and know why, I can take actions to improve the
situation. If I'm in a good place and don't know why, it leaves me vulnerable
to change over which I have little control.
I'm sure you've seen a situation that degraded quickly because the people
involved didn't understand why things were going well. (When I used to
race cars, it seemed like every crash story I heard began with the phrase, “I
was feeling pretty good, then…”) If you were feeling good and then
crashed, I'm pretty sure you were missing something.
For a reliability culture to work, then, people have to be open and honest.
They have to be willing to share. When I begin an engagement with a new
team or organization, I feel a sense of relief when the leadership describes
not just their reliability strengths but their deficiencies as well. I'm
confident that this type of team will grow to a high standard of product
development.
What are some of the phrases I look for to see if they are aware?
“We do/don't have good control of our field data, because of…”
“We aren't able to differentiate between true design‐based failures and
ambiguous customer complaints or misuse.”
“Our products release without having completed reliability testing and
we don't know how to make that happen. The programs move too
fast.”
When we hear statements like these, we're working within a team that can
create a strategy that can be implemented.
I'm wary of projects that are under leaders who make statements such as:
“We know what the problems are. We just need someone else to prove
it.”
“We don't have time to do full reliability programs. Is there a test that
can say it is reliable?”
“Do you do ‘certificates of reliability’ for products?”
All of the above statements imply that the leaders are blindly confident.
They think there are no significant problems. Or if there are problems, they
can be understood by some shoot‐from‐the‐hip analysis. These are the
teams that are baffled at how they got sideswiped with all these “rogue”
issues that came from nowhere. “I was feeling pretty good, then…”
A company actually asked me that last question: “Do you do reliability
certificates?” I was like, “Yeah, here.” I took a Sharpie to a sticker, wrote
RELIABILITY across it, and handed it to him saying, “Stick this on your
shirt.” Then I left. OK, I didn't do the sticker part, but I did leave.
Just go ahead and make your own sticker; many companies do.
“Trail Rated”
“Field Proven”
“#1 in Quality”
“5 star reviews on websites”
Summary
Intent and awareness are the two keys to success. Both are easy to have and
just as easy to lose. Like your keys and your phone: grab them in the
morning and you have them for the day. Or do you? Not if you don't keep
track of them. We all do the pocket pat as we leave a room for a reason.
Without that extra step it's easy to find yourself missing one or both by
lunch.
The difference is keeping their status (keys and phone, or intent and
awareness) on a dashboard, and looking at that dashboard (or patting it, in
the phone/key analogy). What would an intent and awareness status look
like on your dashboard? Would it be something new, or simply an addition
to one you already use?
References
1. Liker, J.K. (2004). The Toyota Way: Fourteen Management Principles
from the World’s Greatest Manufacturer. New York: McGraw‐Hill.
2. Gladwell, M. (2008). Chapter 8. In: Outliers: The Story of Success.
London: Penguin.
4
Awakening: The Stages to Mature Product
Development
Stage 1
This kind of reliability testing usually goes under the name “verification
and validation testing.” It tests if the design does what it is supposed to do.
Here, variability isn't much considered. Unfortunately, variability is what
reliability is all about.
The intent of these Stage 1 tests isn't about learning. The intent is about
passing. Of course, just trying to pass a test doesn't improve the product,
almost the opposite. When we're trying to pass a test, we do everything we
can to pass. We look at the easiest possible conditions, so things go well.
What can we learn? Nothing.
During this stage, the organization often experiences large field failure
surges. The pain of this experience – in terms of dollars, market image, and
lost resources – awakens them to the benefits of incorporating reliability
tools early.
Stage 2
In this stage, reliability tools and methods become a key part of the
program. You may bring aboard a reliability engineer from the outside. Or,
you could develop someone for the role internally.
Unfortunately, a significant number of planned reliability tasks get
truncated or postponed when time and money get tight mid-program.
Because of this, the impact of the reliability tools is reduced.
The company still experiences unexpected field failures regularly.
It's a difficult stage to push through, because the organization is investing in
reliability and not getting much back. It's akin to someone who has started to exercise
for the first time. The effort seems high and the results are low. This is
when most people quit.
In this stage, there are indicators that show improved product reliability –
things like fewer issues late in the design process and easier transitioning to
a manufacturable product.
It's like standing on the scale and seeing you've dropped a few pounds even
though you don't yet look buff.
This kind of encouragement is what gets the team to Stage 3.
Stage 3
Here's where the importance of reliability comes into clear focus. It brings
together how the reliability program fits the product program and the
company's overarching business goals.
What happens in the field? We see products that perform better than any
previous generation.
We're now seeing clear muscle tone, and the return is matching our
investment.
Stage 4
In this final stage, reliability becomes fully integrated into the program
process and culture of the teams. An efficiency emerges that makes the
reliability activities effortless while requiring little added investment.
We now have solid datasets. This helps analysis models drive design
decisions. This is design input without testing, which is very powerful.
The test methods and tools needed in development are already available
and don't require added time or cost to integrate. This is similar to when
product development programs create in-house services, like machine shops
and prototyping labs, that were previously contracted. The service is now
right down the hall, ready to serve the program. No red tape, no waiting.
Other departments are familiar with the inputs and outputs to reliability.
Because of this, there is fluidity of information flow between team
members and departments.
Comparing Charts
Soon enough I was sharing these sketches with executives to help them
understand how things were currently operating, and more importantly how
I thought they should be operating. Next thing I knew, these charts were a
standard part of my assessment reports.
Simply put, these charts bring the intention of an organizational structure
into focus. (There's that word again, “intention.”) In other words, when it
comes to the reliability process, what is Jennifer accountable for? How do
her accountabilities affect Sam? What information flows between them to
support these accountabilities?
A standard org chart doesn't have much to do with accountability. Its main
job is to diagram the interactions of the team, based on managerial needs
and hierarchy.
An ownership chart, however, is all about accountability. That's its sole
function. The chart identifies who generates certain types of information,
who makes specific decisions, and who's able to direct, block, or pass all
this information.
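If it helps to see the shape of the thing, here is a minimal sketch of an ownership chart as data. The roles and information types are hypothetical, purely to illustrate the structure:

# Hypothetical ownership chart: for each type of program information,
# who generates it, who decides on it, and who can direct or block it.
ownership_chart = [
    {"information": "sub-assembly test results",
     "generates": "test technician",
     "decides": "reliability engineer",
     "directs_or_blocks": ["program manager"]},
    {"information": "redesign go/no-go",
     "generates": "reliability engineer",
     "decides": "VP of R&D",
     "directs_or_blocks": ["CEO"]},
]

for row in ownership_chart:
    print(f"{row['information']}: generated by {row['generates']}, "
          f"decided by {row['decides']}, "
          f"directed/blocked by {', '.join(row['directs_or_blocks'])}")

The point isn't the format; it's that every generator, decider, and gatekeeper of information is written down somewhere.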
If an individual can create information, they need to be identified as an
owner. A test technician's lab generates information. In a traditional
organization chart or role description, this technician may not be considered
accountable for test results. But what happens in reality?
I've seen a test technician report information directly to middle
management. The information they were generating was so critical to the
program's day-to-day operation that they, the test technician, would attend
the Friday program steering meeting, hosted by the CEO.
This critical lifeline of information wasn't documented. You couldn't find it
in any program plan. Unfortunately, undocumented means unprotected.
Ever wonder why a new program can have a rough start when the previous
program, using the same team, was a well‐oiled machine?
It's because things that were happening to generate meaningful results, like
a technician reporting information to a CEO, were never documented. A
small, undocumented practice like that likely won't continue in the next
program. That's because to write it down seems ridiculous. Those involved
may not realize its importance. Or, they may be embarrassed that a CEO
had to go to a technician to get critical information.
The CTO needs to stop both asking for updates and giving directives to
the manufacturing engineer and reliability engineer (first level people).
The R&D engineer needs to stop telling the CTO what is going on and
the CTO needs to stop acting like he likes it by asking more questions.
The CEO needs to stop asking everybody for updates on everything
and get a dashboard or something.
Why is a manufacturing engineer waltzing into the CEO's office to tell
him what he did today?
The accountability for information is all over the place. Obviously, the roles
as they are defined do not satisfy this program's needs. The information
flow doesn't have to follow the org chart, but it should make sense. In this
example, the second-level managers are out of the loop on a good deal of
information on topics they are held accountable for. And they can't be
asking for much, either, if their subordinates are roaming the chain of
command looking for someone who will listen to them.
Communicating Clearly
In any organizational assessment, it's critical to listen closely to language.
Big organization, small organization – it doesn't matter. The thing that
moves information along is language, or dialogue.
Listen for the dialogue's intent (there's that “i” word again). The intent is
not the content. Take a step further back and listen behind the words.
When you're handed a task list, it often has an underlying theme. But you
may not pick up on the theme right away. Simply completing the list's
action items, without looking deeper into things, won't make the list owner
satisfied. (“Did I really listen to what was asked?”)
OK, let's do a little “listening behind the words.” Here's a quiz.
In the following example, what's being communicated? What's the intent?
“Go straight for two blocks. At the stop sign, make a left and go two
more miles. On your right will be the store you were looking for. The
building is white, with a red sign.”
The intent is to provide information. Every word provides information
concisely. Nothing extra was added.
Now, listen to these directions.
“Go straight for two blocks. On the corner is a cafe with the best
pastries. I don't know how long you’re in town, but if you can get there
some morning, I recommend you grab a croissant or even just a bagel.
Anyway, take a left and go two more miles, and you'll see the store on
your right. But, in case you're interested, I'd also look in Simon's, which
is three blocks farther. Their selection is bigger.”
With those directions, the intent isn't as clear. There's an interest in
socializing; giving directions was of secondary concern. Clarity was
sacrificed to all the peripheral information. The listener will miss important
details. The value is diminished because of the secondary objective. This
flips the priorities around.
What if the direction giver answered this way? “Go straight for two blocks
and turn left.” They then abruptly walk away. I'd say they don't like
strangers or are tired of all the tourists in town. They are getting rid of the
listener.
Passing along information was never their intention. What was their
objective? They used authoritative jargon like a protective shield. That's
what a politician does. They're careful not to say anything that they could
be held accountable for later.
My conclusion is that this person, who could have considerable knowledge,
won't actually contribute, because they're scared. Scared of one of two
things.
They're either operating in a culture in which those most likely to survive
are “yes men” (an authoritative culture). Or, they may feel they're not a
valued contributor, so their confidence is low. I've seen this in situations
where an innovator is in a non‐innovative process – a fish out of water.
I can determine which of the two reasons is at play through a facial tell.
That tell is a smile.
If there's no smile, this person is likely concerned about management's
imminent reprimands. They can reference their seriousness when issues
arise. It works like a “Get out of jail free” card. It's an “I told you so.”
If they do smile, they're concerned about not being a true contributor. They
want you to listen to their important contribution, but they then want you to
know that they're just trying to help, “I'm not looking for any trouble, just
offering some wise counsel.” They're effectively backing down from any
challengers in advance with that smile.
The next clue to understanding the full situation is to look for repetition of
this empty talk in the group. The repetition indicates if this is being driven
by the person themselves or by the culture. The culture may be the result of
a particular leadership style.
There are many examples of this leadership style outside of corporate life.
Politics is a place where the “strongman” technique is often found. One
example that was captured on tape is when Saddam Hussein made his move
to take over his political party. The video seems like a scene from Scarface
or The Godfather. It's not indicative of what happens in corporate dynamics,
but it's easy to see the parallels.
In the Hussein video, there is an auditorium full of men. They're listening to
speakers from their political party. Hussein approaches the podium and
introduces an individual who'll speak next. It is a party member who's going
to confess to plotting to overthrow Saddam Hussein. He then lists those
working with him. As he recites a name, the corresponding individual is led
out of the room by two armed guards.
While this happens, other party members stand. They declare their loyalty
to the party and to Hussein. Saddam sits quietly, leans back, and puffs on a
cigar. He just took control of the party.
Have you been in a meeting surrounded by yes men? “I agree.” “Me, too.”
“I agree even more, and I'm going to restate what was said already, but with
more conviction.” You can almost smell the lit cigar.
If you witness multiple people talking in this manner, there's a strongman in
your midst, scanning the field for people to make an example of. If you
hear talk of this type and you're the leader, you may want to take a look at
how you deliver your leadership. I've met leaders who have no idea that this
is what they've turned their product culture into.
My Personal Case
While being a consultant myself, I have hired many consultants: writing consultants,
graphic art and web consultants, mathematical model consultants, market
positioning consultants, and business strategy consultants.
A single moment of growth surprised me more than the rest. It came from a
business strategy consultant. He told me, “Your biggest growth problem is
how you speak.” It turned out I wasn't such a good talker.
The consultant told me there was no reason for us to continue with our
business strategy work if I didn't improve how I spoke to leaders. He was
going to quit on me if I didn't fix that first.
He was right. I had enlisted his services to get better at making direct
connections with senior executives. I figured what was needed was to
simply create a message that would help executives “get” these principles.
But that was backwards. They didn't need help listening. I needed help
explaining.
My consulting company began purely as an engineering firm. I sold to and
worked with engineers. Easy‐peasy, I can talk to engineers all day long.
Hey, as I'm writing this, I'm wearing a t‐shirt that reads, “MAY THE
d/dt(mv) BE WITH YOU.” Get it? You might not, but my people would.
This is how I pull them out of the crowd.
I spoke the engineers' language. I knew what they were thinking. I was
them and they were me.
On the other hand, executives weren't “my people.” I was never a CEO or
had managed a thousand‐person organization. I didn't know what they were
thinking. But that didn't mean I didn't have the information they needed.
Katie's success was based on beautifully articulating the message. She knew
she had to speak “adult.”
My Own Experience
My job was to ensure the best reliability practices were in use. I was
operating at about the equivalent of a director. This engagement was
initiated by the senior VP who had sought me out personally. But in the
end, I should have been advising him. Directing the team wasn't enough for
this program to succeed. We were not operating in the best program culture.
I think today he would agree with this statement.
I was directly engaged with multiple engineering teams. They were in
Massachusetts, New York, Utah, and France. Only the New York and
Massachusetts teams had worked together previously. The team in France
was part of a recent European acquisition. An added challenge: they were
not only in a different geographical culture but they had a business culture
that was different from the rest.
The Utah team was a separate division that, for the most part, hadn't
engaged with other divisions before. They in fact had excelled without
much input from the outside. Having to work with these other divisions,
then, bothered them. They responded with a bit of hostility.
If you've ever been to Utah, such a response is a bit surprising. They're
mostly outdoorsy people who love nature and talk about the natural aura
connecting us all. At least that's who I engage with when I go to the
Canyons. These people I was working with didn't seem like Canyon hikers.
They gave off more of an NYC taxi driver vibe.
The disdain from this group was so palpable that after my first conference
call with them the Massachusetts (Boston) team apologized for the Utah
team's behavior. Stop for a moment and think about that. These are Boston
people thinking that the other team was rude. Have you ever been to
Boston? We punch people in the face for messing up our coffee order.
I was the first person brought in to begin the reliability initiative for this
project. The organization had no precedent for this type of project. They
had a reliability department, but they had only focused on less complex
products. For them, this was new territory. I had experience with a similar
type of technology in reliability programs.
I started the engagement by having them do a few direct hires to build the
group up. Some were dedicated reliability/design engineers, a few
technicians, and a reliability project manager to be the day‐to‐day leader.
Within about a year, all of those people except for one technician had left or
been asked to leave. The project manager had a meltdown and was asked to
leave shortly after. It was a “show results or you're out” kind of atmosphere.
I don't know why they kept me and the technician on. But we did have a
whole office area to ourselves, so that was nice.
After the purge/exodus, I built the team up again. I pulled in some day‐to‐
day leadership from another program and a few new hires. The technician
that stayed eventually ended up leaving on his own accord as well. He had
had enough. He said he just couldn't stand working with his hands tied. I get
it; the ropes were chafing me, too.
So I was the only person who was consistently part of the reliability
initiative from the beginning. When the reliability program continuity is
based on an individual who technically doesn't even work at the company,
that's a big red flag.
Ask why a given test wasn't running, and there was always a specific
answer:
The design doesn't work and in this phase there won't be a redesign.
The design engineers had been told to work on tasks other than
reliability.
There was no money allocated for units to test.
The contract manufacturer making the test fixture was late.
The safety team stopped us and wants a redesign of the test fixture.
But at the end of the day the VP knew that these were all localized reasons
(excuses) for specific instances. He wanted to know why this was occurring
across the board. It couldn't be that there weren't units to test, no engineers
with bandwidth to participate, no designs working well enough to be tested.
What was it?
It wasn't until about three-quarters of the way through the project that I
realized the reason – why the VP could walk around every facility and see
almost zero tests being run. The issue was that the pressure
from above was shaping what “failures in test” meant: “engineering
failure.” Failures weren't valuable information to assist with improving the
design, as they should be. Failures were a spotlight with a flashing neon
sign that stated “You are holding up the program!”
This was why there were almost no tests being executed when systems were
available. As a designer or engineering manager, the perception of a failure
in a test was a serious deterrent. Why start an activity, testing, that almost
certainly was going to result in your having a target painted on your back?
Summary
Awakening is not an absolute. It is a term that describes a comparison
between a past state and a new one. Before, we had a level of observation of
our surroundings we considered representative of reality. But what we can
see changes. Eventually, our perspective of what is real is different.
This is just part of the human condition. We can all look back to 1 year ago,
2 years ago, or 10 years ago and see how we have progressed in wisdom,
awareness, and understanding. What I hope you take away from this chapter
is that by measuring the degree of awakening we can track progress, and
then change what we are doing to keep it moving at a steady or even faster
pace.
Side note (I can't resist): would tracking the rate of progress of awakening
be Awakening = d(awareness)/dt? A team's total value (score) of awakening
over a program period t1 to t2 would then be

Score = ∫ from t1 to t2 of [d(awareness)/dt] dt = awareness(t2) − awareness(t1).
Calculus joke, but I actually think that could be used in comparing if a team
is improving the improvement process. So if the integral of your current
awareness growth is greater than your past period of awareness growth, you
are “improving” your improvement process. I say we slap that one on the
program dashboard. Hmm… look for that in the second edition.
5
Goals and Intentions
Testing Intent
In many programs I see teams “testing to pass” when they should be
“testing to improve.” Testing to pass is putting your best foot forward. Put
on your best suit or dress, comb your hair, smile big, and always give a firm
handshake. There is a “mark” and you are going to hit it so you can advance
to the next stage.
Testing to Improve
Testing to improve is looking for defects and “performance response to
variability.” By “performance response” I mean failures. Let's not sugar
coat it. These failures occurring in the team's hands make up a key element
to improving the design.
It's more akin to going to the doctor's office and wearing that gown that is
very convenient for examination. Not taking off your street clothes in that
situation is just wasting everyone's time and makes the whole learning
process take a lot longer. That is what I see many design teams do. Test the
perfectly hand‐built units, use nominal stress, and throw out data that
indicates issues have occurred, often calling it an “outlier.” You just threw
out gold! That failure is what we were hoping to find. The customer is
going to find it next.
Quick Question
If the program pressure is to release on time and your personal annual
review is right after product release, would you…
A. Test to find every weakness you can, accepting that the failures you
uncover may delay the release?
or
B. Make sure it passes each test with flying colors, not being accountable
for delays in product release?
OK, now let's say you are designing a new plane and your family is going
to be on the maiden flight. Would you…
A. Test to find every weakness you can, accepting that the failures you
uncover may delay the release?
or
B. Make sure it passes each test with flying colors, not being accountable
for delays in product release?
That's the difference between “test to pass” and “test to learn.” The
challenge is for leadership to manage their teams to be owners of the
design, the same as they are. If the team is testing to pass, it is usually
through no fault of their own.
A team member's personal success, or even household income, depends on it. The
difference between most product development programs and the airplane
example is their family isn't on the plane. Their family just needs that
paycheck to pay the mortgage.
I originally created the “family on a maiden flight of a new airplane”
example to demonstrate the “test to pass” principle many years ago. But in
a way that example became real life.
In 2018 and 2019, two crashes of the Boeing 737 MAX killed everyone on
board. The 737 MAX was a newly released design. If you are reading this
far in the future and are unaware of what happened, here is the short
version. The root cause is simply that Boeing pushed product development
programs so hard with “time to market” goals that a poor and under-tested
design went out the door. At the end of the day, all the analysis of software
and system errors boils down to one simple statement: “Boeing let market
pressure drive bad engineering practices.”
These product time-to-market pressures grew into such a gargantuan
force in development programs that not even families' lives being at risk
could counter them. We need to change how we balance our product goals
through the product development process, now more than ever.
It sounds ridiculous to suggest the words “fear” and “ownership” could ever
be mistaken for each other or considered equivalent. But many leaders act
as if they are. When they get all “tough guy” and demand a subordinate
accomplish a task, are they transferring ownership or just creating fear? It's
just fear: fear of being held accountable, fear of expressing to their leaders
what's needed to accomplish the goal. The subordinate ends up collecting
evidence of no wrongdoing to defend themselves in the trial that comes
when things go wrong.
Ownership
Ownership is different. Ownership of an outcome results in an individual
doing all that's possible to ensure the result is achieved. Uneducated leaders
think the fear route is a quicker path to the desired result. It's not.
I've seen this dynamic drive a team's testing. They test and collect evidence
solely as a means of covering their backsides when things go wrong. That's
messed up!
If the company owns the design and ultimately its performance, why hasn't
that same ownership been transferred to the design team? Simple: it wasn't
transferred. Laziness, uneducated leadership, doing it the way we have
always done it. It really doesn't matter why. Rice farmers don't act this way;
they can't.
Fear‐driven Testing
There are some hallmarks of fear‐driven reliability initiatives that are easy
to spot. A classic indicator of fear‐driven testing is, “We need to do
vibration/temperature/humidity testing.” Why? Why do you need to do
something so nondescript? To say you did it. They're just checks in a box
you can later point to: “We did vibration testing.” It's the equivalent of
business strategies based on “synergy,” “collaborative style,” “crowd
sourcing,” or “interconnectivity.” It's empty.
When I was leading a department earlier in my career, my colleagues and I
were submerged in a culture of empty initiatives. I knew we had totally
given up when we started playing “Bullsh** Bingo.” This is how we played it.
Each of us would pick five popular buzzwords we expected to hear in the
next meeting. If all five of your words were mentioned, you won that round.
The game had two additional rules: you weren't allowed to say the words
yourself and you couldn't lead someone into saying them.
If you're a part of a leadership strategy initiative, ask yourself the following:
“Can we connect the initiatives to a clear intent that delivers a product or
program goal?”
If this connection isn't clear then it should be pointed out immediately, and
addressed. Here is a way I direct a team to re‐evaluate what they're doing
when a test initiative is laid out and I suspect we aren't doing it for the right
reasons.
Team:
Should we do vibration testing?
Me:
I can't answer that without knowing why we'd do it. Let me ask you a
few questions to get us there.
Until we start at the beginning and identify what you want to know, we're
doing testing for testing's sake. That's called a “phantom testing program.”
Transferring Ownership
Peter Drucker said, “Leadership and management are entirely different
concepts: Leadership is doing the right things. Management is doing things
right” [1].
If we want to become leaders, then, we have to do the right things. And to
do the right things, we must take ownership.
A funny thing, though? As leaders, we often must transfer ownership to the
people closest to the process. The reason? We're too far away from the day‐
to‐day workings, and we can't provide enough guidance. But the people
working closest to the process can.
In transferring ownership, we can't make assumptions and keep important
directives to ourselves. We have to make it clear that nothing matters if the
program goals aren't met. Game over for the product and the success of the
business.
There is no trial or inquiry after the failure. Simply, they must make it
happen, not because of perceived consequences but because of real
consequences.
Successful Transference
I was invited to work with a team that was developing a next‐generation
product. It was centered around a laser system that did things I only
believed existed in quantum physics books. I know lasers. I still couldn't
believe this thing existed as I lay in bed that first evening. Manufacturing
this system cost $2.5 million. Then, there'd be a substantial markup for
anyone who would like one. They had someone who wanted one and they
wanted it NOW!
The design wasn't entirely new. It was their flagship product with these
recent advancements added. The new model worked great in the lab, but
was it mature enough for the field?
The executive team knew me from several previous engagements. They
wanted me to assist with this pressure‐cooker of a project. Why was it so
intense? Why did they want me to contribute at every level, from the
technology side of things to its project management?
This project had the hallmarks of a project management nightmare. The
program had to be completed rapidly, it couldn't be allowed to fail, and it
featured new technology that was only marginally stable. That old adage
“Fast, reliable, cutting edge: pick two” wasn't going to be upheld here. They
wanted all three.
My assignment was to answer these two classic reliability questions:
“When will the new features wear out?” and “What is the rate of random
failure?”
The product was commissioned by their largest customer. Simply put, the
stakes were high. Their product was used in their customer's production
process. If their product failed in the customer's hands the production line
would stop. And this production stop could be measured in millions of
dollars per hour.
Obstacles to Transference
Here is where the insanity of this situation became evident. The effort to
release this next‐generation product – which held my client's very brand
value – was in a program with an almost comical limitation on time and
material for testing.
Simply put, I thought leadership was joking when they said we couldn't
have a complete prototype system to test. No, they wanted their very
important customer to receive the first unit ever produced. Otherwise, we
wouldn't make the deadline.
For a split second I thought, “Wait, was I hired to run an impossible
program and be a fall guy?”
This unrealistic program framework was created because the CEO didn't
see the program the way I did. He saw a mature field‐proven product that
was getting “a simple upgrade.” He believed that the calls for “all this
demonstration testing” were just nervous engineers wanting to cover their
butts.
The company had been producing products like this for decades and this
one was 80% of a well‐proven design. In addition, the new technology had
already been shown to work on the bench in the lab, so what was the
problem? Just put it in.
What the CEO knew was that a competitor was breathing down their necks,
waiting to slip into their spot with the customer. He wasn't going to give
them that chance.
So the product development team was left with no opportunity to explain
that “just because a new technology works on a bench and constitutes a
technology change of only 20%, this in no way guarantees reliability.”
And here's the kicker. This product, which costs $2.5 million to produce,
had 30% of its cost in the new technology components. You may not know
this, but to demonstrate even the most minimal confidence in a reliability or
life goal, you have to test not just one product to full failure but over 20!
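For the curious, here's where a number like that comes from. One common
sizing rule (the classic zero-failure “success-run” relation; I'm using it as an
illustration, not as this program's exact calculation) says that to demonstrate
reliability R at confidence C with no failures you need

n = ln(1 − C) / ln(R) units.

Demonstrating even 90% reliability at 90% confidence gives
n = ln(0.10)/ln(0.90) ≈ 22 units.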
Clearly, this was not going to happen. We needed a plan B.
Successful Intervention
How do we do this? What do we ask for? Why am I sweating so much?
The test program wasn't going to work if the yeses and noes of reliability
came from leadership. Whether the product was truly reliable had to come
from the team.
If I hadn't intervened, leadership would have ordered the team to follow
their edicts, and the team (having no choice) would have obliged. The result
would have been an incomplete test program, yielding no genuine result.
Reliability engineers call these “window dressing test programs.” They look
nice but don't accomplish anything.
It was clear to me that the only way to create authentic product confidence
was by transferring ownership to the team. What, in this case, does
“transfer” mean?
Remember, you can't transfer something you still hold. When it came to
directing how to reach the goal, leadership would have to let go. The team
would have to be given freedom. It shouldn't have to answer to anyone as to
how it did what it did. The responsibility should be theirs.
In this case, the result was one of the most unique testing programs I've ever
been a part of. The technical team knew where the risk was better than
anyone, so they became the goal's new owners. The program they created
was spot on. Following someone else's directive would have bypassed their
valuable knowledge.
The moral of all those stories: good things can happen when ownership
extends to individuals on the frontline. The writers know we have all been
frustrated when leaders don't listen. That's why we love those stories.
Kneejerk Reactions
That's how programs are managed when we aren't monitoring all the goals
all the time. Guiding by all the goals all the time is like a driver who just
smoothly maintains the center of their lane. It almost looks effortless, like
they aren't doing anything. But in fact they're making thousands of small
corrections continually.
An example of a program with the elements not correctly monitored is
when we have a spike in field failures due to a rushed design process.
Someone may have given time to market too much attention earlier. What's
the kneejerk reaction to this? Pull resources off of the new program to
support fixing the field failures. What's the consequence?
The new program is now off track, under‐resourced, and behind schedule. If
leadership doesn't let up on the time to market, we can expect a poorly
developed new product to be sent to the customer.
Are we really going to act surprised when the new, underdeveloped, product
has a sudden spike in field failures? Of course we are. Wash, rinse, repeat.
Summary
The problem here is not just about being frantic. The lack of efficiency is
brutal, and grows exponentially. Getting programs back on track is 20 times
more resource‐intensive than if the same tasks were planned.
For some reason we decide this is all normal and just keep moving forward.
It was all in such slow motion that we didn't connect the poor profits, top
talent leaving the company, and lost market opportunities as the casualties
of the poorly steered process.
The solution is for all the goals to have input throughout the entire program.
There are two key elements to this. The first is the entire team absorbing
responsibility for the complete program goal. The second is the Bounding
methodology, which ensures all of the inputs needed for even the smallest
steering correction are present all the time.
We talk about the Bounding method in upcoming chapters. You're really
going to like it. It's easy to apply, and you can put it in motion immediately.
References
1. Drucker, P. (2011). The Practice of Management. New York: Routledge.
6
New Roles
For an organization to deliver reliability effectively, it must create a few key
roles. Most of these roles will be temporary, and will only exist for the life
of the program.
Which roles am I talking about? These three: the change agent, the
reliability czar, and the facilitator.
None of these three roles is its own job. In other words, you're not hiring
change agents, reliability czars, and facilitators. Each role is filled by a
current team member who already has another primary role, like engineer,
department manager, or program manager. They're ready to put on one of
these secondary hats, however, when the need arises.
Let's look at each of these roles, one at a time.
Reliability Czar
The czar is cool. They're a conduit through, or even a portal across, the
organization chart. They make sure critical information gets shared between
departments and levels, so important packets get to the places they need to
be, immediately. A czar, for instance, will take a message from lab
technicians and make sure it gets directly to the CEO. Without the czar,
such a seamless transfer of information isn't possible, especially when it
comes to jumping levels like that.
An important part of the czar's role: they need to be steeped in technology.
Why? They'll need to have meaningful conversations with engineers,
technicians, scientists, and designers, and unless they're technologically
sophisticated themselves, these people won't want to speak with them.
The czar also has to be trusted by the organization's leaders. I should be
clear in who I mean by “leaders.” The leaders I mean are decision makers.
They don't have to check with anyone to make decisions regarding schedule
or resource changes. If a leader asks that two months be added to the
schedule or $100 000 be added to the budget, that's what happens. A leader
may be the CEO or the president. They could also be a VP of R&D. It all
depends on how the organization is set up.
Let's look at the czar's role more closely. It serves three functions:
Function #1: it acts as a link between the test bench and leadership, so
observations, discoveries, and issues can be easily shared.
Function #2: it helps leaders give the hands‐on team members direct
input.
Function #3: it ensures that the raw information is distilled down to its
essential elements. (How does this happen? Well, if information is
being passed back and forth, the parties involved have to get good at
finding and articulating the crux of an idea, so it gets shared
accurately.)
I think it'll pay off if we examine each of those functions, one at a time.
Direct Input
OK, on to the second function: “helps leaders give the hands-on team
members direct input.” What's this about?
It's Function #1, but in the opposite direction. Function #1 was about how
the czar and project team get information to leadership; Function #2 is
about how leadership asks for and gets the information it wants. It's push
vs. pull.
The second function of allowing the decision makers to have direct input to
the hands‐on team is critical. That's the reason CEOs and presidents should
be particularly interested in creating this position.
Recall that story about the VP in Chapter 4? That VP couldn't find out why
the engineers and technicians wouldn't test the new sub‐systems, no matter
how many times he asked or how serious his threats were. That's a
frustrating position to be in. Even after the project, he still couldn't figure
out why he was always in the dark.
If he had had a czar, he would have found his answer within a few hours of
having asked the question.
In many organizations, a request that a leader makes will be translated two
or more times before being received by the individual who can take the
necessary action. That's a lot of filtering and adjusting for personal interests
before anything meaningful happens.
Distilling Information
The third function is about collecting and distilling information. Some
people call it “data scrubbing.” You don't change the information, but you
organize it and make the discrete messages clear, so that leaders can
understand them. This, too, is the czar's job. They don't pass along an
informational blob. They present it in a way that makes sense to the leader.
Figure 6.1 shows how the czar transmits information.
There is also the scenario where leadership trusts the czar but feels they are
not providing valuable information. Like when my daughter used to
interrupt me when I was on an important conference call to tell me things
like, “Some cats like milk but Nicole just said her new kitten won't drink
milk.”
I am sure that is a factually correct piece of information. But the
conversation I need to be having right now is with the VP on the other end
of this call, about why the FDA wants to know why our “essential
performance factors” don't match the definition in Standard 60601. I just
say, “Hold on. That's really unusual, honey.” Steve's either not going to take
my counsel too seriously going forward, or he'll wonder why I called him
“honey.”
Leadership has skin in the game in having a czar, because the czar
potentially creates some strain with leadership's direct team. The czar
provides information that the team cannot easily control or filter. The
return must exceed the cost for the role to be maintained.
For leadership I can't give any specific guidance on what to do. It is simply
a personal call whether you want this role to continue in your operation; it
may have greater value in some programs than in others. It is up to you to
monitor whether the role is perceived as valuable, and to adjust. One metric
to watch is the push vs. the pull of information. The relationship is valuable
while leadership is “pulling” information. If it transitions to the czar
“pushing” information on leadership, it needs to be re-evaluated and
adjusted before it is terminated.
I discussed this dynamic a bit in an earlier section but it is worth reviewing
again. The difference between a leader “pulling” information and having it
“pushed” is simply this: if leadership is “pulling” information, a majority of
what the czar is sharing is based on requests, from leadership. If the czar is
providing standardized reports to leadership for most sessions then they are
“pushing” information. I was once told by a mentor that unsolicited advice
is of little value. It is only serving the person giving it. Advice you have
solicited is what you want and so has high value. This is very relevant to
this relationship because it is a relationship based on advisement.
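If you want something concrete on the dashboard for this, here is a rough, illustrative sketch of the metric – my own framing, with made-up numbers:

# Rough, illustrative metric: what fraction of the czar's updates were
# requested ("pulled") by leadership vs. volunteered ("pushed")?
def pull_ratio(requested_updates, total_updates):
    """Fraction of updates leadership asked for; None if no updates yet."""
    if total_updates == 0:
        return None
    return requested_updates / total_updates

ratio = pull_ratio(requested_updates=12, total_updates=30)
print(f"pull ratio = {ratio:.2f}")  # 0.40: mostly pushing; re-evaluate the role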
Role of Facilitators
Facilitators are often undervalued. A facilitator should be the captain, but
are often demoted to scribe. As the team’s captain they conduct how the
team interacts with the analysis tools and each other. What's a ship without
a captain? A group of people doing stuff aimlessly. That's what we see
happen when an initiative doesn't have a facilitator: nothing. An analysis
guided by committee is rarely a good thing.
One of the reasons we create teams is that they provide differing opinions.
This is a beautiful thing. Differing opinions create diverse information.
They create more information, which is good. But we have to do something
good with all that information; left untethered, these differences become a
chaotic collaboration of tailspin after tailspin.
The facilitator is fundamental to ensuring the varying perspectives of the
team are woven into a single directive for the program. A team with a
facilitator has a captain who coordinates and extracts the valuable
difference of perspective and insightful input.
A facilitator may drive a workshop, a Failure Mode Effects Analysis
(FMEA), a brainstorming session, or a root cause analysis session. Each of
these activities has specific deliverables that are outlined in advance. The
facilitator's objective is to have arrived at those deliverables at the session's
conclusion.
Facilitation Technique
When facilitating, there are a few important techniques.
During a Design Failure Mode Effects Analysis (DFMEA), where should
the facilitator place their focus? They need to get collaboration from a
group of analytical individuals. That's really hard. Really, really hard. Each
person is deep in their own head and following an idiosyncratic train of
thought. The facilitator has to carefully extract that thought, so it can be
examined without unraveling.
Pulling those trains of thought out of each person's head and dropping them
on the table in a manner others can understand is no small task. In the
DFMEA, there are elements of data and technical analysis, vested interest
in outcome, and pride. Without a leader navigating the group toward a
common goal, an effective outcome would be impossible. It may not be
evident on the surface, but this is creative brainstorming.
The facilitator has to read personalities. They can't let one person dominate.
Especially an extrovert. If an extrovert is untethered and left unchecked, the
whole session can go down the tubes. It's not that the extrovert is being a
bully. It's just that they can seem overpowering by their sheer natural
enthusiasm. The extrovert expects others to respond as they would. Just
interrupt and blurt it out.
A talented facilitator will pull the extrovert back in a way that doesn't shut
them down. The facilitator will interlace contributions from extroverts with
participants who are more reserved and need coaxing.
We don't want to shut the extrovert down. What a good facilitator aims to
do is get the other, more reserved members interacting in a manner similar
to the extrovert. It just takes a combination of the facilitator's
assertiveness and the introvert's insight.
An introvert may give quiet visual cues that they want to participate, or are
frustrated with the dialogue. By nature, introverts usually pay close
attention to body language and nonverbal cues, and expect others to do the
same. If an introvert strongly disagrees, they may look away or roll their
eyes. This may not be effective, and may go totally unnoticed by extroverts.
A good facilitator scans the room and observes body language constantly.
They're not looking for participant happiness. What they're looking for is a
change.
A change in a participant's body language could indicate a new thought or
emotion, and it's the facilitator's job to decide if that change should become
part of the discussion.
Some people sit with their arms crossed and a stern look on their face even
if they like what they're hearing. Others will smile and nod even if they
think you're speaking garbage. Some signals are subtle, such as an
individual's blinking multiple times when they're uncomfortable.
There are a few common introvert and extrovert attributes:
Introvert Checklist:
Works best when alone.
Listens in groups and does not regularly share their own ideas.
Energy is drained by group interactions.
Speaks concisely.
Needs advance notice for requests or assignments.
Extrovert Checklist:
Includes feelings and background thoughts when communicating.
Speaks before thinking or works out idea while speaking.
Looks to others for inspiration and enjoys group work.
Energy level increases with socialization.
Willing to work on a task “live” with no advance notice.
To be a useful facilitator, you may need to be a bit different than you are
normally. I don't mean being a phony. I just mean that you might have to be
a bit more directive, like a traffic cop. If making that kind of change strikes
you as abrupt, you could warn the attendees ahead of time. I often start a
facilitation session by letting people know that the nice friendly Adam is
going away for a while. It's “facilitating Adam” who is going to lead the
session.
In my role as facilitator, I'll interrupt and instruct others to contribute when
they don't want to. I may refer to earlier parts of the conversation and insist
we discuss them. To an outsider, I might look like an alpha jerk. That's OK,
though. You have to put aside gentle ways to get the job done.
Creating a Narrative
A facilitator also translates all the dialogue into a coherent narrative. In
essence, they're functioning as a live editor. That's one of the most
important reasons why the facilitator is not the session's scribe. This may
seem counterintuitive, but being the scribe is disabling. The facilitator is
too busy keeping track of storylines and making sense of the whole to get
bogged down in minutiae. A separate team‐member‐as‐scribe is mandatory.
In sessions like DFMEA and brainstorming meetings, the facilitator is
actively trying to extract unique input. They are also analyzing the input in
real time, looking for patterns. The input should at times conflict. Universal
agreement among team members is suspicious. Sometimes this conflict
creates even more input – this is good! The facilitator may even stir up
conflict, but then distill it to what's essential and most valuable.
The facilitator has to sense when the play isn't advancing. If there's a
diminishing return on the energy and time going into the debate, it should
be stopped and taken offline. The team's momentum can be slowed if 80%
of the team is being dragged through a discussion they don't understand.
The interested parties can continue their discussion offline and bring the
conclusion to the next session.
The facilitator is also the PM. They have to manage the initiative's most
critical resource: time! If time runs out, the team won't be able to complete
what they need to. Time is also a cruel mistress. More time lets you do
more. More time also drains morale. We all have other things to do.
It's astonishing how much energy it takes just to be in a meeting. Simply
sitting in a room and thinking can be exhausting. I guess four walls and
humming fluorescent lights aren't natural. So read the room and suggest
breaks when a recharge is needed.
A difficult part of the facilitator's role is getting the group started. All good
facilitators have a set of starter questions to trigger engagement. These
questions can also help you keep the participation balance between
extroverts and introverts. Engaging the introverts with intriguing questions
should have them firing off ideas at the same rate as the extroverts.
What are some common starter questions? They fall into different
conceptual groups.
You have observational questions: “What about this issue do you notice?”
and “What deliverables are we aiming for?”
You have reflective questions that push individuals to think instead of
simply observe: “What does this remind you of?” and “What about this
issue do you find problematic?”
Then you have interpretive questions, which also get participants to
evaluate the task at hand: “What does this mean?”, “What could be
different?”, and “What more do we need to know?”
Summary
Roles are critical for any initiative with a team and a goal. We all know that
any personal or professional project can grind to a standstill when we are
stepping all over each other while, at the same time, important actions are
being forgotten.
Product programs, of course, have roles. What I am asking in this chapter is:
do you have the right roles? I discussed the importance of three specific
types of roles. They don't have to be applied exactly as described. What is
more critical is simply evaluating the process and seeing whether the tasks
that need to be accomplished, and the ownership they require, match what
people are being held responsible for.
7
Program Assessment
Measurements
“If you want something to improve, measure it.”
I first heard that saying when I was a new engineer. I'm not certain of its
origin, but it's clearly a derivative of Peter Drucker's phrase, “What gets
measured gets improved” [1].
To a new engineer, the idea of simply measuring something to improve it
sounds a bit simplistic. Similar to the homily, “An apple a day keeps the
doctor away.” That's because young engineers tend to think in logical – not
emotional – terms.
To my surprise I soon found out that it was in fact true. Apples are magic!
No, no, no. I mean that measuring does indeed create improvement.
Well, actually, both statements – the one about measuring and the other
about apples – are true and they're true for the same reason.
They get us to focus, and once we focus on something we can make that
thing better. It sets off a chain reaction of steps that brings about
improvement.
If you eat an apple a day for reasons of good health, it forces you to think
about your health for a moment each day. Doing so will almost certainly
improve your health. “If I'm bothering to eat an apple instead of a candy
bar, maybe I should go for a walk to get the most out of the apple?”
It's the same thing in engineering. Improvement happens as a byproduct of
human nature.
If a robot is tasked with measuring something, it simply provides the
measurement. If a human is asked to measure something, that person will
provide the measurement and also be curious about what the value should
be.
Once they find out there is an optimum value of what was measured, they
will likely want to achieve that value through improved performance.
Measurement was the spark, and human desire for excellence is the fuel.
What are some reliability performance types and how would we measure
them?
What to Measure
If what is keeping you up at night is your entire product line wearing out
and failing in the field, then you should implement an Accelerated Life Test
(ALT) (Figure 7.1).
If the customer is being too rough or the operating environment too
unpredictable then measuring with a Stress Margin Test (SMT) will provide
the information you need. “Will the product still perform in these extreme
conditions?” Figure 7.2 shows how the lack of margin for a specific
assembly will drive a percentage of failed population. The portion of the
population with the weakest features used in the highest stress application
will fail.
Often with SMT results, the areas of risk are addressed in two ways. The
first is by creating more margin, i.e. moving the distributions further apart.
This can be done by increasing product strength or reducing the range of
applied stress.
The second is to tighten up one or both of the distributions. This would be
measured as a lower statistical standard deviation in the distribution.
Reducing the standard deviation can be done by improving manufacturing
quality or reducing the variability of stresses that can be applied when the
product is in use. Figure 7.3 demonstrates these two strategies.
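To make the interference idea concrete, here is a minimal sketch in Python. It assumes both stress and strength are normally distributed (the classic interference model); the parameter values are purely illustrative, not from any real program.

```python
from math import erf, sqrt

def failure_fraction(strength_mean, strength_sd, stress_mean, stress_sd):
    """Fraction of the population where applied stress exceeds strength,
    assuming both follow normal distributions."""
    # The margin (strength - stress) is itself normal; failure means margin < 0.
    margin_mean = strength_mean - stress_mean
    margin_sd = sqrt(strength_sd**2 + stress_sd**2)
    z = margin_mean / margin_sd
    return 0.5 * (1 - erf(z / sqrt(2)))  # standard normal CDF evaluated at -z

# Illustrative values showing the two strategies from the text.
print(failure_fraction(100, 10, 70, 10))  # baseline overlap (~1.7% fail)
print(failure_fraction(110, 10, 70, 10))  # strategy 1: move distributions apart
print(failure_fraction(100, 5, 70, 5))    # strategy 2: reduce standard deviations
```

Both strategies shrink the overlap; the sketch just makes the tradeoff visible as a number you can track.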
Reliability Maturity Assessments
Selecting who to survey is important. If you only select people who are
easily accessible, team leads, or too narrow a cross‐section of the team,
you're unlikely to gain a true understanding of how reliability functions in
the organization.
Begin the process with a preliminary assessment of the organizational
structure. What roles exist and how do they interact with each other and
program deliverables? With this information now available, create the list of
individuals who should be interviewed first. I always find this is only the
beginning. Each interview leads to two other new interviews that get added
to the list.
I've conducted surveys where the engineering leaders were certain that the
organization's business decisions were based firmly on a product's
reliability, only to discover that the organization's sales and marketing
leaders barely paid attention to reliability. Instead, these leaders assumed a
high failure rate was par for the course, and that slowing things down by
making the product more reliable would be a net loss. A more reliable
product that is later to market was only lost sales in their eyes.
In surveys like this, the report often looks like I have assessed two
completely different organizations and combined them into one report. One
“organization” is completely satisfied with the reliability performance,
completely unaware of the trail of disaster in its wake. The other
“organization” is exhausted from firefighting and barely keeping regulatory
organizations from shutting things down. The look of shock on each of their
faces when I share the other's perspective is something to be seen.
This leads us to the other benefit of an assessment, one that is less tangible
but of equal or greater value: connections. Sure, the assessment yielded insights that
improved the process. But the benefit in simply creating connections
between these groups, which although in the same building might as well
have been in different countries, is tremendous.
They not only get the full picture to see who was pushing on the other end
in the opposite direction, unknowingly, they also create long‐standing
communication paths that ensure this distancing does not occur again.
The Team
Some of the roles I try to get on the survey include the following:
Don't stop there. That is only the starting list. If you believe the person at
the receiving dock can offer insight then interview them. You really have to
think like a detective. The doorman may have something that is a key part
of the case. To prove my point, the receiving dock loader was not a random
example.
In one assessment I completed, I found that many people in both quality
and engineering didn't know much about the product's journey between the
customer complaint and the returned product that arrived in the lab. This
product that made it back to the team for root cause investigation was
immensely valuable. It held all the clues. I was not satisfied with this gap in
knowledge. A lot can happen between “here and there.” “Here and there”
being any quantifiable distance. It can be one end of a lab to another or a
medical tent in Africa to a hospital in Italy. I've seen a thumbprint be a root
cause.
So what did receiving know that the rest of us didn't? It turns out that the
hub that organized the returns would just throw all of the unpackaged
devices into a large cardboard pallet‐sized box. The box was then shipped
back to the factory. These units were removed from the box and then sorted
by serial number.
The quality engineers received a nicely organized set of return items, in
serial number order. They had no idea that half the damage “clues” were
simply from the units freely bouncing in a large crate as they found their
way back to the factory.
The product was basically receiving the worst beating of its life just getting
back home. The product was a very delicate electronic device. This is a
quote from my summary, “This process is akin to dragging a patient behind
the ambulance.” The team had been chasing their tail with bad root cause
data for years.
The survey topics should be prepared in advance. Yes, I do like to let my
findings guide where I go next, but that's no reason to not have a solid plan
up front. There should be at least 30 questions prepared. The questions are
derived during the pre‐evaluation. The pre‐evaluation itself doesn't have to
be well planned. It's primarily a “survey of the land.” By simply talking to
one VP for 15 minutes, an engineer for 30 minutes, and a service person in
the hallway, much of what needs to be delved into will come into focus.
The Topics
Some categories of questions commonly incorporated are the following:
“How are reliability goals defined and communicated to the team and
leadership?”
“With a new product development program, is risk in the design
assessed in advance of distributing the resources?” “How is it
assessed?” “Who is included in assessing it?”
“What are the derivatives of design for “X” (DfX) that you see
regularly incorporated into the design process?” A mechanical
engineer may list five and a VP of manufacturing may say they don't
do DfX. This would be a tremendous red flag as to whether DfM
(manufacturing) is really incorporated by the manufacturing engineers
or is just something the design engineers think they can do as
“experts.”
Some of the greatest reveals come from asking the same question of
multiple people. Their answers alone aren't what's most interesting. It's how
different their answers are that tells you about connectivity.
It's not that different from asking the kids separately who broke the lamp.
The differences in the stories are where the investigation will lead us.
The Scoring
A scoring system is needed to help with translating some of these opinions
into quantitative guidance. It can be as simple as:
5 = extremely effective
4 = effective
3 = moderately effective
2 = slightly effective
1 = not effective
0 = not done or discontinued
DK = Don't know.
To me, “Don't know” is the most important value in the scoring scale. Note
that “Don't know” and “Not done” are not the same. “Not done” means I
can confidently state that we do not do this. “Don't know” means it may or
may not be done: the interviewee is unaware.
The scoring system translates their judgment into a quantitative
measure. It's important to emphasize that we are talking about performance,
not how often or thoroughly it is applied.
After an assessment the thing that everyone wants to know is, “What should
we do first?” After the effectiveness scoring has painted a general landscape
of where improvement is, we need to draw a complete landscape using the
nonquantitative data to fill in the details.
The first recommendations typically derive from the areas of greatest
discrepancies. These discrepancies can be from two far‐reaching corners of
the organization or between two individuals that sit in the same open
concept office. Side note: do “open concept” offices seem to result in
everyone pulling a hoodie over their head and wearing headphones? I don't
think the “concept” is having the intended effect.
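As a sketch of how this discrepancy hunt can be made mechanical: the questions, roles, and scores below are invented for illustration, and "DK" is tracked separately because it signals unawareness rather than poor performance.

```python
# Per-question scores from different interviewees, on the 0-5 scale above.
scores = {
    "How are reliability goals defined and communicated?": {
        "VP Engineering": 5, "Quality Engineer": 1, "Sales Lead": "DK",
    },
    "Is design risk assessed before resources are distributed?": {
        "VP Engineering": 4, "Quality Engineer": 3, "Sales Lead": 3,
    },
}

for question, answers in scores.items():
    numeric = [v for v in answers.values() if v != "DK"]
    dk_count = sum(1 for v in answers.values() if v == "DK")
    spread = max(numeric) - min(numeric)  # a big spread means poor connectivity
    print(f"{question}\n  spread={spread}, don't-know={dk_count}")
```

The questions with the largest spread, or the most "DK" answers, are the first candidates for recommendations.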
I do notice that the more group communication has become automated and
locked to specific avenues, the more information is lost. I have solved more
than one mystery by just happening to be at the communal coffee pot at the
right time. “I'm sick of receiving all these returned units just thrown in a
box. No packaging, nothing! It takes forever for me to sort them out.”
Reviewing the results with the participants sometimes yields so much
information that I have to go back and rewrite the assessment. The shared
“Ah‐ha!” moments can yield some amazing information. I guess that's
because it's a little similar to all of us being at the communal coffee pot at
the same time.
A great deal of new information comes to light as the team realizes how
their observations are connected to how other parts of the organization
operate.
The review should be a set of meetings, even if they are short. By holding at
least two short (<1 hour) meetings the team is again forced into discussion
about the analysis. What will occur is a summary that drives specific
actions. These aren't just prescribed actions; these are actions the team has
agreed to as helpful.
It didn't exist if it wasn't documented. I also say it didn't exist if no one
wants to read it. There should be not only a clear record of all the work but
also a concise summary. For the analysis to be effective it must be
communicated and understood outside of the group that participated in it.
For this to happen we need a clear summary that can be easily shared.
Select a summary format that fits the organizational culture. If this is an
organization that shares significant results and initiatives in team
meetings, then make it a presentation. If this organization uses an online
virtual team software package to communicate, make sure it gets uploaded
to that platform.
Figure 7.7 Maturity matrix (part 1 of 2).
Recommended Actions
After you get the feedback and feel confident that you have captured the
essence of an organization, it is time to come up with recommendations.
This is perhaps the most difficult portion of the assessment. This is where
experience really counts. You should have a good understanding of “best
practices” in the industry to be able to discover a pattern and draw a
conclusion from the pattern. If you have never performed an assessment
before, I recommend that you call in an expert to help you with your first
assessment to make sure you perform it thoroughly and to make sure you
draw the proper conclusions. From the assessment, you are looking for
trends, gaps in processes, skill mismatches, over‐analysis, and under‐
analysis. Look for differences across the organization, pockets of
excellence, areas with good results, and areas that need work.
No one technique or set of techniques makes an entire reliability program.
The techniques must match the needs of the products and the culture.
Many companies score a Stage 2 or Stage 3 and then ask what they need to
do to reach Stage 5. First of all, achieving Stage 5 is quite rare. Secondly,
moving more than one stage within one product release is also rare. You
should set your expectations appropriately and be patient while you change
your systems to achieve better reliability. If you try to make changes too
quickly, it is likely that the changes will be rejected by your team or your
reliability program will start breaking down.
You can then add the results from these detailed assessments to your overall
recommended actions.
Golden Nuggets
“Golden Nuggets” refers to those few techniques that your organization
does well, so well in fact that these techniques become ingrained into your
culture. The Golden Nuggets become part of your “secret sauce” that gives
your product and company a competitive advantage in specific areas.
Sometimes your organization doesn't know that they do them well or even
know that they are doing them at all. It is your responsibility to point out
these Golden Nuggets to the organization because you should always
reinforce good behavior.
When assessing your organization, look for these Golden Nuggets. Perhaps
your organization has a keen ability to use mechanical simulation
techniques such as Finite Element Analysis (FEA) or statistical data
analysis techniques such as Design of Experiments (DOE). If you discover
this, make sure to point it out to the organization and help them use these
Golden Nuggets to their advantage.
Other times, you may discover that your organization has an excellent
ability to work as a team and to reach a consensus. Perhaps they don't even
recognize that they possess this skill. Again, you should encourage this
behavior when you recognize it.
Having a well‐defined (and well‐followed) product development process
could be another example of a Golden Nugget. Many times we have seen
companies with disjointed product development processes, and when we
followed failures back to a root cause, we often found that they bypassed a
process or process step. Consistently following a product development
process is a key part of reducing failures, and the companies we have
worked with that already had this in place were typically the companies
with more reliable products.
Summary
It really is amazing how simply measuring something sparks improvement.
It's human nature. What isn't being measured in your organization that is, in
fact, an important factor in your operation? Just list five right now. What
would it take to measure them? Could it be as simple as asking an
individual with a role that has access to the pertinent data to just… measure
it?
References
1. Drucker, P.F. (1955). The Practice of Management. Oxford: Butterworth‐
Heinemann.
2. Crosby, P. (1979). Quality is Free: The Art of Making Quality Certain.
New York: McGraw‐Hill.
8
Reliability Culture Tools
Advancing Culture
This section of the book is for the engineers, managers, and leaders alike in
product design who want to learn the tools that make reliability culture
happen.
In this particular chapter, each section presents a fundamental technique for
shifting your culture more strongly toward reliability. If you apply these
techniques sincerely, the cost and time investments required to make great
gains will be minimal.
Notice I used the word “sincerely.” The reason: we often see methodologies
applied in a manner I can only describe as “insincere.” They're learned,
box‐checked in some development plan, and applied … once. Shortly
thereafter they fade as we revert to the familiar.
If you see this kind of fade happen, call it out. After all, most people don't
revert to previous behaviors for evil reasons. It's just something that
happens when well‐intended people who are under pressure try to do too
much in too little time. Keeping up the attention required to make a change
stick at times is too much to ask.
When this happens, it is usually because the organization wasn't genuine in
its desire to make the change. It didn't allow its leaders the bandwidth to
keep the new process on track.
The methodology I'm proposing is called “Reliability Bounding.” It keeps
your most cherished reliability goals front and center at all times, and
requires minimal management to make them stick. It also keeps those goals
in the proportions that they were originally set, so they guide your daily
program actions accurately.
Where did these goals and proportions come from? Your business and
marketing plan, which has one objective: to gain and hold market share.
The Bounding tools ensure that your program, reliability plan, and business
objectives stay aligned. Alignment doesn't just increase efficiency. It
reduces human conflict and the need for manipulative managing.
Manipulative Managing
What do I mean by manipulative managing?
In simple terms, it's when a leader guides their team through indirect means.
In other words, instead of coming out with what they want directly, they
only allude to it.
If leadership uses a tactic like that, it's usually based on two factors. The
first is that the staff's personal goals and the company's goals aren't
completely aligned. For instance, a salesperson receives a large quarterly
commission based on sales alone. The problem: that commission doesn't
consider if their customers are satisfied. So, the salesperson gets their bonus
and is satisfied, while the company may be stuck with angry customers and
is dissatisfied.
The second factor is that the leader feels their directives are being met with
strong resistance.
An Alternative to Manipulation
In the end, great results. But give me a break! What a rigmarole, and a
waste of time and energy. Why wasn't there a way for the CEO to have his
entire staff see the benefits of stopping “B” and fully funding “A”?
He never tried to get the team to understand the “why.” Why couldn't both
initiatives continue in parallel? Why was it best to choose A?
If the teams were aligned with the big picture and the “why” was
understood, the CEO could have simply let the obvious unfold. Team B
would have agreed that their initiative was not going to be successful. But
they had too much personally wrapped up in its success to let it go.
What makes a direct method work is that the receiver understands and
agrees with the “why.” It was the “why” he believed he had to fabricate.
The Bounding method is what I advised the CEO to adopt so that, going
forward, he could lead in a straightforward manner.
Transfer Why
The Bounding method aims to transfer the company's objectives to the
team.
Now, of course, the team already knew the company's objectives, but did
they own them?
The team had personal objectives associated with their roles, and this was
what guided them day to day. “I want that raise.” “I want that promotion and new
office.” It wasn't that these people were narcissists; it was how the system
was structured. This is why the CEO had to manipulate them. He was
correcting for a system that was ineffective.
So there is a bit of irony if we consider that he was one of the architects of
the system they operated in. He created the role structure and incentives.
CEOs measure project managers on time to market. CEOs measure R&D
teams on the success of new technology.
Reliability Bounding
Reliability Bounding is based on a single principle: there are multiple
objectives in any program and they should all drive the day to day.
When the highest‐level goals are not providing these day‐to‐day inputs,
something else is. That “something” is usually everyone's personal goals,
actual or perceived, and unforeseen outside forces. I call this type of plan
execution “fire and forget.”
Reliability Feedback
For reliability engineering in product development, what feedback are we
looking for? How do we know we are on track? Reliability engineering has
to generate this guiding information. Rarely can it be found elsewhere.
There are no readymade feedback loops for reliability. Time to market, by
contrast, has an automatic feedback loop and a visible rate of progress; a
desk calendar covers it.
A good reliability program immediately puts into motion the test methods
that aim to “measure”: measure percentage availability, measure confidence
in the reliability goal, measure how robust the product is, and measure
when the full population is going to wear out.
So why do the people who pay for these tests not read the report? Or even
care that the report is going to be arriving a year after the product left the
factory? At that point, skip the testing and just ask the customer how the
product is doing.
The Reliability Bounding process is in two distinct phases. The first is
“Strategy Bounding,” which lays the foundation, guiding factors. The
second is “Guidance Bounding,” which creates the closed loop control
when the program is in motion.
Strategy Bounding
Strategy Bounding is a part of program planning. It is the architecture
through which the program will hit its reliability targets. The process starts
with evaluating candidate reliability tools for best return on investment (ROI).
ROI is difficult to quantify for reliability tools. A good deal of their value is
through mitigation of future issues. These issues might even occur when the
current team is disbanded.
Other returns, like “customer satisfaction” and “contribution to future
sales,” are even more difficult to quantify. These can become completely
detached if we don't find ways to tie them in.
By using ROI estimations for tools we can select the ones that deserve the
most resources. This evaluation done early will do something else very
important: it will be the lens we use to evaluate new information throughout
the program.
Strategy Bounding is built from three techniques:
1. Bounding ROI
2. Anchoring
3. Focus Rotation.
The targets for product factors that are established to guide the product
development program will guide the Reliability Bounding process as well.
Many product factors influence a program. It is important that they relate to
the program's business objectives as well as be firmly established at the
program start. They will likely be renegotiated and adjusted as the program
unfolds. As I outlined in earlier chapters, the four most common product
factors are:
1. Time to market
2. Product cost point
3. Target reliability
4. Features.
These four factors, with the addition of the program cost, are the guiding
forces. They affect major decisions for the duration of the process. The big
questions that are often so difficult to answer are “How do you measure the
effect of a decision on each of those factors?” and “How do you negotiate
the value and effect on each factor relative to one another?”
Midprogram Feedback
Come midprogram, you may get questions that you couldn't have planned
for at the start: “Do the three additional months requested for the
Accelerated Life Test (ALT) completion justify the later time to market?”
“What do we do when we don't have a high confidence the product is going
to last as long as promised?” “Should we release it late, which may cost
market share, just to do more testing for increased confidence in long life?”
Historically, these decisions are made ambiguously. Mostly based on who
yelled the loudest at the Friday steering meeting. Or, more importantly, who
can make the biggest case that their scenario leads to better market share,
whether it's quantifiable or not.
If these discussions are occurring without all the facts, which they often do,
we're basically just steered by emotion, debate, and power dynamics. None
of these leads to actions that best serve the program, product goals, or the
company.
Our first task in applying the Bounding method is to find a way to
quantitatively relate the impact and benefit of each of the factors to the
program. The debate as to which issue, or factor, is more important has to
be concluded early in the program. Changing these factors midprogram can
be very harmful to the process in terms of delays and expense. The scoring
and relationships we will define are relative, not absolute. A change in
any value will cascade to all previous and future decisions if the relative
positions get out of sequence.
Bounding ROI
OK, onto the Bounding methodology.
To begin, you'll need to create a numbering system that helps you keep
track of your product program investments.
By the way, when I use the term “investments,” I'm not talking about only
money. I'm also talking about the program's calendar time, machinery wear‐
and‐tear, work hours, time to market, and so forth.
The numbering system takes everything that could help or hurt the product
release and your company's reputation, and puts it on a scale. The scale
comes from your knowledge and judgment.
It's like when Amazon asks you to rate one of its products on a scale of one
to five stars. You'd inherently know the difference between a one‐star rating
(bad) and a five‐star rating (great). To arrive at the proper star rating, you
don't need to do much of a calculation. You don't have to watch any
explanatory videos on how to rate a product. You just know.
And a related point about Amazon's rating system: while they chose a one‐
to‐five star system, they could just as easily have created a system of
one‐to‐a‐hundred stars. Why one system over the other?
The one‐to‐a‐hundred system would have been too much. It would have
made you do too much thinking. You'd likely not leave a rating at all,
because it would have been too much work. It would have hurt your
engagement with the site. Instead, they decided to keep things as simple as
they could, while still providing a meaningful delineation between items
being rated.
Now keep those ideas in mind as you select your program's Bounding
system. Selecting your numerical range is simply a matter of how much
resolution you believe you need.
For most projects, I find a scale of 0–25 does the job.
You'll see in the following tables what that value is for the different
parameters.
To put the Bounding method in motion we need established Bounding
values. These are the numbers we use to relate the value of investments and
returns.
Remember, the Bounding value is like a currency. The same way money
links your hard work (investment) to getting things you desire (returns).
What can you buy with 20 hours of your time? Well, because of currency it
is simple to figure that out. We convert both your work and the item you
desire into dollars. We have to set conversion rates for each.
We charge $35 for each hour we spend fixing cars. We are working extra
hours because we want a huge new TV. We have big plans for Saturday
nights with friends and family.
We look online and see that the price on that big beautiful TV is $980.
Looks like we have 28 hours of overtime ahead of us. So there you go, we
know exactly what amount of investment will get us what we want in
return.
The range of the Bounding scale is up to you. The only requirement is that
it is consistent across all Bounding tables. You can't have 1–10 in one table
and 1–25 in another.
If we've selected 0–25, a value of 0 as an investment means, you guessed it,
nothing is invested in the program. A zero in return means nothing was
gained by the investment. In our work analogy, zero work represents zero
dollars, and zero dollars means you don't get stuff. Twenty‐five would represent
you spending all of your money. How much you get depends on what
the 25 equates to. You will see in the following tables the selected values
for investment and returns in reliability.
Invest and Return Tables
Let's make a scoring table for reliability investments and returns. The
investment in a reliability tool is either financial, added time to
the schedule, or both. The first investment we will score is “time.” We
will say that investing three months in a testing program is equivalent to a
Bounding number of 5. The most we would expect to invest in a reliability
tool is 15 months. We'll put that at the top of our Bounding scale at 25. We
now have a transferable proverbial currency for time invested in reliability.
Three months is 5 Bounding units and 15 months is 25 Bounding units.
How about for financial investment? We will set a $100k investment
equivalent to a Bounding number of 5 and a $500k investment equivalent to
a Bounding number of 25.
If an ALT test will add three months to the program schedule and cost
$100k, that is a total investment of 10 Bounding units (5 + 5 = 10).
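A minimal sketch of that conversion in Python, assuming the scales are linear through zero (which the two anchor points above imply):

```python
def time_to_bounding(months):
    """Convert schedule time invested to Bounding units (3 months = 5, 15 months = 25)."""
    return months * (5 / 3)

def cost_to_bounding(dollars_k):
    """Convert financial investment to Bounding units ($100k = 5, $500k = 25)."""
    return dollars_k * (5 / 100)

# The ALT example from the text: 3 months + $100k = 10 Bounding units.
alt_investment = time_to_bounding(3) + cost_to_bounding(100)
print(alt_investment)  # 10.0
```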
How about Bounding numbers on “returns”? We need a way to measure
what we get back: “What do I get for my three months of delayed product
release and extra $100k added to the budget?”
The four things I often identify as returns for reliability investment are:
These are the typical things we want in return for our time and financial
investment in reliability. This is the TV, car, clothing, and stereo we are
thinking about while we are sweating out there with cut knuckles and
grease on our face while working on a customer's car.
We use the same 0–25 scale (let's abbreviate “Bounding number” to “B#”):
You might have noticed something there. I added an “i” to the Bounding
symbol. This designates it is an “invest” Bounding number. An “r” will be
added for return Bounding numbers.
How about the return values for reliability growth (RG)?
Deciding by Bounding
Bounding can now help us decide which reliability activities we should
include in our program. For the first time instead of going with our “gut
feel,” we can calculate which tools are the best investment.
We need to do that calculation then. Don't worry: it's so simple you could
do it on a napkin, at a bar, after a few shots (not recommended, though). We
simply sum the investment Bounding numbers and compare that to the sum
of the return Bounding numbers. Whichever is greater directs our decision.
If we get more than we invested, it's a good move. If we get less than we
invested then we may want to skip it.
In the case of RG, our investment total is 20 B#i + 10 B#i = 30 B#i. The
return is 12 B#r + 10 B#r + 5 B#r + 5 B#r = 32 B#r. The return is greater:
32 > 30. RG is a smart move for this program. Does the gap between the numbers matter?
Definitely. In the case of this RG program, the return was only two points
higher than the investment. If things get tight midprogram, there is a good
chance some concessions may be made. But right now it's clear we can
benefit by including it.
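The napkin math, as a sketch:

```python
def bounding_roi(invest_b, return_b):
    """Compare summed invest (B#i) and return (B#r) Bounding numbers."""
    total_i, total_r = sum(invest_b), sum(return_b)
    verdict = "include" if total_r > total_i else "skip"
    return total_i, total_r, verdict

# The RG example from the text.
print(bounding_roi(invest_b=[20, 10], return_b=[12, 10, 5, 5]))
# (30, 32, 'include') -- a thin two-point margin, so watch it midprogram.
```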
In an upcoming section I will discuss using the Bounding number to resolve
resource issues midprogram. But for now, let's discuss “Anchoring,” a way
to keep things connected.
Anchoring
Reliability test and analysis tools are used to either measure or improve
reliability. We include them because we want some kind of results. When
the program is complete, we look back to see how well they delivered. I
think we can agree that this is often disappointing. Why do we so often not
get what we expected? There are a few reasons but one of the most
significant is “loss of synchronicity.” Maybe that seems like a strange term,
but it means exactly what it says. We were coordinated once, but now we
are not. A day late and a dollar short, so to speak. Actually, it is literal: we
do end up late and underfunded.
Figure 8.3 Tools Bounding tables.
Completing a HALT test at the same time the manufacturing prototype is
ready does not serve the program well. We should have had that
information at first prototype if it was to be useful. We can't do anything
with test results at manufacturing prototype. A manufacturing prototype
exists because the team has agreed the design is done and we are looking
for ways to tweak the manufacturing process.
The primary value of HALT testing is design feedback to improve
robustness. The project manager is going to laugh at you suggesting
fundamental design changes so close to field release. So what happened?
Why are great design ideas showing up only now? We planned HALT testing,
but not well: we should have had its design feedback much earlier.
This situation isn't just messy and a waste of resources: it destroys
credibility. What's going to happen when you ask for a HALT test budget in
the next program?
What would have been good for leadership to know throughout the program
is that HALT testing was progressively losing its value. With that
information a leader can decide much earlier whether to cut it loose or to
increase its priority so it can still add value early on. Why end up with the
worst‐case scenario: full investment and no return?
The technique that can ensure that this does not happen is “Anchoring”. I
created Anchoring because I couldn't stand to see this same situation play
out over and over again, program after program. It is totally avoidable. As
an engineer what we needed was crystal clear: closed loop control.
Closed Loop Control
What's that? It may sound super complicated, but it is not. It's occurring
around you every day. In fact, you are the one implementing it. Closed loop
control is simply this: you look through the windshield when you drive and
use that information to make decisions on what to do next. You observe the
road, decide you should go more left or right, adjusting the steering wheel,
the angle of the tires change, the car changes direction, you observe where
the car is now headed, adjust the wheel again, the tire angle changes again,
and on and on until you get to your destination.
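Here is the same loop in miniature as code. The gain value is an arbitrary illustration, not a recommendation:

```python
# Closed loop control in miniature: observe the error, correct, observe again.
# The "car" starts off-center; the "driver" steers back toward the lane center.
target = 0.0          # lane center
position = 2.0        # we start off-center
gain = 0.5            # how aggressively we correct (a design choice)

for step in range(8):
    error = target - position   # observe through the windshield
    position += gain * error    # adjust the wheel proportionally
    print(f"step {step}: position = {position:.3f}")
```

Each pass through the loop uses the latest observation, which is exactly what a program's reliability feedback should do.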
Intent Anchor
The Intent Anchor is the “why.” Why are we using this tool? Why are we
spending this money? Why are we extending the program? Without
constantly checking in on the “why,” we can easily keep going past our
goal or stop short of it.
How do we identify intent? Why are we doing this test? Why are we
investing in this tool? Risk analysis is a great identifier for intent. My
favorite risk analysis tool is Failure Mode Effects Analysis (FMEA). The
output of an FMEA is a list of risks to the product's success. But it's not just
the risks. They are listed in order of greatest to least. That's a big deal! We
have limited program resources and we need to know where we should
apply them first.
FMEAs don't just output the risks; they include mitigative actions. Many of
these risk‐driven actions are reliability tools. So the reliability tools on our
program list have a clearly identified intent. The problem was that it was
last seen in the FMEA notes in the “Next Actions” column eight months
ago. Those actions would have remained visible longer if they were written
in powdered donut dust on the conference room table.
We need to keep the “why” connected to the “what.”
Delivery Anchor
The output of a tool must be connected as well. We can't just start work and
then hope we happen to complete when it is most beneficial. This planned
connection between the tool output and the program is the Delivery Anchor.
This Anchor is the reference for assessing the tool's value decay, like an
expiration date stamped on top of a yogurt. The closer we get to, or farther
we get past, that date the less likely that yogurt is going to be satisfying. It
anchors our reference for freshness (value).
The most important word when discussing the Delivery Anchor is “decay.”
Traditionally we treat the delivery of test and analysis tools as helpful or
not, nothing in between, totally binary. But we aren't catching a train here,
where getting to the platform before 2:00 is good and arriving at 2:01
means your day is totally ruined. That is not how anchors work. Value is on a sliding
scale. We need to identify that scale. Without it we are left with either
binary good/bad or an analysis of value based in a “seems like” kind of gut
feel.
A common point that triggers that binary response of good or bad for a tool
delivery is a program phase gate. The tool intent may have been to inform
us if we have 80% confidence in the reliability goal by phase #1.
Test results don't lose all value if they are delivered a day late. They do
decrease in value rapidly after that point though. But by how much and at
what rate? If those test results are forecasting an epic disaster, they will still
hold value to leadership discussion even at the end of the program, knowing
that we can kick off a preemptive design improvement initiative ahead of
the field discovering it. Yes, there will be some customers that experience
the issue, but we were able to minimize the impact.
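A sketch of how a Delivery Anchor's value decay might be written down, with an assumed linear decay rate and a floor for results that can still trigger a preemptive fix. Every number here is a per-program judgment call, not a standard:

```python
def deliverable_value(weeks_late, full_value=25, decay_per_week=4, floor=5):
    """Value of a test result relative to its Delivery Anchor.
    On time = full value; value decays each week late, but never drops
    below the floor if the result can still trigger a preemptive fix."""
    if weeks_late <= 0:
        return full_value
    return max(full_value - decay_per_week * weeks_late, floor)

for weeks in [0, 1, 3, 6]:
    print(weeks, deliverable_value(weeks))
```

Identifying the scale is the point; once the decay curve is explicit, leadership can re-prioritize or cut the tool loose before the worst case of full investment and no return.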
Focus Rotation
The Focus Rotation method is a mechanism to ensure the Bounding goals
guide the team. It also gets associated with amphibians (you'll see in a
minute).
Remember the Bounding goals are the full set of program goals (features,
time to market, reliability, and product cost) as set by the original product
program plan. The program plan was created by sales, marketing,
leadership, and R&D. The goals were carefully selected and balanced
relative to each other to create a product that ensures the company grabs
and holds as much market share as possible, i.e. makes money, big money.
Summary
These reliability culture tools are easy to understand and easy to implement.
In fact, there are likely some elements of them in your existing process. By
formally introducing them to the product development program the team
will be able to maintain control of the reliability initiative. Remember, this
isn't the reliability initiative from the reliability team. This is the initiative
set out by the business, the initiative that was identified as a key element to
gaining and holding market share.
9
Guiding the Program in Motion
A program in motion, going in the right direction, with all necessary
resources available isn't a sure thing. I mentioned “fire and forget”
strategies earlier. They don't work in project management. The landscape
we are operating in is changing. What we know about the product as it
develops is changing. The market is changing.
If we respond to these changes with large, delayed corrections, we are
swerving wildly across the road. If we observe what is going on around us
and make small daily corrections, the changes are just about invisible. This
is what Guidance Bounding is about.
Guidance Bounding
We have discussed Strategy Bounding. Now we are in motion; so now, the
second part, Guidance Bounding.
We're in the middle of the program and things are changing. We made our
plan while sitting in the proverbial “safe, quiet, sunny harbor.” Now we are
out in the middle of a raging ocean with no land in sight. What do we do to
stay on course?
Yes, everything is late. My budget just got robbed by those jerks in the
science R&D group, and Rob has ramped up the politics game to 11 (and
it's always Rob who starts that). We're also on our second unplanned major
redesign. This is because no one stopped marketing from changing the
specifications. Even if everything goes perfectly from here on out, we are
late. Anyone have Dramamine? These waves are making me sick.
The Guidance Bounding toolset will surely help here. It includes:
The Plan
This medical device had a full reliability growth (RG) program planned.
Twelve units would be allocated for RG in three stages. The RG program
was expected to continue even past product release. Once the units were in
the field there'd be a surge of data. We could grow our statistical confidence
with this data at an aggressive pace. Even though the product is in the field,
this confidence growth is still very valuable.
If we find the confidence not on a good trajectory we can now initiate
design changes far earlier than if we had simply waited until the customer
compiled the data for us. This was often delivered in stacks of angry
complaints. Harshly worded letters are not the same as RG test reports,
although they do hold some similar content.
The Issue
In the case of this example, it was lucky we had the RG program in
motion as early as we did. We identified a fundamental issue. The data
predicted a 10% chance of the found issue occurring in the field.
Once identified, the issue was easily corrected in the design. The redesign
and prototype process would take four weeks. Re‐tooling and the creation
of new units would be an additional six weeks – a 10‐week total. So the
program would be back on track in less than three months.
When we were in meetings to discuss what to do about this, the room did
not have a lot of smiling faces. We knew we had a hard choice between
correcting a known issue or going against a solid deadline for product
delivery.
What do I mean by “solid”? This delivery date had some deep stakes
holding it down.
What triggered our new product program was an advancement in a specific
technology in another field. This occurs often.
Technology Cascade
One industry advances and its application in a second industry creates
sizable benefits. This was the case with cell phones, laptops, and electric
cars. It wasn't until the cell phone and laptop industries drove huge
advancements in electric battery size and capacity that electric cars became
truly viable.
There is a quote from GM executive Bob Lutz in 2004 stating that, “Elon
Musk was ridiculous thinking he was going to make electric cars using piles
of laptop batteries.” Fast‐forward two years and GM, Nissan, Toyota, and
everyone else was frantically rushing to create a prototype electric car with
those silly little lithium ion laptop batteries.
This is exactly what was happening here. Simply put, “The race was on!”
That delivery date was firm because all manufacturers knew that was the
time to beat to be the first to market. But here is what happened: just last
month a competitor released their version of this new product. It was much
sooner than anyone expected. And here we were proposing slowing down
our program by three months. Like I said, at that meeting there weren't
many smiles.
Timing is Everything
Yes, the first competing product had made it to market, but this is how that
event translated to marketing's new “immovable date.” What the marketing
team knew from past experiences was that release in the same calendar year
created a blur as to who actually released first. As time passed, people
would forget which of the two companies was first if both released in, let's
say, 2012. But everyone who released after 2012 would be seen as a
copycat who likely reverse engineered what was already on the market,
even when that was not the case.
More often than not, the brand considered to have created the original is
also considered the best. Remember, they invented the technology.
The end of the calendar year was eight months away. Our proposed three‐
month delay put us over that previously achievable deadline. If we made the
fix, we would be releasing next year, second place. Also known as “first
loser.”
But here's the other side. If we released a design with a reliability issue, it
could result in our leaving that market altogether. We were sure of this
because we saw that outcome with a competitor a few years back when
there was a similar type of jump in the technology.
Our Choice
Be one of the first to market and have a shot at being the brand synonymous
with the product: the Kleenex of tissues, the Xerox of copy machines, the
Band‐Aid of bandages, the Coke of colas. But… risk having field failures
early in production that would very likely result in your losing that market
entirely.
Or:
Release at a time that puts us in the middle of the pack and hope we can
fight our way to the front over the coming years.
Using Bounding
Remember those Bounding tables we created? They are about to come in
really handy again, because how do you make a decision like that?
Everyone just arguing their opinion, with resolution coming when either
someone pulls rank or there is one person left standing after the opponents
tap out – single elimination cage fight anyone?
With the previously established Bounding scoring system for risk and
investment, we can make a quantitative decision, our most thought‐out
decision. We first will adjust the scoring to represent the new landscape.
Remember “closed loop control” from Chapter 8? Since the program has
progressed, the rewards and their currency value have likely changed. We
need to adjust them.
This 10‐week investment may have had a Bounding conversion number of
5 earlier in the program. Its “cost” has gone way up now with the new
information. Now losing 10 weeks means a guaranteed hard fight from
midpack for years to come. A 10‐week loss is now a Bounding number of
20.
The team sat down and rescored the tables to reflect “today.” We then made
our decisions on ROI in the same manner we did when we were Strategy
Bounding.
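A sketch of that rescoring, with assumed return values for making the fix; only the delay's Bounding cost changes between "then" and "today":

```python
# Return Bounding numbers for fixing the issue (illustrative assumptions):
# e.g. avoided warranty cost plus protected brand reputation.
fix_return = [12, 10]

delay_cost_then = 5    # 10 weeks, valued early in the program (B#i)
delay_cost_now = 20    # 10 weeks with a competitor already shipping (B#i)

for label, cost in [("at program start", delay_cost_then), ("today", delay_cost_now)]:
    margin = sum(fix_return) - cost
    print(f"{label}: return {sum(fix_return)} B#r vs invest {cost} B#i -> margin {margin:+d}")
```

With these assumed numbers, the fix still wins today, but only barely, which is exactly the kind of thin margin a team should see in numbers rather than argue from the gut.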
The Results
So I bet you're curious as to what happened.
We decided to delay release and make the design change, and… things
turned out great. Through a stronger marketing effort and leveraging that
we had “the most robust technology” the company became the market
leader. Marketing really came through. Their campaign chose to focus on
the high investment in reliability and on never letting the customer
experience failures.
It was a great strategy and a “one‐two knockout punch” because everyone
else had both minor and major issues over the coming years. We didn't.
What Now?
So here we are at the executive steering meeting all staring at each other
with this new information. The question that was just left hanging in the
air? “What do we do?” It became very clear we came with problems, and no
solutions. Not smart.
Obviously, we're going to do a redesign and run it through ALT testing again.
The big questions: “Do we run it through a full ALT test or just far enough
to know it is better than before?” and “How much program time are we
willing to lose?”
ALT testing is expensive and takes a long time. In Bounding number
language, “It's a big B#i (Investment Bounding Number)”.
So everyone is dug in and starts campaigning for what served them best.
Marketing wants it out the door, engineering wants it to be proven to be
durable, Sam is insisting that all meetings have complimentary cookies.
Why is he here? Sam works at the front desk.
This is the argument the project manager made: “We release on time, and
start testing now. If there is an issue with the new design, we will know
ahead of the field. We can then get a fix out there fast with minimal field
damage.” Sound familiar from my last example? This is the
counterargument reliability is regularly debating against.
Just Let It Go
My response: “Ummm, so basically you are saying plan for a recall?” It was
clear that no one was arguing for what was actually best for the business.
Especially Sam. He was hell bent on wrecking the snacks budget in Q1.
Why is he still here? Someone throw a cookie in the hall and get rid of him.
So clearly, I didn't have faith that these types of issues would be resolved
with the best interest of the company in mind. It will just be another team
slugging it out until a victor is declared, yours truly included. I can openly
admit that I got in there and swung as hard as anyone else once things got
going. Even if you are not an advocate of violence, when a bar brawl breaks
out you have few options.
After going through this for the 43rd time in my career I wanted to find
some kind of a solution for this predictable scenario. I mean, seriously,
when do you expect to test something and find out it is perfect in the first
round? Almost all programs only allocate enough time for a single round of
testing. There's never a mechanism for dealing with the bad results.
So I thought of an idea. I would like to say it was a spark of genius, but like
most good ideas it is a natural progression of things everyone knows. I
labeled the technique PREA (Program Risk Effects Analysis). The problem
this technique solves is captured by the question, “Why don't we assess the
levels of risk to the program and product in advance of big program
decisions?” With this assessment we could quantitatively balance how much
risk to the program vs the product we are willing to accept.
Table Creation
Warranty will have “low warranty” associated with a value of one. “Low
warranty” represents an annual warranty cost that is 25% below what was
predicted, a nice surprise! Determining this value can of course be difficult.
A process I would use in this specific case is interviewing individuals in the
quality department. These are the individuals closest to the necessary data.
We could ask, “In a past program, what was one of the best actual vs
projected warranty expense ratios?”
It is valuable to include leadership when creating these scales. Leaders
know a lot about program loss. They've surprised me by saying, “It would
be a first if actual warranty matched projected warranty.” That's a red flag.
Why do they lack self‐awareness? This table creation exercise is going to
have some big peripheral benefits regarding mindset.
A word of caution: look out for the eternal optimist. The individuals who
say, “Everything is great” are the ones knitting the wool that seems to be
covering everyone's eyes: the yes‐men. They exist in every organization,
because as humans we always want to hear that everything is going to be
alright. They exist out of necessity.
A score of “3” (Moderate) is 10% over our warranty projection. A score
of “5” is 30% over (Figure 9.3).
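As a sketch, the warranty scale might be captured like this. The anchors at 1, 3, and 5 come from the text; the intermediate anchors are my own assumed fill-ins:

```python
def warranty_score(actual_vs_projected_pct):
    """PREA warranty scoring: -25% = 1 (low), +10% = 3 (moderate),
    +30% = 5 (high). The 2 and 4 anchors are illustrative assumptions."""
    anchors = [(-25, 1), (0, 2), (10, 3), (20, 4), (30, 5)]
    for threshold, score in reversed(anchors):
        if actual_vs_projected_pct >= threshold:
            return score
    return 1

print(warranty_score(-25))  # 1: a nice surprise
print(warranty_score(10))   # 3: moderate
print(warranty_score(30))   # 5: severe
```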
Developing the sales scoring table can get a bit heated. It's hard to associate
reliability issues to actual lost sales. For this one, I just say include as many
opinions as you can.
Figure 9.1 PREA tables.
Figure 9.2 PREA balance equation.
Evaluation
With these tables in place we were able to evaluate the impact of a full
second round of sensor ALT testing.
What were the product and program factors for the Piezo sensor life issue?
On the product side: warranty 3. This value came about through a series of
data‐supported discussions with the engineers.
Even with the few life‐test data points we had, a distribution for premature
wear‐out could be made. The distribution was a lognormal curve with a
long left tail. Overlaying a known good life curve with a normal
distribution, we were able to estimate the percentage of premature wear‐out
for the first year. These failures would occur if the worst units ended up in
the hands of the highest‐cycle users (Figure 9.7).
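For illustration, here is how that first-year estimate might be computed from a fitted lognormal life curve; the fit parameters below are assumptions, not the program's actual data:

```python
from math import erf, log, sqrt

def lognormal_cdf(t, mu, sigma):
    """P(life < t) for a lognormal life distribution."""
    z = (log(t) - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

# Assumed fit from a handful of life-test points: median life ~5 years,
# with enough spread to produce a meaningful early-failure tail.
mu, sigma = log(5.0), 0.9
first_year_wearout = lognormal_cdf(1.0, mu, sigma)
print(f"Estimated premature wear-out in year one: {first_year_wearout:.1%}")
```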
Summary
Fire and forget worked when we used cannons in war. Actually, they didn't
really work that well: we just didn't have any other option. As soon as we
figured out how to track the progress of projectiles, vehicles, and program
initiatives, we looked for ways to have those progress reports steer the
ongoing event.
So ask yourself: are you firing any cannonballs in your programs that could
be guided rockets?
10
Risk Analysis Guided Project Management
We have limited resources and limited time. We would like to select and
adhere to a method to choose where to apply resources. I firmly believe that
assessed risk is the best way to do this. Using risk assessment to guide
program activities is a process of:
HALT Testing
The results from the analysis identify the most critical reliability risks to
the product. We want a new column that lists which reliability tools will be
used to address each risk. Highly Accelerated Life Testing (HALT) may be
used if it is found that electrical connectors coming loose is a big concern.
We could construct a HALT test that uses incremental vibration to identify
connectors that come loose at each level of energy input. With the
connectors ranked by robustness relative to each other, the team can look
for opportunities to improve the ability of the connectors to stay in place
under stress. The solution may be to simply tie the wiring harness down in a
different manner so the free mass can't pull on the connector as easily.
And most of these were before we all became fascinated by the idea of
#lifehacks, which is basically just a rather risky trend of misusing products.
I've never been on or contributed to a life hacks forum.
Modern FRACAS systems are often Web-based and can be accessed by the
entire team. They tend to have functions that go beyond the basic FRACAS
components. Often project management, quality assurance, and R&D
functions are included. This creates a comprehensive system that captures
an issue from first incident to validation of the final corrective action and
beyond to improvement of “best practices.”
The Olympic crashed into a warship whilst leaving harbor but was able
to make it back.
She was on the Titanic as it sank and is referenced in the Titanic film as
the stewardess who was told to set an example for the non-English-speaking
passengers as the ship sank. She looked after a baby on lifeboat 16
until being rescued by the Carpathia the following day.
When the Britannic hit a mine, triggering the “abandon ship”
command to be given, the lifeboats hit the water too early. As the ship
sank, the rear listed up and a number of the lifeboats were sucked into
the propellers. Violet had to jump out of the lifeboat she was in and
sustained a serious head injury, but survived.
She was on board for all three incidents in the space of five years.
So it's pretty safe to assume that Violet, a factor in all three events, is the
cause of ships sinking. Right? It's hard to argue the odds of that “factor” in
all three incidents being random. So since we are in a rush to improve ship
safety we are banning the carrying of passengers or crew with the name
“Violet.”
“What happened?”
“How did it happen?”
“Why did it happen?”
Actions for preventing recurrences.
RCA is not a single method but, in fact, a group of methods. The most
commonly used are (in order of my favorites):
Brainstorming
These are the criteria for a good brainstorming session based on the Lean
Six Sigma process [1].
Fundamentals of Brainstorming
There are several fundamental requirements for a successful session:
Select Participants
The chairperson composes the brainstorming panel. Many variations are
possible, but the following composition is suggested:
Summary
Risk analysis is a cornerstone of just about everything we do. What are the
chances of being hit by a car? Score severity, occurrence, and detection on
scales of 1 (lowest) to 10 (highest) for a highway ride and a neighborhood
ride, and you get something like 560 compared to 72 RPN. Now, you don't
have to do the math to come to this conclusion. We all do that analysis in
our heads instantly when asked if we would rather go for a bike ride on the
highway or in a neighborhood.
The same thing occurs when an engineer is asked if they would rather use a
needle or a ball bearing in an application. But there are significant benefits
when we put that question within a formal risk analysis process.
The first is that now the entire team can contribute. It's not possible for the
ball bearing engineer to account for every factor associated with the risk, nor
do they know every bit of history with bearings in that product.
The second is that by making it quantitative it becomes very easy to turn
that analysis into a tool that interfaces with project management and
resource allocation.
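For instance, with the standard FMEA arithmetic (RPN = severity x occurrence x detection, each on a 1-10 scale), the highway-versus-neighborhood comparison could break down as follows. The individual scores are hypothetical picks that reproduce the 560 and 72 totals above.

```python
def rpn(severity, occurrence, detection):
    """Standard FMEA risk priority number: S x O x D, each scored 1-10."""
    return severity * occurrence * detection

highway = rpn(severity=10, occurrence=8, detection=7)       # 560
neighborhood = rpn(severity=6, occurrence=4, detection=3)   # 72
print(highway, neighborhood)
```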
Yes, tools like FMEAs can take a significant amount of program time. But
that is only true when they are listed as an activity with hours next to it on
our schedule. What is not accounted for is that many of these analyses of
risk were going to occur informally anyway, without being listed as a
resource consumption. They just secretly bleed program time bit by bit,
hidden in design reviews and redesigns.
When we formalize the risk analysis process we are acknowledging:
References
1. Wheat, B., Mills, C., and Carnell, M. (2001). Leaning into Six Sigma:
The Path to Integration of Lean Enterprise and Six Sigma. Boulder City,
CO: McGraw‐Hill.
11
The Reliability Program
Reliability Program Plan
Many different types of reliability program plans (RPPs) are deployed.
Some have been just a cut and paste of a standardized plan, while others
were only a single hand-drawn Gantt chart on a PowerPoint slide. I'm sure
you can imagine I disagree with both of those approaches, but if I had to
pick one I would prefer the hand-drawn Gantt chart. A cut and paste of a
previous plan is the quickest way to torpedo any program. Doing that
repetitively can even sink a department.
The way to develop the correct plan is to start with the following question:
"What's the intent of an RPP?" An RPP can have multiple purposes. Its
most fundamental purpose is to apply resources to the tools that will yield
the greatest benefit. "Benefit" can be many things, but this is how I would
define it. These are the essential elements.
The reason I am being so extreme about not doing a cut and paste is
because nothing hurts an initiative more than wasted investment.
Investment in tools that do not yield value is just cause for not funding
similar initiatives in future programs: a couple of well‐documented
examples of wasting investment and it's possible to justify shrinking or
even closing a department. It's not too hard to suggest dissolving a
reliability department and distributing that functionality among other
groups. It can be justified as still being in line with the design for reliability
(DfR) philosophy, “We all own and do reliability,” but it is not. Without a
central command for reliability, DfR falls apart.
So why is a standardized plan a torpedo for an entire department? A
standardized plan is disconnected from the actual needs of the product
development program. We all know what “one size fits all” really means.
This product doesn't fit you or anyone else.
Each product development program is unique in its needs. The idea that
program decisions could be made without any knowledge of the specific
factors is equivalent to a doctor treating a patient without a diagnosis. It's
statistically improbable that she would do anything that actually addresses
the patient's condition.
This is the medical approach from the Middle Ages we now laugh at. “You
have a headache?” Bloodletting. “Your finger is infected?” Bloodletting.
“You have the flu?” Bloodletting. It was a cure all – or more accurately a
“cure none.” That process was the result of wanting to do something but not
having any knowledge of what to do. Good intentions, fair enough. But it's
inexcusable when there are proven techniques for good diagnosis and
correct methods that are effective. Using bloodletting in modern‐day
medicine is malpractice.
The correct approach has tools that matter. This is one of the reasons I
created the “Anchoring methodology” (covered in Chapter 8). It is not only
important to clearly document “why we need it” and “what is delivered,”
but we need to keep those updated in a changing landscape.
Using the doctor analogy again: doctors don't make a plan for a patient's
treatment and then just stick to it no matter what happens next with the
patient. The doctor will change the treatment based on how the patient is
responding. The parallel between a doctor helping a patient and reliability
works so well because they are in fact the same process: "measuring and
improving performance."
Now, it is OK if you have been working from a cut-and-paste plan: don't
beat yourself up. It's great that your company has become committed to the
reliability process. This chapter will assist you in evolving to the next level
of applying the correct tools from the DfR toolset.
Common Reliability Program Plan Pitfalls
Like anything, there is a list of common pitfalls for RPPs. In this section I
will cover some of the missteps I commonly see in plans and reliability
initiatives that can be avoided with a bit of foresight.
Too Much
With every unnecessary section in an RPP you risk losing readers. With
each section that is over-explained, you risk losing readers. When Stephen
Hawking, the famous astrophysicist, wrote his bestselling A Brief History of
Time he was told by his publisher, "For every equation you put in the book
you will lose half of your readers." The publisher said this because the
book was meant for a broad audience, not astrophysicists.
If a reader takes great interest in a section of the book, they can research the
topic independently. On the other hand, if a reader feels confused by a
section, they may put down the book for good. They will never reach that
section that truly captured their curiosity and made them want to learn
more.
I have seen RPPs that have full mathematical reliability allocation models
in them. I even felt like putting those down. Why would I be reading an
RPP and want to know what the reliability target is for the bearing in the
joint “lev12‐lower”?
When I need information from the allocation model I'll go find the
allocation model. That's the only time an allocation model is interesting,
even to a reliability engineer.
But, Goldilocks needs to want to finish it as well.
Too Little
Don't make it so trim that there is no narrative. If there is too little, the plan
can seem disconnected. If the narrative has gaps, it will feel confusing. A
plan that lists the key activities, like a cooking recipe, doesn't provide
understanding. We go to recipes because of higher‐level strategies, like
hosting Thanksgiving. We want to be creating the Hosting Thanksgiving
manual.
There must be an introduction. The introduction captures the high‐level
strategy. This is how the RPP fits into the overall product program.
Language should be discussed. Remember, we are appealing to a broad
audience that may be new to the subject.
Connect how specific activities are grouped.
Purpose (Example)
This document details the RPP for the product. Reliability has been
identified as a key element to beating our closest competitor in market
share. Customer surveys show that “reliability” is the number two reason
for selection of our type of product. We are ranked by customers as only
having “mid‐level” reliability. It is projected that we can gain 10% more
market share by simply improving the reliability of our existing product line
without new technology.
This plan defines key program strategy, methods, goals, language, and
performance tracking methods associated with the design's reliability
initiative. Included are specific design development activities based on
previously established areas of risk. This plan will be incorporated into the
high-level program plan and accounts for established program goals.
Scope
Scope is needed so the reader understands what will be covered. They may
be looking for information that is in a subassembly test protocol or a
marketing document. Let's not waste their time.
Scope (Example)
This plan applies to the product development process, manufacturing
process, and field usage.
Product Description
Not everyone knows what the product is or how it is used.
The product reliability goal for hard failures is 99.86% annually when
used in use case 3 in environment A, B, C. Useful life is 10 years.
The product reliability goal for hard failures is 99.99% annually when
used in use case 1, 2 in environment A, B, C. Useful life is 10 years.
The product reliability goal for soft failures is 99.99999% annually
when used in use case 1, 2, 3 in environment A, B, C. Useful life is 10
years.
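To make a goal like 99.86% concrete, a quick calculation shows what it allows, assuming a constant (exponential) failure rate; the 10,000-unit fleet size is an invented example.

```python
import math

annual_reliability = 0.9986                    # use case 3 hard-failure goal
failure_rate = -math.log(annual_reliability)   # failures per unit-year
fleet = 10000
print(f"lambda = {failure_rate:.5f} per unit-year")
print(f"Expected first-year failures in a {fleet}-unit fleet: "
      f"{failure_rate * fleet:.0f}")
# ~0.00140 per unit-year -> about 14 failures per 10,000 units per year
```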
Uptime (Example)
Figure 11.1 shows an uptime stack. This graphic shows how the time
categories relate to each other in accordance with the larger grouping
definitions.
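A sketch of the uptime-stack arithmetic, using the time categories the plan defines elsewhere (productive, standby, scheduled downtime, unscheduled downtime, non-scheduled); the hour values and the choice to count standby as uptime are illustrative assumptions, not the figure's groupings.

```python
# Illustrative uptime-stack arithmetic over one year of machine time.
hours = {
    "productive": 6800,
    "standby": 1000,
    "scheduled_downtime": 500,
    "unscheduled_downtime": 160,
    "non_scheduled": 300,   # time the equipment isn't planned to run
}
total = sum(hours.values())
uptime = hours["productive"] + hours["standby"]
print(f"Uptime fraction: {uptime / total:.1%}")   # ~89%
```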
An RPN for each potential failure mode is used to flag critical failures for
further evaluation. For high-RPN failure modes, mitigating design and
process actions will be identified and executed during the design
development portion of this program.
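In practice that flagging step can be as mechanical as filtering the DFMEA table; the threshold of 200 and the failure modes below are invented for illustration.

```python
# Flag high-RPN modes for mitigation, highest risk first.
dfmea = [
    {"mode": "connector fretting", "rpn": 336},
    {"mode": "seal extrusion", "rpn": 180},
    {"mode": "solder joint fatigue", "rpn": 252},
]
THRESHOLD = 200   # program-specific cutoff (assumed here)
for row in sorted(dfmea, key=lambda r: r["rpn"], reverse=True):
    if row["rpn"] >= THRESHOLD:
        print(f"Mitigation action required: {row['mode']} (RPN {row['rpn']})")
```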
The DFMEA team will consist of multiple program disciplines. This is a
key factor in a successful DFMEA. Required disciplines to attend:
mechanical engineering, electrical engineering, quality engineering,
compliance engineering, manufacturing engineering, software engineering,
and reliability engineering.
The expectation is that the DFMEA can be accomplished in four two‐hour
sessions with the multidisciplinary team. Analysis and reporting will be
completed independently after these sessions by the representative
reliability engineer.
Testing
The planned testing may be locked in or simply proposed for evaluation and
selection as the program unfolds. A significant factor is how much analysis
and program planning has already occurred when the RPP is created.
Remember that the test program is tied to the higher-level initiative by risk
analysis and historical data studies. If these haven't occurred then the
testing strategy is unlikely to have been finalized. Or, more specifically, the
testing strategy should not be finalized. That goes back to the cautions
about having a cut-and-paste plan. Keep the test programs tied to risk and
the needs of product performance measurement. Maintain integrity with the
high-level program.
Organize the testing sections of the program in one of two ways: by
subassembly testing and system testing, or by design improvement testing
and measurement testing. Seventy percent of the time they are partitioned
by subassembly tests and system tests. This is easier to understand because
many people divide the program by timeline: subassembly tests tend to
group together early, while system tests occur later in the program.
Grouping by “measurement and improvement” objectives is helpful if the
reader is reviewing the plan specifically from the perspective of their role
(Figure 11.5). A project manager will be primarily interested in the
measurement‐based activities, while a designer will be interested in the
improve activities.
Figure 11.5 Test type by improve and measure ratio.
This is how measurement and design improvement testing differ.
Let's first define what reliability testing is.
Reliability tests aim to both measure and improve product reliability.
Each specific test has a set balance of these two objectives.
For example, HALT testing is a 90% improve and a 10% measure balanced
test. ALT testing is at the other end of the spectrum with a 20% improve
and an 80% measure balance. Figure 11.5 shows this spectrum with a few
test types in their respective spots. They are also tied to the areas of the
product bathtub curve they are most associated with improving and
measuring.
HALT (Example)
For the purposes of extended failure mode identification, HALT testing will
be executed. The HALT program for the product will be categorized as
"Bench Top HALT" and "Chamber HALT." Both categories will share the
same objective of "exposing all possible failure modes in the test unit." The
Bench Top HALT will be a group of tests that are open to using any stress
believed to induce failure modes that can be created in the lab. Many of the
proposed methods include excessive versions of anticipated operational
stresses.
The Chamber HALT category of tests will be based around extreme
temperature and vibration stresses. These tests will take place in a HALT
chamber at a contracted lab. The lab will use a HALT chamber that
accommodates rapid thermal transitions and six‐axis broadband vibration.
Testing will use a combination of environments including temperature
cycling, random vibration, supply voltage variation, and power cycling.
Electronics printed circuit board (PCB) based assemblies will be subjected
to the sequence of low-temperature step stress, high-temperature step stress,
rapid temperature transition, random vibration, and combined
environments. These test steps will begin within specific environmental
conditions and continue until the upper/lower destruct limits (DLs) are met.
In cases where it is necessary to avoid exceeding the DLs of the test asset,
testing may be stopped at predefined limits. Figure 11.2 notwithstanding,
Figure 11.6 shows the method for applying stepped stress tests.
Figure 11.6 HALT testing stepped stress.
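The stepping logic of Figure 11.6 can be sketched as a loop; the simulated unit behavior below stands in for real chamber control and functional checks, and the Grms numbers are invented.

```python
def run_step_stress(start, step, predefined_limit,
                    soft_fail_at, hard_fail_at):
    """Step the stress up until a predefined limit, recording the
    operating limit (OL) and destruct limit (DL) when they appear.
    The fail-at thresholds simulate the unit under test."""
    operating_limit = destruct_limit = None
    stress = start
    while stress <= predefined_limit:
        # Dwell at this level, then check the unit (simulated by
        # comparing against the unit's hidden failure thresholds).
        if operating_limit is None and stress >= soft_fail_at:
            operating_limit = stress   # unit misbehaves under stress (OL)
        if stress >= hard_fail_at:
            destruct_limit = stress    # unit never recovers (DL); stop
            break
        stress += step
    return operating_limit, destruct_limit

# Vibration example: 5 Grms steps, stopping at a predefined 50 Grms cap.
print(run_step_stress(start=5, step=5, predefined_limit=50,
                      soft_fail_at=30, hard_fail_at=45))   # -> (30, 45)
```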
One of the most common ALT testing models is the Arrhenius model. This
model aims to identify a relationship between the primary wear‐out failure
mode and elevated temperature. The wear‐out failure mode must be based
on a material property change through chemical interaction. In many cases
this chemical interaction is with oxygen.
An example would be the wear-out of electronics over time. The electronics
will ultimately fail due to material property change. This may be a
resistance to voltage breakdown or a change in dielectric properties. By
running the electronics in an elevated temperature profile, let's say 30 °C
higher than the highest use case, a five-year life can be demonstrated in
three weeks.
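A back-of-envelope version of that calculation, with an assumed 1.5 eV activation energy chosen to roughly reproduce the five-years-in-three-weeks claim; a lower activation energy gives far less acceleration, which is exactly the sensitivity warned about below.

```python
import math

K_BOLTZMANN = 8.617e-5   # eV/K

def arrhenius_af(ea_ev, t_use_c, t_test_c):
    """Arrhenius acceleration factor between use and test temperatures."""
    t_use, t_test = t_use_c + 273.15, t_test_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN) * (1.0 / t_use - 1.0 / t_test))

# Assumed inputs: 1.5 eV activation energy, 50 C worst-case use,
# 80 C test profile (the 30 C rise from the text).
af = arrhenius_af(ea_ev=1.5, t_use_c=50.0, t_test_c=80.0)
print(f"AF = {af:.0f}; five years compresses to {5 * 52 / af:.1f} weeks")
# With Ea = 0.7 eV the same rise gives an AF of only ~8: an
# order-of-magnitude shift in any resulting life statement.
```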
This is clearly a very powerful tool. It lets us see into the future. Like all
powerful tools, it can create destruction just as quickly when not used
correctly. If the assumptions going into the test are incorrect, life statements
will be off by orders of magnitude.
Summary
It should now be clear how the RPP provides all the key elements and
instructions for conducting the program. A diverse audience can use it as a
resource, each member able to extract the information pertinent to their role
in the program. A good plan should be able to be read at several levels.
Summary
The product development process and technology have evolved ever since
the first rock became a crushing tool. The speed at which it has evolved has
been exponential, or more like a flat curve for two million years and then a
vertical line over the past 100.
This is due to the strong correlation between product life cycle and
technology growth. How long were spears used before bows and arrows
took some of their market share? How long did flip phones dominate the
market before smartphones moved in on their customer base? We live in a
time where technology is evolving at the fastest pace it ever has, and today's
rate of change will look like nothing compared to tomorrow's.
Reliability in design has always been important. No one wanted a bow and
arrow that didn't work when dinner was ready to sprint at the slightest
noise. But during that time the bow and arrow would be improved in
minuscule steps over thousands of years. Reliability wasn't even a process.
It was just a natural occurrence. As an invention was used and touched by
different hands it slowly improved. For many historical designs it would be
impossible to pinpoint when an advancement in a technology actually
occurred, like trying to measure a tree as it grows.
As reliability engineering has formalized over just the last century its
integration into product development has become more intertwined, and
formal. It started with just statistics, then grew into purpose‐driven tests,
then spread to design practices, and now it is how programs are structured
and executed. The “best practices” and formalized methods have to keep
pace. That is what reliability culture is about and why the contents of this
book are so important. But this book only hits upon 1% of what is
happening to advance how reliability practices create great products.
Constantly look outside your projects, your products, technology, and your
industry. How we are doing reliability in product development is changing
faster than any book or education series can keep up with. It's up to you to
see what's new today in reliability and share what you are doing with the
world to contribute to the greater good.
Index
a
accelerated life test (ALT) 1, 13, 86, 110, 122, 147
accountability 44
chart 49
notation 48
Amazon’s rating system 106
anchoring
closed loop control 112
delivery anchor 114–115
intent anchor 113–114
open loop control 112–113
value of 115
awareness 40–41
awareness growth 59
b
bathtub curve 88
bounding factors 10–11
bounding method 103
bound resource 11
brainstorming
background memo 141
fundamentals of 140–141
list of lead questions 141
preparing for session 141
select participants 141
setting session rules 142
variations on classic brainstorming 142–143
warm‐ups 141–142
Broken Window Theory 31
business
goals 8
and market objectives 15
strategy 53
c
certificates of reliability 41
change agents 71–72
change analysis 140
closed loop control 112, 122
communication 50–55, 56–58
at organizational level 55
credibility 39
culture
defined 31
directed product development culture see directed product development
culture
d
delivery anchor 114–115
Deming, Edwards 3
design failure mode effects analysis 51, 110, 124, 132–134, 157
design for reliability (DfR) process 5–6, 145, 151–152
comparing design processes 23
improved design process 22
traditional design approach 21, 22
design parameters 10
design risk analysis 155
design robustness 40
design team 92
destruct limit (DL) 151
directed product development culture
intent test 39–40
reliability engineering 32–38
e
electronics printed circuit board (PCB) 161
engineering leadership 92
environmental stress screen (ESS) 151
executives 92
extrovert checklist 79
f
facilitators 71
extrovert checklist 79
introvert checklist 79
technique 78–79
failure mode and effects analysis (FMEA) 2, 39, 131–132, 140, 147, 150,
155
failure reporting and corrective action system (FRACAS) 137–138
fault tree analysis (FTA) 140
fear‐driven testing 62–63
features table 128
field issues 3
field performance 58
focus rotation method 115–116
fundamental limit of technology 151
g
goals, reliability 5
group passing technique 143
guidance bounding
guidance bounding ROI
issues 120
plan 120
technology cascade 120–121
program risk effects analysis
ALT test 123
chill phase 125–126
DFMEA 124
evaluation 128–130
intended function 122
program freeze 124–125
stated conditions 122
tables and calculations 126–128
h
highly accelerated life testing (HALT) 13, 39–40, 110, 150, 160–161
highly accelerated stress screening (HASS) 151
Hussein, Saddam 51
i
improvement method 166
intended function 153
intent anchor 113–114
intent, testing
fear‐driven testing 62–63
improvement 61–62
ownership 62
introvert checklist 79
l
leadership
methods 55
reliability 5
team and 76–77
and transference
benefits of 67–68
intervention 65
obstacles 64–65
parent’s guide to objectives 66–67
true and interpreted objectives 66
leadership strategy initiative 63
leadership style 51
Lean Six Sigma 9
m
manufacturing engineers 92
market objectives 15
market pressure 62
Mars Rover program 148
measurement method 166
measurement parameter 7
n
new cutting‐edge technology 11
nominal group technique 142–143
non‐operational test 151
non‐scheduled time 154
o
open loop control 112–113
operating limit (OL) 151
operational test 151
organization
cultural dynamics 58
and process 7
and project 74
organization’s hierarchy 4
out‐of‐box failures (OBFs) 151
ownership
CEO, role of 117
intent testing 62
internal initiatives 117
organizational roles 117
peer acceptance vs. pressure 117
and transference 62–68
ownership chart
benefits 45–46, 48, 50
comparing charts 45
p
pareto analysis 140
performance appraisal 4
Piezo balance equation 130
Piezo tables 129
predictions, reliability 13
preventative maintenance (PM) 151
primary wear‐out failure mode 88, 90
probability 152
process failure mode effects analysis (PFMEA) 136
product, defined 10
product development 5, 36, 58
product development programs 11, 131
product drivers 9–10
productive time 154
product life cycle 168
product line’s development 129
product objectives 8
product reliability 4, 6
product‐specification document 5
product‐specification profiles 8–9
product’s reliability 6
program accounting 18–20
program risk effects analysis
ALT test 123
balance equation 127
chill phase 125–126
DFMEA 124
evaluation 128–130
intended function 122
program freeze 124–125
stated conditions 122
tables and calculations 126–128
Program Risk Effects Analysis (PREA) tools 27
program’s reliability 2
program tools 11
project managers (PMs) 92
q
quality engineers 92
quantitative risk assessment 27
r
random fail rate 90
reliability 38–39
awards 3
certificates of 41
culture 169
defined 150
in design 169
discipline 11–15
engineering 169
engineer’s responsibility 23–25
goals 5
inputs and outputs 44
life, defined 151
negotiation 27
product program 44
professional 26–28
timeline 14
tools and methods 43
verification and validation testing 43
warranty, defined 151
reliability allocation 150, 157–159
reliability bounding 101
feedback 104
strategy bounding
anchoring 110–115
bounding ROI 106–110
focus rotation 115–116
reliability culture
manipulative management
in action 102
alternatives 102–103
staff’s personal goals and company’s goals 102
reliability czar 71, 75
direct input 74
distilling information 74
functions 73
team and leadership 76–77
tips 77–78
reliability demonstration test (RDT) 150
Reliability Design Risk Summary (RDRS)
benefits 136
HALT testing 136
objective of 134–135
risk priority numbers 135–136
scoring and evaluation 135
three ranking factors 135
reliability engineering
design strategy 35
influences 32–33
invention and practice 33–34
Japan, emergence of 37–38
postwar influence 36–37
quality and inventing 34–35
technology‐and‐design cultural periods 32
WWII 35–36
reliability goals 152–153
reliability growth (RG) 2, 151
reliability maturity assessment
Golden Nuggets 98–99
recommendations 98
reliability maturity matrix 94–97
scoring system 94
steps 91
team 92–93
team review 95
reliability prediction 150
reliability professionals
customer dependency 81
customer margin 81
market sensitivity 81
product’s usefulness 81
resources 81
shirt and phone ranked factors 82
reliability program plan
accelerated life testing (ALT) 162–164
for broad audience 146
common pitfalls 146
common test execution 149
concise and clear goals 148–149
design failure mode effects analysis 157
design risk analysis 155
elements 145
failure mode effects analysis (FMEA) 155
HALT testing 160–161
intended function 153
non‐scheduled time 154
probability 152
product description 151
productive time 154
purpose 149–150
reliability allocation model 157–159
reliability goals 152–153
reliability growth 164–166
return on investment (ROI) 146–147
scheduled downtime 154
scope 150
standby time 154
subassembly life cycle testing 161
subassembly stress margin testing 162
system level testing 164
testing initiatives 149
testing sections 159–160
unscheduled downtime 154
reliability representatives 81
reliability testing
primary wear‐out failure modes 86
random failure rate 86
reliable products 3
return on investment 19
cost point 18
new technology and features 18
reliability 18
reliability initiative 17
time to market 18
return on investment (ROI) 39, 132
risk analysis 143–144
robotic arm assembly 2
robustness 36, 40, 57
root cause analysis
change analysis 140
fault tree analysis (FTA) 140
fishbone or cause and effect diagram 140
FMEA 140
pareto analysis 140
stages of 139–140
5 why analysis 140
rule of 10s 20–21
s
sales table 128
scheduled downtime 154
scoring system 94, 133–134
seven‐stage process 167–168
specialized tests 13
standby time 154
strategy bounding
anchoring 110–115
bounding ROI
Amazon’s rating system 106
deciding by 110
invest and return tables 107–110
focus rotation 115–116
stress margin test (SMT) 86
stress strain overlap 87
subassembly life cycle testing 161
subassembly stress margin testing 162
t
team idea mapping 143
The Toyota Way 9
thoroughness 54
time to market table 128
top‐notch programs 2
Total Quality Management 9, 15
transference and leadership
benefits of 67–68
intervention 65
obstacles 64–65
parent’s guide to objectives 66–67
true and interpreted objectives 66
u
unit functionality test (UFT) 151
unit under test 151
unscheduled downtime 154
use failure mode effects analysis 136–137
v
validation testing 43
verification 43
w
warranty table 127
wear‐out percentage 129
5 why analysis 140
WILEY END USER LICENSE AGREEMENT
Go to www.wiley.com/go/eula to access Wiley’s ebook EULA.