Dan Luu - How Completely Messed Up Practices Become Normal
There’s the company that is perhaps the nicest place I’ve ever worked,
combining the best parts of Valve and Netflix. The people are amazing and
you’re given near total freedom to do whatever you want. But as a side effect
of the culture, they lose perhaps half of new hires in the first year, some
voluntarily and some involuntarily. Totally normal, right?
There’s the office where I asked one day about the fact that I almost never
saw two particular people in the same room together. I was told that they
had a feud going back a decade, and that things had actually improved – for
years, they literally couldn’t be in the same room because one of the two
would get too angry and do something regrettable, but things had now
cooled to the point where the two could, occasionally, be found in the same
wing of the office or even the same room. These weren’t just random people,
either. They were the two managers of the only two teams in the office.
Normal!
There’s the company whose culture is so odd that, when I sat down to write a
post about it, I found that I’d not only written more than for any other single
post, but more than all other posts combined (which is well over 100k words
now, the length of a moderate book). This is the same company where
someone recently explained to me how great it is that, instead of using data
to make decisions, we use political connections, and that the idea of making
decisions based on data is a myth anyway; no one does that. What’s not only
normal, but the only possible way to do things is to use your political capital
to push your personal agenda through.
There’s the company that created multiple massive initiatives to recruit more
women into engineering roles, where women still get rejected in recruiter
screens for not being technical enough after being asked questions like “was your
experience with algorithms or just coding?”, as is normal in the industry.
There’s the company where I worked on a four person effort with a multi-
hundred million dollar budget and a billion dollar a year impact, where
requests for things that cost hundreds of dollars routinely took months or
were denied.
You might wonder if I’ve just worked at places that are unusually screwed up.
Sure, the companies are generally considered to be ok places to work, and
two of them are considered to be among the best places to work, but maybe
I’ve just ended up at places that are overrated. But I have the same
experience when I hear stories about how other companies work, even places
with stellar engineering reputations, except that it’s me that’s shocked and
my conversation partner who thinks their story is normal.
There are the companies that use @flaky, which include the vast majority of
Python-using SF Bay area unicorns. If you don’t know what this is, this is a
library that lets you add a Python annotation to those annoying flaky tests
that sometimes pass and sometimes fail. When I asked multiple co-workers
and former co-workers from three different companies what they thought
this did, they all guessed that it re-runs the test multiple times and reports a
failure if any of the runs fail. Close, but not quite. It’s technically possible to
use @flaky for that, but in practice it’s used to re-run the test multiple times
and report a pass if any of the runs pass. The company that created @flaky
is effectively a storage infrastructure company, and the library is widely used
at its major competitor. Marking tests that expose potential bugs as passing
is totally normal; after all, that’s what ext2/ext3/ext4 do with write errors.
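To make the two readings concrete, here’s a minimal sketch of how the decorator tends to show up in test suites (this assumes the open source flaky pytest plugin and its max_runs/min_passes parameters; the test bodies and their helper are hypothetical):

    import random

    from flaky import flaky

    def fetch_replica_status():
        """Hypothetical dependency that fails intermittently."""
        return "ok" if random.random() < 0.7 else "timeout"

    # What people guessed @flaky does: re-run and require every run to pass.
    @flaky(max_runs=3, min_passes=3)
    def test_replica_status_strict():
        assert fetch_replica_status() == "ok"

    # How it's used in practice: re-run and report a pass if any single run
    # passes (min_passes defaults to 1), which papers over intermittent bugs.
    @flaky(max_runs=3)
    def test_replica_status_in_practice():
        assert fetch_replica_status() == "ok"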
As far as I can tell, what happens at these companies is that they started by
concentrating almost totally on product growth. That’s completely and totally
reasonable, because companies are worth approximately zero when they’re
founded; they don’t bother with things that protect them from losses, like
good ops practices or actually having security, because there’s nothing to
lose (well, except for user data when the inevitable security breach happens,
and if you talk to security folks at unicorns you’ll know that these happen).
The result is a culture where people are hyper-focused on growth and ignore
risk. That culture tends to stick even after the company has grown to be worth
well over a billion dollars, and the companies have something to lose. Anyone
who comes into one of these companies from Google, Amazon, or another
place with solid ops practices is shocked. Often, they try to fix things, and
then leave when they can’t make a dent.
Google probably has the best ops and security practices of any tech company
today. It’s easy to say that you should take these things as seriously as
Google does, but it’s instructive to see how they got there. If you look at the
codebase, you’ll see that various services have names ending in z, as do a
curiously large number of variables. I’m told that’s because, once upon a
time, someone wanted to add monitoring. It wouldn’t really be secure to have
google.com/somename expose monitoring data, so they added a z.
google.com/somenamez. For security. At the company that is now the best in the
world at security.
Google didn’t go from adding z to the end of names to having the world’s
best security because someone gave a rousing speech or wrote a convincing
essay. They did it after getting embarrassed a few times, which gave people
who wanted to do things “right” the leverage to fix fundamental process
issues. It’s the same story at almost every company I know of that has good
practices. Microsoft was a joke in the security world for years, until multiple
disastrously bad exploits forced them to get serious about security. Which
makes it sound simple, but if you talk to people who were there at the time,
the change was brutal. Despite a mandate from the top, there was vicious
political pushback from people whose position was that the company got to
where it was in 2003 without wasting time on practices like security. Why
change what’s worked?
You can see this kind of thing in every industry. A classic example that tech
folks often bring up is hand-washing by doctors and nurses. It’s well known
that germs exist, and that washing hands properly very strongly reduces the
odds of transmitting germs and thereby significantly reduces hospital
mortality rates. Despite that, trained doctors and nurses still often don’t do
it. Interventions are required. Signs reminding people to wash their hands
save lives. But when people stand at hand-washing stations to require others
walking by to wash their hands, even more lives are saved. People can ignore
signs, but they can’t ignore being forced to wash their hands.
The data are clear that humans are really bad at taking the time to do things
that are well understood to incontrovertibly reduce the risk of rare but
catastrophic events. We will rationalize that taking shortcuts is the right,
reasonable thing to do. There’s a term for this: the normalization of deviance.
It’s well studied in a number of other contexts including healthcare, aviation,
mechanical engineering, aerospace engineering, and civil engineering, but
we don’t see it discussed in the context of software. In fact, I’ve never seen
the term used in the context of software.
Turning off or ignoring notifications because there are too many of them and
they’re too annoying? An erroneous manual operation? This could be straight
out of the post-mortem of more than a few companies I can think of, except
that the result was a tragic death instead of the loss of millions of dollars. If
you read a lot of tech post-mortems, every example in Banja’s paper will feel
familiar even though the details are different.
Once again, this could be from an article about technical failures. That makes
the next section, on why these failures happen, seem worth checking out.
The reasons given are:
People don’t automatically know what should be normal, and when new
people are onboarded, they can just as easily learn deviant processes that
have become normalized as reasonable processes.
The thing that’s really insidious here is that people will really buy into the
WTF idea, and they can spread it elsewhere for the duration of their career.
Once, after doing some work on an open source project that’s regularly
broken and being told that it’s normal to have a broken build, and that they
were doing better than average, I ran the numbers, found that the project was
basically worst in class, and wrote something about the idea that it’s possible
to have a build that nearly always passes with pretty much zero effort. The
most common comment I got in response was, “what kind of fantasy land is
this guy living in? Let’s get real. We all break the build at least a few times a
week”. This stuff isn’t rocket science, but once people get convinced that
some deviation is normal, they often get really invested in the idea.
The example in the paper is of someone who breaks the rule that you should
wear gloves when finding a vein. Their reasoning is that wearing gloves
makes it harder to find a vein, which may result in their having to stick a
baby with a needle multiple times. It’s hard to argue against that. No one
wants to cause a baby extra pain!
The second worst outage I can think of occurred when someone noticed that
a database service was experiencing slowness. They pushed a fix to the
service, and in order to prevent the service degradation from spreading, they
ignored the rule that you should do a proper, slow, staged deploy. Instead,
they pushed the fix to all machines. It’s hard to argue against that. No one
wants their customers to have degraded service! Unfortunately, the fix
exposed a bug that caused a global outage.
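For contrast, a staged deploy in its simplest form looks something like the sketch below. This is a generic outline, not that company’s actual tooling; deploy_to, healthy, and the stage definitions are hypothetical stand-ins:

    import time

    # Hypothetical stand-ins for whatever your deploy tooling provides.
    def deploy_to(hosts, build):
        print(f"deploying {build} to {len(hosts)} hosts")

    def healthy(hosts):
        # In practice: check error rates, latency, and service-level metrics.
        return True

    # Widen the blast radius gradually: one canary, then a slice, then the rest.
    STAGES = [
        ["canary-1"],
        [f"pod-a-{i}" for i in range(10)],
        [f"pod-b-{i}" for i in range(100)],
    ]

    def staged_deploy(build, bake_seconds=600):
        for hosts in STAGES:
            deploy_to(hosts, build)
            time.sleep(bake_seconds)  # let the change bake before widening
            if not healthy(hosts):
                raise RuntimeError("rollout halted: health check failed")
        print("rollout complete")

The point of the bake time and health checks is that a bad fix takes out one canary or one slice of machines instead of every machine at once.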
As companies grow up, they eventually have to impose security that prevents
every employee from being able to access basically everything. And at most
companies, when that happens, some people get really upset. “Don’t you
trust me? If you trust me, how come you’re revoking my access to X, Y, and
Z?”
Facebook famously let all employees access everyone’s profile for a long
time, and you can even find HN comments indicating that some recruiters
would explicitly mention that as a perk of working for Facebook. And I can
think of more than one well-regarded unicorn where everyone still has access
to basically everything, even after their first or second bad security breach.
It’s hard to get the political capital to restrict people’s access to what they
believe they need, or are entitled, to know. A lot of trendy startups have core
values like “trust” and “transparency” which make it difficult to argue
against universal access.
There are people I simply don’t give feedback to because I can’t tell if they’d
take it well or not, and once you say something, it’s impossible to un-say it. In
the paper, the author gives an example of a doctor with poor handwriting
who gets mean when people ask him to clarify what he’s written. As a result,
people guess instead of asking.
In most company cultures, people feel weird about giving feedback. Everyone
has stories about a project that lingered on for months after it should have
been terminated because no one was willing to offer explicit feedback. This is
a problem even when cultures discourage meanness and encourage
feedback: cultures of niceness seem to have as many issues around speaking
up as cultures of meanness, if not more. In some places, people are afraid to
speak up because they’ll get attacked by someone mean. In others, they’re
afraid because they’ll be branded as mean. It’s a hard problem.
I was shocked the first time I saw this happen. I must have been half a year
or a year out of school. I saw that we were doing something obviously
non-optimal, and brought it up with the senior person in the group. He told
me that he didn’t disagree, but that if we did it my way and there was a
failure, it would be really embarrassing. He acknowledged that my way
reduced the chance of failure without making the technical consequences of
failure worse, but it was more important that we not be embarrassed. Now
that I’ve been working for a decade, I have a better understanding of how
and why people play this game, but I still find it absurd.
Solutions
Let’s say you notice that your company has a problem that I’ve heard people
at most companies complain about: people get promoted for heroism and
putting out fires, not for preventing fires; and people get promoted for
shipping features, not for doing critical maintenance work and bug fixing.
How do you change that?
The simplest option is to just do the right thing yourself and ignore what’s
going on around you. That has some positive impact, but the scope of your
impact is necessarily limited. Next, you can convince your team to do the
right thing: I’ve done that a few times for practices I feel are really important
and are sticky, so that I won’t have to continue to expend effort on convincing
people once things get moving.
But if the incentives are aligned against you, it will require an ongoing and
probably unsustainable effort to keep people doing the right thing. In that
case, the problem becomes convincing someone to change the incentives,
and then making sure the change works as designed. How to convince people
is worth discussing, but long and messy enough that it’s beyond the scope of
this post. As for making the change work, I’ve seen many “obvious” mistakes
repeated, both in places I’ve worked and those whose internal politics I know
a lot about.
Small companies have it easy. When I worked at a 100 person company, the
hierarchy was individual contributor (IC) -> team lead (TL) -> CEO. That was
it. The CEO had a very light touch, but if he wanted something to happen, it
happened. Critically, he had a good idea of what everyone was up to and
could basically adjust rewards in real-time. If you did something great for the
company, there’s a good chance you’d get a raise. Not in nine months when
the next performance review cycle came up, but basically immediately. Not
all small companies do that effectively, but with the right leadership, they
can. That’s impossible for large companies.
At large company A (LCA), they had the problem we’re discussing and a
mandate came down to reward people better for doing critical but
low-visibility grunt work. There were too many employees for the mandator
to directly make all decisions about compensation and promotion, but the
mandator could review survey data, spot check decisions, and provide
feedback until things were normalized. My subjective perception is that the
company never managed to achieve parity between boring maintenance work
and shiny new projects, but got close enough that people who wanted to
make sure things worked correctly didn’t have to significantly damage their
careers to do it.
It’s sort of funny that this ends up being a problem about incentives. As an
industry, we spend a lot of time thinking about how to incentivize consumers
into doing what we want. But then we set up incentive systems that are
generally agreed upon as incentivizing us to do the wrong things, and we do
so via a combination of a game of telephone and cargo cult diffusion. Back
when Microsoft was ascendant, we copied their interview process and asked
brain-teaser interview questions. Now that Google is ascendant, we copy
their interview process and ask algorithms questions. If you look around at
trendy companies that are younger than Google, most of them basically copy
their ranking/leveling system, with some minor tweaks. The good news is
that, unlike many companies people previously copied, Google has put a lot
of thought into most of their processes and made data-driven decisions. The
bad news is that Google is unique in a number of ways, which means that
their reasoning often doesn’t generalize, and that people often cargo cult
practices long after they’ve become deprecated at Google.
This kind of diffusion happens for technical decisions, too. Stripe built a
reliable message queue on top of Mongo, so we build reliable message
queues on top of Mongo [1]. Our co-worker live edits the production database
to run tests, so we live edit the production database to run tests. It’s cargo
cults all the way down [2].
Let’s look at how one of the suggested solutions, “pay attention to weak
signals”, interacts with a single example, the “WTF WTF WTF” a new person
gives off when they join the company.
“Pay attention to weak signals” sure sounds like good advice, but how do we
do it? Strong signals are few and far between, making them easy to pay
attention to. Weak signals are abundant. How do we filter out the ones that
aren’t important? And how do we get an entire team or org to actually do it?
These kinds of questions can’t be answered in a generic way; this takes real
thought. We mostly put this thought elsewhere. Startups spend a lot of time
thinking about growth, and while they’ll all tell you that they care a lot about
engineering culture, revealed preference shows that they don’t. With a few
exceptions, big companies aren’t much different. At LCB, I looked through
the competitive analysis slide decks and they’re amazing. They look at every
last detail on hundreds of products to make sure that everything is as nice
for users as possible, from onboarding to interop with competing products. If
there’s any single screen where things are more complex or confusing than
any competitor’s, people get upset and try to fix it. It’s quite impressive. And
then when LCB onboards employees, a third of them are missing at least one
of an alias/account, an office, or a computer, a condition which can persist
for weeks or months. The competitive analysis slide decks talk about how
important onboarding is because you only get one chance to make a first
impression, and then employees are onboarded with the impression that the
company couldn’t care less about them and that it’s normal for quotidian
processes to be pervasively broken. LCB can’t even get the basics of
employee onboarding right, let alone really complex things like acculturation.
This is understandable – external metrics like user growth or attrition are
measurable, and targets like how to tell if you’re acculturating people so that
they don’t ignore weak signals are softer and harder to determine, but that
doesn’t mean they’re any less important. People write a lot about how things
like using fancier languages or techniques like TDD or agile will make your
teams more productive, but having a strong engineering culture is a much
larger force multiplier.
Thanks to Ezekiel Benjamin Smithburg and Marc Brooker for introducing me to the term
Normalization of Deviance, and Kelly Eskridge, Leah Hanson, Sophie Rapoport, Ezekiel
Benjamin Smithburg, Julia Evans, Dmitri Kalintsev, Ralph Corderoy, Jamie Brandon, and
Victor Felder for comments/corrections/discussion.
1. People seem to think I’m joking here. I can understand why, but try
Googling mongodb message queue. You’ll find statements like “replica sets in
MongoDB work extremely well to allow automatic failover and
redundancy”. Basically every company I know of that’s done this and
has anything resembling scale finds this to be an operational nightmare,
but you can’t actually find blog posts or talks that discuss that aspect of
it. All you see are the posts and talks from when they first tried it and
are in the honeymoon period. This is common with many technologies.
People really don’t like admitting that they based their infra on a
fundamentally bad idea, so you’ll mostly find glowing recommendations
even when, in private, people will tell you what a disaster the project
was. Today, if you do the search mentioned above, you’ll get a ton of
posts talking about how amazing it is to build a message queue on top of
Mongo, this footnote, and maybe a couple of blog posts by Kyle
Kingsbury depending on your exact search terms.
If there were an acute failure, you might see a postmortem, but while
we’ll do postmortems for “the site was down for 30 seconds”, we rarely
do postmortems for “this takes 10x as much ops effort as the alternative
and it’s a death by a thousand papercuts”, “we architected this thing
poorly and now it’s very difficult to make changes that ought to be
trivial”, or “a competitor of ours was able to accomplish the same
thing with an order of magnitude less effort”. I’ll sometimes do informal
postmortems by asking everyone involved oblique questions about what
happened, but more for my own benefit than anything else, because I’m
not sure people really want to hear the whole truth. This is especially
sensitive if the effort has generated a round of promotions, which seems
to be more common the more screwed up the project. The larger the
project, the more visibility and promotions, even if the project could have
been done with much less effort.↩
2. I’ve spent a lot of time asking about why things are the way they are,
both in areas where things are working well, and in areas where things
are going badly. Where things are going badly, everyone has ideas. But
where things are going well, as in the small company with the
light-touch CEO mentioned above, almost no one has any idea why
things work. It’s magic. If you ask, people will literally tell you that it
seems really similar to some other place they’ve worked, except that
things are magically good instead of being terrible for reasons they
don’t understand. But it’s not magic. It’s hard work that very few people
understand. Something I’ve seen multiple times is that, when a VP
leaves, a company will become a substantially worse place to work, and
it will slowly dawn on people that the VP was doing an amazing job of not
only supporting their direct reports, but also making sure that everyone
under them was having a good time. It’s hard to see until it changes,
though.↩