下篇：Real-world Concurrency（真实世界的并发）翻译&笔记

本文链接：https://blog.csdn.net/m0_53485135/article/details/134140228


这篇文章主要讨论了编写并发程序的主要原则，精读之后收获颇大，故作为笔记，记录于此。

I、翻译部分

书接中篇

Illuminating the Black Art

What if you are the one developing the operating system or database or some other body of code that must be explicitly parallelized? If you count yourself among the relative few who need to write such code, you presumably do not need to be warned that writing multithreaded code is hard. In fact, this domain’s reputation for difficulty has led some to conclude (mistakenly) that writing multithreaded code is simply impossible: “No one knows how to organize and maintain large systems that rely on locking,” reads one recent (and typical) assertion.5 Part of the difficulty of writing scalable and correct multithreaded code is the scarcity of written wisdom from experienced practitioners: oral tradition in lieu of formal writing has left the domain shrouded in mystery. So in the spirit of making this domain less mysterious for our fellow practitioners (if not also to demonstrate that some of us actually do know how to organize and maintain large lock-based systems), we present our collective bag of tricks for writing multithreaded code.

如果你正是那个开发操作系统、数据库或其他必须显式并行化的代码的人，又该怎么办？如果你属于少数需要编写此类代码的人，那么想必不需要别人提醒你编写多线程代码很难。事实上，这个领域“难”的名声已经让一些人（错误地）得出结论：编写多线程代码根本不可能。“没有人知道如何组织和维护依赖锁的大型系统”，这是最近的一个（也很典型的）论断。[5] 编写可扩展且正确的多线程代码之所以困难，部分原因在于经验丰富的实践者留下的书面经验太少：口口相传代替了正式的文字记录，使这个领域蒙上了一层神秘的面纱。因此，为了让同行们不再觉得这一领域那么神秘（也顺便证明我们当中确实有人知道如何组织和维护基于锁的大型系统），我们在此介绍我们共同积累的编写多线程代码的诀窍。

Know your cold paths from your hot paths. If there is one piece of advice to dispense to those who must develop parallel systems, it is to know which paths through your code you want to be able to execute in parallel (the hot paths) versus which paths can execute sequentially without affecting performance (the cold paths). In our experience, much of the software we write is bone-cold in terms of concurrent execution: it is executed only when initializing, in administrative paths, when unloading, etc. Not only is it a waste of time to make such cold paths execute with a high degree of parallelism, but it is also dangerous: these paths are often among the most difficult and error-prone to parallelize.

分清冷路径和热路径。如果只能给必须开发并行系统的人一条建议，那就是要清楚代码中哪些路径是你希望能够并行执行的（热路径），哪些路径即使顺序执行也不会影响性能（冷路径）。根据我们的经验，我们编写的很多软件在并发执行方面都是冷到极点的：它们只在初始化、管理路径、卸载等情况下才会执行。让这些冷路径以高度并行的方式执行不仅浪费时间，而且很危险：这些路径往往是最难并行化、也最容易出错的。

In cold paths, keep the locking as coarse-grained as possible. Don’t hesitate to have one lock that covers a wide range of rare activity in your subsystem. Conversely, in hot paths—those that must execute concurrently to deliver highest throughput—you must be much more careful: locking strategies must be simple and fine-grained, and you must be careful to avoid activity that can become a bottleneck. And what if you don’t know if a given body of code will be the hot path in the system? In the absence of data, err on the side of assuming that your code is in a cold path and adopt a correspondingly coarse-grained locking strategy—but be prepared to be proven wrong by the data.

在冷路径中，锁的粒度应尽可能粗。不要犹豫，尽管用一把锁覆盖子系统中大范围的低频活动。相反，在热路径（即必须并发执行才能获得最高吞吐量的路径）中，你必须谨慎得多：锁策略必须简单且细粒度，还要小心避免可能成为瓶颈的操作。如果你不知道某段代码是否会成为系统中的热路径，该怎么办？在没有数据的情况下，宁可假设你的代码处于冷路径，并采用相应的粗粒度锁策略，但要做好被数据证明判断有误的准备。
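
The coarse-versus-fine distinction above can be sketched in a few lines of Python. Everything here is hypothetical and not from the article: the `SessionTable` class, its method names, and the choice of one lock per hash bucket as the "simple and fine-grained" hot-path scheme. In CPython the global interpreter lock limits real parallelism, so this only illustrates the locking structure, not a measured speedup.

```python
import threading


class SessionTable:
    """Hypothetical subsystem with hot lookups and cold administrative calls."""

    def __init__(self, nbuckets=16):
        # Cold path: a single coarse lock covers all rare administrative activity.
        self._admin_lock = threading.Lock()
        self._config = {}
        # Hot path: one lock per hash bucket, so unrelated keys rarely contend.
        self._locks = [threading.Lock() for _ in range(nbuckets)]
        self._buckets = [{} for _ in range(nbuckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        """Hot path: only the bucket that owns the key is locked."""
        i = self._index(key)
        with self._locks[i]:
            self._buckets[i][key] = value

    def get(self, key, default=None):
        """Hot path: only the bucket that owns the key is locked."""
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)

    def set_option(self, name, value):
        """Cold path: rare, so it simply takes the coarse lock."""
        with self._admin_lock:
            self._config[name] = value

    def options(self):
        """Cold path: returns a copy made under the coarse lock."""
        with self._admin_lock:
            return dict(self._config)
```

If later measurements showed `set_option` to be hot after all, the coarse lock would be the piece to revisit, which is the "be prepared to be proven wrong by the data" part of the advice.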

Intuition is frequently wrong—be data intensive. In our experience, many scalability problems can be attributed to a hot path that the developing engineer originally believed (or hoped) to be a cold path. When cutting new software from whole cloth, you will need some intuition to reason about hot and cold paths—but once your software is functional, even in prototype form, the time for intuition has ended: your gut must defer to the data. Gathering data on a concurrent system is a tough problem in its own right. It requires you first to have a machine that is sufficiently concurrent in its execution to be able to highlight scalability problems. Once you have the physical resources, it requires you to put load on the system that resembles the load you expect to see when your system is deployed into production. Once the machine is loaded, you must have the infrastructure to be able to dynamically instrument the system to get to the root of any scalability problems.

直觉经常出错，要大量依靠数据。根据我们的经验，许多可扩展性问题都可以归因于这样一条热路径：开发它的工程师最初认为（或希望）它是冷路径。从零开始编写新软件时，你需要凭一些直觉来推断热路径和冷路径；但一旦软件能够运行，哪怕只是原型，依靠直觉的阶段就结束了：直觉必须服从数据。收集并发系统的数据本身就是一个棘手的问题。首先，你需要一台执行并发度足够高的机器，才能暴露可扩展性问题。有了物理资源之后，你需要给系统施加与生产环境预期负载相似的负载。机器承受负载之后，你还必须具备对系统进行动态插桩的基础设施，才能找到任何可扩展性问题的根源。

The first of these problems has historically been acute: there was a time when multiprocessors were so rare that many software development shops simply didn’t have access to one. Fortunately, with the rise of multicore CPUs, this is no longer a problem: there is no longer any excuse for not being able to find at least a two-processor (dual-core) machine, and with only a little effort, most will be able (as of this writing) to run their code on an eight-processor (two-socket, quad-core) machine.

其中第一个问题在历史上一直很突出：曾几何时，多处理器机器十分罕见，许多软件开发团队根本接触不到。幸运的是，随着多核 CPU 的兴起，这已不再是问题：现在没有任何借口找不到一台至少双处理器（双核）的机器，而且只需稍加努力，大多数人（在撰写本文时）都能在八处理器（双路四核）的机器上运行自己的代码。

Even as the physical situation has improved, however, the second of these problems—knowing how to put load on the system—has worsened: production deployments have become increasingly complicated, with loads that are difficult and expensive to simulate in development. As much as possible, you must treat load generation and simulation as a first-class problem; the earlier you tackle this problem in your development, the earlier you will be able to get critical data that may have tremendous implications for your software. Although a test load should mimic its production equivalent as closely as possible, timeliness is more important than absolute accuracy: the absence of a perfect load simulation should not prevent you from simulating load altogether, as it is much better to put a multithreaded system under the wrong kind of load than under no load whatsoever.

然而，尽管硬件条件有所改善，第二个问题（知道如何给系统施加负载）却在恶化：生产部署变得越来越复杂，其负载在开发环境中模拟起来既困难又昂贵。你必须尽可能把负载生成和模拟当作头等问题来对待；在开发过程中越早着手解决这个问题，就能越早获得可能对软件产生巨大影响的关键数据。虽然测试负载应尽可能接近生产负载，但及时性比绝对的准确性更重要：不应因为没有完美的负载模拟就完全不做负载模拟，因为让多线程系统承受不那么恰当的负载，也远比完全不加负载要好。
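
As a rough illustration of treating load generation as a first-class problem, the following Python sketch drives an arbitrary operation from several threads for a fixed time and reports how many calls each thread completed. The `run_load` helper, its parameters, and the `operation(thread_id, i)` callback signature are all assumptions made for this example; a real load generator would try to mimic the production mix far more closely.

```python
import threading
import time


def run_load(operation, nthreads=4, duration=0.2):
    """Call operation(thread_id, i) from nthreads threads for about `duration` seconds.

    Returns a list with the number of completed calls per thread.
    """
    counts = [0] * nthreads
    stop = threading.Event()
    # The extra party is the main thread, so all workers start together.
    start = threading.Barrier(nthreads + 1)

    def worker(tid):
        start.wait()
        i = 0
        while not stop.is_set():
            operation(tid, i)
            i += 1
        counts[tid] = i  # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    start.wait()          # release all workers at once
    time.sleep(duration)  # let the load run
    stop.set()
    for t in threads:
        t.join()
    return counts
```

Running the same operation with 1, 2, 4, and 8 threads and comparing `sum(counts)` gives a first, crude view of whether throughput scales, even when the simulated load is imperfect.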

Once a system is loaded—be it in development or in production—it is useless to software development if the impediments to its scalability can’t be understood. Understanding scalability inhibitors on a production system requires the ability to safely dynamically instrument its synchronization primitives. In developing Solaris, our need for this was so historically acute that it led one of us (Bonwick) to develop a technology (lockstat) to do this in 1997. This tool became instantly essential—we quickly came to wonder how we ever resolved scalability problems without it—and it led the other of us (Cantrill) to further generalize dynamic instrumentation into DTrace, a system for nearly arbitrary dynamic instrumentation of production systems that first shipped in Solaris in 2004, and has since been ported to many other systems including FreeBSD and Mac OS.6 (The instrumentation methodology in lockstat has been reimplemented to be a DTrace provider, and the tool itself has been reimplemented to be a DTrace consumer.)

一旦系统承受了负载，无论是在开发环境还是生产环境，如果无法弄清阻碍其可扩展性的因素，这对软件开发就毫无用处。要了解生产系统上的可扩展性阻碍因素，就必须能够安全地对其同步原语进行动态插桩。在开发 Solaris 的过程中，我们对这种能力的需求一直十分迫切，以至于我们中的一位（Bonwick）在 1997 年开发了一项专门的技术（lockstat）。这个工具立刻变得不可或缺，我们很快就难以想象以前没有它时是怎样解决可扩展性问题的。它还促使我们中的另一位（Cantrill）把动态插桩进一步推广为 DTrace：一个可以对生产系统进行几乎任意动态插桩的系统，于 2004 年首次随 Solaris 发布，此后被移植到包括 FreeBSD 和 Mac OS 在内的许多其他系统。[6]（lockstat 中的插桩方法已被重新实现为一个 DTrace 提供者（provider），该工具本身也已被重新实现为一个 DTrace 消费者（consumer）。）

Today, dynamic instrumentation continues to provide us with the data we need not only to find those parts of the system that are inhibiting scalability, but also to gather sufficient data to understand which techniques will be best suited for reducing that contention. Prototyping new locking strategies is expensive, and one’s intuition is frequently wrong; before breaking up a lock or rearchitecting a subsystem to make it more parallel, we always strive to have the data in hand indicating that the subsystem’s lack of parallelism is a clear inhibitor to system scalability!

如今，动态插桩仍在为我们提供所需的数据：不仅用于找出系统中阻碍可扩展性的部分，也用于收集足够的信息，以判断哪些技术最适合减少这种争用。为新的锁策略做原型代价高昂，而直觉又经常出错；因此，在拆分一把锁或重新设计某个子系统使其更加并行之前，我们总是力求先拿到数据，证明该子系统缺乏并行性确实是系统可扩展性的明显阻碍。
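
To make the kind of data discussed above concrete, here is a toy Python lock wrapper that records how often a lock is acquired, how often the acquisition was contended, and how long callers waited for and held it. This is a hand-rolled, static stand-in written for illustration only; it is not how lockstat or DTrace work (those instrument a running system dynamically), and the `TimedLock` name and its fields are assumptions of this sketch.

```python
import threading
import time


class TimedLock:
    """Lock wrapper that records simple contention statistics."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0   # total successful acquisitions
        self.contended = 0      # acquisitions that had to block
        self.wait_time = 0.0    # seconds spent waiting for the lock
        self.hold_time = 0.0    # seconds spent holding the lock
        self._acquired_at = 0.0

    def __enter__(self):
        t0 = time.perf_counter()
        # Try without blocking first so contended acquisitions can be counted.
        if self._lock.acquire(blocking=False):
            blocked = 0
        else:
            self._lock.acquire()
            blocked = 1
        now = time.perf_counter()
        # The counters are only updated while the lock is held,
        # so they need no extra protection.
        self.acquisitions += 1
        self.contended += blocked
        self.wait_time += now - t0
        self._acquired_at = now
        return self

    def __exit__(self, exc_type, exc, tb):
        self.hold_time += time.perf_counter() - self._acquired_at
        self._lock.release()
        return False
```

A high `contended / acquisitions` ratio together with a long average hold time (`hold_time / acquisitions`) would suggest looking first at the work done under the lock, before considering whether to break the lock up.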

Know when—and when not—to break up a lock. Global locks can naturally become scalability inhibitors, and when gathered data indicates a single hot lock, it is reasonable to want to break up the lock into per-CPU locks, a hash table of locks, per-structure locks, etc. This might ultimately be the right course of action, but before blindly proceeding down that (complicated) path, carefully examine the work done under the lock: breaking up a lock is not the only way to reduce contention, and contention can be (and often is) more easily reduced by decreasing the hold time of the lock. This can be done by algorithmic improvements (many scalability improvements have been achieved by reducing execution under the lock from quadratic time to linear time!) or by finding activity that is needlessly protected by the lock. Here’s a classic example of this latter case: if data indicates that you are spending time (say) deallocating elements from a shared data structure, you could dequeue and gather the data that need