I wrote concurrent code for years before I asked what was actually stopping the race.
Turns out the answer wasn't software.
You declare something atomic in your code. The library passes the request along. The runtime passes it along. If you're blocking on a lock, the kernel gets pulled in too. Each layer is just paperwork. The work itself happens in one place: a single CPU instruction, etched into silicon, that everything above it is built on top of.
That instruction is called compare-and-swap (CAS). It checks a memory address for an expected value, and if it matches, writes a new one. The whole thing happens as one indivisible step. Nothing can squeeze in between the check and the write.
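Here's roughly what that looks like from user code. This is a minimal sketch in Go using the standard library's `sync/atomic` package (Go 1.19+ for `atomic.Int64`); `casDemo` is a name made up for illustration. What the compiler actually emits depends on the target CPU.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// casDemo (hypothetical helper) tries two swaps on the same word and
// reports whether each one succeeded, plus the final stored value.
func casDemo() (first, second bool, final int64) {
	var v atomic.Int64
	v.Store(10)

	// Expected value matches what is in memory, so the write happens.
	first = v.CompareAndSwap(10, 20)

	// Expected value is now stale (memory holds 20), so nothing is written.
	second = v.CompareAndSwap(10, 30)

	return first, second, v.Load()
}

func main() {
	first, second, final := casDemo()
	fmt.Println(first, second, final) // true false 20
}
```

The second call fails without touching memory. That "fail and tell me" behavior is what lets callers retry safely.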
On x86 it's CMPXCHG with a LOCK prefix. On ARMv8.1 and later it's a single CAS instruction; older ARM cores get the same effect from a load-linked/store-conditional pair. Underneath, the CPU's cache coherence protocol enforces it: one core grabs the cache line, holds it exclusively while the swap happens, then releases.
In theory, this is what makes everything above it possible. Mutexes. Channels. AtomicInteger. sync.Once. Lock-free queues. Database row locks. Every concurrency primitive you've imported eventually bottoms out in CAS or a sibling instruction: fetch-and-add, test-and-set, load-linked/store-conditional. CAS, along with load-linked/store-conditional, is the most general of the family: you can build the others from it, but not the other way around.
In practice, it's not free. Cores fighting for the same cache line cause it to ping-pong across the memory bus, and throughput collapses under contention. CAS also has the ABA problem: a value can change from A to B and back to A between your read and your swap, and CAS won't notice.
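The ABA problem is easier to see in code. The sketch below replays the interleaving on a single goroutine so it's deterministic: the "other thread" is just the two `Store` calls in the middle. It also shows one common mitigation, packing a version counter next to the value. That's an illustrative choice, not the only fix. `pack` and `abaDemo` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pack (hypothetical) combines a 32-bit version and a 32-bit value
// into one 64-bit word, so a single CAS covers both.
func pack(version, value uint32) uint64 {
	return uint64(version)<<32 | uint64(value)
}

// abaDemo (hypothetical) reports whether a stale CAS succeeds on a
// plain word and on a version-tagged word.
func abaDemo() (plain, versioned bool) {
	const A, B = 1, 2

	// Plain word: we read A, the value goes A -> B -> A behind our back,
	// and our CAS still succeeds because the bits match.
	var raw atomic.Uint64
	raw.Store(A)
	seen := raw.Load()
	raw.Store(B)
	raw.Store(A)
	plain = raw.CompareAndSwap(seen, 3)

	// Tagged word: every write bumps the version, so the word we read
	// earlier no longer matches even though the value is A again.
	var tagged atomic.Uint64
	tagged.Store(pack(0, A))
	seenTagged := tagged.Load()
	tagged.Store(pack(1, B))
	tagged.Store(pack(2, A))
	versioned = tagged.CompareAndSwap(seenTagged, pack(1, 3))

	return plain, versioned
}

func main() {
	plain, versioned := abaDemo()
	fmt.Println(plain, versioned) // true false
}
```

The plain CAS succeeds even though the value changed twice in between. The tagged one fails, which is the signal to reload and retry.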
Still, the fact that the entire tower of concurrent programming rests on a handful of CPU opcodes is something most engineers go their whole careers without noticing. Your goroutines, your async runtime, your thread-safe hashmap, all of it bottoms out in a few transistors deciding who gets to write to a cache line (typically 64 bytes) first.
#backend #concurrency #computerarchitecture