Chapter 2: Information
• Rather than accessing individual bits in memory, most computers use blocks of 8 bits, or
bytes, as the smallest addressable unit of memory.
• A machine-level program views memory as a very large array of bytes, referred to as virtual
memory.
• Every byte of memory is identified by a unique number, known as its address, and the set of all possible
addresses is known as the virtual address space.
• As indicated by its name, this virtual address space is just a conceptual image presented to the machine-level
program.
• The actual implementation uses a combination of dynamic random access memory (DRAM), flash memory, disk storage, special hardware, and operating system software to provide the program with what appears to be a monolithic byte array.
2.1.1 Hexadecimal Notation
• A single byte consists of 8 bits. In binary notation, its value ranges from (00000000)₂ to (11111111)₂; viewed as a decimal integer, its value ranges from 0 to 255.
• Because binary notation is verbose and decimal conversion is tedious, we write bit patterns as base-16, or hexadecimal, numbers. Hexadecimal (or simply "hex") uses digits '0' through '9' along with characters 'A' through 'F' to represent 16 possible values.
• A common task in working with machine-level programs is to manually convert between decimal, binary, and hexadecimal representations of bit patterns.
• Converting between binary and hexadecimal is straightforward, since it can be performed
one hexadecimal digit at a time.
• For example, suppose you are given the number 0x173A4C. You can convert this to binary
format by expanding each hexadecimal digit, as follows:
Hexadecimal    1     7     3     A     4     C
Binary       0001  0111  0011  1010  0100  1100
• Conversely, given a binary number 1111001010110110110011, you convert it to hexadecimal by first splitting it into groups of 4 bits each, starting from the right (the leftmost group may contain fewer than 4 bits).
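As a quick check of these hand conversions, here is a minimal Python sketch using the two example values above:

    hex_value = 0x173A4C
    # Expand each hex digit into 4 bits; "024b" pads to 24 bits (6 digits x 4).
    print(format(hex_value, "024b"))   # 000101110011101001001100

    # Converting back: int() effectively groups bits in fours from the right.
    bits = "1111001010110110110011"
    print(hex(int(bits, 2)))           # 0x3cadb3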
Static RAM (SRAM)
• SRAM stores each bit in a bistable memory cell. Each cell is implemented with a six-transistor circuit.
• This circuit has the property that it can stay indefinitely in either of two different voltage configurations, or
states.
• Any other state will be unstable—starting from there, the circuit will quickly move toward one of the stable
states.
• Due to its bistable nature, an SRAM memory cell will retain its value indefinitely, as long as it is kept
powered.
• Even when a disturbance, such as electrical noise, perturbs the voltages, the circuit will return to the stable
value when the disturbance is removed.
Dynamic RAM
• DRAM stores each bit as charge on a capacitor. This capacitor is very small, typically around 30 femtofarads, that is, 30 × 10⁻¹⁵ farads.
• DRAM storage can be made very dense: each cell consists of a capacitor and a single access transistor.
• Unlike SRAM, however, a DRAM memory cell is very sensitive to any disturbance.
• When the capacitor voltage is disturbed, it will never recover.
• Various sources of leakage current cause a DRAM cell to lose its charge within a time period of around 10 to
100 milliseconds.
• The memory system must periodically refresh every bit of memory by reading it out and then rewriting it.
• Figure 6.2 summarizes the characteristics of SRAM and DRAM memory. SRAM is persistent as long as power
is applied. Unlike DRAM, no refresh is necessary.
• SRAM can be accessed faster than DRAM. SRAM is not sensitive to disturbances such as light and electrical
noise.
• The trade-off is that SRAM cells use more transistors than DRAM cells and thus have lower densities, are more
expensive, and consume more power.
Memory Modules
• DRAM chips are packaged in memory modules that plug into expansion slots on the main system board (motherboard). Core i7 systems use the 240-pin dual inline memory module (DIMM), which transfers data to and from the memory controller in 64-bit chunks.
Semiconductor Memory Types
Memory Type                   Category             Erasure                    Write Mechanism   Volatility
Random-access memory (RAM)    Read-write memory    Electrically, byte-level   Electrically      Volatile
Read-only memory (ROM)        Read-only memory     Not possible               Masks             Nonvolatile
Programmable ROM (PROM)       Read-only memory     Not possible               Electrically      Nonvolatile
Erasable PROM (EPROM)         Read-mostly memory   UV light, chip-level       Electrically      Nonvolatile
• There are many kinds of DRAM memories, and new kinds appear on the market with regularity as manufacturers attempt to keep up with rapidly increasing processor speeds. Each is based on the conventional DRAM cell, with optimizations that improve the speed with which the basic DRAM cells can be accessed.
1. Fast Page Mode DRAM (FPM DRAM)
• A conventional DRAM copies an entire row of supercells into its internal row buffer, uses one, and then discards the rest.
• FPM DRAM improves on this by allowing consecutive accesses to the same row to be served directly from the row buffer. (The row of bits is selected only once for all columns within the row; previously, each bit was accessed by pulsing its row and column.)
For example, to read four supercells from row i of a conventional DRAM, the
memory controller must send four RAS/CAS requests, even though the
row address i is identical in each case.
• To read four supercells from the same row of an FPM DRAM, the memory controller sends an initial RAS/CAS request, followed by three CAS requests. The initial RAS/CAS request copies row i into the row buffer and returns the supercell addressed by the CAS.
• The next three supercells are served directly from the row buffer, and thus are returned more
quickly than the initial supercell
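To make the difference concrete, here is a toy timing model in Python; the RAS/CAS costs are made-up illustrative numbers, not figures from any datasheet:

    RAS_NS = 30   # assumed cost of a row access strobe (illustrative)
    CAS_NS = 15   # assumed cost of a column access strobe (illustrative)

    def conventional_read(n_supercells):
        # Every access pulses both the row and the column.
        return n_supercells * (RAS_NS + CAS_NS)

    def fpm_read(n_supercells):
        # The row is selected once; later accesses are served from the row buffer.
        return (RAS_NS + CAS_NS) + (n_supercells - 1) * CAS_NS

    print(conventional_read(4))   # 180 ns: four full RAS/CAS requests
    print(fpm_read(4))            # 90 ns: one RAS/CAS plus three CAS requests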
2. Extended Data Out DRAM (EDO DRAM)
• An enhanced form of FPM DRAM that allows the individual CAS signals to be
spaced closer together in time.
• Extended data out random access memory (EDO RAM/DRAM) is an early type of
dynamic random access memory (DRAM) chip which was designed to improve
the performance of fast page mode DRAM (FPM DRAM) that was used in the
1990s.
• It accomplishes this by holding data ready for the processor and allowing data to
be accessed for a slightly longer time, even after a new memory request is made.
• Its main feature was that it eliminated wait times by allowing a new cycle to start
while retaining the data output buffer from the previous cycle active, which allows
a degree of pipelining (overlap in operation) that improved performance.
• EDO RAM allows faster data access by overlapping subsequent memory read and write operations, and was widely used in the mid-1990s before being replaced by newer memory technologies such as SDRAM and DDR SDRAM.
• Despite its higher data transfer rate relative to standard DRAM, EDO RAM is now considered obsolete.
3. Synchronous DRAM (SDRAM)
• Conventional, FPM, and EDO DRAMs are asynchronous in the sense
that they communicate with the memory controller using a set of
explicit control signals.
• SDRAM replaces many of these control signals with the rising edges of
the same external clock signal that drives the memory controller.
• Without going into detail, the net effect is that an SDRAM can output the contents of its supercells at a faster rate than its asynchronous counterparts.
Synchronous DRAM (SDRAM)
• Access is synchronized with an external clock and runs at the full speed of the processor/memory bus without imposing wait states.
• With conventional (asynchronous) DRAM, the processor presents addresses and control levels to the memory, indicating that a set of data at a particular location in memory should be either read from or written into the DRAM; the CPU then waits while the DRAM finds the data.
• Since SDRAM moves data in time with system clock, CPU knows when data
will be ready
• CPU does not have to wait; it can do something else
• Burst mode allows the SDRAM to set up a stream of data and fire it out in a block.
• This mode is useful when all the bits to be accessed are in sequence and in the same row of the array as the
initial access.
• The mode register and associated control logic are another key feature differentiating SDRAMs from conventional DRAMs.
• It provides a mechanism to customize the SDRAM to suit specific system needs.
• The mode register specifies the burst length, which is the number of separate units of data
synchronously fed onto the bus.
• The register also allows the programmer to adjust the latency between receipt of a read
request and the beginning of data transfer.
• The SDRAM performs best when it is transferring large blocks of data serially, such as for
applications like word processing, spreadsheets, and multimedia
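As an illustration of how a mode register might be programmed, here is a sketch that encodes a JEDEC-style SDR SDRAM mode word; the field positions (burst length in bits 0–2, burst type in bit 3, CAS latency in bits 4–6) are an assumption of this sketch and should be checked against the actual part's datasheet:

    BURST_LENGTHS = {1: 0b000, 2: 0b001, 4: 0b010, 8: 0b011}

    def mode_register(burst_length=4, interleaved=False, cas_latency=2):
        word = BURST_LENGTHS[burst_length]        # units of data fed per burst
        word |= (1 if interleaved else 0) << 3    # burst type: 0 = sequential
        word |= cas_latency << 4                  # cycles from READ to first data
        return word

    print(bin(mode_register(burst_length=8, cas_latency=3)))   # 0b110011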
• DDR-SDRAM sends data twice per clock cycle (leading & trailing edge)
• There is now an enhanced version of SDRAM, known as double data rate SDRAM (DDR SDRAM), that overcomes the once-per-cycle limitation. DDR SDRAM can send data to the processor twice per clock cycle.
4. Double Data Rate Synchronous DRAM (DDR SDRAM)
• DDR SDRAM is an enhancement of SDRAM that doubles the speed of the DRAM by using both clock edges as control signals.
• Different types of DDR SDRAMs are characterized by the size of a
small prefetch buffer that increases the effective bandwidth: DDR (2
bits), DDR2 (4 bits), and DDR3 (8 bits).
DDR SDRAM
• SDRAM can only send data once per clock
• Double-data-rate SDRAM can send data twice per clock cycle
• Rising edge and falling edge
[Figures: DDR SDRAM read timing; simplified DRAM read timing]
5. Cache DRAM (CDRAM)
• A DRAM (dynamic RAM) with an on-chip cache, called the cache DRAM, has
been proposed and fabricated.
• It is a hierarchical RAM containing a 1-Mb DRAM for the main memory and an 8-Kb SRAM (static RAM) for cache memory.
• It uses a 1.2-µm CMOS technology.
• Suitable for no-wait-state memory access in low-end workstations and
personal computers, the chip also serves high-end systems as a secondary
cache scheme.
• Integrates a small SRAM cache (16 Kb) onto a generic DRAM chip
• Used as a true cache (64-bit lines); effective for ordinary random access
• Used to support serial access of a block of data, e.g. refreshing a bit-mapped screen: CDRAM can prefetch data from the DRAM into the SRAM buffer
6. Video RAM (VRAM)
• Used in the frame buffers of graphics systems. VRAM is similar in spirit to FPM
DRAM.
• Two major differences are that
• (1) VRAM output is produced by shifting the entire contents of the internal
buffer in sequence and
• (2) VRAM allows concurrent reads and writes to the memory.
• Thus, the system can be painting the screen with the pixels in the frame buffer
(reads) while concurrently writing new values for the next update (writes).
Read Only Memory (ROM)
• Permanent storage
• Nonvolatile
• Microprogramming (see later)
• Library subroutines for frequently wanted functions
• Systems programs (BIOS)
• Function tables
Types of ROM
• Written during manufacture
• Very expensive for small runs
• Programmable (once)
• PROM
• The PROM is nonvolatile and may be written into only once.
• The writing process is performed electrically.
• Needs special equipment to program
• Read “mostly”
• Erasable Programmable (EPROM)
• Read and written electrically
• Before a write operation, all the storage cells must be erased
• Erased by UV
• The erasure process can be performed repeatedly; each erasure can take as much as 20 minutes to perform
• For comparable amounts of storage, the EPROM is more expensive than PROM, but it has the advantage of the multiple-update capability.
• Electrically Erasable (EEPROM)
• Takes much longer to write than read
• This is a read-mostly memory that can be written into at any time without erasing prior contents; only the byte or bytes addressed are updated.
• Flash memory
Organisation in detail (Chip Logic)
• For semiconductor memories, one of the key design issues is the number of bits of data
that may be read/written at a time
• A 16Mbit chip can be organised as 1M of 16 bit words
• A one-bit-per-chip system has 16 lots of 1Mbit chips, with bit 1 of each word in chip 1, and so on
• A 16Mbit chip can be organised as a 2048 x 2048 x 4bit
array
• Reduces number of address pins
• Multiplex row address and column address
• 11 pins address each dimension (2^11 = 2048)
• Adding one more pin to each of the row and column addresses doubles both ranges, so capacity grows ×4
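A quick check of the pin-count arithmetic (Python, values from the example above):

    import math

    rows = cols = 2048
    # Row and column addresses are multiplexed over the same pins,
    # so the chip needs only enough pins for one dimension.
    address_pins = math.ceil(math.log2(rows))
    print(address_pins)                    # 11
    print(rows * cols * 4 / (1 << 20))     # 16.0 Mbit total capacity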
Interleaved Memory
• Main memory is composed of a collection of DRAM memory chips.
• A number of chips can be grouped together to form a memory bank
• It is possible to organize the memory banks in a way known as
interleaved memory.
• Each bank is independently able to service a memory read or write request, so that a system with K banks can service K requests simultaneously, increasing memory read or write rates by a factor of K.
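A minimal sketch of one common scheme, low-order interleaving, where consecutive addresses fall in different banks (the mapping below is an assumption for illustration):

    K = 4  # number of banks

    def bank_and_offset(addr):
        # Consecutive addresses rotate through the banks.
        return addr % K, addr // K

    for addr in range(8):
        print(addr, bank_and_offset(addr))
    # Addresses 0..3 land in banks 0..3 and can be serviced simultaneously.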
Advanced DRAM Organization
• Basic DRAM same since first RAM chips
• Enhanced DRAM
• Contains small SRAM as well
• SRAM holds last line read (c.f. Cache!)
• Cache DRAM
• Larger SRAM component
• Use as cache or serial buffer
• Flash memory (so named because of the speed with which it can be reprogrammed)
• First introduced in the mid-1980s, flash memory is intermediate between EPROM and
EEPROM in both cost and functionality.
• Like EEPROM, flash memory uses an electrical erasing technology.
• An entire flash memory can be erased in one or a few seconds, which is much faster than
EPROM.
• In addition, it is possible to erase just blocks of memory rather than an entire chip.
• Flash memory gets its name because the microchip is organized so that a section of memory cells is erased in a single action, or "flash."
• However, flash memory does not provide byte-level erasure.
• Like EPROM, flash memory uses only one transistor per bit, and so achieves the high density
(compared with EEPROM) of EPROM.
• Flash memory is a type of nonvolatile memory, based on EEPROMs, that has become an
important storage technology.
• Flash memories are everywhere, providing fast and durable nonvolatile storage for a slew of
electronic devices, including digital cameras, cell phones, and music players, as well as laptop,
desktop, and server computer systems.
Accessing Main Memory
• Data flows back and forth between the processor and the DRAM main memory over
shared electrical conduits called buses.
• Each transfer of data between the CPU and memory is accomplished with a series of
steps called a bus transaction.
• A read transaction transfers data from the main memory to the CPU.
• A write transaction transfers data from the CPU to the main memory.
• A bus is a collection of parallel wires that carry address, data, and control signals.
• Depending on the particular bus design, data and address signals can share the
same set of wires or can use different sets.
• Also, more than two devices can share the same bus.
• The control wires carry signals that synchronize the transaction and identify what
kind of transaction is currently being performed.
• For example, is this transaction of interest to the main memory, or to some other I/O device such as a disk controller? Is the transaction a read or a write? Is the information on the bus an address or a data item?
Disk Storage
• Disks are workhorse storage devices that hold enormous amounts of data, on the order of
hundreds to thousands of gigabytes, as opposed to the hundreds or thousands of megabytes in a
RAM-based memory
Disk Geometry
Disks are constructed from platters. Each platter consists of two sides, or surfaces, that are coated with magnetic recording
material.
A rotating spindle in the center of the platter spins the platter at a fixed rotational rate, typically between 5,400 and 15,000
revolutions per minute (RPM).
A disk will typically contain one or more of these platters encased in a sealed container.
• Figure 6.9(a) shows the geometry of a typical disk surface.
• Each surface consists of a collection of concentric rings called tracks.
• Each track is partitioned into a collection of sectors.
• Each sector contains an equal number of data bits (typically 512 bytes) encoded
in the magnetic material on the sector.
• Sectors are separated by gaps where no data bits are stored.
• Gaps store formatting bits that identify sectors.
• A disk consists of one or more platters stacked on top of each other and encased
in a sealed package, as shown in Figure 6.9(b).
• The entire assembly is often referred to as a disk drive, although we will usually
refer to it as simply a disk.
• We will sometimes refer to disks as rotating disks to distinguish them from
flash-based solid state disks (SSDs), which have no moving parts.
• Disk manufacturers describe the geometry of multiple-platter drives in terms
of cylinders, where a cylinder is the collection of tracks on all the surfaces that are
equidistant from the center of the spindle.
For example, if a drive has three platters and six surfaces, and the tracks on each
surface are numbered consistently, then
cylinder k is the collection of the six instances of track k.
Disk Capacity
• The maximum number of bits that can be recorded by a disk is known as its maximum
capacity, or simply capacity.
• Disk capacity is determined by the following technology factors:
• Recording density (bits/in).
• The number of bits that can be squeezed into a 1- inch segment of a track.
• Track density (tracks/in).
• The number of tracks that can be squeezed into a 1-inch segment of the radius extending from
the center of the platter.
• Areal density (bits/in2).
• The product of the recording density and the track density.
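Putting the factors together, capacity = (# bytes/sector) × (avg # sectors/track) × (# tracks/surface) × (# surfaces/platter) × (# platters/disk). A worked example in Python, with assumed parameter values:

    bytes_per_sector     = 512
    sectors_per_track    = 300       # average
    tracks_per_surface   = 20_000
    surfaces_per_platter = 2
    platters             = 5

    capacity = (bytes_per_sector * sectors_per_track * tracks_per_surface
                * surfaces_per_platter * platters)
    # Disk vendors use 1 GB = 10^9 bytes.
    print(capacity / 1e9, "GB")      # 30.72 GB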
Disk Operation
• Disks read and write bits stored on the magnetic surface using a read/write head connected to
the end of an actuator arm, as shown in Figure 6.10(a).
• By moving the arm back and forth along its radial axis, the drive can position the head over any
track on the surface. This mechanical motion is known as a seek.
• Once the head is positioned over the desired track, then, as each bit on the track passes
underneath, the head can either sense the value of the bit (read the bit) or alter the value of the
bit (write the bit).
• Disks with multiple platters have a separate read/write head for each surface, as shown in
Figure 6.10(b).
• The heads are lined up vertically and move in unison. At any point in time, all heads are
positioned on the same cylinder.
• Disks read and write data in sector-size blocks.
• The access time for a sector has three main components: seek time, rotational
latency, and transfer time:
Seek time.
• To read the contents of some target sector, the arm first positions the head over
the track that contains the target sector.
• The time required to move the arm is called the seek time.
• The seek time, Tseek, depends on the previous position of the head and the speed
that the arm moves across the surface.
• The average seek time in modern drives, Tavg seek, measured by taking the mean
of several thousand seeks to random sectors, is typically on the order of 3 to 9
ms.
• The maximum time for a single seek, Tmax seek, can be as high as 20 ms.
Rotational latency.
• Once the head is in position over the track, the drive waits for the first bit of
the target sector to pass under the head.
• The performance of this step depends on both the position of the surface when
the head arrives at the target track and the rotational speed of the disk.
• In the worst case, the head just misses the target sector and waits for the disk
to make a full rotation. Thus, the maximum rotational latency, in seconds, is
given by
• Tmax rotation = (1 / RPM) × (60 secs / 1 min)
• The average rotational latency, Tavg rotation, is simply half of Tmax rotation
• Transfer time. When the first bit of the target sector is under the head, the drive can begin to read or
write the contents of the sector.
• The transfer time for one sector depends on the rotational speed and the number of sectors per track.
• Thus, we can roughly estimate the average transfer time for one sector in seconds as
• Tavg transfer = (1 / RPM) × (1 / (average # sectors/track)) × (60 secs / 1 min)
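Combining the three components gives a back-of-the-envelope access-time estimate; the drive parameters below are illustrative assumptions:

    rpm = 7200
    avg_seek_ms = 9.0
    sectors_per_track = 400          # average

    t_max_rotation_ms = (1 / rpm) * 60 * 1000
    t_avg_rotation_ms = t_max_rotation_ms / 2
    t_avg_transfer_ms = (1 / rpm) * (1 / sectors_per_track) * 60 * 1000

    total_ms = avg_seek_ms + t_avg_rotation_ms + t_avg_transfer_ms
    print(round(t_avg_rotation_ms, 2), round(t_avg_transfer_ms, 3))  # 4.17 0.021
    print(round(total_ms, 2))        # 13.19 ms: dominated by seek and rotation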
Connecting I/O Devices
• Input/output (I/O) devices such as graphics cards, monitors, mice, keyboards, and disks are connected
to the CPU and main memory using an I/O bus.
• Although the I/O bus is slower than the system and memory buses, it can accommodate a wide variety of third-party I/O devices. For example, the bus in Figure 6.11 has three different types of devices attached to it.
• A Universal Serial Bus (USB) controller is a conduit for devices attached to a USB bus, which is
a wildly popular standard for connecting a variety of peripheral I/O devices, including keyboards,
mice, modems, digital cameras, game controllers, printers, external disk drives, and solid state disks.
• USB 3.0 buses have a maximum bandwidth of 625 MB/s. USB 3.1 buses have a maximum bandwidth
of 1,250 MB/s.
• A graphics card (or adapter) contains hardware and software logic that is responsible for
painting the pixels on the display monitor on behalf of the CPU.
• A host bus adapter that connects one or more disks to the I/O bus using a communication protocol defined by a particular host bus interface.
• The two most popular such interfaces for disks are SCSI (pronounced “scuzzy”) and SATA
(pronounced “sat-uh”). SCSI disks are typically faster and more expensive than SATA drives.
• A SCSI host bus adapter (often called a SCSI controller) can support multiple disk drives, as opposed to SATA adapters, which can support only one drive.
Solid State Disks
• A solid state disk (SSD) is a storage technology, based on flash memory.
An SSD package consists of one or more flash memory chips, which replace the mechanical drive in a conventional rotating disk,
and a flash translation layer, which is a hardware/firmware device that plays the same role as a disk controller, translating
requests for logical blocks into accesses of the underlying physical device.
Notice that reading from SSDs is faster than writing.
The difference between random reading and writing performance is caused by a fundamental property of the underlying flash
memory.
As shown in Figure 6.13, a flash memory consists of a sequence of B blocks, where each block consists of P pages.
Typically, pages are 512 bytes to 4 KB in size, and a block consists of 32–128 pages, with total block sizes ranging from 16 KB to 512 KB.
• Data are read and written in units of pages.
• A page can be written only after the entire block to which it belongs has been erased (typically,
this means that all bits in the block are set to 1).
• Once a block is erased, each page in the block can be written once with no further erasing.
• A block wears out after roughly 100,000 repeated writes. Once a block wears out, it can no
longer be used.
• Random writes are slower for two reasons.
• First, erasing a block takes a relatively long time, on the order of 1 ms, which is more than an
order of magnitude longer than it takes to access a page.
• Second, if a write operation attempts to modify a page p that contains existing data (i.e., not all ones), then any pages in the same block with useful data must be copied to a new (erased) block before the write to page p can occur.
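A toy cost model of this effect (the constants are assumptions, except the roughly 1 ms erase time from the text):

    PAGE_US  = 50     # assumed cost to read or program one page
    ERASE_US = 1000   # erasing a block takes on the order of 1 ms

    def write_cost_us(live_pages_in_block):
        if live_pages_in_block == 0:
            return PAGE_US                 # block already erased: just program
        # Copy the live pages to a fresh block, erase, then program the page.
        return live_pages_in_block * PAGE_US + ERASE_US + PAGE_US

    print(write_cost_us(0))    # 50 us: write into an already-erased block
    print(write_cost_us(63))   # 4200 us: random write into a nearly full block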
• SSDs have a number of advantages over rotating disks.
• They are built of semiconductor memory, with no moving parts, and thus have
much faster random access times than rotating disks, use less power, and are
more rugged.
• However, there are some disadvantages.
• First, because flash blocks wear out after repeated writes, SSDs have the
potential to wear out as well.
• Wear-leveling logic in the flash translation layer attempts to maximize the
lifetime of each block by spreading erasures evenly across all blocks. In practice,
the wear-leveling logic is so good that it takes many years for SSDs to wear out.
• Second, SSDs are about 30 times more expensive per byte than rotating disks,
and thus the typical storage capacities are significantly less than rotating disks.
The Memory Hierarchy
• In general, the storage devices get slower, cheaper, and larger as we move from higher to lower levels.
• At the highest level (L0) are a small number of fast CPU registers that the CPU can access in a single clock cycle.
• Next are one or more small to moderate-size SRAM-based cache memories that can be accessed in a few CPU clock cycles.
• These are followed by a large DRAM-based main memory that can be accessed in tens to hundreds of clock cycles.
Caching in the Memory Hierarchy
• In general, a cache (pronounced “cash”) is a small, fast storage device that acts as a staging area for the data objects stored in
a larger, slower device.
• The process of using a cache is known as caching (pronounced “cashing”).
• The central idea of a memory hierarchy is that for each k, the faster and smaller storage device at level k serves as a cache for
the larger and slower storage device at level k + 1.
• In other words, each level in the hierarchy caches data objects from the next lower level.
• For example, the local disk serves as a cache for files (such as Web pages) retrieved from remote disks over the network, the
main memory serves as a cache for data on the local disks, and so on, until we get to the smallest cache of all, the set of CPU
registers.
• The storage at level k + 1 is partitioned into contiguous chunks of data objects called blocks.
• Each block has a unique address or name that distinguishes it from other blocks.
• Blocks can be either fixed size (the usual case) or variable size (e.g., the remote HTML files stored on Web servers).
• For example, the level k + 1 storage in Figure 6.22 is partitioned into 16 fixed-size blocks, numbered 0 to 15.
Cache Hits
• When a program needs a particular data object d from level k + 1, it first looks for d in one of the blocks currently
stored at level k.
• If d happens to be cached at level k, then we have what is called a cache hit.
• The program reads d directly from level k, which by the nature of the memory hierarchy is faster than reading d from
level k + 1. For example, a program with good temporal locality might read a data object from block 14, resulting in a
cache hit from level k.
Cache Misses
• If, on the other hand, the data object d is not cached at level k, then we have what is called a cache miss.
• When there is a miss, the cache at level k fetches the block containing d from the cache at level k + 1,
possibly overwriting an existing block if the level k cache is already full.
• This process of overwriting an existing block is known as replacing or evicting the block.
• The block that is evicted is sometimes referred to as a victim block.
• The decision about which block to replace is governed by the cache’s replacement policy.
• For example, a cache with a random replacement policy would choose a random victim block.
• A cache with a least recently used (LRU) replacement policy would choose the block that was last accessed the furthest in the past.
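A minimal Python sketch of an LRU cache with four blocks, matching the level-k cache in the text's example (the fetch function stands in for the level k + 1 device):

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity=4):
            self.capacity = capacity
            self.blocks = OrderedDict()          # block number -> data

        def access(self, block, fetch):
            if block in self.blocks:             # cache hit
                self.blocks.move_to_end(block)   # mark most recently used
                return self.blocks[block]
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the LRU victim block
            self.blocks[block] = fetch(block)    # miss: fetch from level k + 1
            return self.blocks[block]

    cache = LRUCache()
    for b in [14, 3, 14, 9, 0, 14]:
        cache.access(b, fetch=lambda blk: f"data{blk}")
    print(list(cache.blocks))   # [3, 9, 0, 14], ordered from LRU to MRU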
Kinds of Cache Misses
• It is sometimes helpful to distinguish between different kinds of cache misses.
• If the cache at level k is empty, then any access of any data object will miss.
• An empty cache is sometimes referred to as a cold cache, and misses of this kind are called compulsory misses or cold
misses.
• Cold misses are important because they are often transient events that might not occur in steady state, after the cache
has been warmed up by repeated memory accesses.
2. Whenever there is a miss, the cache at level k must decide where to place the block it retrieves from level k + 1; hardware caches typically restrict a particular block at level k + 1 to a small subset of the blocks at level k (for example, block i must be placed in block i mod 4). Restrictive placement policies of this kind lead to a type of miss known as a conflict miss, in which the cache is large enough to hold the referenced data objects, but because they map to the same cache block, the cache keeps missing.
• For example, in Figure 6.22, if the program requests block 0, then block 8, then block 0, then block 8, and so on, each of the references to these two blocks would miss in the cache at level k, even though this cache can hold a total of four blocks (see the sketch below).
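The thrashing pattern above is easy to reproduce with the (block mod 4) placement policy, using the block numbers from the example:

    slots = {}
    misses = 0
    for block in [0, 8, 0, 8, 0, 8]:
        slot = block % 4                 # blocks 0 and 8 both map to slot 0
        if slots.get(slot) != block:
            misses += 1                  # conflict miss: evict the occupant
            slots[slot] = block
    print(misses)                        # 6 misses out of 6 accesses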
3. Programs often run as a sequence of phases (e.g., loops) where each phase accesses some reasonably constant set of cache blocks.
• For example, a nested loop might access the elements of the same array over and over again.
• This set of blocks is called the working set of the phase.
• When the size of the working set exceeds the size of the cache, the cache will experience what are known as capacity
misses.
• In other words, the cache is just too small to handle this particular working set.
Cache Memories
• The memory hierarchies of early computer systems consisted of only three levels:
• CPU registers, main memory, and disk storage.
• However, because of the increasing gap between CPU and main memory, system designers were compelled to insert a
small SRAM cache memory, called an L1 cache (level 1 cache) between the CPU register file and main memory, as
shown in Figure 6.24. The L1 cache can be accessed nearly as fast as the registers, typically in about 4 clock cycles.
• As the performance gap between the CPU and main memory
continued to increase, system designers responded by inserting an
additional larger cache, called an L2 cache, between the L1 cache
and main memory, that can be accessed in about 10 clock cycles.
• Many modern systems include an even larger cache, called an L3
cache, which sits between the L2 cache and main memory in the
memory hierarchy and can be accessed in about 50 cycles.
• While there is considerable variety in the arrangements, the
general principles are the same.
• Consider a computer system where each memory address has m bits that form M = 2^m unique addresses.
• As illustrated in Figure 6.25(a), a cache for such a machine is organized as an array of S = 2^s cache sets.
• Each set consists of E cache lines.
• Each line consists of a data block of B = 2^b bytes, a valid bit that indicates whether or not the line contains meaningful information, and t = m − (b + s) tag bits (a subset of the bits from the current block's memory address) that uniquely identify the block stored in the cache line.
• In general, a cache’s organization can be characterized by the tuple (S, E, B, m).
• The size (or capacity) of a cache, C, is stated in terms of the aggregate size of all the blocks. The tag bits and valid bit are not
included. Thus, C = S × E × B.
• When the CPU is instructed by a load instruction to read a word from address A of main memory, it sends address A to the
cache. If the cache is holding a copy of the word at address A, it sends the word immediately back to the CPU.
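A small helper makes these relationships concrete; the example parameter values are assumptions for illustration:

    import math

    def cache_params(S, E, B, m):
        s = int(math.log2(S))    # set index bits
        b = int(math.log2(B))    # block offset bits
        t = m - (s + b)          # tag bits
        C = S * E * B            # capacity, excluding tag and valid bits
        return {"C": C, "s": s, "b": b, "t": t}

    # Example: 64 sets, 1 line per set, 32-byte blocks, 32-bit addresses.
    print(cache_params(S=64, E=1, B=32, m=32))
    # {'C': 2048, 's': 6, 'b': 5, 't': 21}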
6.4.2 Direct-Mapped Caches
• Caches are grouped into different classes based on E, the number of cache lines per set.
• A cache with exactly one line per set (E = 1) is known as a direct-mapped cache (see Figure 6.27). Direct-mapped caches are
the simplest both to implement and to understand, so we will use them to illustrate some general concepts about how caches
work.
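To locate a word, a direct-mapped cache splits the address into tag, set index, and block offset fields. A sketch with assumed field widths (s = 6, b = 5, matching the helper example above):

    def split_address(addr, s=6, b=5):
        offset  = addr & ((1 << b) - 1)          # low b bits: byte within block
        set_idx = (addr >> b) & ((1 << s) - 1)   # next s bits: which set
        tag     = addr >> (s + b)                # remaining bits: the tag
        return tag, set_idx, offset

    print(split_address(0x1234))   # (2, 17, 20)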
6.4.3 Set Associative Caches
• The problem with conflict misses in direct-mapped caches stems from the
constraint that each set has exactly one line (or in our terminology, E = 1).
• A set associative cache relaxes this constraint so that each set holds more
than one cache line.
• A cache with 1 < E < C/B is often called an E-way set associative cache.
6.4.4 Fully Associative Caches
• A fully associative cache consists of a single set (i.e., E = C/B) that contains all of the cache lines. Figure 6.35 shows the basic
organization.