
ECPE13

Computer Architecture and Organization
Dr. P. Maheswaran
Assistant Professor, Dept. of ECE
NIT Trichy

[email protected]
Unit 4: Memory organization
Basic concepts:

Programs and the data they operate on are held in the memory of the computer.

The execution speed of programs is highly dependent on the speed of instruction transfer between the processor and
the memory.

Sufficient memory is needed to facilitate execution of large programs having large amounts of data.

Ideal case for memory:

Fast, large, inexpensive memory.

The maximum size of the memory in a computer is determined by the addressing scheme.

A computer with a 16-bit address is capable of addressing up to 2^16 = 64K (kilo) memory locations.

32-bit address: 2^32 = 4G (giga) locations.

64-bit address: 2^64 = 16E (exa) ≈ 16 × 10^18 locations.

The number of locations represents the size of the address space of the computer.

The memory is designed to store and retrieve data in word-length quantities.

Consider a byte-addressable computer in which a 32-bit address is sent from the processor to the memory unit.

The high-order 30 bits determine which 32-bit word will be accessed.
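A minimal sketch (not from the slides) of this split, assuming a byte-addressable machine with 4-byte words:

    # Hypothetical illustration: split a 32-bit byte address into a
    # word address (high-order 30 bits) and a byte offset (low 2 bits).
    def split_byte_address(addr: int, bytes_per_word: int = 4):
        word = addr // bytes_per_word      # which word is accessed
        offset = addr % bytes_per_word     # byte position within the word
        return word, offset

    print(split_byte_address(0x1003))      # (1024, 3): byte 3 of word 1024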
Basic concepts:

The address lines are used to specify the memory location involved in a data transfer operation.

The data lines to transfer the data.

The control lines carry the command indicating:

A Read or a Write operation.

Whether a byte or a word is to be transferred.

Necessary timing information is provided by the memory to indicate when it has completed the requested operation.

The processor-memory interface asserts MFC (Memory Function Completed) based on the memory's response.

Memory access time: The time that elapses between the initiation of an operation to transfer a word of data and the
completion of that operation.

Memory cycle time: The minimum time delay required between the initiation of two successive memory operations.

E.g., the time between two successive Read operations.

Cycle time is slightly longer than the access time.

Random-access memory (RAM): Access time to any location is the same.

Access time of the following devices depends on the address or position of the data:

Serial or partly serial access devices, e.g., magnetic and optical disks.
Basic concepts:
Cache and Virtual Memory:

The processor usually processes instructions and data faster than they can be fetched from the main memory.

The memory access time is the bottleneck in the system.

This bottleneck is reduced by using a cache memory.

A small, fast memory inserted between the larger, slower main memory and the processor.

Holds the currently active portions of a program and their data.

Virtual memory:

The active portions of a program are stored in the main memory.

The remainder is stored on the much larger secondary storage device.

Sections of the program are transferred back and forth between the main memory and the secondary storage
device, transparent to the application program.

The application program sees a memory that is much larger than the computer’s physical main memory.

Block Transfers:

Data move frequently between: 1. The main memory and the cache, 2. The main memory and the disk.

Data are always transferred in contiguous blocks involving tens, hundreds, or thousands of words.
Semiconductor RAM memories:
Internal Organization of Memory Chips:
 Memory cells are usually organized in the form of an array; each cell is capable of storing one bit of information.
 Each row of cells constitutes a memory word in Fig. 8.2.
 All cells of a row are connected to a common line referred to as the word line (driven by address decoder).
 The cells in each column are connected to a Sense/Write circuit by two bit lines.
 The Sense/Write circuits are connected to the data input/output lines of the chip.
 Fig. 8.2 gives memory circuit of 16 words of 8-bits each (16 × 8 organization).
 The R/W’ (Read/Write’) input specifies the required operation.
 The CS (Chip Select) input selects a chip.
 The memory circuit:
 Stores 128 bits.
 Address lines: 4, Data lines: 8, Control lines: 2 (total = 14 lines).
 Two lines for power supply and ground.
Semiconductor RAM memories:
Internal Organization of Memory Chips:
 1K (1024) memory cells can be organized as:
 128 × 8 format:
 Address lines: 7, Control lines: 2, Data lines: 8 (total 17 + 2 power/ground lines).
 1K × 1 format:
 Address lines: 10, Control lines: 2, Data lines: 1 (total 13 + 2 power/ground lines).
 The 10-bit address is divided into a 5-bit row address and a 5-bit column address.
 Row address selects a row of 32 cells. One of these cells is selected by column address.
 A 1G-bit chip may have:
 256M × 4 format:
 Address lines: 28, Control lines: 2, Data lines: 4.
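The line counts above follow directly from the organization: the number of address lines is log2 of the number of addressable locations. A small sketch of this arithmetic (illustrative only; the two power/ground lines mentioned above are extra and not modeled):

    import math

    # Address lines = log2(locations); totals exclude power/ground.
    def chip_lines(words: int, bits_per_word: int, control: int = 2):
        address = int(math.log2(words))
        return address, bits_per_word, control, address + bits_per_word + control

    print(chip_lines(128, 8))          # (7, 8, 2, 17)  for 128 x 8
    print(chip_lines(1024, 1))         # (10, 1, 2, 13) for 1K x 1
    print(chip_lines(256 * 2**20, 4))  # (28, 4, 2, 34) for 256M x 4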
Semiconductor RAM memories:
Static Memories:
 Memories capable of retaining their state as long as power is applied.
 Fig. 8.4 shows a static RAM (SRAM) cell.
 Two inverters are cross-connected to form a latch.

 The latch is connected to two bit lines by transistors T1 and T2.

 T1 and T2 act as switches, controlled by word line.

 When the word line is at ground level, T1 and T2 are turned off and the latch retains its state.
 Example: The logic value at point X = 1, at Y=0. This state represents 1.
 State is maintained as long as word line is at ground level.
 Read Operation:

 The word line is activated to close switches T1 and T2. If the cell is in state 1, b = 1 and b’ = 0.


 Sense/Write circuit monitors two bit lines, and sets the corresponding output.
 Write Operation:
 The Sense/Write circuit drives the bit lines: it places the appropriate value on b and its complement on b’, then activates the word line.
Semiconductor RAM memories:
CMOS Cell:
 Transistor pairs (T3, T5) and (T4, T6) form the inverters in the latch.
 Continuous power is needed for the cell to retain its state.
 The cell’s contents are lost when power is interrupted.
 When power is restored: The latch settles into a stable state. Previous cell state will be lost.
 SRAMs are volatile.
 A major advantage of CMOS SRAMs: very low power consumption.
 Access times for SRAMs are on the order of a few nanoseconds.
 SRAMs are used where speed is of critical concern.
Semiconductor RAM memories:
Dynamic RAMs:
 SRAM cells require many transistors (six per cell in CMOS).
 Less expensive and higher density RAMs can be implemented with simpler cells.
 These simpler cells do not retain their state for a long period.
 They are accessed frequently for Read or Write operations.
 Memories that use such cells are called dynamic RAMs (DRAMs).
 Information is stored in a dynamic memory cell in the form of a charge on a capacitor.
 This charge can be maintained for only tens of milliseconds.
 The cell is required to store information for a much longer time.
 Its contents must be periodically refreshed by restoring the capacitor charge to its full value.
 Refreshing occurs when the contents of the cell are read or when new information is written into it.
 To store information in Fig. 8.6 cell:
 Transistor T is turned on.
 An appropriate voltage is applied to the bit line.
 After the transistor is turned off, the capacitor begins to discharge.
 This is because T conducts a tiny amount of current even when turned off.
Semiconductor RAM memories:
Dynamic RAMs:
 The information stored in the cell can be retrieved only if it is read before the charge in the capacitor drops below some
threshold value.
 Read operation:
 The transistor in that cell is turned on.

 A sense amplifier connected to the bit line compares the capacitor charge QC with a threshold value Qth.

 If QC > Qth, then the sense amplifier drives the bit line to the full voltage representing the logic value 1.

The capacitor is recharged to the full charge corresponding to the logic value 1.
 If QC < Qth, the sense amplifier pulls the bit line to ground level to discharge the capacitor fully.

Reading the contents of a cell automatically refreshes its contents.
 The word line is common to all cells in a row.
 All cells in a selected row are read and refreshed at the same time.
Semiconductor RAM memories:
Dynamic RAMs:
 A 256-Megabit DRAM chip, configured as 32M × 8, is shown.
 The cells are organized in the form of a 16K × 16K array.
 16,384 cells in each row are divided into 2,048 groups of 8, forming 2,048 bytes of data.

 To select a row: 14 address bits (=log2(16,384)), To select a group: 11 address bits (=log2(2,048)).
 Total 25 address bits needed to access a byte.
 The row and column addresses are multiplexed on 14 pins, to reduce number of pins.
Read or Write operation:
 The row address is applied first. It is loaded into the row address latch in response to a control line called the Row Address Strobe (RAS).
 A Read operation is then initiated, in which all cells in the selected row are read and refreshed.
After the row address is loaded:
 The column address is applied to the address pins.
 Loaded into the column address latch, based on the Column Address Strobe (CAS).
 The information in this latch is decoded.
 The appropriate group of 8 Sense/Write circuits is selected.
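A sketch (an assumption for illustration, not from the text) of how the 25-bit address of this 32M × 8 chip splits into the 14-bit row and 11-bit column addresses that are multiplexed on the same 14 pins:

    # High 14 bits select one of 16,384 rows; low 11 bits select one
    # of the 2,048 byte groups in that row.
    def split_dram_address(addr: int):
        row = (addr >> 11) & 0x3FFF
        col = addr & 0x7FF
        return row, col

    print(split_dram_address(0x000800))  # (1, 0): byte group 0 of row 1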
Semiconductor RAM memories:
Dynamic RAMs:
If the R/W control signal indicates a Read operation:
 The output values of the selected circuits are transferred to the data lines, D7−0.

Write operation:
 The information on the D7−0 lines is transferred to the selected circuits and used to overwrite the contents of the selected cells in the corresponding 8 columns.

 RAS’ and CAS’ are active-low signals: they are asserted when RAS/CAS = 0.


 The timing of the operation of the DRAM is controlled by the RAS and CAS signals.
 RAS and CAS are generated by a memory controller circuit

When the processor issues a Read or a Write command.
 During a Read operation:

Output data are transferred to the processor.

A delay equivalent to the memory’s access time is incurred.

Because the timing is handled by the memory controller rather than a clock, such chips are called asynchronous DRAMs.
 The memory controller is responsible for refreshing the data stored in the memory chips.
Semiconductor RAM memories:
Dynamic RAMs: Fast page mode
 The contents of all 16,384 cells in the selected row are sensed when RAS is active.

 After CAS is active, only 8 bits are placed on the data lines, D7−0, selected by A10−0.

The other bytes in the same row can be accessed without having to reselect the row.

Each sense amplifier also acts as a latch.
 When a row address is applied, the contents of all cells in the selected row are loaded into the corresponding latches.
 It is only necessary to apply different column addresses to place the different bytes on the data lines.
 All bytes in the selected row can be transferred in sequential order.

Apply a sequence of column addresses under the control of successive CAS signals.

A block of data can be transferred at a much faster rate (unlike random addresses).
 The block transfer capability is referred to as the fast page mode feature.
 The vast majority of main memory transactions involve block transfers.
 The faster rate of the fast page mode makes dynamic RAMs suitable for block transfers to and from the main memory.
Semiconductor RAM memories:
Synchronous DRAMs:
 DRAMs whose operation is synchronized with clock signal.
 The cell array is the same as in asynchronous DRAMs.

The use of a clock signal distinguishes them from asynchronous DRAMs.
 SDRAMs have built-in refresh circuitry with a refresh counter.

Provides the addresses of the rows to be selected for refreshing.
 The address and data connections of an SDRAM are buffered by means of registers.
 The Sense/Write amplifiers function as latches.

A Read operation causes the contents of all cells in the selected row to be loaded into these latches.

The data in the latches of the selected column are transferred into the data register.
 The buffer registers are useful when transferring large blocks of data at very high speed.

Isolates external connections from the chip’s internal circuitry.

Possible to start a new access operation while data are being transferred.
 SDRAMs have several different modes of operation.

Can be selected by writing control information into a mode register.
Semiconductor RAM memories:
Synchronous DRAMs:
 CAS line to select successive columns:

Generated internally using a column counter and the clock signal.

New data are placed on the data lines at the rising edge of each clock pulse.
1. The row address is latched under control of the RAS signal.
2. Memory takes some clock cycles to activate selected rows.
3. The column address is latched under control of the CAS signal.
4. The first set of data bits is placed on the data lines. (with one clock cycle delay)
5. SDRAM automatically increments the column address to access the next three sets of bits in the selected row.

 Synchronous DRAMs can deliver data at a very high rate.

All the control signals needed are generated inside the chip.
Semiconductor RAM memories:
Synchronous DRAMs: Latency and bandwidth
 Memory latency: The amount of time it takes to transfer the first word of a block.

In Fig. 8.9, latency is 5 clock cycles.

If the clock rate is 500 MHz, the latency is 10 ns.
 Bandwidth:

The number of bits or bytes that can be transferred in one second.
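A quick check of the latency figure quoted above (values taken from the Fig. 8.9 example):

    clock_hz = 500e6              # 500 MHz clock
    latency_cycles = 5            # from Fig. 8.9
    print(latency_cycles / clock_hz * 1e9, "ns")   # 10.0 ns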

Double-Data-Rate SDRAM:
 Take advantage of the fact that a large number of bits are accessed at the same time inside the chip when a row address is
applied.
 Data are transferred externally on both the rising and falling edges of the clock.
 Memories that use this technique are called double-data-rate SDRAMs (DDR SDRAMs).
 DDR2 and DDR3 operate at clock frequencies of 400 and 800 MHz, respectively.
 They transfer data at effective rates of 800 and 1600 MHz, respectively.
Read-Only memories:
 Static and dynamic RAM chips are volatile.
 Some applications require memory devices that retain the stored information when power is turned off.
 A special writing process is needed to place the information into a nonvolatile memory.
 A memory is called a read-only memory, or ROM when information can be written into it only once at the time of manufacture.
 ROM cell in Fig. 8.11:
 A logic value 0 is stored in the cell if the transistor is connected to ground at point P.
 Otherwise, a 1 is stored.
 The bit line is connected through a resistor to the power supply.
 To read the state of the cell, the word line is activated to close the transistor switch.
 The voltage on the bit line:
 Near zero if there is a connection between the transistor and ground.
 High if there is no connection to ground.
 A sense circuit at the end of the bit line generates the proper output value.
 The connection to ground in each cell is determined when the chip is manufactured.
 Using a mask with a pattern that represents the information to be stored.
Read-Only memories:
Programmable ROM (PROM):
 Programmability is achieved by inserting a fuse at point P.
 Before being programmed, the memory contains all 0s.
 The user can insert 1s at the required locations by burning out the fuses.
 Burning process is irreversible.
 The cost of preparing the masks needed for storing a particular information pattern makes ROMs cost-effective only in large volumes.
 PROMs provide a more convenient and considerably less expensive approach:
 They can be programmed by the user directly.
Erasable PROM (EPROM):
 It allows the stored data to be erased and new data to be written into it.
 Erasable, reprogrammable ROM.
 EPROMs are capable of retaining stored information for a long time.
 While software is being developed, memory changes and updates can be easily made.
 The transistor T is normally turned off, creating an open switch.
 T can be turned on by injecting charge into it that becomes trapped inside.
 Erasure requires dissipating the charge trapped in the transistors that form the memory cells.
 Exposing the chip to ultraviolet light erases the memory.
Read-Only memories:
Electrically Erasable PROM (EEPROM):
 An EPROM must be physically removed from the circuit for reprogramming.
 The stored information cannot be erased selectively.
 The entire contents of the chip are erased when exposed to ultraviolet light.
 EEPROM can be programmed, erased, and reprogrammed electrically.
 No need to remove chip for erasure, selective erasure possible.
 Disadvantage of EEPROM:

Different voltages are needed for erasing, writing, and reading the stored data.

This increases circuit complexity.
Read-Only memories:
Flash memory:
 A flash cell is based on a single transistor controlled by trapped charge.
 Like an EEPROM, it is possible to read the contents of a single cell.
 It is only possible to write an entire block of cells.
 Prior to writing, the previous contents of the block are erased.
 Flash devices have greater density, which leads to higher capacity and a lower cost per bit.
 They require a single power supply voltage, and consume less power in their operation.
 Used in: hand-held computers, cell phones, digital cameras, and MP3 music players.
Flash Cards:
 Mount flash chips on a small card.
 Flash cards with a USB interface are widely used and are commonly known as memory keys.
Flash Drives:
 Larger flash memory modules have been developed to replace hard disk drives.
 Designed to fully emulate hard disks.
 The storage capacity of flash drives is significantly lower than that of hard disks.
 Flash drives are solid state electronic devices with no moving parts.
 Shorter access times, insensitive to vibration, lower power consumption.
Memory hierarchy:
 An ideal memory would be fast, large, and inexpensive.
 A very fast memory can be implemented using static RAM chips.
 Not suitable for implementing large memories (basic cells are larger).
 Consumes more power.
 Cost effective solution for voluminous data: Magnetic disks.
 Secondary storage, available at a reasonable cost.
 They are used extensively in computer systems.
 Much slower than semiconductor memory units.
 Dynamic RAM:
 Main memory is built with this.
 Large and considerably faster, yet affordable.
 Static RAM:
 Used in smaller units where speed is of the essence.
 Cache memories are built with this.
Memory hierarchy:
 The fastest access is to data held in processor registers.
 At the top in terms of speed of access.
 The registers provide only a minuscule portion of the required memory.
 Processor cache - implemented directly on the processor chip.
 Holds copies of the instructions and data from Main memory.
 A small primary cache (L1 cache) is always located on the processor chip.

Access time comparable to that of processor registers.
 Secondary (L2) cache is placed between the L1 cache and the rest of the memory.

Larger and slower.

Also placed on the processor chip.
 Some computers have L3 cache.

Larger than L1, L2.

Also implemented with SRAM technology.

May or may not be on same chip as L1 and L2.
Memory hierarchy:
Main memory:
 A large memory implemented using dynamic memory components.
 Assembled in memory modules such as DIMMs.

Much larger but significantly slower than cache memories.

Access time of main memory ≈ 100 × the L1 cache access time.
Disk devices:
 Disk devices provide a very large amount of inexpensive memory.
 Widely used as secondary storage in computer systems.
 Very slow compared to the main memory.

During program execution, the speed of memory access is of utmost importance.


Bring the instructions and data about to be used as close to the processor as possible.
Memory interfacing circuits:
 Blocks of data are transferred between the main memory and I/O devices (disks).
 Techniques are needed for controlling such transfers without frequent, program-controlled intervention by the processor.
 A Load instruction transfers data from an I/O device into a processor register (e.g., R2).

The data read are then stored into a memory location.
 From the memory to an I/O device: The reverse operation takes place.
 A data transfer instruction is executed only after the processor determines that the I/O device is ready, by:

Polling its status register or by waiting for an interrupt request.

Considerable overhead is incurred: several instructions are executed for each word.
 When transferring a block of data:

Instructions needed to increment the memory address and keep track of the word count.
 Use of interrupts involves operating system routines to:

Save and restore processor registers, the program counter, and other state information.
 An alternative approach: Transfer blocks of data directly between the main memory and I/O devices.
 A special control unit is provided to manage the transfer, without continuous intervention by the processor.
 This approach is called direct memory access (DMA).
Memory interfacing circuits:
 The unit that controls DMA transfers is referred to as a DMA controller.

Part of the I/O device interface or a separate unit shared by a number of I/O devices.

Performs the functions that would be done by the processor when accessing the main memory.

For each transfer, it provides the memory address and generates all the control signals needed.

It increments the memory address for successive words and keeps track of the number of transfers.
 A DMA controller transfers data without intervention by the processor.

Its operation is under the control of a program executed (OS routine) by the processor.
 To initiate the transfer of a block of words, the processor sends to the DMA controller:

The starting address.

The number of words in the block.

The direction of the transfer.
 The DMA controller then proceeds to perform the requested operation.
 When the entire block has been transferred, it informs the processor by raising an interrupt.
Memory interfacing circuits:
 The DMA controller registers are shown in the figure.
 Two registers are used for storing the starting address and the word count.
 The third register contains status and control flags.

R/W’ bit determines the direction of the transfer (1 = read data from the memory to the I/O device, 0 = write).

Done is set to 1 when the controller has completed transferring a block of data and is ready to receive another command.

Interrupt-enable (IE) = 1, causes the controller to raise an interrupt after it has completed transferring a block of
data.

IRQ = 1, when controller has requested an interrupt.
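A hypothetical sketch of decoding these status/control flags; the bit positions are assumptions for illustration, not taken from the figure:

    RW, DONE, IE, IRQ = 0, 1, 2, 3     # assumed bit positions

    def decode_status(reg: int):
        return {
            "memory_to_io": bool((reg >> RW) & 1),   # 1 = read from memory
            "done":         bool((reg >> DONE) & 1),
            "int_enable":   bool((reg >> IE) & 1),
            "irq":          bool((reg >> IRQ) & 1),
        }

    print(decode_status(0b1110))   # Done, IE and IRQ set; write direction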
Memory interfacing circuits:
 One DMA controller connects a high-speed Ethernet to the computer’s I/O bus.
 The disk controller controls two disks:

Has DMA capability and provides two DMA channels.

Can perform two independent DMA operations.

The control registers are duplicated.
 To start a DMA transfer of a block of data from the main memory to one of the disks:

OS routine writes the address and word count information into the registers of the disk controller.

The DMA controller proceeds independently to implement the specified operation.

When the transfer is complete, the controller sets Done = 1.

If IE = 1, the controller also sends an interrupt request to the processor and sets IRQ = 1.

Status register can record:

Whether the transfer took place correctly or errors occurred.
Cache Memories:
 A small and very fast memory, interposed between the processor and the main memory.
 Makes the main memory appear to the processor to be much faster than it actually is.

Based on the property of locality of reference.
 Most of a program's execution time is spent in routines in which many instructions are executed repeatedly.

Like a simple loop, nested loops, or a few procedures that repeatedly call each other.

Many instructions in localized areas of the program are executed repeatedly during some time period.
 Temporal and spatial manifestation of this property:

Temporal: A recently executed instruction is likely to be executed again very soon.

Spatial: Instructions close to a recently executed instruction are also likely to be executed soon.
 Operation of cache memory:

The memory control circuitry is designed to take advantage of the property of locality of reference.

Temporal locality: Whenever an information item, instruction or data, is first needed, this item should be brought into the
cache.

It is likely to be needed again soon.

Spatial locality: Fetch several items that are located at adjacent addresses as well.
 Cache block or line refers to a set of contiguous address locations of some size.
Cache Memories:
 The processor issues a Read request.

The contents of a block of memory words containing the specified location are transferred into the cache.

When the program references any of the locations in this block, the contents are read from the cache.
 The cache memory can store a reasonable number of blocks at any given time.

Number is small compared to the total number of blocks in the main memory.
 The correspondence between the main memory blocks and those in the cache is specified by a mapping function.
 When the cache is full and a memory word (instruction or data) that is not in the cache is referenced:

The cache control hardware must decide which block should be removed.

Create space for the new block that contains the referenced word.
 The collection of rules for making this decision constitutes the cache’s replacement algorithm.
Cache Memories:
Cache Hits:
 The processor issues Read and Write requests using memory location addresses.
 The cache control circuitry determines whether the requested word currently exists in the cache.
 If it does, the Read or Write operation is performed on the appropriate cache location.

A read or write hit is said to have occurred.

Main memory is not involved when there is a cache hit in a Read operation.

For a Write operation:

1. Write-through protocol:

Both the cache location and the main memory location are updated.

Results in unnecessary Write operations in the main memory when a given cache word is updated several times
during its cache residency.

2. Write-back, or copy-back protocol:

Update only the cache location.

Mark the block containing it with an associated dirty or modified bit.

The main memory location of the word is updated later.

When the block containing this marked word is removed from the cache.

This protocol is used most often.
Cache Memories:
Cache Misses:
 A Read operation for a word that is not in the cache constitutes a Read miss.

The block of words containing the requested word is copied from the main memory into the cache.

The particular word requested is forwarded to the processor from the block.

Load-through, or early restart approach:

The fetched word may be sent to the processor as soon as it is read from the main memory.

Reduces the processor’s waiting time at the expense of more complex circuitry.
Write miss:
 In write-through protocol:

The information is written directly into the main memory.
 In write-back protocol:

The block containing the addressed word is first brought into the cache.

The desired word in the cache is overwritten with the new information.
Cache Memories: Mapping Functions
 Methods for determining where memory blocks are placed in the cache.
Example:
 A cache consisting of 128 blocks of 16 words each (total of 2048 (2K) words).
 The main memory has a 16-bit address (65,536 (64K) words), i.e., 4K blocks of 16 words each.
 Consecutive addresses refer to consecutive words.
Direct Mapping:
 Block j of the main memory maps onto block j modulo 128 of the cache.
 0, 128, 256,... in main memory go to 0 in cache.
 1, 129, 257,... in main memory go to 1 in cache.
 More than one memory block is mapped onto a given cache block position.

Contention may arise for that position even when the cache is not full.

A program may start in block 1, and continue in block 129 (after branch).

Both of these blocks must be transferred to the block-1 position in the cache.

Contention resolution: Allow the new block to overwrite the currently resident block.

The replacement algorithm is trivial.
Cache Memories: Mapping Functions
Direct Mapping:
 The memory address can be divided into three fields.
 The low-order 4 bits select one of 16 words in a block.
 When a new block enters the cache, the 7-bit cache block field determines the cache position in which this block must be
stored.
 The high-order 5 bits of the memory address of the block are stored in 5 tag bits associated with its location in the cache.
 The tag bits identify which of the 32 main memory blocks mapped into this cache position is currently resident in the cache.
 As execution proceeds:

The processor generates a 16-bit address; the 7-bit cache block field points to a particular block location in the cache.

High-order 5 bits of the address are compared with the tag bits associated with that cache location.

If tag bits match, the desired word is in that block of the cache.

If there is no match:

The block containing the required word must first be read from the main memory and loaded into the cache.
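A sketch of the direct-mapped address split described above (16-bit address = 5-bit tag | 7-bit block | 4-bit word):

    def direct_map_fields(addr: int):
        word  = addr & 0xF           # low 4 bits: word within the block
        block = (addr >> 4) & 0x7F   # next 7 bits: cache block position
        tag   = (addr >> 11) & 0x1F  # high 5 bits: stored as the tag
        return tag, block, word

    # Memory blocks 1 and 129 collide on cache block 1 but differ in tag:
    print(direct_map_fields(1 * 16))     # (0, 1, 0)
    print(direct_map_fields(129 * 16))   # (1, 1, 0)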
Cache Memories: Mapping Functions
Associative Mapping:
 A main memory block can be placed into any cache block position.
 12 tag bits are required to identify a memory block when it is resident in the cache.
 The tag bits from the processor are compared to the tag bits of each block of the cache to check presence of block.
 Gives freedom of choosing the cache location in which to place the memory block.

A more efficient use of the space in the cache.
 When a new block is brought into the cache:

It replaces (ejects) an existing block only if the cache is full.

Need an algorithm to select the block to be replaced.
 The complexity of an associative cache is higher than that of a direct-mapped cache.

The need to search all 128 tag patterns to determine whether a given block is in the cache.
 To avoid a long delay, the tags must be searched in parallel.
 A search of this kind is called an associative search.
Cache Memories: Mapping Functions
Set-Associative Mapping:
 Use a combination of the direct- and associative-mapping techniques.
 The blocks of the cache are grouped into sets.

The mapping allows a block of the main memory to reside in any block of a specific set.

The contention problem of the direct method is eased by having a few choices for block placement.

The hardware cost is reduced by decreasing the size of the associative search.
 A cache with two blocks per set is shown in Fig. 8.18.

Memory blocks 0, 64, 128, . . . , 4032 map to cache set 0.

Occupy either of the two block positions within this set.

The 6-bit set field of the address determines which set of the cache might contain the desired block.

The tag field of the address must then be associatively compared to the tags of the two blocks of the set to check if the desired block is present.

Two-way associative search is simple to implement.
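A companion sketch for this two-way set-associative example (16-bit address = 6-bit tag | 6-bit set | 4-bit word, for 64 sets of two blocks each):

    def set_assoc_fields(addr: int):
        word = addr & 0xF            # word within the 16-word block
        set_ = (addr >> 4) & 0x3F    # one of 64 sets
        tag  = (addr >> 10) & 0x3F   # compared against both tags in the set
        return tag, set_, word

    print(set_assoc_fields(64 * 16))   # memory block 64 -> set 0, tag 1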
Cache Memories:
Stale Data:
 When power is first turned on, the cache contains no valid data.
 Valid bit: provided for each cache block to indicate whether the data in that block are valid.

The modified, or dirty bit is different from this.
 When system power is turned on: valid bits of all cache blocks are set to 0.
 When new programs or data are loaded from the disk into the main memory: valid bits may also be set to 0.

With DMA, data are transferred from the disk to the main memory, bypassing the cache.

If the memory blocks being updated are currently in the cache, the valid bits of the corresponding cache blocks are set to 0.
 As program execution proceeds:

The valid bit of a given cache block is set to 1 when a memory block is loaded into that location.

The processor fetches data from a cache block only if its valid bit is equal to 1.
 The valid bit makes sure that the processor does not fetch stale data from the cache.
Replacement Algorithms:
Direct-mapped cache:
 The position of each block is predetermined by its address.
 The replacement strategy is trivial.
Associative and Set-associative caches:
 A new block is to be brought into the cache.
 All the positions that it may occupy are full.
 The cache controller must decide which of the old blocks to overwrite.

The decision can be a strong determining factor in system performance.
 The objective is to keep blocks in the cache that are likely to be referenced in the near future.

It is not easy to determine which blocks are about to be referenced.

The property of locality of reference in programs gives a clue to a reasonable strategy.

A high probability that the blocks that have been referenced recently will be referenced again soon.
 When a block is to be overwritten:

Overwrite the one that has gone the longest time without being referenced.

This block is called the least recently used (LRU) block.

LRU replacement algorithm.
Replacement Algorithms:
LRU algorithm:
 The cache controller must track references to all blocks as computation proceeds.
 To track the LRU block of a four-block set in a set-associative cache:

A 2-bit counter can be used for each block.
 When a hit occurs:

The counter of the block that is referenced is set to 0.

Counters with values originally lower than the referenced one are incremented by one.

All others remain unchanged.
 When a miss occurs and the set is not full:

The counter associated with the new block loaded from the main memory is set to 0.

The values of all other counters are increased by one.
 When a miss occurs and the set is full:

The block with the counter value 3 is removed.

The new block is put in its place, its counter is set to 0.

The other three block counters are incremented by one.
 The counter values of occupied blocks are always distinct.
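The counter rules above can be captured in a few lines; a minimal sketch for one full four-block set:

    def lru_hit(counters, ref):
        # On a hit: counters lower than the referenced one are incremented,
        # and the referenced block's counter is reset to 0.
        for i in range(4):
            if i != ref and counters[i] < counters[ref]:
                counters[i] += 1
        counters[ref] = 0

    def lru_miss_full(counters):
        # On a miss with the set full: evict the block whose counter is 3,
        # increment the rest, and reset the new block's counter to 0.
        victim = counters.index(3)
        for i in range(4):
            counters[i] = 0 if i == victim else counters[i] + 1
        return victim

    counters = [3, 0, 1, 2]
    print(lru_miss_full(counters))   # evicts block 0; counters -> [0, 1, 2, 3]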
The simplest replacement algorithm: Randomly choose the block to be overwritten.
This algorithm is found to be quite effective in practice.
Improving cache performance: Hit rate and miss penalty
 The success rate in accessing information at various levels of the memory hierarchy: An excellent indicator of the
effectiveness of memory implementation.
 Hit rate: The number of hits stated as a fraction of all attempted accesses.
 Miss rate: The number of misses stated as a fraction of attempted accesses.
 Ideal case: The entire memory hierarchy would appear to the processor as a single memory unit:
 It would have the access time of the cache on the processor chip and the size of the magnetic disk.
 High hit rates at the various levels of the hierarchy bring us close to this ideal case.
 Hit rates > 0.9 are essential for high-performance computers.
 Performance is affected when a miss occurs:
 Extra time to bring a block of data from a slower unit to a faster unit.
 The processor is stalled waiting for instructions or data.
Improving cache performance: Hit rate and miss penalty
 Miss penalty: The total access time seen by the processor when a miss occurs.
 Consider a system with only one level of cache:
 The miss penalty is entirely due to the main memory access time.
 h – hit rate, M - miss penalty, C - the time to access information in the cache.
 The average access time experienced by the processor: tavg = hC + (1 − h)M.
Example:
 Access time to the cache = τ, to the main memory = 10τ.
 During a cache miss: A block of 8 words is transferred from the main memory to the cache.
 First word of the block = 10τ; each of the remaining 7 words = τ.
 The initial access to the cache (which results in the miss) = τ.
 Transfer of the word from the cache to the processor (no load-through) = τ.
 Miss penalty: M = τ + 10τ + 7τ + τ = 19τ.
 Read or Write operations constitute 30% of the instructions in a program.
 Hence there are 130 memory accesses for every 100 instructions.
Improving cache performance: Hit rate and miss penalty
Example (continued):
 Instruction hit rate = 0.95, data hit rate = 0.9.
 The miss penalty is the same for both read and write accesses.
 A rough estimate of the improvement in memory performance due to the cache:

Time without cache / Time with cache = (130 × 10τ) / (100(0.95τ + 0.05 × 19τ) + 30(0.9τ + 0.1 × 19τ)) = 1300τ / 274τ ≈ 4.7.
 The memory appears almost five times faster than it really is.
 The improvement factor increases as the speed of the cache increases relative to the main memory.
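The arithmetic behind the "almost five times faster" figure, in executable form:

    tau = 1.0
    M = 19 * tau                                # miss penalty derived above
    t_with = 100 * (0.95 * tau + 0.05 * M) \
           + 30 * (0.9 * tau + 0.1 * M)         # 274 * tau
    t_without = 130 * 10 * tau                  # 1300 * tau
    print(t_without / t_with)                   # ~4.7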
Improving cache performance: Hit rate and miss penalty
Example:
 Assume hit rate for data and instruction = 100%.
 Estimate of the increase in memory access time due to cache misses:

With a 100% hit rate, the 130 accesses take 130τ; with the realistic hit rates above, they take 274τ, a ratio of about 2.1 (274τ/130τ).
 A 100% hit rate in the cache would make the memory appear about twice as fast as when realistic hit rates are used.
Ways to improve hit rate:
 Make the cache larger - entails increased cost.
 Increase the cache block size while keeping the total cache size constant (due to spatial locality).
 Practical block sizes in the range of 16 to 128 bytes.
Ways to reduce miss penalty:
 Use the load-through approach.
Improving cache performance: Caches on the processor chip
 Information is transferred between different chips:
 Considerable delays occur in driver and receiver gates on the chips.
 It is best to implement the cache on the processor chip.
 Most processors include one L1 cache.
 Two separate L1 caches: one for data, one for instruction.
 High-performance processors: Two levels of caches are used.
 Separate L1 caches for instructions and data.

Must be very fast, as they determine the memory access time seen by the processor.
 A larger L2 cache.

Can be slower, but should be larger than the L1 caches to ensure a high hit rate.

Speed is less critical because it only affects the miss penalty of the L1 caches.
 L1 caches: Tens of kilobytes.
 L2 caches: Hundreds of kilobytes or megabytes.
Improving cache performance: Caches on the processor chip
 L2 cache reduces the impact of the main memory speed on the performance of a computer.
 The average access time of the L2 cache is the miss penalty of either of the L1 caches.
 The average access time experienced by the processor:

tavg = h1C1 + (1 − h1)h2C2 + (1 − h1)(1 − h2)M

where h1 and h2 are the L1 and L2 hit rates, C1 and C2 the L1 and L2 access times, and M the miss penalty for accessing the main memory.
 The fraction of accesses that miss in both caches is (1 − h1)(1 − h2).

 For h1 = h2 = 0.9: (1 − h1)(1 − h2) = 0.01 (1%).
 This makes the value of M, i.e., the speed of the main memory, less critical.
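A sketch of this two-level formula; C2 = 10 and M = 100 (in units of C1) are illustrative assumptions, not values from the text:

    def t_avg(h1, h2, C1, C2, M):
        return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M

    # With h1 = h2 = 0.9, only 1% of accesses pay the full penalty M.
    print(t_avg(0.9, 0.9, 1.0, 10.0, 100.0))   # 2.8 (in units of C1)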
Virtual memory:
 The physical main memory is not as large as the address space of the processor.
 A 32-bit address provides an addressable space of 4G bytes.
 Typical main memory size (for a 32-bit computer): 1G to 4G bytes.
 A program may not completely fit into the main memory.
 The parts of it not currently being executed are stored on a secondary storage device.
 When these parts are needed, they are brought into the main memory.
 They replace other parts that are already in the memory.
 These actions are performed automatically by the OS.
 The scheme is known as virtual memory.
Virtual memory system:
 Programs, and the processor, reference an address space independent of the available
physical main memory space.
 Processor issues binary addresses called virtual or logical addresses.
 Translated to physical addresses by a combination of hardware and software.
Virtual memory:
 If a virtual address refers to a part that is currently in the physical memory:
 The contents of the appropriate location in the main memory are accessed immediately.
 Otherwise:
 The contents of the referenced address are first brought into a suitable location in the main memory.
 Memory Management Unit (MMU) keeps track of which parts of the virtual address space are
in the physical memory.
 When the desired blocks are in the main memory:
 MMU translates the virtual address into the corresponding physical address.
 The requested memory access proceeds in the usual manner.
 If the data are not in the main memory:
 MMU causes the OS to transfer the data from the disk to the memory.
 Performed using the DMA scheme.
Virtual memory: Address translation
 Assume that all programs and data are composed of fixed-length units called pages.
 Page consists of a block of words that occupy contiguous locations in the main memory.
 Pages commonly range from 2K to 16K bytes in length.

The basic unit of information.

Transferred between the main memory and the disk, by MMU.
 Pages should not be too small:

The access time of a magnetic disk is much longer than main memory.

Considerable amount of time to locate the data on the disk.

Once located, the data can be transferred at a high rate.
 If pages are too large:

A substantial portion of a page may not be used.

Unnecessary data will occupy valuable space in the main memory.
 The cache bridges the speed gap between the processor and the main memory.

Implemented in hardware.
 The virtual-memory mechanism bridges the size and speed gaps between the main memory and secondary storage.

Implemented in part by software techniques.
Fragmentation:
 Fragmentation: Memory blocks cannot be allocated to processes because the blocks are too small, and so they remain unused.
 When processes are loaded into and removed from the memory, they create free space in the memory.

These small free blocks cannot be allocated to new upcoming processes.

Two types: internal fragmentation and external fragmentation.

https://afteracademy.com/blog/what-is-fragmentation-and-what-are-its-types
Paging:
 Paging is a non-contiguous memory allocation technique.

Secondary memory and the main memory are divided into equal-size partitions.

The partitions of the secondary memory are called pages.

The partitions of the main memory are called frames.

They are divided into equal size partitions to have maximum utilization of the main memory and avoid external fragmentation.
Example:
A process P has a size of 4 B and the page size is 1 B.
There are four pages (say, P0, P1, P2, P3), each of size 1 B.
When this process goes into the main memory for execution, it may be stored in a non-contiguous fashion in the main memory frames, depending upon their availability.

https://afteracademy.com/blog/what-are-paging-and-segmentation
Virtual memory: Address translation
 Virtual-memory address-translation for fixed-length pages is shown.
 Virtual address generated by the processor is interpreted as:

Virtual page number (high-order bits) = A.

Offset (low-order bits) - Specifies the location of a particular byte (or word) within
a page.
 Page table: Contains info about the main memory location of each page.

The main memory address where the page is stored.

The current status of the page.
 Page frame: Area in the main memory that can hold one page.
 Page table base register: The starting address of the page table = B.
 Address of the corresponding entry in the page table C = A + B.
 The contents of [C] give the starting address of the page if that page currently
resides in the main memory.
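A minimal sketch of this translation flow, assuming 4K-byte pages and a hypothetical page table (a real MMU does this in hardware):

    PAGE_SIZE = 4096
    page_table = {0: 7, 1: 3}     # virtual page -> page frame (hypothetical)

    def translate(vaddr: int):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage not in page_table:
            raise LookupError("page fault: OS must load the page from disk")
        return page_table[vpage] * PAGE_SIZE + offset

    print(hex(translate(0x1010)))  # virtual page 1 -> frame 3 -> 0x3010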
Virtual memory: Address translation
Control bits in the page table:
 Describe the status of the page while it is in the main memory.

Validity of the page (1 bit): Whether the page is loaded in the main memory.

Modified bit (1 bit): Indicates whether the page has been modified during its
residency in the memory.

Determine whether the page should be written back to the disk before it is
removed from the main memory.

Other control bits to indicate restrictions on page access:

A program may be given full read and write permission.

It may be restricted to read accesses only.
Segmentation:
 In paging, the process is divided into pages of fixed sizes.
 In segmentation, the process is divided into modules for better visualization.

Each segment or module consists of the same type of functions.
 Example:

The main function is included in one segment.

Library function is kept in other segments, and so on.
 The size of segments may vary, so memory is divided into variable-size parts.

Segment Number: Tells the specific segment of the process from which the CPU wants to read the data.

Segment Offset: Tells the exact word in that segment which the CPU wants to read.

Physical Address = base address of the segment + the segment offset.

https://afteracademy.com/blog/what-are-paging-and-segmentation
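A sketch of this segment translation with a hypothetical segment table of (base, limit) pairs; a limit check is included since segments vary in size:

    segment_table = {0: (0x4000, 0x1000), 1: (0x8000, 0x0400)}  # hypothetical

    def seg_translate(seg: int, offset: int):
        base, limit = segment_table[seg]
        if offset >= limit:
            raise MemoryError("segment offset out of range")
        return base + offset           # physical = base + offset

    print(hex(seg_translate(1, 0x10)))  # 0x8010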
Shared/Distributed Memory:
 A multiprocessor system consists of a number of processors capable of simultaneously executing independent tasks.
 A task may encompass:
 A few instructions for one pass through a loop.
 Thousands of instructions executed in a subroutine.
 In a shared-memory multiprocessor, all processors have access to the same memory.
 Tasks running in different processors can access shared variables in the memory using the same addresses.
 The size of the shared memory is likely to be large.
 Implementing a large memory in a single module would create a bottleneck.
 Many processors make requests to access the memory simultaneously.
 Problem is alleviated by distributing the memory across multiple modules.
 Simultaneous requests from different processors are then more likely to access different memory modules.
Shared/Distributed Memory:
 An interconnection network enables any processor to access any module that is a part of
the shared memory.
 Memory modules are kept physically separate from the processors.

All requests to access memory must pass through the network.

Introduces latency.
 Uniform Memory Access (UMA) multiprocessor:

The same network latency for all accesses from the processors to the memory
modules.

The latency is uniform, but large for a network that connects many processors and
memory modules.
 For better performance, it is desirable to place a memory module close to each processor.

A collection of nodes, each consisting of a processor and a memory module.

The nodes are then connected to the network.
 The network latency is avoided when a processor makes a request to access its local
memory.
 A request to access a remote memory module must pass through the network.
 Non-Uniform Memory Access (NUMA) multiprocessors:

Difference in latencies for accessing local and remote portions of the shared memory.
Cache coherency in multiprocessor:
 Each variable in a program has a unique address location in the memory, which can be accessed by any processor.
 Each processor has its own cache in a shared-memory multiprocessor.

Copies of shared data may reside in several caches.
 Any processor can write to a shared variable in its own cache.

All other caches that contain a copy of that variable will then have the old, incorrect value.

They must be updated or invalidated.

Cache coherence: Requires having a consistent view of shared data in multiple caches.
Write-Through Protocol:
 Can be implemented in two ways.
 Version 1: Updating the values in other caches.

A processor writes a new value to a block of data in its cache.

The new value is also written into the memory module containing the block being modified.

Broadcast the written data to the caches of all processors in the system.

Each processor receives the broadcast data, updates the contents of the affected cache block if this block is present in its cache.
 Version 2: Invalidating the copies in other caches.

A processor writes a new value into its cache.

This value is also sent to the appropriate location in memory. All copies in other caches are invalidated.

Broadcasting is used to send the invalidation requests.
Cache coherency in multiprocessor:
Write-Back protocol:
 Coherence is based on the concept of ownership of a block of data in the memory.
 The memory is the owner of all blocks initially:

Blocks are read by a processor into its cache.
 A processor wants to write to a block in its cache:

First become the exclusive owner of that block.

All copies in other caches must first be invalidated with a broadcast request.

New owner of the block may then modify the contents without having to take any other action.
 When another processor wishes to read a block that has been modified:

The request for the block is forwarded to the current owner.

The data are then sent to the requesting processor by the current owner.

The data are also sent to the appropriate memory module.

The memory module thereby reacquires ownership and updates the contents of the block in the memory.

The cache of the processor that was the previous owner retains a copy of the block.

The block is now shared with copies in two caches and the memory.

Subsequent requests from other processors to read the same block are serviced by the memory module containing the block.
Cache coherency in multiprocessor:
Write-Back protocol:
 When another processor wishes to write to a block that has been modified:

The current owner sends the data to the requesting processor.

Current owner transfers ownership of the block to the requesting processor, invalidates its cached copy.

Since the block is about to be modified by the new owner, the contents of the block in the memory are not updated.

The next request for the same block is serviced by the new owner.
Secondary storage devices:
 For semiconductor memories, the cost per bit of stored information is the limitation.
 Economic realization of storage: Magnetic and optical disks.

Referred to as secondary storage devices.
Magnetic Hard Disks:
 One or more disk platters mounted on a common spindle.
 A thin magnetic film is deposited on each platter, both sides.
 Drive rotate at a constant speed.
 Read/write heads move close to the disk surface.
 Manchester encoding is used for data storage (Fig. 8.27(c)).

Low bit-storage density.

Large space required to accommodate two changes in magnetization.
 Winchester technology:

The disks and the read/write heads are placed in a sealed, air-filtered enclosure.

The read/write heads can operate closer to the magnetized track surfaces.

Higher data density due to the closer distance.

Larger capacity.
Secondary storage devices:
 Surface is divided into concentric tracks.
 Each track is divided into sectors.
 The set of corresponding tracks on all surfaces of a stack of disks forms a logical cylinder.
 All tracks of a cylinder can be accessed without moving the read/write heads.
 Data are accessed by specifying the surface number, the track number, and the sector number.

Access Time:
 Access time T: The interval between the disk receiving an address and the beginning of the actual data transfer.
 T = seek time + rotational delay (latency time).

Seek time: The time required to move the read/write head to the proper track.

Latency time: The time taken to reach the addressed sector after the read/write head
is positioned over the correct track.

On average, this is the time for half a rotation of the disk.
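The access-time formula above, with assumed (illustrative) drive parameters:

    seek_ms = 6.0                  # assumed average seek time
    rpm = 7200                     # assumed rotational speed
    rotation_ms = 60_000 / rpm     # one full rotation ~ 8.33 ms
    latency_ms = rotation_ms / 2   # on average, half a rotation
    print(seek_ms + latency_ms)    # ~10.2 ms average access time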
Secondary storage devices: Optical disk
 Developed in the mid-1980s by Sony and Philips for digitally representing analog sound signals.
 Laser light can be focused on a very small spot.
 A coherent light beam that is sharply focused on the surface of the disk.
 Coherent light consists of synchronized waves that have the same wavelength.
 If a coherent light beam is combined with another beam of the same kind:

When the two beams are in phase, the result is a brighter beam.
 If the waves of the two beams are 180 degrees out of phase:

They cancel each other; the result is a darker beam.
 Indented parts are called pits.
 The unindented parts are called lands.
 The detector sees the reflected beam as a bright spot:

When the light reflects solely from a pit, or solely from a land ==> represents 0s.
 The beam moves over the edge between a pit and the adjacent land:

Pit is one quarter of a wavelength closer to the laser source.

The reflected beams from the pit and the adjacent land will be 180 deg out of phase.

Cancelling each other, producing a dark spot ==> represents 1s.
