Unit 4: Memory organization
Basic concepts:
Programs and the data they operate on are held in the memory of the computer.
The execution speed of programs is highly dependent on the speed of instruction transfer between the processor and
the memory.
Sufficient memory to facilitate execution of large programs having large amounts of data.
Ideal case for memory:
Fast, large, inexpensive memory.
The maximum size of the memory in a computer is determined by the addressing scheme.
A computer with a 16-bit address is capable of addressing up to 2^16 = 64K (kilo) memory locations.
32-bit address: 2^32 = 4G (giga) locations.
64-bit address: 2^64 = 16E (exa) ≈ 1.6 × 10^19 locations.
The number of locations represents the size of the address space of the computer.
The memory is designed to store and retrieve data in word-length quantities.
Consider a byte-addressable computer, 32-bit address is sent from the processor to the memory unit.
32 bits determine which word will be accessed.
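As a minimal sketch of how a 32-bit byte address selects a word (assuming 32-bit words, so the two low-order address bits identify a byte within the word — the function name and example address are illustrative, not from the text):

```python
WORD_SIZE_BYTES = 4  # assume 32-bit words

def split_byte_address(addr):
    """Split a byte address into (word address, byte offset within the word)."""
    word = addr >> 2            # high-order 30 bits select the word
    byte_in_word = addr & 0b11  # low-order 2 bits select the byte
    return word, byte_in_word

print(split_byte_address(0x1007))  # word 0x401 (=1025), byte offset 3
```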
Basic concepts:
The address lines are used to specify the memory location involved in a data transfer operation.
The data lines to transfer the data.
The control lines carry the command indicating:
A Read or a Write operation.
Whether a byte or a word is to be transferred.
Necessary timing information by the memory to indicate when it has completed the requested operation.
The processor-memory interface asserts MFC (Memory Function Completed) based on the memory's response.
Memory access time: The time that elapses between the initiation of an operation to transfer a word of data and the
completion of that operation.
Memory cycle time: the minimum time delay required between the initiation of two successive memory operations.
Eg. the time between two successive Read operations.
Cycle time is slightly longer than the access time.
Random-access memory (RAM): Access time to any location is the same.
Access time of the following devices depends on the address or position of the data:
Serial, partly serial, magnetic and optical disks.
Basic concepts:
Cache and Virtual Memory:
The processor usually processes instructions and data faster than they can be fetched from the main memory.
The memory access time is the bottleneck in the system.
Reduced using a cache memory.
A small, fast memory inserted between the larger, slower main memory and the processor.
Holds the currently active portions of a program and their data.
Virtual memory:
The active portions of a program are stored in the main memory.
The remainder is stored on the much larger secondary storage device.
Sections of the program are transferred back and forth between the main memory and the secondary storage
device, transparent to the application program.
The application program sees a memory that is much larger than the computer’s physical main memory.
Block Transfers:
Data move frequently between: 1. The main memory and the cache, 2. The main memory and the disk.
Data are always transferred in contiguous blocks involving tens, hundreds, or thousands of words.
Semiconductor RAM memories:
Internal Organization of Memory Chips:
Usually organized in the form of an array, each cell is capable of storing one bit of information.
Each row of cells constitutes a memory word in Fig. 8.2.
All cells of a row are connected to a common line referred to as the word line (driven by address decoder).
The cells in each column are connected to a Sense/Write circuit by two bit lines.
Sense/write circuit connected to the data input/output lines of the chip.
Fig. 8.2 gives memory circuit of 16 words of 8-bits each (16 × 8 organization).
The R/W’ (Read/Write’) input specifies the required operation.
The CS (Chip Select) input selects a chip.
The memory circuit:
Stores 128 bits.
Address lines: 4, Data lines: 8, Control lines: 2 (total = 14 lines).
Two lines for power supply and ground.
Semiconductor RAM memories:
Internal Organization of Memory Chips:
1K (1024) memory cells can be organized as:
128 × 8 format:
Address lines: 7, Control lines: 2, Data lines: 8 (total 17 + 2 power/ground lines).
1K × 1 format:
Address lines: 10, Control lines: 2, Data lines: 1 (total 13 + 2 power/ground lines).
10 bit address lines divided into 5-bit row/column address lines.
Row address selects a row of 32 cells. One of these cells is selected by column address.
A 1G-bit chip may have:
256M × 4 format:
Address lines: 28, Control lines: 2, Data lines: 4.
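The line counts above follow directly from the organization: address lines = log2(number of words), plus the data lines and the two control lines (R/W’ and CS). A small sketch of that arithmetic (function name is illustrative; power and ground pins are excluded, as in the text):

```python
import math

def chip_lines(words, bits_per_word, control_lines=2):
    """External signal lines for a (words x bits_per_word) memory chip.
    The two default control lines are R/W' and Chip Select."""
    address = int(math.log2(words))
    return {"address": address, "data": bits_per_word,
            "control": control_lines,
            "total": address + bits_per_word + control_lines}

print(chip_lines(128, 8))           # 128 x 8: 7 + 8 + 2 = 17 lines
print(chip_lines(1024, 1))          # 1K x 1: 10 + 1 + 2 = 13 lines
print(chip_lines(256 * 2**20, 4))   # 256M x 4: 28 + 4 + 2 = 34 lines
```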
Semiconductor RAM memories:
Static Memories:
Memories capable of retaining their state as long as power is applied.
Fig. 8.4 shows a static RAM (SRAM) cell.
Two inverters are cross-connected to form a latch.
When the word line is at ground level, T1 and T2 are turned off and the latch retains its state.
Example: the logic value at point X is 1 and at point Y is 0. This state represents the value 1.
State is maintained as long as word line is at ground level.
Read Operation (of a dynamic RAM cell, whose state is the charge QC stored on its capacitor):
If QC is above the threshold Qth, the sense amplifier drives the bit line to the full voltage representing logic value 1.
The capacitor is recharged to the full charge corresponding to logic value 1.
If QC is below Qth, the sense amplifier pulls the bit line to ground level to discharge the capacitor fully.
Reading the contents of a cell automatically refreshes its contents.
The word line is common to all cells in a row.
All cells in a selected row are read and refreshed at the same time.
Semiconductor RAM memories:
Dynamic RAMs:
A 256-Megabit DRAM chip, configured as 32M × 8, is shown.
The cells are organized in the form of a 16K × 16K array.
16,384 cells in each row are divided into 2,048 groups of 8, forming 2,048 bytes of data.
To select a row: 14 address bits (=log2(16,384)), To select a group: 11 address bits (=log2(2,048)).
Total 25 address bits needed to access a byte.
The row and column addresses are multiplexed on 14 pins to reduce the number of pins.
Read or a Write operation:
The row address is applied first. It is loaded into the row address latch in response to a signal on the control line called the Row Address Strobe (RAS).
A Read operation is then initiated, in which all cells in the selected row are read and refreshed.
After the row address is loaded:
The column address is applied to the address pins.
Loaded into the column address latch, based on the Column Address Strobe (CAS).
The information in this latch is decoded.
The appropriate group of 8 Sense/Write circuits is selected.
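The 14/11 split of the 25 address bits follows from the array geometry. A quick sketch of that calculation (function name is illustrative):

```python
import math

def dram_address_split(rows, groups_per_row):
    """Row and column address widths for a DRAM cell array with
    `rows` rows and `groups_per_row` byte groups in each row."""
    return int(math.log2(rows)), int(math.log2(groups_per_row))

# 256-Mbit chip, 16K x 16K array, 2,048 groups of 8 cells per row:
row_bits, col_bits = dram_address_split(16 * 1024, 2048)
print(row_bits, col_bits, row_bits + col_bits)  # 14 11 25
```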
Semiconductor RAM memories:
Dynamic RAMs:
If the R/W control signal indicates a Read operation:
The output values of the selected circuits are transferred to the data lines, D7−0.
Write operation:
The information on the D7−0 lines is transferred to the selected circuits, used to overwrite the contents of the selected cells in the
corresponding 8 columns.
After CAS is active, only 8 bits are placed on the data lines, D7−0, selected by A10−0.
The other bytes in the same row can be accessed without having to reselect the row.
Each sense amplifier also acts as a latch.
When a row address is applied, the contents of all cells in the selected row are loaded into the corresponding latches.
It is only necessary to apply different column addresses to place the different bytes on the data lines.
All bytes in the selected row can be transferred in sequential order.
Apply a sequence of column addresses under the control of successive CAS signals.
A block of data can be transferred at a much faster rate (unlike random addresses).
The block transfer capability is referred to as the fast page mode feature.
The vast majority of main memory transactions involve block transfers.
The faster rate of the fast page mode makes dynamic RAMs suitable.
Semiconductor RAM memories:
Synchronous DRAMs:
DRAMs whose operation is synchronized with clock signal.
The cell array is the same as in asynchronous DRAMs.
The use of a clock signal distinguishes them from asynchronous DRAMs.
SDRAMs have built-in refresh circuitry with a refresh counter.
Provides the addresses of the rows to be selected for refreshing.
The address and data connections of an SDRAM are buffered by means of registers.
The Sense/Write amplifiers function as latches.
A Read operation causes the contents of all cells in the selected row to be loaded into these latches.
The data in the latches of the selected column are transferred into the data register.
The buffer registers are useful when transferring large blocks of data at very high speed.
Isolates external connections from the chip’s internal circuitry.
Possible to start a new access operation while data are being transferred.
SDRAMs have several different modes of operation.
Can be selected by writing control information into a mode register.
Semiconductor RAM memories:
Synchronous DRAMs:
The CAS signals needed to select successive columns are generated internally using a column counter and the clock signal.
New data are placed on the data lines at the rising edge of each clock pulse.
1. The row address is latched under control of the RAS signal.
2. Memory takes some clock cycles to activate selected rows.
3. The column address is latched under control of the CAS signal.
4. The first set of data bits is placed on the data lines. (with one clock cycle delay)
5. SDRAM automatically increments the column address to access the next three sets of bits in the selected row.
Double-Data-Rate SDRAM:
Take advantage of the fact that a large number of bits are accessed at the same time inside the chip when a row address is
applied.
Data are transferred externally on both the rising and falling edges of the clock.
Memories that use this technique are called double-data-rate SDRAMs (DDR SDRAMs).
DDR2 and DDR3: operate at clock frequencies of 400 and 800 MHz, respectively.
They transfer data at effective rates of 800 and 1600 mega-transfers per second, respectively.
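The doubling comes directly from using both clock edges: one transfer on each rising edge and one on each falling edge. A trivial sketch of that arithmetic (function name is illustrative):

```python
def ddr_transfer_rate(clock_mhz):
    """Effective transfer rate (MT/s) for a DDR device: one transfer on
    each rising and each falling edge of the clock."""
    return 2 * clock_mhz

print(ddr_transfer_rate(400))  # 800  (DDR2-style clock)
print(ddr_transfer_rate(800))  # 1600 (DDR3-style clock)
```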
Read-Only memories:
Static and dynamic RAM chips are volatile.
Some applications require memory devices that retain the stored information when power is turned off.
A special writing process is needed to place the information into a nonvolatile memory.
A memory is called a read-only memory (ROM) when information can be written into it only once, at the time of manufacture.
ROM cell in Fig. 8.11:
A logic value 0 is stored in the cell if the transistor is connected to ground at point P.
Otherwise, a 1 is stored.
The bit line is connected through a resistor to the power supply.
To read the state of the cell, the word line is activated to close the transistor switch.
The voltage on the bit line:
Near zero if there is a connection between the transistor and ground.
High if there is no connection to ground.
A sense circuit at the end of the bit line generates the proper output value.
The connection to ground in each cell is determined when the chip is manufactured.
Using a mask with a pattern that represents the information to be stored.
Read-Only memories:
Programmable ROM (PROM):
Programmability is achieved by inserting a fuse at point P.
Before being programmed, the memory contains all 0s.
The user can insert 1s at the required locations by burning out the fuses.
Burning process is irreversible.
The cost of preparing the masks needed for storing a particular information pattern makes ROMs cost-effective only in large volumes.
PROMs provide a more convenient and considerably less expensive approach.
Can be programmed by the user directly.
Erasable PROM (EPROM):
It allows the stored data to be erased and new data to be written into it.
Erasable, reprogrammable ROM.
EPROMs are capable of retaining stored information for a long time.
While software is being developed, memory changes and updates can be easily made.
The transistor T is normally turned off, creating an open switch.
T can be turned on by injecting charge into it that becomes trapped inside.
Erasure requires dissipating the charge trapped in the transistors that form the memory cells.
Exposing the chip to ultraviolet light erases the memory.
Read-Only memories:
Electrically Erasable PROM (EEPROM):
An EPROM must be physically removed from the circuit for reprogramming.
The stored information cannot be erased selectively.
The entire contents of the chip are erased when exposed to ultraviolet light.
EEPROM can be programmed, erased, and reprogrammed electrically.
No need to remove chip for erasure, selective erasure possible.
Disadvantage of EEPROM:
Different voltages are needed for erasing, writing, and reading the stored data.
This increases circuit complexity.
Read-Only memories:
Flash memory:
A flash cell is based on a single transistor controlled by trapped charge.
Like an EEPROM, it is possible to read the contents of a single cell.
It is only possible to write an entire block of cells.
Prior to writing, the previous contents of the block are erased.
Flash devices have greater density, which leads to higher capacity and a lower cost per bit.
They require a single power supply voltage, and consume less power in their operation.
Used in: hand-held computers, cell phones, digital cameras, and MP3 music players.
Flash Cards:
Mount flash chips on a small card.
Flash cards with a USB interface are widely used and are commonly known as memory keys.
Flash Drives:
Larger flash memory modules have been developed to replace hard disk drives.
Designed to fully emulate hard disks.
The storage capacity of flash drives is significantly lower than that of hard disks.
Flash drives are solid state electronic devices with no moving parts.
Shorter access times, insensitive to vibration, lower power consumption.
Memory hierarchy:
An ideal memory would be fast, large, and inexpensive.
A very fast memory can be implemented using static RAM chips.
Not suitable for implementing large memories (basic cells are larger).
Consumes more power.
Cost effective solution for voluminous data: Magnetic disks.
Secondary storage, available at a reasonable cost.
They are used extensively in computer systems.
Much slower than semiconductor memory units.
Dynamic RAM:
Main memory is built with this.
Large and considerably faster, yet affordable.
Static RAM:
Used in smaller units where speed is of the essence.
Cache memories are built with this.
Memory hierarchy:
The fastest access is to data held in processor registers.
At the top in terms of speed of access.
The registers provide only a minuscule portion of the required memory.
Processor cache - implemented directly on the processor chip.
Holds copies of the instructions and data from Main memory.
A small primary cache (L1 cache) is always located on the processor chip.
Access time comparable to that of processor registers.
Secondary (L2) cache is placed between the L1 cache and the rest of the memory.
Larger and slower.
Also placed on the processor chip.
Some computers have L3 cache.
Larger than L1, L2.
Also implemented with SRAM technology.
May or may not be on same chip as L1 and L2.
Memory hierarchy:
Main memory:
A large memory implemented using dynamic memory components.
Assembled in memory modules such as DIMMs.
Much larger but significantly slower than cache memories.
Access time of main memory ≈ 100 × L1 cache access time.
Disk devices:
Disk devices provide a very large amount of inexpensive memory.
Widely used as secondary storage in computer systems.
Very slow compared to the main memory.
A cache can make the memory appear almost five times faster than it really is.
The improvement factor increases as the speed of the cache increases relative to the main memory.
Improving cache performance: Hit rate and miss penalty
Example:
Assume an ideal hit rate of 100% for both data and instructions.
Comparing this ideal case with realistic hit rates gives an estimate of the increase in memory access time due to cache misses:
A 100% hit rate in the cache would make the memory appear twice as fast as when realistic hit rates are used.
Ways to improve hit rate:
Make the cache larger - entails increased cost.
Increase the cache block size while keeping the total cache size constant (due to spatial locality).
Practical block sizes in the range of 16 to 128 bytes.
Ways to reduce miss penalty:
Use the load-through approach.
Improving cache performance: Caches on the processor chip
Information is transferred between different chips:
Considerable delays occur in driver and receiver gates on the chips.
It is best to implement the cache on the processor chip.
Most processors include one L1 cache.
Two separate L1 caches: one for data, one for instruction.
High-performance processors: Two levels of caches are used.
Separate L1 caches for instructions and data.
Must be very fast, determine the memory access time seen by the processor.
A larger L2 cache.
Can be slower, but should be larger than the L1 caches to ensure a high hit rate.
Speed is less critical because it only affects the miss penalty of the L1 caches.
L1 caches: Tens of kilobytes.
L2 caches: Hundreds of kilobytes or megabytes.
Improving cache performance: Caches on the processor chip
L2 cache reduces the impact of the main memory speed on the performance of a computer.
The average access time of the L2 cache is the miss penalty of either of the L1 caches.
The average access time experienced by the processor: t_avg = h1*C1 + (1 − h1)(h2*C2 + (1 − h2)*M), where h1 and h2 are the L1 and L2 hit rates, C1 and C2 are the L1 and L2 access times, and M is the time to access the main memory.
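The two-level average access time can be sketched as follows; the numbers in the example are assumed for illustration, not taken from the text:

```python
def avg_access_time(h1, h2, c1, c2, m):
    """t_avg = h1*C1 + (1-h1)*(h2*C2 + (1-h2)*M)
    h1, h2: hit rates of the L1 and L2 caches;
    c1, c2, m: L1, L2, and main-memory access times (in cycles)."""
    return h1 * c1 + (1 - h1) * (h2 * c2 + (1 - h2) * m)

# Assumed figures: L1 hits 96% at 1 cycle, L2 hits 80% at 10 cycles,
# main memory takes 100 cycles.
print(avg_access_time(0.96, 0.80, 1, 10, 100))  # ≈ 2.08 cycles
```

Note how a modest L2 hit rate shields the processor from most of the 100-cycle memory penalty.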
https://afteracademy.com/blog/what-is-fragmentation-and-what-are-its-types
Paging:
Paging is a non-contiguous memory allocation technique.
The secondary memory and the main memory are divided into equal-size partitions.
The partitions of the secondary memory are called pages.
The partitions of the main memory are called frames.
They are divided into equal-size partitions to maximize utilization of the main memory and to avoid external fragmentation.
Example:
Consider a process P with a process size of 4 B and a page size of 1 B.
There are four pages (say, P0, P1, P2, P3), each of size 1 B.
When this process goes into the main memory for execution, depending upon the availability, it may be stored in non-contiguous
fashion in the main memory frame.
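The non-contiguous placement can be sketched with a toy page table; the frame numbers below are assumed purely for illustration:

```python
# Hypothetical page table for the 4-page process P (page size 1 B):
# page number -> frame number; the frames are non-contiguous.
page_table = {0: 5, 1: 2, 2: 9, 3: 0}
PAGE_SIZE = 1  # bytes, matching the example

def physical_address(page, offset):
    """Translate a (page, offset) pair into a physical byte address."""
    return page_table[page] * PAGE_SIZE + offset

print(physical_address(2, 0))  # page P2 happens to live in frame 9
```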
https://afteracademy.com/blog/what-are-paging-and-segmentation
Virtual memory: Address translation
Virtual-memory address-translation for fixed-length pages is shown.
Virtual address generated by the processor is interpreted as:
Virtual page number (high-order bits) = A.
Offset (low-order bits) - Specifies the location of a particular byte (or word) within
a page.
Page table: Contains info about the main memory location of each page.
The main memory address where the page is stored.
The current status of the page.
Page frame: Area in the main memory that can hold one page.
Page table base register: The starting address of the page table = B.
Address of the corresponding entry in the page table C = A + B.
The contents of [C] give the starting address of the page if that page currently
resides in the main memory.
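The translation steps above can be sketched as follows. The page size, table contents, and addresses are assumptions for illustration; following the text, the entry address is formed as C = A + B (a real implementation would scale A by the entry size):

```python
PAGE_OFFSET_BITS = 12  # assume 4 KB pages

def translate(virtual_addr, page_table_base, page_table):
    """Fixed-page address translation sketch.
    page_table maps an entry address to the page's start address in memory."""
    vpn = virtual_addr >> PAGE_OFFSET_BITS               # A: virtual page number
    offset = virtual_addr & ((1 << PAGE_OFFSET_BITS) - 1)
    entry_addr = page_table_base + vpn                   # C = A + B
    frame_start = page_table[entry_addr]                 # contents of entry C
    return frame_start + offset

# Assumed: page table at 0x1000, virtual page 3 resident at 0x8000.
table = {0x1000 + 3: 0x8000}
print(hex(translate(0x3ABC, 0x1000, table)))  # 0x8abc
```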
Virtual memory: Address translation
Control bits in the page table:
Describe the status of the page while it is in the main memory.
Validity of the page (1 bit): Whether the page is loaded in the main memory.
Modified bit (1 bit): Indicates whether the page has been modified during its
residency in the memory.
Determine whether the page should be written back to the disk before it is
removed from the main memory.
Other control bits to indicate restrictions on page access:
A program may be given full read and write permission.
It may be restricted to read accesses only.
Segmentation:
In paging, the process is divided into pages of fixed sizes.
In segmentation, the process is divided into modules for better visualization.
Each segment or module consists of functions of the same type.
Example:
The main function is included in one segment.
Library function is kept in other segments, and so on.
The size of segments may vary, memory is divided into variable size parts.
Segment Offset: It tells the exact word in that segment which the CPU wants to read.
https://afteracademy.com/blog/what-are-paging-and-segmentation
Shared/Distributed Memory:
A multiprocessor system consists of a number of processors capable of simultaneously executing independent tasks.
A task may encompass:
A few instructions for one pass through a loop.
Thousands of instructions executed in a subroutine.
In a shared-memory multiprocessor, all processors have access to the same memory.
Tasks running in different processors can access shared variables in the memory using the same addresses.
The size of the shared memory is likely to be large.
Implementing a large memory in a single module would create a bottleneck.
Many processors make requests to access the memory simultaneously.
Problem is alleviated by distributing the memory across multiple modules.
Simultaneous requests from different processors are more likely to
access different memory modules.
Shared/Distributed Memory:
An interconnection network enables any processor to access any module that is a part of
the shared memory.
Memory modules are kept physically separate from the processors.
All requests to access memory must pass through the network.
Introduces latency.
Uniform Memory Access (UMA) multiprocessor:
The same network latency for all accesses from the processors to the memory
modules.
The latency is uniform, but large for a network that connects many processors and
memory modules.
For better performance, it is desirable to place a memory module close to each processor.
A collection of nodes, each consisting of a processor and a memory module.
The nodes are then connected to the network.
The network latency is avoided when a processor makes a request to access its local
memory.
A request to access a remote memory module must pass through the network.
Non-Uniform Memory Access (NUMA) multiprocessors:
Difference in latencies for accessing local and remote portions of the shared memory.
Cache coherency in multiprocessor:
Each variable in a program has a unique address location in the memory, which can be accessed by any processor.
Each processor has its own cache in a shared-memory multiprocessor.
Copies of shared data may reside in several caches.
Any processor can write to a shared variable in its own cache.
All other caches that contain a copy of that variable will then have the old, incorrect value.
They must be updated or invalidated.
Cache coherence: Requires having a consistent view of shared data in multiple caches.
Write-Through Protocol:
Can be implemented in two ways.
Version 1: Updating the values in other caches.
A processor writes a new value to a block of data in its cache.
The new value is also written into the memory module containing the block being modified.
Broadcast the written data to the caches of all processors in the system.
Each processor receives the broadcast data, updates the contents of the affected cache block if this block is present in its cache.
Version 2: Invalidating the copies in other caches.
A processor writes a new value into its cache.
This value is also sent to the appropriate location in memory. All copies in other caches are invalidated.
Broadcasting is used to send the invalidation requests.
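Version 2 can be sketched with toy data structures (dicts standing in for caches and memory; all names and addresses are assumptions for illustration):

```python
# Three processors; two hold a cached copy of address 0x10 initially.
memory = {0x10: 1}
caches = [{0x10: 1}, {0x10: 1}, {}]

def write_through_invalidate(writer, addr, value):
    """Writer updates its own cache and writes through to memory;
    a broadcast invalidates the copies in all other caches."""
    caches[writer][addr] = value
    memory[addr] = value            # write through to memory
    for i, cache in enumerate(caches):
        if i != writer:
            cache.pop(addr, None)   # invalidate any other copy

write_through_invalidate(0, 0x10, 42)
print(memory[0x10], caches[1])  # 42 {}
```

A subsequent read by processor 1 would miss and fetch the new value from memory, restoring coherence.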
Cache coherency in multiprocessor:
Write-Back protocol:
Coherence is based on the concept of ownership of a block of data in the memory.
The memory is the owner of all blocks initially:
Blocks are read by a processor into its cache.
A processor wants to write to a block in its cache:
First become the exclusive owner of that block.
All copies in other caches must first be invalidated with a broadcast request.
New owner of the block may then modify the contents without having to take any other action.
When another processor wishes to read a block that has been modified:
The request for the block forwarded to the current owner.
The data are then sent to the requesting processor by the current owner.
The data are also sent to the appropriate memory module.
The memory module then reacquires ownership and updates the contents of the block in the memory.
The cache of the processor that was the previous owner retains a copy of the block.
The block is now shared with copies in two caches and the memory.
Subsequent requests from other processors to read the same block are serviced by the memory module containing the block.
Cache coherency in multiprocessor:
Write-Back protocol:
When another processor wishes to write to a block that has been modified:
The current owner sends the data to the requesting processor.
Current owner transfers ownership of the block to the requesting processor, invalidates its cached copy.
The block is being modified by the new owner.
The contents of the block in the memory are not updated.
The next request for the same block is serviced by the new owner.
Secondary storage devices:
For semiconductor memories, the cost per bit of stored information is a limitation.
Economic realization of storage: Magnetic and optical disks.
Referred to as secondary storage devices.
Magnetic Hard Disks:
One or more disk platters are mounted on a common spindle.
A thin magnetic film is deposited on each platter, on both sides.
The drive rotates at a constant speed.
Read/write heads move close to the disk surface.
Manchester encoding is used for data storage (Fig. 8.27(c)).
Low bit-storage density: a large space is required to accommodate the two changes in magnetization per bit.
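Manchester encoding can be sketched in a few lines. The mapping below (1 → low-to-high, 0 → high-to-low) is one common convention, assumed here for illustration; the key property is a guaranteed mid-cell transition for every bit:

```python
def manchester_encode(bits):
    """Encode a bit string into half-cell signal levels:
    '1' -> (0, 1) and '0' -> (1, 0), so every bit cell contains
    a transition at its midpoint (self-clocking)."""
    out = []
    for b in bits:
        out += [0, 1] if b == "1" else [1, 0]
    return out

print(manchester_encode("101"))  # [0, 1, 1, 0, 0, 1]
```

The two signal changes per bit are exactly why the text notes the low bit-storage density.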
Winchester technology:
●
The disks and the read/write heads are placed in a sealed, air-filtered enclosure.
●
The read/write heads can operate closer to the magnetized track surfaces.
●
More density of data due to close distance.
●
Larger capacity.
Secondary storage devices:
Surface is divided into concentric tracks.
Each track is divided into sectors.
The set of corresponding tracks on all surfaces of a stack of disks forms a logical cylinder.
All tracks of a cylinder can be accessed without moving the read/write heads.
Data are accessed by specifying the surface number, the track number, and the sector number.
Access Time:
Access time T: the time that elapses between the disk receiving an address and the beginning of the actual data transfer.
Access time T = seek time + rotational delay (latency time).
Seek time: the time required to move the read/write head to the proper track.
Latency time: the time taken to reach the addressed sector after the read/write head is positioned over the correct track.
On average, the time for half a rotation of the disk.
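The average access time follows from the rotation speed. A small sketch, with assumed figures (6 ms average seek, 7200 RPM) that are not from the text:

```python
def disk_access_time_ms(seek_ms, rpm):
    """Average access time = seek time + rotational latency,
    with latency taken as half a rotation (as in the text)."""
    ms_per_rotation = 60_000 / rpm   # 60,000 ms per minute
    return seek_ms + ms_per_rotation / 2

print(round(disk_access_time_ms(6, 7200), 2))  # 10.17
```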
Secondary storage devices: Optical disk
Developed in the mid-1980s by Sony and Philips to represent analog sound signals digitally.
Laser light can be focused on a very small spot.
A coherent light beam that is sharply focused on the surface of the disk.
Coherent light consists of synchronized waves that have the same wavelength.
If a coherent light beam is combined with another beam of the same kind:
If the two beams are in phase, the result is a brighter beam.
If the waves of the two beams are 180 degrees out of phase:
They cancel each other, and the result is a darker beam.
Indented parts are called pits.
The unindented parts are called lands.
The detector sees the reflected beam as a bright spot:
When the light reflects solely from a pit, or solely from a land ==> 0s.
When the beam moves over the edge between a pit and the adjacent land:
The pit is one quarter of a wavelength closer to the laser source.
The reflected beams from the pit and the adjacent land are 180 degrees out of phase.
They cancel each other, producing a dark spot ==> represents 1s.