I/O Systems
Indian Institute of Information Technology Kottayam
Why is I/O Important?
• Apart from computation, we have some other serious concerns:
• Crash of the storage system
• Loss of information
• Users are more interested in response time than CPU time (CPU time does
not include I/O performance).
• CPU performance: 50% to 100% improvement per year.
• I/O system performance is limited by mechanical delays (e.g., disk
drives): < 10% improvement per year (MB/sec).
• The overall performance of a computer system will not improve
greatly because of the I/O bottleneck.
General/Realistic I/O System
• A computer system
  • CPU, including cache(s)
  • Memory (DRAM)
  • I/O peripherals: disks, input devices, displays, network cards, ...
  • With built-in or separate I/O (or DMA) controllers
  • All connected by a system bus
[Figure: CPU and main memory on the "system" (memory-I/O) bus; DMA controllers and an I/O controller attach the disk, display, NIC, and keyboard]
I/O: Control + Data Transfer
• I/O devices have ports (i.e., I/O interfaces)
• Control: sends commands to the device and monitors the status of the device
  • Control signal: determines the function the device will perform, such as
    sending data to the I/O controller (INPUT or READ) or accepting data from
    the I/O controller (OUTPUT or WRITE)
  • Status signal: indicates the state of the device, e.g., READY/NOT-READY
    to show whether the device is ready for data transfer
• Data
• Labor-intensive part
• “Interesting” I/O devices do data transfers (to/from memory)
• Display: video memory → monitor
• Disk: memory ↔ disk
• Network interface: memory ↔ network
Operating System (OS) Plays a Big Role
• I/O interface is typically under OS control
• User applications access I/O devices indirectly (e.g., SYSCALL)
• Device drivers are “programs” that OS uses to manage devices
• Virtualization:
• Physical devices shared among multiple programs
• Direct access could lead to conflicts
• Synchronization
• Most devices have asynchronous interfaces and can require unbounded waiting
• OS handles asynchrony internally, presents synchronous interface
• Standardization
• Devices of a certain type (disks) can/will have different interfaces
• OS handles differences (via drivers), presents uniform interface
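The driver idea above can be sketched in Python. All class and method names here (BlockDriver, read_block, and so on) are illustrative, not any real OS's API: each driver hides its device's quirks behind one uniform interface, so OS code above the driver layer never cares which device it is talking to.

```python
# Sketch (hypothetical names): one uniform block-device interface,
# with per-device drivers hiding the differences underneath.

class BlockDriver:
    """Uniform interface the OS presents for any disk-like device."""
    def read_block(self, block_num):
        raise NotImplementedError
    def write_block(self, block_num, data):
        raise NotImplementedError

class ScsiDriver(BlockDriver):
    def __init__(self):
        self.blocks = {}                 # stands in for the physical medium
    def read_block(self, block_num):
        # A real driver would build a SCSI command descriptor block here
        return self.blocks.get(block_num, b"\x00" * 512)
    def write_block(self, block_num, data):
        self.blocks[block_num] = data

class UsbMassStorageDriver(BlockDriver):
    def __init__(self):
        self.blocks = {}
    def read_block(self, block_num):
        # A real driver would wrap the request in USB bulk transfers
        return self.blocks.get(block_num, b"\x00" * 512)
    def write_block(self, block_num, data):
        self.blocks[block_num] = data

def copy_block(src: BlockDriver, dst: BlockDriver, block_num):
    # Code at this layer is device-independent: same call, any driver
    dst.write_block(block_num, src.read_block(block_num))
```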
I/O Device Characteristics
• Primary characteristic
  • Data rate (i.e., bandwidth)
• Contributing factors
  • Partner: humans have slower data rates than machines
  • Input or output or both (input/output)

  Device         Partner  I/O     Data Rate (KB/s)
  Keyboard       Human    Input   0.01
  Mouse          Human    Input   0.02
  Speaker        Human    Output  0.60
  Printer        Human    Output  200
  Display        Human    Output  240,000
  Modem          Machine  I/O     7
  Ethernet card  Machine  I/O     ~1,000,000
  Disk           Machine  I/O     ~10,000
The System Bus
• System bus: connects system components together
• Important: insufficient bandwidth can bottleneck the entire system
• Performance factors
  • Physical length
  • Number and type of connected devices (taps)
[Figure: CPU and main memory on the "system" (memory-I/O) bus; DMA controllers and an I/O controller attach the disk, display, NIC, and keyboard]
Three Buses
• Processor-memory bus
  • Connects CPU and memory, no direct I/O interface
  + Short, few taps → fast, high-bandwidth
  – System specific
• I/O bus
  • Connects I/O devices, no direct processor-memory interface
  – Longer, more taps → slower, lower-bandwidth
  + Industry standard
  • Connect the processor-memory bus to the I/O bus using an adapter
• Backplane bus
  • CPU, memory, and I/O connected to the same bus
  + Industry standard, cheap (no adapters needed)
  – Processor-memory performance compromised
Bus Design
• A bus consists of three sets of lines: a data bus, an address bus, and a
  control bus
• Goals
• High Performance: low latency and high bandwidth
• Standardization: flexibility in dealing with many devices
• Low Cost
• Processor-memory bus emphasizes performance, then cost
• I/O & backplane emphasize standardization, then performance
• Design issues
1. Width/multiplexing: are wires shared or separate?
2. Clocking: is bus clocked or not?
3. Switching: how/when is bus control acquired and released?
4. Arbitration: how do we decide who gets the bus next?
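As an aside on design issue 4, one common centralized policy is round-robin arbitration: among the masters currently requesting the bus, grant the one following the previous winner. A toy sketch (names hypothetical; real buses also use daisy-chained or distributed schemes):

```python
# Minimal round-robin bus arbiter sketch.

class RoundRobinArbiter:
    def __init__(self, n_masters):
        self.n = n_masters
        self.last = self.n - 1            # so master 0 wins the first round

    def grant(self, requests):
        """requests: set of master ids currently asserting a bus request.
        Returns the id granted the bus this cycle, or None if idle."""
        for i in range(1, self.n + 1):
            candidate = (self.last + i) % self.n
            if candidate in requests:
                self.last = candidate     # fairness: remember the winner
                return candidate
        return None
```

With masters 0 and 2 both requesting continuously, grants alternate 0, 2, 0, 2, ... so no requester starves.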
Standard Bus Examples

                   PCI/PCIe         SCSI            USB
  Type             Backplane        I/O             I/O
  Width            32–64 bits       8–32 bits       1 bit
  Multiplexed?     Yes              Yes             Yes
  Clocking         33 (66) MHz      5 (10) MHz      Asynchronous
  Data rate        133 (266) MB/s   10 (20) MB/s    0.2, 1.5, 60 MB/s
  Arbitration      Distributed      Distributed     Daisy-chain
  Maximum masters  1024             7–31            127
  Maximum length   0.5 m            2.5 m           –

  PCI: Peripheral Component Interconnect; PCIe: PCI Express;
  SCSI: Small Computer System Interface
• USB (universal serial bus)
  • Popular for low/moderate-bandwidth external peripherals
  + Packetized interface (like TCP), extremely flexible
  + Also supplies power to the peripheral
I/O Control and Interfaces
• Now that we know some concepts of I/O devices and buses
• How does I/O actually happen?
• How does CPU give commands to I/O devices?
• How do I/O devices execute data transfers?
• How does CPU know when I/O devices are done?
Sending Commands to I/O Devices
• Remember: only OS can do this! Two options:
• I/O instructions
• OS only? Instructions must be privileged (only OS can execute)
• These I/O instructions can specify both the device number and the command word
(or the location of the command word in memory).
• E.g., IA-32 (deprecated)
• Memory-mapped I/O
• Portion of physical address space reserved for I/O
• OS maps physical addresses to I/O device control registers
• Stores/loads to these addresses are commands to I/O devices
• Main memory ignores them, I/O devices recognize and respond
• Address specifies both I/O device and command
• OS only? I/O physical addresses only mapped in OS address space
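A toy Python model of the memory-mapped scheme above. The base address and register layout are invented for illustration; the point is that a store to the reserved range is ignored by main memory and acts as a device command.

```python
# Toy model (addresses and register offsets are assumptions):
# stores above IO_BASE reach device registers, everything else is RAM.

IO_BASE = 0xFFFF0000                 # start of the reserved I/O range

class Device:
    def __init__(self):
        self.command = None
    def register_write(self, offset, value):
        if offset == 0:              # offset 0 = command register (assumed)
            self.command = value

class AddressSpace:
    def __init__(self, device):
        self.ram = {}
        self.device = device
    def store(self, addr, value):
        if addr >= IO_BASE:
            # Main memory ignores this store; the device responds to it
            self.device.register_write(addr - IO_BASE, value)
        else:
            self.ram[addr] = value
    def load(self, addr):
        return self.ram.get(addr, 0)
```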
Querying I/O Device Status
• Now that we have sent command to I/O device
• How do we query I/O device status?
• So that we know if data we asked for is ready?
• So that we know if device is ready to receive next command?
• Polling:
• Processor queries I/O device status register
• Loops until it gets status it wants (ready for next command)
• Or tries again a little later
+ Simple
– Waste of processor’s time
• Processor much faster than I/O device
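The polling loop above can be sketched as follows. The device here is simulated, not a real interface; the count of wasted polls makes the cost of busy-waiting visible.

```python
# Polling sketch: spin on a (simulated) status register until READY.

READY = 1

class SlowDevice:
    def __init__(self, cycles_until_ready):
        self.cycles = cycles_until_ready
    def status_register(self):
        # Each query moves the simulated device one step toward completion
        if self.cycles > 0:
            self.cycles -= 1
            return 0                     # NOT-READY
        return READY

def poll_until_ready(device, max_polls=1_000_000):
    polls = 0
    while device.status_register() != READY:
        polls += 1                       # processor time burned busy-waiting
        if polls >= max_polls:
            raise TimeoutError("device never became ready")
    return polls
```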
Interrupt-Driven I/O
• Interrupts: alternative to polling
• I/O device generates interrupt when status changes, data ready
• OS handles interrupts just like exceptions (e.g., page faults)
• Identity of interrupting I/O device is recorded in ECR
• ECR: exception cause register
• I/O interrupts are asynchronous
• Not associated with any one instruction
• Don’t need to be handled immediately
• I/O interrupts are prioritized
• Synchronous interrupts (e.g., page faults) have highest priority
• High-bandwidth I/O devices have higher priority than low-bandwidth ones
• The process:
• The CPU issues the command to the I/O controller
• The CPU then continues with what it was doing
• The I/O controller issues the command to the I/O device and waits for the I/O to
complete
• Upon completion of one byte/word input or output, the I/O controller sends an interrupt
signal to the CPU
• The CPU finishes what it was doing, then handles the interrupt
• This will involve moving the resulting datum to its proper location on input
• Once done with the interrupt, the CPU resumes execution of the program
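The steps above can be sketched as an event-style simulation (all names hypothetical): the CPU issues the command, keeps computing, and a handler moves each datum to its buffer when the controller "interrupts".

```python
# Interrupt-driven transfer sketch: one interrupt per completed word.

class InterruptController:
    def __init__(self):
        self.pending = []                # interrupts awaiting the CPU

    def raise_interrupt(self, datum):
        self.pending.append(datum)

def run_transfer(data_from_device):
    ic = InterruptController()
    buffer = []                          # the datum's "proper location"
    work_done = 0
    device_queue = list(data_from_device)  # CPU has issued the command

    while device_queue or ic.pending:
        if device_queue:                 # device completes one word
            ic.raise_interrupt(device_queue.pop(0))
        work_done += 1                   # CPU keeps computing meanwhile
        while ic.pending:                # then services pending interrupts
            buffer.append(ic.pending.pop(0))
    return buffer, work_done
```

Note the contrast with polling: the CPU makes progress (`work_done`) between interrupts instead of spinning on a status register.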
Direct Memory Access (DMA)
• Interrupts remove overhead of polling
• But the OS still has to transfer data one byte/word at a time
• OK for low bandwidth I/O devices: mice, microphones, etc.
• Bad for high bandwidth I/O devices: disks, monitors, etc.
• Direct Memory Access (DMA) process:
• Transfer data between I/O device and memory without processor control
• The processor sends the starting address, the number of words to transfer, and
the direction of transfer to the DMA controller.
• Transfers entire blocks (e.g., pages, video frames) at a time
• Can use bus “burst” transfer mode if available
• DMA controller sends an interrupt signal to the processor when done (or if error
occurs)
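The DMA handshake above can be sketched as follows, with a callback standing in for the completion interrupt. Names and the direction encoding are assumptions for illustration:

```python
# DMA sketch: processor programs address, count, and direction;
# the controller moves the whole block, then signals completion once.

READ_FROM_MEMORY = 1   # memory -> device (encoding is an assumption)
WRITE_TO_MEMORY = 0    # device -> memory

class DmaController:
    def __init__(self, memory):
        self.memory = memory             # a plain list modeling RAM words

    def transfer(self, device_buf, start_addr, count, direction, on_done):
        if direction == WRITE_TO_MEMORY:
            for i in range(count):       # whole block, no CPU involvement
                self.memory[start_addr + i] = device_buf[i]
        else:
            for i in range(count):
                device_buf[i] = self.memory[start_addr + i]
        on_done()                        # one "interrupt" per block, not per word
```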
DMA Controllers
• To do DMA, an I/O device is attached to a DMA controller
• Multiple devices can be connected to one DMA controller
• The controller itself is seen as a memory-mapped I/O device
  • Processor initializes the start memory address, transfer size, etc.
• The DMA controller takes care of bus arbitration and transfer details
  • So that's why buses support arbitration and multiple masters!
[Figure: CPU, main memory, DMA controllers, and an I/O controller on the bus, attaching the disk, display, NIC, and keyboard]
I/O Processors
• A DMA controller is a very simple component
• Some I/O requires complicated sequences of transfers
• I/O processor (IOP): a heavier-weight DMA controller that executes instructions
  • Can be programmed to do complex transfers
  • E.g., a programmable network card
[Figure: CPU, main memory, DMA controllers, and an IOP on the bus, attaching the disk, display, and NIC]
DMA and Memory Hierarchy
• DMA is good, but is not without challenges
• Without DMA: processor initiates all data transfers
• All transfers go through address translation
• Transfers can be of any size and cross virtual page boundaries
• All values seen by cache hierarchy
• Caches never contain stale data
• With DMA: DMA controllers initiate data transfers
  • Transfers use physical addresses and bypass address translation
  • Transferred values may bypass the cache hierarchy, so caches can hold
    stale data (a coherence problem the OS/hardware must handle)
DMA Registers
• DMA controller registers are accessed by the processor to initiate transfer
  operations.
• Two registers store the starting address and the word count.
• A third register contains status and control flags:
  • Bit 1, R/W: determines the direction of the transfer. When this bit is set
    to 1 by a program instruction, the controller performs a read operation,
    that is, it transfers data from the memory to the I/O device. Otherwise, it
    performs a write operation.
  • Bit 0, Done: set to 1 by the controller when it has completed transferring
    a block of data and is ready to receive another command.
  • Bit 30, IE (interrupt enable): when this flag is set to 1, the controller
    raises an interrupt after it has completed transferring a block of data.
  • Bit 31, IRQ: set to 1 by the controller when it has requested an interrupt.
[Figure: registers in a DMA interface — status and control register (bit 31 IRQ, bit 30 IE, bit 1 R/W, bit 0 Done), starting address register, word count register]
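The bit positions described above can be manipulated with simple masks. A sketch assuming exactly the layout shown (bit 31 IRQ, bit 30 IE, bit 1 R/W, bit 0 Done):

```python
# Status/control register bit masks for the DMA layout described above.

IRQ  = 1 << 31   # controller has requested an interrupt
IE   = 1 << 30   # interrupt enable
RW   = 1 << 1    # 1 = read (memory -> device), 0 = write
DONE = 1 << 0    # block transfer complete, ready for a new command

def make_status(irq=0, ie=0, rw=0, done=0):
    """Pack the four flags into one register value."""
    return ((IRQ if irq else 0) | (IE if ie else 0) |
            (RW if rw else 0) | (DONE if done else 0))

def is_read_transfer(status):
    return bool(status & RW)

def block_complete(status):
    return bool(status & DONE)
```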
RAID
• Hard disk storage is critically important because of our reliance on the hard
  disk to store virtually everything
• However, the hard disk is the most likely component to fail because of its
  frequent usage and its high-speed moving parts
• If we want reliable access, we need a degree of redundancy, achieved by adding
  additional disks to our drive unit
• This is the idea behind RAID
  • Redundant array of independent disks (or)
  • Redundant array of inexpensive disks
• There are 7 forms (or levels) of RAID, each one different, and each with its
  own strengths and weaknesses
• RAID 0 offers no redundancy but improves disk access: files are broken into
  strips and distributed across disk surfaces (disk striping) so that a single
  file can be accessed with parallel disk accesses
• RAID 1 creates a mirror for every disk, giving 100% redundancy at twice the
  cost; two reads can be done in parallel, one from each mirror, but writes
  require saving to both sets of disks
• RAID 1 may be too expensive, so an alternative is to store just parity
  information
  • RAID 2 strips each byte into 1 bit per disk and uses additional disks to
    store Hamming codes for redundancy
  • Hamming code information is time-consuming to generate, so RAID 2 is not
    used; instead, RAID 3 uses the same idea but stores only parity information
    for redundancy
  • RAID 3 is most suitable for small computer systems that require some but
    not total redundancy
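The parity idea used by RAID 3 is byte-wise XOR: one parity strip per stripe lets any single lost strip be rebuilt from the survivors. A minimal sketch:

```python
# XOR-parity sketch: generate a parity strip, then reconstruct a lost strip.

def parity_strip(strips):
    """XOR all strips byte-by-byte to form the parity strip."""
    result = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            result[i] ^= b
    return bytes(result)

def reconstruct(surviving_strips, parity):
    """Rebuild the one missing strip: XOR of survivors and parity."""
    return parity_strip(list(surviving_strips) + [parity])
```

Reconstruction works because XOR is its own inverse: parity = s0 ⊕ s1 ⊕ s2, so s0 ⊕ s2 ⊕ parity = s1.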