Peripheral Component Interconnect (PCI) Local Bus
Geetanjali Gadre
Hardware Technology Development Group,
C-DAC, Pune, India
DVLSI 2013
Centre for Development of Advanced Computing
Motivation
Transfer speeds over the I/O bus had become a bottleneck, especially for
graphics. Moving peripheral functions with high bandwidth
requirements closer to the system's processor bus can eliminate this
bottleneck.
Bandwidth hungry peripherals are increasing
Graphics and Gaming!! (1024x768, 32-bit colour@30fps)
Network (Gigabit Ethernet)
Storage devices
Multimedia devices
The buses before PCI were mainly architecture specific, e.g.
ISA/EISA (x86) and SBus (SPARC), so different add-on cards were required
for different systems. An open standard and processor independence
were required.
Plug and Play capabilities (auto detection and identification of
devices) and bus mastering were also required.
The PCI Local Bus standard
The PCI Local Bus is a high performance, 32-bit or 64-bit bus
with multiplexed address and data lines.
The bus is intended for use as an interconnect mechanism
between highly integrated peripheral controller components,
peripheral add-in boards, and processor/memory systems.
The PCI Local Bus Specification, Rev. 2.1 includes the protocol,
electrical, mechanical, and configuration specification for PCI
Local Bus components and expansion boards.
The PCI system architecture and software model are unchanged for
later generations of I/O buses (PCI-X and PCI Express).
PCI Applications
Transparent 64bit / 66MHz extension
PCI-SIG
The PCI specification was originally developed by Intel.
Version 1.0 was released in 1992.
PCI specifications are now managed by a consortium of industry
partners, the PCI Special Interest Group (PCI-SIG).
Specifications are available for purchase from PCI-SIG.
Bus Specification and release dates
Bus Type                      Specification   Release Date
PCI 33 MHz                    2.0             1993
PCI 66 MHz                    2.1             1995
PCI-X 66 MHz and 133 MHz      1.0             1999
PCI-X 266 MHz and 533 MHz     2.0             Q1, 2002
PCI Express                   1.0             Q2, 2002
PCI Bus features
High Performance
Transparent upgrade from
32-bit data path at 33 MHz (132 MB/s peak)
to 64-bit data path at 33 MHz (264 MB/s peak)
and from
32-bit data path at 66 MHz (264 MB/s peak)
to 64-bit data path at 66 MHz (528 MB/s peak)
Capable of full concurrency with processor/memory
subsystem
Synchronous bus with operation up to 33 MHz or 66 MHz
Hidden (overlapped) central arbitration.
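The peak figures quoted above follow from one full-width data phase per clock. A minimal sketch of that arithmetic, assuming the nominal clock values of 33 and 66 MHz used on the slide (the function name is illustrative only):

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: int) -> int:
    """Peak transfer rate in MB/s, assuming one data phase on every clock."""
    bytes_per_transfer = width_bits // 8
    return bytes_per_transfer * clock_mhz


for width, clock in [(32, 33), (64, 33), (32, 66), (64, 66)]:
    rate = peak_bandwidth_mb_s(width, clock)
    print(f"{width}-bit @ {clock} MHz: {rate} MB/s peak")
```

These are burst peaks; address phases, wait states and turnaround cycles reduce the sustained rate.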
PCI Bus features
Low Cost
Optimized for direct silicon (component) interconnection;
i.e., no glue logic. Electrical/driver (i.e., total load) and
frequency specifications are met with standard ASIC
technologies and other typical processes.
Multiplexed architecture reduces pin count (47 signals for
target; 49 for master) and package size of PCI
components, or provides for additional functions to be
built into a particular package size.
Single PCI add-in card works in different systems with
minimal change to existing chassis designs reducing
inventory cost and end user confusion.
PCI Bus features
Benefits
Ease of use - full auto configuration
Longevity - Processor independence; 64 bit; 5V/3.3V
Interoperability - forward and backward compatible for 64/32 bit
and 33/66 MHz boards and components
Support for multi-master and peer to peer transfers
Data Integrity - Parity on both data and address.
PCI Bus Based Platform
Basic Transfer Control
All PCI data transfers are controlled with three signals
FRAME# is driven by the master to indicate the beginning
and end of a transaction.
IRDY# is driven by the master to indicate that it is ready
to transfer data.
TRDY# is driven by the target to indicate that it is ready to
transfer data.
A data transfer takes place on every rising clock edge on which
both IRDY# and TRDY# are asserted.
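The transfer rule can be illustrated with a small sketch that counts completed data phases from sampled signal levels. The sample waveforms below are hypothetical, and the helper name is not from any PCI tool; levels are given as seen on the wire, so 0 means asserted.

```python
def count_transfers(irdy_n, trdy_n):
    """Count clock edges on which IRDY# and TRDY# are both low (asserted)."""
    return sum(1 for i, t in zip(irdy_n, trdy_n) if i == 0 and t == 0)


# One sample per rising clock edge; 0 = asserted (active low), 1 = deasserted.
irdy_n = [0, 0, 1, 0, 0]  # master inserts a wait state on the third clock
trdy_n = [1, 0, 0, 0, 1]  # target inserts wait states on the first and last clock
print(count_transfers(irdy_n, trdy_n))
```

Any clock on which either side is not ready is a wait state and moves no data.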
Basic Read
Basic Write
PCI Transaction Model
PCI Transaction Model(PIO)
Programmed I/O (PIO)
Transaction initiated by the CPU and targets a
peripheral device
North Bridge arbitrates for and wins ownership of the PCI
Bus
Generates a PCI memory or I/O read/write bus cycle
CPU reads/writes data from/to target device
PCI Transaction Model(DMA)
Direct Memory Access (DMA)
PCI device becomes a master
Arbitrates for bus, wins ownership of
bus
Initiates a PCI memory bus cycle
More efficient method
CPU not involved in data movement
Arbitration
In order to minimize access latency, the PCI arbitration
approach is access-based rather than time slot based.
That is, a bus master must arbitrate for each access it
performs on the bus.
PCI uses a central arbitration scheme, where each master
agent has a unique request (REQ#) and grant (GNT#)
signal. A simple request-grant handshake is used to gain
access to the bus.
Arbitration is "hidden," which means it occurs during the
previous access so that no PCI bus cycles are consumed
due to arbitration, except when the bus is in an Idle state.
The arbiter is required to implement a fairness algorithm to
avoid deadlocks.
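The text requires a fairness algorithm but does not name one. As one possible illustration, the sketch below uses simple round-robin selection among masters whose REQ# is asserted; the choice of round-robin and all names are assumptions, not the arbiter of any particular chipset.

```python
def round_robin_grant(requests, last_granted):
    """Pick the next master to receive GNT#, or None if nobody is requesting.

    requests: list of booleans, True where that master's REQ# is asserted.
    last_granted: index of the master that was granted most recently.
    """
    n = len(requests)
    # Start searching just after the previous winner so that every
    # requesting master is reached before any master is granted twice.
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None


print(round_robin_grant([True, False, True], last_granted=0))
```

Because the search always starts after the last winner, a master that keeps its request asserted cannot be starved by the others.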
Basic Arbitration
# indicates active low
Interrupts are level sensitive and
asynchronous. INTB#, INTC# and INTD# are
used only by multifunction devices.
Signal Type definition
in: Input is a standard input-only signal.
out: Totem Pole Output is a standard active driver.
t/s: Tri-State is a bi-directional, tri-state input/output pin.
s/t/s: Sustained Tri-State is an active low tri-state signal
owned and driven by one and only one agent at a time. The
agent that drives an s/t/s pin low must drive it high for at
least one clock before letting it float. A pull-up is required
to sustain the inactive state until another agent drives it,
and must be provided by the central resource.
o/d : Open Drain allows multiple devices to share as a
wire-OR.
System pins
CLK (in) Provides timing for all signals
involved in PCI transactions except RST#,
INTA#, INTB#, INTC#, and INTD#.
RST# (in) Brings all PCI devices to a
consistent state. When reset is active, all
output signals on PCI should be tri-stated.
To keep AD, C/BE# and PAR from floating during
reset, they may be driven, but only to a low level.
REQ64# may be deasserted.
Address and data pins
AD[63:32] and AD[31:00] (t/s) Multiplexed
address and data pins.
C/BE[7:4]# and C/BE[3:0]# (t/s) Command
during address phase and Byte Enables during data
phase.
PAR(t/s) Even Parity over AD and C/BE in both
address and data phases. Parity is valid one clock
after each address phase. For data, it is stable and
valid one clock after either IRDY# is asserted on a
write transaction or TRDY# is asserted on a read
transaction.
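Even parity means the PAR bit is chosen so that the total number of 1s across the covered lines plus PAR is even. A minimal sketch for the lower 32 bits, assuming PAR covers AD[31:0] and C/BE[3:0]# (the upper half of a 64-bit bus is covered separately by PAR64); the function name is illustrative:

```python
def pci_par(ad: int, cbe_n: int) -> int:
    """Return the PAR bit giving even parity over AD[31:0], C/BE[3:0]# and PAR."""
    covered = ((cbe_n & 0xF) << 32) | (ad & 0xFFFFFFFF)
    ones = bin(covered).count("1")
    # If the covered lines already hold an even number of 1s, PAR is 0.
    return ones & 1


print(pci_par(0x00000001, 0x0))
print(pci_par(0xFFFFFFFF, 0xF))
```

A receiver recomputes the same value one clock later and compares it with the PAR it sampled.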
Interface control pins
FRAME# (s/t/s) - Cycle Frame is driven by the current
master to indicate the beginning and duration of an access.
When deasserted, the transaction is in final data phase or
has completed.
IRDY# (s/t/s) - Initiator Ready. During a write, IRDY#
indicates that valid data is present on AD[31::00]. During
a read, it indicates the master is prepared to accept data.
TRDY# (s/t/s) - Target Ready. During a read, TRDY#
indicates that valid data is present on AD[31::00]. During
a write, it indicates the target is prepared to accept data.
STOP# (s/t/s) - Stop indicates the current target is
requesting the master to stop the current transaction.
Interface control pins
LOCK# (s/t/s) - Lock indicates an atomic operation that locks the
current target and may require multiple transactions to complete,
e.g. reading and modifying a semaphore.
IDSEL (in) - Initialization Device Select is used as a chip
select during configuration read and write transactions.
DEVSEL# (s/t/s) - Device Select, when actively driven,
indicates the driving device has decoded its address as the
target of the current access. As an input, indicates whether
any device on the bus has been selected.
Arbitration pins (Bus Masters only)
REQ# (t/s) Request to the arbiter to use the
bus. REQ# must be tri-stated while RST# is
asserted.
GNT# (t/s) Indicates access to an agent has
been granted. GNT# must be ignored while
RST# is asserted.
Error reporting pins
PERR# (s/t/s) Reports data parity errors on all
PCI transactions except special cycle. Reporting
of this error may be implemented through settings
in configuration space.
SERR# (o/d) - Reports address parity errors, data
parity errors on special cycle command or any
other system error leading to catastrophic results.
PCI commands
C/BE[3:0]#   Command Type
0010         I/O Read
0011         I/O Write
0110         Memory Read
0111         Memory Write
1010         Configuration Read
1011         Configuration Write
1100         Memory Read Multiple
1110         Memory Read Line
1111         Memory Write and Invalidate
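The command encodings listed above map naturally onto a lookup table keyed by the value sampled on C/BE[3:0]# during the address phase. The sketch below covers only the commands listed here; the names are illustrative.

```python
PCI_COMMANDS = {
    0b0010: "I/O Read",
    0b0011: "I/O Write",
    0b0110: "Memory Read",
    0b0111: "Memory Write",
    0b1010: "Configuration Read",
    0b1011: "Configuration Write",
    0b1100: "Memory Read Multiple",
    0b1110: "Memory Read Line",
    0b1111: "Memory Write and Invalidate",
}


def decode_command(cbe_n: int) -> str:
    """Name the bus command for a 4-bit C/BE[3:0]# value from the address phase."""
    return PCI_COMMANDS.get(cbe_n & 0xF, "not listed here")


print(decode_command(0b0110))
```

In every listed read/write pair, the least significant bit distinguishes a read (0) from a write (1).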
64 bit bus extension
The 64-bit bus provides additional data bandwidth for
agents that require it.
The high 32-bit extension for 64-bit devices needs an
additional 39 signal pins:
REQ64# (works along with FRAME#)
ACK64# (works along with DEVSEL#)
AD[63::32]
C/BE[7::4]#
PAR64
At the end of reset, the central resource controls the state of REQ64#
to inform the 64-bit device that it is connected to a 64-bit bus.
64-bit Bus Transaction
66 MHz extension
M66EN pin indicates whether the card is capable of handling
66 MHz. 66MHZ_CAPABLE flag located in bit 5 of the PCI
Status register also indicates this.
Different address spaces in PCI devices
Memory space:
2^32 bytes or 4 GB (32-bit addressing)
2^64 bytes (64-bit addressing)
I/O space:
2^32 bytes or 4 GB (32-bit addressing)
2^16 bytes or 64 KB (most systems support not more than this range)
Configuration space:
256 bytes (16 dwords + 48 dwords)
64 bytes predefined header
192 bytes user defined space
Only relevant registers are to be implemented in each part.
PCI protocol Timings
The basic bus transfer mechanism on PCI is a burst. A burst is
composed of an address phase and one or more data phases.
PCI supports both memory and I/O address spaces.
All signals are sampled on the rising edge of the clock. Each
signal has a setup and hold aperture with respect to the rising
clock edge, in which transitions are not allowed.
Clock to output: 11 ns max
Setup time: 7 ns min
Hold time: 0 ns
Active to float: 28 ns max
Float to active: 2 ns min
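The clock-to-output and setup figures bound how long a signal may take to propagate across the bus within one clock period. A minimal sketch of that budget for a 33 MHz bus (30 ns period); the 2 ns clock skew allowance is an assumption that does not appear in the text above, and the function name is illustrative.

```python
def max_propagation_ns(t_cyc_ns: int, t_val_ns: int, t_su_ns: int, t_skew_ns: int) -> int:
    """Time left for bus propagation after output delay, setup time and clock skew."""
    return t_cyc_ns - t_val_ns - t_su_ns - t_skew_ns


# 30 ns period, 11 ns clock to output, 7 ns setup, assumed 2 ns clock skew.
print(max_propagation_ns(t_cyc_ns=30, t_val_ns=11, t_su_ns=7, t_skew_ns=2))
```

The remainder is what limits trace length and the number of loads on a bus segment.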
Target Retry
Target Disconnect
Refers to termination requested with or after data was
transferred on the initial data phase because the target is unable
to respond within the target subsequent latency requirement,
and, therefore, is temporarily unable to continue bursting.
Disconnect with data may be signaled on any data phase by
asserting TRDY# and STOP# together. This termination is
used when the target is only willing to complete the current
data phase and no more.
Disconnect without data may be signaled on any subsequent
data phase (meaning data was transferred on the previous data
phase) by de-asserting TRDY# and asserting STOP#.
Target Abort
Indicates the target requires the transaction to be stopped
and does not want the master to repeat the request again.
To signal Target-Abort, TRDY# must be deasserted when
DEVSEL# is deasserted and STOP# is asserted. If any
data was transferred during the previous data phases of the
current transaction, it may have been corrupted.
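The target-initiated terminations can be told apart from the states of STOP#, TRDY# and DEVSEL#. The sketch below is a simplified classifier; it assumes that Retry uses the same signalling as Disconnect without data but occurs on the initial data phase, which is not spelled out in the text above. All names are illustrative.

```python
def classify_target_termination(stop, trdy, devsel, first_data_phase):
    """Classify a target termination; each flag is True when the signal is asserted."""
    if not stop:
        return "none"
    if not devsel:
        # Target-Abort: STOP# asserted, DEVSEL# deasserted, TRDY# deasserted.
        if trdy:
            raise ValueError("TRDY# must be deasserted for Target-Abort")
        return "Target-Abort"
    if trdy:
        return "Disconnect with data"
    # STOP# asserted and TRDY# deasserted: no data moves on this data phase.
    return "Retry" if first_data_phase else "Disconnect without data"


print(classify_target_termination(stop=True, trdy=False, devsel=False, first_data_phase=False))
```

After Retry or Disconnect the master may try again later, whereas Target-Abort tells it not to repeat the request.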
Target Abort
Electrical
Pin out recommendations
Plug and Play (PnP) capabilities
PnP allows systems to have hardware and software work
together to automatically configure devices and assign
resources.
PnP specification was developed by Microsoft with cooperation from Intel and other manufacturers.
Four main components supporting PnP feature are
System Hardware
Peripheral hardware
System BIOS
Operating system
Booting PCI based PnP system
Configuring PCI devices is one of the major tasks of the PnP
BIOS. The steps are:
Create resource table of available IRQs, DMA channels and
I/O addresses excluding system reserved addresses.
Search for PnP and non-PnP devices on PCI bus.
Load last known configuration ESCD (Extended system
configuration data) stored in NVRAM.
Compare this with current configuration and start with
resource table to make appropriate changes to resource
allocation.
Assign resources to PnP devices from remaining resources
and inform devices of their new assignments.
Update ESCD by saving new system configuration.
PCI Configuration
Need for configuration
PCI devices implement one or more PCI functions i.e.
logical devices (up to 8 max)
Each function requires some resource allocation.
e.g. IRQ, memory space
Configuration enables PCI devices to transfer data to and
from other devices efficiently using the information stored
in its configuration space.
A PnP hardware-software combination can facilitate auto
configuration of PCI devices, saving the user the time and
effort of configuring devices manually.
Need for configuration (contd)
The configuration space is required for
Device Identification
Device control/status
Base Address registers
The configuration space provides ease of use and
control over devices to the system software. The
system software can easily identify devices, learn
about their address space requirements, and allocate addresses.
Accessing the configuration space
Before the devices are configured, they do not have unique
addresses. So a special scheme is required to access the
devices.
IDSEL line is used by the PCI controller for accessing
every device on the bus. (Note: IDSEL is not bussed)
The software can access the configuration space by using
BIOS calls, and providing the Device ID, Vendor ID, etc
as inputs.
The software reads the configuration space and determines
the addresses of devices and then these addresses are used
to access the devices.
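On PC-compatible hosts, one common way for software to reach configuration space is Configuration Mechanism #1 of the PCI specification, in which a 32-bit value written to the CONFIG_ADDRESS register (I/O port 0CF8h) selects the bus, device, function and register. That mechanism is not described in the text above; the sketch below only builds the address value and the function name is illustrative.

```python
def config_address(bus: int, device: int, function: int, register: int) -> int:
    """Build a Configuration Mechanism #1 CONFIG_ADDRESS value."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 256):
        raise ValueError("field out of range")
    enable = 0x80000000                # bit 31: enable configuration access
    return (enable
            | (bus << 16)              # bits 23:16: bus number
            | (device << 11)           # bits 15:11: device number
            | (function << 8)          # bits 10:8: function number
            | (register & 0xFC))       # bits 7:2: dword-aligned register offset


print(hex(config_address(bus=0, device=1, function=0, register=0x10)))
```

The host bridge translates the device number into the IDSEL line of the addressed device.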
Address space
Configuration cycle generation
Configuration space header
[Figure: configuration space header layout; the Capabilities Pointer is marked as added in spec 2.2]
Identification of device
Vendor ID (16-bit) Identifies the manufacturer.
The value FFFFh is reserved and must be returned by the
host/PCI bridge when it attempts to read a
configuration register from a non-existent device.
Device ID (16-bit) Identifies the device type.
Subsystem Vendor ID and Subsystem (Device) ID (16-bit each)
These are required to distinguish between cards or
subsystems manufactured by different vendors but
designed around the same third-party core logic.
Revision ID (8-bit) Identifies the revision number.
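Device detection during enumeration usually starts by reading the first dword of configuration space and checking for the reserved Vendor ID. The sketch below assumes the standard header layout, with the Vendor ID in the low 16 bits of the dword at offset 00h and the Device ID in the high 16 bits; the sample value and function name are made up for illustration.

```python
def parse_id_dword(dword: int):
    """Split configuration dword 00h into (vendor_id, device_id).

    Returns None when the Vendor ID is FFFFh, i.e. no device responded.
    """
    vendor_id = dword & 0xFFFF
    device_id = (dword >> 16) & 0xFFFF
    if vendor_id == 0xFFFF:
        return None
    return vendor_id, device_id


print(parse_id_dword(0xFFFFFFFF))
print(parse_id_dword(0x12348086))
```

A scan over all device numbers on a bus can simply skip every slot for which this returns None.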
Command register (16-bit)
Through the Command register, the system controls device behavior,
e.g. disabling the device or enabling bus mastering.
Command register (Contd)
Bit 0, I/O Space:
    0 = Disables the device for PCI I/O accesses.
    1 = PCI device responds to I/O accesses.
Bit 1, Memory Space:
    0 = Disables the device for PCI memory accesses.
    1 = PCI device responds to memory accesses.
Bit 2, Bus Master:
    0 = Disables the bus mastering capability.
    1 = Enables the device to act as a bus master. Configuration
        software uses this bit to determine whether the device has bus
        master capability.
Bit 3, Special Cycles:
    0 = Device ignores special cycles.
    1 = Enables the device to monitor special cycles.
Bit 4, Memory Write and Invalidate Enable:
    0 = Device uses Memory Write commands.
    1 = Device can generate the Memory Write and Invalidate
        command. Bit 2 is checked to see whether the device is capable
        of bus mastering. If so, the system cache line size must be
        stored in the Cache Line Size configuration register before
        this bit is set.
Command register (Contd)
Bit 5, VGA Palette Snoop:
    0 = Snooping disabled. Use this setting for non-VGA graphics devices.
    1 = Required for display devices to snoop I/O writes to the
        VGA colour palette registers.
Bit 6, Parity Error Response:
    0 = Device cannot assert PERR#, but must still set the Detected
        Parity Error bit in the Status register.
    1 = Device can assert PERR#.
Bit 7, Stepping Control:
    0 = Address/data stepping is disabled.
    1 = Address/data stepping is enabled.
Bit 8, SERR# Enable:
    0 = Device cannot assert SERR#.
    1 = Device can assert SERR#.
Bit 9, Fast Back-to-Back Enable:
    0 = Disables fast back-to-back transactions.
    1 = Enables the bus master to perform fast back-to-back
        transactions with different targets in the first and second
        transaction. This bit can be set only if all targets on the
        bus where the master resides are capable of such transactions.
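The Command register fields can be decoded with a simple bit lookup. The sketch below assumes the bit positions 0 to 9 of the PCI 2.x Command register, with Fast Back-to-Back Enable at bit 9; the dictionary and function names are illustrative.

```python
COMMAND_BITS = {
    0: "I/O Space",
    1: "Memory Space",
    2: "Bus Master",
    3: "Special Cycles",
    4: "Memory Write and Invalidate Enable",
    5: "VGA Palette Snoop",
    6: "Parity Error Response",
    7: "Stepping Control",
    8: "SERR# Enable",
    9: "Fast Back-to-Back Enable",
}


def decode_command_register(value: int):
    """List the names of the Command register bits that are set in value."""
    return [name for bit, name in COMMAND_BITS.items() if value & (1 << bit)]


# Hypothetical register value: memory decoding and bus mastering enabled.
print(decode_command_register(0x0006))
```

A driver typically sets bits 1 and 2 this way before starting DMA to or from a memory-mapped device.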
Status register (16-bit)
Provides information about the device to the system, e.g.
66 MHz capable, data parity error detected, received
master abort, etc.
Status register (contd)
Bit 4, Capabilities List (R):
    Indicates whether the device implements a list of additional
    capabilities. The list starts at the offset given by the
    Capabilities Pointer in configuration space. If this bit is
    hardwired to 1, the Capabilities Pointer register is implemented.
Bit 5, 66 MHz Capable (R):
    1 = Device is capable of running at 66 MHz.
    0 = Device is capable of running at 33 MHz.
Bit 6, Reserved:
    Prior to spec 2.2 this was UDF.
Bit 7, Fast Back-to-Back Capable (R):
    1 = Device is capable of fast back-to-back transactions.
    0 = Device is not capable of fast back-to-back transactions.
Bit 8, Master Data Parity Error (R/W):
    Implemented by masters only. Set when all of the following
    conditions are met:
    1> The reporting bus master was the initiator of the transaction.
    2> It asserted PERR# or detected PERR# asserted.
    3> The Parity Error Response bit in the Command register is set.
Status register (contd)
Bits 9-10, Device Select (DEVSEL#) Timing (R):
    00b = Fast; 01b = Medium; 10b = Slow; 11b = Reserved
Bit 11, Signaled Target Abort (R/W):
    Set by a target device when it terminates a transaction
    with target abort.
Bit 12, Received Target Abort (R/W):
    All masters implement this bit and set it whenever their
    transaction is terminated by the current target with target abort.
Bit 13, Received Master Abort (R/W):
    Set by a master whenever its transaction (except for a
    special cycle) is terminated with master abort.
Bit 14, Signaled System Error (R/W):
    Set if the device generates a system error on the SERR# line.
Bit 15, Detected Parity Error (R/W):
    Set whenever the device detects a parity error, even if error
    reporting is disabled by the Parity Error Response bit in its
    Command register.
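The Status register fields can be unpacked in the same way as the Command register. The sketch below assumes the bit positions of the PCI 2.2 Status register, including Master Data Parity Error at bit 8; the field names and the sample value are illustrative.

```python
DEVSEL_TIMING = {0b00: "fast", 0b01: "medium", 0b10: "slow", 0b11: "reserved"}

STATUS_FLAGS = {
    4: "capabilities_list",
    5: "66mhz_capable",
    7: "fast_back_to_back_capable",
    8: "master_data_parity_error",
    11: "signaled_target_abort",
    12: "received_target_abort",
    13: "received_master_abort",
    14: "signaled_system_error",
    15: "detected_parity_error",
}


def decode_status(value: int) -> dict:
    """Unpack a 16-bit Status register value into named fields."""
    fields = {name: bool(value & (1 << bit)) for bit, name in STATUS_FLAGS.items()}
    fields["devsel_timing"] = DEVSEL_TIMING[(value >> 9) & 0b11]  # bits 10:9
    return fields


status = decode_status(0x0230)  # hypothetical value: bits 4, 5 and 9 set
print(status["devsel_timing"], status["66mhz_capable"])
```

In real hardware the R/W error bits are cleared by writing 1 to them, which this read-only decoder does not model.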
Class-Code register
Class Code (24-bit) Identifies the generic function of the device.
Bits 23:16   Class Code
Bits 15:8    Sub-Class Code
Bits 7:0     Prog. I/F
e.g. 010000h is a mass storage controller (class code 01h) of the
SCSI controller type (sub-class code 00h).
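Splitting the 24-bit class code into its three byte fields is a matter of shifts and masks. A minimal sketch, with an illustrative function name:

```python
def split_class_code(class_code: int):
    """Return (class, sub_class, prog_if) from a 24-bit class code value."""
    base_class = (class_code >> 16) & 0xFF
    sub_class = (class_code >> 8) & 0xFF
    prog_if = class_code & 0xFF
    return base_class, sub_class, prog_if


# 010000h: mass storage controller (01h), SCSI (00h), programming interface 00h.
print(split_class_code(0x010000))
```

System software can use the class and sub-class bytes to pick a generic driver without knowing the specific Vendor ID and Device ID.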
Header Type register (8-bit)
Defines the format of the dwords (32 bits each) from 10h to 3Ch.
Bits 6:0: Header type (configuration header format)
Bit 7: 0 = single-function device
       1 = multi-function device
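The two fields of the Header Type register can be separated with a mask. The sketch assumes the usual layout in which bit 7 is the multi-function flag and bits 6:0 select the header format; the function and key names are illustrative.

```python
def decode_header_type(value: int) -> dict:
    """Split the 8-bit Header Type register into its two fields."""
    return {
        "multi_function": bool(value & 0x80),  # bit 7
        "layout": value & 0x7F,                # bits 6:0
    }


print(decode_header_type(0x80))
```

Enumeration code uses the multi-function flag of function 0 to decide whether functions 1 to 7 of the same device need to be probed.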
Cache Line size register (8-bit)
Mandatory for masters that use memory write and
invalidate command.
Also mandatory for memory targets that support cache
line wrap addressing.
It is a read/write register that specifies the system cache
line size in units of dwords,
e.g. value 08h is a size of eight dwords, or 32 bytes.
A device may limit the cache line sizes
that it supports. An unsupported value written by
configuration software is treated as if zero had been written.
If the value is 0, only Memory Write commands are supported.
Latency timer register (8-bit)
Mandatory for masters that perform burst transactions.
It defines minimum amount of time, in PCI clock
cycles, a Bus master can retain ownership of the bus.
Recommended for masters that perform a burst of more
than two data phases.
Timer value is decremented at each clock after
transaction is initiated.
BIST register (8-bit)
This is an optional register and may be implemented by
both master and target devices.
If it is not implemented, a read of this register should return
zeros.
Bit 7: BIST capable
Bit 6: Start BIST
Bits 5:4: Reserved
Bits 3:0: Completion code
Base Address Registers (24-bytes)
Required for devices that implement memory
and/or IO decoders.
A device may be located in memory and/or IO
space.
Memory mapping is recommended,
since I/O space is heavily crowded and some
processors do not support I/O transactions.
16 bytes is the smallest memory block a PCI
memory decoder can be designed for.
4 bytes is the smallest IO block a PCI IO
decoder can be designed for.
Base Address
The BIOS writes all 1s to this location and reads back. The result
indicates the memory requirements.
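The read-back value encodes the size because the device hardwires to zero every address bit below its decode granularity. A minimal sketch for 32-bit BARs, assuming the standard encoding in which bit 0 selects I/O (1) or memory (0) space and the low type bits are masked off before computing the size; 64-bit memory BARs and I/O BARs with the upper 16 bits hardwired to zero are not handled. The function name is illustrative.

```python
def decode_bar_readback(readback: int):
    """Decode the value read from a 32-bit BAR after all 1s were written to it.

    Returns None for an unimplemented BAR, else (space, size_in_bytes).
    """
    if readback == 0:
        return None
    if readback & 0x1:
        space, mask = "io", readback & 0xFFFFFFFC      # bits 1:0 are not address bits
    else:
        space, mask = "memory", readback & 0xFFFFFFF0  # bits 3:0 are not address bits
    # Invert the writable address bits and add one to get the block size.
    size = ((~mask) & 0xFFFFFFFF) + 1
    return space, size


print(decode_bar_readback(0xFFFFF000))
print(decode_bar_readback(0xFFFFFF01))
```

The smallest results this can produce, 16 bytes for memory and 4 bytes for I/O, match the minimum decoder sizes stated for base address registers.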
Address allocation
The devices indicate their address space requirements
through their base address registers.
The BIOS will read the registers in the configuration space
during initial configuration.
It will then reserve the space for the device, in the physical
address map, and write back the addresses to the base
address registers. These addresses can then be read by the
device itself and also by system software for accessing the
devices.
Expansion ROM BAR (32-bit)
It allows devices to be used during boot process.
ROM contains device driver that can be loaded during boot
up.
During configuration process, configuration software
writes all 1s except at bit 0 to check if ROM is
implemented.
If a non-zero value is returned, the device implements the
expansion ROM register, but the ROM itself may not be present in the slot.
At this stage, access to the ROM is disabled.
Once the memory space requirement is determined from the value
read back, a starting physical address is written to this
register.
Expansion ROM BAR (contd)
All ROM code is transferred to main memory to have
faster access times.
ROM decoder on device is disabled after this.
Initialization portion of device driver is executed and
discarded from memory to free some memory.
At this stage, only code image of the device driver exists in
main memory.
Other configuration registers
CardBus CIS pointer Implemented by devices that share
silicon between PCI and Cardbus. It points to Card
Information Structure.
Interrupt Pin register Identifies the interrupt pin (INTx#)
that this device uses. If 00h, no interrupt pin is used. Under spec 2.2,
devices can generate interrupts either by using the interrupt pins or
by using message signaled interrupts.
Interrupt Line register Specifies routing information for
the interrupt pin identified by the Interrupt Pin register.
Other configuration registers
Min_Gnt register and Max_Lat register Applicable only
to bus masters, and optional for them. They help in determining the value
to be programmed into the bus master's Latency Timer.
These are information-only registers, used by
configuration software to determine:
The duration of a typical transfer when it does acquire
the bus.
How often a bus master typically requires access to PCI
bus
e.g. frequent small transactions, infrequent burst
transactions.