PCI Express
OUTLINE
• PCI Express overview
• PCI architecture
➢ PCI Express link
➢ bus topology
➢ architecture layers
➢ transactions
➢ interrupts
• Introduced as "Third Generation I/O" (3GIO), PCI Express (PCIe) superseded both PCI and PCI-X. New motherboards may come with a mix of PCI and PCIe slots or with PCIe slots only.
• PCIe is a switched architecture built from multiple serial lanes, rather than the shared bus structure of PCI.
https://www.yourdictionary.com/pci-express#computer
Year created: 2004
Created by: Intel · Dell · HP · IBM
Supersedes: AGP · PCI · PCI-X
Width in bits: 1–32
Number of devices: One device on each endpoint of each connection. PCI Express switches can create multiple endpoints out of one endpoint, allowing one endpoint to be shared with multiple devices.
Capacity, per lane (each direction):
• v1.x: 250 MB/s (2.5 GT/s)
• v2.x: 500 MB/s (5 GT/s)
• v3.0: 985 MB/s (8 GT/s)
• v4.0: 1969 MB/s (16 GT/s)
Capacity, 16-lane slot (each direction):
• v1.x: 4 GB/s (40 GT/s)
• v2.x: 8 GB/s (80 GT/s)
• v3.0: 15.75 GB/s (128 GT/s)
• v4.0: 31.51 GB/s (256 GT/s)
Style: Serial
Hotplugging interface: Yes, if ExpressCard, Mobile PCI Express Module or XQD card
External interface: Yes, with PCI Express External Cabling, such as Thunderbolt
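The capacity figures above can be reproduced from the transfer rate and the line encoding. The sketch below assumes 8b/10b encoding for v1.x and v2.x and 128b/130b for v3.0 and v4.0; these ratios are not stated on the slide and are added here only to explain the numbers. The function names are illustrative.

```python
# Transfer rate in GT/s and line-code efficiency per PCIe generation.
# The encoding ratios are an addition to the slide, used to explain its numbers.
GENERATIONS = {
    "v1.x": (2.5, 8 / 10),
    "v2.x": (5.0, 8 / 10),
    "v3.0": (8.0, 128 / 130),
    "v4.0": (16.0, 128 / 130),
}


def lane_bandwidth_mb_s(version: str) -> float:
    """Usable bandwidth of one lane, one direction, in MB/s."""
    gt_per_s, efficiency = GENERATIONS[version]
    # One transfer carries one bit per lane; divide by 8 to get bytes.
    return gt_per_s * 1000 * efficiency / 8


def link_bandwidth_gb_s(version: str, lanes: int) -> float:
    """Usable bandwidth of a multi-lane link, one direction, in GB/s."""
    return lane_bandwidth_mb_s(version) * lanes / 1000


if __name__ == "__main__":
    for version in GENERATIONS:
        print(version,
              round(lane_bandwidth_mb_s(version)), "MB/s per lane,",
              round(link_bandwidth_gb_s(version, 16), 2), "GB/s for x16")
```

Rounded, the per-lane and x16 values match the table entries.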
PCI Express Overview
• The PCI Express architecture is a high-performance I/O interconnect for peripherals in computing and communication platforms.
• It evolved from the PCI and PCI-X architectures and uses the same communication model as the PCI and PCI-X buses.
• The same address spaces are retained: memory, I/O, and configuration.
• Whereas the PCI and PCI-X generations used shared parallel buses, the PCIe bus uses a serial point-to-point interconnect for communication between two devices.
• PCIe implements a packet-based protocol for information transfer.
• Performance scales with the number of signal lanes* implemented on the PCI Express interconnect (dual simplex).
• The PCIe bus allows the same types of transactions as the previous buses: memory read/write, I/O read/write, and configuration read/write.
• Compatibility is maintained with existing operating systems and software drivers, which do not require changes.
* The transmit and receive pair together are called a lane. The initial speed of 2.5 Gb/s provides a nominal bandwidth of about 250 MB/s in each direction per PCI Express lane.
PCI Express Features
• Point-to-point connection
• Serial bus means fewer pins
• Scalable: x1, x2, x4, x8, x12, x16, x32
• Dual Simplex connection
• 2.5 GT/s per lane in each direction (first generation)
• Packet based transaction protocol
https://www.mindshare.com/files/ebooks/PCI%20Express%20System%20Architecture.pdf
PCI Express Features
• The interface is serial, which reduces the pin count and simplifies the interconnections.
• It unifies the I/O architecture for different types of systems, including embedded systems.
• It interconnects ICs on the motherboard and expansion cards via connectors or cables.
• Communication is packet-based, with a high transfer rate and efficiency.
• The bus is scalable: a given interconnection can be implemented with several communication lanes.
• The software model is compatible with the classical PCI architecture, so PCIe devices can be configured and existing software drivers used without changes.
• It provides differentiated quality of service (QoS) through the ability to allocate dedicated resources to certain data flows, to configure the QoS arbitration policies for each component, and to use isochronous transfers for real-time applications.
• It provides advanced power management through the ability to identify the power management capabilities of each peripheral device.
• It ensures link-level data integrity for all types of transactions.
• It supports advanced error reporting and handling to improve fault isolation and error
recovery
• It supports hot-plugging and hot-swapping of peripheral devices
PCI Express Topology
• A PCIe system consists of PCIe links that interconnect a set of components.
• An example topology, referred to as a hierarchy, is composed of:
- a Root Complex,
- multiple Endpoints (I/O devices),
- a Switch,
- a PCI Express to PCI/PCI-X Bridge,
all interconnected via PCI Express Links.
Root Complex (RC)
● The Root Complex (RC) is the device that connects one or more processors and the memory subsystem to the I/O devices.
● The RC represents the root of an I/O hierarchy.
● It is similar to a host bridge in a PCI system:
- The RC generates transaction requests on behalf of the processor, to which it is connected through a local bus.
- The RC may support one or more PCI Express ports, called Root Ports.
• PCIe devices may have up to 8 logical functions and each endpoint is
assigned a device identifier (ID), which consists of a bus number, device
number, and function number.
• The link and PCIe functionality shared by all functions is managed through
Function 0
• All functions use a single Bus Number captured through the PCI enumeration
process
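The bus/device/function identifier described above is commonly packed into 16 bits: 8 bits of bus number, 5 bits of device number, and 3 bits of function number (which is why a device has at most 8 functions). A minimal sketch of that packing follows; the class and function names are illustrative, not part of any API.

```python
from typing import NamedTuple


class BDF(NamedTuple):
    bus: int       # 8 bits: 0-255
    device: int    # 5 bits: 0-31
    function: int  # 3 bits: 0-7, hence up to 8 functions per device


def pack_bdf(bdf: BDF) -> int:
    """Pack bus/device/function into a 16-bit identifier."""
    if not (0 <= bdf.bus <= 0xFF and 0 <= bdf.device <= 0x1F
            and 0 <= bdf.function <= 0x7):
        raise ValueError(f"field out of range: {bdf}")
    return (bdf.bus << 8) | (bdf.device << 3) | bdf.function


def unpack_bdf(value: int) -> BDF:
    """Split a 16-bit identifier back into its three fields."""
    return BDF((value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x7)


def format_bdf(bdf: BDF) -> str:
    """Render in the bus:device.function style used by tools such as lspci."""
    return f"{bdf.bus:02x}:{bdf.device:02x}.{bdf.function}"
```

For example, function 1 of device 0 on bus 3 packs to 0x0301 and is written 03:00.1.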
Configuration Space
• Resources such as memory ranges are allocated to each device, and the assigned addresses are recorded in the device's configuration space.
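As an illustration of what the configuration space holds, the sketch below decodes a few fields of the standard 64-byte header of an endpoint (a Type 0 header): the vendor and device IDs, the Command and Status registers, and the six Base Address Registers (BARs) where assigned addresses are recorded. The example header is made up for the demonstration (the device ID 0x1234 is hypothetical), and only 32-bit memory BARs are handled.

```python
import struct


def parse_config_header(cfg: bytes) -> dict:
    """Decode a few standard fields from a Type 0 configuration header."""
    if len(cfg) < 64:
        raise ValueError("need at least the 64-byte standard header")
    vendor_id, device_id, command, status = struct.unpack_from("<HHHH", cfg, 0x00)
    header_type = cfg[0x0E] & 0x7F  # bit 7 is the multi-function flag
    if header_type != 0:
        raise ValueError("this sketch only handles Type 0 (endpoint) headers")
    bars = struct.unpack_from("<6I", cfg, 0x10)  # BAR0..BAR5
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "command": command,
        "status": status,
        "multi_function": bool(cfg[0x0E] & 0x80),
        "bars": list(bars),
    }


def memory_bar_base(bar: int) -> int:
    """Base address of a 32-bit memory BAR (the low 4 bits are flags)."""
    if bar & 0x1:
        raise ValueError("I/O BAR, not a memory BAR")
    return bar & 0xFFFFFFF0


def make_example_header() -> bytes:
    """Build a made-up header: vendor 0x8086, hypothetical device 0x1234."""
    cfg = bytearray(64)
    struct.pack_into("<HH", cfg, 0x00, 0x8086, 0x1234)
    struct.pack_into("<I", cfg, 0x10, 0xF0000008)  # BAR0: memory, prefetchable
    return bytes(cfg)
```

On Linux, a real header could be fed to the same parser by reading the `config` file of a device under `/sys/bus/pci/devices/`.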
Enumeration
● The process by which configuration software discovers the system topology and
assigns bus numbers and system resources.
● The RC (host) sends configuration packets to assign unique bus, device, and function numbers to the connected Endpoints.
● On x86, the BIOS enumerates the PCIe hierarchy during hardware initialization, so all registers are configured before the bootloader runs.
● System software can later reassign these numbers, as long as it follows the enumeration rules.
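Bus number assignment during enumeration is usually a depth-first walk: each bridge (a Root Port or a Switch port) records the bus it sits on (primary), the bus directly behind it (secondary), and the highest bus number reachable through it (subordinate). The following toy model sketches that walk. It assumes the tree is already known, whereas real configuration software discovers devices by probing configuration space; all class and device names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Endpoint:
    name: str
    bus: int = -1


@dataclass
class Bridge:
    """A Root Port or Switch port: it has a bus on each side."""
    name: str
    children: List[Union["Bridge", Endpoint]] = field(default_factory=list)
    primary: int = -1
    secondary: int = -1
    subordinate: int = -1


def enumerate_bus(devices, bus: int) -> int:
    """Number everything on `bus` depth-first; return the highest bus used."""
    highest = bus
    for dev in devices:
        if isinstance(dev, Endpoint):
            dev.bus = bus
        else:
            dev.primary = bus
            dev.secondary = highest + 1  # next unused bus number
            dev.subordinate = enumerate_bus(dev.children, dev.secondary)
            highest = dev.subordinate
    return highest


def build_example():
    """Two Root Ports on bus 0: one leads to a switch, one to an endpoint."""
    nic, ssd, gpu = Endpoint("nic"), Endpoint("ssd"), Endpoint("gpu")
    switch = Bridge("switch-up", [Bridge("switch-down-1", [nic]),
                                  Bridge("switch-down-2", [ssd])])
    return [Bridge("root-port-1", [switch]), Bridge("root-port-2", [gpu])]
```

Calling `enumerate_bus(build_example(), 0)` numbers the buses 0 to 5; the first Root Port ends up covering buses 1 to 4 and the second only bus 5.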
IO Hub
The Intel QuickPath Interconnect (QPI) is a point-to-point processor interconnect developed by Intel which replaced the front-side bus (FSB) in Xeon, Itanium, and certain desktop platforms. It increased the scalability and bandwidth available. Prior to the name's announcement, Intel referred to it as Common System Interface (CSI).
"Uncore" is a term used by Intel to describe the functions of a microprocessor that are not in the core, but which must be closely connected to the core to achieve high performance. It has been called "system agent" since the release of the Sandy Bridge microarchitecture. The core contains the components of the processor involved in executing instructions, including the ALU, FPU, L1 and L2 cache. Uncore functions include QPI controllers, L3 cache, snoop agent pipeline, on-die memory controller, and Thunderbolt controller.
PCIe Architecture Layers
A PCIe system may be structured into five logical layers:
• The configuration/OS layer manages the configuration of PCIe devices by the OS, based on the Plug-and-Play specifications for initializing, enumerating, and configuring I/O devices.
• The software layer interacts with the OS through the same drivers as the conventional PCI bus.
• The transaction layer manages the transmission and reception of information using a packet-based protocol.
• The data link layer ensures the integrity of data transfers via error detection using a Cyclic Redundancy Check (CRC).
• The physical layer performs packet transmission over the PCIe serial links.
• The PCIe specification defines the architecture of PCIe devices in terms of three logical layers: the transaction layer, the data link layer, and the physical layer.
• The PCIe bus uses packets to transfer information between pairs of devices connected via a PCIe link.
• Packets are formed in the transaction layer, based on information obtained from the device core and application, and stored in a buffer.
• The data link layer extends the packet with additional information required for error detection at the receiving device.
• The packet is then encoded in the physical layer and transmitted as differential signals over the PCIe link.
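The way a packet grows as it passes down through the three layers can be sketched as follows. This is a simplified model, not the wire format: the data link layer is assumed to prepend a 12-bit sequence number (stored in two bytes) and append a 32-bit CRC, with `zlib.crc32` used as a stand-in for the real link CRC, and the framing bytes added by the physical layer are placeholders.

```python
import struct
import zlib
from typing import Tuple

# Placeholder framing markers; real links use special physical-layer symbols.
START, END = b"\x01", b"\x02"


def transaction_layer(header: bytes, payload: bytes) -> bytes:
    """Form the transaction layer packet (TLP): header followed by data."""
    return header + payload


def data_link_layer(tlp: bytes, seq: int) -> bytes:
    """Add a sequence number in front and a CRC behind the TLP."""
    body = struct.pack(">H", seq & 0x0FFF) + tlp
    return body + struct.pack(">I", zlib.crc32(body))  # stand-in CRC


def physical_layer(frame: bytes) -> bytes:
    """Frame the packet for transmission on the link."""
    return START + frame + END


def receive(wire: bytes) -> Tuple[int, bytes]:
    """Undo the three steps and verify the CRC, as a receiver would."""
    if wire[:1] != START or wire[-1:] != END:
        raise ValueError("bad framing")
    frame = wire[1:-1]
    body, (crc,) = frame[:-4], struct.unpack(">I", frame[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch: packet corrupted on the link")
    (seq,) = struct.unpack(">H", body[:2])
    return seq, body[2:]
```

A packet built with `physical_layer(data_link_layer(transaction_layer(header, payload), seq))` is recovered unchanged by `receive`, while flipping any bit inside the frame makes `receive` raise an error.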
PCI Express transaction layer packet (TLP) types
Methods for Data Routing
• Each request or completion header is tagged with its type, and each packet type is routed using one of three schemes.
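In the PCIe specification the three schemes are address routing, ID routing (by bus/device/function), and implicit routing. The sketch below shows how a switch might pick an egress port for each of them. It is a simplified model with hypothetical names: each downstream port is described only by a memory window and a bus number range, and implicit routing is reduced to the "send towards the Root Complex" case.

```python
from dataclasses import dataclass
from typing import List

UPSTREAM = "upstream"


@dataclass
class Port:
    """A downstream switch port, described by what lies behind it."""
    name: str
    mem_base: int     # first address of the memory window behind the port
    mem_limit: int    # last address of that window
    secondary: int    # first bus number behind the port
    subordinate: int  # last bus number behind the port


def route_by_address(ports: List[Port], address: int) -> str:
    """Address routing: memory and I/O requests follow the address windows."""
    for port in ports:
        if port.mem_base <= address <= port.mem_limit:
            return port.name
    return UPSTREAM  # not claimed by any downstream port


def route_by_id(ports: List[Port], bus: int) -> str:
    """ID routing: configuration requests and completions follow bus numbers."""
    for port in ports:
        if port.secondary <= bus <= port.subordinate:
            return port.name
    return UPSTREAM


def route_implicit() -> str:
    """Implicit routing, reduced here to messages aimed at the Root Complex."""
    return UPSTREAM
```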
Bus Mastering (DMA)
• Before PCIe, DMA was somewhat intrusive: the CPU had to be told to withdraw from the bus while the transfer took place.
• On PCIe, any device can simply send read/write TLPs on the bus, just like the Root Complex. This allows the device to access the processor's memory directly (DMA) or to exchange packets with other peripherals on a peer-to-peer basis (as long as the switching entities allow it).
Two things need to happen first, as with any PCI device:
1. The device must be granted bus mastering by setting the "Bus Master Enable" bit in one of the standard configuration registers.
2. The driver software must inform the device of the physical address of the relevant buffer, most likely by writing to a register mapped through a Base Address Register (the BAR itself resides in configuration space).
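The two steps can be sketched on simulated register storage. "Bus Master Enable" is bit 2 of the Command register at offset 0x04 of the configuration header. The DMA address registers, in contrast, are device-specific: the offsets used below are hypothetical and stand for whatever the device's documentation defines inside its BAR-mapped region.

```python
import struct

COMMAND_OFFSET = 0x04      # Command register in the standard header
CMD_BUS_MASTER = 1 << 2    # "Bus Master Enable" bit

# Hypothetical device-specific registers inside a BAR-mapped region.
REG_DMA_ADDR_LO = 0x00
REG_DMA_ADDR_HI = 0x04


def enable_bus_master(config_space: bytearray) -> None:
    """Step 1: set Bus Master Enable without disturbing the other bits."""
    (command,) = struct.unpack_from("<H", config_space, COMMAND_OFFSET)
    struct.pack_into("<H", config_space, COMMAND_OFFSET,
                     command | CMD_BUS_MASTER)


def program_dma_buffer(bar_region: bytearray, phys_addr: int) -> None:
    """Step 2: tell the device where the buffer is (64-bit physical address)."""
    struct.pack_into("<I", bar_region, REG_DMA_ADDR_LO, phys_addr & 0xFFFFFFFF)
    struct.pack_into("<I", bar_region, REG_DMA_ADDR_HI, phys_addr >> 32)
```

In a real driver both writes go through the operating system's PCI and DMA interfaces rather than through raw byte arrays; the byte arrays here only make the register layout visible.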
DMA Transaction
Peer-to-Peer Transaction
PCI Express Device Layers
Interrupt Model: Three Methods
Physical Layer Function
PCI Express Error Handling
• All PCI Express devices are required to support some combination of:
# Generic PCI error handling, which lets existing software keep working because PCI Express maps many of its error conditions onto existing PCI error handling mechanisms.
# Additional PCI Express-specific reporting mechanisms.
• Errors are classified as correctable and uncorrectable.
• Uncorrectable errors are further divided into:
# Fatal uncorrectable errors
# Non-fatal uncorrectable errors.
Correctable Errors
Uncorrectable Errors
• Errors classified as uncorrectable impair the functionality of the interface, and the specification defines no mechanism to correct them.
• The two subgroups are fatal and non-fatal
1. Fatal Uncorrectable Errors: Errors which render the link unreliable
– First-level strategy for recovery may involve a link reset by the system
– Handling of fatal errors is platform-specific
2. Non-Fatal Uncorrectable Errors: Uncorrectable errors associated with a
particular transaction, while the link itself is reliable
– Software may limit recovery strategy to the device(s) involved
– Transactions between other devices are not affected
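The classification can be summarized as a small decision table. The sketch below only restates the recovery scopes given above; the action for correctable errors (record the event, nothing to recover) is an assumption, and the names are illustrative.

```python
from enum import Enum


class Severity(Enum):
    CORRECTABLE = "correctable"
    NON_FATAL = "uncorrectable non-fatal"
    FATAL = "uncorrectable fatal"


def recovery_action(severity: Severity) -> str:
    """Map an error class to the scope of the recovery it calls for."""
    if severity is Severity.CORRECTABLE:
        # Assumption: nothing to recover, software only records the event.
        return "log only"
    if severity is Severity.NON_FATAL:
        # The link is still reliable; only one transaction was lost.
        return "recover the device(s) involved in the failed transaction"
    # Fatal: the link itself is unreliable.
    return "reset the link (platform-specific handling)"
```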
PCIe Evolution
Summary
Ivy Bridge systems example
References
https://indico.cern.ch/event/121654/attachments/68430/98164/Practical_introduction_to_PCI_Express_with_FPGAs_-_Extended.pdf
Budruk, R., Anderson, D., Shanley, T., PCI Express System Architecture, MindShare Inc., Addison-Wesley Developer’s Press, 2008, https://www.mindshare.com/files/ebooks/PCI%20Express%20System%20Architecture.pdf
Ajanovic, J., “PCI Express (PCIe) 3.0 Accelerator Features”, Intel Corporation, 2008, http://www.intel.com/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf
PCI-SIG, “PCI Express Base Specification Revision 3.0”, November 10, 2010.
https://webcourse.cs.technion.ac.il/236376/Spring2017/ho/WCFiles/chipset_microarch.pdf
http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-1
http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-2
http://hardwareverification.weebly.com/pci---express-introduction.html