Xiaoyong Liao
M.Sc. degree in Internetworking
September 2007
Design and Implementation of PCI Multi-Bus Simulator
Acknowledgments
Designing and implementing a software system is not easy. A large software project, especially, is the
result of the effort of a sizeable team of people. Although our graduation project (MINT 709) is not big, I
still received help and support from my team members, as well as guidance from other senior technicians.
I would especially like to thank my advisor, Dr. Mike MacGregor. He found this very good project for me,
guided me patiently step by step, and generously lent us books. I would also like to thank PhD student
Qinghua Ye, who gave me a great deal of useful guidance and helped me correct many important
problems. I also thank my other two team members, Yile Sun and Ligang Wang, who helped me
overcome many technical problems and contributed many excellent ideas that kept this project moving
smoothly. From every meeting I took away new development ideas that broadened my thinking and
opened me up to new perspectives, and working with them let me share in their development experience.
Finally, I thank my wife, Qian Yang, who provided unstinting support for whatever I happened to be
doing and cheerfully accepted my complaints about the workload that I freely elected to undertake. She
always managed to be on hand whenever I needed sustenance or sympathy, or both, and without her I
undoubtedly could not have developed this project smoothly.
Xiaoyong Liao
Abstract
The PCI bus is perhaps the most successful bus design, on both the technical and the marketing levels. From
the earliest general PCI bus to the current powerful PCI-E bus, we can see that PCI has become the
most popular bus system throughout the world.
Our project consists of three sections: source coding, GUI design and implementation, and testing.
Finally, we present some analysis and statistics based on the data generated by the PCI Multi-Bus
simulator.
In the coding section, I first introduce the development history of the PCI bus and then two important
newer PCI buses, PCI-X and PCI-E. Finally, I introduce our simulator's design and implementation. As
for the development environment, I use the Red Hat Linux OS, the G++ compiler, and OOP techniques
for the coding work. The generated data can then be plotted with the GNUPLOT tool under Linux to
obtain useful statistical results.
So far, my coding work has gone smoothly. Although some undiscovered bugs no doubt remain, I believe
that this simulator will gradually become better and better in the future.
Table of Contents
1. Computer Bus
REFERENCE
APPENDIX
1. Computer Bus
It is widely recognized that the computer system bus affects the system characteristics in several
important ways:
(1) The bus bandwidth and transfer parameters place a limit on the system performance.
(2) The system bus is an interface that connects hardware components produced by different
vendors and provides interoperability.
(3) The wide variety of configuration options supported by increasingly complex and sophisticated
I/O devices makes manual configuration a difficult and error-prone task. Support for software-based
automatic configuration has become a necessity.
(4) When multiple processors share a bus with common resources, some form of support for
multi-processing is required to arbitrate the use of shared resources.
Even though memories are getting faster, CPUs are getting faster still. Although the memory burst
speed can be increased by using interleaving, the initial latency cannot be reduced, and in fact it
becomes the dominant factor in bus usage. This is just one of a number of parameters, other than
demand for raw bus bandwidth, that have changed in recent years and must be considered in
modern system bus design.
From a historical perspective, we can review all kinds of computer buses, such as ISA,
Micro Channel, EISA, NuBus, Futurebus+, VME64, PCI, PCI-X, and PCI Express. Here I
introduce PCI, PCI-X, and PCI Express in more detail; the other buses are not covered.
Figure 1 shows the PCI family history.
PCI is a high performance local bus architecture that offers low cost and allows differentiation. While the
primary focus is on enabling new price-performance points in today's systems, it is important that
a new standard also accommodates future system requirements and be applicable across multiple
platforms and architectures. Figure 2 shows the multiple dimensions of the PCI Local Bus.
While the initial focus of local bus applications was on low to high end desktop systems, the PCI
Local Bus also comprehends the requirements from mobile applications up through servers. The
PCI Local Bus specifies 3.3 volt signaling requirements.
The PCI component and add-in card interface is processor independent, enabling an efficient
transition to future processor generations and use with multiple processor architectures. Processor
independence allows the PCI Local Bus to be optimized for I/O functions, enables concurrent
operation of the local bus with the processor/memory subsystem, and accommodates multiple high
performance peripherals in addition to graphics (motion video, LAN, SCSI, FDDI, hard disk
drives, etc.). Movement to enhanced video and multimedia displays (i.e., HDTV and 3D graphics)
and other high bandwidth I/O will continue to increase local bus bandwidth requirements. A
transparent 64-bit extension of the 32-bit data and address buses is defined, doubling the bus
bandwidth and offering forward and backward compatibility of 32-bit and 64-bit PCI Local Bus
peripherals. A forward and backward compatible PCI-X specification (see the PCI-X Addendum to
the PCI Local Bus Specification) is also defined, increasing the bandwidth capabilities of the 33
MHz definition by a factor of four.
The PCI Local Bus standard offers additional benefits to the users of PCI based systems.
Configuration registers are specified for PCI components and add-in cards. A system with
embedded auto configuration software offers true ease-of-use for the system user by automatically
configuring PCI add-in cards at power on.
Since its introduction, the PCI bus has gained wide support throughout the computer industry. Almost
all PC systems today contain PCI slots, as do the Apple and IBM PowerPC-based machines
and the Digital Alpha-based machines. The PCI standard has become so popular that it has influenced the
creation of more than one related standard based on leveraging PCI technology.
The PCI Bus is designed to overcome many of the limitations in previous buses. The major
benefits of using PCI are:
(1) High speed
The PCI bus can transfer data at a peak rate of 132 MBytes/sec using the current 32-bit data
path and a 33 MHz clock speed. Future implementations featuring a 64-bit data path and a
66 MHz clock speed may transfer data as fast as 528 MBytes/sec (see the short calculation
after this list). Even in its basic mode, PCI delivers more than ten times the performance
offered by its predecessor in the PC world, the ISA bus.
(2) Expandability
The PCI bus can be expanded to a large number of slots using PCI to PCI bridges. The bridge
units connect separate small PCI buses to form a single, unified, hierarchical bus. When traffic
is local to each bus, more than one bus may be active concurrently. This allows load balancing,
while still allowing any PCI Master on any bus to access any PCI Target on any other PCI bus.
(6) Portability
By incorporating the (optional) OpenBoot standard, any device with OpenBoot firmware can
boot systems containing any microprocessor and OS. Even without OpenBoot, it is common
to see drivers for many video cards and SCSI controllers for multiple CPU architectures and
operating systems.
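As a quick sanity check on the figures in point (1), the peak rate is simply the data path width multiplied by the clock rate. A minimal C++ sketch (the helper name peak_mb_per_s is mine, for illustration only):

#include <cstdio>

// Peak PCI transfer rate: data path width (bits / 8 = bytes) x clock (MHz) = MB/s.
constexpr double peak_mb_per_s(int width_bits, double clock_mhz) {
    return (width_bits / 8.0) * clock_mhz;
}

int main() {
    std::printf("32-bit @ 33 MHz: %.0f MB/s\n", peak_mb_per_s(32, 33.0));  // 132
    std::printf("64-bit @ 66 MHz: %.0f MB/s\n", peak_mb_per_s(64, 66.0));  // 528
    return 0;
}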
PCI is a local bus, sometimes also called an intermediate local bus, to distinguish it from the CPU
bus. The concept of the local bus solves the downward compatibility problem in an elegant way.
The system may incorporate an ISA, EISA, or Micro Channel bus, and adapters compatible with
these buses. On the other hand, high-performance adapters, such as graphics or network cards,
may plug directly into PCI. PCI also provides a standard and stable interface for peripheral chips.
By interfacing to the PCI, rather than to the CPU bus, peripheral chips remain useful as new
microprocessors are introduced. The PCI bus itself is linked to the CPU bus through a PCI to Host
Bridge.
Figure 3 shows a typical PCI Local Bus system architecture. This example is not intended to imply
any specific architectural limits. In this example, the processor/cache/memory subsystem is
connected to PCI through a PCI bridge. This bridge provides a low latency path through which the
processor may directly access PCI devices mapped anywhere in the memory or I/O address spaces.
It also provides a high bandwidth path allowing PCI masters direct access to main memory. The
bridge may include optional functions such as arbitration and hot plugging. The amount of data
buffering a bridge includes is implementation specific.
The basic PCI transfer is a burst. This means that all memory space and I/O space accesses occur
in burst mode; a single transfer is considered a “burst” terminated after a single data phase.
Addresses and data use the same 32-bit, multiplexed, address/data bus. The first clock is used to
transfer the address and bus command code. The next clock begins one or more data transfers,
during which either the master, or the target, may insert wait cycles.
PCI supports posting. A posted transaction completes on the originating bus before it completes on
the target bus. For example, the CPU may write data at high speed into a buffer in a CPU-to-PCI
bridge. In this case the CPU bus is the originating bus and PCI is the target bus. The bridge
transfers data to the target (PCI bus) as long as the buffer is not empty, and asserts a not ready
signal when the buffer becomes empty. In the other direction, a device may post data on the PCI
bus, to be buffered in the bridge, and transferred from there to the CPU via the CPU bus. If the
buffer becomes temporarily full, the bridge de-asserts the target ready signal.
In a read transaction, a turnaround cycle is required to avoid contention when the master stops
driving the address and the target begins driving the data on the multiplexed address/data bus.
This is not necessary in a write transaction, where the master drives both the address and data lines.
A turnaround cycle is required, however, for all signals that may be driven by more than one PCI
unit. Also, an idle clock cycle is normally required between two transactions, but there are two
kinds of back-to-back transactions in which this idle cycle may be eliminated. In both cases the
first transaction must be a write, so that no turnaround cycle is needed: the master drives the data
at the end of the first transaction and the address at the beginning of the second transaction. The
first kind of back-to-back occurs when the second transaction has the same target as the first one.
Every PCI target device must support this kind of back-to-back transaction. The second kind of
back-to-back occurs when the target of the second transaction is different than the target of the
first one, and the second target has the Fast Back-to-Back Capable bit in the status register set to
one, indicating that it supports this kind of back-to-back.
For arbitration, PCI provides a pair of request and grant signals for each PCI unit and defines a
central arbiter whose task is to receive and grant requests, but it leaves the choice of a specific
arbitration algorithm to the designer. PCI also supports bus parking, allowing a master to remain bus
owner as long as no other device requests the bus. The default master becomes bus owner when
the bus is idle; the arbiter can select any master to be the default owner.
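To make the request/grant mechanism concrete, here is a minimal sketch of a central arbiter with bus parking. The class and the fixed priority policy are illustrative assumptions on my part; as noted above, the specification deliberately leaves the algorithm to the designer:

#include <vector>

// Minimal central arbiter sketch: one REQ#/GNT# pair per master,
// fixed priority (lower index wins), plus bus parking on a default owner.
class CentralArbiter {
public:
    CentralArbiter(int masters, int default_owner)
        : req_(masters, false), default_owner_(default_owner) {}

    void request(int m) { req_[m] = true; }
    void release(int m) { req_[m] = false; }

    // Called each idle clock: grant the highest-priority requester,
    // or park the bus on the default owner when nobody requests it.
    int grant() const {
        for (int m = 0; m < static_cast<int>(req_.size()); ++m)
            if (req_[m]) return m;
        return default_owner_;   // bus parking
    }

private:
    std::vector<bool> req_;
    int default_owner_;
};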
Configuration registers specify each device's resource requirements and whether the device can act
as a PCI bus master. At power-up, device-independent software determines what devices are present
and how much address space each device requires. The boot software then relocates PCI devices in
the address space using a set of base address registers.
Figure 4 illustrates two typical applications for a bridge. The first application is the use of a bridge
to create a second PCI bus segment to which additional PCI connectors are added. This bus
segment is labeled in the figure as PCI Bus 1. In this example, the primary interface of bridge 1 is
connected to PCI bus 0 while its secondary interface is connected to PCI bus 1. The second
application example is the use of a bridge to create a PCI bus segment on an expansion board that
allows multiple PCI devices to reside on a single expansion board. In this example, the primary
interface of bridge 2 is connected to PCI bus 1 and its secondary interface is connected to PCI bus
2. Note that the number assigned to the bridge corresponds to the number of the bus segment
spawned by the bridge. In this example, the host bridge is considered to be bridge number 0 and
spawns PCI bus segment 0.
A bridge allows transactions between a master on one PCI interface and a target on the other
interface, as illustrated in Figure 5. The target interface on one bus is connected to the master
interface on the other bus. The blocks between the data path of the primary and secondary
interfaces provide any necessary transaction address and data buffering. The target block
connected to the primary PCI interface must support PCI configuration space. The bridge basically
consists of four state machines: two masters and two targets. Each of the master and target
interface state machines must adhere to the requirements of the PCI Local Bus Specification.
The PCI Local Bus Specification requires all devices, including a PCI-to-PCI bridge, to implement
a 256-byte configuration register address space. The first 64 bytes in each device’s PCI
Configuration Space must adhere to a standard configuration header format. The remaining 192
bytes of the Configuration Space may be used for additional capabilities as defined by the
Capabilities Pointer or for device-specific purposes.
The 64-byte header format for a bridge is defined in Figure 6. The first 16 bytes of the bridge
header implement the common format for all devices as required by the PCI Local Bus
Specification. The next 48 bytes of the device’s Configuration Space are Header Type specific. A
Header Type value of 1 indicates that the device follows the bridge register layout, which is
defined in this specification. The following sections define the basic behavior of configuration
registers and how reset affects them. Then a brief review of the common registers is presented,
followed by a detailed specification of the bridge specific registers.
Presently, the PCI Bridge Architecture defines two kinds of PCI-to-PCI Bridge (P2PB). The first
is the transparent P2PB, in which all devices work under a synchronous clock; the other is the
non-transparent P2PB, in which devices can work at different clock rates. In our simulator,
we simulate only the transparent P2PB case.
PCI-X was developed jointly by IBM, HP, and Compaq. PCI-X is a revision of the PCI standard
that doubles the clock speed (from 66 MHz to 133 MHz) and hence the amount of data exchanged
between the computer processor and peripherals. Standard PCI supports up to 64 bits at 66 MHz
(though anything above 32 bits at 33 MHz is seen only in high-end systems), and additional bus
standards move 32 bits at 66 MHz or 64 bits at 133 MHz. The theoretical maximum amount of
data exchanged between the processor and peripherals with PCI-X is 1.06 GB/s, compared to 532
MB/s with standard PCI. PCI-X is generally backward compatible with PCI, meaning that a
PCI-X card can be installed in a PCI slot, provided it has the correct voltage keying for the slot and
(if inserted in a 32-bit slot) nothing obstructs the overhanging part of the edge connector. PCI and
PCI-X cards can be intermixed on a PCI-X bus, but the speed will be limited to that of the
slowest card (for this reason and the voltage compatibility issue, most systems with PCI-X
have a normal PCI bus as well). PCI-X improves the fault tolerance of PCI, allowing, for example,
faulty cards to be reinitialized or taken offline. Figure 8 shows that PCI-X is a more efficient
protocol than PCI.
IBM, HP, and Compaq designed PCI-X for servers, to increase performance for high-bandwidth
devices such as Gigabit Ethernet, Fibre Channel, and Ultra3 SCSI cards, and to allow processors to
be interconnected in clusters. Figure 9 shows that networking is moving to 10 Gigabit technologies.
Compaq, IBM, and HP submitted PCI-X to the PCI Special Interest Group (PCI-SIG) in 1998. The
PCI-SIG approved PCI-X, and it is now an open standard that can be adapted and used by all
computer developers. The PCI-SIG controls technical support, training, and compliance testing
for PCI-X. IBM, Intel, Microelectronics, and
Mylex plan to develop chipsets to support PCI-X. 3Com and Adaptec intended to develop PCI-X
peripherals. Figure 10 shows that PCI-X 2.0 can provide enough bandwidth to support all
applications.
In 2003 the PCI-SIG ratified PCI-X 2.0, which adds 266 MHz and 533 MHz variants. These variants
give roughly 2.15 GB/s and 4.3 GB/s throughput, respectively. PCI-X 2.0 makes additional
protocol revisions that are designed to help system reliability, and it adds error-correcting code
(ECC) to the bus to avoid resends. To deal with one of the most common complaints about the
PCI-X form factor, the 184-pin connector, 16-bit ports were developed to allow PCI-X to be used
in devices with tight space constraints. As in PCI Express, point-to-point functions were added
to allow devices on the bus to talk to each other without burdening the CPU or bus controller.
Despite the various theoretical advantages of PCI-X 2.0 and its backward compatibility with
PCI-X and PCI devices, it has not been implemented on a large scale (as of 2006). This lack of
implementation is primarily because hardware vendors have chosen to integrate PCI Express
instead. Currently, the PCI-X workgroup is developing PCI-X 1066, which will double PCI-X 533
bandwidth, maintain backward compatibility, and include new hardware/software extensions.
Figure 11 shows the PCI-X roadmap.
Figure 10. PCI-X 2.0 can provide enough bandwidth to support all applications
PCI-X has a number of technological and economic disadvantages compared to PCI Express. The 64-bit
parallel interface requires inherently difficult trace routing, because, as with all parallel interfaces,
the signals from the bus must arrive simultaneously or within a very short window, and noise from
adjacent slots may cause interference. The serial interface of PCIe suffers fewer such problems
and therefore requires less complex and less expensive designs. PCI-X buses, like PCI, are
half-duplex bidirectional, whereas PCIe buses are full-duplex bidirectional. PCI-X buses run only
as fast as the slowest device; PCIe devices are able to independently negotiate the bus speed.
First-generation PCI-E transfers 2.5 Gbit/s per lane; each 8-bit byte is encoded into a 10-bit
character containing enough signal transitions for the receiver to extract
the clock. The astute reader will notice that there are 4 times more combinations than characters
used. Some of these additional characters are discarded because they contain an insufficient number of
edges within the 10-bit packet to extract the clock. Others are used to encode error commands.
Some may be used to provide "DC balancing" so that the wire does not acquire an electrical charge.
The remainder are simply not used. Each lane therefore transmits 250 MB/s. The maximum number of
lanes supported is x32, so 250 MB/s x 32 x 2 (for bi-directionality) gives 16 GB/s as a theoretical
maximum transfer rate.
PCI-E uses a fabric topology. A fabric is composed of point-to-point Links that interconnect a set of
components; an example fabric topology is shown in Figure 12. This figure illustrates a single
fabric instance, referred to as a hierarchy, composed of a Root Complex (RC), multiple Endpoints
(I/O devices), a Switch, and a PCI Express-PCI Bridge, all interconnected via PCI Express Links.
Each of the components of the topology is mapped into a single flat address space and can be
accessed using PCI-like load/store transaction semantics.
(7) A Switch arbitrates using round robin or weighted round robin when contention occurs on the
same Virtual Channel. This is described in more detail later within the specification.
(8) Endpoint devices (represented by Type 00h Configuration Space headers) may not appear to
configuration software on the switch's internal bus as peers of the virtual PCI-to-PCI Bridges
representing the Switch Downstream Ports.
PCI Express uses packets to communicate information between components. Packets are formed
in the Transaction and Data Link Layers to carry the information from the transmitting component
to the receiving component. As the transmitted packets flow through the other layers, they are
extended with additional information necessary to handle packets at those layers. At the receiving
side the reverse process occurs and packets get transformed from their Physical Layer
representation to the Data Link Layer representation and finally (for Transaction Layer Packets) to
the form that can be processed by the Transaction Layer of the receiving device. Figure 15 shows
the conceptual flow of transaction level packet information through the layers.
The PCI-E physical Layer interface is known by the acronym PIPE which stands for "Physical
Interface for PCI Express". At the electrical level, each lane utilizes two unidirectional Current
Mode Logic (CML) pairs at 2.5 Gbit/s. Transmit and receive are separate differential pairs, for a
total of 4 data wires per lane.
A connection between any two PCIe devices is known as a "link" and is built up from a collection
of one or more lanes. All devices must minimally support a single-lane (x1) link. Devices may
optionally support wider links composed of 2, 4, 8, 12, 16, or 32 lanes. This allows for very good
compatibility in two ways: a PCIe card will physically fit (and work correctly) in any slot that is
at least as large as it is (e.g., an x1-sized card will work in any sized slot), and a slot of a large
physical size (e.g., x16) can be wired electrically with fewer lanes (e.g., x1 or x8) as long as it
provides the power and ground connections required by the larger physical slot size. In both cases,
PCIe will negotiate the highest mutually supported number of lanes. It is not possible to place a
physically larger PCIe card (e.g., an x16-sized card) into a smaller slot, even though the two would
be signal-compatible if it were possible.
PCIe sends all control messages, including interrupts, over the same links used for data. The serial
protocol can never be blocked, so latency is still comparable to PCI, which has dedicated interrupt
lines. Data transmitted on multiple-lane links is interleaved, meaning that each successive byte is
sent down a successive lane. The PCIe specification refers to this interleaving as "data striping."
While it requires significant hardware complexity to synchronize (deskew) the incoming striped
data, striping can significantly increase the throughput of the link. Due to padding requirements,
striping may not necessarily reduce the latency of small data packets on a link.
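Data striping as described here simply deals successive bytes to successive lanes; a minimal sketch (the function name is illustrative):

#include <cstddef>
#include <cstdint>
#include <vector>

// "Data striping": byte k of the packet travels on lane (k % lanes).
std::vector<std::vector<uint8_t>>
stripe(const std::vector<uint8_t>& packet, int lanes) {
    std::vector<std::vector<uint8_t>> out(lanes);
    for (std::size_t k = 0; k < packet.size(); ++k)
        out[k % lanes].push_back(packet[k]);
    return out;   // the receiver deskews and re-interleaves in the same order
}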
As with all high data rate serial transmission protocols, clocking information must be embedded in
the signal. At the physical level, PCI Express utilizes the very common 8B/10B encoding scheme
to ensure that strings of consecutive ones or consecutive zeros are limited in length. This is
necessary to prevent the receiver from losing track of where the bit edges are. In this coding
scheme every 8 (uncoded) payload bits of data are replaced with 10 (encoded) bits of transmit data,
consuming an extra 25% of the overall electrical bandwidth.
Many other protocols (such as SONET) use a different form of encoding known as "scrambling"
to embed clock information into data streams. The PCI Express specification also defines a
scrambling algorithm, but it is used to reduce EMI (Electromagnetic interference) by preventing
repeating data patterns in the transmitted data stream.
First-generation PCIe is constrained to a single signaling rate of 2.5 Gbit/s. The PCI Special
Interest Group (the industry organization that maintains and develops the various PCI standards)
plans future versions adding signaling rates of 5 and 10 Gbit/s.
The PCI-E Data Link Layer implements the sequencing of the Transaction Layer Packets (TLPs)
that are generated by the Transaction Layer, data protection via a 32-bit cyclic redundancy check
code (CRC, known in this context as LCRC) and an acknowledgment protocol (ACK and NAK
signaling). TLPs that pass an LCRC check and a sequence number check result in an
acknowledgment, or ACK, while those that fail these checks result in a negative acknowledgment,
or NAK. TLPs that result in a NAK, or timeouts that occur while waiting for an ACK, result in the
TLPs being replayed from a special buffer in the transmit data path of the Data Link Layer. This
guarantees delivery of TLPs in spite of electrical noise, barring any malfunction of the device or
transmission medium. ACK and NAK signals are communicated via a low-level packet known as
a data link layer packet, or DLLP. DLLPs are also used to communicate flow control information
between the transaction layers of two connected devices, as well as some power management
functions.
The PCI-E transaction layer implements split transactions (transactions with request and response
separated by time), allowing the link to carry other traffic while the target device gathers data for
the response.
PCI Express utilizes credit-based flow control. In this scheme, a device advertises an initial
amount of credit for each of the receive buffers in its Transaction Layer. The device at the opposite
end of the link, when sending transactions to this device, will count the number of credits
consumed by each TLP from its account. The sending device may only transmit a TLP when doing
so does not result in its consumed credit count exceeding its credit limit. When the receiving
device finishes processing the TLP from its buffer, it signals a return of credits to the sending
device, which then increases the credit limit by the restored amount. The credit counters are
modular counters, and the comparison of consumed credits to credit limit requires modular
arithmetic. The advantage of this scheme (compared to other methods such as wait states or
handshake-based transfer protocols) is that the latency of credit return does not affect performance,
provided that the credit limit is not encountered. This assumption is generally met if each device is
designed with adequate buffer sizes.
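A minimal sketch of the credit accounting just described, with modular counters; the 8-bit counter width and the names are assumptions for illustration, not the PCIe specification's actual register layout:

#include <cstdint>

// Credit-based flow control sketch. The counters are modular, so the
// consumed-vs-limit comparison must use wrap-around subtraction.
struct CreditGate {
    uint8_t credit_limit = 0;   // advertised by the receiver
    uint8_t consumed = 0;       // credits this sender has used so far

    // A TLP may be sent only if it would not push 'consumed' past the limit.
    bool can_send(uint8_t tlp_credits) const {
        // Modular arithmetic: remaining = limit - consumed (mod 256).
        uint8_t remaining = static_cast<uint8_t>(credit_limit - consumed);
        return tlp_credits <= remaining;
    }

    void send(uint8_t tlp_credits) { consumed += tlp_credits; }

    // The receiver frees buffer space and returns credits via a DLLP.
    void credits_returned(uint8_t n) { credit_limit += n; }
};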
First-generation PCIe is often quoted as supporting a data rate of 250 MB/s in each direction, per lane.
This figure is the physical signaling rate (2.5 Gbaud) divided by the encoding
overhead (10 bits/byte). A 16-lane (x16) PCIe card would then be theoretically capable
of 250 x 16 = 4 GB/s in each direction. While this is correct in terms of data bytes, more
meaningful calculations are based on the usable data payload rate, which depends on the
profile of the traffic, which is in turn a function of the high-level (software) application and
intermediate protocol levels. Like other high data rate serial interconnect systems, PCIe has
protocol and processing overhead due to the additional transfer robustness (CRC and
acknowledgments). Long continuous unidirectional transfers (such as those typical in
high-performance storage controllers) can approach more than 95% of PCIe's raw (lane) data
rate. These transfers also benefit the most from an increased number of lanes (x2, x4, etc.).
But in more typical applications (such as a USB or Ethernet controller), the traffic profile is
characterized by short data packets with frequent enforced acknowledgments. This type of traffic
reduces the efficiency of the link, due to overhead from packet parsing and forced interrupts
(either in the device's host interface or the PC's CPU). This loss of efficiency is not particular to PCIe.
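The raw rate figures above follow directly from the signaling rate and the 8b/10b overhead; a short worked calculation:

#include <cstdio>

// Gen-1 PCIe raw data rate: 2.5 Gbaud per lane with 8b/10b encoding
// (10 line bits per payload byte), so 2.5e9 / 10 = 250 MB/s per lane.
int main() {
    const double gbaud = 2.5e9;
    const double mb_per_lane = gbaud / 10.0 / 1e6;                       // 250 MB/s
    std::printf("x1:  %.0f MB/s per direction\n", mb_per_lane);
    std::printf("x16: %.0f MB/s per direction\n", 16 * mb_per_lane);     // 4000 MB/s = 4 GB/s
    return 0;
}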
Most modern computer systems are built around transaction-based I/O buses that transfer data
between devices and memory, or directly from one device to another. In designing this PCI
multi-bus simulation system, it is rather important to check that the PCI bus and the
PCI-to-PCI Bridge (P2PB) are able to transfer data to and from the attached devices at the rate
required for correct operation of each device. This section describes a simulator for evaluating
the use of the PCI bus and P2PB. The simulator is based on the PCI Local Bus Specification and
the P2PB Architecture Specification.
The PCI bus is controlled by a complex protocol that permits several different forms of
transaction, as follows:
(1) Data transfer transactions focus on transferring data between PCI bus devices or between
PCI bus devices and main memory.
Our simple simulator deals only with regular data transfer transactions, on a single PCI bus or
on multiple PCI buses connected by P2PBs. It can simulate the operation of both the 32-bit and
64-bit versions of the PCI bus at synchronous clock frequencies of 33 MHz, 66 MHz, or 133 MHz.
This multi-bus simulator can simulate transactions on six to eight buses.
The timing diagrams in this section show the relationship of significant signals involved in 32-bit
transactions. When a signal is drawn as a solid line, it is actively being driven by the current
master or target. When a signal is drawn as a dashed line, no agent is actively driving it. However,
it may still be assumed to contain a stable value if the dashed line is at the high rail. Tri-stated
signals are indicated to have indeterminate values when the dashed line is between the two rails
(e.g., AD or C/BE# lines). When a solid line becomes a dotted line, it indicates the signal was
actively driven and now is tri-stated. When a solid line makes a low to high transition and then
becomes a dotted line, it indicates the signal was actively driven high to precharge the bus and
then tri-stated.
Figure 16 illustrates a read transaction. It starts with an address phase, which occurs when
FRAME# is asserted for the first time, on clock 2. During the address phase, AD[31::00]
contain a valid address and C/BE[3::0]# contain a valid bus command.
The first clock of the first data phase is clock 3. During the data phase, C/BE# indicate which byte
lanes are involved in the current data phase. A data phase may consist of wait cycles and a data
transfer. The C/BE# output buffers must remain enabled (for both reads and writes) from the first
clock of the data phase through the end of the transaction. This ensures C/BE# are not left floating
for long intervals. The C/BE# lines contain valid byte enable information during the entire data
phase, independent of the state of IRDY#. The C/BE# lines contain the byte enable information for
data phase N+1 on the clock following the completion of data phase N. This is not shown in
Figure 16 because a burst read transaction typically has all byte enables asserted; however, it is
shown in Figure 17. Notice that on clock 5 in Figure 17 the master inserts a wait state by de-asserting
IRDY#. However, the byte enables for data phase 3 are valid on clock 5 and remain valid until the
data phase completes on clock 8.
The first data phase on a read transaction requires a turnaround cycle (enforced by the target via
TRDY#). In this case, the address is valid on clock 2 and then the master stops driving AD. The
earliest the target can provide valid data is clock 4. The target must drive the AD lines following
the turnaround cycle when DEVSEL# is asserted. Once enabled, the output buffers must stay
enabled through the end of the transaction. (This ensures that the AD lines are not left floating for
long intervals.)
One way for a data phase to complete is when data is transferred, which occurs when both IRDY#
and TRDY# are asserted on the same rising clock edge. (TRDY# cannot be driven until
DEVSEL# is asserted.) When either IRDY# or TRDY# is de-asserted, a wait cycle is inserted and
no data is transferred. As noted in Figure 16, data is successfully transferred on clocks 4, 6, and 8,
and wait cycles are inserted on clocks 3, 5, and 7. The first data phase completes in the minimum
time for a read transaction. The second data phase is extended on clock 5 because TRDY# is
de-asserted. The last data phase is extended because IRDY# was de-asserted on clock 7.
The master knows at clock 7 that the next data phase is the last. However, because the master is
not ready to complete the last transfer (IRDY# is de-asserted on clock 7), FRAME# stays asserted.
Only when IRDY# is asserted can FRAME# be de-asserted as occurs on clock 8, indicating to the
target that this is the last data phase of the transaction.
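In simulation terms, the completion rule above reduces to sampling IRDY#, TRDY#, and DEVSEL# on each rising clock edge. A minimal sketch (names are mine; the signals are modeled active-high for readability, whereas the real lines are active-low):

// Per-clock evaluation of one PCI data phase, as in the timing diagrams
// above: data transfers only when IRDY# and TRDY# are both asserted on the
// same rising clock edge; otherwise a wait cycle is inserted.
struct PhaseResult { bool transferred; bool wait_cycle; };

PhaseResult clock_edge(bool irdy, bool trdy, bool devsel) {
    // TRDY# cannot be driven until DEVSEL# is asserted.
    bool target_ready = devsel && trdy;
    if (irdy && target_ready) return {true, false};   // data transferred
    return {false, true};                             // wait cycle inserted
}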
Figure 17 illustrates a write transaction. The transaction starts when FRAME# is asserted for the
first time, which occurs on clock 2. A write transaction is similar to a read transaction, except that no
turnaround cycle is required following the address phase because the master provides both address
and data. Data phases work the same for both read and write transactions.
In Figure 17, the first and second data phases complete with zero wait cycles. However, the third
data phase has three wait cycles inserted by the target. Notice that both agents insert a wait cycle on
clock 5. IRDY# must be asserted when FRAME# is de-asserted, indicating the last data phase.
The data transfer was delayed by the master on clock 5 because IRDY# was de-asserted. The last
data phase is signaled by the master on clock 6, but it does not complete until clock 8. Note:
although this allowed the master to delay data, it did not allow the byte enables to be delayed.
Generally speaking, a transaction on the PCI bus involves an address phase, in which an address is
transferred, followed by a sequence of one or more data phases, in each of which a data item is
transferred. Each transaction is initiated by a bus device acting as master and involves another
participant known as the target. Data transfers may be oriented towards the master or towards the
target, with the corresponding transactions respectively being denoted read and write transactions.
The address specified at the start of the transaction is the address within the target of the first data
item transferred.
The PCI bus may be implemented in a 32-bit or 64-bit version. In the 32-bit version, an address is
represented by 4 bytes, which are transmitted in a single bus clock cycle, and each data item can
be from 0 to 4 bytes, likewise transmitted in a single bus clock cycle. In the 64-bit
version, an address is represented by 8 bytes and each data item by 0 to 8 bytes, which are
transmitted in a single clock cycle. According to the PCI bus specification, the number of data
bytes to be transferred in a given data phase is specified by byte enable bus signals presented
immediately prior to the actual transfer of data, and it may vary from one data phase to the next
within a burst transaction. The simulator makes the simplifying assumption that the first (N-1)
data phases of an N data phase transaction transfer data of the maximum width (4
bytes or 8 bytes) for the bus version concerned, and that only the final transfer may involve fewer
bytes, to allow for data transfers whose length is not an integral multiple of 4 or 8 bytes,
respectively.
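Under this simplifying assumption, the number of data phases needed for a buffer is just a ceiling division by the bus width; a minimal sketch:

// Number of data phases for an N-byte buffer under the simulator's
// assumption: all phases but the last transfer the full bus width
// (4 bytes for 32-bit PCI, 8 bytes for 64-bit PCI).
int data_phases(int buffer_bytes, int bus_width_bytes) {
    return (buffer_bytes + bus_width_bytes - 1) / bus_width_bytes;   // ceiling
}
// e.g. data_phases(10, 4) == 3: two 4-byte phases and one 2-byte phase.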
The PCI bus specification allows the target to delay the transfer of a data item by inserting wait
states, each lasting one clock cycle, in any data phase. In more detail, the first data phase of a
transaction can be extended to at most 16 clock cycles and each subsequent data phase to at most 8
clock cycles. The simulator allows the user to specify a maximum number of wait states, n_w, for each
device. Note that these wait states are specified for the device as master, although strictly speaking
they describe the behavior of the device's target in the system under consideration. The user can
also specify whether the actual number of wait states used in each data phase is selected
deterministically or stochastically. With deterministic selection, exactly n_w wait states are inserted in
every data phase. With stochastic selection, the simulator selects for each data phase a random
whole number in [0...n_w] as the number of wait states to be inserted. These features make it possible to
simulate the behavior of devices with slow response and of targets such as main memory, which
may exhibit access contention.
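A minimal sketch of the two selection modes, assuming a uniform distribution on [0, n_w] for the stochastic case (the class name is hypothetical; the source's actual generator may differ):

#include <random>

// Wait-state generator: deterministic mode always inserts n_w wait states
// per data phase; stochastic mode draws uniformly from [0, n_w].
class WaitStateGen {
public:
    WaitStateGen(int max_waits, bool stochastic)
        : max_(max_waits), stochastic_(stochastic), dist_(0, max_waits) {}

    int next() {
        return stochastic_ ? dist_(rng_) : max_;
    }

private:
    int max_;          // 0 <= n_w <= 8 (the first phase may allow more)
    bool stochastic_;
    std::mt19937 rng_{std::random_device{}()};
    std::uniform_int_distribution<int> dist_;
};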
5.3 Parameters of devices, data generation, input, output, and the bus
In this simulator, every device can act as a PCI bus master that initiates either read or
write data transfer transactions using a specified block of data. A device is defined as a
logical PCI bus device that works according to the PCI bus specification. If a physical device
that can both read and write is to be simulated, or if the device uses several different block
sizes or buffer sizes, then two or more PCI bus devices need to be defined. Every device has
several basic characteristics, listed in Table 1.
Priority: A positive integer giving the priority of the device during bus arbitration under the
fixed priority scheme. For rotating and time quantum priority arbitration, the value is ignored.
Transaction type: There are two kinds of transaction: Read and Write.
Buffer size: A positive integer giving the size in bytes of the buffer which the device as master
will try to fill or empty during each transaction. If no new devices become active during the
transaction, all the data will be transferred in a single burst. If pre-emption occurs before the
end of the transaction, the rest of the buffer will be transferred in one or more following
secondary transactions.
Maximum data rate: The maximum data rate that the device can reach. If the contents of the
buffer have not been completely transferred by the time the next buffer full is generated, the
user is warned that a data overrun has occurred.
Maximum wait states: The maximum number of wait states which can be inserted in each data
phase by the device's target in the system.
Wait state generator process: Stochastic if the number of wait states is selected randomly from
the interval [0...n_w] for each data transfer phase, or Deterministic if the number of wait
states is exactly n_w in each data transfer phase (0 <= n_w <= 8).
Latency timer: The initial value used for the latency timer at the start of each new transaction
of the device.
Table 1. Device parameters
As for the input parameters, they set general parameters of the simulation and of the bus. When
the simulator runs, it presents a dialogue for the parameter settings, and the user types them in
interactively. The input parameters are shown in Table 2.

Number of masters: A positive integer, the number of logical PCI bus devices.
Number of clock cycles in simulation: A positive integer, the length of the simulation in clock cycles.
Throughput plot required (y/n): If yes, the simulator will plot a 2D diagram of bus throughput
against applied load. (y/n responses are not case sensitive.)
......
Data rate plot required (y/n): If yes, the simulator will plot a 3D diagram of bus utilization
against time at all applied loads. (y/n responses are not case sensitive.)
Histogram of transfer times plot required (y/n): If yes, the simulator will plot a 3D histogram
of transfer times at all applied loads. (y/n responses are not case sensitive.)
Table 2. Input parameters
If a throughput plot is wanted, the user enters a name for the file that will contain the data to be
sent to gnuplot. If a data rate plot is wanted, the user gives a name for the file to contain the data
for gnuplot, and also the size, in clock cycles, of the time slot used to sample the utilization of
the bus. A point appears in the gnuplot diagram for each time slot. Choosing a smaller value
creates a more accurate diagram of how bus utilization changes with time, but a larger file will
be generated. If a histogram of transfer times is wanted, the user enters a name for the file to
contain the data for gnuplot, and also the number of histogram bins per unit of normalized
transfer time, which is the time taken to transfer a buffer of data for the corresponding device
if no pre-emption occurs and no wait states are inserted during the transfer.
Output from the simulator consists of some standard information directed to the standard output
stream stdout and a number of optional forms of output, as selected by the user during the initial
dialogue. Table 4 shows the standard output parameters.

The amount of generated data: the data generated by all of the PCI bus devices during the
simulation period.
The amount of transferred data: the data actually transferred on the PCI bus during the
simulation period.
The average burst lengths: the average burst length of the data transferred by each PCI bus device.
Table 4. Standard output parameters

In this section, I have introduced only the basic input and output information; I do not give more
details here for reasons of space.
Then the constant values and parameters are defined (Table 5 shows some of them):

#define IDLE 0      // These four constants are used to implement the data
#define ADDRESS 1   // transfer transaction operations, including addressing,
#define WAIT 2      // inserting overhead, and data transfer.
#define DATA 3

double f;                 // PCI bus clock frequency (MHz)
int width;                // PCI bus data width (bytes)
int arb;                  // PCI bus arbitration scheme
int nm;                   // number of PCI bus masters
......
int buff_size[];          // buffer size for masters 1..n (bytes)
int read_write[];         // R or W transaction type for masters 1..n
char rw_both[];           // both R/W transaction type for masters 1..n
int status_selection[];   // wait state selection for masters 1..n
int LT[];                 // latency timer values
int PB[];                 // PCI bridge selection
int PBAbus1[];            // Bus0 devices access Bus1 devices
int PBAbus0[];            // Bus1 devices access Bus0 devices
......
long int sim_total;       // total simulation time (clock cycles)
long int t;               // current simulation time
long int slot_width;      // width of the time slot for the activity graph
long int slot_current;    // time of the end of the current time slot
long int slot_data;       // amount of data transferred at the start of the slot
......
int pre_empt;             // true if a new PCI bus master has requested the bus
int throughput_plot;      // true if a throughput file is required
int rate_plot;            // true if a data rate plot is required
int verbose;              // true if the user wants details of each transfer
char tqvalue;             // time quantum scheme temporary parameter

Table 5. Constant values and parameters
Finally, I declare the relevant functions and define the relevant classes (Table 6):

class device {
public:
    void pparams_bus1(FILE *fp);
    void pparams_bus2(FILE *fp);
    void pparams_bus3(FILE *fp);
    ......
    void pparams_bus0_bus1(FILE *fp);
    void pparams_bus1_bus0(FILE *fp);
    ......
    ~device();
};

class arbitrator {
public:
→ set the remaining data in every device's buffer to zero in the corresponding array
→ set every device's burst length to zero in the corresponding array
→ set the accumulated number of bursts to zero in the corresponding array
→ other initialization
Part 2: If a new buffer full of data is due for a device, check that the current buffer is empty,
insert the data, and decide the instant when the next buffer is due.
Part 3: If there are one or more devices with data in their buffers, select the device with the
highest priority to perform the data transaction, according to the arbitration algorithm.
Part 4: When the device with the highest priority starts to transfer data, a switch() statement
implements the whole data transfer process, including the IDLE operation (overhead
before transferring data), the ADDRESS operation (transferring the address), the WAIT
operation (executing wait states during the transfer), and the DATA operation
(transferring data items singly or as a burst). A sketch of this state machine is given below.
All the parts mentioned above are used by every device when it gains control of the bus.
(5) During the data transfer process, the system outputs the related data and results to the
corresponding files for plotting. The real-time output can also be seen on the screen.
(6) Finally, at the end of execution, the system calls GNUPLOT automatically and plots the
throughput, data rate, and histogram according to the user's requirements.
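Here is a minimal sketch of the Part 4 state machine, assuming one state transition per simulated clock. The names are simplified stand-ins for the actual source, and waits() is a hypothetical callback supplying the wait states for each phase:

enum State { IDLE = 0, ADDRESS = 1, WAIT = 2, DATA = 3 };

struct Transfer {
    int phases_left;   // data phases remaining in the current burst
    int waits_left;    // wait states remaining before the next data item
};

// One simulated clock of the Part 4 state machine.
State step(State s, Transfer& t, int (*waits)()) {
    switch (s) {
    case IDLE:                          // overhead before the transaction
        return ADDRESS;
    case ADDRESS:                       // address and bus command transferred
        t.waits_left = waits();
        return t.waits_left > 0 ? WAIT : DATA;
    case WAIT:                          // one inserted wait cycle
        return --t.waits_left > 0 ? WAIT : DATA;
    case DATA:                          // one data item transferred
        if (--t.phases_left == 0) return IDLE;   // burst finished
        t.waits_left = waits();
        return t.waits_left > 0 ? WAIT : DATA;
    }
    return IDLE;
}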
Fixed priority algorithm: this is simple; it arbitrates among devices according to their fixed
priorities. The device with the highest priority always gets control of the bus for the data transfer
operation. For example, if device 1's priority is 2, device 2's priority is 3, and device 3's priority is 4,
and they request control of the bus simultaneously, the arbitration scheme decides which one gets
the bus according to the priority values, the smaller the better.
Rotating priority algorithm: the arbitration scheme does not consider each device's priority but
instead operates in a fixed sequence. For example, when device 1 finishes its data transaction and
device 2 requests a data transfer, the arbiter checks device 2's buffer; if it is not empty, the arbiter
gives device 2 control of the bus. Otherwise, device 2 gets no bus control and the arbiter checks
device 3 for a data transaction.
Time quantum algorithm: the arbiter gives every device the same time quantum in which to transfer
its data. The time quantum is a constant equal to 16 clock cycles. If a device's operation time is
smaller than the time slice (time quantum), it finishes its data transaction normally. Otherwise, when
its operation does not finish within the fixed time slice, the arbitration scheme stops it, pushes its
parameters onto a stack for safekeeping, and selects the next device to use the bus for data transfer
according to the corresponding algorithm.
Round robin algorithm: this algorithm is almost the same as the time quantum algorithm, but a
little simpler. I did not use it in my source code, but it could be added if specially required.
In our simulator, we mainly use the former two arbitration algorithms; the time quantum algorithm
is used only for some simple comparative testing as a supplement. Sketches of the two main
schemes follow.
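These minimal sketches illustrate the two main schemes, assuming has_data[i] is true when device i's buffer is non-empty; the function names are mine, not the simulator's:

#include <vector>

// Fixed priority: the smallest priority value wins among the requesters.
int fixed_priority(const std::vector<int>& prio, const std::vector<bool>& has_data) {
    int winner = -1;
    for (int i = 0; i < static_cast<int>(prio.size()); ++i)
        if (has_data[i] && (winner < 0 || prio[i] < prio[winner]))
            winner = i;
    return winner;   // -1 if nobody wants the bus
}

// Rotating priority: scan the devices in sequence, starting after the last owner.
int rotating(int last_owner, const std::vector<bool>& has_data) {
    int n = static_cast<int>(has_data.size());
    for (int k = 1; k <= n; ++k) {
        int i = (last_owner + k) % n;
        if (has_data[i]) return i;
    }
    return -1;
}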
If the user requests a file containing data for a data rate plot, an output file containing data
suitable for reading into gnuplot is generated. The data to be plotted consists of
(time, load, PCI bus utilization) triplets, (t, l, u), suitable for plotting as a 3D plot with the
relevant GNUPLOT command. Figure 19 shows an example of a 3D data rate plot.
If the user requests a file containing data for a histogram, an output file containing data
suitable for reading into gnuplot is generated. The data to be plotted is composed of
(transfer time, load, samples) triplets, (t, l, n), suitable for plotting as a 3D plot via the relevant
gnuplot command. Figure 20 shows a 3D histogram plot. In this example, because there are several
devices, the data file contains several samples columns, one for each device. When displaying the
file, the user must select the desired samples column using gnuplot commands as follows:

gnuplot> load "name of file.gnp"
gnuplot> splot "name of data file" using 1:2:4 notitle

In this example, the using 1:2:4 parameter to the splot command specifies that time values are
selected from column 1, load from column 2, and samples from column 4 of the file. Transfer
times for several devices can be plotted on the same histogram by giving several file and using
specifications in the splot command. For example:

gnuplot> load "name of file.gnp"
gnuplot> splot "name of data file" using 1:2:3, "name of data file" using 1:2:6
In Figure 21, we can see that the CPU is connected to PCI Bus0 and that a PCI bridge connects the
primary bus to the secondary PCI bus, PCI Bus1. In the PCI Bridge specification, PCI Bus1 is
described as being downstream of the PCI-PCI bridge, and PCI Bus0 is upstream of the bridge.
The SCSI and Ethernet devices are connected to the secondary PCI bus. Physically, the bridge,
the secondary PCI bus, and the two devices would be contained on the same PCI card. The PCI-ISA
bridge supports older, legacy ISA devices.
If we wanted to design and implement a PCI multi-bus hardware product, we would have to spend
a lot of time thinking about PCI address space assignment, the PCI configuration header, PCI I/O,
and PCI memory addresses. We would also have to deal with the PCI-PCI bridges' PCI
configuration cycles, PCI bus numbering, and PCI I/O and PCI memory windows. Under Linux, the
kernel PCI data structures look like Figure 22. Each PCI device (including the P2P bridges) is
described by a pci_dev data structure, and every PCI bus is described by a pci_bus data structure.
The result is a tree structure of PCI buses, each of which has a number of child PCI devices
attached to it. As a PCI bus can only be reached using a PCI-PCI bridge (except the primary PCI
bus, bus 0), each pci_bus contains a pointer to the PCI device (the PCI-PCI bridge) through which
it is accessed. That PCI device is a child of the PCI bus's parent PCI bus.
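A much simplified sketch of that tree shape follows. These structs are illustrative only, not the actual Linux kernel definitions of pci_dev and pci_bus:

#include <vector>

struct pci_dev_t;                      // a PCI device, possibly a PCI-PCI bridge

// Each bus records the bridge it is reached through, its parent bus,
// and the child devices attached to it, giving the tree of Figure 22.
struct pci_bus_t {
    int number;                        // bus 0 is the primary bus
    pci_dev_t* self;                   // bridge spawning this bus (null for bus 0)
    pci_bus_t* parent;                 // upstream bus (null for bus 0)
    std::vector<pci_dev_t*> devices;   // devices attached to this bus
};

struct pci_dev_t {
    pci_bus_t* bus;                    // the bus this device sits on
    pci_bus_t* subordinate;            // the bus it spawns, if it is a bridge
};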
When Linux looks for downstream PCI buses, it must configure the intervening P2P bridges'
secondary and subordinate bus numbers. On the hardware side, P2P bridges need to know the
primary bus number, the secondary bus number, the subordinate bus number, and the PCI I/O and
PCI memory windows in order to pass PCI I/O, PCI memory, or PCI configuration address space
reads/writes across them. How does Linux deal with this problem? At the time you wish to
configure any given PCI-PCI bridge, you do not know the subordinate bus number for the bridge,
you do not know whether there are further P2P bridges downstream, and you do not know what
numbers will be assigned to them. The answer is a depthwise recursive algorithm: scan each bus
for P2P bridges, assigning them numbers as they are found. As each PCI-PCI bridge is found and
its secondary bus numbered, it is assigned a temporary subordinate number of 0xFF while numbers
are scanned and assigned to all PCI-PCI bridges downstream of it. This all seems complicated,
but the worked example below makes the process clearer.
Here I give a simple example of the Linux P2P bridge numbering process. Four basic steps
describe the whole process:
Step 1: In Figure 23, the first bridge scan finds Bridge1. The PCI bus downstream of
Bridge1 is numbered 1, and Bridge1 is assigned a secondary bus number of 1 and a
temporary subordinate bus number of 0xFF. This means that all Type 1 PCI Configuration
addresses specifying a PCI bus number of 1 or higher are passed across Bridge1 and
onto PCI Bus 1. They are translated into Type 0 Configuration cycles if they have a
bus number of 1, but left untranslated for all other bus numbers. This is exactly what the
Linux PCI initialisation code needs to do in order to go on and scan PCI Bus 1.
Step 2: Linux uses a depthwise algorithm, so the initialisation code goes on to scan PCI Bus 1.
Here it finds PCI-PCI Bridge2. There are no further PCI-PCI bridges beyond
Bridge2, so it is assigned a subordinate bus number of 2, which matches the number
assigned to its secondary interface. Figure 24 shows how the buses and PCI-PCI bridges are
numbered at this point.
Step 3: The PCI initialisation code returns to scanning PCI Bus 1 and finds another PCI-PCI
bridge, Bridge3. It is assigned 1 as its primary bus interface number, 3 as its secondary bus
interface number, and 0xFF as its subordinate bus number. Figure 25 shows how the system
is configured now. Type 1 PCI configuration cycles with a bus number of 1, 2, or 3 will be
correctly delivered to the appropriate PCI buses.
Step 4: Linux starts scanning PCI Bus 3, downstream of PCI-PCI Bridge3. PCI Bus 3 has another
PCI-PCI bridge (Bridge4) on it, which is assigned 3 as its primary bus number and 4 as its
secondary bus number. It is the last bridge on this branch, so it is assigned a
subordinate bus interface number of 4. The initialisation code returns to PCI-PCI Bridge3
and assigns it a subordinate bus number of 4. Finally, the PCI initialisation code can assign
4 as the subordinate bus number for PCI-PCI Bridge1. Figure 26 shows the final bus
numbers.
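The four steps amount to a single depth-first numbering pass. A minimal recursive sketch, using illustrative structures rather than the kernel's own (the host bridge, bridge number 0, is assumed to have spawned bus 0 before the call):

#include <vector>

// Depth-first PCI bus numbering, as in Steps 1-4 above. Each bridge's
// secondary bus gets the next free number; its subordinate number is
// temporarily 0xFF, then fixed up once everything below it is scanned.
struct Bridge {
    int primary = -1, secondary = -1, subordinate = -1;
    std::vector<Bridge*> downstream;   // bridges found on the secondary bus
};

int scan_bus(int bus_number, std::vector<Bridge*>& bridges_on_bus, int next_free) {
    for (Bridge* b : bridges_on_bus) {
        b->primary = bus_number;
        b->secondary = next_free++;
        b->subordinate = 0xFF;                                  // temporary
        next_free = scan_bus(b->secondary, b->downstream, next_free);
        b->subordinate = next_free - 1;                         // highest bus below
    }
    return next_free;
}
// scan_bus(0, bridges_on_bus0, 1) reproduces the numbering shown in Figure 26.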
These four steps involve not only hardware but also software, and they are rather complex to
implement. Because our PCI multi-bus simulator is implemented entirely in software under Red Hat
Linux, it performs only simple operations and transactions. Figure 27 shows the simple
system structure. We design and implement six to eight PCI buses (including Bus0) and five to
seven P2P bridges. Our bus structure is star-shaped (Bus1 to Busn extend from Bus0).
In the multi-PCI-bus situation, besides the single PCI bus functions, I add several new
ones, such as Bridge_Bus1(), Bridge_Bus2()...Bridge_Bus6() and Bus0_access_Bus(),
Bus1_access_Bus()...Bus6_access_Bus(). These functions deal with the data transfer transactions
among the devices of Bus0, Bus1, Bus2...Bus6. I briefly introduce these functions here.
PCI_Bridge_Bus1(): this function implements data transactions among Bus1's devices.
According to the selected arbitration scheme, when PCI Bridge1 gets control of Bus0, it checks
whether any of Bus1's devices want a data transfer transaction. If one or more of them do,
Bridge1 lets them perform the relevant operations; when they finish, Bridge1 releases control of
Bus0 and another Bus0 device can take control to perform its own operation. This function calls
PCI_Bus1_access_Bus(), which lets Bus1's devices access any other bus's devices. In addition,
PCI_Bridge_Bus1() could extend new subordinate buses according to our original design,
although this feature has not yet been implemented.
PCI_Bridge_Bus2(): this function implements data transactions among Bus2's devices in the same
way. When PCI Bridge2 gets control of Bus0 under the selected arbitration scheme, it checks
whether any of Bus2's devices want a data transfer transaction, lets them perform the relevant
operations, and releases control of Bus0 when they finish. This function calls
PCI_Bus2_access_Bus(), which lets Bus2's devices access any other bus's devices. Like
PCI_Bridge_Bus1(), it could extend new subordinate buses according to our original design,
although this feature has not yet been implemented.
PCI_Bus0_access_Bus1(): this function focuses on Bus0's devices accessing Bus1's devices. When
one of Bus0's devices (other than a bridge) wants to access Bus1's devices, we make this Bus0
device and all of Bus1's devices into a new group. In this new group, we can use a different
arbitration algorithm to implement data transfer transactions among the devices. When the
corresponding data transaction finishes, this new group disbands automatically.
PCI_Bus0_access_Bus(): this function lets Bus0's non-bridge devices access any other bus's
devices, such as those on Bus1, Bus2, Bus3...Bus6. The accessing process is almost the same as in
PCI_Bus0_access_Bus1().
PCI_Bus1_access_Bus(): this function lets Bus1's non-bridge devices access any other bus's
devices, such as those on Bus0, Bus2, Bus3...Bus6. The accessing process is almost the same as in
PCI_Bus1_access_Bus0(). From PCI_Bus0_access_Bus(), the accessing processes for Bus2, Bus3,
Bus4, Bus5, and Bus6 follow in the same way.
All the functions mentioned above are the main ones in the multi-PCI-bus simulator. Figure 28
shows the source code execution flow, from which the simulation process can be followed. I do
not give further details of the others.
As for the whole PCI Multi-Bus simulator, I have introduced only the basics of its design and
implementation; for more detail, please review my source code. This simulator is intended only
for PCI bus research, and it still contains many logic bugs that cause problems. Therefore, it may
not give an accurate simulation of PCI bus data transfer transactions, and its results should be
used only as a reference. However, I believe that this simulator will become more and more
robust with further development.
REFERENCE
[1] PCI Local Bus Specification, revision 3.0, August 12, 2002
[2] PCI-to-PCI Bridge Architecture Specification, revision 1.1, December 18, 1998
[3] http://en.wikipedia.org/wiki/PCI-X
[4] http://en.wikipedia.org/wiki/PCI_Express#_note-2
[5] http://tldp.org/LDP/tlk/dd/pci.html
[6] PCI Local Bus (ppt)
[7] PCI Family History (pdf)
[8] PCI-X 2.0 Overview (pdf)
[9] PCI Local Bus Specification, revision 3.0, February 3, 2004
[10] PCI BIOS Specification, revision 2.1, August 26, 1994
APPENDIX
// Part of the source code of the PCI Multi-Bus Simulator
// Include the header file pcibridge.h:
#include "pcibridge.h"
void bus::PCI_Bridge_Bus1(){
    for (int r1=0; r1<=nm1; r1++){
        if (bus1_access_bus[r1]==0){
            bus_object.PCI_Bus1_acces_Bus(0,nm);    //access BUS0's devices
        }
        if (bus1_access_bus[r1]==2){
            bus_object.PCI_Bus1_acces_Bus(2,nm4);   //access BUS2's devices
        }
        if (bus1_access_bus[r1]==3){
            bus_object.PCI_Bus1_acces_Bus(3,nm5);   //access BUS3's devices
        }
        if (bus1_access_bus[r1]==4){
            bus_object.PCI_Bus1_acces_Bus(4,nm6);   //access BUS4's devices
        }
        if (bus1_access_bus[r1]==5){
            bus_object.PCI_Bus1_acces_Bus(5,nm7);   //access BUS5's devices
        }
        if (bus1_access_bus[r1]==6){
            bus_object.PCI_Bus1_acces_Bus(6,nm8);   //access BUS6's devices
        }
    }
}   // end of PCI_Bridge_Bus1()
void bus::PCI_Bridge_Bus2()
void bus::PCI_Bridge_Bus3()
void bus::PCI_Bridge_Bus4()
void bus::PCI_Bridge_Bus5()
void bus::PCI_Bridge_Bus6()
void graph::draw_graph()
Function definitions of the Time Quantum class and its objects (omitted here!)
int main(){
bus_object.businit();
if (PBAbus[i]!=0){
printf("\n BUS0 device %d is communicating with BUS%d devices!\n\n",i,PBAbus[i]);
if (PBAbus[i]==1){
bus_object.PCI_Bus0_acces_Bus(1,nm1); //access the BUS1's devices
};
if (PBAbus[i]==2){
bus_object.PCI_Bus0_acces_Bus(2,nm4); //access the BUS2's devices
};
if (PBAbus[i]==3){
bus_object.PCI_Bus0_acces_Bus(3,nm5); //access the BUS3's devices
};
if (PBAbus[i]==4){
bus_object.PCI_Bus0_acces_Bus(4,nm6); //access the BUS4's devices
};
if (PBAbus[i]==5){
bus_object.PCI_Bus0_acces_Bus(5,nm7); //access the BUS5's devices
};
if (PBAbus[i]==6){
bus_object.PCI_Bus0_acces_Bus(6,nm8); //access the BUS6's devices
};
if (PBAbus[i]==7){
bus_object.PCI_Bus0_acces_Bus(7,nm9); //access the BUS7's devices
};
if (PBAbus[i]==8){
bus_object.PCI_Bus0_acces_Bus(8,nm10); //access the BUS8's devices
};
}
else{
printf("\n BUS0 device cannot access this bus, because it does not exist!\n");
};
……
};
......
while (t < simtot){
// implement Bus0’s data transfer transaction including Read and Write
......
}
// 2D/3D plotting
graph_object.draw_graph();
} // end of main()