
Application Note: Spartan-3A FPGA Family

Implementing DDR2-400 Memory Interfaces in Spartan-3A FPGAs
Author: Eric Crabill
XAPP458 (v1.0.1) July 9, 2009

Summary
High-performance consumer products and their requirement for low-cost, high-bandwidth
memory create demand for high-performance DDR2 memory interfaces. Xilinx offers a
Memory Interface Generator (MIG), integrated in the CORE Generator™ software, for design
flexibility and ease of use. MIG is a free, user-friendly tool that creates memory interfaces in
unencrypted RTL. It supports multiple memory architectures across a variety of FPGAs, giving
system designers the flexibility to customize their own designs.
Spartan®-3A FPGAs with the higher speed grade (-5) have been specified for operation up to
DDR2-333 using a 166 MHz clock, while lower speed grade (-4) devices have been specified
for operation up to DDR2-266 using a 133 MHz clock. Based on demand for even higher
performance, Xilinx has validated a DDR2-400 (200 MHz clock) memory interface in
Spartan-3A FPGAs with the higher speed grade (-5). The validation results also apply to
Spartan-3AN and Spartan-3A DSP FPGAs with the higher speed grade (-5).
The DDR2-400 memory interface discussed in this application note is derived from the default
output of MIG. The design is fully verified in hardware using Spartan-3A FPGAs with the higher
speed grade (-5) assembled on Spartan-3A Starter Kits. The validation effort includes
characterization at different process corners, as well as temperature and voltage variations that
meet commercial grade requirements.

Purpose
The goal of this application note is to thoroughly document all aspects of the DDR2-400
memory interface so that customers can leverage it in their own applications, drastically
reducing development time. This document is organized as follows:
• Memory Interface
♦ Component Configuration
♦ Changes to Standard MIG Output
♦ Timing Budgets for 200 MHz
• Verification Platform
♦ Reference Clock Quality
♦ Signal Termination and Signal Integrity
♦ Component Placement and Routing
• Verification Design and Process
♦ Functional Description
♦ Error Checking via Frame CRC
♦ Power Supply Control via the RS-232 Interface
♦ Reference Clock Generation
♦ Verification Plan
♦ Results

© 2007–2009 Xilinx, Inc. XILINX, the Xilinx logo, Virtex, Spartan, ISE, and other designated brands included herein are trademarks of Xilinx in the United States and other
countries. All other trademarks are the property of their respective owners.

XAPP458 (v1.0.1) July 9, 2009 www.xilinx.com 1


The information presented in this application note applies to the implementation of similar
DDR2-400 memory interfaces with point-to-point connections and JEDEC-compatible DDR2
memory devices.

Memory Interface
A wide variety of DDR2 components are available from a number of memory vendors. The
DDR2-400 memory component is selected with two important project goals in mind.
The first goal is to leverage existing memory interface source code from MIG. Avoiding
substantial changes or redesign eliminates the need for logical reverification of the memory
interface. Currently, the MIG-based memory interface for Spartan-3A FPGAs supports DDR2
devices with a CAS latency of three. Therefore, a device offering DDR2-400 performance with
CL = 3 is required.
The second goal is to verify the design in hardware, a task that requires a test board. While it
is possible to build a unique board specifically for test, the existing Spartan-3A Starter Kit is
already populated with a memory device offering DDR2-400 performance with CL = 3. The use
of the Spartan-3A Starter Kit for hardware verification eliminates the expense of designing a
unique board for verification.

Component Configuration
The selected DDR2-400 capable memory device with CL = 3 is a Micron Technology
MT47H32M16BN-3:D, a 512 Mb device organized as 8 Meg x 16 bits x 4 banks. This device is
standard on production Spartan-3A Starter Kits. It has superior AC performance characteristics
compared to the lower performance MT47H32M16BN-5E:D but is priced similarly. The device
interfaces to the Spartan-3A FPGA using point-to-point connections as shown in Figure 1.
[Diagram: the Spartan-3A FPGA connects point-to-point to the DDR2 SDRAM, with SSTL18 termination. FPGA signal → memory pin: SD_A<12:0> → A[12:0]; SD_DQ<15:0> → DQ[15:0]; SD_BA<1:0> → BA[1:0]; SD_RAS → RAS#; SD_CAS → CAS#; SD_WE → WE#; SD_UDM → UDM; SD_UDQS_P/SD_UDQS_N → UDQS/UDQS#; SD_LDM → LDM; SD_LDQS_P/SD_LDQS_N → LDQS/LDQS#; SD_CS → CS#; SD_ODT → ODT; SD_CK_P/SD_CK_N → CK/CK#; SD_CKE → CKE.]

Figure 1: Point-to-Point Memory Interface

The Spartan-3A FPGA, which is a Xilinx XC3S700A-5FG484C, is a higher speed grade (-5)
device that accommodates the higher performance memory interface. This device is not
standard on production Spartan-3A Starter Kits. The test boards for hardware verification are
reworked to replace the lower speed grade (-4) devices with higher speed grade (-5) devices.

Changes to Standard MIG Output


The memory interface realized in this application note is derived from MIG output but is not
available directly from MIG. The memory interface is initially created using MIG with the
parameter settings shown below. For more information on MIG and how to use it, refer to
UG086, Memory Interface Solutions User Guide. Some minor modifications are made to
accommodate the Spartan-3A Starter Kit, operation at 200 MHz, and the needs of the
verification design.
*******************************************
Part : XC3S700A-FG484
Frequency in MHz : 133
Speed grade : 4
No of controllers : 1
DCM used : 1
Add test bench : 1
Number of write pipes : 4
*******************************************
Memory type : MT47H32M16XX-5E
Bits per strobe : 8
Banks for data : 3
Data bits : 16
Banks for addr & ctrl : 1
Row address bits : 13
Column address bits : 10
Bank address bits : 2
*******************************************
Mode Register
Burst Length : 4(010)
Burst Type : Sequential(0)
CAS Latency : 3(011)
Mode : Normal(0)
DLL Reset : Yes(1)
Write Recovery : 3(010)
PD Mode : Fast Exit(0)
Extended Mode Register
DLL Enable : Enable-Normal(0)
Output Drive : Full Strength(0)
RTT (nominal) : RTT Disabled(00)
Additive Latency : 0(000)
OCD Operation : OCD Exit(000)
DQS# Enable : Enable(0)
RDQS Enable : Disable(0)
Outputs : Enable(0)
*******************************************
The scope of the changes is small enough that it is not necessary to logically reverify the
interface code for correctness. The modified memory interface is assumed to function correctly;
this assumption is supported by the system functional simulation and, ultimately, by the
hardware verification.
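The Mode Register and Extended Mode Register settings listed above are ultimately driven onto address lines A12:A0 during initialization. The sketch below packs them into register values, assuming the JEDEC DDR2 bit assignments (for example, burst length in A2:A0, CAS latency in A6:A4, write recovery in A11:A9, and RTT split across A6 and A2); confirm these against the memory data sheet. The `pack_fields` helper is hypothetical and does not describe how the MIG RTL generates these values.

```python
# Sketch: pack the MIG mode register settings into A12:A0 values using the
# JEDEC DDR2 bit assignments (assumed; verify against the memory data sheet).


def pack_fields(fields: dict[str, tuple[int, int, int]]) -> int:
    """Pack {name: (value, lsb, width)} into a single register value."""
    reg = 0
    for name, (value, lsb, width) in fields.items():
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        reg |= value << lsb
    return reg


# Mode Register (selected with BA1:BA0 = 00)
MR_FIELDS = {
    "burst_length":   (0b010, 0, 3),   # A2:A0   010 = BL4
    "burst_type":     (0b0,   3, 1),   # A3      0 = sequential
    "cas_latency":    (0b011, 4, 3),   # A6:A4   011 = CL3
    "test_mode":      (0b0,   7, 1),   # A7      0 = normal
    "dll_reset":      (0b1,   8, 1),   # A8      1 = reset DLL
    "write_recovery": (0b010, 9, 3),   # A11:A9  010 = 3 clocks
    "pd_exit":        (0b0,  12, 1),   # A12     0 = fast exit
}

# Extended Mode Register (selected with BA1:BA0 = 01)
EMR_FIELDS = {
    "dll_disable":      (0, 0, 1),    # A0      0 = DLL enabled
    "drive_strength":   (0, 1, 1),    # A1      0 = full strength
    "rtt_a2":           (0, 2, 1),    # A2 and A6 together select RTT;
    "rtt_a6":           (0, 6, 1),    #   both 0 = RTT disabled
    "additive_latency": (0, 3, 3),    # A5:A3   000 = AL 0
    "ocd":              (0, 7, 3),    # A9:A7   000 = OCD exit
    "dqs_n_disable":    (0, 10, 1),   # A10     0 = DQS# enabled
    "rdqs_enable":      (0, 11, 1),   # A11     0 = RDQS disabled
    "qoff":             (0, 12, 1),   # A12     0 = outputs enabled
}

if __name__ == "__main__":
    print(f"MR  = 0x{pack_fields(MR_FIELDS):04X}")   # MR  = 0x0532
    print(f"EMR = 0x{pack_fields(EMR_FIELDS):04X}")  # EMR = 0x0000
```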

Changes Specific to the Spartan-3A Starter Kit (RTL and UCF)


In addition to providing a generic DDR2 memory interface, MIG also provides a design
customized specifically for the Spartan-3A Starter Kit. This design is known to be good, is
materially the same as the generic design, and has a few modifications noted in the
documentation provided with the files. The relevant modifications are summarized below for
reference:
• The sys_clk reference clock input is modified from differential to single ended.
• The input data capture FIFOs are modified to support routing requirements of the pinout.
• The UCF is modified to reflect these RTL modifications and pinout of this board.
Users of MIG who are targeting a configuration that differs from the Spartan-3A Starter Kit
should be aware that the memory interface used in the verification design is intended for the
Spartan-3A Starter Kit and contains these modifications. However, the changes specific to the
Spartan-3A Starter Kit are not required for general DDR2-400 operation.

Changes Specific to 200 MHz (RTL and UCF)


Minor modifications to the baseline Spartan-3A Starter Kit memory interface are required to
accommodate operation at 200 MHz. The code changes are described in the documentation
provided with the source code to the verification design. The RTL modifications and their
purposes are summarized here:
• The initialization counter is modified to increase the time-out value to guarantee a 200 µs
interval with a 200 MHz system clock. Without this change, the memory interface begins
the initialization process too early. This particular modification is made in the counter code
itself, because the time-out value is not a parameter.
• The refresh counter is modified to increase the time-out value (the max_ref_cnt
parameter) to reduce the refresh rate. Without this change, the memory interface issues
substantially more refresh cycles than necessary.
• The refresh-to-active counter is modified to increase the time-out value (the
rfc_count_value parameter) to satisfy the memory timing requirements with a 200 MHz
system clock. Without this change, the memory interface does not wait enough cycles for
the refresh-to-active interval, thereby violating this specification.
• For experimental purposes, an additional control input is added to the memory interface to
allow the user to disable the refresh counter. In general use cases, this feature has limited
utility and is not required for DDR2-400 operation.
• The system clock generation is modified to accept a 50 MHz input and multiply it up to
200 MHz before feeding it into the memory interface. This modification allows the use of
the 50 MHz oscillator on the Spartan-3A Starter Kit, at the cost of additional jitter. While a
direct 200 MHz differential clock signal is preferred, the Spartan-3A Starter Kit has limited
provisions for connecting such a signal source.
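The counter changes above come down to converting fixed time requirements into clock-cycle counts at the new frequency. The sketch below shows that arithmetic. Only the 200 µs initialization wait comes from this note; the refresh-to-active time (tRFC = 105 ns) and average refresh interval (tREFI = 7.8 µs) are assumed typical values for a 512 Mb DDR2 device and must be confirmed in the memory data sheet. The results are not the actual `max_ref_cnt` or `rfc_count_value` settings in the MIG source, which may include extra margin or use a different encoding.

```python
# Hypothetical helper: convert DDR2 timing requirements into cycle counts.
# Only INIT_WAIT_NS comes from this note; the other two are ASSUMED values.
INIT_WAIT_NS = 200_000   # 200 us initialization wait
T_RFC_NS = 105           # assumed refresh-to-active time, 512 Mb device
T_REFI_NS = 7_800        # assumed average refresh interval


def min_cycles(t_ns: int, f_khz: int) -> int:
    """Fewest cycles lasting at least t_ns (use for minimum waits)."""
    return -(-t_ns * f_khz // 1_000_000)  # integer ceiling division


def max_cycles(t_ns: int, f_khz: int) -> int:
    """Most cycles lasting no longer than t_ns (use for maximum intervals)."""
    return t_ns * f_khz // 1_000_000


def counter_values(f_khz: int) -> dict[str, int]:
    return {
        "init_wait": min_cycles(INIT_WAIT_NS, f_khz),
        "refresh_interval": max_cycles(T_REFI_NS, f_khz),
        "refresh_to_active": min_cycles(T_RFC_NS, f_khz),
    }


if __name__ == "__main__":
    for f_khz in (133_333, 200_000):
        print(f"{f_khz / 1000:.3f} MHz:", counter_values(f_khz))
    stale = counter_values(133_333)["init_wait"]
    # A count sized for 200 us at 133 MHz expires early when clocked at 200 MHz.
    print(f"{stale} cycles at 200 MHz last {stale * 1_000 / 200_000:.1f} us")
```

If the original counter was sized for 200 µs at 133 MHz, the same count expires after roughly 133 µs at 200 MHz, which is consistent with the note's statement that an unmodified interface begins initialization too early.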
The UCF also has modifications to support operation at 200 MHz. The timing specifications are
scaled, and additional placement and routing constraints are added to ensure consistent
results. The modifications and their purposes are summarized below:
• Timing specifications are scaled for 200 MHz operation. The PERIOD constraint is
reduced below 5.000 ns, and the MAXDELAY constraints are also scaled, where
appropriate.
• Location constraints are added to the Digital Clock Managers (DCMs) and Global Clock
Buffers (BUFGs). Signals between these primitives that are not routed on dedicated
resources are constrained with Directed Routing (DIRT) constraints in the UCF, which
eliminates variability in the clocking circuits that might otherwise result from automatic
placement and routing. The result is shown in Figure 2.

Figure 2: Global Clock Resources with Placement and Routing Constraints

• All logic involved in the input capture circuit has placement constraints in the standard
constraint file. To ensure consistent results, all critical routes in the input capture circuit are
also given DIRT constraints in the UCF. These circuits are discussed in detail in XAPP768c,
Interfacing Spartan-3 Devices With 166 MHz or 333 Mb/s DDR SDRAM Memories. The input
capture logic for the top byte is shown in Figure 3, and the input capture logic for the bottom
byte is shown in Figure 4.

Figure 3: Top Byte Input Capture with Placement and Routing Constraints

Figure 4: Bottom Byte Input Capture with Placement and Routing Constraints

DIRT constraints are not required in generic MIG designs. The location constraints in the
generic design place the I/O and input capture logic in an arrangement that is recognized by the
router. Upon recognizing this arrangement, the router uses a “template” to route signals with
low skew and low delay. However, if the arrangement is not recognized, or low-skew and low-
delay routing channels are unavailable, the router does not issue a warning.
In the DDR2-400 design, DIRT constraints are applied to ensure that the same low-skew and
low-delay routes are used in every route attempt. The success of the DIRT constraints is
verified in the place-and-route report. The DIRT constraints also cover additional intermediate
routes, which are drawn in green in Figure 3 and Figure 4. In this manner, the timing results for
the input capture are 100% reproducible.

Hierarchy and Implementation


MIG output for the Spartan-3A Starter Kit is generated with clocking resources and a test
bench; no option is provided to eliminate the clocking resources or the test bench. The test
bench includes a small hardware test application that generates and checks a sequence of
memory accesses.
For small designs, it is easy to replace the hardware test application with the desired user
application. For larger designs, it is desirable to use the memory interface as a sub-module.
The hardware test application is therefore removed from “main_0”, and the exposed memory
interface signals are exported. The resulting structure has clocking resources but no test
bench, which is one of the four generation options available for the generic MIG design.
The synthesis, mapping, and place-and-route options for the DDR2-400 design on the
Spartan-3A Starter Kit are similar to the generic MIG design, but employ higher effort levels for
better performance.

Timing Budgets for 200 MHz


Evaluation of the PERIOD and MAXDELAY constraints by the static timing analyzer is not
sufficient to determine whether the memory interface is functional at a particular frequency. The
PERIOD constraint covers the internal timing between synchronous elements, and the
MAXDELAY constraints cover portions of other critical paths; neither covers the timing
relationship between the FPGA and the memory device.
XAPP768c, Interfacing Spartan-3 Devices With 166 MHz or 333 Mb/s DDR SDRAM
Memories, and XAPP454, DDR2 SDRAM Memory Interface for Spartan-3 FPGAs, both
discuss the concept of timing budgets for the interface between the FPGA and the memory
device. Five timing budgets must be considered. All of them must pass; otherwise, the
memory interface cannot be expected to function at the DDR2-400 performance level. The
timing budgets are:
• DDR Read
• DDR Write
• SDR Output
• Loopback
• Clock to Memory
Most timing data used in these budgets is obtained from the device data sheets. Additional data
is obtained from Xilinx ISE® software. Assumptions and other important notes are discussed
with each budget.
The DDR Read timing budget pertains to the input data capture scheme when the Spartan-3A
FPGA is receiving data from the memory device. The input capture scheme is discussed in
detail in XAPP768c, Interfacing Spartan-3 Devices With 166 MHz or 333 Mb/s DDR SDRAM
Memories. Table 1 through Table 3 show the DDR Read timing budget for DDR2-400 operation.
Package skew data for Spartan-3A FPGAs is not published; internal Xilinx data is used instead.
For designs using other Spartan-3A FPGAs, the user should file a support case with Xilinx
Technical Support to obtain package skew data. The board layout skew between signals in
each signal group (data and strobe, address and control) is assumed to be 50 ps or less, which
is realistic for a point-to-point connection, provided that some attention is paid to delay
matching during board routing.

Table 1: DDR Read Timing Budget (values in ps unless noted)

Parameter | Value | Leading-Edge Uncertainties | Trailing-Edge Uncertainties | Meaning
Master Clock Frequency (MHz) | 200 | 0 | 0 | Master clock frequency
Tclock | 5000 | 0 | 0 | Master clock period
Tclock_phase | 2500 | 0 | 0 | Clock phase (half period)
Tclock_duty_cycle_dist | 240 | 0 | 0 | DCM/BUFG duty cycle distortion from the FPGA data sheet
Tdata_period | 2260 | 0 | 0 | Total data period, Tclock_phase - Tclock_duty_cycle_dist
Tdqsq | 350 | 350 | 0 | Strobe-to-data distortion from the memory data sheet
Tpackage_skew | 60 | 60 | 60 | Package skew for XC3S700A-FG484
Tds | -70 | -70 | 0 | Setup time for RAM16X1D, from the FPGA data sheet parameter Tds
Tdh | 130 | 130 | 0 | Hold time for RAM16X1D, from the FPGA data sheet parameter Tdh
Tjitter | 0 | 0 | 0 | Common clock means all signals jitter together; this is zero
Tlocal_clock_skew | 64 | 64 | 64 | Skew on local clock as reported in the PAR clock summary
Tqhs | 450 | 0 | 450 | Hold skew for DQ from the memory data sheet
Tpcb_layout_skew | 50 | 50 | 50 | Skew between data and strobes on the board (assumed)
Total Uncertainties | | 584 | 624 | Worst case for leading and trailing can never happen simultaneously
Window for DQS Position | 1052 | 584 | 1636 |

The duty cycle distortion of the global clock resources at 200 MHz is listed here as 240 ps,
although the device data sheet specifies 350 ps. Based on extensive data, Xilinx tightened the
duty cycle distortion specification for the circuits used in this particular application, specifically
at 200 MHz. While this gain seems immaterial here, it plays a significant role in the
Clock-to-Memory timing budget (Table 7).
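The arithmetic behind Table 1 is simple enough to check mechanically. In the sketch below (the `Term` and `read_window` names are hypothetical), the leading and trailing uncertainties are summed; the lower bound of the DQS window is the leading total, the upper bound is the data period minus the trailing total, and the window width is what remains. Rerunning the budget with the 350 ps data sheet duty cycle distortion shows that this particular budget passes either way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    leading: int   # contribution to the leading-edge uncertainty (ps)
    trailing: int  # contribution to the trailing-edge uncertainty (ps)


def read_window(t_clock_ps: int, dcd_ps: int, terms: list[Term]) -> tuple[int, int, int]:
    """Return (lower_bound, upper_bound, width) of the DQS position window in ps."""
    data_period = t_clock_ps // 2 - dcd_ps  # Tclock_phase - Tclock_duty_cycle_dist
    lead = sum(t.leading for t in terms)
    trail = sum(t.trailing for t in terms)
    return lead, data_period - trail, data_period - lead - trail


TABLE_1 = [
    Term("Tdqsq", 350, 0),
    Term("Tpackage_skew", 60, 60),
    Term("Tds (RAM16X1D)", -70, 0),
    Term("Tdh (RAM16X1D)", 130, 0),
    Term("Tjitter", 0, 0),
    Term("Tlocal_clock_skew", 64, 64),
    Term("Tqhs", 0, 450),
    Term("Tpcb_layout_skew", 50, 50),
]

if __name__ == "__main__":
    print(read_window(5000, 240, TABLE_1))  # (584, 1636, 1052), matching Table 1
    print(read_window(5000, 350, TABLE_1))  # (584, 1526, 942) with the data sheet DCD
```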

Table 2: Delay Details for DDR Read Timing Budget (ps)

Parameter | Value | Meaning
Data Delay (IOB to FIFO) | 386 | Measured in FPGA Editor
Local Clock Route Delay | 421 | As reported in the PAR summary
DQS Delay from IOB to LUT | 418 | Measured in FPGA Editor
LUT Delay for DQS | 620 | LUT delay from the FPGA data sheet
Total DQS Delay | 839 | DQS delay from IOB to LUT plus local clock route delay

The total extra DQS delay (the total DQS delay minus the data delay) must fall within the lower
and upper bounds of the data valid window. As Table 3 shows, this holds at every derating level
from 100% down to 40%, so the budget passes.

Table 3: LUT Delay Derating Values (ps)

Parameter | 100% | 90% | 80% | 70% | 60% | 50% | 40%
Data Delay | 386 | 347 | 309 | 270 | 232 | 193 | 154
LUT Delay | 620 | 558 | 496 | 434 | 372 | 310 | 248
Number of LUTs in a Clock Phase | 4 | 4 | 5 | 6 | 7 | 8 | 10
Number of LUTs to Delay DQS | 1 | 2 | 2 | 3 | 3 | 4 | 5
Total DQS Delay | 1459 | 1871 | 1663 | 1889 | 1619 | 1660 | 1576
Total Extra DQS Delay | 1073 | 1524 | 1354 | 1619 | 1388 | 1467 | 1421
Data Valid Window (Lower Bound) | 584 | 566 | 547 | 529 | 510 | 492 | 474
Data Valid Window (Upper Bound) | 1636 | 1648 | 1661 | 1673 | 1686 | 1698 | 1710
Result | PASS | PASS | PASS | PASS | PASS | PASS | PASS
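The derated columns of Table 3 can be reproduced with a simple model inferred from the numbers; the note itself does not spell the model out, so treat the sketch below as a reconstruction rather than the documented Xilinx procedure. In it, FPGA-side delays (data delay, DQS route delay, LUT delay) and FPGA-side uncertainties (package skew, RAM16X1D setup and hold, local clock skew) are scaled by the derating factor, while the memory terms (Tdqsq, Tqhs), the board skew, and the 2260 ps data period stay fixed; results are rounded half up to the nearest picosecond. All constant and function names are hypothetical.

```python
from fractions import Fraction
from math import floor

# Inputs from Tables 1 and 2 (ps)
DATA_DELAY = 386                 # data delay, IOB to FIFO
DQS_ROUTE = 418 + 421            # DQS IOB-to-LUT delay plus local clock route delay
LUT_DELAY = 620                  # one LUT delay
DATA_PERIOD = 2260               # Tdata_period (not derated in this model)
FIXED_LEAD = 350 + 50            # Tdqsq + board skew (not derated)
FPGA_LEAD = 60 - 70 + 130 + 64   # package skew, Tds, Tdh, local clock skew
FIXED_TRAIL = 450 + 50           # Tqhs + board skew (not derated)
FPGA_TRAIL = 60 + 64             # package skew, local clock skew


def round_half_up(x: Fraction) -> int:
    return floor(x + Fraction(1, 2))


def derated_row(percent: int, luts_in_dqs_path: int) -> dict:
    """One column of Table 3 under the inferred derating model."""
    d = Fraction(percent, 100)
    total_dqs = (DQS_ROUTE + luts_in_dqs_path * LUT_DELAY) * d
    extra_dqs = total_dqs - DATA_DELAY * d
    lower = FIXED_LEAD + FPGA_LEAD * d
    upper = DATA_PERIOD - FIXED_TRAIL - FPGA_TRAIL * d
    return {
        "total_dqs": round_half_up(total_dqs),
        "extra_dqs": round_half_up(extra_dqs),
        "lower": round_half_up(lower),
        "upper": round_half_up(upper),
        "pass": lower <= extra_dqs <= upper,
    }


if __name__ == "__main__":
    # (derating %, number of LUTs used to delay DQS) as listed in Table 3
    for percent, luts in [(100, 1), (90, 2), (80, 2), (70, 3), (60, 3), (50, 4), (40, 5)]:
        print(percent, derated_row(percent, luts))
```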

The DDR Write timing budget pertains to the output data generation scheme when the
Spartan-3A FPGA is transmitting data to the memory device using DDR output flip-flops.
Table 4 shows the DDR Write timing budget for DDR2-400 operation.

Table 4: DDR Write Timing Budget (ps)

Parameter | Value | Leading-Edge Uncertainties | Trailing-Edge Uncertainties | Meaning
Tclock | 5000 | 0 | 0 | Master clock period
Tclock_phase | 2500 | 0 | 0 | Clock phase (half period)
Tclock_duty_cycle_dist | 240 | 0 | 0 | DCM/BUFG duty cycle distortion from the FPGA data sheet
Tdata_period | 2260 | 0 | 0 | Total data period, Tclock_phase - Tclock_duty_cycle_dist
Tds | 400 | 400 | 0 | DQ and DM input setup time relative to DQS, from the memory data sheet (Tds)
Tdh | 400 | 0 | 400 | DQ and DM hold time relative to DQS, from the memory data sheet (Tdh)
Tpackage_skew | 60 | 60 | 60 | Package skew for XC3S700A-FG484
Clock_tree_skew | 87 | 87 | 87 | Clock tree skew for DDR signals, measured in FPGA Editor
Tjitter | 0 | 0 | 0 | Common clock means all signals jitter together; this is zero
Tclock_out_phase | 200 | 200 | 200 | Phase offset between outputs of the DCM, from the FPGA data sheet
Tpcb_layout_skew | 50 | 50 | 50 | Skew between data and strobes on the board (assumed)
Data Valid Window Edges | | 797 | 797 | Includes 2 * Tjitter
Margin | 666 | | | Budget PASS

The SDR Output timing budget pertains to the output address and control generation scheme
when the Spartan-3A FPGA is accessing the memory device using output flip-flops. Table 5
shows the SDR Output timing budget for DDR2-400 operation.

Table 5: SDR Output Timing Budget (ps)

Parameter | Value | Leading-Edge Uncertainties | Trailing-Edge Uncertainties | Meaning
Tclock | 5000 | 0 | 0 | Master clock period
Memory Address and Control Input Setup Time (Tis) | 375 | 375 | 0 | Address and control input setup time from the memory data sheet
Memory Address and Control Input Hold Time (Tih) | 375 | 0 | 375 | Address and control input hold time from the memory data sheet
Tpackage_skew | 60 | 60 | 60 | Package skew from packaging team for XC3S700A-FG484
Tjitter | 0 | 0 | 0 | Common clock means all signals jitter together; this is zero
Tclock_tree_skew | 78 | 78 | 78 | Clock tree skew for SDR signals, measured in FPGA Editor
Tpcb_layout_skew | 50 | 50 | 50 | Skew between address and control signals on the board (assumed)
Tclkout_phase | 200 | 200 | 200 | Phase offset between outputs of the DCM, from the FPGA data sheet
Data Valid Window Edges | | 763 | 763 | Includes 2 * Tjitter
Margin | 3474 | | | Budget PASS
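Tables 4 and 5 follow the same pattern: the leading and trailing uncertainties are summed and subtracted from the available window, which is the 2260 ps data period for the DDR Write budget and the full 5000 ps clock period for the SDR Output budget. A minimal sketch that reproduces both margins (the `output_margin` helper is hypothetical):

```python
def output_margin(window_ps: int, terms: list[tuple[int, int]]) -> tuple[int, int, int]:
    """Return (leading total, trailing total, margin) in ps for an output budget."""
    lead = sum(lead_ps for lead_ps, _ in terms)
    trail = sum(trail_ps for _, trail_ps in terms)
    return lead, trail, window_ps - lead - trail


# (leading, trailing) uncertainty pairs from Table 4 (DDR Write)
DDR_WRITE = [
    (400, 0),    # Tds, memory setup
    (0, 400),    # Tdh, memory hold
    (60, 60),    # package skew
    (87, 87),    # clock tree skew
    (0, 0),      # jitter (common clock)
    (200, 200),  # DCM output phase offset
    (50, 50),    # board skew
]

# (leading, trailing) uncertainty pairs from Table 5 (SDR Output)
SDR_OUTPUT = [
    (375, 0),    # Tis
    (0, 375),    # Tih
    (60, 60),    # package skew
    (0, 0),      # jitter (common clock)
    (78, 78),    # clock tree skew
    (50, 50),    # board skew
    (200, 200),  # DCM output phase offset
]

if __name__ == "__main__":
    print(output_margin(2500 - 240, DDR_WRITE))  # (797, 797, 666)
    print(output_margin(5000, SDR_OUTPUT))       # (763, 763, 3474)
```

The large SDR margin reflects the fact that address and control signals change once per clock rather than on both edges.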

The Loopback timing budget pertains to the input data capture scheme when the Spartan-3A
FPGA is receiving data from the memory device. This additional budget supplements the DDR
Read timing budget and covers the write-enable generation for the input capture scheme.
Table 6 shows the Loopback timing budget for DDR2-400 operation.

Table 6: Loopback Timing Budget (ps)

Parameter | Leading-Edge Delays | Trailing-Edge Delays | Meaning
Tclock | 5000 | 0 | Master clock period
Tclock_phase | 2500 | 0 | Clock phase (half period)
Delay Details for DQS | | |
DQS Delay from IOB to LUT | 418 | 418 | The delay of the DQS line from the input buffer to the LUT
DQS Local Clock Route Delay | 421 | 421 | The delay of the DQS line from the output of the LUT delay element
Total DQS Delay | 839 | 839 | The LUT delay on DQS is not considered, since both DQS and rst_dqs_div signals are delayed the same amount
Delay Details for Loopback | | |
rst_dqs_div Delay from IOB to LUT | 668 | 668 | Constrained using MAXDELAY constraints; value from the PAR report
delayed_rst_dqs_div Delay to LUT | 871 | 871 | Constrained using MAXDELAY constraints; value from the PAR report
LUT delay (OR gate) | 620 | 620 | Implemented in a single LUT
fifo_1_wen Delay from LUT | 1041 | 1041 | Constrained using MAXDELAY constraints; value from the PAR report
Total Loopback Signal Delay | 3200 | 3200 | Sum of all delays listed
Margin | 2639 | 2639 | Budget PASS
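In Table 6, the Margin row equals the clock period minus the amount by which the loopback path delay exceeds the DQS path delay: 5000 - (3200 - 839) = 2639 ps. The sketch below (hypothetical names) recomputes that figure from the individual delays, which is a convenient way to recheck the budget against the values in a new PAR report.

```python
DQS_PATH = {                      # Table 6, delay details for DQS (ps)
    "DQS IOB to LUT": 418,
    "DQS local clock route": 421,
}
LOOPBACK_PATH = {                 # Table 6, delay details for loopback (ps)
    "rst_dqs_div IOB to LUT": 668,
    "delayed_rst_dqs_div to LUT": 871,
    "OR-gate LUT": 620,
    "fifo_1_wen from LUT": 1041,
}


def loopback_margin(t_clock_ps: int, dqs_path: dict[str, int],
                    loopback_path: dict[str, int]) -> tuple[int, int, int]:
    """Return (total DQS delay, total loopback delay, margin) in ps."""
    dqs = sum(dqs_path.values())
    loop = sum(loopback_path.values())
    return dqs, loop, t_clock_ps - (loop - dqs)


if __name__ == "__main__":
    print(loopback_margin(5000, DQS_PATH, LOOPBACK_PATH))  # (839, 3200, 2639)
```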

The Clock-to-Memory timing budget evaluates the path from the system clock source, through
the Spartan-3A FPGA, to the memory device clock input. The revised duty cycle distortion
specification for the global clock resources is essential to passing this budget at DDR2-400
operation. Table 7 shows the Clock-to-Memory timing budget.

Table 7: Clock-to-Memory Timing Budget (ps)

Parameter | Leading-Edge Delays | Trailing-Edge Delays | Meaning
Tclock | 5000 | 0 | Master clock period
Tclock_phase | 2500 | 0 | Clock phase (half period)
CLKIN_CYC_JITT_DLL_HF | 150 | 150 | Cycle-to-cycle jitter of the oscillator (maximum set by CLKIN_CYC_JITT_DLL_HF from the Spartan-3A data sheet)
CLKOUT_DUTY_CYCLE_DLL | 240 | 240 | DCM/BUFG duty cycle distortion from the FPGA data sheet
Clock Phase from DCM plus BUFG | 2185 | 2815 | Derived by applying the duty cycle distortion and jitter values to the clock phase
Memory Clock Jitter | 150 | 150 | From the DDR memory data sheet
Memory Duty Cycle Distortion | 2250 | 2750 | From the DDR memory data sheet
Memory Input Clock Timing | 2183 | 2833 | Derived by applying the jitter and duty cycle distortion parameters to the input clock
Margin | 3 | 18 | Budget PASS
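The numbers in Table 7 are consistent with the following model, which is inferred from the table rather than stated in the note: the FPGA delivers a clock phase of Tclock/2 plus or minus (half the oscillator jitter + duty cycle distortion), and the memory requires its high and low phases to stay between 0.45 x (Tclock - jitter) and 0.55 x (Tclock + jitter), where 0.45 and 0.55 correspond to the 2250 ps and 2750 ps entries. Under this model the margins are 2.5 ps and 17.5 ps, which the table rounds to 3 and 18. Rerunning it with the 350 ps data sheet duty cycle distortion makes both margins negative, illustrating why the tightened 240 ps specification matters. The function name is hypothetical.

```python
from fractions import Fraction


def clock_to_memory_margins(t_clock_ps: int, osc_jitter_ps: int, fpga_dcd_ps: int,
                            mem_jitter_ps: int,
                            tch_min: Fraction = Fraction(45, 100),
                            tch_max: Fraction = Fraction(55, 100)) -> tuple[Fraction, Fraction]:
    """Return (leading, trailing) margins in ps under the inferred Table 7 model."""
    spread = Fraction(osc_jitter_ps, 2) + fpga_dcd_ps
    fpga_min = Fraction(t_clock_ps, 2) - spread       # 2185 ps for the Table 7 inputs
    fpga_max = Fraction(t_clock_ps, 2) + spread       # 2815 ps for the Table 7 inputs
    mem_min = tch_min * (t_clock_ps - mem_jitter_ps)  # 2182.5 ps (shown as 2183)
    mem_max = tch_max * (t_clock_ps + mem_jitter_ps)  # 2832.5 ps (shown as 2833)
    return fpga_min - mem_min, mem_max - fpga_max


if __name__ == "__main__":
    for dcd in (240, 350):
        lead, trail = clock_to_memory_margins(5000, 150, dcd, 150)
        print(f"DCD {dcd} ps: leading margin {float(lead)} ps, trailing margin {float(trail)} ps")
```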

All five timing budgets under consideration pass, as do the PERIOD and MAXDELAY
constraints, so the memory interface is expected to function at the DDR2-400 performance
level. The verification process takes the design into the lab to confirm the conclusion drawn
from the timing budgets.

Verification Platform
The Spartan-3A Starter Kit is used as the verification platform, avoiding the need to design a
unique board for the verification of the DDR2-400 memory interface. As an additional
convenience, technical information generated during the design of this board can be reused as
reference material in the project documentation.
The test boards for hardware verification are reworked to remove the existing lower speed
grade (-4) devices, which are replaced with higher speed grade (-5) devices. Otherwise, the
verification platforms use the same board as the kits currently available on the Xilinx website,
part numbers HW-SPAR3A-SK-UNI-G, HW-SPAR3AN-SK-UNI-G, and HW-SPAR3ADDR2-DK-UNI-G.

Reference Clock Quality


The DDR2-400 memory interface requires a 200 MHz reference clock. Per the Clock-to-
Memory timing budget shown in Table 7, the 200 MHz reference clock must have a cycle-to-
cycle jitter of 150 ps or less. This generally requires a 200 MHz reference clock input directly
from a low-jitter source. For new designs, an appropriate 200 MHz reference clock source
(typically differential) should be used.
The Spartan-3A Starter Kit has flexible clocking options, but they become limited when the
desired frequency exceeds 125 MHz. Most +3.3V, single-ended clock sources are not available
at 200 MHz. As a result, a compromise is used for verification purposes; it is discussed later
with the verification design.

Signal Termination and Signal Integrity


DS529, Spartan-3A FPGA Family: Complete Data Sheet, and UG086, Memory Interface
Solutions User Guide, indicate the following recommended termination scheme:
• Selection of SSTL18_I versus SSTL18_II for the Spartan-3A FPGA SelectIO™ mode is at
the discretion of the user, based on the topology of the board design.
• For single-ended, unidirectional signals: One 50Ω termination to VTT, within 1 inch of the
load device.

• For single-ended, bidirectional signals: Two 50Ω terminations, each to VTT, within 1 inch of
each device.
• For differential, unidirectional signals: One 100Ω differential termination, between the
signal pair, or one 50Ω termination to VTT on each signal of the pair, within 1 inch of the
load device.
• For differential, bidirectional signals: Two 100Ω differential terminations, each between the
signal pair, or two 50Ω terminations to VTT on each signal of the pair, within 1 inch of each
device.
• Where terminations are used, they can be implemented with external resistors and/or the
on-die termination feature of the memory device, where appropriate.
In practice, there are a variety of valid termination techniques that cover a spectrum of
cost/performance points. In addition to the component cost, another cost to consider is the
complexity of placement and routing on a board. Ultimately, the designer is responsible for
making sure the selected termination scheme properly addresses the overall system
requirements. This involves simulation of any proposed termination scheme and subsequent
evaluation of the final result.
The termination scheme in use on the Spartan-3A Starter Kit is a good compromise that yields
respectable performance while reducing the component count and board complexity. It is
suitable for a point-to-point connection between devices where traces are short and signal
loading is light:
• SSTL18_I is selected for the Spartan-3A FPGA SelectIO mode. The memory device uses
full-strength drivers with on-die termination disabled.
• For single-ended, unidirectional and bidirectional signals: One 50Ω termination to VTT, in
the middle of the trace, roughly 1 inch from both devices.
• For differential, unidirectional and bidirectional signals: One 50Ω termination to VTT on
each signal in the pair, in the middle of the trace, roughly 1 inch from both devices.
Differential signals are effectively treated as two single-ended signals.
The Revision A prototype of the Spartan-3A Starter Kit had additional terminator component
footprints to enable the designer to experiment with the termination scheme. Several
termination schemes were initially validated through simulation using IBIS models. Subsequent
experimentation with the prototype confirmed that the less expensive termination scheme
described above produced satisfactory results.

Component Placement and Routing


Placement of relevant devices on the Spartan-3A Starter Kit is shown in Figure 5. The pinout
for the memory interface in the Spartan-3A FPGA is located in I/O Bank 3, which is on the left
side of IC1. This relative placement is easy to route, given the pinout generated by MIG.

Figure 5: Component Placement

Some space is needed to accommodate serpentine traces for delay matching of the routes.
However, the placement shown in Figure 5 is far from aggressive and could be optimized. The
Revision A prototype of the Spartan-3A Starter Kit had additional terminator component
footprints that were removed after evaluating the signal integrity on the prototypes, freeing up
considerable space. However, the placement of the memory device and the Spartan-3A FPGA
was not changed. In theory, this placement could be compressed, with further space savings
achieved by using resistor packs or smaller resistor packages, or by staggering the placement
of the termination devices to reduce average trace length.
Figure 6 and Figure 7 show the signal routing. The data and strobe signals form a source
synchronous bus (governed by the read and write timing budgets) and are routed with an
average length of 2.5 inches and less than 50 ps of skew. The address and control signals form
another bus and are routed similarly.

Figure 6: Top/Mid Layer Routing

Figure 7: Bottom Layer Routing

The MIG-based memory interface in the Spartan-3A FPGA also requires a properly tuned
loopback signal. This signal is used by the memory interface to generate an enable for the input
capture scheme. The trace delay of the loop should be the sum of the trace delays of the clock
forwarded to the memory and the average DQS trace delay. For more information on this topic,
refer to UG086, Memory Interface Solutions User Guide.
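Both the 50 ps group-skew target and the loopback length rule above are delay calculations on trace lengths. The sketch below assumes a uniform propagation delay of 170 ps per inch, a placeholder value that depends on the board stackup and routing layer; the trace lengths in the example are hypothetical and are not measurements from the Spartan-3A Starter Kit. The loopback target follows the rule stated above: the forwarded clock trace delay plus the average DQS trace delay.

```python
PS_PER_INCH = 170.0  # ASSUMED propagation delay; depends on stackup and layer


def trace_delay_ps(length_in: float, ps_per_inch: float = PS_PER_INCH) -> float:
    return length_in * ps_per_inch


def group_skew_ps(lengths_in: list[float], ps_per_inch: float = PS_PER_INCH) -> float:
    """Spread between the longest and shortest trace in a signal group."""
    return (max(lengths_in) - min(lengths_in)) * ps_per_inch


def loopback_target_ps(clock_len_in: float, dqs_lens_in: list[float],
                       ps_per_inch: float = PS_PER_INCH) -> float:
    """Loopback trace delay = forwarded clock trace delay + average DQS trace delay."""
    avg_dqs = sum(trace_delay_ps(length, ps_per_inch) for length in dqs_lens_in) / len(dqs_lens_in)
    return trace_delay_ps(clock_len_in, ps_per_inch) + avg_dqs


if __name__ == "__main__":
    data_group = [2.45, 2.50, 2.55, 2.52]  # hypothetical DQ/DQS lengths, inches
    skew = group_skew_ps(data_group)
    print(f"data group skew {skew:.1f} ps, within 50 ps: {skew <= 50}")
    target = loopback_target_ps(2.60, [2.50, 2.48])  # hypothetical CK and DQS lengths
    print(f"loopback target {target:.0f} ps, about {target / PS_PER_INCH:.2f} in")
```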

Verification Design and Process
The verification design integrates the DDR2-400 memory interface with an application that
implements a frame buffer. In addition to the frame buffer, other functions facilitate the
verification process, including data error checking and power supply control. The verification
process involves evaluating the behavior of multiple units of the verification platform,
programmed with the verification design, across process, voltage, and temperature variations.

Functional Description
A frame buffer is an output device that drives a video display. An integral part of any frame
buffer is a memory. The memory must have enough capacity to contain at least one entire
display frame worth of data and must have enough bandwidth to provide data at or above the
rate required by the display. In this implementation, the frame buffer is designed to drive a
standard UXGA-capable display with a 1600 pixel by 1200 pixel frame at a frame rate of 75 Hz
(1600x1200@75). Each pixel is represented by 12 bits of information, 4 bits for each color
channel, providing 4096 possible colors.
This frame buffer configuration is selected to facilitate data movement between the memory
interface and the display generator. The design requires a system clock of 200 MHz for the
DDR2-400 memory interface, and it is desirable (for simplicity) to clock the display generator
with the same clock to form a fully synchronous system. Standard 1600x1200@75 display
timing uses a 202.5 MHz clock (which is close to 200 MHz) and, with a minor change to the
horizontal timing, a 200 MHz clock can be used.
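The clock adjustment can be checked with a short calculation. Standard 1600x1200@75 timing uses 2160 total pixels per line and 1250 total lines per frame, which gives 2160 x 1250 x 75 = 202.5 MHz. The sketch below keeps the 1250 total lines and solves for the horizontal total at 200 MHz. The exact horizontal timing chosen by the verification design is not given here, so the rounded value is only illustrative.

```python
def pixel_clock_hz(h_total: int, v_total: int, refresh_hz: float) -> float:
    """Pixel clock needed for a given total raster size and refresh rate."""
    return h_total * v_total * refresh_hz


def h_total_for_clock(clock_hz: float, v_total: int, refresh_hz: float) -> int:
    """Nearest whole number of total pixels per line for a fixed pixel clock."""
    return round(clock_hz / (v_total * refresh_hz))


def refresh_rate_hz(clock_hz: float, h_total: int, v_total: int) -> float:
    """Refresh rate that results from a given clock and total raster size."""
    return clock_hz / (h_total * v_total)


V_TOTAL = 1250  # total lines per frame (1200 active) in standard 1600x1200@75 timing

print(pixel_clock_hz(2160, V_TOTAL, 75))         # standard timing: 202500000
h_total = h_total_for_clock(200e6, V_TOTAL, 75)  # illustrative adjusted horizontal total
print(h_total, round(refresh_rate_hz(200e6, h_total, V_TOTAL), 2))
```

Rounding gives 2133 total pixels per line, which keeps the 1600 active pixels, shortens horizontal blanking from 560 to 533 pixels, and yields a refresh rate of about 75.01 Hz.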
Another simplification enabled by this frame buffer configuration is the elimination of pixel packing and
color indexing. The Spartan-3A Starter Kit has a provision for driving a VGA output with 12-bit
color, but the memory device has a 16-bit interface with byte enables. At the cost of storage
efficiency, pixels are considered as 16-bit quantities with a 12-bit field that represents color
information and a 4-bit field that is not displayed. More efficient systems are possible when
pixels are packed together, avoiding unused bits. It is also possible to further reduce storage
and bandwidth requirements by using color indexing. In the case of the verification design,
however, the goal is to consume as much bandwidth as possible! Figure 8 shows a simplified
block diagram.
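The cost of storing 12-bit pixels in 16-bit words can be quantified. This sketch computes the storage for one 1600x1200 frame and the average display read bandwidth at 75 Hz, and compares it with the peak rate of a 16-bit DDR2-400 interface (400 million transfers per second x 2 bytes = 800 MB/s). The calculation ignores blanking, refresh, and command overhead, so it describes averages only.

```python
ACTIVE_W, ACTIVE_H, REFRESH_HZ = 1600, 1200, 75
DDR2_400_X16_PEAK_BPS = 400e6 * 2  # 400 million transfers/s x 2 bytes = 800 MB/s


def frame_bytes(bits_per_pixel: int) -> int:
    """Bytes needed to store one active frame (packed pixels rounded up to whole bytes)."""
    return (ACTIVE_W * ACTIVE_H * bits_per_pixel + 7) // 8


def average_read_bytes_per_s(bits_per_pixel: int) -> int:
    """Average bytes per second the display fetch must read, ignoring all overhead."""
    return frame_bytes(bits_per_pixel) * REFRESH_HZ


for bpp in (16, 12):  # 16 = one pixel per word as in this design, 12 = fully packed
    share = average_read_bytes_per_s(bpp) / DDR2_400_X16_PEAK_BPS
    print(f"{bpp}-bit pixels: {frame_bytes(bpp)} bytes/frame, "
          f"{average_read_bytes_per_s(bpp) / 1e6:.0f} MB/s, {share:.0%} of peak")
```

With 16-bit pixels, one frame occupies 3,840,000 bytes and the display reads 288 MB/s on average, about 36% of the interface peak; packing 12-bit pixels would reduce these to 2,880,000 bytes and 216 MB/s.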
[Block diagram: PicoBlaze processor, instruction ROM, and PicoBlaze I/O bus; VTC and HWC, LCD and LED, and other control registers; CPU/frame access registers; video timing controller; memory access FSM and logic; memory interface; buffers and output generator; external LCD and LED displays, mechanical switches, parallel Flash device, DDR2-400 memory, and VGA output connector]
Figure 8: Simplified Block Diagram of the Verification Design

Besides the DDR2-400 memory interface, the other significant block is the Memory Access FSM, which sits between the PicoBlaze™ processor, the line buffers, and the DDR2-400 memory interface. The FSM monitors the PicoBlaze processor ports and the video timing controller to determine what type of memory access should be requested from the memory interface. The PicoBlaze processor can move data into the frame buffer as memory writes. The video timing controller can move data out of the frame buffer and into a selected line buffer as memory reads.
A very simple arbitration scheme is used: writes initiated by the PicoBlaze processor always
have priority over the display fetch. For this reason, writes by the PicoBlaze processor should
only be performed before the display fetch is enabled or during periods when no display fetch
is taking place, such as horizontal and vertical blanking periods. Failure to observe this does
not corrupt the data stored in the frame buffer, but it causes transient corruption of pixels on the
display. The PicoBlaze processor receives a vertical blanking interrupt to facilitate
synchronization with the display fetch process.
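The arbitration policy amounts to a fixed-priority selector. The following is a behavioral Python model, not the Verilog from the design files; the function and enum names are hypothetical.

```python
from enum import Enum, auto
from typing import Optional


class Request(Enum):
    CPU_WRITE = auto()      # PicoBlaze write into the frame buffer
    DISPLAY_FETCH = auto()  # line fetch from the frame buffer into a line buffer


def arbitrate(cpu_write_pending: bool, fetch_pending: bool) -> Optional[Request]:
    """Fixed priority: a pending processor write always wins over a display fetch."""
    if cpu_write_pending:
        return Request.CPU_WRITE
    if fetch_pending:
        return Request.DISPLAY_FETCH
    return None


def cpu_write_is_safe(fetch_enabled: bool, fetch_in_progress: bool) -> bool:
    """True when a write cannot disturb the display: fetch disabled or currently idle.

    Ignoring this never corrupts stored data; it only risks transient pixel errors.
    """
    return not fetch_enabled or not fetch_in_progress


print(arbitrate(True, True), arbitrate(False, True), arbitrate(False, False))
```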
The display fetch process consists of 1200 line fetches, with each line fetched one line period before it is displayed. This prefetching increases the buffering requirements compared to a flow-through implementation, but it eliminates concern about the frequency and duration of periodic memory refresh cycles, as long as the average bandwidth available during a line fetch is adequate.
Each line fetch reads 1600 words of data, which can involve one, two, or three burst reads from the memory interface. The number of burst reads depends on where the starting read address falls relative to memory row boundaries and on the number of words to be fetched. The data is placed in a line buffer.
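The number of burst reads follows from splitting the requested range wherever it crosses a row boundary. This sketch assumes linear word addressing and an illustrative row size of 1024 words; how the MIG design maps frame-buffer addresses onto DDR2 rows is not described here, so treat both as assumptions.

```python
def split_line_fetch(start_word: int, count: int, row_words: int) -> list[tuple[int, int]]:
    """Split a contiguous read into (start_word, length) bursts that stay within one row."""
    if count <= 0 or row_words <= 0:
        raise ValueError("count and row_words must be positive")
    bursts = []
    addr, remaining = start_word, count
    while remaining > 0:
        room_in_row = row_words - (addr % row_words)  # words left before the next boundary
        length = min(room_in_row, remaining)
        bursts.append((addr, length))
        addr += length
        remaining -= length
    return bursts


ROW_WORDS = 1024  # illustrative row size, an assumption for this sketch
print(split_line_fetch(0, 1600, ROW_WORDS))     # [(0, 1024), (1024, 576)]
print(split_line_fetch(1000, 1600, ROW_WORDS))  # [(1000, 24), (1024, 1024), (2048, 552)]
```

With a 1024-word row, a 1600-word fetch that starts on a row boundary needs two bursts, while one that starts 24 words before a boundary needs three; a fetch that fits within a single row needs only one.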
The display generation circuits obtain data from a line buffer, optionally scale the intensity, and
overlay six programmable hardware cursors. The intensity scale and hardware cursors are not
directly relevant to the operation of the DDR2-400 interface, but serve to demonstrate that the
video data can be digitally processed in real time. Figure 9 shows a photograph of the system
operation.
Figure 9: Verification Design Operation


Error Checking via Frame CRC


During initialization, the data for display is read from the parallel Flash as bytes and written to
the frame buffer as words. One complete frame (approximately four megabytes) is copied in
this manner. During the copy, a CRC-32 is computed across each “nibble lane” of the words to
be written, resulting in four CRC-32 values. The computation is done in hardware. The “nibble
lanes” correspond to the three color channels (red, green, and blue) plus an additional channel
that is not displayed but contains a pattern to ensure all 16 bits of the memory interface are
exercised. This is illustrated in Figure 10.
[Figure: one CRC-32 checksum computed per nibble lane, four in total]
Figure 10: Frame CRC-32 Calculation on Color Channels

After initialization has completed, the display fetch process begins. Prior to the start of each new frame, another set of four CRC-32 values is initialized. As the frame is displayed, CRC-32 values are computed in real time on the data using the hardware CRC circuits. Because these calculations must take place at the pixel clock rate, four parallel CRC-32 circuits with 4-bit data inputs perform better than a single CRC-32 circuit with a 16-bit input: each 4-bit update needs shallower XOR logic than a 16-bit update. The computation is identical to the one used during the initialization process.
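A software model of the nibble-lane CRC can be used to precompute or cross-check expected values. The sketch below assumes the standard reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF) and that lane 0 is bits 3:0 of each word; the note does not state the polynomial, bit ordering, or lane-to-color mapping used in hardware. Each update consumes only 4 bits, mirroring the narrow-input circuits described above.

```python
import zlib

POLY_REFLECTED = 0xEDB88320  # assumed: IEEE 802.3 CRC-32 polynomial, bit-reversed form


def crc32_update_nibble(crc: int, nibble: int) -> int:
    """Advance a reflected CRC-32 register by one 4-bit input."""
    crc ^= nibble & 0xF
    for _ in range(4):
        crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc


def lane_crcs(words: list[int]) -> list[int]:
    """Four independent CRC-32 values, one per nibble lane of each 16-bit word."""
    crcs = [0xFFFFFFFF] * 4  # assumed initial value
    for word in words:
        for lane in range(4):  # lane 0 = bits 3:0 (assumed lane numbering)
            crcs[lane] = crc32_update_nibble(crcs[lane], (word >> (4 * lane)) & 0xF)
    return [c ^ 0xFFFFFFFF for c in crcs]  # assumed final XOR


# Self-check: feeding each byte as low nibble then high nibble reproduces standard CRC-32.
data = b"123456789"
crc = 0xFFFFFFFF
for byte in data:
    crc = crc32_update_nibble(crc, byte & 0xF)
    crc = crc32_update_nibble(crc, byte >> 4)
assert (crc ^ 0xFFFFFFFF) == zlib.crc32(data) == 0xCBF43926

frame = [(i * 0x1357) & 0xFFFF for i in range(1600 * 4)]  # hypothetical test pattern
print([hex(c) for c in lane_crcs(frame)])
```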
At the end of each frame, after the CRC-32 values are compared, the green LD3, LD2, LD1,
and LD0 indicators are updated to indicate CRC-32 comparison errors for each of the four
channels. These indicators help diagnose which data channels (nibbles) have errors, and also
provide a qualitative indication of how many errors are occurring based on the perceived
intensity of the indicator illumination. If no errors take place, the indicators are off. If errors take
place occasionally, the indicators flicker, transitioning to a steady glow as the error rate
increases.
At the first detection of any error, the yellow LD13 indicator is illuminated until the design is
reset or the power is cycled. This serves as a pass/fail indicator used to quickly assess the
cumulative test results at a given point in time. As long as LD13 is off, no errors have been
detected during operation.
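The indicator behavior can be modeled as a per-frame comparison plus a sticky flag. This hypothetical model maps list index n to indicator LDn for the lane flags and uses a boolean for LD13; the actual lane-to-indicator wiring in the design is an assumption here.

```python
from dataclasses import dataclass, field


@dataclass
class FrameChecker:
    """Hypothetical model of the per-frame CRC check and the sticky pass/fail flag."""
    reference: list[int]  # four CRC-32 values captured while the frame was copied
    sticky_error: bool = False  # models LD13: set on the first error, cleared only by reset
    frames: int = 0
    lane_error_frames: list[int] = field(default_factory=lambda: [0, 0, 0, 0])

    def end_of_frame(self, observed: list[int]) -> list[bool]:
        """Compare one frame's CRCs; index n of the result models indicator LDn."""
        errors = [ref != obs for ref, obs in zip(self.reference, observed)]
        self.frames += 1
        for lane, bad in enumerate(errors):
            self.lane_error_frames[lane] += int(bad)
        self.sticky_error = self.sticky_error or any(errors)
        return errors

    def error_rate(self, lane: int) -> float:
        """Fraction of frames with an error on a lane (roughly, perceived LED brightness)."""
        return self.lane_error_frames[lane] / self.frames if self.frames else 0.0

    def reset(self) -> None:
        self.sticky_error = False
        self.frames = 0
        self.lane_error_frames = [0, 0, 0, 0]
```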

Power Supply Control via the RS-232 Interface


The verification design enables interactive control of the output voltage levels of the power
supplies on the Spartan-3A Starter Kit via the RS-232 interface. For more information on the
power supply design of the Spartan-3A Starter Kit, refer to Chapter 17 of UG334, Spartan-3A
Starter Kit Board User Guide. This interactive control is of particular use when performing
verification at different supply voltages.
Commands to change the output voltage levels are entered into an ASCII terminal, received by the verification design, and converted into I2C accesses to the power supply devices. Each device has four independently adjustable regulators: two linear regulators and two switching regulators. All regulators have ±3% output accuracy, with adjustment steps of 0.05V for the switching regulators and 0.1V for the linear regulators. There are eight power rails, five of which are relevant to the operation of the memory interface:
• VCCAUX, nominally +3.3V, powers certain auxiliary circuits in Spartan-3A FPGAs,
including digital clock managers (DCMs), which are used in the memory interface.
• VCCINT, nominally +1.2V, powers internal logic in Spartan-3A FPGAs, including
configurable logic blocks (CLBs), which implement the bulk of the memory interface
including the lookup table (LUT) based delay lines used in the input capture scheme.
• VREF1V8, nominally +1.8V, powers the Spartan-3A FPGA voltage reference for the 1.8V
signaling interface with the memory device.
• DDR1V8, nominally +1.8V, powers the memory device, as well as the Spartan-3A FPGA
pins that form the interface with the memory device.
• DDR0V9, nominally +0.9V, powers the termination network for the signals between the
Spartan-3A FPGA and the memory device.
The design also accepts a set of abbreviated single-keystroke commands as “shortcuts” for
desired voltage conditions used in the verification plan.

Reference Clock Generation


The reference clock must be a 200 MHz clock signal with 150 ps of cycle-to-cycle jitter or less.
Because there is no reasonably priced option to directly apply a 200 MHz clock from an
external source, the on-board 50 MHz oscillator is used with a DCM inside the Spartan-3A
FPGA to multiply the 50 MHz signal up to 200 MHz. This setup introduces additional jitter into
the 200 MHz reference clock, which is undesirable and violates the Clock-to-Memory timing
budget. This approach must not be used for production designs.
For verification purposes, this approach is acceptable provided that the verification process
yields a passing result. A passing result despite the added jitter simply means that more design
margin exists than can be proven with the verification platform.
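The 50 MHz to 200 MHz conversion uses the DCM frequency synthesizer output, whose frequency is the input frequency times CLKFX_MULTIPLY divided by CLKFX_DIVIDE. The sketch lists exact (M, D) pairs; the search ranges (M from 2 to 32, D from 1 to 32) are assumptions that should be checked against DS529, and the calculation says nothing about output jitter, which is the limitation discussed above.

```python
from fractions import Fraction


def clkfx_settings(f_in_hz: int, f_out_hz: int,
                   m_range=range(2, 33), d_range=range(1, 33)) -> list[tuple[int, int]]:
    """(M, D) pairs with f_in * M / D exactly equal to f_out, smallest M first.

    M and D stand for CLKFX_MULTIPLY and CLKFX_DIVIDE; the default ranges are
    assumptions to confirm against the device data sheet.
    """
    target = Fraction(f_out_hz, f_in_hz)
    return [(m, d) for m in m_range for d in d_range if Fraction(m, d) == target]


print(clkfx_settings(50_000_000, 200_000_000)[:3])  # [(4, 1), (8, 2), (12, 3)]
```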

Verification Plan
The verification process involves evaluating the behavior of multiple units of the verification
platform, programmed with the verification design, across process, voltage, and temperature
variations. Confirming proper operation of the design in this three-dimensional space,
particularly in regions where the devices exhibit worst-case (slowest) performance, verifies the
conclusion drawn from the timing budgets. Consider the following variables:
• Silicon Process: The Spartan-3A Starter Kits are reworked to replace the existing lower
speed grade (-4) devices with the higher speed grade (-5) devices screened to specific
performance levels. One set of units is built with slow devices from the higher speed
grade, and another set of units is built with typical devices from the higher speed grade.
The memory devices are considered “sample tested”.
• Supply Voltage: The component data sheets indicate the allowed supply voltage range for
each component, and the programmable power supplies allow easy output voltage
adjustments to enable exploration of the functional ranges for the verification design. Due
to the accuracy of the power supply devices and the resolution of the output voltage steps,
the usable settings within the allowed supply voltage range are limited.
• Temperature: The component data sheets also indicate the allowed temperature range for
each component. The Spartan-3A FPGAs with the higher speed grade (-5) are available
only in the commercial temperature grade, with allowed junction temperatures between
0°C and 85°C. During the verification process, a Thermonics T-2420 temperature forcing
machine is used to evaluate operation at 85°C, 25°C, and 0°C.
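The limit on usable supply settings can be made concrete. The sketch keeps only settings on the regulator step grid whose whole ±accuracy band stays within the data-sheet tolerance. The 0.05V step, the ±3% regulator accuracy, and the 1.2V ±5% VCCINT range come from this note; assigning VCCINT to a 0.05V-step switching regulator and treating the accuracy as a symmetric worst-case band are assumptions of the sketch.

```python
def usable_settings(nominal: float, step: float, spec_tol: float, accuracy: float) -> list[float]:
    """Regulator settings on a fixed step grid whose entire accuracy band stays in spec.

    nominal: nominal rail voltage in volts
    step: regulator adjustment step in volts
    spec_tol: allowed deviation from nominal per the data sheet (fraction, e.g. 0.05)
    accuracy: regulator output accuracy (fraction, e.g. 0.03)
    """
    lo, hi = nominal * (1 - spec_tol), nominal * (1 + spec_tol)
    k_max = int(nominal * spec_tol / step) + 1  # search a little beyond the spec window
    settings = []
    for k in range(-k_max, k_max + 1):
        v = round(nominal + k * step, 4)  # round away floating-point noise on the grid
        if v * (1 - accuracy) >= lo and v * (1 + accuracy) <= hi:
            settings.append(v)
    return settings


print(usable_settings(1.2, 0.05, 0.05, 0.03))  # [1.2]
```

Under these assumptions only the nominal 1.2V setting is guaranteed to remain within 1.2V ±5%.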


The process for each unit consists of forcing the temperature to each of the desired values and allowing a three-minute “soak time” to ensure the junction temperature is as close as possible to the case temperature. Once the temperature is stable, the error indicator is monitored while the supply voltages are set at different levels within the allowed voltage ranges.
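The per-unit procedure is a nested sweep. This sketch only enumerates test points in the order described; the unit labels and voltage corners are hypothetical, and the soak delay and error-indicator check, which are manual bench steps, appear as comments.

```python
from typing import Iterator

TEMPERATURES_C = (85, 25, 0)  # forced temperatures used in the verification plan
SOAK_MINUTES = 3              # soak time before each set of measurements


def test_points(units: list[str], voltage_corners: list[str]) -> Iterator[tuple[str, int, str]]:
    """Yield (unit, temperature, voltage corner) in the order the procedure describes."""
    for unit in units:
        for temp_c in TEMPERATURES_C:
            # Bench step (manual): force temp_c, then wait SOAK_MINUTES before measuring.
            for corner in voltage_corners:
                # Bench step (manual): set the supplies to this corner, watch the error indicator.
                yield unit, temp_c, corner


units = ["slow-A", "typical-A"]       # hypothetical unit labels
corners = ["low", "nominal", "high"]  # hypothetical voltage corners
plan = list(test_points(units, corners))
print(len(plan), plan[0])
```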

Results
The design is robust across process variations and the commercial temperature range, provided the power supplies are regulated within the ranges shown in Table 8. The values in Table 8 have been simplified for readability and include additional margin.

Table 8: Power Supply Specifications

Supply      Nominal   Tolerance
VCCAUX      3.300V    ±5%
VCCINT      1.200V    ±3%
VREF1V8     1.800V    ±3%
DDR1V8      1.800V    ±3%
DDR0V9      0.900V    ±3%

The programmable power supplies allow adjustment, but they have fairly coarse steps. The
tolerances in Table 8 reflect available output voltage settings that exhibit robust operation, are
within the device data sheet specifications, and are achievable with the programmable power
supply solution.
For example, VCCINT, the Spartan-3A FPGA internal supply, is specified at 1.2V ±5% in DS529, Spartan-3A FPGA Family: Complete Data Sheet. The programmable power supply could generate the nominal VCCINT voltage, and its next available settings were +3.5% and -4.2% from nominal. Settings further from nominal fall outside the operating range of VCCINT for the Spartan-3A FPGA and were not tested. Table 8 simplifies this result to ±3%.
While these power supply specifications are moderately tighter than otherwise required by the
individual component data sheets, they are not difficult to meet. Many commercially available,
reasonably priced power solutions exist, including the power solution designed into the
Spartan-3A Starter Kit.

Design Files
The design files for this application note, which are available in Verilog-HDL only, are located on the Xilinx website at https://secure.xilinx.com/webreg/clickthrough.do?cid=91539.
This design is compatible with the Spartan-3A Starter Kit and derivative kits (HW-SPAR3A-SK-UNI-G, HW-SPAR3AN-SK-UNI-G, and HW-SPAR3ADDR2-DK-UNI-G) but requires the replacement of the existing lower speed grade (-4) device with the higher speed grade (-5) device. Xilinx does not provide this service.
The design can be downloaded into a standard Spartan-3A Starter Kit with the lower speed
grade (-4) device for evaluation purposes only. The design may operate correctly, operate with
data errors, or not operate at all. Xilinx does not guarantee the operation of this design in lower
speed grade (-4) devices.


Conclusion
System designers may take advantage of Spartan-3A, Spartan-3AN, or Spartan-3A DSP FPGAs with the higher speed grade (-5) to incorporate robust, low-cost, and high-performance DDR2-400 memory interfaces in applications operating over the commercial temperature range. A successful implementation for a point-to-point connection requires the use of moderately tighter voltage regulation coupled with a DDR2-400 capable memory device with CL = 3, such as a Micron Technology MT47H32M16BN-3:D. The increased performance of DDR2-400 memory interfaces is a compelling advantage and can be used to enable new applications or optimize and enhance existing applications.

References
The following documents and links provide additional information useful to this application note:
• DS529, Spartan-3A FPGA Family: Complete Data Sheet
http://www.xilinx.com/support/documentation/data_sheets/ds529.pdf
• MT47H32M16 (32M x 16) DDR2 SDRAM Data Sheet
http://download.micron.com/pdf/datasheets/dram/ddr2/512MbDDR2.pdf
• UG086, Xilinx Memory Interface Generator User Guide
http://www.xilinx.com/support/documentation/ip_documentation/ug086.pdf
• UG334, Spartan-3A Starter Kit Board User Guide
http://www.xilinx.com/support/documentation/boards_and_kits/ug334.pdf
• Spartan-3A Starter Kit Board Schematics
http://www.xilinx.com/support/documentation/boards_and_kits/s3astarter_schematic.pdf
• Spartan-3A Starter Kit Board Photoplots
http://www.xilinx.com/support/documentation/boards_and_kits/s3a_starter_gerbers.pdf
• XAPP454, DDR2 SDRAM Memory Interface for Spartan-3 FPGAs
http://www.xilinx.com/support/documentation/application_notes/xapp454.pdf
• XAPP768c, Interfacing Spartan-3 Devices With 166 MHz or 333 Mb/s DDR SDRAM Memories
http://www.xilinx.com/support/software/memory/protected/XAPP768c.pdf
• Xilinx ISE 9.2i Service Pack 1, Spartan-3A speed file “PRODUCTION 1.37 2007-06-02”
http://www.xilinx.com/tools/designtools.htm


Revision History
The following table shows the revision history for this document:

Date       Version   Description of Revisions
09/19/07   1.0       Initial Xilinx release.
07/09/09   1.0.1     Updated URLs and trademark usage throughout.

Notice of Disclaimer
Xilinx is disclosing this Application Note to you “AS-IS” with no warranty of any kind. This Application Note is one possible implementation of this feature, application, or standard, and is subject to change without further notice from Xilinx. You are responsible for obtaining any rights you may require in connection with your use or implementation of this Application Note. XILINX MAKES NO REPRESENTATIONS OR WARRANTIES, WHETHER EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, IMPLIED WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL XILINX BE LIABLE FOR ANY LOSS OF DATA, LOST PROFITS, OR FOR ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR INDIRECT DAMAGES ARISING FROM YOUR USE OF THIS APPLICATION NOTE.
