Altera Stratix
S10-OVERVIEW
2016.10.31
Featuring several groundbreaking innovations, including the all-new HyperFlex™ core
architecture, the Stratix 10 device family enables you to meet the demand for
ever-increasing bandwidth and processing performance in your most advanced
applications while meeting your power budget.
With an embedded hard processor system (HPS) based on a quad-core 64-bit ARM®
Cortex®-A53, the Stratix 10 SoC devices deliver power-efficient, application-class
processing and allow designers to extend hardware virtualization into the FPGA fabric.
Stratix 10 SoC devices demonstrate Intel's commitment to high-performance SoCs
and extend Intel's leadership in programmable devices featuring an ARM-based
processor system.
© 2016 Intel Corporation. All rights reserved. Intel, the Intel logo, Altera, Arria, Cyclone, Enpirion, MAX,
Megacore, NIOS, Quartus and Stratix words and logos are trademarks of Intel Corporation in the US and/or
other countries. Other marks and brands may be claimed as the property of others. Intel warrants performance
of its FPGA and semiconductor products to current specifications in accordance with Intel's standard warranty,
but reserves the right to make changes to any products and services at any time without notice. Intel assumes
no responsibility or liability arising out of the application or use of any information, product, or service
described herein except as expressly agreed to in writing by Intel. Intel customers are advised to obtain the
latest version of device specifications before relying on any published information and before placing orders for
products or services.
ISO 9001:2008 Registered
1 Stratix 10 GX/SX Device Overview
With these capabilities, Stratix 10 FPGAs and SoCs are ideally suited for the most
demanding applications in diverse markets such as:
• Compute and Storage—for custom servers, cloud computing and data center
acceleration
• Networking—for Terabit, 400G and multi-100G bridging, aggregation, packet
processing and traffic management
• Optical Transport Networks—for OTU4, 2xOTU4, 4xOTU4
• Broadcast—for high-end studio distribution, headend encoding/decoding, edge
quadrature amplitude modulation (QAM)
• Military—for radar, electronic warfare, and secure communications
• Medical—for diagnostic scanners and diagnostic imaging
• Test and Measurement—for protocol and application testers
• Wireless—for next-generation 5G networks
• ASIC Prototyping—for designs that require the largest monolithic FPGA fabric
with the highest I/O count
To clock these building blocks, Stratix 10 devices use programmable clock tree
synthesis, which uses dedicated clock tree routing to synthesize only those branches
of the clock trees required for the application. All devices support in-system,
fine-grained partial reconfiguration of the logic array, allowing logic to be added
to and removed from the system while it is operating.
All family variants also contain high-speed serial transceivers, containing both the
physical medium attachment (PMA) and the physical coding sublayer (PCS), which can
be used to implement a variety of industry-standard and proprietary protocols. In
addition to the hard PCS, Stratix 10 devices contain multiple instantiations of PCI
Express hard IP that support Gen1/Gen2/Gen3 rates in x1/x2/x4/x8/x16 lane
configurations, and hard 10GBASE-KR/40GBASE-KR4 FEC for every transceiver. The
hard PCS, FEC, and PCI Express IP free up valuable core logic resources, save power,
and increase your productivity.
Package Type
F : FineLine BGA (FBGA), 1.0 mm pitch
Note:
1. Contact Intel for availability
Feature | Stratix V FPGAs | Stratix 10 FPGAs and SoCs
Floating point DSP capability | Up to 1 TFLOP, requires soft floating point adder and multiplier | Up to 10 TFLOPS, hard IEEE 754 compliant single precision floating point adder and multiplier
Maximum transceivers | 66 | 96
Maximum transceiver data rate (chip-to-chip) | 28.05 Gbps | 17.4 Gbps L-Tile; 28.3 Gbps H-Tile
Maximum transceiver data rate (backplane) | 12.5 Gbps | 12.5 Gbps L-Tile; 28.3 Gbps H-Tile
Hard protocol IP | PCIe Gen3 x8 (up to 4 instances) | PCIe Gen3 x16 (up to 4 instances); SR-IOV (4 physical functions / 2k virtual functions) on H-Tile devices; 10GBASE-KR/40GBASE-KR4 FEC
Core clocking and PLLs | Global, quadrant and regional clocks supported by fractional-synthesis fPLLs | Programmable clock tree synthesis supported by fractional synthesis fPLLs and integer IO PLLs
Register state readback and writeback | Not available | Non-destructive register state readback and writeback for ASIC prototyping and other applications
Embedded hard IP:
• PCIe Gen1/Gen2/Gen3 complete protocol stack, x1/x2/x4/x8/x16 end point and root port
• DDR4/DDR3/LPDDR3 hard memory controller (RLDRAM3/QDR II+/QDR IV using soft memory controller)
• Multiple hard IP instantiations in each device
• Single Root I/O Virtualization (SR-IOV)
High performance monolithic core fabric:
• HyperFlex core architecture with Hyper-Registers throughout the interconnect routing and at the inputs of all functional blocks
• Monolithic fabric minimizes compile times and increases logic utilization
• Enhanced adaptive logic module (ALM)
• Improved multi-track routing architecture reduces congestion and improves compile times
• Hierarchical core clocking architecture with programmable clock tree synthesis
• Fine-grained partial reconfiguration
Variable precision DSP blocks:
• IEEE 754-compliant hard single-precision floating point capability
• Supports signal processing with precision ranging from 18x19 up to 54x54
• Native 27x27 and 18x19 multiply modes
• 64-bit accumulator and cascade for systolic FIRs
• Internal coefficient memory banks
• Pre-adder/subtractor improves efficiency
• Additional pipeline register increases performance and reduces power
Phase locked loops (PLL):
• Fractional synthesis PLLs (fPLL) support both fractional and integer modes
• Fractional mode with third-order delta-sigma modulation
• Precision frequency synthesis
• Integer PLLs adjacent to general purpose I/Os support external memory and LVDS interfaces, clock delay compensation, and zero delay buffering
Software and tools:
• Quartus Prime Pro Edition design suite with new Spectra-Q engine and Hyper-Aware design flow
• Fast Forward compiler to allow HyperFlex architecture performance exploration
• Transceiver toolkit
• Qsys system integration tool
• DSP Builder advanced blockset
• OpenCL™ support
• SoC Embedded Design Suite (EDS)
Hard Processor System:
Multi-processor unit (MPU) core
• Quad-core ARM Cortex-A53 MPCore processor with ARM CoreSight debug and trace technology
• Scalar floating-point unit supporting single and double precision
• ARM NEON media processing engine for each processor
Ethernet media access controller (EMAC)
• Three 10/100/1000 EMAC with integrated DMA
SD/SDIO/MMC controller
• 1 eMMC version 4.5 with DMA and CE-ATA support
• SD, including eSD, version 3.0
• SDIO, including eSDIO, version 3.0
• CE-ATA version 1.1
External Memory Interface:
• Hard Memory Controller with DDR4, DDR3, and LPDDR3
[Figure: Stratix 10 device architecture: core fabric with variable-precision, hard floating-point DSP blocks and the Secure Device Manager (SDM), connected through EMIB to transceiver tiles (24 channels each) with PCIe Gen3 hard IP on a shared package substrate]
Table 4. Stratix 10 GX/SX FPGA and SoC Family Plan—FPGA Core (part 1)
Stratix 10 GX/SX Device Name | Logic Elements (KLE) | eSRAM Blocks | eSRAM Mbits | M20K Blocks | M20K Mbits | MLAB Counts | MLAB Mbits | 18x19 Multipliers (1)
Table 5. Stratix 10 GX/SX FPGA and SoC Family Plan—Interconnects, PLLs and Hard IP (part 2)
Stratix 10 GX/SX Device Name | Interconnects: Maximum GPIOs | Interconnects: Maximum XCVR | PLLs: fPLLs | PLLs: I/O PLLs | Hard IP: PCIe Hard IP Blocks
GX 400/SX 400 | 392 | 24 | 8 | 8 | 1
GX 650/SX 650 | 400 | 48 | 16 | 8 | 2
GX 850/SX 850 | 736 | 48 | 16 | 15 | 2
GX 1100/SX 1100 | 736 | 48 | 16 | 15 | 2
GX 1650/SX 1650 | 704 | 96 | 32 | 14 | 4
GX 2100/SX 2100 | 704 | 96 | 32 | 14 | 4
GX 2500/SX 2500 | 1160 | 96 | 32 | 24 | 4
GX 2800/SX 2800 | 1160 | 96 | 32 | 24 | 4
GX 4500/SX 4500 | 1640 | 24 | 8 | 34 | 1
GX 5500/SX 5500 | 1640 | 24 | 8 | 34 | 1
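Table 5 can be cross-checked against the tile organization described elsewhere in this overview: each transceiver tile carries 24 channels (four banks of six PMA-PCS groups) and eight fPLLs. The Python sketch below encodes the Table 5 rows and checks that relationship. The dictionary layout and helper names are illustrative only. It also reports the PCIe hard IP count next to the tile count; in this table the two happen to match, one PCIe hard IP block per tile.

```python
# Table 5 rows: (max GPIOs, max transceivers, fPLLs, I/O PLLs, PCIe hard IP blocks)
TABLE_5 = {
    "GX 400/SX 400": (392, 24, 8, 8, 1),
    "GX 650/SX 650": (400, 48, 16, 8, 2),
    "GX 850/SX 850": (736, 48, 16, 15, 2),
    "GX 1100/SX 1100": (736, 48, 16, 15, 2),
    "GX 1650/SX 1650": (704, 96, 32, 14, 4),
    "GX 2100/SX 2100": (704, 96, 32, 14, 4),
    "GX 2500/SX 2500": (1160, 96, 32, 24, 4),
    "GX 2800/SX 2800": (1160, 96, 32, 24, 4),
    "GX 4500/SX 4500": (1640, 24, 8, 34, 1),
    "GX 5500/SX 5500": (1640, 24, 8, 34, 1),
}

CHANNELS_PER_TILE = 24  # four banks of six PMA-PCS groups per tile
FPLLS_PER_TILE = 8      # eight fPLLs per transceiver tile


def tile_count(transceivers: int) -> int:
    """Number of 24-channel transceiver tiles implied by a transceiver count."""
    tiles, remainder = divmod(transceivers, CHANNELS_PER_TILE)
    if remainder:
        raise ValueError(f"{transceivers} channels is not a whole number of tiles")
    return tiles


def fpll_mismatches(table: dict = TABLE_5) -> list:
    """Device names whose fPLL count differs from 8 per tile."""
    return [name for name, (_gpio, xcvr, fplls, _iopll, _pcie) in table.items()
            if fplls != FPLLS_PER_TILE * tile_count(xcvr)]


for name, (_gpio, xcvr, _fplls, _iopll, pcie) in TABLE_5.items():
    print(f"{name}: {tile_count(xcvr)} tile(s), {pcie} PCIe hard IP block(s)")
print("fPLL mismatches:", fpll_mismatches())  # an empty list means the table is consistent
```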
Table 6. Stratix 10 GX/SX FPGA and SoC Family Package Plan, part 1
Cell legend: General Purpose I/Os, High-Voltage I/Os, LVDS Pairs, Transceivers (2)(3)(4)(5)(6)(7)
SX 2100
GX 4500/SX 4500
GX 5500/SX 5500
Notes:
4. Each LVDS pair can be configured as either a differential input or a differential output.
5. High-Voltage I/O pins and LVDS pairs are included in the General Purpose I/O count. Transceivers are counted separately.
6. Each package column offers pin migration (common circuit board footprint) for all devices in the column.
7. Stratix 10 GX devices are pin migratable with Stratix 10 SX devices in the same package.
Table 7. Stratix 10 GX/SX FPGA and SoC Family Package Plan, part 2
Cell legend: General Purpose I/Os, High-Voltage I/Os, LVDS Pairs, Transceivers (2)(3)(4)(5)(6)(7)
GX 400/SX 400
GX 650/SX 650
In addition to the traditional user registers found in the Adaptive Logic Modules (ALM),
the HyperFlex core architecture introduces additional bypassable registers throughout
the fabric of the FPGA. These additional registers, called Hyper-Registers, are
available on every interconnect routing segment and at the inputs of all functional
blocks.
The Hyper-Registers enable the following key design techniques to achieve a 2X increase
in core performance:
• Fine-grained Hyper-Retiming to eliminate critical paths
• Zero-latency Hyper-Pipelining to eliminate routing delays
• Flexible Hyper-Optimization for best-in-class performance
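To see why registers embedded in the routing help, consider a simplified, hypothetical model of one feed-forward path: a chain of routing and logic segments with known delays and a fixed number of registers between them. Moving those registers to balance the stages is the idea behind retiming, and the achievable clock rate is set by the slowest stage. The Python sketch below brute-forces register placement along such a chain. It illustrates the concept only. It is not the Quartus Prime Hyper-Retiming algorithm, and the segment delays in `SEGMENTS` are made up.

```python
from itertools import combinations


def stage_delays(segments, cuts):
    """Sum segment delays (ns) into stages, with a register after each index in `cuts`."""
    bounds = [0, *cuts, len(segments)]
    return [sum(segments[a:b]) for a, b in zip(bounds, bounds[1:])]


def best_retiming(segments, n_registers):
    """Brute-force register placement that minimizes the worst stage delay."""
    candidates = combinations(range(1, len(segments)), n_registers)
    best = min(candidates, key=lambda cuts: max(stage_delays(segments, cuts)))
    return best, max(stage_delays(segments, best))


# Hypothetical routing/logic segment delays (ns) along one register-to-register path.
SEGMENTS = [0.4, 0.6, 0.3, 0.5, 0.2, 0.4]

if __name__ == "__main__":
    original = max(stage_delays(SEGMENTS, (1,)))  # register placed early: about 2.0 ns stage
    cuts, worst = best_retiming(SEGMENTS, 1)      # same single register, moved
    print(f"before: {1000 / original:.0f} MHz, "
          f"after retiming at {cuts}: {1000 / worst:.0f} MHz")
```

In this toy path, moving the single register from after the first segment to after the third shortens the worst stage from about 2.0 ns to about 1.3 ns, raising the clock rate from roughly 500 MHz to roughly 769 MHz. Calling `best_retiming(SEGMENTS, 2)` adds a second register (pipelining), which reduces the worst stage to about 1.0 ns.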
[Figure: Transceiver tiles (24 channels each) with PCIe Gen3 hard IP, connected to the core fabric through EMIB on the package substrate; each tile is organized into banks with transceiver PLLs and RX/TX clocks]
All transceiver channels feature a dedicated Physical Medium Attachment (PMA) and a
hardened Physical Coding Sublayer (PCS).
• The PMA provides primary interfacing capabilities to physical channels.
• The PCS typically handles encoding/decoding, word alignment, and other pre-
processing functions before transferring data to the FPGA core fabric.
Within each transceiver tile, the transceivers are arranged in four banks of six PMA-
PCS groups. A wide variety of bonded and non-bonded data rate configurations are
possible within each bank, and within each tile, using a highly configurable clock
distribution network.
Stratix 10 device features provide exceptional signal integrity at data rates up to 28.3
Gbps. Clocking options include ultra-low jitter LC tank-based (ATX) PLLs with optional
fractional synthesis capability, channel PLLs operating as clock multiplier units (CMUs),
and fractional synthesis PLLs (fPLLs).
• ATX PLL—can be configured in integer mode, or optionally, in a new fractional
synthesis mode. Each ATX PLL spans the full supported data rate range, providing a
stable, flexible clock source with the lowest jitter.
• CMU PLL—when not being used as a transceiver, select PMA channels can be
configured as channel PLLs operating as CMUs to provide an additional master
clock source within the transceiver bank.
• fPLL—In addition, dedicated fPLLs are available with precision frequency synthesis
capabilities. fPLLs can be used to synthesize multiple clock frequencies from a
single reference clock source and replace multiple reference oscillators for multi-
protocol and multi-rate applications.
On the receiver side, each PMA has an independent channel PLL that allows analog
tracking for clock-data recovery. Each PMA also has advanced equalization circuits that
compensate for transmission losses across a wide frequency spectrum.
• Variable Gain Amplifier (VGA)—to optimize the receiver's dynamic range
• Continuous Time Linear Equalizer (CTLE)—to compensate for channel losses
with lowest power dissipation
• Decision Feedback Equalizer (DFE)—to provide additional equalization
capability on backplanes even in the presence of crosstalk and reflections
• On-Die Instrumentation (ODI)—to provide on-chip eye monitoring capabilities
(EyeQ). This capability helps to optimize link equalization parameters during board
bring-up and supports in-system link diagnostics and equalization margin testing
[Figure: Receiver equalization and clock recovery path with CTLE, VGA, DFE, CDR, deserializer, and EyeQ]
All link equalization parameters feature automatic adaptation using the new Advanced
Digital Adaptive Parametric Tuning (ADAPT) circuit. This circuit is used to dynamically
set DFE tap weights, adjust CTLE parameters, and optimize VGA gain and threshold
voltage. Finally, optimal and consistent signal integrity is ensured by using the new
hardened Precision Signal Integrity Calibration Engine (PreSICE) to automatically
calibrate all transceiver circuit blocks on power-up. This gives the most link margin
and ensures robust, reliable, and error-free operation.
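The DFE and its automatic adaptation can be illustrated with a small behavioral model. The Python sketch below assumes a noiseless channel with two post-cursor ISI terms (0.5 and 0.2, chosen arbitrarily), a 2-tap DFE, and sign-sign LMS adaptation of the tap weights. Sign-sign LMS is a common textbook adaptation rule used here as a stand-in. This overview does not describe the algorithm ADAPT actually uses, and the hardware DFE has 15 taps.

```python
import random


def simulate_dfe(n_symbols=20000, post_cursors=(0.5, 0.2), mu=0.001, seed=1):
    """NRZ symbols through an ISI channel, equalized by a DFE with sign-sign LMS taps."""
    rng = random.Random(seed)
    taps = [0.0] * len(post_cursors)     # DFE tap weights, adapted on the fly
    sent = [0.0] * len(post_cursors)     # past transmitted symbols (channel memory)
    decided = [0.0] * len(post_cursors)  # past slicer decisions (DFE memory)
    errors = 0
    for _ in range(n_symbols):
        x = rng.choice((-1.0, 1.0))
        # Channel output: main cursor plus post-cursor ISI from earlier symbols.
        received = x + sum(h * s for h, s in zip(post_cursors, sent))
        # DFE: subtract the ISI estimated from previous decisions.
        z = received - sum(w * d for w, d in zip(taps, decided))
        d = 1.0 if z >= 0 else -1.0      # slicer
        errors += d != x
        e = z - d                        # residual error relative to the ideal level
        # Sign-sign LMS: move each tap by +/- mu (past decisions are already +/-1 or 0).
        sign_e = 1.0 if e >= 0 else -1.0
        taps = [w + mu * sign_e * past for w, past in zip(taps, decided)]
        sent = [x] + sent[:-1]
        decided = [d] + decided[:-1]
    return taps, errors


if __name__ == "__main__":
    taps, errors = simulate_dfe()
    print("adapted taps:", [round(t, 3) for t in taps], "decision errors:", errors)
```

With the default arguments, the adapted taps settle within a few multiples of `mu` of the channel's post-cursor values (0.5 and 0.2). Because the eye is open even before adaptation in this model, the slicer makes no decision errors.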
Backplane Support: Drive backplanes at data rates up to 28.3 Gbps, including 10GBASE-KR compliance
Cable Driving Support: SFP+ Direct Attach, PCI Express over cable, eSATA
Transmit Pre-Emphasis: 5-tap transmit pre-emphasis and de-emphasis to compensate for system channel loss
Continuous Time Linear Equalizer (CTLE): Dual-mode (high-gain and high-data-rate) linear receive equalization to compensate for system channel loss
Decision Feedback Equalizer (DFE): 15 fixed-tap DFE to equalize backplane channel loss in the presence of crosstalk and noisy environments
Advanced Digital Adaptive Parametric Tuning (ADAPT): Fully digital adaptation engine that automatically adjusts all link equalization parameters, including the CTLE, DFE, and VGA blocks, to provide optimal link margin without intervention from user logic
Precision Signal Integrity Calibration Engine (PreSICE): Hardened calibration controller that quickly calibrates all transceiver control parameters on power-up, providing optimal signal integrity and jitter performance
ATX Transmit PLLs: Low jitter ATX (inductor-capacitor) transmit PLLs with a continuous tuning range that covers a wide range of standard and proprietary protocols, with optional fractional frequency synthesis capability
Fractional PLLs: On-chip fractional frequency synthesizers to replace on-board crystal oscillators and reduce system cost
Digitally Assisted Analog CDR: Superior jitter tolerance with fast lock time
On-Die Instrumentation (EyeQ and Jitter Margin Tool): Simplifies board bring-up, debug, and diagnostics with non-intrusive, high-resolution eye monitoring (EyeQ); can also inject jitter from the transmitter to test link margin in system
Dynamic Reconfiguration: Allows independent control of each transceiver channel through its Avalon memory-mapped interface for maximum transceiver flexibility
Multiple PCS-PMA and PCS-Core to FPGA fabric interface widths: 8-, 10-, 16-, 20-, 32-, 40-, or 64-bit interface widths for flexibility of deserialization width, encoding, and reduced latency
The PCS contains multiple gearbox implementations to decouple the PMA and PCS
interface widths. This feature provides the flexibility to implement a wide range of
applications with 8, 10, 16, 20, 32, 40, or 64-bit interface width between each
transceiver and the core logic.
8 Stratix 10 transceivers can support data rates below 1 Gbps with oversampling.
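A gearbox changes the width of the parallel data words without changing the underlying bit stream. The Python sketch below shows the idea for an arbitrary pair of widths, for example repacking 66-bit blocks into 64-bit words, as a 64B/66B path must do. It models bits as Python integers and is only a behavioral illustration, not a description of the hardened PCS logic. The `gearbox` function name is hypothetical.

```python
def gearbox(words, in_width, out_width):
    """Repack a stream of in_width-bit words into out_width-bit words.

    Bits are taken most-significant-first from each input word. Bits that do not
    yet fill a whole output word are returned as (value, bit_count).
    """
    buffer, buffered_bits, out = 0, 0, []
    for word in words:
        if not 0 <= word < (1 << in_width):
            raise ValueError("word does not fit in in_width bits")
        buffer = (buffer << in_width) | word  # append the new word below the old bits
        buffered_bits += in_width
        while buffered_bits >= out_width:
            buffered_bits -= out_width
            out.append(buffer >> buffered_bits)  # oldest out_width bits
            buffer &= (1 << buffered_bits) - 1   # keep only the unsent bits
    return out, (buffer, buffered_bits)
```

Because 32 blocks of 66 bits equal exactly 33 words of 64 bits (2112 bits), repacking 32 blocks leaves no residue. Running the result back through `gearbox(words, 64, 66)` reproduces the original blocks.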
The PCS also contains hard IP to support a variety of standard and proprietary
protocols across a wide range of data rates and encoding schemes. The Standard PCS
mode provides support for 8B/10B encoded applications up to 12.5 Gbps. The
Enhanced PCS mode supports 64B/66B and 64B/67B encoded applications up to 17.4
Gbps. The enhanced PCS mode also includes an integrated 10GBASE-KR/40GBASE-
KR4 Forward Error Correction (FEC) circuit. For highly customized implementations, a
PCS Direct mode provides an interface up to 64 bits wide to allow for custom encoding
and support for data rates up to 28.3 Gbps.
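The Enhanced PCS mode described above supports 64B/66B encoded protocols such as 10GBASE-R, which pair the encoding with a scrambler. As a reference point, IEEE 802.3 Clause 49 specifies a self-synchronizing scrambler with polynomial G(x) = 1 + x^39 + x^58. This scrambler is applied to the 64-bit block payload but not to the 2-bit sync header (01 for data blocks, 10 for control blocks). The Python sketch below models this at the bit level. Bit ordering and block framing are simplified, and the function names are illustrative rather than part of any Intel API.

```python
from collections import deque

TAPS = (38, 57)  # state[k] holds the bit from k+1 positions ago: the x^39 and x^58 terms


def scramble(bits, state):
    """Self-synchronous scrambler, G(x) = 1 + x^39 + x^58. `state` is a deque(maxlen=58)."""
    out = []
    for b in bits:
        s = b ^ state[TAPS[0]] ^ state[TAPS[1]]
        out.append(s)
        state.appendleft(s)  # feedback comes from the scrambled output
    return out


def descramble(bits, state):
    """Inverse operation: the register is fed with the received (scrambled) bits."""
    out = []
    for b in bits:
        out.append(b ^ state[TAPS[0]] ^ state[TAPS[1]])
        state.appendleft(b)
    return out


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]  # LSB first


def encode_data_block(payload, state):
    """66-bit data block: sync header 0,1 followed by the scrambled 64-bit payload."""
    return [0, 1] + scramble(to_bits(payload, 64), state)


def decode_data_block(block, state):
    if block[:2] != [0, 1]:
        raise ValueError("not a data block")
    bits = descramble(block[2:], state)
    return sum(bit << i for i, bit in enumerate(bits))
```

Because the descrambler's shift register is filled from the received stream, a receiver that starts with the wrong state recovers after 58 bits, so every block after the first decodes correctly.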
For more information about the PCS-Core interface or the double rate transfer mode,
refer to the Stratix 10 Transceiver PHY User Guide.
Protocol | Data Rate (Gbps) | Transmitter Data Path | Receiver Data Path
Standard PCS | 1 to 12.5 | Phase compensation FIFO, byte serializer, 8B/10B encoder, bit-slipper, channel bonding | Rate match FIFO, word-aligner, 8B/10B decoder, byte deserializer, byte ordering
PCI Express Gen1/Gen2 x1, x2, x4, x8, x16 | 2.5 and 5.0 | Same as Standard PCS plus PIPE 2.0 interface to core | Same as Standard PCS plus PIPE 2.0 interface to core
PCI Express Gen3 x1, x2, x4, x8, x16 | 8.0 | Phase compensation FIFO, byte serializer, encoder, scrambler, bit-slipper, gear box, channel bonding, and PIPE 3.0 interface to core, auto speed negotiation | Rate match FIFO (0-600 ppm mode), word-aligner, decoder, descrambler, phase compensation FIFO, block sync, byte deserializer, byte ordering, PIPE 3.0 interface to core, auto speed negotiation
CPRI | 0.6144 to 9.8 | Same as Standard PCS plus deterministic latency serialization | Same as Standard PCS plus deterministic latency deserialization
Enhanced PCS | 2.5 to 17.4 | FIFO, channel bonding, bit-slipper, and gear box | FIFO, block sync, bit-slipper, and gear box
10GBASE-R | 10.3125 | FIFO, 64B/66B encoder, scrambler, FEC, and gear box | FIFO, 64B/66B decoder, descrambler, block sync, FEC, and gear box
Interlaken | 4.9 to 17.4 | FIFO, channel bonding, frame generator, CRC-32 generator, scrambler, disparity generator, bit-slipper, and gear box | FIFO, CRC-32 checker, frame sync, descrambler, disparity checker, block sync, and gear box
SFI-S/SFI-5.2 | 11.3 | FIFO, channel bonding, bit-slipper, and gear box | FIFO, bit-slipper, and gear box
IEEE 1588 | 1.25 to 10.3125 | FIFO (fixed latency), 64B/66B encoder, scrambler, and gear box | FIFO (fixed latency), 64B/66B decoder, descrambler, block sync, and gear box
SDI | up to 12.5 | FIFO and gear box | FIFO, bit-slipper, and gear box
GigE | 1.25 | Same as Standard PCS plus GigE state machine | Same as Standard PCS plus GigE state machine
The PCI Express hard IP consists of the PHY, Data Link, and Transaction layers. It also
supports PCI Express Gen1/Gen2/Gen3 end point and root port, in x1/x2/x4/x8/x16
lane configurations. The PCI Express hard IP is capable of operating independently
from the core logic (autonomous mode). This feature allows the PCI Express link to
power up and complete link training in less than 100 ms, while the rest of the device
is still in the process of being configured. The hard IP also provides added
functionality, which makes it easier to support emerging features such as Single Root
I/O Virtualization (SR-IOV) and optional protocol extensions.
Note: The x16 lane configuration is not available on all transceiver tile types.
The PCI Express hard IP has improved end-to-end data path protection using Error
Checking and Correction (ECC). In addition, the hard IP supports configuration of the
device via protocol (CvP) across the PCI Express bus at Gen1/Gen2/Gen3 rates.
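The line bandwidth of each PCI Express configuration follows from the per-lane rate and the line encoding defined by the PCI Express Base Specification. Gen1 (2.5 GT/s) and Gen2 (5.0 GT/s) use 8b/10b encoding, and Gen3 (8.0 GT/s) uses 128b/130b. The Python sketch below computes the per-direction figure after encoding but before packet and protocol overhead, which real throughput must also absorb. The function name is illustrative.

```python
# Per-lane signaling rate (GT/s) and line-code efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b
    2: (5.0, 8 / 10),     # 8b/10b
    3: (8.0, 128 / 130),  # 128b/130b
}


def pcie_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Per-direction bandwidth in GB/s after line encoding, before TLP/DLLP overhead."""
    if lanes not in (1, 2, 4, 8, 16):
        raise ValueError("the hard IP supports x1/x2/x4/x8/x16 lane configurations")
    rate_gt, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt * efficiency * lanes / 8  # Gb/s -> GB/s


if __name__ == "__main__":
    for gen in PCIE_GENERATIONS:
        print(f"Gen{gen} x16: {pcie_bandwidth_gbytes(gen, 16):.2f} GB/s per direction")
```

For Gen3 x16, this gives about 15.75 GB/s in each direction.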
The Interlaken PCS hard IP is based on the proven functionality of the PCS developed
for Intel’s previous generation FPGAs, which has demonstrated interoperability with
Interlaken ASSP vendors and third-party IP suppliers. The Interlaken PCS hard IP is
present in every transceiver channel in Stratix 10 devices.
This bandwidth is provided along with the ease of design, lower power, and resource
efficiencies of hardened high-performance memory controllers. The external memory
interfaces can be configured up to a maximum width of 144 bits when using either
hard or soft memory controllers.
[Figure: External memory interface block diagram showing the user design in the core fabric, AXI/Avalon interface, hard memory controller, PHY interface, hard PHY with hard Nios II (calibration/control), and I/O interface]
Each I/O bank contains 48 general purpose I/Os and a high-efficiency hard memory
controller capable of supporting many different memory types, each with different
performance capabilities. The hard memory controller is also capable of being
bypassed and replaced by a soft controller implemented in the user logic. The I/Os
each have a hardened double data rate (DDR) read/write path (PHY) capable of
performing key memory interface functionality such as:
• Read/write leveling
• FIFO buffering to lower latency and improve margin
• Timing calibration
• On-chip termination
Stratix 10 devices also feature general purpose I/Os capable of supporting a wide
range of single-ended and differential I/O interfaces. LVDS rates up to 1.6 Gbps are
supported, with each pair of pins having both a differential driver and a differential
input buffer. This enables configurable direction for each LVDS pair.
As shown in the following figure, each ALM has eight inputs with a fracturable
look-up table (LUT), two dedicated embedded adders, and four dedicated registers.
[Figure: ALM block diagram: eight inputs feeding an adaptive LUT, two full adders, and four registers]
The Quartus Prime software leverages the ALM logic structure to deliver the highest
performance, optimal logic utilization, and lowest compile times. The Quartus Prime
software simplifies design reuse as it automatically maps legacy designs into the
Stratix 10 ALM architecture.
Programmable clock tree synthesis uses dedicated clock tree routing and switching circuits, and allows the
Quartus Prime software to create the exact clock trees required for your design. Clock
tree synthesis minimizes clock tree insertion delay, reduces dynamic power dissipation
in the clock tree and allows greater clocking flexibility in the core while still
maintaining backwards compatibility with legacy global and regional clocking schemes.
The core clock network in Stratix 10 devices supports the new HyperFlex core
architecture at clock rates up to 1 GHz. It also supports the hard memory controllers
up to 2666 Mbps with a quarter rate transfer to the core. The core clock network is
supported by dedicated clock input pins, fractional clock synthesis PLLs, and integer
I/O PLLs.
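The quarter-rate transfer can be made concrete with a short calculation. For a double data rate interface, the memory clock runs at half the data rate. A quarter-rate core interface runs at a quarter of the memory clock, so each core clock cycle carries eight beats of memory data. The Python sketch below assumes a 2666 Mbps DDR4 interface at the 144-bit maximum interface width mentioned earlier in this overview. It ignores ECC, refresh, and other efficiency losses, and the function name is illustrative.

```python
def quarter_rate_interface(data_rate_mbps: float, width_bits: int):
    """Return (memory clock MHz, core clock MHz, core-side bus width, peak GB/s)."""
    memory_clock = data_rate_mbps / 2   # DDR: two transfers per memory clock
    core_clock = memory_clock / 4       # quarter-rate transfer to the core
    core_width = width_bits * 2 * 4     # bits delivered per core clock cycle
    peak_gbytes = data_rate_mbps * width_bits / 8 / 1000
    return memory_clock, core_clock, core_width, peak_gbytes


if __name__ == "__main__":
    mem, core, width, peak = quarter_rate_interface(2666, 144)
    print(f"memory clock {mem} MHz, core clock {core} MHz, "
          f"core bus {width} bits, peak {peak:.1f} GB/s")
```

In this example, the core side sees a 1152-bit bus at 333.25 MHz. That carries exactly the same bit rate as 144 bits at 2666 Mbps.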
The fPLLs are located in the 3D SiP transceiver tiles, eight per tile, adjacent to the
transceiver channels. The fPLLs can be used to reduce both the number of oscillators
required on the board and the number of clock pins required, by synthesizing multiple
clock frequencies from a single reference clock source. In addition to synthesizing
reference clock frequencies for the transceiver transmit PLLs, the fPLLs can also be
used directly for transmit clocking. Each fPLL can be independently configured for
conventional integer mode, or enhanced fractional synthesis mode with third-order
delta-sigma modulation.
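Fractional synthesis works by dithering an integer feedback divider so that its average equals a fractional value. One well-known way to generate that dither with third-order noise shaping is the MASH 1-1-1 delta-sigma modulator, sketched below in Python. This overview does not say which third-order modulator structure the fPLLs use. Treat the MASH structure, the 24-bit accumulator width, and the 100 MHz example reference as assumptions. The model also ignores any pre-dividers and post-dividers.

```python
def mash_111(frac_word: int, bits: int, n: int) -> list[int]:
    """Third-order MASH 1-1-1 delta-sigma modulator.

    Returns n integer offsets whose running average approaches frac_word / 2**bits.
    Each offset lies in [-3, 4] and is added to the integer divide value.
    """
    modulus = 1 << bits
    acc1 = acc2 = acc3 = 0
    c2_prev = c3_prev = c3_prev2 = 0
    out = []
    for _ in range(n):
        # Three cascaded first-order accumulators; each feeds its residue to the next.
        acc1 += frac_word
        c1, acc1 = acc1 >> bits, acc1 & (modulus - 1)
        acc2 += acc1
        c2, acc2 = acc2 >> bits, acc2 & (modulus - 1)
        acc3 += acc2
        c3, acc3 = acc3 >> bits, acc3 & (modulus - 1)
        # Noise-cancellation network: c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3
        out.append(c1 + (c2 - c2_prev) + (c3 - 2 * c3_prev + c3_prev2))
        c2_prev, c3_prev2, c3_prev = c2, c3_prev, c3
    return out


def average_divide(n_int: int, frac_word: int, bits: int = 24, n: int = 100_000) -> float:
    """Average feedback divide ratio N + K / 2**bits produced by the modulator."""
    offsets = mash_111(frac_word, bits, n)
    return n_int + sum(offsets) / n


if __name__ == "__main__":
    f_ref_mhz = 100.0          # hypothetical reference clock
    k = round(0.3 * 2**24)     # fractional part 0.3 as a 24-bit word
    print(f_ref_mhz * average_divide(40, k))  # close to 4030 MHz
```

The instantaneous divide value jumps around N by a few steps, but the quantization error is pushed to high frequencies, where the PLL loop filter attenuates it. The long-run average converges to N + K/2^24.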
In addition to the fPLLs, Stratix 10 devices contain up to 34 integer I/O PLLs (IOPLLs)
available for general purpose use in the core fabric and for simplifying the design of
external memory interfaces and high-speed LVDS interfaces. One IOPLL is located in
each bank of 48 general purpose I/Os, adjacent to the hard memory controller and
LVDS SerDes in that bank. This makes it easier to close timing
because the IOPLLs are tightly coupled with the I/Os that need to use them. The
IOPLLs can be used for general purpose applications in the core such as clock network
delay compensation and zero-delay clock buffering.
The eSRAM blocks are a new innovation in Stratix 10 devices. These large embedded
SRAM blocks are tightly coupled to the core fabric and are directly accessible with no
need for a separate memory controller. Each eSRAM block is arranged as 8 channels,
40 banks per channel, with a total capacity of 45 Mbits running at clock rates up to
750 MHz. Within the eSRAM block, each channel has a 72-bit read bus and a 72-bit
write bus, and provides one read port and one write port. This allows each eSRAM
block to support a total aggregate bandwidth (read + write) of up to 864 Gbps.
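The 864 Gbps figure follows directly from the channel parameters: eight channels, each with 72 bits of read data and 72 bits of write data, at 750 MHz. The following Python check assumes one read and one write per channel on every clock cycle. The function name is illustrative.

```python
def esram_bandwidth_gbps(channels=8, read_bits=72, write_bits=72, clock_mhz=750):
    """Aggregate eSRAM bandwidth (read + write) in Gbps, one read and one write per channel per clock."""
    return channels * (read_bits + write_bits) * clock_mhz / 1000


print(esram_bandwidth_gbps())  # 864.0
```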
The eSRAM block is implemented as a simple dual port memory with concurrent read
and write access per channel, and includes integrated hard ECC generation and
checking. Compared to an off-chip SRAM solution, the eSRAM block allows you to
reduce system power and save board space and cost.
The M20K and MLAB blocks are familiar block sizes carried over from previous Intel
device families. The MLAB blocks are ideal for wide and shallow memories, while the
M20K blocks are intended to support larger memory configurations and include hard
ECC. Both M20K and MLAB embedded memory blocks can be configured as a single-
port or dual-port RAM, FIFO, ROM, or shift register. These memory blocks are highly
flexible and support a number of memory configurations as shown in Table 11 on page
25.
The DSP blocks can be configured to support signal processing with precision ranging
from 18x19 up to 54x54. A pipeline register has been added to increase the maximum
operating frequency of the DSP block and reduce power consumption.
[Figure: Variable precision DSP block architecture, 18x19 mode]
[Figure: Variable precision DSP block architecture, 27x27 mode]
[Figure: Variable precision DSP block architecture, IEEE-754 single-precision floating-point mode]
Each DSP block can be independently configured at compile time as either dual 18x19
or a single 27x27 multiply accumulate. With a dedicated 64-bit cascade bus, multiple
variable precision DSP blocks can be cascaded to implement even higher precision
DSP functions efficiently.
In floating point mode, each DSP block provides one single precision floating point
multiplier and adder. Floating point additions, multiplications, mult-adds and mult-
accumulates are supported.
The following table shows how different precisions are accommodated within a DSP
block, or by utilizing multiple blocks.
Multiplier Size | DSP Block Resources | Expected Usage
18x19 bits | 1/2 of Variable Precision DSP Block | Medium precision fixed point
27x27 bits | 1 Variable Precision DSP Block | High precision fixed point
19x36 bits | 1 Variable Precision DSP Block with external adder | Fixed point FFTs
36x36 bits | 2 Variable Precision DSP Blocks with external adder | Very high precision fixed point
54x54 bits | 4 Variable Precision DSP Blocks with external adder | Double Precision floating point
Single Precision floating point | 1 Single Precision floating point adder, 1 Single Precision floating point multiplier | Floating point
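The 54x54 row can be understood by splitting each operand into 27-bit halves: A = A_hi * 2^27 + A_lo, and likewise for B. The full product then needs four 27x27 partial products, one per DSP block, which the external adder combines with shifts. A 54-bit multiplier comfortably covers the 53-bit significand of IEEE 754 double precision. The Python sketch below checks this identity for unsigned operands. It models only the arithmetic, not how the Quartus Prime software actually maps a 54x54 multiplier, and the function names are illustrative.

```python
HALF = 27
MASK = (1 << HALF) - 1


def mul27(a: int, b: int) -> int:
    """Stand-in for one 27x27 unsigned multiplier (one DSP block)."""
    assert 0 <= a <= MASK and 0 <= b <= MASK, "operand exceeds 27 bits"
    return a * b


def mul54(a: int, b: int) -> int:
    """54x54 unsigned product from four 27x27 partial products plus shifted adds."""
    a_hi, a_lo = a >> HALF, a & MASK
    b_hi, b_lo = b >> HALF, b & MASK
    return ((mul27(a_hi, b_hi) << (2 * HALF))
            + ((mul27(a_hi, b_lo) + mul27(a_lo, b_hi)) << HALF)
            + mul27(a_lo, b_lo))
```

The same decomposition with 18-bit pieces explains the 36x36 row, where four 18x18 partial products fit in two blocks of dual 18x19 multipliers.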
Complex multiplication is very common in DSP algorithms. One of the most popular
applications of complex multipliers is the FFT algorithm. This algorithm has the
characteristic of increasing precision requirements on only one side of the multiplier.
The Variable Precision DSP block supports the FFT algorithm with proportional increase
in DSP resources as the precision grows.
For FFT applications with high dynamic range requirements, the Intel FFT IP Core
offers an option of single precision floating point implementation with resource usage
and performance similar to high precision fixed point implementations.
The Variable Precision DSP block is ideal to support the growing trend towards higher
bit precision in high performance DSP applications. At the same time, it can efficiently
support the many existing 18-bit DSP applications, such as high definition video
processing and remote radio heads. With the Variable Precision DSP block architecture
and hard floating point multipliers and adders, Stratix 10 devices can efficiently
support many different precision levels up to and including floating point
implementations. This flexibility can result in increased system performance, reduced
power consumption, and reduced architecture constraints on system algorithm
designers.
Notes:
1. Integrated direct memory access (DMA)
2. Integrated error correction code (ECC)
3. Multiport front-end interface to hard memory controller
System Memory Management Unit:
• Enables a unified memory model and extends hardware virtualization into peripherals implemented in the FPGA fabric
Cache Coherency unit:
• Changes in shared data stored in cache are propagated throughout the system, providing bi-directional coherency for co-processing elements
Cache:
• L1 Cache
  — 32 KB of instruction cache w/ parity check
  — 32 KB of L1 data cache w/ ECC
  — Parity checking
• L2 Cache
  — 1 MB shared
  — 8-way set associative
  — SEU Protection with parity on TAG RAM and ECC on data RAM
  — Cache lockdown support
External SDRAM and Flash Memory Interfaces for HPS:
• Hard memory controller with support for DDR4, DDR3, LPDDR3
  — 40-bit (32-bit + 8-bit ECC) with select packages supporting 72-bit (64-bit + 8-bit ECC)
  — Support for up to 2666 Mbps DDR4 and 2166 Mbps DDR3 data rates
  — Error correction code (ECC) support including calculation, error correction, write-back correction, and error counters
  — Software configurable priority scheduling on individual SDRAM bursts
  — Fully programmable timing parameter support for all JEDEC-specified timing parameters
  — Multiport front-end (MPFE) scheduler interface to the hard memory controller, which supports the AXI® Quality of Service (QoS) for interface to the FPGA fabric
• NAND flash controller
  — ONFI 1.0
  — Integrated descriptor-based DMA
  — Programmable hardware ECC support
  — Support for 8- and 16-bit Flash devices
• Secure Digital SD/SDIO/MMC controller
  — eMMC 4.5
  — Integrated descriptor-based DMA
  — CE-ATA digital commands supported
  — 50 MHz operating frequency
• Direct memory access (DMA) controller
  — 8-channel
  — Supports up to 32 peripheral handshake interfaces
Communication Interface Controllers:
• Three 10/100/1000 Ethernet media access controllers (MAC) with integrated DMA
  — Supports RGMII and RMII external PHY interfaces
  — Option to support other PHY interfaces through FPGA logic
    • GMII
    • MII
    • RMII (requires MII to RMII adapter)
    • RGMII (requires GMII to RGMII adapter)
    • SGMII (requires GMII to SGMII adapter)
  — Supports IEEE 1588-2002 and IEEE 1588-2008 standards for precision networked clock synchronization
  — Supports IEEE 802.1Q VLAN tag detection for reception frames
  — Supports Ethernet AVB standard
• Two USB On-the-Go (OTG) controllers with DMA
  — Dual-Role Device (device and host functions)
    • High-speed (480 Mbps)
    • Full-speed (12 Mbps)
    • Low-speed (1.5 Mbps)
    • Supports USB 1.1 (full-speed and low-speed)
  — Integrated descriptor-based scatter-gather DMA
  — Support for external ULPI PHY
  — Up to 16 bidirectional endpoints, including control endpoint
  — Up to 16 host channels
  — Supports generic root hub
  — Configurable to OTG 1.3 and OTG 2.0 modes
• Five I2C controllers (three can be used by EMAC for MIO to external PHY)
  — Supports 100 Kbps and 400 Kbps modes
  — Supports 7-bit and 10-bit addressing modes
  — Supports master and slave operating modes
• Two 16550-compatible UARTs
  — Programmable baud rate up to 115.2 Kbaud
• Four serial peripheral interfaces (SPI) (two master, two slave)
  — Full and half duplex
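The UART baud rate follows the usual 16550 convention. The divisor latch holds input_clock / (16 × baud), rounded to an integer, and that rounding determines the baud-rate error. The Python sketch below uses a hypothetical 100 MHz UART input clock. The actual clock depends on the HPS clock configuration and is not given in this overview, and the function name is illustrative.

```python
def uart_divisor(clock_hz: float, baud: float) -> tuple[int, float, float]:
    """16550-style divisor latch value, the resulting baud rate, and its error in percent."""
    divisor = round(clock_hz / (16 * baud))
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError("divisor outside the 16-bit divisor latch range")
    actual = clock_hz / (16 * divisor)
    return divisor, actual, 100 * (actual - baud) / baud


if __name__ == "__main__":
    div, actual, err = uart_divisor(100e6, 115_200)  # 100 MHz is a hypothetical clock
    print(div, round(actual), f"{err:+.2f}%")
```

With these assumptions, the divisor is 54, giving about 115741 baud, roughly 0.47% above the 115.2 Kbaud target. That is well within typical UART tolerance.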
SmartVoltage ID control over VCC is the standard option for the core power supply; a
code is programmed into each device during manufacturing that allows a smart
voltage regulator to operate the device at lower VCC while maintaining performance.
With the new HyperFlex core architecture, designs can run 2X faster than previous
generation FPGAs. With 2X performance and the same required throughput, architects
can cut the data path width in half to save power. This optimization is called
Hyper-Folding. Additionally, power gating reduces static power of unused resources in
the FPGA by powering them down. The Quartus Prime software automatically powers
down specific unused resource blocks, such as DSP and M20K blocks, at configuration
time.
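The Hyper-Folding trade-off comes down to throughput arithmetic. Throughput equals data path width times clock rate, so doubling the clock rate lets the width halve at constant throughput. The widths and clock rates in this Python sketch are hypothetical examples, not device specifications, and the function names are illustrative.

```python
def throughput_gbps(width_bits: int, clock_mhz: float) -> float:
    """Data path throughput in Gbps."""
    return width_bits * clock_mhz / 1000


def folded_width(width_bits: int, speedup: float) -> int:
    """Data path width needed for the same throughput after a clock-rate speedup."""
    folded = width_bits / speedup
    if folded != int(folded):
        raise ValueError("width does not divide evenly by the speedup")
    return int(folded)


if __name__ == "__main__":
    wide = throughput_gbps(512, 400)                     # e.g. 512 bits at 400 MHz
    narrow = throughput_gbps(folded_width(512, 2), 800)  # 256 bits at 800 MHz
    print(wide, narrow)  # 204.8 204.8
```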
During configuration, Stratix 10 devices are divided into logical sectors, each of which
is managed by a local sector manager (LSM). The SDM passes configuration data to
each of the LSMs across the on-chip configuration network. This allows the sectors to
be configured independently, one at a time, or in parallel. This approach simplifies
sector configuration and reconfiguration and reduces overall configuration time
through its inherent parallelism. The same sector-based approach is used to respond
to single-event upsets and security attacks.
While the sectors provide a logical separation for device configuration and
reconfiguration, they overlay the normal rows and columns of FPGA logic and routing.
This means there is no impact to the Quartus Prime software place and route, and no
impact to the timing of logic signals that cross the sector boundaries.
The SDM also provides additional capabilities such as register state readback and
writeback to support ASIC prototyping and other applications.
The SDM and associated security services provide a robust, multi-layered security
solution for your Stratix 10 design.
In addition to lowering power and cost, partial reconfiguration also increases the
effective logic density by removing the need to place functions that do not operate
simultaneously in the FPGA at the same time. Instead, these functions can be stored
in external memory and loaded as needed. This reduces the size of the required FPGA
by allowing multiple applications on a single FPGA, saving board space and reducing
power. The partial reconfiguration process is built on top of the proven incremental
compile design flow in the Quartus Prime design software.
The physical layout of the CRAM array is optimized to make the majority of multi-bit
upsets appear as independent single-bit or double-bit errors which are automatically
corrected by the integrated CRAM ECC circuitry. In addition to the CRAM protection,
the user memories also include integrated ECC circuitry and are layout optimized for
error detection and correction.
The SEU error detection and correction hardware is supported by both soft IP and the
Quartus Prime software to provide a complete SEU mitigation solution. The
components of the complete solution include:
• Hard error detection and correction for CRAM and user M20K memory blocks
• Optimized physical layout of memory cells to minimize probability of SEU
• Sensitivity processing soft IP that reports if CRAM upset affects a used or unused
bit
• Fault injection soft IP with the Quartus Prime software support that changes state
of CRAM bits for testing purposes
• Hierarchy tagging in the Quartus Prime software
• Triple Modular Redundancy (TMR) used for the Secure Device Manager and critical
on-chip state machines
In addition to the SEU mitigation features listed above, the Intel 14-nm Tri-Gate
process technology used for Stratix 10 devices is based on FinFET transistors which
have reduced SEU susceptibility versus conventional planar transistors.