WP 01220 Hyperflex Architecture Fpga Socs
FPGA
Author
Allan Davidson
Senior Product Marketing Manager
Intel Programmable Solutions Group

Introduction
The Intel® HyperFlex™ FPGA Architecture and Intel’s 14 nm Tri-Gate process technology enable Intel Stratix® 10 FPGAs and SoCs to deliver levels of performance and power efficiency that were unimaginable in previous-generation high-performance FPGAs. These devices deliver:
• 2X the core performance and 5X the density compared to previous-generation
Stratix V FPGAs
• Up to 70 percent lower power than Stratix V FPGAs for equivalent performance
• Logic, internal memory, and DSP blocks capable of 1 GHz operation
• Embedded quad-core 64 bit ARM* Cortex*-A53 hard processor system (in SoC
variants)
• Familiar FPGA design techniques supported by the proven Intel Quartus® Prime
software
Industry challenges
In all major industries, electronic system developers are being challenged to
pick up the pace of the breakneck speed increases they have delivered over the
past several decades. Not only do customers in markets as diverse as military
communications and computer storage environments want faster systems, they
also want smaller hardware that consumes less electricity.
The need for speed is not new. What is changing is that requirements in wireless,
wireline, military, broadcast, compute and storage are becoming even more
demanding. In many of these fields, demands are rising in high double-digit growth
rates reminiscent of boom markets.

Figure 1. Customer Demands Are Increasing in Applications such as Wireline, Data Center, and Wireless

Table of Contents
Introduction
Industry Challenges
Delivering the Unimaginable
The Intel HyperFlex FPGA Architecture
The Hyper-Aware Design Flow
Conclusion
References
Where to Get More Information
White Paper | A New FPGA Architecture and Leading-Edge FinFET Process Technology Promise to Meet Next-Generation System Requirements
The information and communications technology sector typifies the customer demands that are challenging the capabilities of system designers and the semiconductor suppliers who support them. Globally, bandwidth consumption is doubling every two to three years as staggering amounts of data traverse global networks. In 2016, the gigabyte equivalent of all movies ever made will crisscross these networks every three minutes. Industry researchers at TeleGeography (1) noted that Internet bandwidth more than doubled between 2010 and 2012, soaring to 77 Terabits per second.

Communication requirements will soar as everything from automotive to industrial equipment to consumer products like refrigerators connects to the Internet. Gartner predicts that the Internet of Things will hit 26 billion units in 2020, almost 30 times more than the 0.9 billion in 2009. (2)

Many trends contribute to the never-ending increase in bandwidth. For example, 100 Gigabits per second (Gbps) Ethernet is just beginning to displace the popular 40 Gbps version, yet the IEEE recently established a task force that will pursue a 400 Gbps standard. (3)

Wired networks still carry the largest volumes of data, but wireless markets are poised to reverse that. In 2011, wired devices accounted for nearly 55 percent of IP traffic, according to a Cisco report. The explosive growth of smart mobile devices makes it easy to predict that wireless devices will soon generate the majority. Cisco predicts that mobile data traffic will skyrocket from 1.6 exabytes in 2013 to 11.2 exabytes in 2017. (4)

Hefty double-digit growth rates in other fields further highlight the need for faster data handling equipment. Satellite communications, which are still driven by military usage, are soaring as drones and satellites generate more data that is used by far more personnel than in the past. That is prompting surging requirements at land-based backhaul stations.

Northern Sky Research (NSR) forecasts greater than 50 percent growth in the global installed base of satellite backhaul sites between 2012 and 2022. (5) This growth is driven by the need to serve 3G/4G/LTE backhaul requirements cost-effectively for mobile operator clients. NSR projects that combined high throughput satellite capacity demand will grow by 133.5 Gbps by 2022 for backhaul services alone. NSR forecasts the global satellite broadband access market will add over 4.3 million net new subscribers in the coming ten years, with North America adding nearly 2.4 million new subscribers by 2022.

Back on earth, mobile communications are driving the demand for faster chips and systems. A Cisco forecast predicts mobile traffic growth of 78 percent from 2011 to 2016. Much of that will be driven by video, which should grow 90 percent annually between 2011 and 2016. Cisco predicts that by 2016 mobile video will account for over 70 percent of mobile data traffic. (6)

The public’s demand for video is also fueling strong growth in the broadcast industry. By the end of this decade, most countries are expected to complete the transition to digital TV. The growth of HDTV and the emergence of Ultra High Definition technology are expected to drive the need for faster editing and transmission systems.

Reducing power and heat
In all these fields, simply designing faster equipment is no longer enough. Power consumption has become a major issue, driven by environmental concerns as well as by the cost savings that come with power reductions. Chipmakers and system designers alike have made power reduction a central element in most projects. System design teams also appreciate the decrease in heat that comes with reduced power consumption, which lets them spend less time on heat removal.

Data centers are the poster child for power and heat reduction. Between 2011 and 2012, global data center power requirements grew by 63 percent, rising to 38 gigawatts (GW) in 2012, up from 24 GW in 2011, the DatacenterDynamics 2012 Global Census said. (7) In the U.S., many statisticians believe that data centers consume around 2 percent of the country's electricity usage, citing research by Jonathan Koomey. (8)

The need for a new approach
Regardless of the industry, next-generation systems require ever-increasing data throughput and higher clock frequency performance. Faced with this reality and the need to get new products to market quickly, many companies are now using FPGAs as key components in their system designs. The data throughput of these FPGAs is often a critical factor in determining the overall system performance.

To improve the data throughput in an FPGA, the most commonly used technique is to make on-chip buses wider and wider. It is common to use 512 bit, 1,024 bit, or even wider buses in FPGAs. These wide buses are costly in both FPGA resource utilization and power dissipation. Moreover, it is difficult to perform high-speed logic functions like comparators or checksums across every bit of the bus.

In addition to wider buses, system designers extensively pipeline data paths to increase the clock frequency. However, pipelining a wide bus requires that each bit of the bus consume additional FPGA resources, which again is expensive. It is not practical to continue to make wider and wider buses.

Moving to the next technology node improves performance. However, as process geometries continue to shrink, the interconnect delays between the logic blocks increasingly dominate the FPGA’s total delay. Evolving existing FPGA architectures with the next technology node does not address this concern. A better solution is needed to address these increasingly significant interconnect delays.

Delivering the unimaginable
The new Intel HyperFlex FPGA Architecture in Intel Stratix 10 devices is an innovative approach that addresses these concerns. It provides performance and power efficiency that are simply not possible with conventional FPGA architectures. Using the new Intel HyperFlex FPGA Architecture combined with Intel’s 14 nm Tri-Gate process technology, designers can achieve 2X the core performance in Intel Stratix 10 FPGAs and SoCs compared to previous-generation high-performance FPGAs.
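
The trade-off between bus width and clock frequency described under "The need for a new approach" comes down to simple arithmetic: peak datapath throughput is the bus width multiplied by the clock rate. The following minimal sketch assumes one bus word is transferred per clock cycle. The clock rates (250 MHz and 500 MHz) are hypothetical values chosen only for illustration and are not figures from this paper.

```python
def throughput_gbps(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak throughput in Gbps, assuming one bus word per clock cycle."""
    return bus_width_bits * clock_mhz / 1000.0


def bus_width_for(target_gbps: float, clock_mhz: float) -> float:
    """Bus width in bits needed to reach target_gbps at the given clock."""
    return target_gbps * 1000.0 / clock_mhz


# Hypothetical starting point: a 1,024 bit bus clocked at 250 MHz.
baseline = throughput_gbps(1024, 250.0)

# Doubling the clock lets a bus of half the width carry the same traffic.
narrow = throughput_gbps(512, 500.0)

print(baseline, narrow, bus_width_for(baseline, 500.0))
```

Under these assumptions both configurations carry the same traffic, which is why a higher core clock can be traded for a narrower bus and therefore fewer logic and routing resources.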
The HyperFlex advantage
The key innovations that contribute to the HyperFlex advantage are:

Registers everywhere
The “registers everywhere” in the interconnect routing, called Hyper-Registers, are distinct from the conventional registers that are contained within the adaptive logic modules (ALMs). A Hyper-Register is associated with each individual routing segment in the device; Hyper-Registers are also available at the inputs of all functional blocks such as ALMs, embedded memory (M20K) blocks, and digital signal processing (DSP) blocks. The Hyper-Registers are bypassable, allowing the design tools to select the optimal register location automatically, after place-and-route, to maximize core performance.

Having Hyper-Registers throughout the interconnect means that performance tuning does not require additional ALM resources (unlike conventional architectures) and does not require additional changes or added complexity to the design’s place-and-route. Additionally, having Hyper-Registers built into the interconnect helps to reduce routing congestion.

Enhanced core clocking
Programmable clock tree synthesis allows system designers to create localized clock trees, reducing skew and timing uncertainty to obtain maximum core clocking performance. This capability is a key feature that allows the Intel HyperFlex FPGA Architecture to reach 2X performance. In addition, the core clocking uses intelligent branch-enables to reduce the dynamic power dissipation in the clock networks.

Hyper-Aware design flow
The Hyper-Aware design flow includes three new improvements:
• A Fast Forward Compile tool that allows performance exploration and guides the user to maximum design performance.
• A Hyper-Retimer step that supports performance optimization after place-and-route.
• Enhanced synthesis and place-and-route algorithms that use the Hyper-Registers.

The benefits of high performance: beyond high performance
The Intel HyperFlex FPGA Architecture’s increased core performance offers several benefits to the system designer that go beyond the obvious one of simply running the core faster:
• Higher core performance makes timing closure easier and faster, improving the design team’s productivity and shortening the product’s time-to-market.
• Higher core performance allows designers to use a slower speed grade device while still exceeding performance requirements, reducing the cost of the solution.
• A design that runs 2X faster can be implemented at half the original internal bus width, shrinking the total design size. The design therefore fits in a much smaller device, reducing the cost of the solution.

The Intel advantage
In February of 2013, Altera announced that Intel’s 14 nm Tri-Gate (FinFET) process technology would be used to fabricate the next-generation Intel Stratix 10 FPGAs and SoCs. This technology provides breakthrough levels of density, performance, and power efficiency. It is based on 3D FinFET (Tri-Gate) transistors, which are replacing conventional 2D planar MOSFET transistors as geometries shrink below 20 nm. All major silicon foundries have announced their intention to move towards 3D FinFET transistors. In December 2015, Intel completed the acquisition of Altera.

With Intel as the foundry for Intel Stratix 10 devices, Intel® FPGA customers have access to a number of unique benefits, provided by “The Intel Advantage.” These benefits make the Intel 14 nm Tri-Gate technology the ideal process for implementing the new Intel HyperFlex FPGA Architecture.

The top five benefits that Intel FPGA customers obtain are:
• Exclusivity: Intel is the only major FPGA vendor that uses 14 nm Tri-Gate technology. Only Intel FPGA customers have access to this industry-leading process technology.
• Production Capability: Other major semiconductor foundries have announced plans to develop new processes based on FinFET transistors. However, there is a steep learning curve when moving FinFET technology from the research labs into production. So far, only Intel has made the transition into production; it has already shipped over 500 million FinFET transistor devices.
• A Node Ahead: Intel debuted its Tri-Gate process at 22 nm, over three years ago. This technology has shrunk to 14 nm, which is the technology used in Intel Stratix 10 FPGAs and SoCs. The other semiconductor foundries are developing FinFET processes that will start out using existing 20 nm design rules. They are not employing the same shrink as Intel, effectively leaving Intel a node ahead, resulting in sizable performance, power efficiency, and density advantages.
• Maturity: Intel is utilizing second-generation 14 nm Tri-Gate technology. None of the other foundries have publicly stated when they will start building chips with first-generation FinFET processes. Intel Stratix 10 FPGAs and SoCs benefit from the maturity of the Intel 14 nm Tri-Gate process technology.
• Design Expertise: Intel has proven its ability to design and produce high-speed logic, analog, digital, and mixed-signal circuits using FinFET transistors. This wealth of design expertise ensures that Intel Stratix 10 FPGAs and SoCs make the best use of the capabilities of Intel’s 14 nm Tri-Gate process technology.

Intel also offers the only major, high-performance FPGAs and SoCs with US-based manufacturing. It provides access to world-class package and assembly capability; and it
enables Intel to develop heterogeneous multi-die devices that integrate 14 nm Intel Stratix 10 FPGAs and SoCs with other advanced components (which may include SRAM, DRAM, ASICs, processors, and analog components) in a single package. These benefits form “The Intel Advantage,” exclusively available to Intel Stratix 10 FPGA and SoC customers.

The Intel HyperFlex FPGA Architecture
The centerpiece of the new Intel HyperFlex FPGA Architecture is its innovative “registers everywhere” design that adds bypassable Hyper-Registers to every routing segment in the FPGA core and at all functional block inputs. Figure 2 shows a bypassable Hyper-Register where the routing signal can bypass the register and go straight to the multiplexer, or go through the register first. The multiplexer is controlled by one bit of the FPGA configuration memory (CRAM).

Figure 3 shows a small section of the FPGA fabric with nine ALMs and the interconnect routing that connects them. The Hyper-Register location is indicated by the squares at the intersection of each horizontal and vertical routing segment.

To maximize the performance of a design using the Intel HyperFlex FPGA Architecture, designers use a three-step process that is based on familiar design techniques: register retiming, pipelining, and design optimization. The Hyper-Registers allow designers to use these familiar techniques to increase the performance of the design well beyond what is possible in conventional FPGA architectures. When these common techniques are implemented using the Hyper-Registers instead of the registers in the ALMs, the techniques are renamed Hyper-Retiming, Hyper-Pipelining, and Hyper-Optimization. Table 1 summarizes the performance gains achieved in each step.

As process geometries shrink, the interconnect delays between the ALMs are becoming dominant and are limiting performance. Locating the Hyper-Registers in the interconnect routing, where they can best address this issue, is one of the key innovations of the Intel HyperFlex FPGA Architecture.
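
The bypassable Hyper-Register described for Figure 2 can be pictured as a flip-flop followed by a two-input multiplexer whose select line is a single CRAM bit. The following is a minimal behavioral sketch in Python, not a description of the actual circuit. The class name, the method names, and the polarity of the CRAM bit (True selects the registered path) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BypassableRegister:
    """Toy model: a register plus a mux selected by one configuration bit."""
    cram_bit: bool   # assumed polarity: True = registered path, False = bypass
    state: int = 0   # value currently held in the flip-flop

    def output(self, routing_in: int) -> int:
        """Multiplexer output for the current clock cycle."""
        return self.state if self.cram_bit else routing_in

    def clock_edge(self, routing_in: int) -> None:
        """Capture the routing signal on a clock edge."""
        self.state = routing_in


def simulate(reg: BypassableRegister, inputs: list[int]) -> list[int]:
    """Drive the model with one input value per clock cycle."""
    outputs = []
    for value in inputs:
        outputs.append(reg.output(value))  # sample output before the edge
        reg.clock_edge(value)
    return outputs


bypassed = simulate(BypassableRegister(cram_bit=False), [1, 2, 3])
registered = simulate(BypassableRegister(cram_bit=True), [1, 2, 3])
print(bypassed, registered)
```

In bypass mode the signal passes straight through; in registered mode it appears one clock cycle later. Because the choice is a configuration bit rather than a change to the routing itself, in this model a register can be enabled or disabled on a segment without altering the path the signal takes.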
Figure 2. Bypassable Hyper-Register (figure labels: clk, CRAM Config)

Hyper-Retiming
The design is retimed using the Hyper-Registers in the interconnect routing. This process requires little to no user effort, yet it results in an average performance gain of 1.5X for Intel Stratix 10 devices compared to previous-generation high-performance FPGAs. Hyper-Retiming eliminates critical paths by moving registers out of the ALMs and into the interconnect, balancing register-to-register delays and allowing the design to run at a faster clock frequency. Because there are Hyper-Registers throughout the interconnect, the register location is fine-grained. Conventional retiming requires additional FPGA logic and routing resources and requires the design to be recompiled, refitted, and rerouted. In contrast, Hyper-Retiming does not use any additional FPGA resources and is performed after place-and-route, providing a significant core performance boost with little or no designer effort.

Hyper-Pipelining
The design is pipelined and retimed using the Hyper-Registers. This technique requires minor user effort and results in an average performance gain of 1.65X for Intel Stratix 10 devices compared to previous-generation high-performance FPGAs. Hyper-Pipelining eliminates long routing delays by adding pipeline stages in the interconnect between the ALMs, allowing the design to run at a faster clock frequency. Again, the Hyper-Registers located throughout the interconnect allow a fine-grained selection of the register location. As with Hyper-Retiming, Hyper-Pipelining does not use additional FPGA logic and routing resources, and it is done after place-and-route.

Hyper-Optimization
After accelerating data paths with Hyper-Retiming and Hyper-Pipelining, some designs are limited by control logic such as long feedback loops and state machines. To achieve higher performance, it is necessary to restructure these logic sections to use functionally equivalent feed-forward or pre-compute paths instead of long combinatorial feedback paths. This method requires a bit more effort, depending on the design; however, it results in average performance gains of 2X or more in Intel Stratix 10 devices compared to previous-generation high-performance FPGAs. In a conventional architecture, this process is called design optimization. In the Intel HyperFlex FPGA Architecture, this process is called Hyper-Optimization because the Hyper-Registers apply the benefits of Hyper-Retiming and Hyper-Pipelining to the feed-forward or pre-compute paths.

The Hyper-Aware design flow
Intel has developed a powerful set of new tools, integrated into the Intel Quartus Prime design software, that help system designers take full advantage of the Intel HyperFlex FPGA Architecture and maximize the developer’s design productivity. Figure 4 shows the Intel Quartus Prime Hyper-Aware design flow.

[Figure 4. Blocks shown: Hyper-Aware Tools; Fast Forward Compile; Mapping; Fitting (Plan, Place, Route, Retime); Timing Analysis]

Fast Forward Compile
This new tool guides the user through the performance optimization process by identifying performance-limiting areas of the design, identifying where and how many pipeline stages could be used to boost performance, and highlighting critical control-path bottlenecks (such as long feedback loops). The tool also allows designers to predict the performance of their existing design if it were implemented in an Intel Stratix 10 device, enabling optimal use of the new Intel HyperFlex FPGA Architecture.

Hyper-Retimer
The Hyper-Retimer step occurs near the end of design compilation. It performs post place-and-route performance optimization using the Hyper-Registers for optimal fine-grained Hyper-Retiming. This step also allows the user to implement Hyper-Pipelining much more easily than conventional pipelining. The Fast Forward Compile report identifies which clock domains can benefit from pipeline stages and how many pipeline stages are needed. After the designer modifies the RTL and places the prescribed number of pipeline stages at the boundaries of each clock domain, the Hyper-Retimer automatically places the registers within the clock domain at the optimal locations to maximize performance. This auto-placement, along with the Fast Forward Compile report, makes pipelining easier than ever.

Hyper-Aware algorithms
Hyper-Aware algorithms used during synthesis and place-and-route allow the tool to reduce logic resources by predicting which registers can be moved out of ALMs and into Hyper-Registers in the interconnect routing.

Conclusion
The combination of the new Intel HyperFlex FPGA Architecture and the Intel 14 nm Tri-Gate process technology enables Intel Stratix 10 FPGAs and SoCs to deliver previously unimaginable levels of performance, density, and power efficiency in a programmable logic device. Intel Stratix 10 devices offer:
• 2X the core performance and 5X the density compared to previous-generation Stratix V FPGAs
• Up to 70 percent lower power than Stratix V FPGAs for equivalent performance
• Logic, internal memory, and DSP blocks capable of 1 GHz operation
• Embedded quad-core 64 bit ARM Cortex-A53 hard processor system (in SoC variants)
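
The way Hyper-Retiming and Hyper-Pipelining raise clock frequency can be illustrated with a toy timing model. The following is a minimal sketch, not the algorithm used by the Hyper-Retimer. It assumes a single path made of segments with fixed delays, a register that may be placed after any segment (mirroring a Hyper-Register on every routing segment), and a clock period set by the slowest register-to-register stage. The segment delays and function names are hypothetical.

```python
from itertools import combinations


def stage_delays(segments_ps, cuts):
    """Split a path into register-to-register stages.

    segments_ps: delay of each logic or routing segment along the path, in ps.
    cuts: indices i such that a register sits directly after segment i.
    """
    stages, start = [], 0
    for cut in sorted(cuts):
        stages.append(sum(segments_ps[start:cut + 1]))
        start = cut + 1
    stages.append(sum(segments_ps[start:]))
    return stages


def fmax_mhz(segments_ps, cuts):
    """Maximum clock frequency in MHz, limited by the slowest stage."""
    return 1e6 / max(stage_delays(segments_ps, cuts))


def best_cuts(segments_ps, n_registers):
    """Brute-force the register positions that minimize the slowest stage."""
    positions = range(len(segments_ps) - 1)
    return min(combinations(positions, n_registers),
               key=lambda cuts: max(stage_delays(segments_ps, cuts)))


# Hypothetical path: six segments, 8,000 ps of total delay.
path = [2000, 1000, 1000, 2000, 1000, 1000]

original = (0,)                # one register, poorly placed
retimed = best_cuts(path, 1)   # same register count, better position
pipelined = best_cuts(path, 3) # two extra pipeline registers

for name, cuts in [("original", original), ("retimed", retimed),
                   ("pipelined", pipelined)]:
    print(name, cuts, stage_delays(path, cuts), round(fmax_mhz(path, cuts), 1))
```

In this model, retiming keeps the register count fixed and only moves the register to balance the two stages, while pipelining adds registers so that every stage becomes shorter. The finer the granularity of legal register positions, the closer the stages can be balanced, which is the benefit the text attributes to having Hyper-Registers throughout the interconnect.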
References
1. http://www.telegeography.com/press/press-releases/2012/09/06/global-internet-capacity-reaches-77-tbps-despite-slowdown/index.html
2. http://www.gartner.com/newsroom/id/2636073
3. http://www.ieee802.org/3/400GSG/
4. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html
5. http://www.nsr.com/news-resources/the-bottom-line/satellite-backhaul-trunking-are-capacity-driven-markets/
6. http://www.ciscoknowledgenetwork.com/files/222_03-27-2012-CKN_Cisco_Mobile-VNI-Forecast_2012_CKN_Deck.pdf
7. http://www.computerweekly.com/news/2240164589/Datacentre-power-demand-grew-63-in-2012-Global-datacentre-census
8. http://www.koomey.com/post/8323374335
© Intel Corporation. All rights reserved. Intel, the Intel logo, the Intel Inside mark and logo, Altera, Arria, Cyclone, Enpirion, Experience What’s Inside, Intel Atom, Intel Core, Intel Xeon, MAX, Nios,
Quartus, and Stratix words and logos are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. Intel reserves the right to make changes to any products and services
at any time without notice. Intel assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to
in writing by Intel. Intel customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.
* Other marks and brands may be claimed as the property of others.
Please Recycle WP-01220-1.4