Final Year Report
ABINAYA R (71812102003)
BALA AMUDHAN S (71812102021)
BALAJI S (71812102022)
of
BACHELOR OF ENGINEERING
in
ELECTRONICS AND COMMUNICATION ENGINEERING
May 2025
SRI RAMAKRISHNA ENGINEERING COLLEGE
COIMBATORE
BONAFIDE CERTIFICATE
Certified that this project report entitled “Matrix Code Based Error Detection and
Correction for Matrix Computation in Systolic Array” is the bonafide work of
ABINAYA R (71812102003), BALA AMUDHAN S (71812102021), and BALAJI S
(71812102022), who carried out the 20EC282 - PROJECT WORK under my
supervision.
SIGNATURE SIGNATURE
SUPERVISOR HEAD OF THE DEPARTMENT
Mr. M. SELVAGANESH Dr. M. JAGADEESHWARI
Assistant Professor (Sr.G) Professor and Head
Department of ECE Department of ECE
Sri Ramakrishna Engineering College Sri Ramakrishna Engineering College
Coimbatore - 641022 Coimbatore - 641022
ACKNOWLEDGEMENT
TABLE OF CONTENTS
ABSTRACT iii
LIST OF ABBREVIATIONS ix
1 INTRODUCTION
1.1 INTRODUCTION 1
2 LITERATURE SURVEY
2.1 INTRODUCTION 7
2.2 BACKGROUND 8
2.3 SUMMARY 17
3 METHODOLOGY AND WORKING PRINCIPLE
3.3 OBJECTIVE 19
3.8 UNIQUENESS OF THE PRODUCT 28
3.9 SUMMARY 29
4 SOFTWARE AND HARDWARE DESCRIPTION
4.3 SUMMARY 37
5.1 ANALYSIS 41
6.1 CONCLUSION 42
REFERENCES 42
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
Fault injection techniques are used to simulate various fault scenarios, assessing the
system’s robustness and fault recovery capabilities under stress.
The proposed architecture not only provides a highly reliable platform for matrix
computations but also ensures that the system maintains high throughput and low
latency — critical requirements for real-time applications. This fault-tolerant design
has significant potential for deployment in edge computing devices, medical
instruments, aerospace systems, and other domains where system failures can have
severe consequences. The design also offers room for further enhancement through the
integration of adaptive error mitigation strategies, which could allow the system to
modify its tolerance levels based on operational conditions. Additionally, dynamic
partial reconfiguration on FPGA platforms can be leveraged to isolate and recover
faulty modules during runtime, paving the way toward self-healing systems.
In summary, the project aims to create a scalable and resource-efficient solution
for fault-tolerant matrix computation using systolic arrays. By combining the benefits
of Light ABFT with classic coding techniques such as Hamming and parity codes, and
by harnessing the flexibility of FPGA implementation, this work addresses both
detection and correction of errors in a balanced manner. The outcome is a robust
hardware architecture that prioritizes performance, reliability, and adaptability — key
attributes in the design of modern computing systems.
With the rapid growth of data-intensive applications such as artificial intelligence,
machine learning, and scientific computing, efficient and reliable matrix computation
has become increasingly important. Systolic arrays, due to their regular structure and
high parallelism, have emerged as a prominent hardware architecture for accelerating
matrix multiplication tasks. However, as technology continues to scale down and
devices operate under harsher environmental and voltage conditions, these
architectures become more vulnerable to transient and permanent faults, which can
compromise system reliability and accuracy.
To mitigate these risks, fault tolerance techniques must be embedded within the
architecture. Traditional Algorithm-Based Fault Tolerance (ABFT) methods provide
effective detection mechanisms using mathematical invariants and checksum validation
during computations. However, these approaches often introduce significant area and
computational overhead, limiting their practical use in resource-constrained platforms
such as FPGAs.
This project introduces a novel approach combining Lightweight ABFT (Light
ABFT) and a Matrix Code-Based Error Detection and Correction (MCEDEC) scheme,
tailored for systolic array-based matrix multiplication. Light ABFT leverages
simplified checksum calculations on input matrices to provide real-time fault detection
with minimal overhead. Its efficiency makes it particularly suitable for FPGA
deployment where area and power consumption are critical.
For error correction, the system integrates a hybrid scheme using Hamming Codes
and Parity Checks. Hamming Code is employed along the rows of the matrix to correct
single-bit faults, while parity checks along the columns enable the detection of multi-
bit errors. This cross-dimensional coding structure enhances overall reliability without
a significant increase in resource utilization.
The proposed architecture is implemented using Verilog HDL and synthesized on
a Xilinx FPGA using Vivado tools. Fault injection mechanisms are used to evaluate the
system’s error detection and correction capabilities under various scenarios.
In essence, the project presents a robust, low-overhead, and scalable fault-tolerant
design for systolic array-based matrix multiplication, suitable for deployment in
mission-critical and embedded computing systems.
Scientific simulations, signal processing, machine learning, and other computing
domains all depend on matrix calculations. Systolic arrays' capacity for high-speed, parallel
processing makes them popular for accelerating these workloads. But as these systems
grow larger and more complex, they become more susceptible to hardware faults,
including bit flips, transient errors, and permanent malfunctions. These faults can
jeopardise the precision and dependability of calculations, particularly in critical applications.
Current fault-tolerant methods frequently carry large overheads or are not tailored to the
systolic array structure. An error detection and correction scheme that is scalable,
effective, lightweight, and specifically designed for matrix operations on systolic arrays
is therefore urgently needed. To address this challenge, this work investigates matrix
code-based methods that can detect and correct errors with minimal hardware complexity
and performance penalty. Traditional fault-tolerant techniques, such as hardware
redundancy and elaborate error-correcting codes, frequently incur higher resource usage,
latency, and power consumption. These characteristics are
undesirable for systolic array implementations where efficiency is crucial. Furthermore,
many existing methods are not designed to handle the regular, data-dependent nature of
matrix calculations in systolic designs. Because scientific computing and artificial
intelligence applications require both high speed and high accuracy, it is essential to
create specialised methods that balance system performance and fault coverage. With
structured, computation-aware procedures that complement systolic processing patterns,
matrix code-based error detection and correction offers a viable path. The challenge lies
in designing and integrating such methods so that the array's computing performance is
preserved while potential faults are managed.
This project focuses on improving the reliability of matrix computations
performed using systolic arrays, which are widely used in high-performance computing
due to their ability to process data in parallel. As these systems are increasingly adopted
in critical domains like machine learning and scientific simulations, ensuring accurate
results in the presence of hardware faults becomes essential. Errors such as bit flips,
timing faults, and permanent hardware failures can significantly affect computation
outcomes. While several fault-tolerant methods exist, many are either resource-
intensive or not optimized for systolic architectures. This work aims to address these
challenges by developing a matrix code-based error detection and correction technique
that is both efficient and scalable. The proposed approach leverages the structured
nature of matrix operations to detect and correct errors with minimal overhead,
ensuring reliable performance without compromising system speed or hardware
efficiency.
CHAPTER 2
LITERATURE SURVEY
2.1 INTRODUCTION
Matrix operations are a fundamental component in various fields such as
scientific computing, signal processing, and artificial intelligence. To perform these
computations efficiently, systolic arrays have become a widely adopted hardware
architecture due to their regular structure, high parallelism, and fast data throughput.
However, as the scale and complexity of these systems increase, ensuring reliability in
the presence of faults becomes essential. Hardware faults—such as transient errors, bit
flips, or permanent failures—can lead to incorrect results during matrix computations.
To address this, researchers have developed various fault-tolerant techniques, including
matrix code-based error detection and correction and algorithm-based fault tolerance
(ABFT). These methods introduce redundancy at different levels to identify and
mitigate errors with minimal performance loss. This literature survey explores key
contributions in this area, focusing on both foundational approaches and recent
advancements, with the goal of understanding how fault resilience can be effectively
integrated into systolic array-based matrix computation systems.
2.2 BACKGROUND
processing could be achieved in real-time as data moves through the array, making it
ideal for hardware implementations of large-scale matrix operations.
This paper laid the groundwork for future developments in error-tolerant
computing. Over the years, the ABFT method has been adapted to a variety of matrix-
based algorithms and is now considered a fundamental technique in high-performance
and fault-resilient computing. Its relevance has grown in modern computing systems,
especially in fields like artificial intelligence and embedded processing, where
reliability and speed are both crucial.
The authors of this work discuss the growing demand for hardware accelerators
to be reliable, particularly in systolic array processors, which are frequently employed
for high-speed matrix calculations in fields such as scientific applications, machine
learning, and signal processing. As these systems become more complicated and scale,
errors—whether brought on by ageing, environmental noise, or hardware flaws—can
have a big influence on the accuracy of the system. The authors address this by
proposing a fault-tolerant systolic array architecture, which is tested and implemented
with FPGA technology. The paper's main contribution is the creation of a systolic array
processor that can maintain accuracy even when processing elements (PEs)
malfunction. Rather than employing conventional redundancy techniques that
significantly raise hardware costs, the authors offer an effective fault tolerance solution
that strikes a compromise between resource consumption and reliability. By using error
detection and recovery techniques built into the systolic architecture, their approach
allows the array to respond to faults dynamically without requiring a complete
system reconfiguration. A crucial component of their approach is the use of
reconfigurable PEs: when a PE fault is discovered, the system isolates the
malfunctioning unit and reroutes the computation through spare or healthy elements.
This dynamic adaptation guarantees minimal performance loss while maintaining
continuous operation.
Additionally, the localised and lightweight fault detection method eliminates the need
for costly global checks or intricate monitoring systems. An FPGA implementation of
the fault-tolerant design demonstrates its practicality: the authors show through
experiments that their design increases dependability significantly while retaining a
manageable area and power overhead. Overall, the design exemplifies how
intelligent architectural modifications can enhance system reliability without incurring
excessive costs, making it a valuable reference for researchers and engineers working
in fault-tolerant hardware design.
S. Zhang and K. Roy, "A Low-Overhead Error Detection Scheme for Systolic
Array-Based Matrix Multiplication Accelerators," IEEE Trans. Very Large Scale
Integration (VLSI) Syst., vol. 29, no. 10, pp. 2671-2683, Oct. 2021.
The goal of S. Zhang and K. Roy's paper "A Low-Overhead Error Detection
Scheme for Systolic Array-Based Matrix Multiplication Accelerators" is to increase the
dependability of systolic array processors used in matrix multiplication tasks, which
are found in many domains such as scientific computing and machine learning. Systolic
arrays have drawn a lot of interest for speeding up large-scale matrix operations and
are very effective in parallel processing. However, guaranteeing error resilience is one
of the main hurdles with these systems, particularly when they are implemented in
settings where radiation-related malfunctions, hardware flaws, or other problems might
have a substantial effect on performance. To overcome this difficulty, the authors
present a novel error detection technique that integrates smoothly into the systolic
array without introducing significant overhead. The suggested
method makes use of error detection codes, which are intended to identify errors in the
calculations made by the systolic array with the least possible effect on the accelerator's
performance. Because these error detection codes exploit the array's built-in structure,
the system can monitor its own calculations without extra or redundant hardware
components. One of the paper's main features is the low-cost implementation of the
error detection scheme: the authors stress the significant computational and power
costs associated with typical fault tolerance techniques, such as sophisticated
error-correction algorithms or added redundancy.
In the end, the work offers a scalable and effective way to improve systolic array fault
tolerance, resulting in more durable and dependable accelerators for high-performance
computing. Systems that need both substantial processing power and low overhead,
such as AI accelerators and embedded systems, benefit greatly from this strategy.
dependability since they also demonstrate that the additional fault tolerance has little
effect on the systolic array's overall performance.
reliability and computational speed are essential, such as embedded systems, real-time
processing, and scientific computing.
For parallel computation, systolic arrays are very effective, particularly for
matrix operations like multiplication, which is essential for many computationally
demanding activities like machine learning, scientific simulations, and image
processing. But these systems are prone to errors, whether from hardware malfunctions
or outside disruptions, which can impair their functionality and lead to inaccurate
calculations. This study suggests a fault-tolerant systolic array architecture that can
continue to operate correctly even if one or more of the processing elements (PEs)
experience a problem. Through simulation experiments and theoretical analysis, the authors
demonstrate the effectiveness of their fault-tolerant systolic array design.
Finally, by providing a workable solution to the issue of fault tolerance in
systolic arrays, Kim and Irwin's paper makes a significant contribution to the field of
parallel computing. The authors offer a technique that guarantees the system can carry
out matrix multiplication dependably even in the event of hardware malfunctions by
integrating error-detecting and error-correcting capabilities straight into the systolic
array architecture. This work lays the groundwork for future studies on enhancing the
resilience and effectiveness of parallel computing systems while also greatly increasing
the fault tolerance of systolic arrays.
R. Sridhar and V. Srinivasan, "FPGA Implementation of Matrix Multiplication
Using Systolic Arrays," International Journal of VLSI Design & Communication
Systems (VLSICS), vol. 6, no. 2, pp. [insert page numbers], April 2015.
M. El-Khamy and J. Lee, "Survey of Fault-Tolerant Techniques for Deep Neural
Networks and Systolic Arrays," ACM Computing Surveys (CSUR), vol. 54, no. 8,
pp. 1–35, 2022.
2.3 SUMMARY
The literature survey highlights the growing need for fault-tolerant mechanisms
in matrix computations, especially when implemented on systolic array architectures.
As these arrays are widely used for high-speed and parallel processing, any fault in
their components can significantly impact the accuracy of the results. Various research
works have explored strategies to address this issue, including algorithm-based fault
tolerance (ABFT), error-correcting codes, redundant computation, and reconfigurable
architectures. These techniques aim to detect, isolate, and correct errors while
maintaining efficient performance. Several studies have also proposed hardware-level
and hybrid approaches that balance fault coverage with resource usage. Through this
survey, it is evident that while many effective solutions exist, there remains a need for
approaches that offer high reliability with minimal computational overhead. The
collected research provides a strong foundation for developing improved error
detection and correction systems tailored for matrix operations in systolic arrays.
CHAPTER 3
METHODOLOGY AND WORKING PRINCIPLE
3.1 PROBLEM IDENTIFICATION
Matrix multiplication is a core operation in many advanced computing tasks,
including scientific simulations, image processing, and machine learning. In real-time
and safety-critical systems, reliability and speed are crucial. Field-Programmable Gate
Arrays (FPGAs) are widely used in such domains due to their reconfigurability and low
power usage. However, they are highly sensitive to radiation-induced faults, especially
in their configuration memory. These faults are persistent, meaning that once a fault
occurs, it can affect all subsequent operations until explicitly corrected. Traditional fault
tolerance methods like Triple Modular Redundancy (TMR) or Dual Modular
Redundancy (DMR) are too resource-intensive and can significantly hinder
performance. Hence, there is an urgent need for an error detection method that is both
efficient and lightweight, tailored specifically for matrix operations implemented on
systolic arrays within SRAM-based FPGAs.
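For context, the area cost of TMR comes from running three copies of every processing
element and voting on their outputs. The voter below is a minimal illustrative sketch
(module and port names are our own, not part of the proposed design):

// 2-of-3 majority voter at the heart of TMR: each output bit follows the value
// agreed on by at least two of the three replicated processing elements.
// Illustrative only -- the proposed design avoids this 3x replication.
module tmr_voter #(parameter W = 8) (
  input  [W-1:0] a, b, c,  // outputs of the three PE replicas
  output [W-1:0] y         // voted result
);
  assign y = (a & b) | (b & c) | (a & c); // bitwise majority
endmodule

Even this simple voter must be replicated for every protected signal, which is why the
proposed design pursues checksum-based detection instead.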
3.3 OBJECTIVE
The main objective of this project is to develop a reliable and efficient fault-
tolerant matrix multiplication system using systolic array architecture. The goal is to
integrate a lightweight error detection technique, specifically Light Algorithm-Based
Fault Tolerance (Light ABFT), to identify faults during computation without incurring
high hardware or processing overhead. To enhance error resilience, a hybrid correction
mechanism is employed, combining Hamming Code for correcting single-bit errors
along the rows and Parity Code for detecting multi-bit errors in the columns. The entire
architecture is implemented on an FPGA platform using Verilog HDL and synthesized
through Xilinx Vivado, enabling real-time testing and performance evaluation. This
project also aims to ensure that the system maintains high throughput and accuracy
while minimizing resource usage, making it suitable for critical applications where data
integrity and system reliability are paramount.
The correction stage applies Hamming codes along the rows and parity
codes on the columns of the computation matrix. The Hamming code performs single-
bit error correction per row, while the parity codes are used to help detect multi-bit
errors in the columns, providing better fault coverage overall. This two-dimensional
approach allows both detection and correction to be properly addressed without
considerable degradation of performance or increase in hardware complexity. The
design is executed and verified on an FPGA using Verilog HDL, taking advantage of
its reconfigurability, real-time processing features, and ability for fault injection testing.
The suggested approach provides a strong, low-overhead, and effective solution for
fault-tolerant systolic array architectures, which is suitable for mission-critical
applications in areas like artificial intelligence, signal processing, and embedded
systems.
1.1. INPUT CHECKSUM GENERATION:
In this step, checksums of the input matrices are calculated to serve as a reference
for verifying the correctness of the output:
A. For Matrix A: Compute the column-wise checksum
\( C_A[j] = \sum_{i=1}^{m} A[i][j] \)
This results in a 1 × n vector.
B. For Matrix B: Compute the row-wise checksum
\( R_B[i] = \sum_{j=1}^{p} B[i][j] \)
This results in an n × 1 vector.
These values represent the expected behavior of the input data.
The input checksums are then used to calculate a reference checksum that represents
the expected behavior of the final result:
\( \text{Expected Checksum} = C_A \cdot R_B = \sum_{j=1}^{n} C_A[j] \cdot R_B[j] \)
This produces a single scalar value which acts as the predicted total sum of the output
matrix if no fault occurs during multiplication.
The actual checksum is then computed from the output matrix C as either:
• The total sum of all elements of C,
Or
• The row-wise or column-wise sum of C, depending on implementation. This
computed checksum is the actual result of the matrix operation.
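A minimal sketch of this final comparison in Verilog, assuming the expected and actual
checksums have already been accumulated (port names are illustrative; the Appendix
contains the full pipelined datapath):

// Light ABFT decision stage: flag an error whenever the predicted checksum
// (CA x RB) and the accumulated sum of the output matrix disagree.
module abft_check #(parameter W = 16) (
  input  [W-1:0] expected_sum, // MAC of CA[j] * RB[j] over all j
  input  [W-1:0] actual_sum,   // running sum of every element of C
  output         error         // high when a fault corrupted the result
);
  assign error = (expected_sum != actual_sum);
endmodule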
2. MATRIX CODES:
Matrix codes are a hybrid error correction technique that applies Hamming codes
along the rows of a matrix and parity codes along the columns. This 2D protection
strategy improves fault tolerance by enabling both detection and correction of errors in
matrix-based data structures, such as those used in systolic arrays or memory storage.
This technique is especially valuable in applications requiring high data integrity with
minimal hardware complexity, such as FPGA-based matrix multiplication systems. The
various steps involved in matrix codes are:
➢ Use parity codes (column-wise) to verify data integrity across columns.
▪ If a parity check fails, it signals an unrepaired error (e.g., a multiple-bit error
that the Hamming code could not correct).
▪ Helps in localizing errors further in conjunction with row data.
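The resulting two-dimensional layout can be pictured as follows (an illustrative 4 × 4
arrangement, where \(H_i\) denotes the Hamming check bits of row i and \(P_j\) the
parity of column j):

\[
\begin{array}{cccc|c}
d_{11} & d_{12} & d_{13} & d_{14} & H_1 \\
d_{21} & d_{22} & d_{23} & d_{24} & H_2 \\
d_{31} & d_{32} & d_{33} & d_{34} & H_3 \\
d_{41} & d_{42} & d_{43} & d_{44} & H_4 \\ \hline
P_1    & P_2    & P_3    & P_4    &
\end{array}
\]

A row syndrome pinpoints and corrects a single faulty bit, while a failing column
parity flags residual multi-bit damage in that column.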
3. HAMMING CODE:
• Hamming Code is one of the oldest and most widely used error-correcting codes in
digital systems. Invented by Richard Hamming in 1950, it gives a simple but
effective way to detect and correct single-bit errors and detect two-bit errors. This
makes it particularly valuable in systems where data integrity is important, like
computer memory, digital communication, and hardware implementations such as
FPGAs.
• The basic concept of Hamming Code is to add redundancy in the shape of parity bits
to the original data. These parity bits are positioned at bit positions that are powers of
two (1, 2, 4, 8, etc.). Each parity bit is added to check a particular set of bits, so
that each single bit in the encoded message is accounted for by a distinct set of parity
checks. This smart design enables the system not just to identify whether an error
has taken place but also to locate the precise position of the error bit. In order to
ascertain the number of parity bits required for a specified number of data bits, we
apply the formula \( 2^r \ge m + r + 1 \), where r represents the number of parity bits
and m represents the number of data bits.
• Once the right number of parity bits has been determined, data and parity bits are
put into the proper locations. Parity bits are then computed on the basis of XOR
operations on some bits chosen in the message. When receiving or processing the
encoded message, the same parity checks are recalculated. If the values do not agree
with the expected ones, the system calculates a binary error syndrome, which
directly indicates the position of the bad bit. That bit is flipped to recover the data.
Suppose we wish to send four data bits (say, 1011). Then we require three parity
bits, giving a 7-bit Hamming code.
• The final transmitted codeword is 0110011, with the parity bits computed and
inserted. If an error corrupts one of these bits during transmission, the parity
checks can be used at the receiver end to detect and correct it. This renders Hamming
Code not just space- and time-efficient but also reliable in low-error environments.
On the whole, Hamming Code is a basic building block for more sophisticated
error-correction schemes and is still extensively employed in both industrial and
academic domains because of its simplicity and efficacy.
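The worked example above can be checked with a small Verilog model. This is a
minimal Hamming(7,4) encoder and decoder sketched for this report (module and
signal names are our own, not part of the proposed design), using even parity and the
position convention p1 p2 d1 p3 d2 d3 d4:

// Hamming(7,4) encoder: d[3] is data bit d1 ... d[0] is d4, so 4'b1011
// reads left-to-right as d1 d2 d3 d4 = 1011, as in the example above.
module hamming74_enc(input [3:0] d, output [6:0] c);
  wire p1 = d[3] ^ d[2] ^ d[0]; // covers codeword positions 1,3,5,7
  wire p2 = d[3] ^ d[1] ^ d[0]; // covers positions 2,3,6,7
  wire p3 = d[2] ^ d[1] ^ d[0]; // covers positions 4,5,6,7
  assign c = {p1, p2, d[3], p3, d[2], d[1], d[0]}; // 1011 -> 0110011
endmodule

// Hamming(7,4) decoder: r[6] is codeword position 1 ... r[0] is position 7.
module hamming74_dec(input [6:0] r, output [3:0] d, output err);
  wire s1 = r[6] ^ r[4] ^ r[2] ^ r[0]; // recheck parity group 1,3,5,7
  wire s2 = r[5] ^ r[4] ^ r[1] ^ r[0]; // recheck group 2,3,6,7
  wire s3 = r[3] ^ r[2] ^ r[1] ^ r[0]; // recheck group 4,5,6,7
  wire [2:0] pos = {s3, s2, s1};       // syndrome: bad-bit position, 0 = clean
  wire [6:0] fixed = (pos != 3'd0) ? (r ^ (7'd1 << (3'd7 - pos))) : r;
  assign d   = {fixed[4], fixed[2], fixed[1], fixed[0]}; // d1 d2 d3 d4
  assign err = |pos;
endmodule

Encoding d = 4'b1011 yields the codeword 0110011 quoted above; flipping any single
bit of r makes the syndrome point at its position so the decoder can flip it back.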
4. PARITY CODES:
• Parity codes are some of the simplest and most popular techniques for detecting
errors in digital systems. As opposed to Hamming codes that can detect as well as
correct errors, parity codes are often employed for the purpose of detecting errors
alone. The strategy in parity coding is to append one parity bit to a set of data bits
such that the number of 1s in the whole set, including the parity bit, follows a fixed
rule: even parity or odd parity.
• In even parity, the parity bit is arranged in a way that the total number of 1s in the
whole group (data + parity) is even. In the same way, in odd parity, the parity bit
makes the total number of 1s odd. This extra bit helps the receiver identify whether a
single-bit error occurred during transmission or storage. If a single error flips a bit
from 0 to 1 or 1 to 0, the parity of the group changes too. The receiver senses this
discrepancy and flags it as an error.
• For instance, if we are employing even parity and we wish to transmit the data
`1011`, which contains three 1s (an odd number), we would append a parity bit of 1
to ensure that the total number of 1s is four (even). The code to be transmitted is
`10111`. On the receiving end, the system re-computes the parity. If it is not even
as anticipated, the system identifies that an error has occurred.
• Whereas parity codes are effective at indicating single-bit errors, they have limited
capability—they can neither specify the position of the error nor correct it. Further,
they do not catch all forms of multiple-bit errors (e.g., if two bits are inverted, the
parity may remain intact). Nevertheless, their simplicity and low cost of
computation render them extremely efficient in situations where error correction is
not required or other types of redundancy are present.
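In hardware the parity bit is a single reduction XOR. A minimal sketch of the
even-parity case described above (names are illustrative):

// Even-parity generator/checker: parity_bit makes the count of 1s even;
// mismatch goes high when a stored parity bit disagrees with the data.
module even_parity(input [3:0] data, input stored_parity,
                   output parity_bit, output mismatch);
  assign parity_bit = ^data;                      // XOR of all data bits
  assign mismatch   = parity_bit ^ stored_parity; // flags a single-bit error
endmodule

For the example above, data = 4'b1011 gives parity_bit = 1, so the stored word is
10111 as described.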
3.3 INNOVATIVENESS OF THE SOLUTION
correction methods appropriately. This adaptability helps the system remain
functional and reliable in varied environments, thus suitable for mission-critical
operations where error robustness is of the highest priority.
3.6 SUMMARY
This chapter explains the application of LABFT in systolic arrays to detect and mask
faults through a hybrid Hamming-parity coding approach. It is executed on an FPGA,
using parallelism to handle errors in real-time. The design guarantees low performance
overhead while having high fault tolerance, flexibility, and scalability in a wide range
of computational applications.
CHAPTER 4
SOFTWARE AND HARDWARE DESCRIPTION
Type of FPGA
The FPGA chip type determines the processing capability of the board, resources, and
application suitability. Xilinx boards employ FPGA families such as the Zynq-7000
(an SoC combining FPGA fabric with ARM CPUs) or the Artix-7 (budget-friendly and
power-efficient). The FPGA type
determines the scale of Verilog designs and Vivado synthesis/implementation
strategies:
a. ZedBoard and Pynq-Z2: Built around the Zynq-7000 SoC (XC7Z020), which
contains 85K logic cells and a dual-core ARM Cortex-A9 processor (650 MHz). The
SoC accommodates Verilog designs (e.g., a soft-core processor) and software
applications (e.g., Linux-based control), making it ideal for embedded systems and AI.
b. Arty A7 and Nexys A7: Utilize Artix-7 FPGAs (XC7A35T for the Arty A7-35T,
33K logic cells; XC7A100T for the Nexys A7, 101K logic cells). These are
appropriate for signal processing, IoT, and educational projects, supporting Verilog
designs such as digital filters or state machines.
Verilog HDL
The versatility of Verilog lies in the fact that it can be used to model hardware at
various levels of abstraction, enabling designers to design at high-level functional
models or low-level gate-level implementations. It can support multiple design stages,
ranging from prototyping early ideas to final synthesis. For instance, a designer can
model a UART (Universal Asynchronous Receiver-Transmitter) controller using
Verilog, defining its data transmission logic at a high level before synthesizing it to
FPGA hardware. Verilog is extensively used in industries for ASIC, FPGA, and SoC
design, as its support for EDA tools such as Xilinx Vivado enables seamless integration
into existing design flows. Its standardization enables interoperability, with designers
being able to simulate, synthesize, and verify complex systems efficiently.
Verilog provides three main levels of abstraction, each suited for particular design
requirements:
a. Behavioral Level:
This highest level of abstraction describes a system by its function rather than its
structure, using procedural constructs such as always and initial blocks; it is well
suited to early functional models and testbenches.
b. Dataflow (RTL) Level:
This level describes how data moves between registers through combinational logic,
typically using continuous assign statements, and is the usual style for synthesizable
designs.
c. Gate Level:
This low-level of abstraction describes systems as logic gates (e.g., AND, OR,
NOT) and hard-wired primitives, whose signal values are 0, 1, X (unknown), and Z
(high-impedance). Gate-level descriptions, typically synthesized from RTL code by
synthesis tools, are intended for simulation, timing analysis, and backend activities such
as place-and-route. For instance, a gate-level netlist can represent a circuit as a net of
NAND gates cascaded together, and this provides a good representation of the physical
realization.
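To make the contrast concrete, here is the same 2-to-1 multiplexer described at the
behavioral level and at the gate level (illustrative modules written for this report):

// Behavioral description: states only the function.
module mux2_beh(input a, b, sel, output reg y);
  always @(*) y = sel ? b : a;
endmodule

// Gate-level description: the same function as explicit primitives.
module mux2_gate(input a, b, sel, output y);
  wire nsel, w0, w1;
  not g0(nsel, sel);
  and g1(w0, a, nsel);
  and g2(w1, b, sel);
  or  g3(y, w0, w1);
endmodule

A synthesis tool turns the behavioral form into something equivalent to the gate-level
form automatically, which is why designers rarely write gate-level code by hand.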
Xilinx Vivado Design Suite
The Xilinx Vivado Design Suite is a full-featured Electronic Design Automation (EDA)
tool for designing, synthesizing, implementing, and analyzing FPGA-based systems.
Designed for Xilinx FPGAs like the Zynq UltraScale+, Artix-7, and Kintex UltraScale
series, Vivado accommodates a broad spectrum of design entry styles, including
Verilog, VHDL, schematic capture, and high-level synthesis (HLS) for the translation
of C/C++ code to HDL. Vivado handles the whole FPGA design flow, from Verilog
code writing to the generation of a bitstream file that programs the FPGA's
programmable logic, memory, and interconnects. Its sophisticated algorithms optimize
designs for performance (e.g., clock speed), power, and resource usage, which makes
it particularly suitable for use in applications such as embedded systems, 5G
communications, and machine learning accelerators.
Vivado IDE Interface
Project Manager:
Shows all project resources, such as Verilog source files, timing constraints (clock
period), and IP cores. It provides a hierarchical presentation for navigation
convenience, allowing users to add, modify, or arrange components. For instance, a
designer can import a Verilog module for a PWM (Pulse Width Modulation) controller
or add a Xilinx IP core for an Ethernet MAC.
Flow Navigator:
A vertical panel that organizes the design flow into linear tasks, including synthesis,
implementation, and bitstream generation. Every task is one-click accessible, and
Vivado indicates the current stage, making it easy to navigate for novices and experts
alike. For example, clicking "Run Synthesis" initiates the translation of Verilog code to
a netlist.
Workspace:
A multi-document window for displaying and editing design files, simulation waveforms,
and analysis reports. It provides environments such as the HDL Editor to create Verilog
code, the Timing Analyzer to analyze signal delays, and the Power Estimator to
measure energy consumption. Multiple files can be opened at one time, e.g., a Verilog
testbench and its waveform output, to improve productivity.
Tcl Console:
Supports automation by Tcl scripting so that users can run commands to automate
repetitive activities (e.g., batch generation of multiple designs) or customize the
workflow. The console displays all operations with full transparency, and debugging is
supported. The IDE's flexibility permits users to resize, dock, or undock panels to suit
their workflow styles. The Layout menu provides predefined layouts or reset to default
options, providing access for new users while accommodating detailed customization
for expert designers.
Design Entry:
Emphasized the HDL Editor features of Vivado (e.g., syntax highlighting) and project
management, with the demonstration of a 32-bit counter to illustrate useful application.
Simulation:
Synthesis:
Implementation:
Verification IP:
IP Integrator:
Design Optimization:
Discussed timing and resource optimization, with matrix multiplier and CRC examples
to illustrate performance trade-offs.
Mixed-Language Support:
4.3 SUMMARY
Verilog HDL and Xilinx Vivado Design Suite make up a sound ecosystem for
designing FPGAs, catering to the demands of contemporary electronics engineering.
Verilog's multi-level abstraction supports efficient, scalable description of digital
systems, ranging from high-level behavior to synthesizable RTL and gate-level netlist.
Its standardization and C-like syntax result in ease of use and portability, suitable for
applications in telecommunications, automotive, and artificial intelligence hardware.
Vivado augments Verilog with a complete, intuitive platform for synthesis,
implementation, simulation, timing analysis, and debugging specifically designed for
Xilinx FPGAs. Together, they enable engineers to design, verify, and implement
sophisticated FPGA systems with accuracy and efficiency, satisfying high-performance
and reliability demands in industries across the globe.
CHAPTER 5
Throughout the project development and testing, the work was conducted in a
laboratory setting with industry-standard FPGA development tools and simulation
software. The main hardware platform utilized was a Xilinx FPGA board, which offered
a flexible and efficient platform for implementing and testing the error detection and
correction architecture. The FPGA toolchain, the Vivado Design Suite, was used for
synthesis, simulation, and deployment of the Verilog code. The simulator played
an important role in verifying the logic design prior to hardware deployment.
Testbenches were implemented to simulate different fault conditions, both single-bit
faults and multiple-bit faults, to see the behavior of the system and test its fault-handling
ability. These fault injection simulations enabled a deliberate test of the LABFT and
hybrid Hamming-parity mechanisms under various stress levels.
With the transition to hardware testing, the FPGA board was set up and monitored in
real time, enabling direct observation of system performance and error correction
response. The environment facilitated debugging using on-chip tools such as
Integrated Logic Analyzer (ILA), which allowed deeper insight into internal signal
behavior under fault conditions. In general, the development and test environments
offered a solid platform to iteratively design, verify, and optimize the system for fault-
tolerant, high-speed applications.
Fig. 1: The input matrices A and B are multiplied and accumulated using the Light ABFT method
Fig. 2: An error is injected at location p22 and the output matrix P is obtained using the Light ABFT method
Fig. 3: The input matrices and the error-injected output matrix are compared using the Light ABFT method and the error is detected
Fig. 4: The output matrix P is corrected using the Matrix Code, which applies Hamming codes to rows and parity codes to columns, yielding matrix Q
5.1 ANALYSIS
Matrix A:
a11=5  a12=10 a13=15 a14=10
a21=2  a22=4  a23=6  a24=8
a31=4  a32=8  a33=12 a34=16
a41=3  a42=6  a43=9  a44=12

Matrix B:
b11=1 b12=2 b13=3 b14=4
b21=2 b22=3 b23=4 b24=5
b31=3 b32=4 b33=5 b34=6
b41=4 b42=5 b43=6 b44=7

INPUT MATRICES
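Working through the Light ABFT check with these matrices (the same values driven by
the testbench in the Appendix), the column checksums of A and row checksums of B are:

\[ C_A = [\,14,\ 28,\ 42,\ 46\,], \qquad R_B = [\,10,\ 14,\ 18,\ 22\,] \]
\[ \text{Expected Checksum} = 14 \cdot 10 + 28 \cdot 14 + 42 \cdot 18 + 46 \cdot 22 = 2300 \]

The fault-free product P sums to 680 + 360 + 720 + 540 = 2300, matching the
prediction. With the injected fault at p22 (the correct value 80 replaced by 70), the
accumulated sum becomes 2290, the comparison fails, and the error flag is raised; the
matrix code stage then localizes and corrects the faulty element.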
CHAPTER 6
CONCLUSION
6.1 CONCLUSION
The proposed structure effectively combines efficient correction procedures with
lightweight real-time error detection in a fault-tolerant systolic array architecture for
matrix multiplication. While the employment of parity and Hamming codes offers
strong error correction across all dimensions of the computing matrix, the application
of LABFT for fault detection guarantees low performance overhead. The FPGA-based
implementation confirms the system's area and power efficiency and its real-time error
recovery capability. Overall, the study demonstrates a practical and trustworthy
way to improve the reliability of matrix operations in systolic arrays, especially for
crucial applications in signal processing and embedded systems.
REFERNCES
4. S. Zhang and K. Roy, "A Low-Overhead Error Detection Scheme for Systolic
Array-Based Matrix Multiplication Accelerators," IEEE Trans. Very Large Scale
Integration (VLSI) Syst., vol. 29, no. 10, pp. 2671-2683, Oct. 2021.
6. J. de Fine Licht, G. Kwasniewski, and T. Hoefler, "Flexible Communication
Avoiding Matrix Multiplication on FPGA with High-Level Synthesis," in
Proceedings of the 2020 ACM/SIGDA International Symposium on Field-
Programmable Gate Arrays (FPGA'20), Seaside, CA, USA, Feb. 2020.
Appendix
// Testbench: drives the rows of A, B, and the (faulty) product matrix P into
// the Light ABFT datapath, one row of values per clock cycle, then checks
// the error flag.
module tb_light_abft();
reg clk,rst,rst1;
reg [1:0] s1,s2,s3;
reg [7:0] c1,c2,c3,c4,r1,r2,r3,r4;
reg [15:0] p1,p2,p3,p4;
wire error;
wire [7:0]q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44;
initial
begin
clk=1'b1;
forever #50 clk=~clk;
end
always
begin
#100
rst=1'b1;rst1=1'b1;s1=2'b00;s2=2'b00;s3=2'b00;c1=5;c2=10;c3=15;c4=10;r1=1;r2=2;
r3=3;r4=4;p1=110;p2=150;p3=190;p4=230;
#100 rst=1'b0;
#100
c1=2;c2=4;c3=6;c4=8;r1=2;r2=3;r3=4;r4=5;p1=60;p2=70;p3=100;p4=120; // fault injected at p2: correct value 80 replaced by 70
#100
c1=4;c2=8;c3=12;c4=16;r1=3;r2=4;r3=5;r4=6;p1=120;p2=160;p3=200;p4=240;
#100 c1=3;c2=6;c3=9;c4=12;r1=4;r2=5;r3=6;r4=7;p1=90;p2=120;p3=150;p4=180;
#100
s1=2'b01;s2=2'b01;s3=2'b01;c1=0;c2=0;c3=0;c4=0;r1=0;r2=0;r3=0;r4=0;p1=0;p2=0;
p3=0;p4=0;rst1=1'b0;
#100 s1=2'b10;s2=2'b10;s3=2'b10;
#100 s1=2'b11;s2=2'b11;s3=2'b11;
#200 rst=1'b1;rst1=1'b1;
end
light_abft
u0(clk,rst,rst1,s1,s2,s3,c1,c2,c3,c4,r1,r2,r3,r4,p1,p2,p3,p4,error,q11,q12,q13,q14,q21,
q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44);
endmodule
// Light ABFT core: accumulates the column checksums of A (c inputs) and the
// row checksums of B (r inputs), multiply-accumulates them into the expected
// checksum, accumulates the output rows (p inputs) into the actual checksum,
// and raises 'error' on a mismatch; matrix_code then corrects the output.
module light_abft(clk,rst,rst1,s1,s2,s3,c1,c2,c3,c4,r1,r2,r3,r4,p1,p2,p3,p4,error,
q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44);
input clk,rst,rst1;
input [1:0] s1,s2,s3;
input [7:0] c1,c2,c3,c4,r1,r2,r3,r4;
input [15:0] p1,p2,p3,p4;
output reg error;
output [7:0] q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44;
wire [7:0] a1,a2,a3,a4,b1,b2,b3,b4,m1,m2;
wire [15:0] q1,q2,q3,q4,m3;
wire [15:0] mac_op,acc_op;
//wire [15:0] acc_op_f;
wire [7:0]
p11,p12,p13,p14,p21,p22,p22_f,p23,p24,p31,p32,p33,p34,p41,p42,p43,p44;
assign p11=110;assign p12=150;assign p13=190;assign p14=230;
assign p21=60;assign p22=80;assign p23=100;assign p24=120;assign p22_f=70;
assign p31=120;assign p32=160;assign p33=200;assign p34=240;
assign p41=90;assign p42=120;assign p43=150;assign p44=180;
acc_8 u0(clk,rst,c1,a1);
acc_8 u1(clk,rst,c2,a2);
acc_8 u2(clk,rst,c3,a3);
acc_8 u3(clk,rst,c4,a4);
mux_41_8 u4(a1,a2,a3,a4,s1,m1);
acc_8 u5(clk,rst,r1,b1);
acc_8 u6(clk,rst,r2,b2);
acc_8 u7(clk,rst,r3,b3);
acc_8 u8(clk,rst,r4,b4);
mux_41_8 u9(b1,b2,b3,b4,s2,m2);
mac_8 u10(clk,rst1,m1,m2,mac_op);
acc_16 u11(clk,rst,p1,q1);
acc_16 u12(clk,rst,p2,q2);
acc_16 u13(clk,rst,p3,q3);
acc_16 u14(clk,rst,p4,q4);
mux_41_16 u15(q1,q2,q3,q4,s3,m3);
acc_16 u16(clk,rst1,m3,acc_op);
//assign acc_op_f={acc_op[15:1],~acc_op[0]};
matrix_code
u17(error,p11,p12,p13,p14,p21,p22,p22_f,p23,p24,p31,p32,p33,p34,p41,p42,p43,p44,
q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44);
always@(posedge clk)
begin
if(rst==1'b1 && rst1==1'b1 && mac_op!=acc_op)
error=1'b1;
else
error=1'b0;
end
endmodule
// 8-bit accumulator: adds ip on every clock edge; cleared by rst.
module acc_8(clk,rst,ip,op);
input clk,rst;
input [7:0]ip;
output [7:0]op;
reg [7:0]acc;
always@(posedge clk)
begin
if(rst)
acc=8'b0;
else
acc=acc+ip;
end
assign op=acc;
endmodule
// 16-bit accumulator: adds ip on every clock edge; cleared by rst.
module acc_16(clk,rst,ip,op);
input clk,rst;
input [15:0]ip;
output [15:0]op;
reg [15:0]acc;
always@(posedge clk)
begin
if(rst)
acc=16'b0;
else
acc=acc+ip;
end
assign op=acc;
endmodule
// 4-to-1, 8-bit multiplexer: selects one of the four accumulated checksums.
module mux_41_8(a,b,c,d,s,q);
input [7:0] a,b,c,d;
input [1:0] s;
output reg [7:0] q;
always@(a,b,c,d,s)
begin
case(s)
2'b00:q=a;
2'b01:q=b;
2'b10:q=c;
2'b11:q=d;
endcase
end
endmodule
// 4-to-1, 16-bit multiplexer: selects one of the accumulated output sums.
module mux_41_16(a,b,c,d,s,q);
input [15:0] a,b,c,d;
input [1:0] s;
output reg [15:0] q;
always@(a,b,c,d,s)
begin
case(s)
2'b00:q=a;
2'b01:q=b;
2'b10:q=c;
2'b11:q=d;
endcase
end
endmodule
// Multiply-accumulate unit: acc += a*b on every clock; builds the expected
// checksum from the muxed input checksums.
module mac_8(clk,rst,a,b,mac_op);
input clk,rst;
input [7:0]a,b;
output [15:0]mac_op;
wire [15:0]mul_op;
reg [15:0]acc;
assign mul_op=a*b;
always@(posedge clk)
begin
if(rst)
acc=16'b0;
else
acc=acc+mul_op;
end
assign mac_op=acc;
endmodule
// Matrix code corrector: applies per-row Hamming correction to the output
// matrix; row 2 is fed the faulty byte p22_f in place of p22.
module matrix_code(err,p11,p12,p13,p14,p21,p22,p22_f,p23,p24,p31,p32,p33,p34,p41,p42,p43,p44,
q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44);
input err;
input [7:0] p11,p12,p13,p14,p21,p22,p22_f,p23,p24,p31,p32,p33,p34,p41,p42,p43,p44;
output [7:0] q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44;
wire [7:0] pa1,pa2,pa3,pa4,pf1,pf2,pf3,pf4; // computed vs reference column parities (drivers not shown in this excerpt)
wire p1,p2,p3,p4; // per-column parity mismatch flags
wire [7:0] z11,z12,z13; // stored Hamming check bytes, one set per row
wire [7:0] z21,z22,z23;
wire [7:0] z31,z32,z33;
wire [7:0] z41,z42,z43;
assign p1=|(pa1^pf1);
assign p2=|(pa2^pf2);
assign p3=|(pa3^pf3);
assign p4=|(pa4^pf4);
matrix_code_row u0(err,p11,p12,p13,p14,z11,z12,z13,q11,q12,q13,q14);
matrix_code_row u1(err,p21,p22_f,p23,p24,z21,z22,z23,q21,q22,q23,q24);
matrix_code_row u2(err,p31,p32,p33,p34,z31,z32,z33,q31,q32,q33,q34);
matrix_code_row u3(err,p41,p42,p43,p44,z41,z42,z43,q41,q42,q43,q44);
endmodule
// One row of the matrix code: when err is raised and the row syndrome is
// non-zero, the faulty byte is reconstructed from the check bytes.
module matrix_code_row(err,p11,p12,p13,p14,z11,z12,z13,q11,q12,q13,q14);
input err;
input [7:0] p11,p12,p13,p14,z11,z12,z13;
output reg [7:0] q11,q12,q13,q14;
wire [7:0] c11,c12,c13; // recomputed check bytes (driving logic truncated in this listing)
wire [7:0] zz11,zz12,zz13; // row syndrome bytes: all zero when the row is error-free
always@(err,p11,p12,p13,p14,zz11,zz12,zz13)
begin
if(err==1'b1)
begin
if(zz11==8'b0 && zz12==8'b0 && zz13==8'b0)
begin
q11 <= p11;
q12 <= p12;
q13 <= p13;
q14 <= p14;
end
else if(zz11!=8'b0 && zz12!=8'b0 && zz13!=8'b0)
begin
q11 <= z11 ^ p12 ^ p13;
q12 <= p12;
q13 <= p13;
q14 <= p14;
50