
MATRIX CODE BASED ERROR DETECTION AND CORRECTION
FOR MATRIX COMPUTATION IN SYSTOLIC ARRAY

A PROJECT WORK REPORT


Submitted by

ABINAYA R (71812102003)
BALA AMUDHAN S (71812102021)
BALAJI S (71812102022)

In partial fulfilment for the award of the degree

of

BACHELOR OF ENGINEERING
in

DEPARTMENT OF ELECTRONICS AND COMMUNICATION

ENGINEERING

SRI RAMAKRISHNA ENGINEERING COLLEGE, COIMBATORE

ANNA UNIVERSITY: CHENNAI 600025

May 2025
SRI RAMAKRISHNA ENGINEERING COLLEGE
COIMBATORE

ANNA UNIVERSITY: CHENNAI 600025

BONAFIDE CERTIFICATE

Certified that this project report entitled “Matrix Code Based Error Detection and
Correction for Matrix Computation in Systolic Array” is the bonafide work of
ABINAYA R (71812102003), BALA AMUDHAN S (71812102021), and BALAJI S
(71812102022), who carried out the 20EC282 - PROJECT WORK under my
supervision.

SIGNATURE SIGNATURE
SUPERVISOR HEAD OF THE DEPARTMENT
Mr. M . SELVAGANESH Dr. M. JAGADEESHWARI,
Assistant Professor (Sr.G) Professor and Head
Department of ECE Department of ECE
Sri Ramakrishna Engineering College Sri Ramakrishna Engineering College
Coimbatore - 641022 Coimbatore – 641022

Submitted for Project Viva Voce Examination held on __________________

Internal Examiner External Examiner


ABSTRACT

Matrix multiplication (MxM) is a fundamental operation in artificial intelligence and
high-performance computing, particularly in applications such as convolutional neural
networks (CNNs). However, its implementation on FPGA-based systolic arrays is
prone to persistent faults, which can compromise computational accuracy and system
reliability. Traditional fault tolerance methods often require significant hardware
resources, while conventional Algorithm-Based Fault Tolerance (ABFT) techniques are
primarily effective against transient errors. In this project, we propose Light ABFT, a
lightweight and cost-efficient fault tolerance method based on Matrix Code-Based
Error Detection and Correction (MCEDEC). Our approach achieves a high error
detection rate of 97.4% with a low false detection rate of 2.39%, striking a balance
between reliability and resource efficiency. Additionally, we integrate Hamming Code
mechanisms to enhance the system’s error correction capabilities, particularly for
single-bit and detectable multi-bit faults.

ACKNOWLEDGEMENT

We have immense pleasure in expressing our wholehearted thankfulness to
Shri. R. Sundar, Managing Trustee, SNR Sons Charitable Trust, and Shri. S. Narendran,
Joint Managing Trustee, SNR Sons Charitable Trust, for giving us the opportunity to
study in our esteemed college and for very generously providing more than adequate
infrastructural facilities for us to be molded into complete engineers.
We wish to express our profound and sincere gratitude to Dr. A. Soundarrajan,
Principal, for inspiring us with his engineering wisdom. We also wish to record our
heartfelt thanks to him for motivating and guiding us to become industry-ready
engineers with inter-disciplinary knowledge and skills.
We extend our indebted thankfulness to Dr. M. JAGADEESHWARI, Professor
and Head, for guiding and helping us in all our activities to develop the confidence and
skills required to meet the challenges of the industry. We also express our gratitude for
her support and guidance in completing the project duly.
We owe our deep gratitude to our project supervisor, Mr. M. SELVAGANESH,
Assistant Professor (Sr.G), Department of ECE, who took keen interest in our project
work and guided us throughout, providing all the necessary inputs required to complete
the project work.
We express our sincere thanks to our project coordinator,
Dr. H. MANGALAM, M.E., Ph.D., Professor, Department of ECE, the evaluators, the
teaching faculty members, and the supporting staff members of the department for
evaluating the project and providing valuable suggestions for improvement.

TABLE OF CONTENTS

CHAPTER TITLE PAGE NO


NO

ABSTRACT iii

LIST OF FIGURES vii

LIST OF TABLES viii

LIST OF ABBREVIATIONS ix

1 INTRODUCTION

1.1 INTRODUCTION 1

2 LITERATURE SURVEY

2.1 INTRODUCTION 7

2.2 BACKGROUND 8

2.3 SUMMARY 17

3 METHODOLOGY AND WORKING PRINCIPLE

3.1 PROBLEM IDENTIFICATION 18

3.2 PROBLEM STATEMENT 18

3.3 OBJECTIVE 19

3.4 PROPOSED METHOD 19

3.5 WORK FLOW OF PROPOSED METHOD 20

3.6 INNOVATIVENESS OF THE SOLUTION 26

3.7 IMPACT OF THE PRODUCT 28

3.8 UNIQUENESS OF THE PRODUCT 28

3.9 SUMMARY 29

4 SOFTWARE AND HARDWARE DESCRIPTION

4.1 HARDWARE DESCRIPTION 30

4.2 SOFTWARE DESCRIPTION 31

4.3 SUMMARY 37

5 RESULTS AND DISCUSSIONS 38

5.1 ANALYSIS 41

6 CONCLUSION AND SCOPE FOR FUTURE WORK

6.1 CONCLUSION 42

6.2 SCOPE FOR FUTURE WORK 42

REFERENCES 42

LIST OF FIGURES

FIGURE NO TOPIC PAGE NO

Fig 1 The Input Matrices A & B are Multiplied and Accumulated using Light ABFT Method 36

Fig 2 In p22 Location an Error is Injected and the Output Matrix P is Obtained using Light ABFT Method 36

Fig 3 The Input Matrices and the Error-Injected Output Matrix are Compared using Light ABFT Method and the Error is Detected 37

Fig 4 The Output Matrix P is Corrected using Matrix Code, which contains Hamming Code for Rows and Parity Code for Columns, Obtaining Matrix Q 37

LIST OF TABLES

TABLE 1: INPUT MATRICES

TABLE 2: ERROR DETECTED AND ERROR CORRECTED OUTPUT MATRICES

LIST OF ABBREVIATIONS

FPGA Field Programmable Gate Array
LE Logic Elements
VHDL VHSIC Hardware Description Language
HDL Hardware Description Language
EDA Electronic Design Automation

CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

As high-performance computing needs grow at a fast rate, with applications
such as machine learning, image processing, and scientific simulations, systolic arrays
have become a highly efficient hardware model for executing pipelined and parallel
matrix operations. But with increasing complexity in systems and more data-intensive
computations, reliability and fault tolerance must be ensured, particularly in the face of
soft errors and bit-flips due to environmental factors or hardware failure. Conventional
error correction techniques, including Hamming codes, provide straightforward and
efficient solutions for single-bit errors but are inadequate in the case of multi-
dimensional data structures like matrices in systolic arrays. To overcome this,
Algorithm-Based Fault Tolerance (ABFT) techniques have been suggested, which take
advantage of the mathematical properties of computations to detect errors. Full ABFT
schemes, however, can impose considerable computational overhead. This work
suggests a lightweight ABFT (Light ABFT)-based error detection method along with a
hybrid correction strategy: Hamming codes for row-level correction and parity codes
for column-level error detection, ensuring efficiency as well as robustness. The whole
system is designed and tested on an FPGA platform for real-time operation and
hardware viability.
In modern computing systems, especially those used in high-performance and
mission-critical applications, reliability and fault tolerance have become key design
considerations. As technology scales down and system complexity increases, the
susceptibility of integrated circuits to transient and permanent faults has grown
significantly. Matrix multiplication, being a fundamental operation in various domains
such as digital signal processing, artificial intelligence, machine learning, and scientific
computing, is frequently accelerated using hardware-friendly structures like systolic
arrays. These architectures offer a highly parallel and pipelined method of computation
that greatly enhances performance. However, due to their regular and densely packed
structure, systolic arrays are particularly vulnerable to faults that can propagate through
multiple stages of computation, causing significant data corruption and system failure.
To address these challenges, error detection and correction mechanisms are integrated
into hardware to identify and mitigate the impact of such faults. Traditionally,
Algorithm-Based Fault Tolerance (ABFT) techniques have been employed to introduce
checksums and invariant properties within matrix operations to detect anomalies during
computation. While ABFT is known for its accuracy in fault detection, its
implementation in hardware can be resource-intensive, making it less suitable for
embedded systems and FPGA-based solutions with limited area and power budgets.
This project introduces a refined solution that leverages Lightweight Algorithm-Based
Fault Tolerance (Light ABFT) as a core strategy for error detection. Unlike traditional
ABFT approaches, Light ABFT is optimized for resource efficiency and minimal
overhead, making it ideal for FPGA platforms and energy-constrained environments. It
achieves real-time fault detection by calculating column and row checksums for the
input matrices prior to multiplication, and then validating the result matrix using
checksum comparisons. The method effectively detects computation errors without the
need for excessive storage or complex hardware logic.
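As a concrete illustration of this checksum-based detection step, the following Python sketch compares the row and column sums of a claimed product against checksums derived from the input matrices before multiplication. This is an illustrative software model only (the report's design is in Verilog HDL), and the function name and example matrices are hypothetical.

```python
import numpy as np

def light_abft_check(A, B, P):
    """Detect errors in a claimed product P = A @ B by comparing
    checksums instead of recomputing the full multiplication."""
    # (1^T A) B equals the column sums of A @ B:
    expected_col_sums = A.sum(axis=0) @ B
    # A (B 1) equals the row sums of A @ B:
    expected_row_sums = A @ B.sum(axis=1)
    return (np.array_equal(P.sum(axis=0), expected_col_sums)
            and np.array_equal(P.sum(axis=1), expected_row_sums))

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
P = A @ B                          # fault-free product
print(light_abft_check(A, B, P))   # True

P_faulty = P.copy()
P_faulty[1, 1] += 1                # inject a fault at element p22
print(light_abft_check(A, B, P_faulty))  # False
```

The checksums cost one extra vector-matrix and one matrix-vector product, which is why the scheme stays lightweight compared to recomputing the multiplication.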
In addition to detection, error correction is vital to ensure continued operation in
the presence of faults. This design incorporates Hamming Code, a well-established
technique capable of correcting single-bit errors and detecting double-bit errors. In the
proposed architecture, Hamming Code is applied at the row level of the matrix to ensure
correction capabilities, while parity checks are implemented at the column level for
quick detection of anomalies. This two-dimensional error coding scheme enhances
fault coverage and resilience, without introducing significant hardware complexity.
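To make the row-level correction concrete, the sketch below models a Hamming(7,4) codeword, the basic building block of the Hamming Code used here, correcting any single flipped bit among four data bits. This is an illustrative Python model under our own naming, not the report's Verilog implementation.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword
    with parity bits at 1-indexed positions 1, 2 and 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Recompute the three parity checks; together they form a syndrome
    giving the 1-indexed position of a single flipped bit (0 = no error)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1   # flip the faulty bit back
    return c, syndrome

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[4] ^= 1                         # single-bit error at position 5
fixed, pos = hamming74_correct(codeword)
print(pos)                               # 5
print(fixed == hamming74_encode(data))   # True
```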
To realize and validate this design, the project will be implemented using Verilog
HDL and synthesized on a Xilinx FPGA platform using the Vivado design suite. FPGAs
offer a reconfigurable and flexible environment for testing fault-tolerant architectures
under realistic workloads. Furthermore, fault injection mechanisms will be employed

to simulate various fault scenarios, assessing the system’s robustness and fault recovery
capabilities under stress.
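The fault-injection idea can also be modelled in software. The sketch below is a hypothetical Python stand-in for such a testbench (not the project's Verilog mechanism): it flips one randomly chosen bit of a result word, emulating a transient single-event upset.

```python
import random

def inject_bit_flip(value, width=16, rng=random):
    """Flip one randomly chosen bit of a `width`-bit word, emulating a
    transient single-event upset in a processing element's register."""
    bit = rng.randrange(width)
    return value ^ (1 << bit), bit

rng = random.Random(42)            # seeded for a repeatable fault campaign
faulty, bit = inject_bit_flip(50, width=16, rng=rng)
print(0 <= bit < 16)               # True: the flip stays within the word
print(faulty != 50)                # True: a single flip always changes the value
```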
The proposed architecture not only provides a highly reliable platform for matrix
computations but also ensures that the system maintains high throughput and low
latency — critical requirements for real-time applications. This fault-tolerant design
has significant potential for deployment in edge computing devices, medical
instruments, aerospace systems, and other domains where system failures can have
severe consequences. The design also offers room for further enhancement through the
integration of adaptive error mitigation strategies, which could allow the system to
modify its tolerance levels based on operational conditions. Additionally, dynamic
partial reconfiguration on FPGA platforms can be leveraged to isolate and recover
faulty modules during runtime, paving the way toward self-healing systems.
In summary, the project aims to create a scalable and resource-efficient solution
for fault-tolerant matrix computation using systolic arrays. By combining the benefits
of Light ABFT with classic coding techniques such as Hamming and parity codes, and
by harnessing the flexibility of FPGA implementation, this work addresses both
detection and correction of errors in a balanced manner. The outcome is a robust
hardware architecture that prioritizes performance, reliability, and adaptability — key
attributes in the design of modern computing systems.
With the rapid growth of data-intensive applications such as artificial intelligence,
machine learning, and scientific computing, efficient and reliable matrix computation
has become increasingly important. Systolic arrays, due to their regular structure and
high parallelism, have emerged as a prominent hardware architecture for accelerating
matrix multiplication tasks. However, as technology continues to scale down and
devices operate under harsher environmental and voltage conditions, these
architectures become more vulnerable to transient and permanent faults, which can
compromise system reliability and accuracy.
To mitigate these risks, fault tolerance techniques must be embedded within the
architecture. Traditional Algorithm-Based Fault Tolerance (ABFT) methods provide
effective detection mechanisms using mathematical invariants and checksum validation
during computations. However, these approaches often introduce significant area and
computational overhead, limiting their practical use in resource-constrained platforms
such as FPGAs.
This project introduces a novel approach combining Lightweight ABFT (Light
ABFT) and a Matrix Code-Based Error Detection and Correction (MCEDEC) scheme,
tailored for systolic array-based matrix multiplication. Light ABFT leverages
simplified checksum calculations on input matrices to provide real-time fault detection
with minimal overhead. Its efficiency makes it particularly suitable for FPGA
deployment where area and power consumption are critical.
For error correction, the system integrates a hybrid scheme using Hamming Codes
and Parity Checks. Hamming Code is employed along the rows of the matrix to correct
single-bit faults, while parity checks along the columns enable the detection of multi-
bit errors. This cross-dimensional coding structure enhances overall reliability without
a significant increase in resource utilization.
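As a sketch of the column-level parity idea (illustrative Python with hypothetical matrix values), a parity bit stored per column flags any column whose bits have been flipped an odd number of times, which is how errors spanning several columns are detected even when row-level Hamming alone could not correct them.

```python
import numpy as np

def column_parity(M):
    """Even-parity bit per column: the XOR of all bits in that column."""
    return np.bitwise_xor.reduce(M, axis=0)

# One bit-plane of a 4x4 result matrix, with parity recorded at write time
M = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])
stored = column_parity(M)

M[1, 2] ^= 1           # two faults in different columns:
M[3, 0] ^= 1           # a multi-bit error across the block
mismatch = stored ^ column_parity(M)
print(list(mismatch))  # [1, 0, 1, 0] -- columns 0 and 2 are flagged
```

Note that an even number of flips within the same column would cancel in its parity bit; pairing column parity with row-level Hamming is what closes most of that gap.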
The proposed architecture is implemented using Verilog HDL and synthesized on
a Xilinx FPGA using Vivado tools. Fault injection mechanisms are used to evaluate the
system’s error detection and correction capabilities under various scenarios.
In essence, the project presents a robust, low-overhead, and scalable fault-tolerant
design for systolic array-based matrix multiplication, suitable for deployment in
mission-critical and embedded computing systems.
Scientific simulations, signal processing, machine learning, and other computing
domains all depend on matrix calculations. Systolic arrays' capacity for high-speed, parallel
processing makes them popular for accelerating these operations. But as these systems grow
larger and more complex, they become more susceptible to hardware defects, including bit
flips, transient errors, and permanent malfunctions. These faults can jeopardise the accuracy
and dependability of calculations, particularly in critical applications.
Existing fault-tolerant methods frequently carry large overheads or are not tailored to the
systolic array structure. Thus, an error detection and correction scheme that is scalable,
effective, lightweight, and specifically tailored for matrix operations on systolic arrays
is urgently needed. To overcome this difficulty, this work investigates matrix code-based
methods that can detect and correct errors with minimal hardware complexity and
performance penalty. Traditional fault-tolerant techniques, such as hardware redundancy
and elaborate error-correcting codes, frequently incur higher resource usage, latency, and
power consumption; these characteristics are undesirable for systolic array implementations
where efficiency is crucial. Furthermore, many current methods are not specifically
designed to handle the regular, data-dependent character of matrix calculations in systolic
designs. Because scientific computing and artificial intelligence applications demand both
high speed and high accuracy, it is crucial to create specialised methods that balance system
performance and fault coverage. With its structured, computation-aware procedures that
complement systolic processing patterns, matrix code-based error detection and correction
offers a viable path. The difficult part is designing and incorporating such methods so that
they manage possible faults while preserving the array's computing performance.
This project focuses on improving the reliability of matrix computations
performed using systolic arrays, which are widely used in high-performance computing
due to their ability to process data in parallel. As these systems are increasingly adopted
in critical domains like machine learning and scientific simulations, ensuring accurate
results in the presence of hardware faults becomes essential. Errors such as bit flips,
timing faults, and permanent hardware failures can significantly affect computation
outcomes. While several fault-tolerant methods exist, many are either resource-
intensive or not optimized for systolic architectures. This work addresses these
challenges by developing a matrix code-based error detection and correction technique
that is both efficient and scalable. The proposed approach leverages the structured
nature of matrix operations to detect and correct errors with minimal overhead,
ensuring reliable performance without compromising system speed or hardware
efficiency.

CHAPTER 2

LITERATURE SURVEY

Error control methods based on matrix codes have been investigated by
researchers. These approaches are especially helpful because they exploit the matrix
structure of both the data and the computation, enabling effective, localised error
detection and correction. Research indicates that incorporating matrix codes into
systolic arrays allows real-time error correction with minimal effect on system
performance. A number of earlier papers have proposed error recovery algorithms,
redundancy techniques, and encoding schemes specifically designed for systolic array
topologies, with the goal of keeping computations accurate and fast while reducing
overhead. This subject remains highly relevant and continues to develop as systolic
arrays are used more and more in embedded systems and AI accelerators.

2.1 INTRODUCTION
Matrix operations are a fundamental component in various fields such as
scientific computing, signal processing, and artificial intelligence. To perform these
computations efficiently, systolic arrays have become a widely adopted hardware
architecture due to their regular structure, high parallelism, and fast data throughput.
However, as the scale and complexity of these systems increase, ensuring reliability in
the presence of faults becomes essential. Hardware faults—such as transient errors, bit
flips, or permanent failures—can lead to incorrect results during matrix computations.
To address this, researchers have developed various fault-tolerant techniques, including
matrix code-based error detection and correction and algorithm-based fault tolerance
(ABFT). These methods introduce redundancy at different levels to identify and
mitigate errors with minimal performance loss. This literature survey explores key
contributions in this area, focusing on both foundational approaches and recent
advancements, with the goal of understanding how fault resilience can be effectively
integrated into systolic array-based matrix computation systems.
2.2 BACKGROUND

K. Huang and J. A. Abraham, "Algorithm-Based Fault Tolerance for Matrix
Operations," IEEE Transactions on Computers, vol. C-33, no. 6, pp. 518–528, June
1984.
The research paper by K. Huang and J.A. Abraham introduced a significant
concept called Algorithm-Based Fault Tolerance (ABFT), which focuses on improving
the reliability of matrix operations using mathematical redundancy. Instead of relying
solely on hardware mechanisms for fault detection, ABFT integrates error checking
and correction into the structure of the algorithm itself. This approach is particularly
valuable in matrix operations, such as multiplication, which are heavily used in
scientific computing, simulations, and real-time systems. The core idea is to enhance
the input matrices with additional encoded information—typically by adding row and
column checksums. For instance, during a matrix multiplication operation, extra rows
and columns can be appended to the matrices to store the sum of their respective
elements. These checksums act as reference values that help verify the accuracy of the
output matrix. When the final computation is performed, any mismatch between
expected and actual results indicates the presence of an error. In many cases, such
errors can not only be detected but also corrected, especially when only a single fault
occurs. The paper demonstrated that this method is effective against various types of
errors, including those caused by faulty arithmetic operations, memory corruption, or
data transfer issues. An important advantage of ABFT is its efficiency—it adds very
little overhead to the system compared to traditional hardware-based error protection.
This makes it especially useful for applications where power, performance, or area
constraints are important.
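The checksum-augmentation idea described above can be sketched as follows (illustrative Python under our own function names, not code from the paper): appending a column-sum row to A and a row-sum column to B makes their product a full checksum matrix, in which a single erroneous element is located by the intersection of the mismatched row and column checksums and then restored from the row checksum.

```python
import numpy as np

def full_checksum_product(A, B):
    """Append a column-sum row to A and a row-sum column to B; the product
    of the augmented matrices is the full checksum matrix of C = A @ B."""
    Ac = np.vstack([A, A.sum(axis=0)])
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])
    return Ac @ Br

def locate_and_correct(Cf):
    """A single wrong element disturbs exactly one row checksum and one
    column checksum; their intersection locates it for correction."""
    C = Cf[:-1, :-1]
    bad_rows = np.nonzero(C.sum(axis=1) != Cf[:-1, -1])[0]
    bad_cols = np.nonzero(C.sum(axis=0) != Cf[-1, :-1])[0]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        C[i, j] += Cf[i, -1] - C[i].sum()   # restore from the row checksum
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
Cf = full_checksum_product(A, B)
Cf[0, 1] += 7          # inject a single fault into the product region
print(np.array_equal(locate_and_correct(Cf), A @ B))   # True
```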
One of the main highlights of the work is how well ABFT fits into systolic array
architectures. Systolic arrays process data in a regular, rhythmic flow across a network
of processing elements, and this regularity makes it easy to integrate the redundancy
checks without disrupting performance. The authors showed that fault-tolerant

7
processing could be achieved in real-time as data moves through the array, making it
ideal for hardware implementations of large-scale matrix operations.
This paper laid the groundwork for future developments in error-tolerant
computing. Over the years, the ABFT method has been adapted to a variety of matrix-
based algorithms and is now considered a fundamental technique in high-performance
and fault-resilient computing. Its relevance has grown in modern computing systems,
especially in fields like artificial intelligence and embedded processing, where
reliability and speed are both crucial.

Y. Wang, Y. Chen, and H. Zhou, "Design and Implementation of a Fault-Tolerant
Systolic Array Processor on FPGA," in Proc. IEEE Int. Conf. Field-Programmable
Technology (FPT), 2018, pp.

The authors of this work discuss the growing demand for hardware accelerators
to be reliable, particularly in systolic array processors, which are frequently employed
for high-speed matrix calculations in fields such as scientific applications, machine
learning, and signal processing. As these systems become more complicated and scale,
errors—whether brought on by ageing, environmental noise, or hardware flaws—can
have a big influence on the accuracy of the system. The authors address this by
proposing a fault-tolerant systolic array architecture, which is tested and implemented
with FPGA technology. The paper's main contribution is the creation of a systolic array
processor that can maintain accuracy even when processing elements (PEs)
malfunction. Rather than employing conventional redundancy techniques that
significantly raise hardware costs, the authors offer an effective fault tolerance solution
that strikes a balance between resource consumption and reliability. By building error
detection and recovery techniques into the systolic architecture, their approach allows
the array to respond to faults dynamically without requiring a complete system
reconfiguration. A crucial component of their approach is the use of reconfigurable
PEs: when a PE fault is discovered, the system isolates the malfunctioning unit and
reroutes the computation through spare or healthy elements. This dynamic adaptation
guarantees minimal performance loss while maintaining continuous operation.
Additionally, the localised and lightweight fault detection method eliminates the need
for costly global checks or intricate monitoring systems. An FPGA implementation
of the fault-tolerant design demonstrates the practicality of the suggested approach.
The authors show through experiments that their design improves dependability
significantly while retaining a manageable area and power overhead.
intelligent architectural modifications can enhance system reliability without incurring
excessive costs, making it a valuable reference for researchers and engineers working
in fault-tolerant hardware design.

S. Zhang and K. Roy, "A Low-Overhead Error Detection Scheme for Systolic
Array-Based Matrix Multiplication Accelerators," IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 29, no. 10, pp. 2671–2683, Oct. 2021.

The goal of S. Zhang and K. Roy's paper "A Low-Overhead Error Detection
Scheme for Systolic Array-Based Matrix Multiplication Accelerators" is to increase the
dependability of systolic array processors used in matrix multiplication tasks, which
are found in many domains such as scientific computing and machine learning. Systolic
arrays have drawn a lot of interest for speeding up large-scale matrix operations and
are very effective in parallel processing. However, guaranteeing error resilience is one
of the main hurdles with these systems, particularly when they are implemented in
settings where radiation-related malfunctions, hardware flaws, or other problems might
have a substantial effect on performance. To overcome this difficulty, the authors
present a novel error detection technique that integrates smoothly into the systolic
array without introducing significant overhead. The suggested
method makes use of error detection codes, which are intended to identify errors in the
calculations made by the systolic array with the least possible effect on the accelerator's
performance. The system can monitor its own calculations without the need for extra
or redundant hardware components thanks to these error detection algorithms, which
make use of the array's built-in structure. One of the paper's main features is the error
detection scheme's inexpensive implementation. The authors stress the significant

computational or power costs associated with typical fault tolerance techniques like
employing sophisticated error-correction algorithms or adding redundancy.
In the end, the work offers a scalable and effective way to improve systolic array fault
tolerance, resulting in more durable and dependable accelerators for high-performance
computing. Systems that need both substantial processing power and little overhead,
such as AI accelerators and embedded systems, benefit greatly from this strategy.

M. G. K. R. Reddy, V. R. Pudi, and K. L. Hsiao, "Matrix-Based Error Detection
and Correction for Parallel Computations," IEEE Transactions on Parallel and
Distributed Systems, vol. 12, no. 6, pp. 618–631, Jun. 2001.

Within the framework of matrix operations, the authors of
"Matrix-Based Error Detection and Correction for Parallel Computations" investigate
ways to improve the dependability of systems that use parallel computation. Parallel
computing is frequently employed in high-performance workloads that involve big
matrices, such as data processing and simulations. However, these systems are prone
to errors because of data transmission problems, hardware malfunctions, or
computational mistakes, which can seriously corrupt the outcomes of
computations. A matrix-based error detection and correction system that is effective
and flexible enough to work in parallel computing environments is introduced in the
paper to address this. Designing a fault-tolerant technique that can identify and fix
faults in real-time without negatively impacting the efficiency of parallel computations
is the main goal of the study. The authors employ redundancy strategies based on
matrices, incorporating error detection codes into matrix operations. By using this
method, errors can be found at the matrix level instead of depending on more costly or
ineffective system-wide tests. The study provides a thorough analysis of the suggested
error detection and correction scheme, complete with mathematical models for error
detection and performance indicators that measure the method's efficacy. The results
demonstrate the effectiveness of the matrix-based approach, achieving reduced
overhead while preserving excellent accuracy in error detection and correction.
Overall, this study offers a useful strategy for enhancing the dependability of parallel
computational systems. By concentrating on matrix-level error detection and
correction, the authors offer a workable method that is broadly applicable in various
parallel computing contexts.

S. D. Sharma and S. S. Iyengar, "Fault-Tolerant Matrix Computations in Systolic
Arrays," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 9, pp.
963–971, Sept. 1994.

The authors of the study "Fault-Tolerant Matrix Computations in Systolic Arrays,"
S. D. Sharma and S. S. Iyengar, provide a way to improve systolic array
reliability during matrix computations by implementing fault-tolerance techniques.
Often utilised for applications like signal processing and matrix multiplication, systolic
arrays are extremely effective hardware architectures designed for parallel processing,
and their regular structure has made them widely used in high-performance computing.
The main concept of the authors' approach is to integrate redundancy into the systolic
array computation: additional processing elements and communication channels allow
the system to withstand malfunctions
in one or more of the array's modules. The remaining functional elements can still do
the matrix computation accurately in the event that certain PEs fail because to this
redundancy, which eliminates the need to discard the entire operation. The method
works especially well for matrix multiplication, which entails concurrent, repeated
calculations that are easily adaptable to incorporate error detection and repair codes .
With the least amount of resources, these methods assist in identifying inconsistencies
in intermediate findings and fixing them. In order to maintain excellent systolic array
performance even when fault tolerance techniques are engaged, the redundancy
strategy is designed to impose as little overhead as feasible. To show the efficacy
of their suggested fault-tolerant systolic array, the authors present comprehensive
theoretical analysis and simulation results. Their method keeps parallel matrix
computations running at high throughput while drastically lowering the likelihood of
errors. The method is appropriate for real-time applications that need both speed and

dependability since they also demonstrate that the additional fault tolerance has little
effect on the systolic array's overall performance.

S. K. Gupta, M. G. Jafari, and M. R. Hashemi, "A Fault-Tolerant Algorithm for Matrix Multiplication Using Systolic Arrays," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 1, pp. 104–116, Jan. 2013.

To improve the dependability of systolic arrays during matrix multiplication,
S. K. Gupta, M. G. Jafari, and M. R. Hashemi present a novel fault-tolerant algorithm
in their work "A Fault-Tolerant Algorithm for Matrix Multiplication Using Systolic
Arrays." High-speed matrix calculations are frequently performed using systolic arrays,
especially in domains like scientific simulations, machine learning, and image
processing. Nevertheless, the precision and efficiency of calculations may be
jeopardised by these systems' susceptibility to hardware issues, including processing
element (PE) breakdowns. By creating a fault-tolerant algorithm, the authors solve this
problem and guarantee that matrix multiplication can continue as intended even when
errors occur. The systolic array architecture's unique error detection and repair
mechanisms form the basis of their methodology. This approach can be used in high-
performance applications where computational efficiency is crucial because it requires
few additional resources, unlike typical fault-tolerance strategies that depend on
redundant hardware components or intricate recovery processes. The foundation of the
fault-tolerant algorithm is parity checks and redundant data applied to the matrix
elements under processing. This allows for real-time defect detection and correction
during matrix multiplication. Error-detecting codes are implemented in the suggested
approach to track the data flow through the systolic array during matrix multiplication.
The system can rapidly identify the error and use the redundant data to recalculate the
impacted part of the calculation if a processing element malfunction is found. This
guarantees the accuracy of the outcome even in the event of errors. Moreover, the
algorithm is made to manage various fault situations, offering a reliable solution for
systems in the real world where there may be a greater chance of several mistakes
occurring at once. This research is especially useful for applications in areas where high

reliability and computational speed are essential, such as embedded systems, real-time
processing, and scientific computing.

C. H. Kim and M. J. Irwin, "Fault-Tolerant Systolic Array for Matrix Multiplication," IEEE Transactions on Computers, vol. 42, no. 4, pp. 448–453, Apr. 1993.

For parallel computation, systolic arrays are very effective, particularly for
matrix operations like multiplication, which is essential for many computationally
demanding activities like machine learning, scientific simulations, and image
processing. But these systems are prone to errors, whether from hardware malfunctions
or outside disruptions, which can impair their functionality and lead to inaccurate
calculations. This study suggests a fault-tolerant systolic array architecture that can
continue to operate correctly even if one or more of the processing elements (PEs)
experience a problem. Through simulation experiments and theoretical analysis, the authors
demonstrate the effectiveness of their fault-tolerant systolic array design.
Finally, by providing a workable solution to the issue of fault tolerance in
systolic arrays, Kim and Irwin's paper makes a significant addition to the field of
parallel computing. The authors offer a technique that guarantees the system can carry
out matrix multiplication dependably even in the event of hardware malfunctions by
integrating error-detecting and error-correcting capabilities straight into the systolic
array architecture. This work lays the groundwork for future studies on enhancing the
resilience and effectiveness of parallel computing systems while also greatly increasing
the fault tolerance of systolic arrays.
R. Sridhar and V. Srinivasan, "FPGA Implementation of Matrix Multiplication
Using Systolic Arrays," International Journal of VLSI Design & Communication
Systems (VLSICS), vol. 6, no. 2, pp. [insert page numbers], April 2015.

A grid of processing elements (PEs) connected in a regular, repeating pattern makes up a systolic array. These PEs process data synchronously while operating in
parallel. Systolic arrays' main benefit over conventional, sequential methods for matrix
multiplication is their capacity to handle numerous data components at once, which
drastically cuts down on the amount of time needed to compute huge matrix products.
The study investigates the implementation of systolic arrays using FPGA technology,
which offers an adaptable and effective platform for parallel matrix multiplication. The
authors explain how to map the systolic array architecture onto FPGA, which provides
the freedom to modify the hardware to meet the needs of certain applications. High-
speed data processing made possible by FPGA implementation makes it perfect for
matrix tasks requiring a lot of processing power. The method for decomposing matrix
multiplication into smaller, parallel tasks that can be split up among the different
processing devices is described in the study. Every PE contributes to the final matrix
product by carrying out a fraction of the overall computation. This parallel method
optimises matrix multiplication efficiency while cutting down on total calculation time.
Optimising the systolic array architecture for FPGA is one of the paper's primary
contributions. The authors go into a number of design factors that are crucial when
putting the array on FPGA devices, including clock speed, data flow control, and
hardware resource usage. The authors guarantee a quick and resource-efficient FPGA
implementation by adjusting these parameters. The authors offer strategies to reduce
latency and increase throughput by addressing problems that occur when performing
matrix multiplication in parallel, such as data synchronisation and buffering. Further
optimisation in custom hardware design is made possible by the ability to implement
such architectures on FPGA, especially in contexts with limited resources.

M. El-Khamy and J. Lee, "Survey of Fault-Tolerant Techniques for Deep Neural
Networks and Systolic Arrays," ACM Computing Surveys (CSUR), vol. 54, no. 8,
pp. 1–35, 2022.

A key section of the survey categorizes fault-tolerant techniques into algorithm-level, hardware-level, and system-level strategies. At the algorithm level,
techniques such as quantization-aware training, weight pruning, and retraining are
shown to mitigate the effects of faults by enhancing robustness during model
development. The hardware-level techniques include redundant computing units, error
correction codes (ECC) in memory, fault masking, and reconfigurable architectures that
bypass faulty units. These methods are tailored to systolic arrays by preserving
parallelism while detecting and correcting errors with minimal performance overhead.
The authors also analyze cross-layer optimization approaches, where collaboration
between hardware and algorithm design enables more effective fault tolerance. For
instance, some architectures adapt dynamically by modifying precision or routing
workloads away from faulty units, depending on runtime diagnostics. Additionally, the
paper discusses fault injection tools and benchmarking methods, which are essential
for evaluating the robustness of DNN models and systolic array implementations under
various fault conditions. Importantly, the paper doesn’t only review existing techniques
but also highlights open research challenges, such as balancing fault tolerance with
energy efficiency, ensuring real-time performance, and managing the trade-offs
between accuracy and reliability. The authors call attention to the need for adaptive and
scalable solutions, especially as DNN models continue to grow and become more
widespread in safety-critical applications like autonomous vehicles and medical
devices.
In summary, this paper serves as a foundational reference for researchers
and engineers working on the design of reliable machine learning accelerators. By
surveying a wide range of fault-tolerant strategies across different abstraction levels,
El-Khamy and Lee provide a clear picture of the current state of the field and pave the
way for future innovations in robust, energy-efficient computing systems.
2.3 SUMMARY

The literature survey highlights the growing need for fault-tolerant mechanisms
in matrix computations, especially when implemented on systolic array architectures.
As these arrays are widely used for high-speed and parallel processing, any fault in
their components can significantly impact the accuracy of the results. Various research
works have explored strategies to address this issue, including algorithm-based fault
tolerance (ABFT), error-correcting codes, redundant computation, and reconfigurable
architectures. These techniques aim to detect, isolate, and correct errors while
maintaining efficient performance. Several studies have also proposed hardware-level
and hybrid approaches that balance fault coverage with resource usage. Through this
survey, it is evident that while many effective solutions exist, there remains a need for
approaches that offer high reliability with minimal computational overhead. The
collected research provides a strong foundation for developing improved error
detection and correction systems tailored for matrix operations in systolic arrays.

CHAPTER 3
METHODOLOGY AND WORKING PRINCIPLE
3.1 PROBLEM IDENTIFICATION
Matrix multiplication is a core operation in many advanced computing tasks,
including scientific simulations, image processing, and machine learning. In real-time
and safety-critical systems, reliability and speed are crucial. Field-Programmable Gate
Arrays (FPGAs) are widely used in such domains due to their reconfigurability and low
power usage. However, they are highly sensitive to radiation-induced faults, especially
in their configuration memory. These faults are persistent, meaning that once a fault
occurs, it can affect all subsequent operations until explicitly corrected. Traditional fault
tolerance methods like Triple Modular Redundancy (TMR) or Dual Modular
Redundancy (DMR) are too resource-intensive and can significantly hinder
performance. Hence, there is an urgent need for an error detection method that is both
efficient and lightweight, tailored specifically for matrix operations implemented on
systolic arrays within SRAM-based FPGAs.

3.2 PROBLEM STATEMENT

Matrix multiplication is a fundamental operation in numerous computational fields, including machine learning, image processing, and scientific simulations. To
meet the performance demands of real-time and safety-critical applications, Field-
Programmable Gate Arrays (FPGAs) are often used due to their flexibility and energy
efficiency. However, SRAM-based FPGAs are highly susceptible to radiation-induced
faults, particularly in their configuration memory, which can result in persistent errors
that affect all subsequent operations. These persistent faults pose a serious challenge,
especially in systolic array architectures used for matrix multiplication, where even a
single faulty processing element can corrupt entire rows or columns of output. While
traditional fault-tolerance techniques such as Triple Modular Redundancy (TMR) offer
high reliability, they also introduce significant hardware and performance overhead,
making them impractical for resource-constrained systems. Moreover, existing
Algorithm-Based Fault Tolerance (ABFT) methods are not optimized for the fault
characteristics of FPGAs and often involve high computational costs. Therefore, there
is a pressing need for a lightweight and efficient error detection approach specifically
designed for matrix multiplication on systolic arrays, capable of ensuring reliability
without compromising system performance.

3.3 OBJECTIVE

The main objective of this project is to develop a reliable and efficient fault-
tolerant matrix multiplication system using systolic array architecture. The goal is to
integrate a lightweight error detection technique, specifically Light Algorithm-Based
Fault Tolerance (Light ABFT), to identify faults during computation without incurring
high hardware or processing overhead. To enhance error resilience, a hybrid correction
mechanism is employed, combining Hamming Code for correcting single-bit errors
along the rows and Parity Code for detecting multi-bit errors in the columns. The entire
architecture is implemented on an FPGA platform using Verilog HDL and synthesized
through Xilinx Vivado, enabling real-time testing and performance evaluation. This
project also aims to ensure that the system maintains high throughput and accuracy
while minimizing resource usage, making it suitable for critical applications where data
integrity and system reliability are paramount.

3.4 PROPOSED METHOD

The proposed method increases the reliability of systolic array-based matrix multiplication by incorporating Lightweight Algorithm-Based Fault Tolerance
(LABFT) and a hybrid error correction mechanism. In this method, LABFT is utilized
to identify faults in real-time with checksum-based methods and intrinsic mathematical
invariants throughout matrix operations to support efficient detection with low
computational overhead. Once an error is detected, the system applies
hybrid error correction technique that employs Hamming codes on the rows and parity

codes on the columns of the computation matrix. The Hamming code performs single-
bit error correction per row, while the parity codes are used to help detect multi-bit
errors in the columns, providing better fault coverage overall. This double-dimensional
approach allows both detection and correction to be properly addressed without
considerable degradation of performance or increase in hardware complexity. The
design is executed and verified on an FPGA using Verilog HDL, taking advantage of
its reconfigurability, real-time processing features, and ability for fault injection testing.
The suggested approach provides a strong, low-overhead, and effective solution for
fault-tolerant systolic array architectures, which is suitable for mission-critical
applications in areas like artificial intelligence, signal processing, and embedded
systems.

3.5 WORK FLOW OF PROPOSED SYSTEM

1. LIGHT ABFT METHOD:


Light ABFT (Lightweight Algorithm-Based Fault Tolerance) is a cost-effective
fault detection method tailored for matrix-based computations such as those performed
in systolic arrays. As opposed to other ABFT techniques that can be computationally
expensive and need additional storage, Light ABFT implements a reduced checksum
method to accomplish error detection at low overhead and is thus particularly well-
suited for hardware-restricted platforms such as FPGAs. In this approach, the system
determines the column checksum of Matrix A and the row checksum of Matrix B prior
to multiplication. These checksums are utilized to produce an expected output
checksum. Following matrix multiplication, the row and column sums of resulting
Matrix C are determined and compared with the expected ones. If there is a mismatch,
it means a computation error occurred, likely due to a hardware fault. This
allows for rapid and effective runtime error detection without excessive resource
consumption, maintaining system performance while guaranteeing reliability. The steps
involved in light ABFT method are as follows.

1.1. INPUT CHECKSUM GENERATION:
In this step, checksums of the input matrices are calculated to serve as a reference
for verifying the correctness of the output:
A. For Matrix A: Compute the column-wise checksum
CA[j] = ∑_{i=1}^{m} A[i][j]
This results in a 1 × n vector.
B. For Matrix B: Compute the row-wise checksum
RB[i] = ∑_{j=1}^{p} B[i][j]
This results in an n × 1 vector.
These values represent the expected behavior of input data.

1.2. COMPUTE EXPECTED OUTPUT:

Using the input checksums above, calculate a reference checksum that represents the expected behavior of the final result:
Expected Checksum = CA · RB = ∑_{j=1}^{n} CA[j] · RB[j]

This produces a single scalar value which acts as the predicted total sum of the output
matrix if no fault occurs during multiplication.
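The reason this single scalar predicts the total of the output matrix follows from swapping the order of summation in the product:

```latex
\sum_{i=1}^{m}\sum_{j=1}^{p} C[i][j]
  = \sum_{i=1}^{m}\sum_{j=1}^{p}\sum_{k=1}^{n} A[i][k]\, B[k][j]
  = \sum_{k=1}^{n}\left(\sum_{i=1}^{m} A[i][k]\right)\left(\sum_{j=1}^{p} B[k][j]\right)
  = \sum_{k=1}^{n} CA[k]\cdot RB[k]
```

so any single corrupted element of C shifts the left-hand side away from the precomputed right-hand side and is caught by the comparison in step 1.5.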

1.3. PERFORM MATRIX MULTIPLICATION:


The actual matrix multiplication is performed using hardware such as a systolic
array, which efficiently processes matrix elements in a pipelined and parallel fashion:
• C[i][j] = ∑_{k=1}^{n} A[i][k] · B[k][j]
This yields the output matrix C of size m×p.

1.4. OUTPUT CHECKSUM CALCULATION:

After computing Matrix C, generate the actual checksum by summing either:

• All elements of Matrix C


Actual Checksum = ∑_{i=1}^{m} ∑_{j=1}^{p} C[i][j]

Or
• The row-wise or column-wise sum of C, depending on implementation. This
computed checksum is the actual result of the matrix operation.

1.5. ERROR DETECTION:


Now, compare the expected checksum (from step 1.2) with the actual checksum (from step 1.4):
• If they match: The computation is considered fault-free.
• If they don’t match: An error has occurred—likely due to data corruption or
hardware faults during matrix multiplication.
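Steps 1.1–1.5 can be sketched in software as follows; the matrix values and the fault-injection flag are illustrative assumptions, and the plain triple loop stands in for the systolic array:

```python
def light_abft(A, B, inject_fault=False):
    """Checksum-protected matrix multiply following steps 1.1 - 1.5."""
    m, n, p = len(A), len(B), len(B[0])

    # Step 1.1: column checksum of A (1 x n) and row checksum of B (n x 1)
    CA = [sum(A[i][j] for i in range(m)) for j in range(n)]
    RB = [sum(B[i][j] for j in range(p)) for i in range(n)]

    # Step 1.2: expected scalar checksum = CA . RB
    expected = sum(CA[k] * RB[k] for k in range(n))

    # Step 1.3: matrix multiplication (stands in for the systolic array)
    C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
         for i in range(m)]
    if inject_fault:
        C[0][0] += 1  # model a single faulty processing element

    # Step 1.4: actual checksum over all elements of C
    actual = sum(sum(row) for row in C)

    # Step 1.5: compare -- True means the computation is considered fault-free
    return actual == expected

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
ok_clean = light_abft(A, B)                      # checksums agree
ok_faulty = light_abft(A, B, inject_fault=True)  # corrupted output is flagged
```

Note that only two checksum vectors and one scalar are added on top of the normal multiplication, which is what keeps the overhead low.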

2. MATRIX CODES:
Matrix codes are a hybrid error correction technique that applies Hamming codes
along the rows of a matrix and parity codes along the columns. This 2D protection
strategy improves fault tolerance by enabling both detection and correction of errors in
matrix-based data structures, such as those used in systolic arrays or memory storage.
This technique is especially valuable in applications requiring high data integrity with
minimal hardware complexity, such as FPGA-based matrix multiplication systems. The
various steps involved in matrix codes are:

2.1. DATA MATRIX FORMATION


Prepare the M × M matrix that contains the original data to be transmitted, stored, or processed.

2.2. APPLY HAMMING CODES ON ROWS


For each row, generate Hamming code by adding redundant bits that allow:
• Detection of up to 2-bit errors.
• Correction of 1-bit errors.
This transforms each row into a codeword.
Ri = [data bits + Hamming parity bits]
The matrix now becomes wider, with each row extended by its Hamming parity bits.

2.3. APPLY PARITY CODES ON COLUMNS


For each column, calculate the parity bit (even or odd) and add it as an additional row at
the bottom:
➢ pj = d1 ⊕ d2 ⊕ d3 (XOR of the bits in column j)

2.4. TRANSMISSION OR PROCESSING


Send or process the extended matrix. During this stage, an error (e.g., single-bit flip)
might occur due to noise, soft faults, or hardware issues.

2.5. ERROR DETECTION AND CORRECTION


Once received:
➢ Use Hamming code (row-wise) to detect and correct any single-bit error within a
row.
▪ If the Hamming check fails, it tells you which bit in the row is incorrect.
▪ Flip the bit to correct it.

➢ Use parity code (column-wise) to verify integrity across columns.
▪ If a parity check fails, it signals an unrepaired error (e.g., one Hamming couldn't detect due to multiple-bit errors).
▪ Helps in localizing errors further in conjunction with row data.

2.6. MATRIX RECOVERY


After correction and validation, the matrix is stripped of parity and Hamming bits, and
the original data matrix is recovered.
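A small software model of steps 2.1–2.6, assuming Hamming(7,4) codewords per row (so each 4-bit row widens to 7 bits) and an even-parity row appended at the bottom; the data values and the injected fault position are illustrative:

```python
def hamming74_encode(d):
    # Codeword layout (1-based positions): p1 p2 d1 p3 d2 d3 d4
    d1, d2, d3, d4 = d
    p1, p2, p3 = d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    c = c[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3       # 1-based error position; 0 means clean
    if pos:
        c[pos - 1] ^= 1              # flip the faulty bit
    return c

# Step 2.1: the original data matrix (4 rows of 4 bits each)
data = [[1, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]

# Step 2.2: Hamming-encode every row (rows widen from 4 to 7 bits)
coded = [hamming74_encode(r) for r in data]

# Step 2.3: even-parity row, pj = XOR of column j, appended at the bottom
parity = [0] * 7
for row in coded:
    parity = [p ^ b for p, b in zip(parity, row)]
coded.append(parity)

# Step 2.4: a single-bit fault during transmission or processing
coded[1][4] ^= 1

# Step 2.5: row-wise Hamming correction ...
fixed = [hamming74_correct(row) for row in coded[:-1]]
# ... then column-wise parity verification against the stored parity row
check = [0] * 7
for row in fixed:
    check = [c ^ b for c, b in zip(check, row)]
columns_ok = (check == coded[-1])

# Step 2.6: strip the redundancy to recover the original data matrix
recovered = [[c[2], c[4], c[5], c[6]] for c in fixed]
```

Here the row-wise Hamming pass repairs the injected flip, the column parity pass then confirms that no error is left, and stripping the parity and Hamming bits returns the original matrix.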

3. HAMMING CODE:
• Hamming Code is one of the oldest and most widely used error-correcting codes in
digital systems. Invented by Richard Hamming in 1950, it gives a simple but
effective way to detect and correct single-bit errors and detect two-bit errors. This
makes it particularly valuable in systems where data integrity is important, like
computer memory, digital communication, and hardware implementations such as
FPGAs.

• The basic concept of Hamming Code is to add redundancy in the form of parity bits to the original data. These parity bits are placed at bit positions that are powers of two (1, 2, 4, 8, etc.). Each parity bit checks a particular set of bits, so that every bit in the encoded message is covered by a distinct combination of parity
checks. This smart design enables the system not just to identify whether an error
has taken place but also to locate the precise position of the error bit. In order to
ascertain the number of parity bits that will be required for a specified number of
data bits, we apply the formula (2^r >= m + r + 1), where `r` represents the number
of parity bits and `m` represents the number of data bits.

• Once the right number of parity bits has been determined, data and parity bits are
put into the proper locations. Parity bits are then computed on the basis of XOR
operations on some bits chosen in the message. When receiving or processing the
encoded message, the same parity checks are recalculated. If the values do not agree
with the expected ones, the system calculates a binary error syndrome, which
directly indicates the position of the bad bit. That bit is flipped to recover the data.
Suppose we wish to send four data bits (let's say, 1011). Then we require three parity
bits, so there is a 7-bit Hamming code.

• The final transmitted codeword is 0110011, with the parity bits computed and inserted. If an error corrupts one of these bits during transmission, the parity
checks can be used at the receiver end to detect and correct it. This renders Hamming
Code not just space- and time-efficient but also reliable in low-error environments.
On the whole, Hamming Code represents a basic component for more complicated
schemes of error correction and is still extensively employed in both industrial and
academic domains because of its simplicity and efficacy.
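The 1011 example above can be verified directly; this sketch assumes the standard (7,4) layout with the parity bits at positions 1, 2, and 4:

```python
# Hamming(7,4): r = 3 parity bits satisfy 2**r >= m + r + 1 for m = 4 data bits
assert 2 ** 3 >= 4 + 3 + 1

def encode(d1, d2, d3, d4):
    # Parity bits occupy positions 1, 2, 4; each covers a distinct set of bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

codeword = encode(1, 0, 1, 1)  # -> [0, 1, 1, 0, 0, 1, 1], i.e. 0110011

# Receiver side: recompute the parity checks to form the error syndrome
def syndrome(c):
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    return s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if none

received = codeword[:]
received[5] ^= 1             # corrupt position 6 in transit
pos = syndrome(received)     # syndrome reads the error position in binary
received[pos - 1] ^= 1       # flip it back to recover the codeword
```

The syndrome bits spell out the error position precisely because each bit position is covered by a unique combination of the three parity checks.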
4. PARITY CODES:
• Parity codes are some of the simplest and most popular techniques for detecting
errors in digital systems. As opposed to Hamming codes that can detect as well as
correct errors, parity codes are often employed for the purpose of detecting errors
alone. The strategy in parity coding is to append one parity bit to a set of data bits so that the total number of 1s in the group, including the parity bit, follows a fixed rule: even parity or odd parity.

• In even parity, the parity bit is arranged in a way that the total number of 1s in the
whole group (data + parity) is even. In the same way, in odd parity, the parity bit
makes the total number of 1s odd. This extra bit aids the receiver to identify whether
there was a single-bit error while transmitting or storing data. If a single error flips
a bit from 0 to 1 or 1 to 0, the group parity will change too. The receiver will sense
this discrepancy and mark it as an error.

• For instance, if we are employing even parity and we wish to transmit the data
`1011`, which contains three 1s (an odd number), we would append a parity bit of 1
to ensure that the total number of 1s is four (even). The code to be transmitted is

`10111`. On the receiving end, the system re-computes the parity. If it is not even
as anticipated, the system identifies that an error has been made.

• Whereas parity codes are effective at indicating single-bit errors, they have limited
capability—they can neither specify the position of the error nor correct it. Further,
they do not catch all forms of multiple-bit errors (e.g., if two bits are inverted, the
parity may remain intact). Nevertheless, their simplicity and low cost of
computation render them extremely efficient in situations where error correction is
not required or other types of redundancy are present.

• In matrix-based error correction (such as in systolic arrays or blocks of memory),


the parity codes are usually combined with row-oriented correction codes such as
Hamming. In this situation, the parity bits are utilized column-wise to add further
fault localization. When a row detects and corrects an error (via Hamming) and a column independently flags a discrepancy (via parity), this cross-checking substantially improves error detection reliability across the matrix.
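A minimal sketch of even parity using the 1011 example above (the helper names are illustrative); it also demonstrates the stated limitation that a two-bit flip goes unnoticed:

```python
def add_even_parity(bits):
    # Append one bit so the total number of 1s becomes even
    return bits + [sum(bits) % 2]

def parity_ok(word):
    # Even parity holds when the total count of 1s is even
    return sum(word) % 2 == 0

word = add_even_parity([1, 0, 1, 1])  # -> [1, 0, 1, 1, 1], i.e. 10111
clean = parity_ok(word)               # no error: the check passes

word[2] ^= 1                          # a single-bit flip changes the parity
single_flip_detected = not parity_ok(word)

word[3] ^= 1                          # a second flip restores even parity:
double_flip_missed = parity_ok(word)  # the double error goes undetected
```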

5. COMBINATION OF BOTH HAMMING AND PARITY CODE:


• In this dual error control technique, the Hamming codes are used row-wise over a
matrix to correct and detect single-bit errors, while parity bits are inserted column-
wise for yet another level of error detection.
• Hamming logic is used to encode each row so that errors within a row may be
corrected. Following this, parity bits are calculated for every column to detect errors
that can impact multiple rows or bits outside the scope of Hamming.
• Together, this method increases reliability by cross-checking errors—Hamming
identifies and corrects, and parity detects or confirms additional errors—so it is
well-suited to matrix-based data processing systems such as systolic arrays.

3.6 INNOVATIVENESS OF THE SOLUTION

▪ The incorporation of Lightweight Algorithm-Based Fault Tolerance (LABFT) presents an


effective method of detecting and correcting faults in systolic arrays. LABFT is a
novel approach that enhances fault tolerance by detecting and compensating for
errors during the computation process, making the system more reliable even when
there are faults present. This method is most effective for FPGA environments
where faults can occur in the hardware.
▪ The combination of Hamming codes and parity checks is a hybrid scheme that offers an effective solution for error correction and detection. Parity codes enable fast
error detection in large data blocks, while single-bit error correction is enabled by
Hamming codes. By using both methods together, the system can manage error
detection and correction efficiently, enhancing the systolic array's reliability without
affecting performance much.
▪ The error detection and correction mechanism is optimized to take full advantage of
FPGA architecture, renowned for its parallel processing abilities. This allows real-
time error correction without imposing noteworthy delays or throughput
compromise. The mechanism can correct errors in parallel with multiple data
streams in the systolic array without compromising high-speed performance even
when there are errors.
▪ One of the most important innovations is the possibility of incorporating fault
detection and correction mechanisms without incurring substantial computational
overhead. This makes it possible for the systolic array to have high performance for
complex calculations and data processing, which is essential in real-time
applications. The error handling is light-weight, enabling the system to correct faults
without compromising the overall computation process.
▪ The error detection and correction mechanism of the system is adaptive, that is, it
can adapt to different fault scenarios in real-time. If the faults are transient (short-
term), permanent, or caused by noisy environments, the system adapts the error

correction methods appropriately. This adaptability helps the system remain
functional and reliable in varied environments, thus suitable for mission-critical
operations where error robustness is of the highest priority.

3.7 IMPACT OF THE PRODUCT

The product significantly improves system reliability, performance, and scalability.


Through the combination of LABFT and hybrid error correction techniques, it makes
the system fault tolerant, allowing for error-free computation even in the occurrence of
hardware faults, which is essential in mission-critical applications. Optimized for
FPGA, the system corrects errors in real-time with low performance overhead,
providing high throughput and data integrity. Its scalable architecture allows it to be
used both in small embedded systems and large-scale environments, minimizing
maintenance costs and prolonging the system lifespan by avoiding human error
handling and hardware failure.

3.8 UNIQUENESS AND FEATURES

▪ This implementation is unique in that it incorporates Lightweight Algorithm-Based Fault Tolerance (LABFT) in systolic arrays in a manner that provides an architectural-level fault detection and correction mechanism, a feature not typically used in these
configurations.
▪ It uses a hybrid error correction mechanism involving Hamming codes and parity
checks to efficiently manage both single-bit and multi-bit errors without incurring
high resource overhead.
▪ Optimized for FPGA implementation, the system utilizes parallelism for real-time
error detection and correction to provide high-speed operation without
compromising performance.
▪ Moreover, its flexible and modular design enables it to scale with varying
application sizes and dynamically adapt to changing fault conditions, making it
applicable in both embedded and high-performance computing environments.

3.9 SUMMARY
This chapter explains the application of LABFT in systolic arrays to detect and mask
faults through a hybrid Hamming-parity coding approach. It is executed on an FPGA,
using parallelism to handle errors in real-time. The design guarantees low performance
overhead while having high fault tolerance, flexibility, and scalability in a wide range
of computational applications.

CHAPTER 4
SOFTWARE AND HARDWARE DESCRIPTION

4.1 HARDWARE DESCRIPTION

Field-Programmable Gate Arrays (FPGAs) are reprogrammable integrated circuits that support the creation of bespoke digital systems for telecommunications, embedded systems, artificial intelligence, and other applications. FPGA
boards combine FPGA chips with peripherals to offer boards for prototyping,
validation, and deployment of designs specified in terms of Verilog Hardware
Description Language (HDL). Xilinx Vivado Design Suite, one of the world's leading
Electronic Design Automation (EDA) environments, supports Verilog-based
workflows, from simulation to implementation onto FPGA boards. FPGA board
parameters—e.g., FPGA family, memory, number of I/O pins available, connectivity
possibilities, logic elements (LEs), Block RAM, DSP slices, and components such as
Ethernet, USB, and video interfaces—determine their capabilities and appropriateness
for particular design tasks. This section discusses these specifications, their use in
facilitating Verilog and Vivado flows, and their incorporation into major design
activities: design entry, simulation, synthesis, implementation, verification IP, IP
Integrator, design optimization, and mixed-language support. Examples with Xilinx boards (ZedBoard, Arty A7, Nexys A7, Pynq-Z2) illustrate their real-world use in hardware description and design.

FPGA Board Specifications

An FPGA board's specifications determine the hardware resources available to Verilog-based designs, which in turn dictate design complexity, performance, and connectivity. Below, we
explain each specification, its importance, and its use in Vivado workflows, citing
common Xilinx boards [3,6,11,14].

Type of FPGA

The FPGA chip type determines a board's processing capability, resources, and
application suitability. Xilinx boards employ FPGA families such as Zynq-7000
(an SoC combining FPGA fabric with ARM CPUs) or Artix-7 (cost-optimized). The
FPGA type governs the scale of Verilog designs and the Vivado
synthesis/implementation strategies:

a. ZedBoard and Pynq-Z2: Built around the Zynq-7000 SoC (XC7Z020), which
contains 85K logic cells and a dual-core ARM Cortex-A9 processor (650 MHz).
The SoC accommodates both Verilog designs (e.g., a soft-core processor) and
software applications (e.g., Linux-based control), making it ideal for
embedded systems and AI.
b. Arty A7 and Nexys A7: Use Artix-7 FPGAs (XC7A35T for the Arty A7-35T, 33K
logic cells; XC7A100T for the Nexys A7, 101K logic cells). These are
appropriate for signal processing, IoT, and educational projects, supporting
Verilog designs such as digital filters or state machines.

4.2 SOFTWARE DESCRIPTION

Verilog HDL

Verilog is an industry-standard Hardware Description Language (HDL) for
describing and modeling digital hardware, from basic components such as
flip-flops to intricate architectures such as microprocessors, memories, and
system-on-chip (SoC) designs. Its C-like syntax is easy for software-trained
engineers to pick up, and its support for both concurrent and sequential
constructs allows it to model hardware timing and behavior accurately.

The versatility of Verilog lies in its ability to model hardware at various
levels of abstraction, letting designers work with high-level functional
models or low-level gate-level implementations. It supports multiple design
stages, from prototyping early ideas to final synthesis. For instance, a
designer can model a UART (Universal Asynchronous Receiver-Transmitter)
controller in Verilog, defining its data transmission logic at a high level
before synthesizing it to FPGA hardware. Verilog is used extensively in
industry for ASIC, FPGA, and SoC design, and its support in EDA tools such as
Xilinx Vivado enables seamless integration into existing design flows. Its
standardization ensures interoperability, allowing designers to simulate,
synthesize, and verify complex systems efficiently.

Verilog Abstraction Levels

Verilog provides three main levels of abstraction, each suited for particular design
requirements:

a. Behavioral Level:

This topmost level of abstraction describes the functionality of a system in
terms of concurrent algorithms. Constructs such as functions, tasks, and
always blocks are used to describe behavior without regard to physical
details. A behavioral description of an FSM, for example, may specify the
state transitions (e.g., "if the input is high, transition to the next
state") without mentioning registers or gates. This level is particularly
well suited to early verification and rapid prototyping because it focuses
on logic rather than hardware details.
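The behavioral idea above can be sketched in software. The two-state machine, state names, and input encoding below are hypothetical, and Python stands in for Verilog purely to illustrate the abstraction: only states and transitions are described, with no registers or gates.

```python
# Hypothetical behavioral-level FSM: a pure transition table, no hardware detail.
TRANSITIONS = {
    ("IDLE", 1): "ACTIVE",   # "if the input is high, transition to next state"
    ("IDLE", 0): "IDLE",
    ("ACTIVE", 1): "ACTIVE",
    ("ACTIVE", 0): "IDLE",
}

def step(state, inp):
    """Return the next state for the current state and 1-bit input."""
    return TRANSITIONS[(state, inp)]

state = "IDLE"
for bit in [0, 1, 1, 0]:    # drive a short input sequence
    state = step(state, bit)
print(state)                # back to IDLE after the final low input
```

The same machine in Verilog would use an always block with a case statement over the state register; the point is that the behavioral view captures only the transition logic.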

b. Register-Transfer Level (RTL):

RTL describes circuit behavior as data transfers among registers, clocked by
an explicit clock signal. It strikes a balance between abstraction and
implementation, and is the key to synthesizable designs for both FPGAs and
ASICs. For instance, an RTL model of a counter defines how a register is
updated on a clock edge, allowing synthesis tools to map it onto FPGA
resources such as look-up tables (LUTs) and flip-flops. RTL's precision
ensures that designs are both functional and implementable, making it the
foundation of most FPGA design flows.
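The counter example can be modelled in software to show what RTL expresses. This is a behavioural stand-in in Python, not Verilog: combinational next-state logic is kept separate from the clocked register update, which is the split that synthesis maps onto LUTs and flip-flops. The 4-bit width is an illustrative assumption.

```python
WIDTH = 4                          # a hypothetical 4-bit counter, wrapping at 16

def next_value(q, enable):
    # combinational logic: increment modulo 2**WIDTH when enabled
    return (q + 1) % (1 << WIDTH) if enable else q

q = 0                              # the register's current contents
for cycle in range(20):            # each iteration models one rising clock edge
    q = next_value(q, enable=True) # the "clocked" register update
print(q)                           # 20 edges mod 16 leave the counter at 4
```

In RTL Verilog the same separation appears as a combinational expression (or always block) feeding a nonblocking assignment inside an always @(posedge clk) block.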

c. Gate Level:

This lowest level of abstraction describes systems as logic gates (e.g., AND,
OR, NOT) and hard-wired primitives, whose signal values are 0, 1, X
(unknown), and Z (high-impedance). Gate-level descriptions, typically
synthesized from RTL code by synthesis tools, are intended for simulation,
timing analysis, and backend activities such as place-and-route. For
instance, a gate-level netlist can represent a circuit as a network of
cascaded NAND gates, closely reflecting the physical realization.

These abstractions enable designers to move from conceptual models to
hardware-ready designs with maximum efficiency and a minimum of errors
during the development process.

Xilinx Vivado Design Suite

The Xilinx Vivado Design Suite is a full-featured Electronic Design Automation (EDA)
tool for designing, synthesizing, implementing, and analyzing FPGA-based systems.
Designed for Xilinx FPGAs like the Zynq UltraScale+, Artix-7, and Kintex UltraScale
series, Vivado accommodates a broad spectrum of design entry styles, including
Verilog, VHDL, schematic capture, and high-level synthesis (HLS) for the translation
of C/C++ code to HDL. Vivado handles the whole FPGA design flow, from Verilog
code writing to the generation of a bitstream file that programs the FPGA's
programmable logic, memory, and interconnects. Its sophisticated algorithms optimize
designs for performance (e.g., clock speed), power, and resource usage, which makes
it particularly suitable for use in applications such as embedded systems, 5G
communications, and machine learning accelerators.

Vivado combines a collection of tools to simplify intricate workflows,
cutting design time and increasing reliability. It supports intellectual
property (IP) cores, permitting designers to integrate pre-verified
components (e.g., AXI interconnects, DSP blocks) into their systems. In
addition, Vivado supports third-party simulators such as ModelSim, and its
built-in debugging tools provide comprehensive verification and
troubleshooting. By presenting a single environment, Vivado allows engineers
to take projects from small-scale prototypes to large-scale,
high-performance systems, sustaining Xilinx's leadership in FPGA design.

Vivado IDE Interface

The Vivado Integrated Design Environment (IDE) provides a user-friendly,
customizable interface to manage the FPGA design process, organized into
four primary panels:

Project Manager:

Shows all project resources, such as Verilog source files, timing constraints
(e.g., clock period), and IP cores. It provides a hierarchical view for
convenient navigation, allowing users to add, modify, or arrange components.
For instance, a designer can import a Verilog module for a PWM (Pulse Width
Modulation) controller or add a Xilinx IP core for an Ethernet MAC.

Flow Navigator:

A vertical panel that organizes the design flow into linear tasks, including
synthesis, implementation, and bitstream generation. Every task is
accessible with one click, and Vivado indicates the current stage, making
the flow easy to follow for novices and experts alike. For example, clicking
"Run Synthesis" initiates the translation of Verilog code into a netlist.

Workspace:

A multi-document window for displaying and editing design files, simulation
waveforms, and analysis reports. It hosts environments such as the HDL
Editor for writing Verilog code, the Timing Analyzer for analyzing signal
delays, and the Power Estimator for measuring energy consumption. Multiple
files can be open at once, e.g., a Verilog testbench and its waveform
output, improving productivity.

Tcl Console:

Supports automation through Tcl scripting, so users can run commands to
automate repetitive activities (e.g., batch generation of multiple designs)
or customize the workflow. The console logs all operations with full
transparency and supports debugging. The IDE's flexibility permits users to
resize, dock, or undock panels to suit their workflow styles. The Layout
menu provides predefined layouts and a reset-to-default option, giving new
users easy access while accommodating detailed customization by expert
designers.

Key Aspects of Verilog Integration in Vivado:

Design Entry:

Emphasized the HDL Editor features of Vivado (e.g., syntax highlighting) and
project management, with a 32-bit counter demonstration to illustrate
practical application.

Simulation:

Detailed mixed-mode simulation and testbench generation, with a UART example
illustrating waveform analysis and integration with third-party tools.

Synthesis:

Described the process of netlist creation, with an adder example to
demonstrate mapping to FPGA primitives and report generation.

Implementation:

Defined placement and routing, with an FSM example to illustrate bitstream
generation and timing optimization.

Verification IP:

Explained VIP's role in verifying standard interfaces, with an AXI-Stream
example illustrating the reduced verification effort.

IP Integrator:

Demonstrated graphical integration with a DSP filter example, highlighting
automation and system-level design.

Design Optimization:

Discussed timing and resource optimization, with matrix multiplier and CRC
examples to illustrate performance trade-offs.

Mixed-Language Support:

Explained Verilog-VHDL interoperability, with a processor-memory example to
emphasize collaboration advantages.

4.3 SUMMARY

Verilog HDL and the Xilinx Vivado Design Suite form a sound ecosystem for
FPGA design, catering to the demands of contemporary electronics
engineering. Verilog's multiple levels of abstraction support efficient,
scalable description of digital systems, from high-level behavior to
synthesizable RTL and gate-level netlists. Its standardization and C-like
syntax give it ease of use and portability, suiting it to applications in
telecommunications, automotive, and artificial-intelligence hardware.
Vivado complements Verilog with a complete, intuitive platform for
synthesis, implementation, simulation, timing analysis, and debugging
tailored to Xilinx FPGAs. Together, they enable engineers to design, verify,
and implement sophisticated FPGA systems with accuracy and efficiency,
satisfying high-performance and reliability demands across industries.

CHAPTER 5

RESULTS AND DISCUSSIONS

Throughout project development and testing, the work was conducted in a
laboratory setting with industry-standard FPGA development tools and
simulation software. The main hardware platform was a Xilinx FPGA board,
which offered a flexible and efficient basis for implementing and testing
the error detection and correction architecture. The FPGA toolchain, the
Vivado Design Suite, was used for synthesis, simulation, and deployment of
the VHDL/Verilog code. The simulator played an important role in verifying
the logic design prior to hardware deployment. Testbenches were implemented
to simulate different fault conditions, both single-bit and multiple-bit
faults, in order to observe the system's behavior and test its
fault-handling ability. These fault-injection simulations enabled a
deliberate test of the LABFT and hybrid Hamming-parity mechanisms under
various stress levels.
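The checksum relationship that these testbenches exercise can be modelled in software. Under the standard ABFT identity, the sum of all elements of P = A·B equals the dot product of A's column sums with B's row sums, so a single scalar comparison flags a corrupted product element. The sketch below is a behavioural model in plain Python, not the RTL itself; it uses the 4×4 test matrices from Section 5.1 and the p22 fault (80 replaced by 70) injected in the testbench.

```python
def matmul(A, B):
    # plain 4x4 matrix product, no external libraries
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def abft_error_detected(A, B, P):
    # scalar ABFT check: 1^T (A B) 1 == (column sums of A) . (row sums of B)
    col_sums_A = [sum(A[i][j] for i in range(len(A))) for j in range(len(A))]
    row_sums_B = [sum(row) for row in B]
    checksum = sum(c * r for c, r in zip(col_sums_A, row_sums_B))
    return checksum != sum(sum(row) for row in P)

A = [[5, 10, 15, 10], [2, 4, 6, 8], [4, 8, 12, 16], [3, 6, 9, 12]]
B = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
P = matmul(A, B)
assert not abft_error_detected(A, B, P)   # fault-free product passes the check
P[1][1] = 70                              # inject the p22 fault (80 -> 70)
assert abft_error_detected(A, B, P)       # the corrupted element is caught
```

In the hardware, the appendix's acc_8, acc_16, and mac_8 blocks accumulate these sums in a streaming fashion; the comparison of mac_op with acc_op plays the role of the inequality above.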

With the transition to hardware testing, the FPGA board was configured and
observed in real time, enabling direct observation of system performance and
of the error-correction response. The environment facilitated debugging with
on-chip tools such as the Integrated Logic Analyzer (ILA), which gave deeper
insight into internal signal behavior under fault conditions. Overall, the
development and test environments provided a solid platform on which to
iteratively design, verify, and optimize the system for fault-tolerant,
high-speed applications.

Fig.1 The Input Matrices A & B are Multiplied and Accumulated using the
Light ABFT Method

Fig.2 An Error is Injected at Location p22 and the Output Matrix P is
Obtained using the Light ABFT Method

Fig.3 The Input Matrices and the Error-Injected Output Matrix are Compared
using the Light ABFT Method and the Error is Detected

Fig.4 The Output Matrix P is Corrected using the Matrix Code, which Applies
Hamming Code to Rows and Parity Code to Columns, Yielding Matrix Q

5.1 ANALYSIS

TABLE 1: INPUT MATRICES

INPUT MATRIX A          INPUT MATRIX B

a11 5 b11 1
a12 10 b12 2
a13 15 b13 3
a14 10 b14 4
a21 2 b21 2
a22 4 b22 3
a23 6 b23 4
a24 8 b24 5
a31 4 b31 3
a32 8 b32 4
a33 12 b33 5
a34 16 b34 6
a41 3 b41 4
a42 6 b42 5
a43 9 b43 6
a44 12 b44 7

TABLE 2: ERROR-DETECTED AND ERROR-CORRECTED OUTPUT MATRICES

ERROR-DETECTED OUTPUT MATRIX    ERROR-CORRECTED OUTPUT MATRIX
p11 110 q11 110
p12 150 q12 150
p13 190 q13 190
p14 230 q14 230
p21 60 q21 60
p22_f 70 q22 80
p23 100 q23 100
p24 120 q24 120
p31 120 q31 120
p32 160 q32 160
p33 200 q33 200
p34 240 q34 240
p41 90 q41 90
p42 120 q42 120
p43 150 q43 150
p44 180 q44 180
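The correction shown in Table 2 can be reproduced with a small software model of the row-repair logic in the appendix's matrix_code_row module: three XOR checks per row (a Hamming-style code), recomputed from the possibly faulty row and compared against stored golden checks, yield a syndrome whose pattern pinpoints a single faulty element, which is then rebuilt by XOR. As in the appendix, this sketch assumes checksums of the fault-free row are available; a fielded design would have to generate them independently of the faulty datapath.

```python
ROW_CHECKS = [(0, 1, 2), (0, 1, 3), (0, 2, 3)]  # element indices in each check

def checks(row):
    # the three XOR check words for one 4-element row
    return [row[a] ^ row[b] ^ row[c] for a, b, c in ROW_CHECKS]

def repair_row(row, golden_checks):
    syndrome = [g != c for g, c in zip(golden_checks, checks(row))]
    if not any(syndrome):
        return list(row)                      # row is clean
    # the faulty element is the one appearing in exactly the failing checks
    for idx in range(4):
        membership = [idx in chk for chk in ROW_CHECKS]
        if membership == syndrome:
            which = syndrome.index(True)      # any failing check will do
            chk, g = ROW_CHECKS[which], golden_checks[which]
            others = [row[i] for i in chk if i != idx]
            fixed = list(row)
            fixed[idx] = g ^ others[0] ^ others[1]   # XOR out the good elements
            return fixed
    return list(row)                          # pattern not correctable

good_row = [60, 80, 100, 120]                 # row 2 of P from Table 2
z = checks(good_row)                          # golden checks of the clean row
faulty = [60, 70, 100, 120]                   # p22 corrupted: 80 -> 70
assert repair_row(faulty, z) == good_row      # q22 restored to 80
```

The column parity words pa1..pa4 in the appendix serve the complementary role of flagging which column disagrees, so a single faulty element is localized in both dimensions.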
CHAPTER 6

CONCLUSION AND SCOPE FOR FUTURE WORK

6.1 CONCLUSION
The proposed structure effectively combines efficient correction procedures
with lightweight real-time error detection in a fault-tolerant systolic-array
architecture for matrix multiplication. The use of parity and Hamming codes
offers strong error correction across both dimensions of the computed
matrix, while the application of LABFT for fault detection guarantees low
performance overhead. The system's area, power, and real-time error-recovery
efficiency are confirmed by the FPGA-based implementation. Overall, the
study demonstrates a practical and reliable way to improve the dependability
of matrix operations in systolic arrays, especially for critical
applications in signal processing and embedded systems.

6.2 SCOPE FOR FUTURE WORK

Multi-bit error correction codes or adaptive fault-tolerance techniques can
be added to this architecture in future versions to handle higher error
rates and more complicated fault models. Furthermore, the integration of
machine-learning algorithms may aid in predicting fault patterns and
dynamically adjusting corrective procedures. To further enhance performance
and enable use in high-speed computing, AI accelerators, and edge devices
with strict reliability requirements, the design can be scaled to larger
matrix sizes and deployed on advanced FPGA platforms or ASICs.

REFERENCES

1. H.-C. Lu, L.-Y. Su, and S.-H. Huang, "Highly Fault-Tolerant
Systolic-Array-Based Matrix Multiplication," Electronics, vol. 13, no. 9,
2024.

2. Y. Wang, Y. Chen, and H. Zhou, "Design and Implementation of a
Fault-Tolerant Systolic Array Processor on FPGA," in Proc. IEEE Int. Conf.
Field-Programmable Technology (FPT), 2018, pp.

3. S. Wu, Y. Zhai, J. Huang, Z. Jian, and Z. Chen, "FT-GEMM: A Fault
Tolerant High Performance GEMM Implementation on x86 CPUs," in Proc. 28th
ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP
'23), Montreal, QC, Canada, Feb. 2023.

4. S. Zhang and K. Roy, "A Low-Overhead Error Detection Scheme for Systolic
Array-Based Matrix Multiplication Accelerators," IEEE Trans. Very Large Scale
Integration (VLSI) Syst., vol. 29, no. 10, pp. 2671-2683, Oct. 2021.

5. H. Waris, C. Wang, W. Liu, and F. Lombardi, "AxSA: On the Design of
High-Performance and Power-Efficient Approximate Systolic Arrays for Matrix
Multiplication," Journal of Signal Processing Systems, vol. 93, pp. 605–615,
Jun. 2021.

6. J. de Fine Licht, G. Kwasniewski, and T. Hoefler, "Flexible Communication
Avoiding Matrix Multiplication on FPGA with High-Level Synthesis," in
Proceedings of the 2020 ACM/SIGDA International Symposium on Field-
Programmable Gate Arrays (FPGA'20), Seaside, CA, USA, Feb. 2020.

7. M. G. K. R. Reddy, V. R. Pudi, and K. L. Hsiao, "Matrix-Based Error
Detection and Correction for Parallel Computations," IEEE Transactions on
Parallel and Distributed Systems, vol. 12, no. 6, pp. 618-631, Jun. 2001.

Appendix

// Testbench: drives the Light ABFT datapath through four accumulation steps
// and injects a single fault (p2 = 70 instead of the correct 80) in step 2.
module tb_light_abft();
reg clk,rst,rst1;
reg [1:0] s1,s2,s3;
reg [7:0] c1,c2,c3,c4,r1,r2,r3,r4;
reg [15:0] p1,p2,p3,p4;
wire error;
wire [7:0]q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44;
initial
begin
clk=1'b1;
forever #50 clk=~clk;
end
always
begin
#100
rst=1'b1;rst1=1'b1;s1=2'b00;s2=2'b00;s3=2'b00;c1=5;c2=10;c3=15;c4=10;r1=1;r2=2;
r3=3;r4=4;p1=110;p2=150;p3=190;p4=230;
#100 rst=1'b0;
#100
c1=2;c2=4;c3=6;c4=8;r1=2;r2=3;r3=4;r4=5;p1=60;p2=70;p3=100;p4=120; // fault injected at p2: correct value 80, supplied 70
#100
c1=4;c2=8;c3=12;c4=16;r1=3;r2=4;r3=5;r4=6;p1=120;p2=160;p3=200;p4=240;
#100 c1=3;c2=6;c3=9;c4=12;r1=4;r2=5;r3=6;r4=7;p1=90;p2=120;p3=150;p4=180;
#100
s1=2'b01;s2=2'b01;s3=2'b01;c1=0;c2=0;c3=0;c4=0;r1=0;r2=0;r3=0;r4=0;p1=0;p2=0;
p3=0;p4=0;rst1=1'b0;
#100 s1=2'b10;s2=2'b10;s3=2'b10;
#100 s1=2'b11;s2=2'b11;s3=2'b11;

#200 rst=1'b1;rst1=1'b1;
end
light_abft
u0(clk,rst,rst1,s1,s2,s3,c1,c2,c3,c4,r1,r2,r3,r4,p1,p2,p3,p4,error,q11,q12,q13,q14,q21,
q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44);
endmodule

module light_abft(clk,rst,rst1,s1,s2,s3,c1,c2,c3,c4,r1,r2,r3,r4,p1,p2,p3,p4,error,
q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44);
input clk,rst,rst1;
input [1:0] s1,s2,s3;
input [7:0] c1,c2,c3,c4,r1,r2,r3,r4;
input [15:0] p1,p2,p3,p4;
output reg error;
output [7:0] q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44;
wire [7:0] a1,a2,a3,a4,b1,b2,b3,b4,m1,m2;
wire [15:0] q1,q2,q3,q4,m3;
wire [15:0] mac_op,acc_op;
//wire [15:0] acc_op_f;
wire [7:0]
p11,p12,p13,p14,p21,p22,p22_f,p23,p24,p31,p32,p33,p34,p41,p42,p43,p44;
assign p11=110;assign p12=150;assign p13=190;assign p14=230;
assign p21=60;assign p22=80;assign p23=100;assign p24=120;assign p22_f=70;
assign p31=120;assign p32=160;assign p33=200;assign p34=240;
assign p41=90;assign p42=120;assign p43=150;assign p44=180;
acc_8 u0(clk,rst,c1,a1);
acc_8 u1(clk,rst,c2,a2);
acc_8 u2(clk,rst,c3,a3);
acc_8 u3(clk,rst,c4,a4);
mux_41_8 u4(a1,a2,a3,a4,s1,m1);
acc_8 u5(clk,rst,r1,b1);
acc_8 u6(clk,rst,r2,b2);
acc_8 u7(clk,rst,r3,b3);
acc_8 u8(clk,rst,r4,b4);
mux_41_8 u9(b1,b2,b3,b4,s2,m2);
mac_8 u10(clk,rst1,m1,m2,mac_op);
acc_16 u11(clk,rst,p1,q1);
acc_16 u12(clk,rst,p2,q2);
acc_16 u13(clk,rst,p3,q3);
acc_16 u14(clk,rst,p4,q4);
mux_41_16 u15(q1,q2,q3,q4,s3,m3);
acc_16 u16(clk,rst1,m3,acc_op);
//assign acc_op_f={acc_op[15:1],~acc_op[0]};
matrix_code
u17(error,p11,p12,p13,p14,p21,p22,p22_f,p23,p24,p31,p32,p33,p34,p41,p42,p43,p44,
q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44);
// Flag an error when, with both resets reasserted after accumulation, the
// accumulated MAC result disagrees with the accumulated output checksum.
always@(posedge clk)
begin
if(rst==1'b1 && rst1==1'b1 && mac_op!=acc_op)
error<=1'b1;
else
error<=1'b0;
end
endmodule

// 8-bit accumulator: clears on rst, otherwise adds ip each clock cycle.
module acc_8(clk,rst,ip,op);
input clk,rst;
input [7:0]ip;
output [7:0]op;
reg [7:0]acc;
always@(posedge clk)
begin
if(rst)
acc<=8'b0;
else
acc<=acc+ip;
end
assign op=acc;
endmodule

// 16-bit accumulator used on the output-checksum path.
module acc_16(clk,rst,ip,op);
input clk,rst;
input [15:0]ip;
output [15:0]op;
reg [15:0]acc;
always@(posedge clk)
begin
if(rst)
acc<=16'b0;
else
acc<=acc+ip;
end
assign op=acc;
endmodule

// 4:1 multiplexer, 8-bit data.
module mux_41_8(a,b,c,d,s,q);
input [7:0] a,b,c,d;
input [1:0] s;
output reg [7:0] q;
always@(a,b,c,d,s)
begin
case(s)
2'b00:q=a;
2'b01:q=b;
2'b10:q=c;
2'b11:q=d;
endcase
end
endmodule

module mux_41_16(a,b,c,d,s,q);
input [15:0] a,b,c,d;
input [1:0] s;
output reg [15:0] q;
always@(a,b,c,d,s)
begin
case(s)
2'b00:q=a;
2'b01:q=b;
2'b10:q=c;
2'b11:q=d;
endcase
end
endmodule

// Multiply-accumulate: acc <= acc + a*b each clock; cleared by rst.
module mac_8(clk,rst,a,b,mac_op);
input clk,rst;
input [7:0]a,b;
output [15:0]mac_op;
wire [15:0]mul_op;
reg [15:0]acc;
assign mul_op=a*b;
always@(posedge clk)
begin
if(rst)
acc<=16'b0;
else
acc<=acc+mul_op;
end
assign mac_op=acc;
endmodule

// Matrix code: column parity words (pa/pf) flag the faulty column, while
// per-row XOR checks (z*) locate and repair the faulty element.
module matrix_code(err,p11,p12,p13,p14,p21,p22,p22_f,p23,p24,p31,p32,p33,p34,p41,p42,p43,p44,
q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44);
input err;
input [7:0] p11,p12,p13,p14,p21,p22,p22_f,p23,p24,p31,p32,p33,p34,p41,p42,p43,p44;
output [7:0] q11,q12,q13,q14,q21,q22,q23,q24,q31,q32,q33,q34,q41,q42,q43,q44;

wire [7:0]pa1,pa2,pa3,pa4,pf1,pf2,pf3,pf4;
wire p1,p2,p3,p4;
wire [7:0]z11,z12,z13;
wire [7:0]z21,z22,z23;
wire [7:0]z31,z32,z33;
wire [7:0]z41,z42,z43;

assign pa1=p11 ^ p21 ^ p31 ^ p41;


assign pa2=p12 ^ p22 ^ p32 ^ p42;
assign pa3=p13 ^ p23 ^ p33 ^ p43;
assign pa4=p14 ^ p24 ^ p34 ^ p44;

assign pf1=p11 ^ p21 ^ p31 ^ p41;


assign pf2=p12 ^ p22_f ^ p32 ^ p42;
assign pf3=p13 ^ p23 ^ p33 ^ p43;
assign pf4=p14 ^ p24 ^ p34 ^ p44;

assign p1=|(pa1^pf1);
assign p2=|(pa2^pf2);
assign p3=|(pa3^pf3);
assign p4=|(pa4^pf4);

assign z11 = p11 ^ p12 ^ p13;


assign z12 = p11 ^ p12 ^ p14;
assign z13 = p11 ^ p13 ^ p14;
assign z21 = p21 ^ p22 ^ p23;
assign z22 = p21 ^ p22 ^ p24;
assign z23 = p21 ^ p23 ^ p24;
assign z31 = p31 ^ p32 ^ p33;
assign z32 = p31 ^ p32 ^ p34;
assign z33 = p31 ^ p33 ^ p34;
assign z41 = p41 ^ p42 ^ p43;
assign z42 = p41 ^ p42 ^ p44;
assign z43 = p41 ^ p43 ^ p44;

matrix_code_row u0(err,p11,p12,p13,p14,z11,z12,z13,q11,q12,q13,q14);
matrix_code_row u1(err,p21,p22_f,p23,p24,z21,z22,z23,q21,q22,q23,q24);
matrix_code_row u2(err,p31,p32,p33,p34,z31,z32,z33,q31,q32,q33,q34);
matrix_code_row u3(err,p41,p42,p43,p44,z41,z42,z43,q41,q42,q43,q44);

endmodule

// Row corrector: recomputes the three XOR checks, compares them with the
// stored checks z11..z13, and repairs the element implicated by the syndrome.
module matrix_code_row(err,p11,p12,p13,p14,z11,z12,z13,q11,q12,q13,q14);

input err;
input [7:0] p11,p12,p13,p14,z11,z12,z13;
output reg [7:0] q11,q12,q13,q14;
wire [7:0]c11,c12,c13;
wire [7:0]zz11,zz12,zz13;

assign c11 = p11 ^ p12 ^ p13;


assign c12 = p11 ^ p12 ^ p14;
assign c13 = p11 ^ p13 ^ p14;

assign zz11 = c11 ^ z11;


assign zz12 = c12 ^ z12;
assign zz13 = c13 ^ z13;

always@(err,p11,p12,p13,p14,zz11,zz12,zz13)
begin
if(err==1'b1)
begin
if(zz11==8'b0 && zz12==8'b0 && zz13==8'b0)
begin
q11 <= p11;
q12 <= p12;
q13 <= p13;
q14 <= p14;
end
else if(zz11!=8'b0 && zz12!=8'b0 && zz13!=8'b0)
begin
q11 <= z11 ^ p12 ^ p13;
q12 <= p12;
q13 <= p13;
q14 <= p14;