XLA Final Report
Students’ names:
Huỳnh Chí Nguyên – 22151032
Lê Khổng Bảo Minh – 22151255
CHAPTER 1. OVERVIEW
1.2 Objectives
The project aims to develop a program that recognizes simple objects in input images
using the AlexNet neural network, together with a feature that checks and, where needed,
equalizes the histograms of all three RGB channels to ensure image quality. The program
then displays the original image and the equalized image (if equalization was needed) in
a single frame and announces the name of the recognized object to make the program more
interactive.
CHAPTER 2. METHODOLOGY AND CALCULATIONS
d) Layer 3: Convolutional Layer (Conv3)
Filters: 384
Kernel Size: 3x3
Padding: 1 (same padding)
Activation: ReLU
Output: Feature maps of size 13x13x384.
e) Layer 4: Convolutional Layer (Conv4)
Filters: 384
Kernel Size: 3x3
Padding: 1
Activation: ReLU
Output: Feature maps of size 13x13x384.
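The 13x13 output size listed above follows from the standard convolution size formula,
output = floor((W - K + 2P) / S) + 1. The sketch below is a minimal Python illustration
(the project itself runs in MATLAB). The 13x13 input size and the stride of 1 are
assumptions that are not stated in the layer listing, and conv_output_size is a
hypothetical helper name.

```python
def conv_output_size(input_size: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Spatial output size of a convolution: floor((W - K + 2P) / S) + 1."""
    return (input_size - kernel + 2 * padding) // stride + 1


# Listed values: 3x3 kernel, padding 1, 384 filters.
# Assumed values: 13x13 input feature map, stride 1.
side = conv_output_size(13, kernel=3, padding=1, stride=1)
conv3_shape = (side, side, 384)  # one output channel per filter
print(conv3_shape)  # (13, 13, 384)
```

With a 3x3 kernel, padding 1 and stride 1, the spatial size is preserved, which is why
the padding is described as "same padding".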
2.1.3 AlexNet Architecture
Overlapping Max Pooling:
A Max Pooling layer is typically used to reduce the width and height of a tensor while
keeping its depth unchanged. An Overlapping Max Pooling layer works the same way, except
that each pooling window partly overlaps the window of the next step. We use a pooling
window of size 3x3 with a stride of 2, so adjacent windows overlap by 1 pixel.
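The overlap can be seen in a minimal single-channel sketch. It is written in plain
Python for illustration rather than in the project's MATLAB environment, and max_pool2d
is a hypothetical helper name, not part of any toolbox.

```python
def max_pool2d(grid, window=3, stride=2):
    """Max pooling over one channel given as a 2-D list of numbers, no padding."""
    rows, cols = len(grid), len(grid[0])
    out_rows = (rows - window) // stride + 1
    out_cols = (cols - window) // stride + 1
    pooled = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            top, left = r * stride, c * stride
            # Maximum over the window whose top-left corner is (top, left).
            row.append(max(grid[i][j]
                           for i in range(top, top + window)
                           for j in range(left, left + window)))
        pooled.append(row)
    return pooled


# 5x5 toy input: windows start at columns 0 and 2, so column 2 belongs to both.
grid = [[r * 5 + c for c in range(5)] for r in range(5)]
print(max_pool2d(grid))  # [[12, 14], [22, 24]]
```

Because the stride (2) is smaller than the window (3), neighbouring windows share one
row or column. The depth is unchanged because each channel is pooled independently.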
a) Image Classification:
AlexNet was originally designed for image classification tasks, particularly on large-
scale datasets like ImageNet.
It can categorize images into thousands of classes, making it ideal for object
recognition and categorization.
b) Object Detection:
AlexNet serves as a backbone for many object detection models like R-CNN.
It helps detect and localize multiple objects within an image.
Feature Extraction: The convolutional layers of AlexNet are used to extract high-quality
features from images. These features can then be reused for downstream tasks through
transfer learning.
c) Medical Imaging:
d) Autonomous Vehicles:
The network is used to process visual data from cameras in self-driving cars.
It aids in recognizing road signs, obstacles, and pedestrians.
f) Application experiments
Shown here are eight ILSVRC-2010 test images together with the five labels that the
AlexNet model considers most likely for each. What the network has learned can be
assessed qualitatively by examining its top-5 predictions on these eight test images.
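Selecting the five most likely labels from a set of class scores can be sketched as
follows. This is an illustrative Python sketch only: the labels and scores are invented
for the example and are not results from the report or from the ILSVRC data, and top_k
and in_top_k are hypothetical helper names.

```python
def top_k(scores, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def in_top_k(scores, true_label, k=5):
    """True if the correct label is among the k highest-scoring labels."""
    return true_label in [label for label, _ in top_k(scores, k)]


# Invented scores for a single image (illustration only).
scores = {
    "leopard": 0.62, "jaguar": 0.20, "cheetah": 0.10,
    "snow leopard": 0.05, "egyptian cat": 0.02, "tabby": 0.01,
}
for label, score in top_k(scores):
    print(f"{label}: {score:.2f}")
```

A prediction counts as correct under the top-5 criterion when the true label appears
anywhere in this list, not only in first place.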
2.2 Theoretical foundation of histogram
2.2.1 Definition of Histogram
A histogram is a graphical representation of the frequency distribution of pixel intensity
values in an image. For 8-bit digital images, each pixel value lies in the range [0, 255].
The histogram displays the brightness distribution of the image, from dark regions (low
pixel values) to bright regions (high pixel values).
Broadly, images fall into three categories according to their histograms. An image whose
histogram is concentrated in the low-value range is too dark. Conversely, an image whose
histogram is concentrated in the high-value range is too bright. An image with a
well-distributed histogram tends to look better.
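As a minimal illustration, a histogram can be built by counting how often each intensity
occurs. The sketch below is plain Python rather than the project's MATLAB code. The
cutoff of 128 used to separate "low" from "high" values is an assumption made only for
this example, and both helper names are hypothetical.

```python
def histogram(pixels, levels=256):
    """Count how many pixels take each intensity value in [0, levels - 1]."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return counts


def fraction_below(counts, cutoff):
    """Share of pixels whose intensity is strictly below the cutoff."""
    return sum(counts[:cutoff]) / sum(counts)


# Two toy single-channel "images" given as flat lists of 8-bit values.
dark_pixels = [10, 20, 30, 40, 200]
bright_pixels = [250, 240, 230, 220, 30]

print(fraction_below(histogram(dark_pixels), 128))    # 0.8
print(fraction_below(histogram(bright_pixels), 128))  # 0.2
```

A large share of pixels below the cutoff corresponds to a histogram concentrated in the
low-value range (a dark image), and a small share to a bright image.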
2.2.2 Histogram balance
A balanced histogram is one in which the intensity values are distributed relatively evenly
across the range, avoiding concentrations in specific regions. This indicates that the
image has good brightness and contrast, with no overly dark or overly bright areas. The
balance of a histogram can be assessed by calculating the deviation of the pixel
distribution.
Firstly, compute the relative frequency p(i) of each intensity level i:
\[ p(i) = \frac{h(i)}{N} \]
h(i): the number of pixels with intensity i.
N: the total number of pixels in the image.
Secondly, compute the mean of the normalized histogram:
\[ \mu = \frac{1}{L} \sum_{i=0}^{L-1} p(i) \]
L: the number of intensity levels (256 for 8-bit images).
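The two steps above can be sketched in a few lines of Python (an illustration, not the
project's MATLAB code). The toy histogram with L = 4 levels is invented for the example.
The per-level deviation |p(i) - mu| is shown as one plausible reading of "deviation".
How the deviations are aggregated and compared with a threshold is not shown in this
section, so the sketch stops before that step.

```python
def relative_frequencies(counts):
    """p(i) = h(i) / N, where N is the total number of pixels."""
    total = sum(counts)
    return [h / total for h in counts]


def mean_frequency(p):
    """mu = (1 / L) * sum of p(i) over all L intensity levels."""
    return sum(p) / len(p)


# Toy histogram h(i) with L = 4 intensity levels and N = 4 pixels.
counts = [2, 0, 1, 1]
p = relative_frequencies(counts)   # [0.5, 0.0, 0.25, 0.25]
mu = mean_frequency(p)             # 0.25

# Assumed deviation measure: absolute distance of each p(i) from the mean.
deviations = [abs(value - mu) for value in p]
print(p, mu, deviations)
```

Since the p(i) always sum to 1, the mean equals 1/L. For a perfectly uniform histogram,
every p(i) equals this mean and all deviations are zero.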
2.2.3 Histogram equalization
Histogram equalization is a technique used to improve the overall contrast of an image by
adjusting its intensity distribution. The goal is to redistribute pixel values so that the
histogram becomes more uniform.
For this program, we use global histogram equalization in its discrete form because the
target inputs are regular images of simple objects. The formulas for the discrete case of
histogram equalization are:
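\[ P_l = \frac{n_l}{n}, \quad l = 0, 1, \dots, L-1, \quad \text{where } n = \sum_{l=0}^{L-1} n_l \]
\[ g_k = T[f_k] = \sum_{i=0}^{k} P_i \]
\[ s_k = T(r_k) = (L-1) \sum_{j=0}^{k} p_r(r_j) = (L-1) \sum_{j=0}^{k} \frac{n_j}{MN} = \frac{L-1}{MN} \sum_{j=0}^{k} n_j \]
for k = 0, 1, …, L-1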
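A minimal sketch of the discrete mapping s_k for a single channel is given below, in
Python rather than the project's MATLAB environment. Rounding s_k to the nearest integer
level is an assumption, since the formula itself yields real values. The 3-bit toy
histogram (L = 8, MN = 4096) is illustrative data, and both function names are
hypothetical.

```python
def equalization_map(counts):
    """Return s_k = round((L - 1) / (MN) * sum_{j<=k} n_j) for k = 0..L-1."""
    levels = len(counts)      # L
    total = sum(counts)       # MN, the total number of pixels
    mapping, cumulative = [], 0
    for n_k in counts:
        cumulative += n_k     # running sum of n_j for j = 0..k
        mapping.append(round((levels - 1) * cumulative / total))
    return mapping


def equalize(pixels, levels=256):
    """Apply the mapping to a flat list of intensities from one channel."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    mapping = equalization_map(counts)
    return [mapping[value] for value in pixels]


# Toy 3-bit histogram: n_k for k = 0..7, with 4096 pixels in total.
toy_counts = [790, 1023, 850, 656, 329, 245, 122, 81]
print(equalization_map(toy_counts))  # [1, 3, 5, 6, 6, 7, 7, 7]
```

For an RGB image, the same function would be applied to each of the R, G and B channels
separately before the channels are recombined.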
2.3 Program Workflow
2.3.1 Block diagram for program
2.3.2 Functions of each block
Start: marks the beginning of the program.
Load image: loads the input image.
Check histogram balance: checks whether the histogram of the image's R, G, and
B channels is balanced.
Equalize histogram: performs histogram equalization for each R, G, and B channel
and combines the channels back into an RGB image.
Resize image: resizes the image to 227x227 pixels, the input size that AlexNet
requires.
Object recognition: classifies the resized image to recognize the object.
Display images: displays the original image and the equalized image (if needed)
side by side.
Speak object name: announces the recognized object's name.
End: marks the end of the program.
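The control flow of the blocks above can be sketched as follows. This is a hedged Python
illustration of the ordering only, not the project's MATLAB implementation. Every
callable in the steps dictionary is a hypothetical stand-in for the corresponding block,
and the stub values (including the label "banana") are invented.

```python
def run_pipeline(image, steps):
    """Run the blocks in order; `steps` maps block names to stand-in callables."""
    balanced = steps["is_balanced"](image)             # Check histogram balance
    processed = image if balanced else steps["equalize"](image)
    resized = steps["resize"](processed, (227, 227))   # AlexNet input size
    label = steps["classify"](resized)                 # Object recognition
    # Display the original, plus the equalized image only if one was produced.
    steps["display"](image, None if balanced else processed)
    steps["speak"](label)                              # Speak object name
    return label


# Stub blocks so the flow can be run without any image library.
log = []
stubs = {
    "is_balanced": lambda img: False,
    "equalize": lambda img: img + "-equalized",
    "resize": lambda img, size: f"{img}@{size[0]}x{size[1]}",
    "classify": lambda img: "banana",
    "display": lambda original, equalized: log.append(("display", original, equalized)),
    "speak": lambda label: log.append(("speak", label)),
}
label = run_pipeline("photo", stubs)
print(label, log)
```

Passing the blocks in as callables keeps the ordering logic separate from the image
processing itself, so the branch on histogram balance can be checked in isolation.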
To use this program, the Deep Learning Toolbox Model for AlexNet Network support package,
published by the MathWorks Deep Learning Toolbox Team, must be installed.
A threshold value of 0.1 is chosen because it is a commonly used value for image
processing in MATLAB: pixel levels may deviate by up to 10% from the mean, which is
generally considered acceptable in histogram processing.
The program uses the discrete form of the equations because it works on a fixed, finite
set of pixel intensity levels. This is the common approach in digital image processing,
where pixel intensity levels are represented discretely. Global histogram equalization is
used for the same reason.
CHAPTER 3. RESULTS
3.1 The original image is balanced in histogram
3.3 The original image is too bright
CHAPTER 4. CONCLUSION
The project meets its requirements: it recognizes common objects in simple input images,
performs image preprocessing steps such as checking and balancing the histogram so that
recognition is effective, and pronounces the name of the recognized object. However, the
program still has limitations: it sometimes confuses objects with similar shapes.
Addressing this requires further improvement of the input image normalization or
preprocessing step, as well as the use of another CNN trained on larger and more diverse
data to improve recognition and classification accuracy. In the future, the program can be
further optimized to achieve higher performance on more complex recognition problems.
REFERENCES
[1]. Đặng Thị Hằng, Phạm Duy Tùng, "Tìm Hiểu Về Mạng Neural Network AlexNet", 2018.
https://www.phamduytung.com/blog/2018-06-15-understanding-alexnet/
[2]. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet Classification
with Deep Convolutional Neural Networks", 2012.
[3]. Đặng Thị Hằng, Phạm Duy Tùng, "Tìm Hiểu Mạng AlexNet, Mô Hình Giành Chiến Thắng
Tại Cuộc Thi ILSVRC 2012", 2019.
https://www.phamduytung.com/blog/2019-05-27-alexnet/