XLA Final Report


HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION

FACULTY OF INTERNATIONAL EDUCATION


Image Processing in Industrial

FINAL PROJECT REPORT

APPLICATION OF CONVOLUTIONAL NEURAL NETWORK ALEXNET IN OBJECT
RECOGNITION WITH HISTOGRAM CHECKING AND EQUALIZATION
Lecturer: Lê Mỹ Hà Ph.D

Students’ names:
Huỳnh Chí Nguyên – 22151032
Lê Khổng Bảo Minh – 22151255

Ho Chi Minh City, December 19th, 2024


Table of contents
CHAPTER 1. OVERVIEW
1.1 Problem statement
1.2 Objectives
CHAPTER 2. METHODOLOGY AND CALCULATIONS
2.1 AlexNet convolutional neural network (CNN)
2.1.1 Introduction to AlexNet
2.1.2 Structure of AlexNet Network
2.1.3 AlexNet Architecture
2.1.4 AlexNet Applications
2.2 Theoretical foundation of histogram
2.2.1 Definition of Histogram
2.2.2 Histogram balance
2.2.3 Histogram equalization
2.3 Program Workflow
2.3.1 Block diagram for program
2.3.2 Functions of each block
2.3.3 Matlab program
CHAPTER 3. RESULTS
3.1 The original image is balanced in histogram
3.2 The original image is too dark
3.3 The original image is too bright
CHAPTER 4. CONCLUSION
REFERENCES

CHAPTER 1. OVERVIEW

1.1 Problem statement


Artificial intelligence and neural networks are developing faster than ever. Among
their applications, object recognition in images is one of the most prominent,
serving purposes such as surveillance, traffic control, and everyday assistance.
However, the quality of the input image strongly affects recognition accuracy.
Preprocessing steps such as histogram equalization are therefore needed to enhance
image quality and improve recognition results.

The project "APPLICATION OF ALEXNET IN OBJECT RECOGNITION WITH
HISTOGRAM CHECKING AND EQUALIZATION" focuses on solving the
problem of object recognition in images using the AlexNet neural network while
integrating histogram checking and equalization to ensure the quality of the input
images.

1.2 Objectives
The project aims to develop a program that recognizes simple objects in input images
using the AlexNet neural network, together with a feature that checks and equalizes
the histograms of all three RGB channels to ensure image quality. The program then
displays the original image and the equalized image (if equalization was needed) in a
single frame and announces the name of the recognized object to make the program
more interactive.

CHAPTER 2. METHODOLOGY AND CALCULATIONS

2.1 AlexNet convolutional neural network (CNN)


2.1.1 Introduction to AlexNet
AlexNet, introduced by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012,
is one of the most influential convolutional neural networks (CNNs) in the history of
deep learning. With 60 million parameters and 650,000 neurons, trained on a set of 1.2
million images, it revolutionized computer vision by significantly improving
performance on image classification tasks, particularly in the ImageNet Large Scale
Visual Recognition Challenge (ILSVRC).
The architecture can be trained almost as a black box, and it learns its own features
through the hidden layers.

2.1.2 Structure of AlexNet Network


a) Input Layer:
• Input: RGB images of size 227x227x3.
• Images are preprocessed (normalized and resized) to fit this input size.

b) Layer 1: Convolutional Layer (Conv1)
• Filters: 96
• Kernel Size: 11x11
• Stride: 4
• Activation: ReLU
• Output: Feature maps of size 55x55x96
• Additional Step: Max pooling with a 3x3 filter and stride 2 reduces the size to 27x27x96.

c) Layer 2: Convolutional Layer (Conv2)
• Filters: 256
• Kernel Size: 5x5
• Padding: 2 (same padding)
• Activation: ReLU
• Output: Feature maps of size 27x27x256
• Additional Step: Max pooling reduces the size to 13x13x256.

d) Layer 3: Convolutional Layer (Conv3)
• Filters: 384
• Kernel Size: 3x3
• Padding: 1 (same padding)
• Activation: ReLU
• Output: Feature maps of size 13x13x384.

e) Layer 4: Convolutional Layer (Conv4)
• Filters: 384
• Kernel Size: 3x3
• Padding: 1
• Activation: ReLU
• Output: Feature maps of size 13x13x384.

f) Layer 5: Convolutional Layer (Conv5)
• Filters: 256
• Kernel Size: 3x3
• Padding: 1
• Activation: ReLU
• Output: Feature maps of size 13x13x256.
• Additional Step: Max pooling reduces the size to 6x6x256.
• Flattening Layer: The 3D tensor (6x6x256) is flattened into a 1D vector with 9216 units.

g) Layer 6: Fully Connected Layer (FC1)
• Nodes: 4096
• Activation: ReLU
• Dropout: Applied to prevent overfitting.

h) Layer 7: Fully Connected Layer (FC2)
• Nodes: 4096
• Activation: ReLU
• Dropout: Applied again for regularization.

i) Layer 8: Output Layer (FC3)
• Nodes: 1000 (corresponding to the 1000 classes in the ImageNet dataset).
• Activation: Softmax, which outputs probabilities for each class.
2.1.3 AlexNet Architecture

Overlapping Max Pooling:

A max pooling layer reduces the width and height of a tensor while keeping its depth
unchanged. Overlapping max pooling works the same way, except that adjacent pooling
windows share some pixels. AlexNet uses 3x3 pooling windows with a stride of 2, so
each window overlaps its neighbor by one row or column of pixels.
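To make the overlap concrete, here is a small pure-Python sketch of 3x3 max pooling with stride 2 on a 5x5 feature map. It is an illustration only. The function name `max_pool_2d` and the sample values are made up for this example.

```python
def max_pool_2d(image, window=3, stride=2):
    """Max pooling over a 2D list; windows overlap when stride < window."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for top in range(0, rows - window + 1, stride):
        pooled_row = []
        for left in range(0, cols - window + 1, stride):
            pooled_row.append(max(
                image[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            ))
        pooled.append(pooled_row)
    return pooled

# 5x5 feature map holding the values 0..24 in row-major order.
feature_map = [[r * 5 + c for c in range(5)] for r in range(5)]
pooled = max_pool_2d(feature_map)
print(pooled)
```

The windows start at rows and columns 0 and 2, so column 2 (and row 2) belongs to two neighboring windows. That shared column or row is the one-pixel overlap described above.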

2.1.4 AlexNet Applications

a) Image Classification:

• AlexNet was originally designed for image classification tasks, particularly on
large-scale datasets like ImageNet.
• It can categorize images into thousands of classes, making it ideal for object
recognition and categorization.

b) Object Detection:

• AlexNet serves as a backbone for many object detection models like R-CNN.
• It helps detect and localize multiple objects within an image.
• Feature Extraction: The convolutional layers of AlexNet are used to extract
high-quality features from images. These features are applied to various downstream
tasks like transfer learning.

c) Medical Imaging:

• AlexNet is applied in medical fields for diagnosing diseases through imaging
techniques such as X-rays, CT scans, and MRIs.
• It helps identify abnormalities like tumors or organ damage.

d) Autonomous Vehicles:

• The network is used to process visual data from cameras in self-driving cars.
• It aids in recognizing road signs, obstacles, and pedestrians.

e) Facial Recognition:

• AlexNet is a foundational model for facial recognition systems.
• It helps in tasks such as identifying individuals, analyzing expressions, or
detecting faces in images.
• Agriculture: AlexNet assists in identifying plant diseases, analyzing crop health,
and classifying plant species from images.

f) Application experiments
This experiment takes eight ILSVRC-2010 test images and lists, for each one, the five
labels that the AlexNet model considers most likely. Examining these top-5 predictions
gives a qualitative assessment of what the network has learned.
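As an illustration of how top-5 labels are obtained from a classifier's output, the Python sketch below applies softmax to a list of raw scores and keeps the five most probable labels. The labels and scores are invented for this example and are not results from the report. The names `softmax` and `top_k` are likewise made up.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], probabilities[i]) for i in order[:k]]

# Hypothetical labels and scores, for illustration only.
labels = ["tiger cat", "tabby", "lynx", "mountain bike", "snowmobile", "fox"]
scores = [4.0, 3.0, 2.0, 0.5, 0.1, 1.0]

probabilities = softmax(scores)
top5 = top_k(probabilities, labels)
for label, probability in top5:
    print(f"{label}: {probability:.3f}")
```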

2.2 Theoretical foundation of histogram
2.2.1 Definition of Histogram
A histogram is a graphical representation of the frequency distribution of pixel
intensity values in an image. For 8-bit digital images, each pixel value lies in the
range [0, 255]. The histogram displays the brightness distribution of the image, from
dark regions (low pixel values) to bright regions (high pixel values).
Images fall into three broad categories according to their histograms. An image whose
histogram is concentrated in the low-value range is too dark. An image whose histogram
is concentrated in the high-value range is too bright. An image with a well-distributed
histogram tends to look better.
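A minimal Python sketch of the idea: count how many pixels fall on each intensity level, then use the mean intensity as a rough indicator of where the histogram is concentrated. The cut-offs 85 and 170 are arbitrary assumptions for this illustration and do not come from the report. The function names are made up.

```python
def histogram(pixels, levels=256):
    """Count the pixels at each intensity level (pixels are ints in [0, levels-1])."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return counts

def mean_intensity(counts):
    """Average intensity computed from the histogram counts."""
    total = sum(counts)
    return sum(level * count for level, count in enumerate(counts)) / total

def describe_brightness(counts, dark_below=85, bright_above=170):
    """Rough label from the mean intensity; the cut-offs are illustrative only."""
    mean = mean_intensity(counts)
    if mean < dark_below:
        return "too dark"
    if mean > bright_above:
        return "too bright"
    return "mid-range"

dark_pixels = [10, 20, 30, 20]
bright_pixels = [240, 250, 230]

print(describe_brightness(histogram(dark_pixels)))
print(describe_brightness(histogram(bright_pixels)))
```

The mean alone does not show whether a histogram is well distributed. That is what the balance check in the next subsection addresses.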
2.2.2 Histogram balance
A balanced histogram is one in which the intensity values are distributed relatively
evenly across the range, avoiding concentrations in specific regions. This indicates
that the image has good brightness and contrast, with no overly dark or overly bright
areas. The balance of a histogram can be assessed by calculating the deviation of the
pixel distribution.
Firstly, compute the relative frequency p(i) of each intensity level i:

$$p(i) = \frac{h(i)}{N}$$

h(i): the number of pixels with intensity i.
N: the total number of pixels in the image.
Secondly, compute the mean of the normalized histogram:

$$\mu = \frac{1}{L} \sum_{i=0}^{L-1} p(i)$$

L: the number of intensity levels.

Thirdly, calculate the maximum deviation from the mean:

$$D_{max} = \max_i \left| p(i) - \mu \right|$$

If D_max exceeds a predefined threshold, the histogram is considered unbalanced.
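The three steps can be sketched in Python as follows. This is a minimal illustration, not the report's MATLAB code. The function names are made up, and the default threshold of 0.1 is the value the report quotes in Section 2.3.3. A 4-level histogram is used so the numbers can be checked by hand.

```python
def max_deviation(counts):
    """D_max = max_i |p(i) - mu| for a histogram given as a list of counts."""
    total = sum(counts)                  # N: total number of pixels
    levels = len(counts)                 # L: number of intensity levels
    p = [count / total for count in counts]
    mu = sum(p) / levels                 # equals 1/L, because the p(i) sum to 1
    return max(abs(value - mu) for value in p)

def is_balanced(counts, threshold=0.1):
    """The histogram counts as balanced when D_max does not exceed the threshold."""
    return max_deviation(counts) <= threshold

even_counts = [25, 25, 25, 25]     # every level holds a quarter of the pixels
skewed_counts = [97, 1, 1, 1]      # almost all pixels on the darkest level

print(max_deviation(even_counts), is_balanced(even_counts))
print(max_deviation(skewed_counts), is_balanced(skewed_counts))
```

For the skewed example, p(0) = 0.97 and μ = 0.25, so D_max is about 0.72 and the histogram is flagged as unbalanced.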

2.2.3 Histogram equalization
Histogram equalization is a technique that improves the overall contrast of an image
by adjusting its intensity distribution. The goal is to redistribute pixel values so
that the histogram becomes more uniform.
This program uses global histogram equalization in its discrete form, because the
target inputs are ordinary images containing simple objects. The discrete formulas
are:

$$P_l = \frac{n_l}{n}, \quad \text{where } n = \sum_{l=0}^{L-1} n_l$$

$$g_k = T[f_k] = \sum_{i=0}^{k} P_i$$

$$s_k = T(r_k) = (L-1) \sum_{j=0}^{k} p_r(r_j) = (L-1) \sum_{j=0}^{k} \frac{n_j}{MN} = \frac{L-1}{MN} \sum_{j=0}^{k} n_j$$

for k = 0, 1, …, L-1. Here n_l is the number of pixels with intensity l, and MN (rows
times columns) is the total number of pixels, so MN = n.
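The mapping s_k can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the report's MATLAB code. The result is rounded to the nearest integer level, which the formula above does not specify, and the function name `equalize` is made up. An 8-level example keeps the arithmetic easy to follow.

```python
def equalize(pixels, levels=256):
    """Global histogram equalization of a flat list of integer pixel values."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1

    total = len(pixels)                  # MN: total number of pixels
    mapping = []
    cumulative = 0
    for count in counts:
        cumulative += count              # running sum of n_j for j = 0..k
        # s_k = (L-1)/MN * sum n_j, rounded to an integer level (assumption)
        mapping.append(round((levels - 1) * cumulative / total))

    return [mapping[value] for value in pixels]

dark_pixels = [0, 0, 1, 1, 1, 2, 2, 7]
equalized = equalize(dark_pixels, levels=8)
print(equalized)
```

In this example the pixels crowded on levels 0 to 2 are spread out to levels 2, 4, and 6, while the brightest pixel stays at level 7.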

2.3 Program Workflow
2.3.1 Block diagram for program

2.3.2 Functions of each block
• Start: marks the beginning of the program.
• Load image: loads the input image.
• Check histogram balance: checks whether the histogram of the image's R, G, and B
channels is balanced.
• Equalize histogram: performs histogram equalization for each R, G, and B channel
and combines the channels back into an RGB image.
• Resize image: resizes the image to 227x227 dimensions to meet AlexNet's input size
requirements.
• Object recognition: classifies the resized image to recognize the object.
• Display images: displays the original image and the equalized image (if needed)
side by side.
• Speak object name: announces the recognized object's name.
• End: marks the end of the program.
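The check-then-equalize part of this workflow can be sketched in Python as below. This is not the report's MATLAB program. It assumes that all three channels are equalized as soon as any one of them is unbalanced, and it represents an image as a dictionary of flat channel lists. The classifier is passed in as a hypothetical callable `classify`. Resizing, display, and speech output are left out.

```python
def channel_counts(channel, levels):
    """Histogram counts of one channel."""
    counts = [0] * levels
    for value in channel:
        counts[value] += 1
    return counts

def is_balanced(channel, levels=256, threshold=0.1):
    """Balanced when max |p(i) - mu| stays within the threshold."""
    counts = channel_counts(channel, levels)
    p = [count / len(channel) for count in counts]
    mu = sum(p) / levels
    return max(abs(value - mu) for value in p) <= threshold

def equalize(channel, levels=256):
    """Global histogram equalization of one channel (rounded mapping)."""
    mapping, cumulative = [], 0
    for count in channel_counts(channel, levels):
        cumulative += count
        mapping.append(round((levels - 1) * cumulative / len(channel)))
    return [mapping[value] for value in channel]

def preprocess(image, levels=256, threshold=0.1):
    """Return (image, was_equalized). Assumption: equalize all channels
    when at least one of them is unbalanced."""
    if all(is_balanced(ch, levels, threshold) for ch in image.values()):
        return image, False
    return {name: equalize(ch, levels) for name, ch in image.items()}, True

def run(image, classify, levels=256, threshold=0.1):
    """Check, equalize if needed, then hand the image to the classifier."""
    prepared, was_equalized = preprocess(image, levels, threshold)
    return classify(prepared), was_equalized

# Tiny 4-level examples; stub_classifier stands in for the real network.
def stub_classifier(image):
    return "stub label"

balanced_image = {name: [0, 1, 2, 3, 0, 1, 2, 3] for name in "RGB"}
dark_image = {name: [0, 0, 0, 0, 0, 1, 1, 2] for name in "RGB"}

print(run(balanced_image, stub_classifier, levels=4))
print(run(dark_image, stub_classifier, levels=4))
```

Passing the classifier as an argument keeps the preprocessing logic independent of any particular network or toolbox.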

2.3.3 Matlab program

To run this program, the Deep Learning Toolbox Model for AlexNet Network support
package, provided by the MathWorks Deep Learning Toolbox Team, must be installed.
The threshold value of 0.1 is chosen because it is a commonly used value in MATLAB
image processing: the relative frequency of an intensity level may deviate from the
mean by up to 0.1 (10%), which is generally considered acceptable in histogram
processing.
The program uses the discrete equations because pixel intensities take fixed, discrete
levels, which is the usual approach in digital image processing. Global histogram
equalization is used for the same reason.

CHAPTER 3. RESULTS
3.1 The original image is balanced in histogram

[Figure: "Original Image" panel; displayed label: tiger cat]

3.2 The original image is too dark

[Figure: "Original Image" panel; displayed label: mountain bike]

3.3 The original image is too bright

[Figure: "Original Image" panel; displayed label: snowmobile]

CHAPTER 4. CONCLUSION

The project meets its requirements: it recognizes common objects in simple input
images, performs image preprocessing by checking and equalizing the histogram so that
recognition is effective, and pronounces the name of the recognized object. However,
the program still has limitations: it sometimes confuses objects with similar shapes.
Addressing this requires further improvement of the input image normalization or
preprocessing step, as well as the use of another CNN trained on larger and more
diverse data to improve classification accuracy. In the future, the program can be
further optimized to achieve higher performance on more complex recognition problems.

REFERENCES

[1]. Đặng Thị Hằng, Phạm Duy Tùng, Tìm Hiểu Về Mạng Neural Network AlexNet
[Understanding the AlexNet Neural Network], 2018.
https://www.phamduytung.com/blog/2018-06-15-understanding-alexnet/

[2]. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, ImageNet Classification
with Deep Convolutional Neural Networks, 2012.

[3]. Đặng Thị Hằng, Phạm Duy Tùng, Tìm Hiểu Mạng AlexNet, Mô Hình Giành Chiến Thắng
Tại Cuộc Thi ILSVRC 2012 [Understanding AlexNet, the Winning Model of the ILSVRC 2012
Competition], 2019. https://www.phamduytung.com/blog/2019-05-27-alexnet/

[4]. Nuruzzaman Faruqui, Matlab Tutorial: Text To Speech, 2018.

[5]. Lê Mỹ Hà, Lecture 3: Image enhancement.
