
IMAGE PREDICTION

SUMMER PROJECT REPORT

Submitted in partial fulfilment for the award of the degree of

MASTER OF COMPUTER APPLICATIONS

By

NANDHINI R P

1913323037007

Under the guidance of

Mrs. U. Shantha Visalakshi

MCA DEPARTMENT
ETHIRAJ COLLEGE FOR WOMEN (AUTONOMOUS)

CHENNAI - 600 008, JULY 2018

BONAFIDE CERTIFICATE

ETHIRAJ COLLEGE FOR WOMEN
(AUTONOMOUS)
This is to certify that the report entitled

IMAGE PREDICTION

Being submitted to the Ethiraj College for Women,

Affiliated to the University of Madras, Chennai

by

NANDHINI R P

1913323037007
In partial fulfilment for the award of the degree of

MASTER OF COMPUTER APPLICATIONS

is a bonafide report of work carried out by her under my guidance and supervision

Guide

Head of the Department

Place:

Date:

Submitted for the viva-voce examination at …………...………………… on ……………..


Examiner–1:……………………………………

(Signature and name of the Examiner)

Examiner–2:……………………………………

(Signature and name of the Examiner)

CERTIFICATE OF ORIGINALITY

I hereby declare that the project entitled “IMAGE PREDICTION”, submitted to the MCA DEPARTMENT, ETHIRAJ COLLEGE FOR WOMEN (AUTONOMOUS) in partial fulfilment for the award of the degree of MASTER OF COMPUTER APPLICATIONS during 2019-2022, is an authentic record of my own work carried out under the guidance of Ms. U. SHANTHA VISALAKSHI, MCA, M.Phil., M.E., and that the project has not previously formed the basis for the award of any other degree.

Place: Chennai                                              Signature of the candidate,

Date: NANDHINI R P,

1913323037007.

ACKNOWLEDGEMENT
Apart from the effort of any one person, the success of any project depends largely on the encouragement and guidance of many others. I take this opportunity to express my gratitude to the people who have been instrumental in the success of this project.

I express my thanks to Dr. S. BHUVANESWARI, Principal i/c, for her support in including this subject in our curriculum. I express my deep sense of gratitude to our Head of the Department, Mrs. A. JOSEPHINE ANITHA, MCA, M.Phil., who gave me this wonderful opportunity and the support to develop this project.

I owe my thanks to the great many people who helped and supported me during the development and writing of this project report. My project was guided and corrected with attention and care by my project guide. She took immense care to go through my project and made necessary corrections as and when needed. I express my deepest thanks to my project guide, Ms. K. VIJAYALAKSHMI, MCA, M.Phil., M.E., for her guidance, without which this work would not have become a reality.

I am thankful to, and fortunate enough to have received, constant encouragement, support and guidance from all the teaching staff of the Department of MCA, which helped me in successfully completing my project work. Also, I would like to extend my sincere regards to our lab assistant for her timely support.

I wish to avail myself of this opportunity to express my sense of gratitude and love to my friends and my beloved parents for their moral support, strength, and help in everything.

MASTER OF COMPUTER APPLICATIONS

AUGUST 2020
NAME : NANDHINI R.P

REG NO : 1913323037007

SUB : SUMMER PROJECT

SEM : I MCA

ETHIRAJ COLLEGE FOR WOMEN

(AUTONOMOUS)
Chennai – 600 008.

CERTIFICATE

This is to certify that this is the bonafide record of work carried out under my supervision in the Computer Laboratory Course “COMP. LAB. III: DATABASE MANAGEMENT SYSTEMS”, submitted to the MCA Department, Ethiraj College for Women (Autonomous) by,

NANDHINI.R.P
(1913323037007)

as part of the course work leading to the award of the degree of

MASTER OF COMPUTER APPLICATIONS

Faculty-In-Charge                                              Head of the Department

Submitted for the Laboratory Examination at Ethiraj College for Women (Autonomous) on ……………..

Examiner – 1: ……………………………………
(Signature and Name of the Examiner)

Examiner – 2: ……………………………………
(Signature and Name of the Examiner)

TABLE OF CONTENTS

S. NO  TITLE
1      INTRODUCTION
2      ABSTRACT
3      OBJECTIVES OF THE PROJECT
4      SYSTEM REQUIREMENTS
5      TENSORFLOW AND KERAS
6      PYTHON
7      IMAGE AI
8      THE SEVEN STEPS OF MACHINE LEARNING
9      DISADVANTAGES OF MACHINE LEARNING
10     TYPES OF CLASSIFICATION MODELS
11     MULTICLASS CLASSIFICATION
12     CONCLUSION
13     APPENDIX
14     REFERENCES
CHAPTER 1

1. Introduction:

Image recognition refers to technologies that identify places,


logos, people, objects, buildings, and several other variables in images.
Users are sharing vast amounts of data through apps, social networks,
and websites. Additionally, mobile phones equipped with cameras are
leading to the creation of limitless digital images and videos. The large
volume of digital data is being used by companies to deliver better and
smarter services to the people accessing it. Image recognition is a part
of computer vision and a process to identify and detect an object or
attribute in a digital video or image. Computer vision is a broader term
which includes methods of gathering, processing and analyzing data
from the real world. The data is high-dimensional and produces
numerical or symbolic information in the form of decisions. Apart from
image recognition, computer vision also includes event detection,
object recognition, learning, image reconstruction and video tracking.
Facebook can now perform face recognition at 98% accuracy, which
is comparable to the ability of humans. Facebook can identify your
friend’s face with only a few tagged pictures. The efficacy of this
technology depends on the ability to classify images. Classification is
pattern matching with data. Images are data in the form of 2-dimensional matrices. In fact, image recognition is classifying data into one category out of many. One common and important example is
optical character recognition (OCR). OCR converts images of typed or
handwritten text into machine-encoded text. The major steps in the image recognition process are gathering and organizing data, building a predictive model, and using it to recognize images.

Gather and Organize Data

The human eye perceives an image as a set of signals which are


processed by the visual cortex in the brain. This results in a vivid
experience of a scene, associated with concepts and objects recorded
in one’s memory. Image recognition tries to mimic this process.
A computer perceives an image as either a raster or a vector image. Raster images are a sequence of pixels with discrete numerical values for colors, while vector images are a set of color-annotated polygons.

To analyze images, the geometric encoding is transformed into


constructs depicting physical features and objects. These constructs can
then be logically analyzed by the computer. Organizing data involves
classification and feature extraction. The first step in image
classification is to simplify the image by extracting important
information and leaving out the rest. For example, if you want to
extract a cat from its background, you will notice a
significant variation in RGB pixel values. However, by running an edge
detector on the image we can simplify it. You can still easily discern the
circular shape of the face and eyes in these edge images and so we can
conclude that edge detection retains the essential information while
throwing away non-essential information. Some well-known feature
descriptor techniques are Haar-like features introduced by Viola and
Jones, Histogram of Oriented Gradients (HOG), Scale-Invariant Feature
Transform (SIFT), Speeded Up Robust Feature (SURF) etc.
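
As a concrete sketch of this edge-detection step (an illustration only, assuming OpenCV is installed; the file name "cat.jpg" and the thresholds are placeholders):

# A minimal edge-detection sketch with OpenCV; thresholds are illustrative.
import cv2

image = cv2.imread("cat.jpg", cv2.IMREAD_GRAYSCALE)   # load as grayscale
edges = cv2.Canny(image, 100, 200)                    # compute the Canny edge map
cv2.imwrite("cat_edges.jpg", edges)                   # keeps the essential structure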
ABSTRACT

To recognize images and determine their poses in a scene, we need to find the correspondences between the features extracted from the image and those of the image models.

ImageAI is a Python library built to empower developers to build applications and systems with self-contained computer vision capabilities. Built with simplicity in mind, ImageAI supports a list of state-of-the-art machine learning algorithms for image recognition, object detection, custom object detection, video object tracking, custom image recognition training and custom prediction. In this project, image prediction requires that you have Python 3.8 installed, as well as some other Python libraries and frameworks. Before you install ImageAI, you must install Python 3.8, pip3, TensorFlow 1.13.1, OpenCV, Keras and NumPy. Once ImageAI is installed, you can run very few lines of code to perform very powerful computer vision tasks.
CHAPTER 2

2. OPENCV:

Visual search has become a necessity for many multimedia


applications running on today's computing systems. Tasks such as recognizing parts of a scene in an image, detecting items in a retail store, navigating an autonomous drone, etc. have great relevance to today's rapidly changing environment. Many of the underlying
functions in these tasks rely on high-resolution camera sensors that
are used for capturing light intensity-based data. The data is then
processed by different algorithms to perform tasks such as object
detection, object recognition, image segmentation, etc.
Fig. 1: Visual saliency
Visual attention has gained a lot of traction in computational
neuroscience research over the past few years. Various
computational models have used low-level features to build
information maps, which are then fused together to form what is
popularly called a saliency map. Given an image to observe, this
saliency map, in essence, provides a compact representation of what
is most important in the image. The map can then be used as a
mechanism to zoom in on identifying the regions of interest (RoIs)
that are most important in the image. For example, in Figure 1,
different saliency models show the extent to which pixels
representing the bicyclist pop out in stark contrast to the
background.
These models assume that the human eye uses its full resolution
across the entire field of view (FOV). However, the resolution drops
off from the center of the fovea towards the periphery and the
human visual system (HVS) is adept at foveating so as to investigate
other areas in the periphery when attention is drawn in that
direction. In other words, our eyes foveate to allow points of interest
to fall on the fovea, which is the region of highest resolution. It is
only after this foveation process that we are capable of gathering
complete information from the object of interest that drew our
attention to it. The HVS has thus been built in such a way that it
becomes necessary to move the eyes in order to facilitate processing
information all around one's environment. It is due to this reason
that humans tend to select nearby locations more frequently than
distant targets and salience maps need to be computed taking this
into account to improve the predictive power of the models.
Understanding the efficiency with which our eyes intelligently take in
pertinent information to perform different tasks has a significant impact on building the next generation of autonomous systems.
Building a foveation framework to test and improve saliency models
for real-time autonomous navigation is the focus of this work.
Fig. 2: Framework of AIM [1]
We choose an information theoretic computational saliency model,
Attention based on Information Maximization (AIM) as a building block
for our foveation framework. AIM has been benchmarked against many
other saliency models and it has proven to come significantly close to
human fixations. The model looks to compute visual content as a
measure of surprise or rareness using Shannon's self-information
theory. The algorithm is divided into three major sections as shown in
Figure 2. The first section involves creating a sparse representation of
the image by projecting it on a set of learnt basis functions. The next
section involves a density estimation using a histogram back projection
technique. Finally a log-likelihood is computed to give the final
information map. For more details on the algorithm and the theory
behind it, one is pointed to [1].

In order to model the steep roll-off in resolution from fovea to


periphery, once the image is captured by the camera sensor, we build a
three level Gaussian pyramid as shown in Figure 3. To do this, we first
extract a 50% high-resolution center region from Level 1 as our fovea.
After blurring and downsampling, a second region is cropped out from
Level 2, representing the mid-resolution region. Another round of
blurring and downsampling leaves us with the entire FOV but at a much
lower resolution (Level 3). It should be noted that as the resolution
drops off, the FOV is gradually increasing in our framework.
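
A minimal sketch of this pyramid construction, assuming OpenCV and a square input frame (the crop sizes are illustrative of the 50% centre region described above):

import cv2

frame = cv2.imread("fisheye_frame.jpg")               # e.g. a 1920x1920 capture
h, w = frame.shape[:2]

# Level 1: high-resolution 50% centre crop (the fovea)
level1 = frame[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

# Level 2: Gaussian blur + 2x downsample, then a mid-resolution centre crop
down1 = cv2.pyrDown(frame)
h2, w2 = down1.shape[:2]
level2 = down1[h2 // 4: 3 * h2 // 4, w2 // 4: 3 * w2 // 4]

# Level 3: another blur + downsample, keeping the entire FOV at low resolution
level3 = cv2.pyrDown(down1)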

For our experiments, we use a ½” Format C-Mount Fisheye Lens having


a focal length of 1.4 mm and a Field of View (FOV) of 185°. The images
captured are 1920×1920 in size. The images have some inherent
nonlinearity as one moves away from the center, which is similar to the
way the human eyes perceive the world around. We run AIM on each
of these three regions, which returns corresponding information maps.
These information maps represent the salient regions at different
resolutions as shown in Figure 4 (c). There are a number of ways in
which to fuse these information maps to give a final multi-resolution
saliency map. We believe that an adaptive weighting function on each
of these maps will be a valuable parameter to tune in a dynamic
environment. However, for this work, which focuses on static images,
we use weights of w1 = 1/3, w2 = 2/3 and w3 = 1 for the high-resolution
fovea, the mid-resolution region and the low-resolution region
respectively. We use these weights since pixels in the fovea occur thrice
across the pyramid while pixels in the mid-resolution region occur
twice. These weights thus prevent the final saliency map from being
overly center- biased. Since these maps are of different size, they are
appropriately up-sampled and zero-padded before adding them up.
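
A minimal sketch of this fusion step, assuming the three information maps have already been up-sampled and zero-padded to the same full-frame shape as 2-D float arrays:

import numpy as np

w1, w2, w3 = 1 / 3, 2 / 3, 1.0   # fovea, mid-resolution, low-resolution weights

def fuse_maps(map1, map2, map3):
    final = w1 * map1 + w2 * map2 + w3 * map3   # weighted sum of the three maps
    return final / (final.max() + 1e-8)         # normalise to [0, 1]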

Fig. 4: Methodology of the proposed framework from left to right: (a) Input Image (b) Image Pyramids with increasing FOV (c) Visual Attention Saliency Maps (d) Multi-resolution Attention Map by fusing (c) with different weights

To validate our model, termed Multi-Resolution AIM (MR-AIM), we ran experiments on a series of patterns as shown in
Figure 5. First we considered a series of spatially distributed red dots of
same dimensions against a black background (Figures 5 (a) and 5 (b)).
As can be seen in the saliency result (Figures 5 (e) and 5 (f)) there is a
gradual decrease in saliency as one moves away from the fovea (Red
corresponds to regions of higher saliency while Blue corresponds to
regions of lower saliency).
Fig. 5: Saliency results for different spatial perturbations: (a)-(d) Input Image, (e)-(h) Saliency Result

Onsets are considered to drive visual attention in a dynamic environment, so in Figure 5(c) we next
considered the arrivals of new objects of interest within the fovea (red
dot) and towards the periphery (yellow dot). Maximum response is
obtained in the region around the yellow dot (Figure 5 (g)). Next, we
consider a movement of the yellow dot further away from the fovea
(Figure 5 (d)). Again we notice a slight shift in saliency moving attention
towards the center (Figure 5 (h)). These experiments give us valuable
information on the mechanisms of our model when the object of
interest is moving relative to the fovea.

Fig. 6: Qualitative comparisons

Our next set of experiments was to compare the multi-resolution


model with the original AIM model and evaluate the former both in terms of quality and performance. It should be noted here that the
dataset provided in [4] has images of maximum size 1024×768, while
the framework designed here is ideally targeted towards high-resolution images that contain a lot of salient objects. Figure 6 (Row 1)
shows an example of such an image with increasing size from left to
right. Row 2 depicts results from the original AIM model. Row 3 shows
the output of the MR-AIM. For smaller image sizes, AIM does a very
good job in spotting the main RoIs. But as the image size starts to
increase, it starts to pick edges as most salient. This is due to the
limited size (21×21) of the basis kernels used. Increasing the size of the
kernels is not a viable option for a real-time system since that would in
turn increase the computation time. MR-AIM has no such problem.
Since it operates on smaller image sizes at different resolutions, it can
detect objects at different scales. There is a bias towards objects in the
center, but the weights do a significant job in capturing RoIs towards
the periphery as well. It should be noted here that MR-AIM would not
pick up objects that become extremely salient in the periphery, but
adding other channels of saliency, such as motion, will make the model
more robust in a dynamic environment.
CHAPTER 3: OBJECTIVES OF THE PROJECT

3.1 Objectives
One such technique is to create a model that predicts multiple images at the same time, using threading to process every image concurrently, so that it can be used in any application of this kind.

3.2 Domain Explanation:

Machine learning (ML) is the study of computer algorithms that


improve automatically through experience.[1][2] It is seen as a subset
of artificial intelligence. Machine learning algorithms build a
mathematical model based on sample data, known as "training data",
in order to make predictions or decisions without being explicitly
programmed to do so.[3] Machine learning algorithms are used in a
wide variety of applications, such as email filtering and computer vision,
where it is difficult or infeasible to develop conventional algorithms to
perform the needed tasks. Machine learning is closely related to
computational statistics, which focuses on making predictions using
computers. The study of mathematical optimization delivers methods,
theory and application domains to the field of machine learning. Data
mining is a related field of study, focusing on exploratory data analysis
through unsupervised learning.[5][6] In its application across business
problems, machine learning is also referred to as predictive analytics.

CHAPTER 4: SYSTEM REQUIREMENTS

4.1 Hardware Specification

 Processor : i5 and above
 RAM : 8 GB
 Hard Disk : 215 GB
 GPU : RTX 2070 / RTX 2080

4.2 Software Specification

 Environment : VS Code editor / Sublime Text
 Programming language : Python 3.8
 Operating System : Windows 10
CHAPTER 5

5. Tensorflow:
TensorFlow is an end-to-end open source
platform for machine learning. It has a comprehensive, flexible
ecosystem of tools, libraries and community resources that lets
researchers push the state-of-the-art in ML and developers easily build
and deploy ML powered applications.

TensorFlow is Google Brain's second-generation system. Version 1.0.0


was released on February 11, 2017. While the reference
implementation runs on single devices, TensorFlow can run on
multiple CPUs and GPUs (with optional CUDA and SYCL extensions
for general-purpose computing on graphics processing
units). TensorFlow is available on 64-bit Linux, macOS, Windows, and
mobile computing platforms including Android and iOS.
Its flexible architecture allows for the easy deployment of computation
across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to
clusters of servers to mobile and edge devices.
TensorFlow computations are expressed as stateful dataflow graphs.
The name TensorFlow derives from the operations that such neural
networks perform on multidimensional data arrays, which are referred
to as tensors. During the Google I/O Conference in June 2016, Jeff Dean
stated that 1,500 repositories on GitHub mentioned TensorFlow, of
which only 5 were from Google.
In December 2017, developers from Google, Cisco, RedHat, CoreOS,
and CaiCloud introduced Kubeflow at a conference. Kubeflow allows
operation and deployment of TensorFlow on Kubernetes.
In March 2018, Google announced TensorFlow.js version 1.0 for
machine learning in JavaScript.
In Jan 2019, Google announced TensorFlow 2.0. It became officially
available in Sep 2019.
In May 2019, Google announced TensorFlow Graphics for deep learning
in computer graphics.

Easy model building :


Build and train ML models easily using intuitive high-level APIs like
Keras with eager execution, which makes for immediate model
iteration and easy debugging.
Robust ML production anywhere:
Easily train and deploy models in the cloud, on-prem, in the browser, or
on-device no matter what language you use.

Powerful experimentation for research :


A simple and flexible architecture to take new ideas from concept to
code, to state- of-the-art models, and to publication faster.

FEATURES OF TENSORFLOW:

 Faster debugging with Python tools


 Dynamic models with Python control flow
 Support for custom and higher-order gradients
 TensorFlow offers multiple levels of abstraction, which helps you
to build and train models.
 TensorFlow allows you to train and deploy your model quickly, no
matter what language or platform you use.
 TensorFlow provides flexibility and control with features like the Keras Functional API and Model Subclassing API
 Well-documented, so easy to understand
 Probably the most popular framework, and easy to use with Python
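
As a hedged illustration of this high-level style (a minimal sketch; the layer sizes and the 28x28 input shape are placeholders, not part of this project's code):

import tensorflow as tf

# A small image classifier built with the Keras Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # flatten image pixels
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # scores for 10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5) would then start training.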

KERAS:
Keras is an open-source neural-network library written
in Python. It is capable of running on top of TensorFlow, Microsoft
Cognitive Toolkit, R, Theano, or PlaidML. Designed to enable fast
experimentation with deep neural networks, it focuses on being user-
friendly, modular, and extensible. It was developed as part of the
research effort of project ONEIROS (Open-ended Neuro-Electronic
Intelligent Robot Operating System), and its primary author and
maintainer is François Chollet, a Google engineer. Chollet is also the author of the Xception deep neural network model.

FEATURES OF KERAS:

 Focus on user experience.


 Multi-backend and multi-platform.
 Easy production of models
 Allows for easy and fast prototyping
 Convolutional networks support
 Recurrent networks support
 Keras is expressive, flexible, and apt for innovative research.
 Keras is a Python-based framework that makes it easy to debug
and explore.
 Highly modular neural networks library written in Python
 Developed with a focus on enabling fast experimentation

KERAS VS TENSORFLOW:
Keras is a neural network library, while TensorFlow is an open-source library for a wide range of machine learning tasks. TensorFlow provides both high-level and low-level APIs, while Keras provides only high-level APIs. Keras is built in Python, which makes it far more user-friendly than TensorFlow.

CHAPTER 6
6. PYTHON 3

1. Introduction:

Python is an interpreted, high-level, general-purpose programming
language. Created by Guido van Rossum and first released in 1991,
Python's design philosophy emphasizes code readability with its
notable use of significant whitespace. Its language
constructs and object-oriented approach aim to
help programmers write clear, logical code for small and large-scale
projects.
Python is dynamically typed and garbage-collected. It supports
multiple programming paradigms,
including structured (particularly, procedural), object-oriented,
and functional programming. Python is often described as a "batteries
included" language due to its comprehensive standard library.
Python was conceived in the late 1980s as a successor to the ABC
language. Python 2.0, released in 2000, introduced features like list
comprehensions and a garbage collection system with reference
counting.
Python 3.0, released in 2008, was a major revision of the language that
is not completely backward-compatible, and much Python 2 code does
not run unmodified on Python 3.
The Python 2 language was officially discontinued in 2020 (first planned
for 2015), and "Python 2.7.18 is the last Python 2.7 release and
therefore the last Python 2 release." No more security patches or other
improvements will be released for it. With Python 2's end-of-life, only
Python 3.5.x and later are supported.
Python interpreters are available for many operating systems. A global
community of programmers develops and maintains CPython, a free
and open-source reference implementation. A non-profit organization,
the Python Software Foundation, manages and directs resources for
Python and CPython development.

This reference manual describes the Python programming language. It


is not intended as a tutorial.
While I am trying to be as precise as possible, I chose to use English
rather than formal specifications for everything except syntax and
lexical analysis. This should make the document more understandable
to the average reader, but will leave room for ambiguities.
Consequently, if you were coming from Mars and tried to re-implement
Python from this document alone, you might have to guess things and
in fact you would probably end up implementing quite a different
language. On the other hand, if you are using Python and wonder what
the precise rules about a particular area of the language are, you should
definitely be able to find them here. If you would like to see a more
formal definition of the language, maybe you could volunteer your time
— or invent a cloning machine.
It is dangerous to add too many implementation details to a language
reference document — the implementation may change, and other
implementations of the same language may work differently. On the
other hand, CPython is the one Python implementation in widespread
use (although alternate implementations continue to gain support), and
its particular quirks are sometimes worth being mentioned, especially
where the implementation imposes additional limitations. Therefore,
you’ll find short “implementation notes” sprinkled throughout the text.

Every Python implementation comes with a number of built-in and


standard modules. These are documented in The Python Standard
Library. A few built-in modules are mentioned when they interact in a
significant way with the language definition.

#Sample program:
Hello world:-

# This program prints Hello, world!

print('Hello, world!')

O/P:-

Hello, world!
To Install Packages:-
python -m pip install SomePackage

py -3.4 -m pip install SomePackage # specifically Python 3.4

CHAPTER 7

IMAGE AI:

Prediction Classes

ImageAI provides very powerful yet easy to use classes to perform


Image Recognition tasks. You can perform all of these state-of-the-art computer vision tasks with Python code ranging from just 5 to 12 lines. Once you have Python, the other dependencies and ImageAI installed on your computer system, there is no limit to the incredible applications you can create. Find below the classes and their respective functions available for you to use. These classes can be integrated into any traditional Python program you are developing, be it a website, a Windows/Linux/macOS application or a system that runs on, or is part of, a Local Area Network.

======= imageai.Prediction.ImagePrediction =======

The ImagePrediction class provides you with the functions to use state-of-the-art image recognition models like SqueezeNet, ResNet, InceptionV3 and DenseNet that were pre-trained on the ImageNet-1000 dataset. This means you can use this class to predict/recognize 1000 different objects in any image or number of images. To initiate the class in your code, you will create a new instance of the class as seen below:

from imageai.Prediction import ImagePrediction

prediction = ImagePrediction()

We have provided pre-trained SqueezeNet, ResNet, InceptionV3 and DenseNet image recognition models which you can use with your ImagePrediction class to recognize images. Find below the links to download the pre-trained models. You can download the model you want to use.

Download SqueezeNet Model

Download ResNet Model

Download InceptionV3 Model


Download DenseNet Model

After creating a new instance of the ImagePrediction class, you can use
the functions below to set your instance property and start recognizing
objects in images.

.setModelTypeAsSqueezeNet() , This function sets the model type of


the image recognition instance you created to the SqueezeNet model,
which means you will be performing your image prediction tasks using
the pre-trained “SqueezeNet” model you downloaded from the links
above.

.setModelTypeAsResNet() , This function sets the model type of the


image recognition instance you created to the ResNet model, which
means you will be performing your image prediction tasks using the
pre-trained “ResNet” model you downloaded from the links above.

.setModelTypeAsInceptionV3() , This function sets the model type of


the image recognition instance you created to the InceptionV3 model,
which means you will be performing your image prediction tasks using
the pre-trained “InceptionV3” model you downloaded from the links
above.

.setModelTypeAsDenseNet() , This function sets the model type of the image recognition instance you created to the DenseNet model, which means you will be performing your image prediction tasks using the pre-trained “DenseNet” model you downloaded from the links above.
.setModelPath() , This function accepts a string which must be the path to the model file you downloaded and must correspond to the model type you set for your image prediction instance. Find example code below:

prediction.setModelPath("resnet50_weights_tf_dim_ordering_tf_kernels.h5")

– parameter model_path (required) : This is the path to your downloaded model file.

.loadModel() , This function loads the model from the path you specified in the function call above into your image prediction instance. Find example code below:

prediction.loadModel()

– parameter prediction_speed (optional) : This parameter allows you to reduce the time it takes to predict an image by up to 80%, which leads to a slight reduction in accuracy. This parameter accepts string values. The available values are “normal”, “fast”, “faster” and “fastest”. The default value is “normal”.

.predictImage() , This is the function that performs the actual prediction of an image. It can be called many times on many images once the model has been loaded into your prediction instance. Find example code, parameters of the function and returned values below:

predictions, probabilities = prediction.predictImage("image1.jpg", result_count=10)

– parameter image_input (required) : This refers to the path to your image file, a Numpy array of your image, or an image file stream, depending on the input type you specified.

– parameter result_count (optional) : This refers to the number of possible predictions that should be returned. The parameter is set to 5 by default.

– parameter input_type (optional) : This refers to the type of input you are parsing into the image_input parameter. It is “file” by default and it accepts “array” and “stream” as well.

– returns prediction_results (a Python list) : The first value returned by the predictImage function is a list that contains all the possible prediction results. The results are arranged in descending order of percentage probability.

– returns prediction_probabilities (a Python list) : The second value returned by the predictImage function is a list that contains the corresponding percentage probability of each of the possible predictions in the prediction_results.
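
ImageAI also supports predicting several images in one call through the predictMultipleImages() function, shown next. The sample below assumes a loaded prediction instance named multiple_prediction and a list all_images_array of image paths; a minimal setup sketch, following the single-image pattern above with the same assumed ResNet weights file, is:

from imageai.Prediction import ImagePrediction
import os

execution_path = os.getcwd()

multiple_prediction = ImagePrediction()
multiple_prediction.setModelTypeAsResNet()
multiple_prediction.setModelPath(os.path.join(execution_path, "resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
multiple_prediction.loadModel()

# Collect every .jpg/.png in the working directory as inputs
all_images_array = []
for each_file in os.listdir(execution_path):
    if each_file.endswith(".jpg") or each_file.endswith(".png"):
        all_images_array.append(each_file)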

.predictMultipleImages() , This function can be used to perform prediction on two or more images at once. Find example code below:

results_array = multiple_prediction.predictMultipleImages(all_images_array, result_count_per_image=5)

for each_result in results_array:
    predictions, percentage_probabilities = each_result["predictions"], each_result["percentage_probabilities"]
    for index in range(len(predictions)):
        print(predictions[index], " : ", percentage_probabilities[index])
    print("-----------------------")
– parameter sent_images_array (required) : This refers to a list that contains the paths to your image files, Numpy arrays of your images, or image file streams, depending on the input type you specified.

– parameter result_count_per_image (optional) : This refers to the number of possible predictions that should be returned for each of the images. The parameter is set to 2 by default.

– parameter input_type (optional) : This refers to the format of the images in the list you parsed into the sent_images_array parameter. It is “file” by default and it accepts “array” and “stream” as well.

– returns output_array (a Python list) : The value returned by the predictMultipleImages function is a list that contains dictionaries. Each dictionary corresponds to an image in the array you parsed into sent_images_array. Each dictionary has a “prediction_results” property, which is a list of the prediction results for the image at that index, as well as a “prediction_probabilities” property, which is a list of the corresponding percentage probability for each result.
#Sample Code For Predicting One Image:

from imageai.Prediction import ImagePrediction
import os

execution_path = os.getcwd()

prediction = ImagePrediction()
prediction.setModelTypeAsResNet()
prediction.setModelPath(os.path.join(execution_path, "resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
prediction.loadModel()

predictions, probabilities = prediction.predictImage(os.path.join(execution_path, "image1.jpg"), result_count=10)
for eachPrediction, eachProbability in zip(predictions, probabilities):
    print(eachPrediction, " : ", eachProbability)


DETECTION CLASSES:

ImageAI provides very powerful yet easy to use classes and functions to perform Image Object Detection and Extraction.

ImageAI allows you to perform all of these with state-of-the-art deep learning algorithms like RetinaNet, YOLOv3 and TinyYOLOv3. With ImageAI you can run detection tasks and analyse images.

Find below the classes and their respective functions available for you to use. These classes can be integrated into any traditional Python program you are developing, be it a website, a Windows/Linux/macOS application or a system that runs on, or is part of, a Local Area Network.

======= imageai.Detection.ObjectDetection =======

This ObjectDetection class provides you with functions to perform object detection on any image or set of images, using pre-trained models that were trained on the COCO dataset. The models supported are RetinaNet, YOLOv3 and TinyYOLOv3. This means you can detect and recognize 80 different kinds of common everyday objects. To get started, download any of the pre-trained models that you want to use via the links below.

Download RetinaNet Model - resnet50_coco_best_v2.0.1.h5

Download YOLOv3 Model - yolo.h5

Download TinyYOLOv3 Model - yolo-tiny.h5

Once you have downloaded the model of your choice, you should
create a new instance of the ObjectDetection class as seen in the
sample below:

from imageai.Detection import ObjectDetection

detector = ObjectDetection()
Once you have created an instance of the class, you can use the
functions below to set your instance property and start detecting
objects in images.

.setModelTypeAsRetinaNet() , This function sets the model type of the


object detection instance you created to the RetinaNet model, which
means you will be performing your object detection tasks using the pre-
trained “RetinaNet” model you downloaded from the links above. Find
example code below:
detector.setModelTypeAsRetinaNet()

.setModelTypeAsYOLOv3() , This function sets the model type of the


object detection instance you created to the YOLOv3 model, which
means you will be performing your object detection tasks using the pre-
trained “YOLOv3” model you downloaded from the links above. Find
example code below:

detector.setModelTypeAsYOLOv3()

.setModelTypeAsTinyYOLOv3() , This function sets the model type of


the object detection instance you created to the TinyYOLOv3 model,
which means you will be performing your object detection tasks using
the pre-trained “TinyYOLOv3” model you downloaded from the links
above. Find example code below:

detector.setModelTypeAsTinyYOLOv3()
.setModelPath() , This function accepts a string which must be the path to the model file you downloaded and must correspond to the model type you set for your object detection instance. Find example code and parameters of the function below:

detector.setModelPath("yolo.h5")

– parameter model_path (required) : This is the path to your


downloaded model file.

.loadModel() , This function loads the model from the path you
specified in the function call above into your object detection instance.
Find example code below:

detector.loadModel()

– parameter detection_speed (optional) : This parameter allows you to reduce the time it takes to detect objects in an image by up to 80%, which leads to a slight reduction in accuracy. This parameter accepts string values. The available values are “normal”, “fast”, “faster”, “fastest” and “flash”. The default value is “normal”.

.detectObjectsFromImage() , This is the function that performs object


detection task after the model has been loaded. It can be called many times to
detect objects in any number of images. Find example code below:
detections = detector.detectObjectsFromImage(
    input_image="image.jpg",
    output_image_path="imagenew.jpg",
    minimum_percentage_probability=30)

– parameter input_image (required) : This refers to the path to the image file in which you want to detect objects. You can set this parameter to a Numpy array or a file stream of any image if you set the parameter input_type to “array” or “stream”.

—parameter output_image_path (required only if input_type = “file”) : This refers to the file path to which the detected image will be saved. It is required only if input_type = “file”.

– parameter minimum_percentage_probability (optional) : This parameter is used to determine the integrity of the detection results. Lowering the value shows more objects, while increasing the value ensures that only objects detected with the highest accuracy are reported. The default value is 50.

—parameter output_type (optional) : This parameter is used to set the format in which the detected image will be produced. The available values are “file” and “array”. The default value is “file”. If this parameter is set to “array”, the function will return a Numpy array of the detected image. See the sample below:

returned_image, detections = detector.detectObjectsFromImage(
    input_image="image.jpg",
    output_type="array",
    minimum_percentage_probability=30)

—parameter display_percentage_probability (optional) : This parameter can be used to hide the percentage probability of each object detected in the detected image if set to False. The default value is True.

– parameter display_object_name (optional) : This parameter can be used to hide the name of each object detected in the detected image if set to False. The default value is True.

—parameter extract_detected_objects (optional) : This parameter can be used to extract and save/return each object detected in an image as a separate image. The default value is False.

– parameter thread_safe (optional) : This ensures the loaded detection model works across all threads if set to True.

—returns : The returned values will depend on the parameters parsed into the detectObjectsFromImage() function. See the comments and code below:

"""
If all required parameters are set and 'output_image_path' is set to the
file path where you want the detected image to be saved, the function
will return:

1. an array of dictionaries, with each dictionary corresponding to an
   object detected in the image. Each dictionary contains the following
   properties:
    name (string)
    percentage_probability (float)
    box_points (list of x1, y1, x2 and y2 coordinates)
"""
detections = detector.detectObjectsFromImage(
    input_image="image.jpg",
    output_image_path="imagenew.jpg",
    minimum_percentage_probability=30)

"""
If all required parameters are set and output_type = 'array', the
function will return:

1. a numpy array of the detected image
2. an array of dictionaries, with each dictionary corresponding to an
   object detected in the image, with the same properties as above
"""
returned_image, detections = detector.detectObjectsFromImage(
    input_image="image.jpg",
    output_type="array",
    minimum_percentage_probability=30)

"""
If extract_detected_objects = True and 'output_image_path' is set to the
file path where you want the detected image to be saved, the function
will return:

1. an array of dictionaries as above
2. an array of string paths to the image of each object extracted from
   the image
"""
detections, extracted_objects = detector.detectObjectsFromImage(
    input_image="image.jpg",
    output_image_path="imagenew.jpg",
    extract_detected_objects=True,
    minimum_percentage_probability=30)

"""
If extract_detected_objects = True and output_type = 'array', the
function will return:

1. a numpy array of the detected image
2. an array of dictionaries as above
3. an array of numpy arrays of each object detected in the image
"""
returned_image, detections, extracted_objects = detector.detectObjectsFromImage(
    input_image="image.jpg",
    output_type="array",
    extract_detected_objects=True,
    minimum_percentage_probability=30)

.CustomObjects() , This function is used when you want to detect only a selected number of objects. It returns a dictionary of objects and their True or False values. To detect selected objects in an image, you will have to use the dictionary returned by this function with the detectCustomObjectsFromImage() function.

.detectCustomObjectsFromImage() , This function has all the parameters and returns all the values that the detectObjectsFromImage() function does, with a slight difference. This function lets you detect only selected objects in an image. Unlike the normal detectObjectsFromImage() function, it needs an extra parameter, “custom_objects”, which accepts the dictionary returned by the CustomObjects() function. In the sample below, we set the detection function to report only detections of persons and dogs:

custom = detector.CustomObjects(person=True, dog=True)

detections = detector.detectCustomObjectsFromImage(
    custom_objects=custom,
    input_image=os.path.join(execution_path, "image3.jpg"),
    output_image_path=os.path.join(execution_path, "image3new-custom.jpg"),
    minimum_percentage_probability=30)

Sample Image Object Detection code

Find below a code sample for detecting objects in an image:

from imageai.Detection import ObjectDetection
import os

execution_path = os.getcwd()

detector = ObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath(os.path.join(execution_path, "yolo.h5"))
detector.loadModel()

detections = detector.detectObjectsFromImage(
    input_image=os.path.join(execution_path, "image.jpg"),
    output_image_path=os.path.join(execution_path, "imagenew.jpg"),
    minimum_percentage_probability=30)

for eachObject in detections:
    print(eachObject["name"], " : ", eachObject["percentage_probability"], " : ", eachObject["box_points"])
    print("--------------------------------")
CHAPTER 8

THE SEVEN STEPS OF MACHINE LEARNING:

From the detection of skin cancer to detecting escalators in need of


repairs, or to its evolution today, machine learning has granted
computer systems new abilities that we could have never thought of.
But what is machine learning and how does it really work under the
hood? What actually are the steps to machine Learning? Machine
learning is a field of computer science that gives computers the ability
to learn without being programmed explicitly. The power of machine
learning is that you can determine how to differentiate using models,
rather than using human judgment. The basic steps that lead to
machine learning and will teach you how it works are described below
in a big picture:

1. Gathering data
2. Preparing that data
3. Choosing a model
4. Training
5. Evaluation
6. Hyper parameter tuning
7. Prediction.
1. Gathering Data: Once you know exactly what you want and the equipment is in hand, you come to the first real step of machine learning: gathering data. This step is very crucial, as the quality and quantity of data gathered will directly determine how good the predictive model will turn out to be. The data collected is then tabulated and called the training data.

2. Data Preparation: After the training data is gathered, you move on to the next step of machine learning: data preparation, where the data is loaded into a suitable place and then prepared for use in machine learning training. Here, the data is first put all together and then the order is randomized, as the order of data should not affect what is learned. This is also a good time to do any visualizations of the data, as that will help you see whether there are any relevant relationships between the different variables that you can take advantage of, as well as show you if there are any data imbalances present. Also, the data now has to be split into two parts. The first part, used for training the model, will be the majority of the dataset; the second will be used for the evaluation of the trained model's performance, as in the sketch below. Other forms of adjustment and manipulation, like normalization and error correction, also take place at this step.
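
A minimal sketch of this split, assuming scikit-learn (which this report does not otherwise use) and toy data:

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # 100 samples, 4 features (toy data)
y = rng.integers(0, 2, size=100)     # binary labels

# Shuffle, then keep 80% for training and 20% for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)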

3. Choosing a Model: The next step in the workflow is choosing a model from among the many that researchers and data scientists have created over the years. Choose the right one to get the job done.

4. Training: After the previous steps are completed, you then move on to what is often considered the bulk of machine learning, called training, where the data is used to incrementally improve the model's ability to predict. The training process involves initializing some random values, say A and B, for our model, predicting the output with those values, comparing it with the known output, and then adjusting the values so that the next predictions are more accurate. This process then repeats, and each cycle of updating is called one training step, as in the sketch below.
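
A minimal sketch of such a training loop, fitting a line y = A*x + B to toy data by gradient descent (all names and values are illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)  # true A = 3.0, B = 0.5

A, B = rng.normal(), rng.normal()     # initialize with random values
learning_rate = 0.1

for step in range(1000):              # each cycle is one training step
    y_pred = A * x + B                # predict with the current values
    error = y_pred - y                # compare with the known outputs
    A -= learning_rate * 2 * np.mean(error * x)   # gradient of MSE w.r.t. A
    B -= learning_rate * 2 * np.mean(error)       # gradient of MSE w.r.t. B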

5. Evaluation: Once training is complete, you now check whether the model is good enough using this step. This is where the dataset you set aside earlier comes into play. Evaluation allows testing of the model against data that has never been seen or used for training, and is meant to be representative of how the model might perform in the real world.

6. Hyperparameter Tuning: Once the evaluation is over, any further improvement in your training may be possible by tuning the parameters. A few parameters were implicitly assumed when the training was done. One such parameter is the learning rate, which defines how far the line is shifted during each step, based on the information from the previous training step. These values all play a role in the accuracy of the training model and in how long the training will take. For models that are more complex, initial conditions play a significant role in determining the outcome of training. Differences can be seen depending on whether a model starts off training with values initialized to zeroes versus some distribution of values, which then leads to the question of which distribution to use. Since there are many considerations at this phase of training, it's important that you define what makes a model good. These parameters are referred to as hyperparameters. The adjustment or tuning of these parameters depends on the dataset, model, and training process. Once you are done with these parameters and are satisfied, you can move on to the last step.

7. Prediction: Machine learning is basically using data to answer questions. So this is the final step, where you get to answer some questions. This is the point where the value of machine learning is realized. Here you can finally use your model to predict the outcome you want. The above-mentioned steps take you from creating a model to predicting its output, and thus act as a learning path.
CHAPTER 9

Disadvantages of Machine Learning:

Similar to the advantages of machine learning, we should also know the disadvantages of machine learning. If you don't know the cons, you won't know the risks of ML. So, let's have a look at these disadvantages:

1. Possibility of High Error: In ML, we choose algorithms based on accurate results. For that, we have to run the results through every algorithm. The main problem occurs in the training and testing of data. The data is huge, so sometimes removing errors becomes nearly impossible. These errors can cause a headache for users. Since the data is huge, the errors take a lot of time to resolve.

2. Algorithm Selection: The selection of an algorithm in machine learning is still a manual job. We have to run and test our data with all the algorithms. Only after that can we decide which algorithm we want. We choose on the basis of result accuracy. The process is very time-consuming.

3. Data Acquisition: In ML, we constantly work on data. We take a huge amount of data for training and testing. This process can sometimes cause data inconsistency. The reason is that some data constantly keeps updating, so we have to wait for the new data to arrive. If not, the old and new data might give different results. That is not a good sign for an algorithm.

4. Time and Space: Many ML algorithms might take more time than you think. Even if it's the best algorithm, it might sometimes surprise you. If your data is large and complex, the system will take time. This may sometimes consume more CPU power. Even with GPUs alongside, it can become hectic. Also, the data might use more than the allotted space.

CHAPTER 10
TYPES OF CLASSIFICATION MODELS:
7 Types of Classification Algorithms:
1. Logistic Regression
2. Naïve Bayes
3. Stochastic Gradient Descent
4. K-Nearest Neighbours
5. Decision Tree
6. Random Forest
7. Support Vector Machine
Logistic Regression
Definition: Logistic regression is a machine learning algorithm for classification. In this algorithm, the probabilities describing the possible outcomes of a single trial are modelled using a logistic function. Advantages: Logistic regression is designed for this purpose (classification), and is most useful for understanding the influence of several independent variables on a single outcome variable. Disadvantages: Works only when the predicted variable is binary, assumes all predictors are independent of each other, and assumes the data is free of missing values.
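
A minimal usage sketch with scikit-learn (an assumed library choice), using a built-in binary dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)          # binary outcome variable
clf = LogisticRegression(max_iter=5000).fit(X, y)   # fit the logistic model
print(clf.predict(X[:2]))                           # predicted classes
print(clf.predict_proba(X[:2]))                     # per-class probabilities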

Naïve Bayes
Definition: The Naive Bayes algorithm is based on Bayes’ theorem with the assumption of independence between every pair of features. Naive Bayes classifiers work well in many real-world situations, such as document classification and spam filtering. Advantages: This algorithm requires a small amount of training data to estimate the necessary parameters. Naive Bayes classifiers are extremely fast compared to more sophisticated methods. Disadvantages: Naive Bayes is known to be a bad estimator.

Stochastic Gradient Descent
Definition:


Stochastic gradient descent is a simple and very efficient approach to
fit linear models. It is particularly useful when the number of samples is
very large. It supports different loss functions and penalties for
classification. Advantages: Efficiency and ease of implementation.
Disadvantages: Requires a number of hyper-parameters and it is
sensitive to feature scaling.

K-Nearest Neighbours
Definition:


Neighbours-based classification is a type of lazy learning, as it does not
attempt to construct a general internal model, but simply stores
instances of the training data. Classification is computed from a simple
majority vote of the k nearest neighbours of each point. Advantages:
This algorithm is simple to implement, robust to noisy training data,
and effective if training data is large. Disadvantages: Need to determine
the value of K and the computation cost is high as it needs to compute
the distance of each instance to all the training samples.
Decision Tree Definition:
Given data of attributes together with their classes, a decision tree produces a sequence of rules that can be used to classify the data.
Advantages: Decision Tree is simple to understand and visualise,
requires little data preparation, and can handle both numerical and
categorical data.
Random Forest Definition:
Random forest classifier is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy of the model and control over-fitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement. Advantages: Reduction in
over-fitting and random forest classifier is more accurate than decision
trees in most cases. Disadvantages: Slow real time prediction, difficult
to implement, and complex algorithm.
Support Vector Machine Definition:
Support vector machine represents the training data as points in space separated into categories by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall. Advantages: Effective in high dimensional spaces, and uses a subset of training points in the decision function, so it is also memory efficient. Disadvantages: The algorithm does not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
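
A minimal usage sketch with scikit-learn (an assumed library choice); probability=True enables the cross-validated probability estimates mentioned above, at extra training cost:

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf", probability=True).fit(X, y)  # SVM with RBF kernel
print(clf.predict(X[:3]))                            # predicted classes
print(clf.predict_proba(X[:3]))                      # cross-validated probabilities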
CHAPTER 11

11 MULTICLASS CLASSIFICATION:

Multiclass classification is a classification task with more than two classes (e.g. using a model to identify animal types in images from an encyclopedia). In multiclass classification, a sample can have only one class (i.e. an elephant is only an elephant; it is not also a lemur). Outside of regression, multiclass classification is probably the most common machine learning task. In classification, we are presented with a number of training examples divided into K separate classes, and we build a machine learning model to predict which of those classes previously unseen data belongs to (e.g. the animal types from the example above). In seeing the training data, the model learns patterns specific to each class and uses those patterns to predict the membership of future data.
How classifier machine learning works:
Hundreds of models exist for classification. In fact, it's often possible to take a model that works for regression and make it into a classification model. This is basically how logistic regression works. We model a linear response WX + b to an input and turn it into a probability value between 0 and 1 by feeding that response into a sigmoid function. We then predict that an input belongs to class 1 if the model outputs a probability greater than 0.5, and to class 0 otherwise. Another common model for classification is the support vector machine (SVM). An SVM works by projecting the data into a higher dimensional space and separating it into different classes by using a single hyperplane (or set of hyperplanes). A single SVM does binary classification and can differentiate between two classes. In order to differentiate between K classes, one can use (K – 1) SVMs, each predicting membership in one of the K classes.

Naive Bayes in ML classifiers:
Within the realm of natural language processing and text classification, the Naive Bayes model is quite popular. Its popularity in large part arises from how simple it is and how quickly it trains. In the Naive Bayes classifier, we use Bayes' Theorem to break down the joint probability of membership in a class into a series of conditional probabilities. The model makes the naive assumption (hence Naive Bayes) that all the input features to the model are mutually independent. While this isn't true, it's often a good enough approximation to get the results we want. The probability of class membership then breaks down into a product of probabilities, and we just classify an input X as class k if k maximizes this product.
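
A minimal numpy sketch of the logistic-regression mechanics described above; the weights are illustrative, not trained:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([0.8, -0.4])        # illustrative weight vector
b = 0.1                          # illustrative bias
X = np.array([1.5, 2.0])         # one input sample

p = sigmoid(W @ X + b)           # linear response squashed into (0, 1)
predicted_class = 1 if p > 0.5 else 0
print(p, predicted_class)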

CHAPTER 12

CONCLUSION

In this report, we conclude that our multiclass classifier model has high accuracy and quick prediction time, so it can be used in any kind of application or field of research, where it can help simplify decision making. Accurate models of this kind can be used efficiently in the medical field for cancer recognition, heart disease recognition, etc. In the space field, we can use this model in surveillance for new planets, stars, asteroids, etc. So this model can be used in any such field; all we have to do before deploying the model is train it with a large amount of featured data, so that the model retains the same high accuracy and quick prediction.

CHAPTER 13
APPENDIX

from imageai.Prediction import ImagePrediction
import os

execution_path = os.getcwd()

prediction = ImagePrediction()
prediction.setModelTypeAsResNet()
prediction.setModelPath(os.path.join(execution_path, "resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
prediction.loadModel()

predictions, probabilities = prediction.predictImage(os.path.join(execution_path, "image1.jpg"), result_count=10)
for eachPrediction, eachProbability in zip(predictions, probabilities):
    print(eachPrediction, " : ", eachProbability)

Sample Result

convertible : 52.459555864334106
sports_car : 37.61284649372101
pickup : 3.1751200556755066
car_wheel : 1.817505806684494
minivan : 1.7487050965428352
kite : 10.20539253950119
white_stork : 1.6472270712256432
CHAPTER 14

REFERENCE

1 ImageAI https://github.com/OlafenwaMoses/ImageAI
2 Python https://www.python.org/
