AI Complete Notes For B.SC AI & ML Osmania University

The document outlines the AI project cycle, which includes phases such as designing, development, testing, and deployment, emphasizing that AI projects are iterative and cyclic. It discusses problem scoping, goal setting, stakeholder identification, and ethical concerns, while also highlighting the role of AI in achieving Sustainable Development Goals. Additionally, it covers data acquisition, types of data, modeling approaches, evaluation, and deployment of AI models.

CLASS 10 - Artificial Intelligence

AI PROJECT CYCLE

Introduction:

AI shares a close association with Information Technology (IT); therefore, the development phases of an AI project and an IT project have similar basic structures. However, AI systems also have unique components that do not apply to other IT projects.

IT Project Cycle:
Every IT project has four phases:

1. Designing Phase

2. Development Phase

3. Testing Phase

4. Deployment Phase

AI projects also go through all these phases.

AI Project Cycle:
In IT projects, the deployment phase is the last phase of a project; any change after deployment is considered a separate project. AI projects, however, do not end at the deployment phase. Instead, from this phase onwards the project becomes cyclic:

Test, Deploy -> System learns -> Test, Modify -> System learns -> Test, Modify -> ...

An AI project cycle has various phases and sub-phases.


PROBLEM SCOPING:
1.1 UNDERSTANDING PROBLEM SCOPING AND SUSTAINABLE DEVELOPMENT GOALS:

AI supported solutions can be used in various situations for assisting public and private sector
organisations to perform more efficiently and for enhancing the services provided to both customers as well
as the general public.

The problem statements may begin with negative or positive terms.

For example:

Positive term: We want to increase the computer skills of the learners

Negative term : Our learners are facing difficulties with computer skills.

The following 5 questions can prove helpful in the identification of the problems:

1. How can better decisions be made?

AI systems help in areas where the organization needs more data-driven decision making. The employees of an organization can use their instincts and insights to decide whether more data-driven decision making would be helpful or not.

2. In which areas are we most inefficient?

AI systems can be abundantly helpful in making businesses more efficient. The organization has to
identify the fields in which they are inefficient. Using AI solutions in such fields can help the
organizations achieve more efficiency.

3. Where can the relevant data be found?

The organisations have to identify the fields where they have most of the data. They can see if there
are some fields where AI can help them in becoming better.
4. What goals do we want to attain?

The organisations can also clearly decide the goals that they want to achieve as an organization. They can also visualize the outcomes that the AI-augmented systems can produce.

5. Will this solve our problem?

Lastly, the organisations need to examine whether the identified area or problem can actually be solved using AI. There is a possibility that the relevant AI systems might not help solve the issue.

Problem Identification depends upon:

1. Determining if the AI system will create any real value for the organization or not.

2. Determining if the selected AI system will be feasible on the parameters of data, money, workforce,
time, etc.,

PROBLEM SCOPING:

Problem Scoping is the stage that begins after the problem has been identified in an AI project. The AI process begins with problem definition, followed by brainstorming, designing, building and testing (repeated as necessary), and concludes with showcasing or sharing the result.

STEPS OF PROBLEM SCOPING:

GOAL SETTING:

While setting goals, we first need to determine the reasons for the problem. For example, consider the struggle of addressing customer grievances in time.

There can be various reasons for this problem. For example:


1. Less amount of information with the customer care representatives.
2. Overstrained customer care executives.
3. Absence of proper grievance handling procedures.
4. Challenging established procedures.
5. Suspension of communications between the customer care representatives and concerned
departments.

Identifying the stakeholders


After identifying the goals, we have to identify the different stakeholders who will be involved in achieving
those goals.
Let us take the example of struggle in addressing customer grievances in which the stakeholders can be
1. Customer care executives
2. HR department
3. Customers
4. Grievance department

Identifying the existing measures:


In the process of problem scoping, it is essential to know the current actions that are being taken by
different stakeholders to find a solution to the problem. This can help in two ways:
1. First, it can help us understand the efforts made and judge their effectiveness. We can avoid
undertaking actions that have proved to be futile.
2. Second, it will avoid the duplication of work. We can identify the actions that have proved effective
and build on them.

Identifying ethical concerns:

The last step in the problem scoping process is to identify ethical concerns associated with the selected
problem and goal. These ethical concerns can be related to the groups of people, privacy concerns,
environmental concerns etc.

Role of AI in Sustainable Development

The development or progress of a society that meets the needs of the present and ensures that the
future generations are able to meet these needs as well is called ‘Sustainable Development’

The Sustainable Development Goals (SDGs) are a set of 17 goals. The process of framing them began at the United Nations Conference on Sustainable Development in Rio de Janeiro in 2012, and they were adopted by the United Nations in 2015 to meet the urgent challenges of the world, covering the following three dimensions:
1. Environmental
2. Social
3. Economic
There are a total of 17 Sustainable Development Goals. They are:

1. Zero Hunger
Food supply is aided by AI throughout the supply network, from manufacturing to transportation and
distribution.

2. No Poverty
Artificial intelligence has the potential to help reduce the number of individuals driven into poverty as a result
of natural disasters.

3. Good Health and Well-Being


Doctors and radiologists are collaborating to identify ways to use AI to detect brain cancers.

4. Quality Education
AI can assist in personalization, allowing young girls and boys to learn more effectively.

5. Climate action
AI may be used to improve electricity demand forecasts and associated predictions from sources such as
sunlight and wind.

6. Industry Innovation and Infrastructure


The industry can benefit from AI by applying it to develop reality modeling applications.

7. Gender Equality
AI can tell you how many of your applicants are men and how many are women.

8. Clean Water and Sanitation


By measuring, forecasting, and modifying water efficiency, AI can ensure that more people have access to
clean water.

9. Affordable and clean energy


AI can improve energy output by predicting and adapting to changing conditions and demand.

10. Sustainable cities and communities


AI can help develop electrified and even driverless vehicles, as well as smarter infrastructure planning,
resulting in large and necessary reductions in air pollution.

11. Decent work and economic growth


AI can make working life safer, for example through predictive maintenance of systems, plants and bridges.

12. Reduce Inequalities


AI can be used to identify discrepancies in legal practices and rules so that new, equal foundations can be
built.

13. Responsible consumption and production


AI can accomplish more with less, reducing inefficiencies in production, improving quality, and optimizing
logistics.

14. Life on land


Desertification can be noticed more easily with AI.
15. Life below water
AI can act as a lifeguard by monitoring the state of marine resources and assisting in the prevention and
reduction of pollution in our oceans.

16. Partnerships for the goals


The Global Partnership on Artificial Intelligence is a multi-stakeholder project aimed at closing the gap
between AI theory and reality by funding cutting-edge research and applied initiatives on AI-related
objectives.

17. Peace, Justice and strong institutions


AI helps to provide equal access to justice for everyone and to defend everyone's fundamental rights.

4 Ws PROBLEM CANVAS

Fundamentally , the 4Ws Problem canvas includes the following four questions.

Who Canvas - Who is having the Problem?

What Canvas – What is the Nature of the Problem?

Where Canvas – Where does the Problem Arise?

Why ? – Why do you think it is a problem worth solving?

The information filled in the 4Ws Problem Canvas can be summarised in a specific format so that a template is created which can be shared with all the people involved in the project, allowing them to revisit the basics of the problem whenever needed. The template reads as follows:

The Stakeholders (the people concerned with the problem) – Who?
have a problem of (the issue / problem) – What?
when/while (the context / situation) – Where?
An ideal solution would (how the solution will help the stakeholders) – Why?
DATA ACQUISITION:
‘Data’ refers to the collection of facts and statistics used for reference or analysis.
For example, in a cricket match, if we are to predict the performance of a player before the match, we must procure his past data, which might include how many runs he has scored, which conditions are favourable to him, how he bats against fast bowlers, etc.

Now, once we have acquired the data, our AI model will be able to predict the player's performance efficiently. The data procured in advance is used for training our AI model and is known as the Training Data. The data collected while the player is currently performing is called Testing Data, as this data set is used to test the accuracy of our trained AI model.

Types of Data:

At the basic level, data can be classified into two broad categories:
i. Numerical Data
Numerical data involves the use of numbers. This is the data on which we can perform mathematical
operations such as addition, subtraction, multiplication etc.,
It is divided into (i) discrete numerical data, which contains only whole numbers (for example, the number of students in a class), and (ii) continuous numerical data, which can take any value within a range (for example, the weight of the books in your bag).
ii. Textual Data

It is made up of characters and numbers but we cannot perform calculations on these numbers. For
Example, Grapes, Mickey etc.,

STRUCTURAL CLASSIFICATION:

Classification of data can also be done on the basic structure. Based on structures data can be classified into
three types:
1. Structured Data : This type of data has a predefined data model and it is organized in a predefined
manner. Examples: Train Schedule, Mark sheets of students etc.,

2. Unstructured Data : This type of data does not have any predefined structure. It can take any form.
Examples: Videos, Presentations, Email etc.,

3. Semi Structured Data :This type of data has a structure that has the qualities of both structured as well
as unstructured data. This data is not organized logically.

OTHER DATA CLASSIFICATION:

Time Stamped Data: This type of data has time order in it which defines its sequence. The time order can be
according to some event time i.e when the data was collected or processed.

Machine Data: Systems or programs mechanically generate this type of data. A list of phone calls made, the logging details on a computer, data in emails, etc., are some examples of machine data.

Spatiotemporal Data: This type of data has both location and time information i.e the time when an event
was captured or collected along with the location where the event capture or collection took place.

Open Data: This type of data is freely available for everyone to use. Open data is not restricted through
copyrights , patents , controls, etc.,

Real Time Data: This type of data is available as soon as an event takes place.

Data Exploration: Data exploration is the first step of data analysis which is used to visualize data to uncover
insights from the start or identify areas or patterns to dive into and dig more. It allows for a deeper, more
detailed, and better understanding of the data. Data Visualization is a part of this where we visualize and
present the data in terms of tables, pie charts, bar graphs, line graphs, bubble chart, choropleth map etc.
Modelling: First of all, what is an AI model? An AI model is a program or algorithm that utilizes a set of data to recognize certain patterns. This allows it to reach a conclusion or make a prediction when provided with sufficient information. Now, what is modelling? Modelling is the process in which different models based on the visualized data are created and checked for their advantages and disadvantages.

There are 2 Approaches to make a Machine Learning Model.


Learning Based Approach: In a learning-based approach, the machine learns from its experience with the data fed to it.

● Supervised Learning: The data that you have collected here is labelled, so you know what input needs to be mapped to what output. This helps you correct your algorithm if it makes a mistake in giving you the answer. Supervised learning is used, for example, in classifying mail as spam (a small sketch follows this list).

● Unsupervised Learning: The data collected here has no labels and you are unsure about the outputs.
So, you model your algorithm such that it can understand patterns from the data and output the
required answer. You do not interfere when the algorithm learns.

● Reinforcement Learning: There is no data in this kind of learning. You model the algorithm such that it interacts with the environment, and if the algorithm does a decent job, you reward it; otherwise you penalise the algorithm (reward or penalty policy). With continuous interactions and learning, it goes from being bad to being the best that it can be for the problem assigned to it.
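As a rough illustration of the supervised learning approach described above, here is a minimal sketch assuming the scikit-learn library is installed; the features (number of links and number of spam-like words per mail) and the labels are made-up values used purely for illustration.

# a minimal supervised learning sketch (illustrative, assumed data)
from sklearn.tree import DecisionTreeClassifier

# each mail is described by two features: [number of links, number of spam-like words]
features = [[8, 10], [7, 6], [0, 1], [1, 0], [6, 9], [0, 2]]
labels = ["spam", "spam", "not spam", "not spam", "spam", "not spam"]

model = DecisionTreeClassifier()
model.fit(features, labels)             # learn from the labelled data
print(model.predict([[5, 7], [1, 1]]))  # predict labels for new, unseen mails

Because the data is labelled, the model can be corrected whenever its prediction disagrees with the known answer.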

Rule Based Approach: A rule-based system uses rules as the knowledge representation.

These rules are coded into the system in the form of if-then-else statements.

The main idea of a rule-based system is to capture the knowledge of a human expert in a specialized domain
and embed it within a computer system.

A rule-based system is like a human being born with fixed knowledge.

The knowledge of that human being doesn’t change over time.

This implies that, when this human being encounters a problem for which no rules have been designed, then
this human gets stuck and so won’t be able to solve the problem. In a sense, the human being doesn’t even
understand the problem.
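The following is a minimal sketch of the rule-based idea described above: all of the "knowledge" is hard-coded as if-then-else rules, and the temperature thresholds used here are assumptions chosen purely for illustration.

# a minimal rule-based sketch (thresholds are illustrative assumptions)
def describe_temperature(celsius):
    if celsius < 10:
        return "cold"
    elif celsius <= 25:
        return "pleasant"
    elif celsius <= 40:
        return "hot"
    else:
        return "extreme"

print(describe_temperature(18))   # pleasant
print(describe_temperature(45))   # extreme

Any situation that is not covered by one of the coded rules simply cannot be handled, which is exactly the limitation described above.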

Evaluation: Evaluation is the method of understanding the reliability of an AI model. It is based on the outputs received by feeding test data into the model and comparing those outputs with the actual answers.
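As a minimal sketch of this idea, the accuracy of a model can be computed by comparing its predictions with the actual answers; the two lists below are made-up values used only for illustration.

# evaluation sketch: accuracy = correct predictions / total predictions
predicted = ["spam", "spam", "not spam", "spam", "not spam"]
actual    = ["spam", "not spam", "not spam", "spam", "not spam"]

correct = sum(1 for p, a in zip(predicted, actual) if p == a)
accuracy = correct / len(actual)
print("Accuracy:", accuracy)   # 4 correct out of 5 -> 0.8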

Deployment: Deployment is the method by which you integrate a machine learning model into an existing
production environment to make practical business decisions based on data.

To start using a model for practical decision-making, it needs to be effectively deployed into production.

If you cannot reliably get practical insights from your model, then the impact of the model is severely limited,
and it is useless.

To get the most value out of machine learning models, it is important to seamlessly deploy them into
production so a business can start using them to make practical decisions.
CLASS 10 - Artificial Intelligence

INTRODUCTION TO AI
1. FOUNDATIONAL CONCEPTS:
The term artificial intelligence was first used in 1956 at the Dartmouth Conference by John McCarthy.
The research in AI during the early period of its development was restricted to problem-solving. The focus
was on using symbolic methods for developing the AI systems. The US Department of Defence was one of the
pioneers in the field of AI. Even back then, this department was interested in creating a computer that acted
and behaved like human beings. These human-like robots and this kind of artificial intelligence are still largely restricted to science fiction, though research related to them has been going on for a considerable time now.
In the 1950s, Alan Turing developed a test named the 'Turing Test'. He said that a machine can be regarded as intelligent if it passes the test. The test is defined as: "A computer (software or algorithm) passes the test if a human, after posing questions, cannot tell whether the responses came from another human or not."
WHAT IS INTELLIGENCE?

Intelligence involves various mental abilities like logic, reasoning, problem solving and planning.
According to research, intelligence is broadly based on the following parameters.
Learning from experience: Acquiring, remembering and applying knowledge learned, is an important
component of intelligence.
Identifying Problems : To apply their knowledge, people must be able to identify possible problems and
situations where their knowledge might be helpful.
Problem Solving: People must then be able to use what they have learned and apply it to get innovative
and useful solutions to a problem
Definition of intelligence: Intelligence can be defined as the ability to calculate, reason, perceive relationships and analogies, learn from experience, store and retrieve information from memory, solve problems, understand complex ideas, use natural language fluently, and classify, generalise and adapt to new situations.
Ability to make accurate decisions: Intelligence involves the competence to make complex decisions from a set of factors that would challenge many decision makers. This ability is a measure of an individual's intelligence in general as well as intelligence in the context of AI.
Ability to prove the outcomes: Making an accurate decision does not end a process. Once the decision has been made, it needs to be justified. Being able to show why the decision was taken is also a way of measuring intellect.
Ability to think logically: All things in this world cannot be proved using mechanical formulas. The
competence to think logically and apply common sense to prove and understand things is also a measurement
of intellect.
Ability to learn and improve: The more you learn from the environment, the more you get new
information. Researchers are trying to enhance the intelligence of machines and systems in order to enable
them to make a better decision , give better reasoning, think more logically and learn from the environment.
DECISION MAKING:
Decision making is the process of making choices by identifying a situation, gathering relevant information, and assessing alternative resolutions.
The process of decision making is one of the basic human processes at the core of our interaction with
the others. Decisions can be characterised as structured, semi structured or unstructured.

Problems related to structured decisions have an already known solution and thus do not need decision support. Examples of structured data include names, dates, addresses, geolocation, etc.
Problems related to unstructured decisions do not have any agreed-upon solution. They depend on the understanding and choice of the decision maker. For example, ordering food in a restaurant can be called an unstructured decision.
Problems related to semi-structured decisions, therefore, depend upon decision support. They involve a
combination of interaction with the user and analytical methods to create alternatives based on criteria and
optimal solutions. When we use AI techniques for the development of the alternatives, the consequential
system is referred to as an intelligent decision support system (IDSS).
WHAT IS ARTIFICIAL INTELLIGENCE?
Artificial Intelligence means a human-made interface that can reason and integrate knowledge. However, there is no consensus on what AI entails. It is generally agreed that an AI must show at least some of the behaviours associated with human intelligence, such as planning, learning, reasoning, problem solving, knowledge representation, perception, motion, manipulation and, to a lesser extent, social intelligence and creativity.
Applications of AI around us
● Machine learning: It refers to a computer system's ability to improve its performance as it gets new information, without needing to follow explicit programming instructions. In this type of learning, the system automatically identifies patterns and tries to make estimates.
● Robotics: In robotics, scientists combine artificial perception and automated planning with actuators. This creates moving and talking things that we both fear and appreciate.
● Self Driving Cars: Automated cars also possess the qualities of artificial perception , and predictive
behaviour in case of steering, braking and acceleration. Nevertheless, this is not the limit. They can
recognise the shape and pressure of raindrops on the windshield and activate the wiper instantly.
● Image Recognition: It groups similar patterns together into an informal group, as humans do. Google Photos and tools that read your licence plates are examples of this technique.
● Natural Language Processing: This AI technology attempts to use context to identify a text's genre, sentence structure, grammar, the people mentioned, etc.
● Speech Recognition: It is an AI technology related to natural language processing. Speech recognition also makes use of acoustics and predictive patterns of which sounds usually follow one another in a given language.
● Personalisation: Personalised experiences on websites and apps do not always need AI; predesigned instructions can be handled by a simple automated process. Certain parameters are defined based on a user's location, search history, etc., to create a personalised experience, showing you what you would like to see.
Here are some of the most common examples of AI in vogue today.
● Siri: Siri is a friendly voice activated computer that interacts with us on a day to day basis. It helps in
searching for information, giving directions, preparing event calendar, sending messages and so on.
● Alexa: Its utility and unparalleled ability to understand speech from anywhere in the room has made it
a revolutionary product. It can help us dive into the ocean of information, make purchases, schedule appointments, set alarms and do thousands of other things.
● Tesla: Its predictive capabilities, self-driving features and sheer technological smartness have fetched it many accolades.
● Google Lens: it is an AI supported technology that makes use of the smartphone camera and machine
learning to not only identify an object but also understand what it detects.
● Amazon.com: AI technology is helping Amazon handle enormous amounts of money and transactions online.
● Netflix: It studies billions of records to offer films that one might like based on the user’s previous
reactions and choices of films.
● Pandora: It is one of the most innovative technologies that exist in the world today. They term it their musical DNA. On the basis of around 400 musical characteristics, each song is manually studied by a team of professional musicians.
An AI should not only identify but also process the information it has collected. Sensors in the office can
identify shadows and movements, but we cannot term them devices with artificial intelligence. If the sensors
recognise you as a person freezing, and then turn up the heat, it is an AI based device.
HOW DO MACHINES BECOME ARTIFICIALLY INTELLIGENT?
With the acquisition of experience and knowledge, humans are becoming more and more intelligent.
For example, in primary school, students learn the alphabet and gradually learn to make words with it. As they grow, they become increasingly fluent in the language and keep learning and using new words.
Similarly, machines achieve intelligence when they are trained with information, which helps them perform their tasks. Artificial Intelligence also keeps updating its knowledge to optimise the output.
WHAT IS NOT AI?

Artificial Intelligence is not black magic. It cannot solve everything in the blink of an eye.
As defined earlier, a machine trained with data and capable of making decisions/predictions on its own can be called AI.
A smart refrigerator can perform on its own, but it needs human involvement to choose the
parameters of cooling and the required preparation for it to perform in a correct manner. Such working of
machines is an example of automation. It is not AI.
2. BASICS OF AI:

INTRODUCTION TO AI RELATED TERMINOLOGIES:


ARTIFICIAL INTELLIGENCE:
It is a technology that makes a machine behave like human beings. It makes machines act like humans and make decisions like them. It helps automate machines. Some of the most famous examples of AI include Amazon Alexa and Google Assistant.
MACHINE LEARNING:
Machine learning is a component of AI. It makes use of past data to solve a problem with the support of statistical methods and a trained algorithmic model. It learns automatically and improves with every experience it gains. It learns from labelled or unlabelled input data and makes predictions based on the trained algorithms. One of the best examples of machine learning is an email spam filter.
DEEP LEARNING:

Deep learning is a sub-field of machine learning. Deep learning is a technique that is used to learn and apply AI. It makes use of an artificial neural network (ANN) to make decisions and predict solutions for the given problems. An ANN works much like the natural neural network in the human brain: it is an interconnected group of nodes inspired by the network of neurons in our brains. One of the best examples of deep learning is number plate detection.

INTRODUCTION TO AI DOMAINS:
Artificial Intelligence is divided into three separate domains:

⮚ Data Science

⮚ Computer Vision (CV)

⮚ Natural Language Processing (NLP)


DATA SCIENCE
Data is the core of every AI system. Both of the other AI domains also need data for their functioning. Data is also at the core of general AI systems, as these systems will have the capability of processing data for learning and growing.
Data Science is a comprehensive field of study. It pertains to data systems and processes that aim at maintaining data sets and obtaining meaning out of them. Data scientists make use of a blend of devices, tools, applications, principles and algorithms to comprehend random data clusters.
Examples: The whole digital marketing spectrum, from the display ads on various websites to the digital billboards in the markets, is decided with the help of data science algorithms. This is called targeted advertising.
COMPUTER VISION (CV)

This domain of AI is working towards the development of AI systems that will be able to perceive the visual world as human beings do. There has been substantial development in this domain, and this technology is currently being used in several AI-based systems.
Examples:

⮚ Face Recognition – Identifying faces in images and videos

✔ Applications, like Google Photos, Snapchat etc.

✔ Social media networks, like Facebook

✔ Law enforcement agencies, like Interpol

⮚ Content Based Image Retrieval (CBIR) – Identifying images based on their composition , color, texture

etc.,

✔ Search engines like Google and Bing

✔ Medical image databases of CT, MRI, etc.,

✔ Scientific databases, like Earth Sciences

⮚ Smart Interactions – A way of providing inputs to the computer systems.

✔ Gaming systems, like Microsoft Kinect

✔ Games like Emoji Scavenger Hunt

✔ Systems for differently abled individuals

⮚ Environment Perception – Analysing videos, images, or video feeds for identifying patterns and

perceiving the environment


✔ Home security systems

✔ Office security systems

✔ Drone based surveillance systems

✔ Smart vehicles

NATURAL LANGUAGE PROCESSING


This domain of AI is working towards the creation of artificial intelligence systems that will be capable of communicating with human beings using natural language rather than through syntax or the identification of keywords. This domain covers both written and spoken language.
COMPONENTS OF NLP:
Natural Language Processing has two main components
1. Natural Language Understanding (NLU) :
It is for understanding spoken or written language. It includes the following:

⮚ Establishing linkage with natural language inputs and what they represent.

⮚ Analysing different aspects of the language

2. Natural Language Generation (NLG) : For producing meaningful phrases and sentences in the form of
natural language. It involves the following:

⮚ Text Planning: Retrieving relevant content from the data stores.

⮚ Sentence Planning: Deciding on the correct words, linking them into meaningful phrases etc.,

⮚ Text Realisation: Combining phrases and words to form sentences

Examples: Using NLP , online translators have become capable of translating languages accurately. Now
they are able to produce more grammatically correct results.
SOME OTHER AI RELATED TERMINOLOGIES:

AUTONOMOUS : Vehicles with a level four autonomy do not need a steering wheel or pedals. They do not
need a human to help them operate at their full capacity. In future, if ever a vehicle is developed that can
operate without a driver, without needing to connect to any grid, server, GPS, or another source in order to
give orders to it, it will be said to have achieved a level five autonomy.
ALGORITHM: The algorithm is an essential component of an AI system. Algorithms are mathematical formulas and programming commands that instruct regular, non-intelligent computers on how to solve problems with AI. They are like rules that instruct computers on how to understand things on their own.
BLACK BOX: When we feed data to an AI system, it runs some algorithms, does some complex mathematics
and gives us a result. For humans to understand the algorithms and complex mathematics, and do this process
manually is an impossible task and would take up a lot of time. When such a scenario takes place, it is known
as black box learning.
NEURAL NETWORKS: Neural networks are created when we want an AI system to work more efficiently. Engineers have designed these networks in the same manner as the human nervous system in the brain works. A neural network makes use of levels of learning to give AI the capability to solve difficult problems by breaking them down into different levels of data.
The initial level of the network may only bother about some pixels in an image file and check for similarities in
the other files. After the conclusion of the first stage , the neural network will forward its findings to the next
level. The next level will attempt to understand some more pixels. This process is repeated at every level of a
neural network.
REINFORCEMENT LEARNING: Just like humans , one of the methods through which machines can be taught is
reinforcement learning. This includes giving the AI a task that has not been defined with a specific metric,
such as asking it to ‘improve efficiency’ or ‘ find solutions’.
Rather than searching for one specific answer, the AI system will explore different scenarios and give results.
These results are then evaluated by humans and judged. The AI system takes feedback and adjusts the next
scenario to attain better outputs.
SUPERVISED LEARNING : When we train an AI system using this type of learning method, we provide the
machine with the correct answer in advance.
Principally, the AI system knows the answer as well as the question. This is one of the most common methods
of training because it produces the most significant amount of data; it defines patterns among the various
questions and answers.
UNSUPERVISED LEARNING: The most fantastic part of AI research is the fact that the machines are efficient in
learning , and they have been using heaps of data and processing capability to do so.
In unsupervised learning, we do not provide the AI with a predefined answer. Instead of searching for patterns like "why do people prefer one brand to another", we simply feed a piece of data into the machine so that it can search for whatever patterns it can find.
TRANSFER LEARNING: Once an AI system has learned something like "how to decide if an image is a dog or not", it can continue to build on its knowledge even if we are not asking it to learn anything more about dogs.
Hypothetically, we could take an AI system that can decide if an image is a dog with 80% accuracy and, after it has spent a week training on recognising caps, it could then return to its work on dogs with a considerable enhancement in precision.
TURING TEST: Alan Turing is regarded as the father of modern computing and the designer of the Turing test. Initially, the test was designed as a way of determining whether, in a text-only conversation between a human and an artificial intelligence, the machine could fool the human. However, it has since become shorthand for any AI that can fool a person into believing they are witnessing or interacting with a real human.
AI ETHICS:

Ethics is defined as the science of moral duty and ideal human behaviour. It teaches us the difference between right and wrong and acts as a guideline for moral conduct. Even a simple, objective system of logical decisions depends on the capability of human judgement. It is humans who write the code, who define success or failure, who make the important decisions about the uses of systems, and who are affected by a system's outcomes.
The term ‘AI Ethics’ is a blanket term used to deal with all the ethical concerns and issues related to AI
systems.
AI ethics are generally divided into two categories:
1. Concerns related to data (bias and inclusion) used in AI systems.
2. Concerns related to implications of the AI technology
DATA PRIVACY:
Data is the lifeblood of AI systems. These systems cannot exist without data. However, the collection, storage and usage of this data raise serious privacy concerns.
Example:

The Android operating system on mobiles comes with many preinstalled Google apps. Some of them are:
1. Contact List
2. Location
3. Email
4. Hangout
5. Photos
AI BIAS
The problem is that an AI system learns from the real-world data fed into it. This means that AI systems can reinforce the biases found in that data. For example, a computer system trained on data from the last 200 years might find that more females were involved in specific jobs, or that a higher percentage of successful businesses were established by men, and conclude that specific genders are better equipped for handling certain jobs (gender bias).
Understanding or even detecting such biases is not easy because many AI systems act as black boxes.

THE PROBLEM OF INCLUSION


Consider the example of the AI system used by Amazon for recruitment. It created a situation in which many eligible females were left out of consideration. This is known as the problem of inclusion.
DATA SCIENCE
Data Science is the domain of computer science where we extract insights from available data with the
help of scientific methods, algorithms and statistics. This data can be structured or unstructured (raw data). So,
data science is the study of vast arrays of structured and unstructured data and their conversion into a format,
which can be easily interpreted by humans.

Data Science includes:

● Visualisation of data
● Working with statistics and analytical methods
● Machine and deep learning.
● Probability analysis and predictive models.
● Neural networks and their application for solving real life problems.

APPLICATIONS OF DATA SCIENCE:

Data Science has created a great impact in several sectors, such as medicine , banking, manufacturing,
transportation etc.,

1. Banking:
Data Science has a significant impact on the banking industry. It enables the banks to make
better decisions, as it provides tools for resource management, fraud detection , customer
data management, making predictions about policies and the risks involved.
2. Finance:

The finance industry, just like the banking industry, has used Data Science tools to automate
the process of risk analysis . This has helped make better decisions, which have a low- risk and
high profit value. Data Science is a critical factor in algorithmic trading.
3. Manufacturing:
Industrial robots have taken over the mundane and repetitive activities that are required to
be carried out in the manufacturing units.
4. Transport:

Data Science is proving itself very useful in creating safer driving environments for drivers. It is
also contributing a lot in the field of optimisation of vehicle performance by providing greater
autonomy to drivers.
5. Healthcare:
Data Science is playing a very significant role in the healthcare industry. Using classification
algorithms, physicians are now able to diagnose cancer and tumors at an early stage.
6. E- Commerce:
E-commerce companies such as Amazon use a recommendation system that recommends various products to users based on their purchase history.
7. Artificial Conversational Bots:

Data scientists have developed speech recognition systems that essentially convert human speech into textual data.

REVISING AI PROJECT CYCLE:

Artificial Intelligence (AI) is one of the interdisciplinary branches of computer science.

AI PROJECT CYCLE:

DATA COLLECTION:

Based on the requirements, the data collection will be done.

TYPES OF DATA:

At the basic level, data can be classified into two broad categories:

i. Numerical data
ii. Textual data

Numerical data involves the use of numbers. This is the data on which we can perform the mathematical
operations such as addition, subtraction, multiplication, etc.,

Textual data is made up of characters and numbers, but we cannot perform calculations on these numbers.
For example, Mickey, Donald, grapes, etc.,
Structural Classification:

Based on structures, data can be classified into three types:

i. Structured Data: This type of data has a predefined model and it is organised in a predefined
manner. Earlier , structures of data were quite simple, and they were often known before the data
model was designed and therefore, data was generally stored in a tabular form of relational
databases.
ii. Unstructured Data: This type of data does not have any pre-defined structure. It can take any form.
Most of the data exists in this form. Videos, Audios, Presentations, emails, documents, etc., are the
best examples of unstructured data.
iii. Semi Structured Data: This type of data has a structure that has the qualities of both structured as
well as unstructured data. This data is not organised logically. Nevertheless, it has some sorts of
markers and tags that give it an identifiable structure.

DATA FORMATS USED IN DATA SCIENCE:

In the case of data science, data is typically stored in tables. Some of the most common formats are:

i. CSV: Stands for Comma Separated Values. It is the basic file format for storing tabular data. Since record values are separated by commas, the files are known as CSV files (a small sketch of reading a CSV file follows this list).
ii. Spreadsheet: A spreadsheet is a sheet, on paper or in software, that uses rows and columns to log data and present information. Microsoft Excel is a software package that helps to build spreadsheets.
iii. SQL: SQL stands for Structured Query Language. It is a domain-specific language used in programming. It is designed to handle data that is stored in various types of DBMS.
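As a minimal sketch of working with such a format, the pandas library (introduced in the next section) can load a CSV file into a table; the file name "students.csv" and its columns are hypothetical.

# reading a CSV file into a table (assumes pandas is installed and students.csv exists)
import pandas as pd

df = pd.read_csv("students.csv")   # load the CSV file into a DataFrame
print(df.head())                   # show the first five rows
print(df["Marks"].mean())          # average of the hypothetical Marks column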

DATA ACCESS:

A Python package can be imported in different ways, and the way it is imported decides how its functions are accessed. For example:

import math as mt
# the functions of the math package are accessed through the alias, e.g. mt.factorial(5)

from math import *
# all functions of the math package are imported directly, so factorial(5) can be called without any prefix

LIST OF SOME PYTHON LIBRARIES:

⮚ NumPy means Numerical Python. NumPy comprises essential linear algebraic functions,

Fourier transforms and advanced random number capabilities.


NumPy arrays are alternatives to Python Lists. NumPy arrays are faster, easier to work with and give
the users the functionality to perform calculations across entire arrays.

⮚ Matplotlib is a library used for plotting a wide variety of graphs, from histograms to line plots. Plots help to understand trends and patterns. They act as instruments for reasoning about quantitative information.
⮚ Pandas is an open source Python library that is built on top of NumPy and is used for structured data operations and manipulations. This library is extensively used for data cleaning and preparation.

⮚ SciPy means Scientific Python. SciPy has been developed on top of NumPy. It is one of the most prevalent libraries used for advanced science and engineering modules, such as the discrete Fourier transform, linear algebra, optimisation and sparse matrices.

⮚ Statsmodels is meant for statistical modelling. Statsmodels is a Python module that helps in exploring data, estimating statistical models and conducting statistical tests.

⮚ Seaborn is developed for statistical data visualisation. It is a library developed for creating

attractive and informative statistical graphics in Python.

NumPy Arrays vs Python Lists:

1. NumPy arrays are homogeneous; lists are heterogeneous.
2. NumPy arrays cannot be directly initialised and can only be created and operated on through the NumPy package; lists can be directly initialised as they are part of Python syntax.
3. Direct numerical operations can be performed on arrays; direct numerical operations are not possible on lists.
4. Arrays are used for arithmetic operations; lists are used for data management.
5. Arrays take up less memory; lists take up more memory.
6. Operations like concatenation and appending require NumPy functions and are less convenient with arrays; with lists they are straightforward.

BASIC STATISTICS WITH PYTHON:

Statistical learning is a framework developed for understanding data using statistics. Such data can be classified as supervised or unsupervised.

Supervised statistical learning includes creating a statistical model for predicting or estimating an output
based on one or more inputs.

Unsupervised statistical learning involves inputs but without supervising output. However we can learn
relationships and structure from such data.

Statistical learning reveals latent data relationships, i.e. relationships between the dependent and independent variables.
MEAN => the average value of a data set.
MEDIAN => the value that coincides with the middle of the data set
MODE => the value that appears the most frequently in our data
STANDARD DEVIATION => summarizes how much your data differs from the mean
VARIANCE => the square of the standard deviation
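As a minimal sketch, the measures defined above can be computed with Python's built-in statistics module; the data values are made up for illustration.

# basic statistics sketch (illustrative data)
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print("Mean:", statistics.mean(data))                  # 5
print("Median:", statistics.median(data))              # 4.5
print("Mode:", statistics.mode(data))                  # 4
print("Standard deviation:", statistics.pstdev(data))  # 2.0 (population standard deviation)
print("Variance:", statistics.pvariance(data))         # 4.0 (square of the standard deviation)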
DATA VISUALISATION:

Data visualisation is a popular technique, used in applied statistics and machine learning.

During data collection, some errors can occur with the data.

1. Erroneous Data:
The data can be erroneous in two ways:
● Incorrect Values:
The dataset contains incorrect values at random locations. For example, in a list of phone numbers there is a comma in the phone number column, or in a marks list of a class prepared by a teacher you find names in the column for marks.
● Invalid / Null Values: Sometimes the values in a dataset get corrupted in some places, thereby making them invalid. Also, NaN (Not a Number) values are often found in a dataset. These are null values that have no meaning and cannot be processed, which is why such values are deleted from the data.
2. Missing Data: Some cells remain empty in certain datasets because no value was recorded for them. Missing information cannot be interpreted as an error, because the values are not wrong; they are simply absent.
3. Outliers: Values that do not fall within the usual range of a particular element are known as outliers. For example, in the donation records of a donation club, one unusually large donation would be an outlier. It is not easy to analyse such collected data manually because many tables and figures are involved.

Data visualisation can prove handy when we are exploring and trying to understand a dataset. It can
help in identifying patterns, corrupt data, outliers and much more.

There are five essential plots that we need to know well for basic data visualisation.

● Line Plot
● Bar Chart
● Histogram Plot
● Box and Whisker Plot
● Scatter Plot

LINE PLOT:
A line plot is used to show observations gathered at regular intervals.
The X – axis shows the regular interval, such as time. The Y-axis shows the observations, ordered by the
x-axis and linked by a line.
A line plot can be built by calling the plot() function and passing the X-axis data for the regular interval,
and Y-axis for the observations.
Example:

# code to create a line plot


pyplot.plot(x,y)

BAR CHART:
A bar chart represents relative quantities for multiple categories.

The X – axis shows the categories that are spaced evenly. The Y –axis shows the quantity for each
category and is drawn as a bar from the baseline to the required level on the Y-axis.
We can prepare a bar chart by calling the bar() function and passing the category names for the X-axis and the quantities for the Y-axis.
Example:

# code to create a bar chart


pyplot.bar(x,y)
HISTOGRAM PLOT:
Histogram Plots are used for summarising the distribution of a data sample.
The X-axis shows distinct bins or intervals for the observations. For example, observations with values between 1 and 10 can be divided into five bins: the values [1, 2] would be assigned to the first bin, [3, 4] to the second bin, and so on.
The Y-axis shows the frequency or count of the number of observations in the dataset that have been assigned to each bin.
A histogram plot can be designed by calling the hist() function and passing a list or array that shows the
data sample.
Example:

# code to create a histogram plot


pyplot.hist(x)
BOX AND WHISKER PLOT:
A box and Whisker plot (boxplot) is usually used for summarising the distribution of a data sample.
The X-axis shows the data sample, where multiple boxplots can be created beside one another on the
X- axis. The Y –axis shows the observation values.
A box is created to summarise the middle 50% of the dataset, starting at the observation at the 25th
percentile and concluding at the 75th percentile. This is known as the interquartile range or IQR. The median,
or the 50th percentile is created with a line.

Lines called whiskers extend from both ends of the box by a length of 1.5 x IQR to demonstrate the expected range of sensible values in the distribution. Observations outside the whiskers might be outliers and are drawn with small circles.
Boxplots can be created by calling the boxplot() function, passing in the data sample as an array or list.

Example:
#code to create a box and whisker plot
pyplot.boxplot(x)
SCATTER PLOT:
A scatter plot usually summarises the relationship between two paired data samples.
The paired data samples imply that the two measures were recorded for a particular observation, such
as the weight and height of a person.
The X-axis shows the observation values for the first sample, and Y-axis shows the observed values for
the second sample. Each point on the plot shows a single observation.
We can create scatter plots using scatter() function and passing two data sample arrays.

Example:
# code to create a scatter plot
pyplot.scatter(x,y)

It shows the relationship between two variables.
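Putting the five plots together, the following minimal sketch draws each of them with made-up data, assuming matplotlib is installed; each show() call displays one figure.

# the five basic plots with illustrative data
from matplotlib import pyplot

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 6, 3]

pyplot.plot(x, y)                          # line plot
pyplot.show()

pyplot.bar(["A", "B", "C"], [5, 9, 3])     # bar chart
pyplot.show()

pyplot.hist([1, 1, 2, 3, 3, 3, 4, 5, 5])   # histogram plot
pyplot.show()

pyplot.boxplot([1, 2, 2, 3, 3, 4, 5, 9])   # box and whisker plot
pyplot.show()

pyplot.scatter(x, y)                       # scatter plot
pyplot.show()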


K NEAREST NEIGHBOUR MODEL
PERSONALITY PREDICTION:
The term personality has been taken from the Latin word ‘persona’ which refers to the mask worn by
actors in a theatre. Personality is a set of qualities that characterises an individual and it involves emotions,
behaviour, temperament and mind.
In the study of the personality structure, the role of the Big Five Model or Five Factor Model is
significant.
BIG FIVE MODEL:

The model defines a personality structure split into five elements called OCEAN.
The Big Five qualities are explained as follows:
● Openness: It indicates how readily an individual can accept new things. Individuals who belong to this category use social media quite frequently.
● Conscientiousness: It indicates individuals who are particular, watchful, punctual, thorough and organised. Such persons use social media sparingly. They believe that most social media sites cause distraction.
● Extroversion: It indicates the adventurous, sociable and talkative nature of individuals. Such individuals have many friends outside the virtual environment.
● Agreeableness: It shows the friendliness of people towards each other. Surveys show that people with low levels of agreeableness might have a large number of online contacts but do not find it easy to begin and maintain friendships outside the virtual environment.
● Neuroticism: It relates to how well an individual has control over their emotions. Such individuals use the internet because they need a means of reducing loneliness.
HOW CAN AI CAPTURE THE DATA?
● Eye tracking sensors: Several eye tracking technologies that capture the eye movement data are
available in the market. These technologies capture the data of eye as well as body movement
because there is a direct correlation between the eye and body movement.
● Smartphones: Revolution in the field of smartphones is also helping in personality identification
indirectly. It has become straightforward to capture eye movements with smartphones.
● Prediction based algorithms: Various prediction based algorithms are extensively used in eye
tracking detection systems on smartphones.
● Machine learning: Machine learning plays an essential role in defining personality. The first
step involves capturing the relevant data and the second involves its processing.
K NEAREST NEIGHBOUR:
K-Nearest Neighbours (KNN) is an algorithm based on learning by analogy. It achieves learning by comparing a particular test tuple with the training tuples. Training tuples are described by n attributes, and each tuple represents a point in an n-dimensional space.
K-Nearest Neighbors is a machine learning method and algorithm that we use in both regression and
classification tasks.
KNN steps: (Example)

1. Take a dataset with the known categories.


We need to collect data. Right now it is unsorted and termed as raw data. In this example, the data is
clearly categorised with Rabbits and tortoises.

2. Cluster the data


How you cluster the data is entirely up to you.

3. Now add a cell with an unknown category.


4. Find the value of "k"
A common way of finding the value of k is to take the square root of n (the number of items in the data set):

k = √n = √8 ≈ 2.83 ≈ 3
5. Locate the “k” nearest neighbours.

6. Classify the new point


The new point is classified by a majority vote. Since most of the neighbours are rabbits, there is a high chance that the unknown entity is a rabbit too, so the new point is classified as a rabbit. A small code sketch of this idea follows.
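The following minimal sketch mirrors the rabbit/tortoise example, assuming the scikit-learn library is installed; the two features (for example, speed and ear length) and their values are made up purely for illustration.

# KNN sketch with k = 3 (illustrative data)
from sklearn.neighbors import KNeighborsClassifier

features = [[9, 8], [8, 9], [9, 9], [8, 8], [1, 2], [2, 1], [1, 1], [2, 2]]
labels = ["rabbit", "rabbit", "rabbit", "rabbit",
          "tortoise", "tortoise", "tortoise", "tortoise"]

model = KNeighborsClassifier(n_neighbors=3)   # k = 3, as computed above (sqrt(8) is roughly 3)
model.fit(features, labels)
print(model.predict([[7, 8]]))                # majority of the 3 nearest neighbours -> ['rabbit']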
KNN ADVANTAGES AND DISADVANTAGES:
Advantages:
● KNN can be useful in both regression and classification tasks, in contrast to some other supervised learning algorithms.
● KNN is exceptionally accurate and easy to use. It is simple to interpret, understand and implement.
● KNN does not make any assumptions about the data. It can be used for a broad range of problems.
Disadvantages:
● KNN has to keep most or all of the data, which requires a lot of storage space, and it is computationally expensive. Large datasets can also cause predictions to take a longer time.
● KNN is very sensitive to the scale of the dataset. It can be misled by irrelevant features quite easily in comparison to other models.
COMPUTER VISION
Computer Vision (CV) is a field of study that attempts to develop techniques to enable computers to
“see” and comprehend the content of digital images, like photographs and videos.

CV is a multidisciplinary field. It can be considered a subfield of Artificial Intelligence (AI) and machine learning, and it may involve the use of both dedicated techniques and general learning algorithms.

APPLICATIONS OF COMPUTER VISION:

Automotive
Autonomous vehicles are the future of commuting. They are expected to bring down the number of accidents, as the possibility of human error is minimised.

Healthcare
Computer vision is being used extensively in various parts of our healthcare systems. Healthcare
depends a lot on imaging, extracting information, and recognizing trends from images.

Insurance
The insurance industry is one of the most human-intensive industries. The insurance surveyor has to physically inspect the damage to approve or reject the claim made.

Retail
The retail industry leverages computer vision technologies to a great extent. Retail giants like Amazon
have been using computer vision technologies for automatic billing. In their store, Amazon Go, customers do
not have to stand in line for billing. They are tracked with cameras that can identify which customer has
picked up what item or kept back.

Agriculture
The agriculture industry is recently witnessing a lot of technological advancements. This is owing to the
recent rise in demand for food and crops as a whole. With mass farming, it is a challenge to manage
everything manually. Thus, the agriculture industry is making use of computer vision. Farming activities such
as harvesting and weeding are being conducted with the help of CV.

Banking
The banking industry uses CV extensively these days with the rise in fraud and counterfeit currency
cases. The banking system uses AI-based solutions to identify counterfeit currency being infused into the
system at the customer touchpoints.

Manufacturing
The manufacturing industry uses computer vision mainly for quality control of the finished goods.
This can range from clothing, shoes, furniture, automobiles, FMCG products, electronics, etc. CV can easily
detect defects that the naked eye cannot. This helps in manufacturing the highest quality products.

COMPUTER VISION VS HUMAN VISION:


To identify visual information, humans possess a “superbly efficient tool” – the natural vision. The
capabilities of machines in this regard are still inadequate.
Computers are able to see with the help of pixels. Pixels are tiny blocks that represent brightness and
colours through certain numbers.
CONCEPTS OF COMPUTER VISION:
CV is a field of study that allows the computers to imitate the human vision system. It is a branch of AI
that gathers information from digital photos or videos and processes them to define their features.
The whole process involves the following stages: Image acquiring, screening, analysing, identifying and
extracting information.
CV projects transform digital visual content into explicit content to collect multi dimensional data.
UNDERSTANDING CV CONCEPTS:
The various applications of computer vision are based on a certain number of tasks that are performed
to get certain information from the input images, which can either be directly used for prediction or form the
base for further analysis.

CLASSIFICATION:
Image classification assists in classifying a given image as belonging to one of a set of predefined categories. This is the fundamental task in CV which, despite its simplicity, has a large variety of practical applications.
CLASSIFICATION + LOCALISATION:
This is the task that involves both the processes of identifying what object is present in the image and
at the same time identifying at what location that object is present in that image. It is used only for single
objects.
OBJECT DETECTION:
Object detection is the process of finding instances of real world objects, such as faces, bicycles and
buildings in images or videos. Object detection algorithms typically use extracted features and learning
algorithms to recognise the instances of an object category. It is commonly used in applications, such as image
retrieval and automated vehicle parking system.
INSTANCE SEGMENTATION:
Instance segmentation is the process of detecting instances of objects, giving them a category and
then giving each pixel a label on the basis of the assigned category. A segmentation algorithm takes an image
as input and outputs a collection of regions (or segments).
IMAGES AND PIXELS:
An image consists of a rectangular array of dots called pixels. The size of an image is usually specified
as width x height, in number of pixels. The physical size of an image, in inches or centimetres, depends on
the resolution of the device on which the image is displayed.
Pixels are the essential building blocks of any digital image. A pixel is a colour or light value that
occupies a definite place in an image. The term bitmap comes from computer programming terminology,
meaning just a map of bits – a spatially mapped array of bits.
An image having a resolution of 1024 x 768 is a grid that has 1024 columns and 768 rows. Therefore it
will have 1024x768 = 786,432 pixels.
PIXEL VALUE:
A set of numbers is used to represent a pixel. We call this range of numbers the colour depth or bit
depth. An 8-bit depth uses the numbers 0–255 for each colour channel in a pixel.
GRAYSCALE AND RGB IMAGES:
Pixels generally exist in two forms: grayscale and colour. In a grayscale image, each pixel has a single value
representing the light level, with 0 being black and 255 being white. Most colour pixels have three values
representing Red, Green and Blue (RGB). Some non-RGB representation schemes also exist, but RGB is
the most widely used format.
The three colours are each represented by one byte, i.e., a value from 0 to 255, which indicates the
amount of the given colour.
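These pixel concepts can be inspected directly in code. Below is a minimal sketch, assuming OpenCV (cv2) and NumPy are installed; the file name sample.jpg is a placeholder for any image on disk. Note that OpenCV stores colour pixels in BGR order rather than RGB.

```python
import cv2

# Load a colour image (placeholder file name).
img = cv2.imread("sample.jpg")           # height x width x 3 array, channels in BGR order

height, width, channels = img.shape
print("Resolution:", width, "x", height)  # e.g. 1024 x 768
print("Total pixels:", width * height)    # e.g. 786,432

# A single colour pixel: three values in the range 0-255 (Blue, Green, Red).
b, g, r = img[0, 0]
print("First pixel (B, G, R):", b, g, r)

# Convert to grayscale: each pixel becomes one value, 0 (black) to 255 (white).
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print("Grayscale value of the first pixel:", gray[0, 0])
```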
IMAGE FEATURES:
A feature is a piece of information relevant for undertaking the computational task associated with a
specific application. We can classify features into two categories:
● Features that are found at specific locations in images, such as mountain ranges, structure corners, windows
or precisely shaped patches of snow. Such features are generally called keypoint features and are
usually determined by the appearance of the patches of pixels surrounding the point location.
● Features that can be correlated on the basis of their orientation and local appearance are known as
edges, and they can also be good indicators of object boundaries and occlusion events in the image
sequence.
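Both kinds of features can be extracted with detectors bundled in OpenCV – for example ORB for keypoint features and the Canny detector for edges. A minimal sketch; the file name scene.jpg is a placeholder:

```python
import cv2

# Load an image directly in grayscale (placeholder file name).
gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Keypoint features: distinctive point locations described by surrounding pixel patches.
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)
print("Keypoints found:", len(keypoints))

# Edge features: pixels where intensity changes sharply, often at object boundaries.
edges = cv2.Canny(gray, 100, 200)
print("Edge image shape:", edges.shape)
```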
OPEN CV:
OpenCV is an open-source computer vision and machine learning software library. OpenCV was
created to offer a shared infrastructure for CV applications and to speed up the use of machine perception in
commercial products.
The OpenCV library has thousands of algorithms. One can easily detect and identify objects using
these algorithms.
Some of the other actions that can be easily performed using these algorithms are listed below; a short usage sketch follows the list.

⮚ Red eye correction from digital pictures.

⮚ Merge images seamlessly to create a high-resolution image.

⮚ Identify scenery

⮚ Track moving objects

⮚ Extract 3D models of objects

⮚ Assemble 3D point clouds from stereo cameras.
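As a representative detect-and-identify sketch with the library, the snippet below finds faces in a photo using one of OpenCV's bundled Haar-cascade classifiers. The file name group_photo.jpg is a placeholder; everything else uses standard OpenCV calls.

```python
import cv2

# Load an image (placeholder file name) and convert it to grayscale for detection.
img = cv2.imread("group_photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# OpenCV ships with pre-trained Haar-cascade classifiers; this one detects frontal faces.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print("Faces found:", len(faces))

# Draw a green rectangle around every detected face and save the result.
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces_marked.jpg", img)
```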


CONVOLUTION OPERATOR:
Convolution is a common technique used for image processing. It provides a general filtering effect for
images.
It is a mathematical operation that combines two matrices of numbers, which usually have different sizes
but always the same dimensionality, by multiplying and summing their overlapping elements. The output that
we get is a filtered image.
CONVOLUTION METHOD:
The convolution method is performed on two functions (f and g) to produce a third function, and it is used in
signal and image processing. For speech processing it operates in 1D (one dimension), for image
processing in 2D (two dimensions) and for video processing in 3D (three dimensions).
Representation:

Visual Representation:
Scan the given code to see a video representation of the convolution process.
Mathematical representation:
g(x,y) = h(x,y) * f(x,y) (when the mask is convolved with the image)
Or
g(x,y) = f(x,y) * h(x,y) (when the image is convolved with the mask)

Here, (*) is known as the ‘Convolution Operator’. It is commutative in nature and can therefore be written in
either way. The function h(x,y) is the mask or filter.
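In code, applying a mask h(x,y) to an image f(x,y) can be sketched with OpenCV's filter2D. One caveat: filter2D computes correlation rather than true convolution, so for a general mask the kernel is flipped first (for a symmetric mask, such as the averaging mask below, the two operations coincide). The file name photo.jpg is a placeholder.

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# A simple 3x3 averaging mask (blur filter) playing the role of h(x, y).
mask = np.ones((3, 3), dtype=np.float32) / 9.0

# filter2D performs correlation; flipping the mask in both directions gives true convolution.
# (For this symmetric mask, flipping makes no difference, but it matters for general masks.)
flipped_mask = cv2.flip(mask, -1)
filtered = cv2.filter2D(img, -1, flipped_mask)   # -1 keeps the same bit depth as the input

cv2.imwrite("filtered.jpg", filtered)
```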
DEFINING THE MASK:

From what is discussed above, we can see that ‘the mask’ is a signal as well. As a 2D matrix, it is
represented with odd dimensions, in the order of

1x1, 3x3, 5x5, 7x7, ...

The reason for using odd dimensions is that an odd-sized mask has a well-defined centre (mid) element.
LEARNING TO PERFORM CONVOLUTION OF AN IMAGE:
UNDERSTANDING CONVOLUTION THROUGH ILLUSTRATION:
Let us begin at Step 1, that is, by flipping the mask.
MASK:
Take a mask as one given below:

2 3 4

5 6 7

8 9 10

Now, first flip it horizontally, once.


4 3 2

7 6 5

10 9 8

And , then flip it vertically , once.


10 9 8

7 6 5

4 3 2

IMAGE:
Now, the image at hand can be like this.

3 5 7

9 11 13

15 17 19

CONVOLUTION:
First, the centre of the flipped mask is positioned over an element of the original image.
Then, the overlapping elements are multiplied pairwise and the products are added together.
The resulting value replaces the corresponding element of the original image, and the process is repeated for every pixel.
10 9 8

7 6 5

4 3 2

3 5 7

9 11 13

15 17 19

The first grid represents the flipped mask, and the values in it are the mask values.
The second grid represents the image. Now, for the first pixel of the image, the value will be calculated as,

First Pixel = (3*6) + (5*5) + (9*3) + (11*2)

            = 18 + 25 + 27 + 22 = 92

Now, we must place 92 in the original image at the first index and repeat this procedure for each pixel of the
image, to get our final image.
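The hand calculation above can be verified programmatically. SciPy's convolve2d performs true convolution (it flips the mask internally), so passing the original, unflipped mask reproduces the value 92 for the first pixel when the output is kept the same size as the image. A minimal sketch, assuming SciPy and NumPy are available:

```python
import numpy as np
from scipy.signal import convolve2d

mask = np.array([[2, 3, 4],
                 [5, 6, 7],
                 [8, 9, 10]])

image = np.array([[3, 5, 7],
                  [9, 11, 13],
                  [15, 17, 19]])

# 'same' keeps the output the same size as the image; positions outside the image count as 0.
result = convolve2d(image, mask, mode="same")
print(result[0, 0])   # 92, matching the hand calculation above
print(result)
```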
CONVOLUTIONAL NEURAL NETWORK (CNN):
A Convolutional Neural Network (CNN) is a branch of deep learning with some added operations. CNNs
have been found to achieve extraordinary accuracy in image-associated tasks.
CONVOLUTION LAYER:
While the convolution process is performed, the input image pixels are changed by a filter. This is just
a matrix (smaller than the original pixel matrix) with which we multiply the different pieces of the input image.
The output, which is also known as Feature Map, is generally smaller than the original image and is
fundamentally more informative.
RECTIFIED LINEAR UNIT FUNCTION:
ReLU, the acronym that stands for Rectified Linear Unit, is a simple function used to introduce non-linearity
into the feature map. All the negative values are changed to 0, which removes the dark (negative)
responses from the feature map. The formal function is y = max(0, x).
POOLING LAYER:
The pooling layer further reduces the size of the feature map(s). The reduction factor depends on the size of the pooling window; for example, 2x2 max pooling halves both the width and the height of the feature map.
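The ReLU and pooling operations can be written in a few lines of NumPy. The sketch below applies ReLU to a small made-up feature map and then 2x2 max pooling; the values are illustrative, and even dimensions are assumed so the map tiles exactly into 2x2 blocks.

```python
import numpy as np

feature_map = np.array([[-3.0,  1.0,  4.0, -2.0],
                        [ 5.0, -1.0,  0.5,  2.0],
                        [-4.0,  3.0, -2.5,  6.0],
                        [ 1.5,  2.5,  7.0, -0.5]])

# ReLU: y = max(0, x) applied element-wise, so every negative value becomes 0.
relu_map = np.maximum(0, feature_map)

# 2x2 max pooling: split the map into 2x2 blocks and keep the largest value in each block.
h, w = relu_map.shape
pooled = relu_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

print(relu_map)
print(pooled)      # a 2x2 map: width and height are both halved
```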
FULLY CONNECTED LAYER:
The final layer in the CNN is the Fully Connected Layer (FCL), which takes the flattened feature maps and produces the final output, such as the class scores.
TESTING CNN:

The following phases are required to create a CNN model; a minimal sketch of these phases follows this section.

Model construction is determined by the machine learning algorithm and architecture chosen. After model
construction comes model training: the model is trained with the help of the training data and the expected output for this data.
After the model has been trained, it becomes possible to execute model testing.
At last, the saved model is ready for use in the real world; this phase is known as model evaluation.
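A minimal sketch of these phases using Keras (assuming TensorFlow is installed). The layer sizes, random placeholder data and number of epochs below are illustrative assumptions, not values prescribed by this text.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# --- Model construction: convolution -> ReLU -> pooling -> fully connected layer ---
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # fully connected output layer (10 classes)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# --- Model training: random placeholder data stands in for a real labelled dataset ---
x_train = np.random.rand(100, 28, 28, 1)
y_train = np.random.randint(0, 10, size=100)
model.fit(x_train, y_train, epochs=2, verbose=0)

# --- Model testing / evaluation on unseen data ---
x_test = np.random.rand(20, 28, 28, 1)
y_test = np.random.randint(0, 10, size=20)
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print("Test accuracy:", accuracy)

# --- The trained model is saved, ready for real-world use ---
model.save("cnn_model.keras")
```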
CLASS 10 – Artificial Intelligence

EVALUATION
INTRODUCTION:

An efficient evaluation model proves helpful in selecting the most suitable modelling method that
would represent our data. Therefore, the evaluation stage is an essential part of the process of selecting an
appropriate model for a project.

WHAT IS EVALUATION?

Evaluation is the method used for understanding the credibility of an AI model on the basis of its
outputs: a test dataset is provided as input to the model and the model's predictions are then compared with
the actual answers.

Overfitting is an undesirable machine learning behavior that occurs when the machine learning model
gives accurate predictions for training data but not for new data. When data scientists use machine learning
models for making predictions, they first train the model on a known data set. Then, based on this
information, the model tries to predict outcomes for new data sets. An overfit model can give inaccurate
predictions and cannot perform well for all types of new data.

Underfitting is another type of error that occurs when the model cannot determine a
meaningful relationship between the input and output data. You get underfit models if they have not
trained for the appropriate length of time on a large number of data points.

MODEL EVALUATION TERMINOLOGIES

THE SCENARIO

In order to understand the effectiveness of this model, it is necessary to see if the predictions that it
makes are correct or not. There are two conditions that we need to consider : Prediction and Reality.

Prediction is the output produced by a machine and Reality is the actual scenario in that particular
place when the prediction has been made. Let us consider the different combinations that we can experience
in these two conditions.
Case 1: Is there a forest fire?

Prediction: YES    Reality: YES    →  TRUE POSITIVE
● The predicted value matches the actual value
● The actual value was positive and the model predicted a positive value

Case 2: Is there a forest fire?

Prediction: NO    Reality: NO    →  TRUE NEGATIVE
● The predicted value matches the actual value
● The actual value was negative and the model predicted a negative value
Case 3: Is there a forest fire?

Prediction: YES    Reality: NO    →  FALSE POSITIVE
● The predicted value was falsely predicted
● The actual value was negative but the model predicted a positive value
● Also known as the Type 1 error

Case 4: Is there a forest fire?

Prediction: NO    Reality: YES    →  FALSE NEGATIVE
● The predicted value was falsely predicted
● The actual value was positive but the model predicted a negative value
● Also known as the Type 2 error
CONFUSION MATRIX:

The outcome of the comparison between the prediction and reality can be recorded in the confusion
matrix. In machine learning, and particularly in the problem of statistical classification, the confusion matrix
is also called the error matrix.

A confusion matrix is a table commonly used to describe the performance of a classification model
(classifier) on a set of test data for which the true values are known. It helps in the visualisation of the
functioning of an algorithm.

                          REALITY
                      YES         NO
PREDICTION   YES      TP          FP
             NO       FN          TN

The confusion matrix represents the manner in which a classification model is confused while
making predictions. It informs us not only about the errors being made by a classifier but also about the
types of errors that are being made.
DEFINITION OF THE TERMS:

Positive (P): Observation is positive (Example: That is a car)

Negative (N): Observation is not positive (Example: That is not a car)

True Positive (TP): Observation is positive and predicted to be positive.

False Negative (FN): Observation is positive but predicted negative.

True Negative (TN): Observation is negative and predicted to be negative.

False Positive (FP): Observation is negative but predicted positive.
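These four quantities can be computed from a list of predictions and the corresponding reality. A minimal sketch using scikit-learn (assuming the library is available) with small made-up forest-fire data:

```python
from sklearn.metrics import confusion_matrix

# 1 = "Yes, there is a forest fire", 0 = "No" (made-up example data).
reality    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
prediction = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# scikit-learn lays the matrix out with rows = actual (reality) and columns = predicted,
# i.e. the transpose of the Prediction/Reality table shown above.
matrix = confusion_matrix(reality, prediction, labels=[1, 0])
tp, fn = matrix[0]
fp, tn = matrix[1]
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)   # TP: 4 FN: 1 FP: 1 TN: 4
```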

EVALUATION METHODS:

ACCURACY:

Accuracy is defined as the percentage of correct predictions out of all the observations. A prediction
is said to be correct if it matches reality. Here we have two conditions in which the Prediction matches
the Reality, i.e., True Positive and True Negative. Therefore, the formula for Accuracy is:

Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%

PRECISION:

Precision is the percentage of true positive cases out of all the cases where the prediction is positive. It
considers the True Positive and False Positive cases:

Precision = TP / (TP + FP)
RECALL:

Recall is the ratio of the total number of correctly classified positive examples to the total number of
positive examples. A high recall value shows that the class is correctly recognised (a small number of FN):

Recall = TP / (TP + FN)

High recall, low precision:

This implies that most of the positive examples are correctly recognised (low FN) but there are many
false positives.

Low recall, high precision:

This implies that we miss many positive examples (high FN) , but those we predict as positive are, in
fact, positive (low FP).

Example where High Accuracy is not usable.

SCENARIO: An expensive robotic chicken crosses a very busy road a thousand times per day. An ML model
evaluates traffic patterns and predicts when this chicken can safely cross the street with an accuracy of
99.99%.

Explanation: A 99.99% accuracy value on a very busy road strongly suggests that the ML model is far better
than chance. In some settings, however, the cost of making even a small number of mistakes is still too high.
99.99% accuracy means that the expensive chicken will need to be replaced, on average, every 10 days. (The
chicken might also cause extensive damage to cars that it hits.)

Example where High Precision is not usable.

Example: “Predicting a mail as Spam or Not Spam”. False Positive: the mail is predicted as “spam” but it is
“not spam”. False Negative: the mail is predicted as “not spam” but it is “spam”. Of course, too many False
Negatives will make the spam filter ineffective, but False Positives may cause important mails to be missed;
hence precision on its own is not a usable measure here.
F1 SCORE:

The F1 Score, also called the F score or F measure, is a measure of a test’s accuracy. It is calculated
from the precision and recall of the test, where the precision is the number of correctly identified positive
results divided by the number of all positive results, including those not identified correctly, and the recall is
the number of correctly identified positive results divided by the number of all samples that should have been
identified as positive. The F1 score is defined as the weighted harmonic mean of the test’s precision and
recall. This score is calculated according to the formula:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

Necessary:

F-Measure provides a single score that balances both the concerns of precision and recall in one number. A
good F1 score means that you have low false positives and low false negatives, so you’re correctly identifying
real threats, and you are not disturbed by false alarms. An F1 score is considered perfect when it’s 1, while the
model is a total failure when it’s 0. F1 Score is a better metric to evaluate our model on real-life classification
problems and when imbalanced class distribution exists.
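Putting the formulas together, the four metrics can be computed directly from the confusion-matrix counts. A worked sketch reusing the made-up TP/FP/FN/TN values from the scikit-learn example above:

```python
# Counts taken from the confusion-matrix sketch above (made-up example data).
tp, tn, fp, fn = 4, 4, 1, 1

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1_score  = 2 * precision * recall / (precision + recall)

print(f"Accuracy : {accuracy:.2f}")    # 0.80
print(f"Precision: {precision:.2f}")   # 0.80
print(f"Recall   : {recall:.2f}")      # 0.80
print(f"F1 Score : {f1_score:.2f}")    # 0.80
```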
