AML - Unit -1
DR. SUBHASH UNIVERSITY
School of Engineering & Technology
Department of Information Technology
Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to
become more accurate at predicting outcomes without being explicitly programmed to do so.
Machine learning algorithms use historical data as input to predict new output values.
"A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E."
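To make the definition concrete, here is a minimal sketch in Python. It assumes a toy task T (deciding whether a number is at least 5), a performance measure P (accuracy on the integers 0 to 9), and experience E given as a list of labeled examples. The function names and the midpoint-threshold learner are illustrative assumptions, not part of the definition.

```python
def learn_threshold(examples):
    """Experience E: (value, label) pairs. Returns the midpoint between
    the largest value labeled 0 and the smallest value labeled 1."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, test_set):
    """Performance measure P: fraction of test items classified correctly."""
    correct = sum(1 for x, label in test_set if (x >= threshold) == bool(label))
    return correct / len(test_set)


# Task T (hypothetical toy task): label a number 1 if it is at least 5, else 0.
test_set = [(x, 1 if x >= 5 else 0) for x in range(10)]

little_experience = [(0, 0), (6, 1)]
more_experience = little_experience + [(4, 0), (5, 1)]

acc_before = accuracy(learn_threshold(little_experience), test_set)
acc_after = accuracy(learn_threshold(more_experience), test_set)
print(acc_before, acc_after)
```

With two examples the learned threshold is 3.0 and two test numbers are misclassified; with four examples the threshold moves to 4.5 and every test number is classified correctly, so performance at T, as measured by P, improves with E.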
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, and other content in digital images. A popular use case of image
recognition and face detection is automatic friend tagging suggestion:
Facebook provides a feature of automatic friend tagging suggestion. Whenever we upload a photo
with our Facebook friends, we automatically get a tagging suggestion with names, and the
technology behind this is machine learning's face detection and recognition algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face recognition
and person identification in the picture.
2. Speech Recognition:
While using Google, we get the option "Search by voice." It comes under speech recognition,
which is a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text. It is also known as
"speech to text" or "computer speech recognition." At present, machine learning algorithms are
widely used in speech recognition applications. Google Assistant, Siri, Cortana, and
Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path
with the shortest route and predicts the traffic conditions.
It predicts whether traffic is clear, slow-moving, or heavily
congested in two ways:
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time of day
Everyone who uses Google Maps helps to make the app better. It takes information from
the user and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by e-commerce and entertainment companies such as
Amazon and Netflix to recommend products to the user. Whenever we search for a
product on Amazon, we start getting advertisements for the same product while surfing the
internet on the same browser, and this is because of machine learning.
Google understands the user's interests using various machine learning algorithms and suggests
products accordingly.
Similarly, when we use Netflix, we find recommendations for series, movies,
and so on, and this is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars, in which machine
learning plays a significant role. Tesla, a well-known car manufacturer, is
working on self-driving cars. It trains machine learning models to
detect people and objects while driving.
6. Email spam and malware filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam.
Important mail arrives in our inbox marked with the important symbol, and spam goes to our
spam box. The technology behind this is machine learning. Below are some spam filters used
by Gmail:
o Content filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
Some machine learning algorithms, such as Multi-Layer Perceptron, decision tree, and Naïve
Bayes classifier, are used for email spam filtering and malware detection.
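As an illustration of the Naïve Bayes approach mentioned above, the following is a minimal sketch of a word-count spam classifier with Laplace smoothing. The training messages, the labels "spam" and "ham", and the function names are hypothetical; this is not a description of how Gmail or any real filter is implemented.

```python
import math
from collections import Counter

# Hypothetical toy training set: (message, label).
TRAINING = [
    ("win free prize now", "spam"),
    ("free money win", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project meeting notes", "ham"),
]


def train(messages):
    """Count words per class and messages per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Prior: share of training messages with this label.
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            count = word_counts[label][word]
            # Add 1 so that unseen words do not give zero probability.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, class_counts = train(TRAINING)
print(classify("free prize", word_counts, class_counts))
print(classify("monday meeting", word_counts, class_counts))
```

The classifier treats each word as independent evidence for a class, which is the "naïve" assumption that gives the method its name.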
7. Virtual personal assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As
the name suggests, they help us find information using our voice instructions. These
assistants can help us in various ways just through voice instructions, such as playing music,
calling someone, opening an email, or scheduling an appointment.
Machine learning algorithms are an important part of these virtual assistants.
The assistants record our voice instructions, send them to a server in the cloud, decode them
using ML algorithms, and act accordingly.
8. Online fraud detection:
Machine learning makes our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, fraud can take place in various ways,
such as fake accounts, fake IDs, and theft of money in the middle of a
transaction. To detect this, a feed-forward neural network checks whether a transaction is
genuine or fraudulent.
For each genuine transaction, the output is converted into hash values, and these values
become the input for the next round. Each genuine transaction follows a specific pattern
that changes for a fraudulent transaction; the network detects the change, which makes our online
transactions more secure.
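The forward pass of a small feed-forward network can be sketched as follows. The two input features, the layer sizes, and all weights are hand-picked assumptions for illustration only; a real fraud detector would learn its weights from labeled transactions, and the hashing step described above is not modeled here.

```python
import math


def relu(value):
    return max(0.0, value)


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


# Hypothetical, hand-picked weights; a real model would learn them from data.
HIDDEN_WEIGHTS = [[2.0, 1.0], [0.5, 3.0]]  # one row per hidden neuron
HIDDEN_BIASES = [-1.0, -0.5]
OUTPUT_WEIGHTS = [1.5, 2.0]
OUTPUT_BIAS = -2.0


def fraud_score(features):
    """Forward pass: inputs -> hidden layer (ReLU) -> output (sigmoid)."""
    hidden = [
        relu(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    output = sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS
    return sigmoid(output)


# Assumed features: [amount scaled to 0..1, 1.0 if the device is new else 0.0]
typical = fraud_score([0.1, 0.0])
unusual = fraud_score([0.9, 1.0])
print(typical, unusual)
```

The output is a score between 0 and 1; with these assumed weights, a small payment from a known device scores below 0.5 and a large payment from a new device scores above it.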
Supervised learning – Also called predictive learning. A machine predicts the class of unknown
objects based on prior class related information of similar objects.
Unsupervised learning – Also called descriptive learning. A machine finds patterns in unknown
objects by grouping similar objects together.
Reinforcement learning – A machine learns to act on its own to achieve the given goals.
Aspect | Supervised learning | Unsupervised learning | Reinforcement learning
Data | Labeled data is used. | Unlabelled data is used. | If it classifies correctly it will get rewarded; otherwise it will get punished.
Evaluation | Model performance can be evaluated by the number of misclassifications and by comparing actual and predicted values. | Hard to find the model performance. | Model is evaluated on the basis of a penalty/reward function.
Types | Two types: classification and regression. | Two types: clustering and association. | No such types.
Difficulty | Simplest one to understand. | A bit harder to understand than supervised learning. | Complex to understand and apply.
Examples | Naïve Bayes, KNN, SVM | k-means, PCA | SARSA, Q-learning
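The difference between the first two kinds of learning can be sketched in a few lines of Python. The supervised part is a one-nearest-neighbor classifier (the simplest case of KNN) that needs labels; the unsupervised part is k-means with two clusters that receives no labels at all. The height data and function names are hypothetical, and reinforcement learning is left out to keep the sketch short.

```python
def nearest_neighbor_predict(labeled_points, query):
    """Supervised: return the label of the closest labeled training point."""
    _, closest_label = min(labeled_points, key=lambda pair: abs(pair[0] - query))
    return closest_label


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k=2)."""
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            # Assign each value to the nearer of the two cluster centers.
            index = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            clusters[index].append(value)
        # Move each center to the mean of the values assigned to it.
        centers = [sum(c) / len(c) for c in clusters]
    return clusters


# Hypothetical toy data: heights in cm.
labeled = [(150, "short"), (155, "short"), (180, "tall"), (185, "tall")]
unlabeled = [150, 155, 180, 185, 152, 183]

print(nearest_neighbor_predict(labeled, 178))
print(two_means(unlabeled))
```

The supervised model predicts a class name because it was given class names; the unsupervised model can only report which values belong together.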
The human learning process varies from person to person. Once a learning process is set in the
minds of people, it is difficult to change. In machine learning (ML), however, it is easy to change
the learning method by selecting a different algorithm. In ML, we have well-defined processes to
measure and estimate the accuracy of learning. Human learning is usually estimated
through examinations, which cannot be considered a measure of intelligence.
Humans acquire knowledge through experience, either directly or shared by others. Machines
acquire knowledge through experience shared in the form of past data.
Both humans and machines make mistakes in applying their intelligence to solve problems. In
ML, an overfitted model memorizes the training examples, lacks generalization, and fails to
work on examples it has never seen before.
Skill is a manifestation of the intelligence possessed by humans, and intelligence is the ability to
apply knowledge. Human intelligence persists, but a person's knowledge becomes outdated as new
technologies emerge. Humans without knowledge of a particular subject can apply their intelligence
to solve problems in new domains. Machines, however, can solve new problems only if their
intelligence has been updated by retraining on data acquired from the changed scenarios. This is a
fundamental difference between human intelligence and machine intelligence.
In ML, transfer learning is a technique that reuses a model that was created by machine learning
experts and has already been trained on a large dataset. It leverages information extracted from
one data distribution and applies it to a related one. In humans, knowledge is often transferred to
students by teachers and tuition providers. This may not make the students intelligent. But in the
case of machine learning, transfer learning makes the transferee as capable as the transferor. In
the case of humans, teaching transfers only the knowledge, and it depends on the inherent
intelligence of the transferee to improve his or her problem-solving skills.
Although machine learning is used in many industries and helps organizations make more
informed, data-driven choices that are more effective than classical methodologies, it still has
many problems that cannot be ignored. Here are some common issues that professionals face
when building ML skills and creating an application from scratch.
The major issue that arises while using machine learning algorithms is the lack of quality as well
as quantity of data. Data plays a vital role in machine learning algorithms, and many data
scientists report that inadequate data, noisy data, and unclean data severely degrade their
performance. For example, a simple task may require thousands of samples, and an advanced
task such as speech or image recognition may need millions of examples. Further, data quality is
also important for the algorithms to work well, but it is often lacking in machine learning
applications. Data quality can be affected by the following factors:
o Noisy data - It is responsible for inaccurate predictions, which affect decisions as well as
accuracy in classification tasks.
o Incorrect data - It is responsible for faulty results in machine
learning models. Hence, incorrect data may also affect the accuracy of the results.
o Generalizing of output data - Sometimes generalizing output data
becomes complex, which results in comparatively poor future actions.
As discussed above, data plays a significant role in machine learning, and it must be of
good quality. Noisy, incomplete, inaccurate, and unclean data lead to lower
accuracy in classification and low-quality results. Hence, poor data quality is another
major problem when working with machine learning algorithms.
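A basic cleaning step for incomplete and implausible values might look like the sketch below. The record layout, the "age" field, and the valid range of 0 to 120 are assumptions chosen for the example; real cleaning rules depend on the dataset.

```python
def clean_records(records, valid_range):
    """Drop records with a missing age or an age outside valid_range."""
    low, high = valid_range
    cleaned = []
    for record in records:
        value = record.get("age")
        if value is None:  # incomplete record
            continue
        if not low <= value <= high:  # implausible (noisy or incorrect) value
            continue
        cleaned.append(record)
    return cleaned


# Hypothetical raw data with one missing and one implausible age.
raw = [
    {"name": "A", "age": 34},
    {"name": "B", "age": None},
    {"name": "C", "age": 430},
    {"name": "D", "age": 27},
]
cleaned = clean_records(raw, valid_range=(0, 120))
print([record["name"] for record in cleaned])
```

Dropping records is the simplest choice; depending on the task, filling in missing values may preserve more of the data.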
To make sure our model generalizes well, the sample training data must be representative of the
new cases we want to generalize to. The training data must
cover the cases that have already occurred as well as those that are occurring now.
Further, if we use non-representative training data in the model, it results in less accurate
predictions. A machine learning model is said to be ideal if it predicts well for generalized cases
and provides accurate decisions. If there is too little training data, the sample will contain sampling
noise; such a sample is a non-representative training set. A model trained on it will not be accurate
in its predictions and will be biased against one class or group.
Hence, we should use representative data in training to protect against bias and to make
accurate predictions without any drift.
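One simple way to check whether a training sample is representative is to compare its class shares with those of the population it was drawn from. The sketch below does this; the "urban"/"rural" labels, the data, and the function names are hypothetical.

```python
from collections import Counter


def class_proportions(labels):
    """Return the share of each class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def largest_gap(population_labels, sample_labels):
    """Largest absolute difference in class share between population and sample."""
    population = class_proportions(population_labels)
    sample = class_proportions(sample_labels)
    return max(
        abs(population.get(label, 0.0) - sample.get(label, 0.0))
        for label in set(population) | set(sample)
    )


# Hypothetical data: the population is 50/50, one sample is 90/10.
population = ["urban"] * 50 + ["rural"] * 50
skewed_sample = ["urban"] * 9 + ["rural"] * 1
balanced_sample = ["urban"] * 5 + ["rural"] * 5

print(largest_gap(population, skewed_sample))
print(largest_gap(population, balanced_sample))
```

A large gap suggests that a model trained on the sample will see one group far more often than it occurs in practice.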
Overfitting:
Overfitting is one of the most common issues faced by machine learning engineers and data
scientists. It occurs when a model fits the training data too closely and starts capturing the noise
and inaccurate entries in the training data set, so it performs well on the training data but poorly
on new data. Biased training data has a related effect. Suppose the training data contains
1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. Then there is a considerable
probability of identifying an apple as a papaya, because papayas dominate the training data set,
and prediction is negatively affected. A common cause of overfitting is a highly flexible
non-linear method applied to limited or noisy data, which builds an unrealistic model of the data.
We can reduce overfitting by using simpler models, such as linear and parametric algorithms, by
regularization, or by training on more data.
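The following sketch contrasts a model that memorizes its training data with a simpler one. It assumes a toy rule (label 1 when x is at least 5), one deliberately mislabeled training example, and test points chosen near that noisy example so the effect is visible; all names and data are hypothetical.

```python
def memorizer_predict(training, x):
    """Overfitted model: copy the label of the nearest training example."""
    return min(training, key=lambda pair: abs(pair[0] - x))[1]


def fit_threshold(training):
    """Simpler model: midpoint between the two class means."""
    zeros = [x for x, label in training if label == 0]
    ones = [x for x, label in training if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def accuracy(predict, data):
    return sum(1 for x, label in data if predict(x) == label) / len(data)


# True rule (assumed): label is 1 when x >= 5. The example (7, 0) is label noise.
training = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
# Clean test points, several deliberately placed near the noisy example.
test = [(1.5, 0), (3.5, 0), (6.6, 1), (6.9, 1), (7.2, 1), (8.5, 1)]

threshold = fit_threshold(training)


def memorizer(x):
    return memorizer_predict(training, x)


def simple(x):
    return 1 if x >= threshold else 0


print(accuracy(memorizer, training), accuracy(memorizer, test))
print(accuracy(simple, training), accuracy(simple, test))
```

The memorizing model is perfect on the training data but reproduces the noisy label on nearby test points, while the simpler model gets one training example "wrong" (the noisy one) and generalizes better on this test set.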
Underfitting:
Underfitting is the opposite of overfitting. An underfitted model fails to capture the patterns in the
training data, so it gives inaccurate results on both the training data and new data.
Underfitting occurs when our model is too simple to capture the underlying structure of the data,
just like an undersized pair of pants. This generally happens when we have limited data in the data set, and
we try to build a linear model with non-linear data. In such scenarios, the model is not complex
enough: its rules are too simple for the data set,
and the model starts making wrong predictions.
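Fitting a straight line to non-linear data shows underfitting directly. The sketch below uses ordinary least squares on hypothetical data generated from y = x squared; the function names are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical non-linear data: y = x squared.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

# Underfitting: a straight line in x cannot follow the curve.
slope, intercept = fit_line(xs, ys)
linear_error = mean_squared_error(xs, ys, slope, intercept)

# Using x squared as the input feature lets the same method fit the data.
squares = [x * x for x in xs]
slope_sq, intercept_sq = fit_line(squares, ys)
quadratic_error = mean_squared_error(squares, ys, slope_sq, intercept_sq)

print(linear_error, quadratic_error)
```

The best straight line here is flat (slope 0) and leaves a large error even on the training data, which is the signature of underfitting; giving the model a feature that matches the shape of the data removes the error.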
Because generalized output is mandatory for any machine learning model,
regular monitoring and maintenance are compulsory. Different results for
different actions require changes to the data; hence, editing the code and allocating resources for
monitoring also become necessary.
A machine learning model operates within a specific context; when that context changes, the result
is bad recommendations and concept drift in the model. For example, suppose that at a
specific time a customer is looking for some gadgets. The customer's requirements change over
time, but the machine learning model still shows the same recommendations even though the
customer's expectations have changed. This is called data drift. It generally occurs
when new data is introduced or the interpretation of data changes. We can overcome this by
regularly monitoring and updating the data according to current expectations.
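Monitoring for drift can start with something as simple as comparing recent data against a reference window. The sketch below flags drift when the recent mean of a feature moves too far from the reference mean; the feature, the data, and the threshold of 2 standard deviations are assumptions, and production systems typically use more robust statistical tests.

```python
from statistics import mean, stdev


def has_drifted(reference, recent, threshold=2.0):
    """Flag drift when the recent mean is more than `threshold` reference
    standard deviations away from the reference mean."""
    shift = abs(mean(recent) - mean(reference))
    return shift > threshold * stdev(reference)


# Hypothetical feature: price (in dollars) of the items a customer views.
reference_prices = [20, 22, 19, 21, 20, 23, 18]
similar_prices = [21, 20, 22, 19]
changed_prices = [95, 110, 102, 99]

print(has_drifted(reference_prices, similar_prices))
print(has_drifted(reference_prices, changed_prices))
```

When the check fires, the usual response is to refresh the training data and retrain the model.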
Although machine learning and artificial intelligence are continuously growing in the market,
these industries are still young in comparison to others. The shortage of skilled
people is also an issue. Hence, we need professionals with in-depth knowledge of
mathematics, science, and technology to develop and manage
machine learning systems.
8. Customer Segmentation
Customer segmentation is also an important challenge while developing a machine learning
algorithm. The goal is to distinguish the customers who paid for the recommendations shown by the
model from those who don't
even check them. Hence, an algorithm is needed to recognize customer behavior and
trigger a relevant recommendation for each user based on past experience.
The machine learning process is very complex, which is another major issue faced by
machine learning engineers and data scientists. Machine learning and artificial
intelligence are relatively new technologies that are still in an experimental phase and are
continuously changing over time. Much of the work consists of trial-and-error experiments; hence
the probability of error is higher than expected. Further, the process includes analyzing the data,
removing data bias, training on data, and applying complex mathematical calculations, which
makes the procedure more complicated and quite tedious.
Data bias is also a big challenge in machine learning. Bias errors exist when certain
elements of the dataset are weighted more heavily or given more importance than others. Biased
data leads to inaccurate results, skewed outcomes, and other analytical errors. We can
resolve this by determining where the dataset is actually biased and then taking the
necessary steps to reduce the bias.
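One common step for reducing the effect of an imbalanced dataset is to weight each class inversely to its frequency during training. The sketch below applies this "balanced" heuristic to the fruit counts from the overfitting example (1000 mangoes, 1000 apples, 1000 bananas, 5000 papayas). The function name is hypothetical, and reweighting is only one of several possible remedies.

```python
def balanced_class_weights(class_counts):
    """Weight each class by total / (number_of_classes * class_count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: total / (n_classes * count)
        for label, count in class_counts.items()
    }


fruit_counts = {"mango": 1000, "apple": 1000, "banana": 1000, "papaya": 5000}
weights = balanced_class_weights(fruit_counts)
print(weights)
```

Each rare class receives a weight of 2.0 and the dominant papaya class a weight of 0.4, so every class contributes the same total weight to training.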
Lack of explainability means that the outputs cannot be easily comprehended, because the model is
programmed in specific ways to deliver results for certain conditions. This lack of explainability
reduces the credibility of machine learning algorithms.
Slow results are also very common in machine learning models. These models are highly
efficient in producing accurate results, but doing so is time-consuming. Slow
programming, excessive requirements, and overloaded data mean that accurate results take
longer than expected. This calls for continuous maintenance and monitoring of the model to keep
delivering accurate results.
Although machine learning models are intended to give the best possible outcome, if we feed
garbage data as input, the result will also be garbage. Hence, we should use relevant
features in our training sample. A machine learning model is said to be good if the training data
has a good set of features with few or no irrelevant features.
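A feature that has the same value in every row carries no information and is an easy kind of irrelevant feature to detect. The sketch below drops such features by checking their variance; the dataset, feature names, and function name are hypothetical, and real feature selection usually also considers how each feature relates to the target.

```python
from statistics import pvariance


def drop_constant_features(rows, feature_names, min_variance=0.0):
    """Keep only features whose variance across rows exceeds min_variance."""
    kept = []
    for index, name in enumerate(feature_names):
        column = [row[index] for row in rows]
        if pvariance(column) > min_variance:
            kept.append(name)
    return kept


# Hypothetical dataset: "country_code" is identical in every row.
feature_names = ["age", "income", "country_code"]
rows = [
    [25, 30000, 91],
    [32, 45000, 91],
    [47, 52000, 91],
]
kept_features = drop_constant_features(rows, feature_names)
print(kept_features)
```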