Learning Algorithms
Well-Posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.
A problem can be classified as a well-posed learning problem if it has three traits:
● Task
● Performance Measure
● Experience
Some examples that illustrate the well-posed learning problem are:
● A chess (or checkers) learning problem: T = playing the game, P = percentage of games won against opponents, E = practice games played against itself.
● A handwriting recognition learning problem: T = recognizing handwritten words, P = percentage of words correctly classified, E = a database of handwritten words with given classifications.
● A robot driving learning problem: T = driving on a highway using vision sensors, P = average distance traveled before an error, E = a sequence of images and steering commands recorded while observing a human driver.
According to Arthur Samuel, "Machine Learning enables a machine to automatically learn from data, improve performance with experience, and predict things without being explicitly programmed."
In simple words, when we feed training data to a machine learning algorithm, the algorithm produces a mathematical model, and with the help of that model the machine makes predictions and decisions without being explicitly programmed. Also, the more the machine works with the training data, the more experience it gains and the more efficient its results become.
Example: In a driverless car, the training data fed to the algorithm covers how to drive the car on highways and on busy and narrow streets, with factors like speed limits, parking, stopping at signals, etc. A logical and mathematical model is then created on that basis, and afterwards the car works according to the logical model. Also, the more data that is fed, the more efficient the output produced.
According to Tom Mitchell, "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Step 1) Choosing the Training Experience: The first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm. It is important to note that the data or experience we feed the algorithm has a significant impact on the success or failure of the model, so the training data or experience should be chosen wisely.
Below are the attributes that impact the success or failure of the model:
● The training experience should provide direct or indirect feedback regarding choices. For example, while playing chess, the training data provides feedback of the form: if this move is chosen instead of that one, the chances of success increase.
● The second important attribute is the degree to which the learner controls the sequence of training examples. For example, when training data is first fed to the machine, its accuracy is very low, but as it gains experience by playing again and again against itself or an opponent, the algorithm receives feedback and controls the chess game accordingly.
● The third important attribute is how well the training experience represents the distribution of examples over which performance will be measured. A machine learning algorithm gains experience by going through many different cases and examples; the more examples it passes through, the more its performance increases.
Step 2) Choosing the Target Function: The next important step is choosing the target function. According to the knowledge fed to the algorithm, the machine learning system chooses a NextMove function that describes what type of legal move should be taken. For example, while playing chess against an opponent, when the opponent moves, the algorithm decides which of the possible legal moves to take in order to succeed.
Step 3) Choosing a Representation for the Target Function: Once the algorithm knows all the possible legal moves, the next step is to choose a representation for selecting the optimized move, e.g., linear equations, a hierarchical graph representation, tabular form, etc. Using this representation, the NextMove function selects the target move, i.e., the move among the candidates that offers the highest success rate. For example, if the machine has four possible moves while playing chess, it chooses the optimized move that provides the most success, as the sketch after this paragraph illustrates.
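Below is a minimal sketch of such a linear representation, in the spirit of Mitchell's classic checkers example adapted to the chess setting used here. The board features and weights are illustrative assumptions, not part of any real engine.

# Hypothetical linear target-function representation: the value of a
# board is a weighted sum of hand-chosen board features.
def evaluate_board(features, weights):
    # features: numeric board descriptors, e.g. [own_pieces,
    #           opponent_pieces, own_threatened, opponent_threatened]
    # weights:  learned coefficients; weights[0] is the bias term
    bias, coeffs = weights[0], weights[1:]
    return bias + sum(w * x for w, x in zip(coeffs, features))

# Example call with assumed feature values and weights
print(evaluate_board([16, 15, 2, 3], [0.0, 1.0, -1.0, -0.5, 0.5]))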
Step 4) Choosing a Function Approximation Algorithm: An optimized move cannot be chosen from the training data alone. The model has to work through a set of examples; through these examples it approximates which steps to choose, and the machine then receives feedback on them. For example, when training data for playing chess is fed to the algorithm, the machine will initially fail or succeed, and from each failure or success it estimates which step should be chosen on the next move and what its success rate is. One common update rule for this step is sketched below.
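A minimal sketch of one common choice here, the LMS (least mean squares) update rule, follows. The learning rate and the training value v_train are illustrative assumptions; the rule nudges each weight so the predicted value moves toward the value observed from experience.

# Hypothetical LMS weight update for the linear evaluation above
def lms_update(weights, features, v_train, lr=0.01):
    x = [1.0] + features                              # prepend bias input
    v_hat = sum(w * xi for w, xi in zip(weights, x))  # current prediction
    error = v_train - v_hat                           # training-value error
    return [w + lr * error * xi for w, xi in zip(weights, x)]

weights = [0.0, 1.0, -1.0, -0.5, 0.5]
weights = lms_update(weights, [16, 15, 2, 3], v_train=2.0)
print(weights)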
Step 5) Final Design: The final design is created at the end, after the system has gone through many examples, failures and successes, and correct and incorrect decisions, learning what the next step should be. Example: IBM's Deep Blue, an intelligent chess computer, won against the chess expert Garry Kasparov in 1997 and became the first computer to beat a reigning world chess champion.
Learning Algorithms
Learning algorithms are the most popular and basic algorithms in AI. They can adapt to the problem environment. As shown in Figure 1, we will go through some of the main learning-style algorithms.
Figure 1. Learning algorithms.
A. Supervised Learning
In supervised learning, the input dataset is labeled: each training example is paired with its desired output. A mathematical model learns the mapping from inputs to outputs and uses it to predict the labels of new data. Supervised learning problems can be further grouped into classification and regression problems.
B. Unsupervised Learning
Unlike supervised learning, in unsupervised learning the input dataset is not labeled, classified, or categorized. A mathematical model tries to identify similarity in the dataset and, based on that, tries to deduce a structure present in the input data. Unsupervised learning problems can be further grouped into clustering problems and association problems. In clustering problems, we try to discover the inherent groupings in the dataset, whereas in association problems we try to generalize rules that describe large portions of the dataset. The most widely used unsupervised learning algorithms are K-means, Neural Networks, Linear Discriminant Analysis, Principal Component Analysis, and the Apriori algorithm. A small clustering sketch follows.
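As a concrete illustration, here is a minimal clustering sketch using scikit-learn's KMeans; the toy points and the choice of two clusters are assumptions made purely for demonstration.

# Cluster six 2-D points into two groups with K-means
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],      # one loose group
              [10, 2], [10, 4], [10, 0]])  # another loose group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # learned centroids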
C. Semi-Supervised Learning
In semi-supervised learning, the input dataset is a mixture of labeled and unlabeled data. Usually, the dataset has a small amount of labeled data and a large amount of unlabeled data. The mathematical model uses the labeled data to learn the structure of the unlabeled data and tries to make predictions. Semi-supervised learning problems can also be further grouped into classification and regression problems; a small sketch follows.
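The sketch below shows one way this idea looks in code, using scikit-learn's SelfTrainingClassifier, where unlabeled samples are marked with -1; the toy data and the choice of base classifier are assumptions for illustration.

# Self-training: a few labeled points plus several unlabeled ones
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X = np.array([[1.0], [1.2], [0.9], [5.0], [5.2], [4.9]])
y = np.array([0, -1, -1, 1, -1, -1])  # -1 marks unlabeled samples

base = SVC(probability=True, gamma="auto")
model = SelfTrainingClassifier(base).fit(X, y)
print(model.predict([[1.1], [5.1]]))  # labels propagated to new points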
A. Regression Algorithms
Statistical machine learning has co-opted regression methods because of their usefulness in modeling the relationships between variables. Regression algorithms can iteratively refine these relationships to predict a better outcome. Some of the most basic and popular regression algorithms are linear regression and logistic regression.
i. Linear Regression
Linear regression has been in use for more than 200 years. When applying it, variables that are strongly correlated with each other are typically removed, and linear regression can also be used to filter noisy data out of the dataset. The main goal of linear regression is to minimize the error of the mathematical model and make the prediction more accurate. A minimal fitting sketch follows.
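Here is a minimal linear-regression sketch; the toy data (roughly y = 2x + 1 with a little noise) is an assumption for illustration.

# Fit a line to noisy points and inspect the learned parameters
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])  # approximately 2x + 1

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # close to [2.0] and 1.0
print(model.predict([[6]]))           # close to 13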
This section discusses some major practical issues in Machine Learning, their business implications, and how we can overcome them. So let's start with a quick introduction to Machine Learning.
What is Machine Learning?
Machine Learning is defined as the study of computer algorithms that allow computer software to improve automatically through past experience and training data.
It is a branch of Artificial Intelligence and computer science that helps build models based on training data to make predictions and decisions without being explicitly programmed. Machine Learning is used in various applications such as email filtering, speech recognition, computer vision, self-driving cars, Amazon product recommendations, etc.
Commonly used Algorithms in Machine Learning
Machine Learning is the study of learning algorithms that use past experience to make future decisions. Although Machine Learning has a variety of models, here is a list of the machine learning algorithms most commonly used by data scientists and professionals today (a shared-interface sketch follows the list):
o Linear Regression
o Logistic Regression
o Decision Tree
o Bayes Theorem and Naïve Bayes Classification
o Support Vector Machine (SVM) Algorithm
o K-Nearest Neighbor (KNN) Algorithm
o K-Means
o Gradient Boosting algorithms
o Dimensionality Reduction Algorithms
o Random Forest
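One practical reason these algorithms are so widely used is that, in scikit-learn, most of them share the same fit/predict interface. The sketch below demonstrates this on an assumed synthetic dataset.

# Three of the listed algorithms, trained through one common interface
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0),
              KNeighborsClassifier()):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))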
Common issues in Machine Learning
Although machine learning is used in every industry and helps organizations make more informed, data-driven choices that are more effective than classical methodologies, it still has many problems that cannot be ignored. Here are some common issues in Machine Learning that professionals face when building ML skills and creating an application from scratch.
1. Inadequate Training Data
The major issue that arises while using machine learning algorithms is the lack of quality as well as quantity of data. Although data plays a vital role in the processing of machine learning algorithms, many data scientists report that inadequate data, noisy data, and unclean data severely hamper machine learning algorithms. For example, a simple task may require thousands of samples, while an advanced task such as speech or image recognition may need millions of sample data examples. Further, data quality is also important for the algorithms to work as intended, yet poor data quality is common in machine learning applications. Data quality can be affected by factors such as the following:
o Noisy data: responsible for inaccurate predictions that affect decisions as well as accuracy in classification tasks (see the sketch after this list).
o Incorrect data: also responsible for faulty results in machine learning models; hence, incorrect data may also affect the accuracy of the results.
o Generalizing of output data: sometimes generalizing output data becomes complex, which results in comparatively poor future actions.
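The sketch referenced above makes the effect of noisy data concrete: flipping an assumed 30% of the training labels on a synthetic dataset visibly lowers test accuracy.

# Compare a model trained on clean labels with one trained on noisy labels
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
noisy = y_tr.copy()
flip = rng.random(len(noisy)) < 0.3   # corrupt 30% of the labels
noisy[flip] = 1 - noisy[flip]

clean_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
noisy_acc = LogisticRegression(max_iter=1000).fit(X_tr, noisy).score(X_te, y_te)
print(clean_acc, noisy_acc)  # the noisy model typically scores lower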
2. Poor quality of data
As we have discussed above, data plays a significant role in machine learning, and it must be of good quality. Noisy data, incomplete data, inaccurate data, and unclean data lead to less accuracy in classification and low-quality results. Hence, poor data quality is another major common problem in processing machine learning algorithms.
3. Non-representative training data
To make sure our training model generalizes well, we have to ensure that the sample training data is representative of the new cases to which we need to generalize. The training data must cover the cases that have already occurred as well as those that are occurring.
Further, if we use non-representative training data in the model, it results in less accurate predictions. A machine learning model is said to be ideal if it predicts well for generalized cases and provides accurate decisions. If there is too little training data, there will be sampling noise in the model; the result is called a non-representative training set, it won't be accurate in its predictions, and it will be biased toward one class or group.
Hence, we should use representative data in training to protect against bias and make accurate predictions without any drift, for example via the stratified split sketched below.
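The stratified-split sketch below is one simple way to keep the training set representative: the split preserves the class proportions of the full dataset. The imbalanced toy data is an assumption for illustration.

# A stratified split keeps rare classes proportionally represented
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)  # 90/10 class imbalance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                          random_state=0)
print(np.bincount(y_tr) / len(y_tr))  # class mix matches the full set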
4. Overfitting and Underfitting
Overfitting:
Overfitting is one of the most common issues faced by machine learning engineers and data scientists. Whenever a machine learning model is trained on a huge amount of data, it starts capturing noise and inaccurate patterns from the training data set, which negatively affects the performance of the model. Consider a simple example where the training data contains 1,000 mangoes, 1,000 apples, 1,000 bananas, and 5,000 papayas. There is then a considerable probability of identifying an apple as a papaya, because we have a massive amount of biased data in the training set; hence the prediction is negatively affected. A common cause of overfitting is the use of highly flexible non-linear methods, which can build unrealistic data models; one way to reduce it is to use simpler linear and parametric algorithms in the machine learning models.
Methods to reduce overfitting (a regularization sketch follows this list):
o Increase the amount of training data in the dataset.
o Reduce model complexity by selecting a simpler model with fewer parameters.
o Apply Ridge regularization or Lasso regularization.
o Stop training early (early stopping).
o Reduce the noise.
o Reduce the number of attributes in the training data.
o Constrain the model.
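The sketch below illustrates two of these fixes, Ridge (L2) and Lasso (L1) regularization, which constrain the model's weights; the alpha values are assumed defaults one would tune in practice.

# Regularized linear models on a noisy, high-dimensional toy problem
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=50, noise=10.0,
                       random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)  # shrinks all weights toward zero
lasso = Lasso(alpha=1.0).fit(X, y)  # drives some weights exactly to zero
print(sum(abs(w) < 1e-6 for w in lasso.coef_), "weights zeroed by Lasso")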
Underfitting:
Underfitting is just the opposite of overfitting. Whenever a machine learning model is trained on too little data, it produces incomplete and inaccurate predictions and destroys the accuracy of the machine learning model.
Underfitting occurs when our model is too simple to capture the underlying structure of the data, just like an undersized pair of pants. This generally happens when we have limited data in the dataset and try to build a linear model with non-linear data. In such scenarios, the model lacks the complexity it needs, its rules become too simple for the dataset, and it starts making wrong predictions as well.
Methods to reduce underfitting (a sketch follows this list):
o Increase model complexity.
o Remove noise from the data.
o Train on more and better features.
o Reduce the constraints on the model.
o Increase the number of epochs to get better results.
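The sketch below illustrates the first fix, increasing model complexity: a plain linear fit underfits quadratic data, while adding polynomial features captures the curve. The toy data is an assumption for illustration.

# Underfitting fixed by adding polynomial features
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2                    # a non-linear relationship

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LinearRegression()).fit(X, y)
print(linear.score(X, y), poly.score(X, y))  # poly fits far better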
5. Monitoring and maintenance
As we know, generalized output data is mandatory for any machine learning model; hence, regular monitoring and maintenance become compulsory. Different actions yield different results as the data changes, so editing the code, as well as allocating resources to monitor the model, also becomes necessary.
6. Getting bad recommendations
A machine learning model operates in a specific context, and when that context shifts, the model can produce bad recommendations because of concept drift. Consider an example where, at a specific time, a customer is looking for some gadgets; the customer's requirements change over time, yet the machine learning model keeps showing the same recommendations even though the customer's expectations have changed. This phenomenon is called data drift. It generally occurs when new data is introduced or the interpretation of data changes. However, we can overcome this by regularly monitoring and updating the data according to expectations; one simple drift check is sketched below.
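One rough way to monitor for drift is to compare the distribution of a live feature against its training distribution with a two-sample KS test, as in this sketch. The synthetic data and the 0.05 threshold are assumptions for illustration.

# Flag a feature whose live distribution has shifted from training
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
live_feature = rng.normal(loc=0.5, scale=1.0, size=1000)  # drifted mean

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.05:
    print("Possible drift detected; consider retraining the model.")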
7. Lack of skilled resources
Although Machine Learning and Artificial Intelligence are continuously growing in the market, these industries are still young in comparison to others. The absence of skilled resources in the form of manpower is also an issue. Hence, we need people with in-depth knowledge of mathematics, science, and technology for developing and managing machine learning products.
8. Customer Segmentation
Customer segmentation is also an important issue while developing a machine learning algorithm: we need to identify the customers who acted on the recommendations shown by the model and those who did not even check them. Hence, an algorithm is necessary to recognize customer behavior and trigger relevant recommendations for the user based on past experience.
9. Process Complexity of Machine Learning
The machine learning process is very complex, which is another major issue faced by machine learning engineers and data scientists. Machine Learning and Artificial Intelligence are very new technologies, still in an experimental phase and continuously changing over time. The majority of the work proceeds by trial and error, so the probability of error is higher than expected. Further, the process also includes analyzing the data, removing data bias, training the data, and applying complex mathematical calculations, all of which make the procedure more complicated and quite tedious.
10. Data Bias
Data bias is also a big challenge in Machine Learning. These errors exist when certain elements of the dataset are weighted more heavily or given more importance than others. Biased data leads to inaccurate results, skewed outcomes, and other analytical errors. However, we can resolve this error by determining where the data is actually biased in the dataset and then taking the necessary steps to reduce it.
Methods to remove data bias (a class-balance sketch follows this list):
o Research more for customer segmentation.
o Be aware of your general use cases and potential outliers.
o Combine inputs from multiple sources to ensure data diversity.
o Include bias testing in the development process.
o Analyze data regularly and keep tracking errors to resolve them easily.
o Review the collected and annotated data.
o Use multi-pass annotation such as sentiment analysis, content
moderation, and intent recognition.
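As one concrete instance of these checks, the sketch below inspects class balance and computes reweighting factors so an underrepresented group is not drowned out during training; the toy labels are an assumption.

# Detect imbalance and compute balancing weights per class
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 900 + [1] * 100)  # heavily imbalanced labels
classes = np.unique(y)
weights = compute_class_weight("balanced", classes=classes, y=y)
print(dict(zip(classes, weights)))   # minority class gets a larger weight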
11. Lack of Explainability
This basically means that the outputs cannot be easily comprehended, as the model is programmed in specific ways to deliver outputs under certain conditions. Hence, a lack of explainability is also found in machine learning algorithms, which reduces the credibility of the algorithms.
12. Slow implementations and results
This issue is also very commonly seen in machine learning models. Machine learning models can be highly efficient at producing accurate results but are often time-consuming. Slow programs, excessive requirements, and overloaded data take more time than expected to provide accurate results. This requires continuous maintenance and monitoring of the model to deliver accurate results.
13. Irrelevant features
Although machine learning models are intended to give the best possible outcome, if we feed garbage data as input, the result will also be garbage. Hence, we should use relevant features in our training sample. A machine learning model is said to be good if the training data has a good set of features, with few to no irrelevant ones; one standard remedy is sketched below.
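The feature-selection sketch below shows one standard way to drop irrelevant features: univariate selection keeps only the k most informative columns. The synthetic dataset, which deliberately mixes informative and uninformative features, is an assumption.

# Keep the five most informative of twenty features
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

selector = SelectKBest(f_classif, k=5).fit(X, y)
print(selector.get_support(indices=True))  # indices of the kept features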
Conclusion
An ML system doesn't perform well if the training set is too small or if the data is not generalized, is noisy, or is corrupted with irrelevant features. We went through some of the basic challenges faced by beginners while practicing machine learning. Machine learning is set to bring a big transformation in technology. It is one of the most rapidly growing technologies, used in medical diagnosis, speech recognition, robotic training, product recommendations, video surveillance, and more. This continuously evolving domain offers immense job satisfaction, excellent opportunities, global exposure, and high salaries. It is a high-risk, high-return technology. Before starting your machine learning journey, make sure you carefully examine the challenges mentioned above. To learn this fantastic technology, you need to plan carefully, stay patient, and maximize your efforts. Once you win this battle, you can conquer the future of work and land your dream job!
Difference Between Data Science and Machine Learning
S.No | Data Science | Machine Learning
1. | Data Science is a field about processes and systems to extract insights from structured and semi-structured data. | Machine Learning is a field of study that gives computers the capability to learn without being explicitly programmed.
2. | Needs the entire analytics universe. | A combination of machine learning and data science.
3. | Involves many data science operations, that is, data gathering, data cleaning, data manipulation, etc. | It is of three types: unsupervised learning, reinforcement learning, and supervised learning.
4. | Example: Netflix uses Data Science technology. | Example: Facebook uses Machine Learning technology.