ML Lecture - 1
Lecture notes
Machine learning, a subset of artificial intelligence (AI), aims to comprehend the structure of data and fit it into models that can be understood and utilized by humans.
In traditional computing, algorithms consist of explicitly programmed instructions used for calculations or problem solving. Machine learning algorithms, on the other hand, enable computers to learn from data inputs and employ statistical analysis to generate outputs within a specific range.
What is Machine Learning?
• Machine learning: the study of algorithms that improve their performance at some task with experience.
• Optimize a performance criterion using example data or past experience.
• Role of statistics: inference from a sample.
• Role of computer science: efficient algorithms to
  • solve the optimization problems to learn models,
  • learn models for unknown and changing worlds, and
  • represent and evaluate the model for inference.
Machine Learning Methods
• Machine learning methods are categorized by how learning is received or how feedback is given to the developed system. The two most widely used methods are:
• supervised learning
• unsupervised learning
Machine Learning - Categories
•Initially, researchers started out with supervised learning. This is the case of the housing price prediction discussed earlier.
•This was followed by unsupervised learning, where the machine is made to learn on its own without any supervision.
•Scientists then discovered that it may be a good idea to reward the machine when it does the job the expected way, and thus came reinforcement learning.
•Soon, the available data became so humongous that the conventional techniques developed so far failed to analyze this big data and provide predictions.
•Thus came deep learning, in which the human brain is simulated in Artificial Neural Networks (ANNs) created in our binary computers.
•The machine now learns on its own using the high computing power and huge memory resources that are available today.
•It is now observed that deep learning has solved many previously unsolvable problems.
•The technique has been further advanced by giving incentives to deep learning networks as rewards, and from this finally comes Deep Reinforcement Learning.
• Supervised learning involves training algorithms using example inputs and labeled outputs provided by humans. The algorithm learns by comparing its actual output with the labeled outputs, identifying errors, and modifying the model accordingly. It uses patterns to predict label values for unlabeled data. For example, an algorithm trained on labeled images of sharks and oceans can later classify unlabeled shark and ocean images correctly.
• Supervised learning is commonly used for predicting future events based on historical data or for filtering spam emails.
Regression
In supervised learning, you give the computer concrete known examples. You say that for a given feature value x1 the output is y1, for x2 it is y2, for x3 it is y3, and so on. Based on this data, you let the computer figure out an empirical relationship between x and y.
Once the machine is trained in this way with a sufficient number of data points, now you would ask the machine to
predict Y for a given X. Assuming that you know the real value of Y for this given X, you will be able to deduce whether the
machine’s prediction is correct.
Thus, you will test whether the machine has learned by using the known test data. Once you are satisfied that the
machine is able to do the predictions with a desired level of accuracy (say 80 to 90%) you can stop further training the
machine.
Now, you can safely use the machine to do the predictions on unknown data points, or ask the machine to predict Y for a
given X for which you do not know the real value of Y. This training comes under the regression that we talked about
earlier.
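As a concrete sketch of this train/test/predict cycle, here is a minimal Python example using scikit-learn. The data, the 3x + 2 relationship, and all parameter choices are invented purely for illustration.

```python
# A minimal sketch of the regression workflow described above, assuming
# scikit-learn is available. The data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Known examples: for each feature value x, the output y is roughly 3x + 2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))          # feature values x1, x2, ...
y = 3 * X.ravel() + 2 + rng.normal(0, 1, 200)  # outputs y1, y2, ... with noise

# Hold out some known data points to test whether the machine has learned.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)  # learn the x-y relationship

# R^2 score on held-out data; if it meets the desired level, stop training.
print("test score:", model.score(X_test, y_test))

# Predict y for a new x whose true value we do not know.
print("prediction for x=4.2:", model.predict([[4.2]]))
```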
Classification
You may also use machine learning techniques for classification problems. In classification problems, you classify objects of a similar nature into a single group. For example, in a set of 100 students, you may like to group them into three groups based on their heights - short, medium, and tall. Measuring the height of each student, you will place each one in the proper group.
Now, when a new student comes in, you will put them in the appropriate group by measuring their height. By following the principles of regression training, you will train the machine to classify a student based on this feature – the height. When the machine learns how the groups are formed, it will be able to classify any unknown new student correctly. Once again, you would use the test data to verify that the machine has learned your technique of classification before putting the developed model into production.
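A minimal sketch of this height-grouping idea in Python, assuming scikit-learn is available; the heights, group labels, and the choice of a k-nearest-neighbors classifier are all illustrative assumptions.

```python
# Classify students into height groups; the data is invented for illustration.
from sklearn.neighbors import KNeighborsClassifier

heights = [[150], [155], [160], [165], [170], [175], [180], [185]]
groups  = ["short", "short", "short", "medium", "medium", "medium", "tall", "tall"]

clf = KNeighborsClassifier(n_neighbors=3).fit(heights, groups)

# Classify a new student by their height, as described above.
print(clf.predict([[172]]))   # -> ['medium']
```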
Supervised learning is where AI really began its journey. This technique has been applied successfully in several cases. You have used this model while doing handwriting recognition on your machine. Several algorithms have been developed for supervised learning. You will learn about them in the following chapters.
• Unsupervised learning deals with unlabeled data, allowing the learning algorithm to find commonalities among the input data. Unsupervised learning is valuable because unlabeled data is more abundant than labeled data. It aims to discover hidden patterns within a dataset or to automatically learn the representations needed for data classification.
It is often used for analyzing transactional data or detecting anomalies in credit card transactions. It can also power recommender systems. For example, an unsupervised learning algorithm analyzing customer purchase data might identify a group of women buying unscented soaps as likely to be pregnant, enabling targeted marketing campaigns.
In unsupervised learning, we do not specify a target variable to the machine; rather, we ask the machine “What can you tell me about X?”. More specifically, we may ask questions such as, given a huge data set X, “What are the five best groups we can make out of X?” or “What features occur together most frequently in X?”. To arrive at answers to such questions, the number of data points that the machine requires to deduce a strategy can be very large. In the case of supervised learning, the machine can be trained with even a few thousand data points. In the case of unsupervised learning, however, the number of data points reasonably needed for learning starts at a few million. These days data is generally abundantly available, and it ideally requires curating; however, with the amount of data continuously flowing through a social media network, curation is in most cases an impossible task.
The following figure shows the boundary between the yellow and red dots as determined by unsupervised machine learning. You can see clearly that the machine would be able to determine the class of each of the black dots with fairly good accuracy.
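As a hedged sketch of this idea, the following Python example uses k-means clustering from scikit-learn on synthetic two-dimensional data; the two blobs stand in for the red and yellow dots, and all parameters are illustrative.

```python
# Unsupervised grouping with k-means: we never tell the machine the "right"
# labels -- we only ask what groups exist in X. Data is synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two blobs of points, standing in for the red and yellow dots in the figure.
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(5, 1, (100, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Assign a new, unlabeled point (a "black dot") to a discovered group.
print(kmeans.predict([[4.5, 5.2]]))
```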
Reinforcement Learning
Consider training a pet dog: we train our pet to bring a ball to us. We throw the ball a certain distance and ask the dog to fetch it back to us. Every time the dog does this right, we reward the dog. Slowly, the dog learns that doing the job right earns a reward, and then the dog starts doing the job the right way every time in the future. Exactly this concept is applied in the “reinforcement” type of learning. The technique was initially developed for machines to play games. The machine is given an algorithm to analyze all possible moves at each stage of the game. The machine may select one of the moves at random. If the move is right, the machine is rewarded; otherwise it may be penalized. Slowly, the machine starts differentiating between right and wrong moves and, after several iterations, learns to solve the game puzzle with better accuracy. The machine’s chances of winning improve as it plays more and more games.
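The following is a minimal sketch of tabular Q-learning on an invented five-state corridor task; the states, rewards, and hyperparameters are assumptions made purely for illustration, not part of any particular game.

```python
# Tabular Q-learning on a toy corridor: the agent starts in state 0 and is
# rewarded only on reaching state 4. All parameters are illustrative.
import random

n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != 4:                    # state 4 is the goal
        # Mostly pick the best-known move, sometimes explore at random.
        a = random.randrange(n_actions) if random.random() < epsilon \
            else max(range(n_actions), key=lambda x: Q[s][x])
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0   # the reward, like the dog's treat
        # Q-learning update: nudge Q toward reward plus discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print("learned action per state:",
      [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(4)])
```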
Deep Learning
Artificial neural networks have been successfully applied in solving problems of computer vision, speech recognition, natural language processing, bioinformatics, drug design, medical image analysis, and games. There are several other fields in which deep learning is actively applied. Deep learning requires huge processing power and humongous data, which is generally easily available these days.
We will talk about deep learning in more detail in the coming chapters.
Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) combines the techniques of both deep learning and reinforcement learning. Reinforcement learning algorithms like Q-learning are now combined with deep learning to create powerful DRL models. The technique has met with great success in the fields of robotics, video games, finance, and healthcare. Many previously unsolvable problems are now being solved by creating DRL models. There is a lot of research going on in this area, and it is being actively pursued by industry.
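A full DRL model involves replay buffers, target networks, and deep networks, all beyond this introduction. The sketch below shows only the core idea, replacing the Q-table with a learned function of state features; the linear approximator and every name here are simplifying assumptions, not a production DQN.

```python
# Core idea behind DQN, sketched with a linear approximator in place of a
# deep network: Q(s, a) = w[a] . phi(s). Purely illustrative.
import numpy as np

n_features, n_actions = 4, 2
w = np.zeros((n_actions, n_features))   # one weight vector per action
alpha, gamma = 0.01, 0.9

def q_values(phi):
    return w @ phi                       # estimated Q(s, a) for every action

def td_update(phi, a, r, phi_next, done):
    # Same temporal-difference target as tabular Q-learning...
    target = r if done else r + gamma * np.max(q_values(phi_next))
    # ...but the "table write" becomes a gradient step on the weights.
    w[a] += alpha * (target - q_values(phi)[a]) * phi
```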
So far, you have had a brief introduction to various machine learning models; now let us explore the algorithms available under these models in slightly more depth.
Supervised Learning
• As previously mentioned, it is a widely used and successful approach in the field. In this chapter, we will delve into supervised learning in greater detail and explore several popular algorithms within this category.
Supervised learning is employed when we aim to predict a specific outcome based on given
inputs, and we possess examples of input/output pairs. Using these pairs, known as the training
set, we construct a machine learning model. The ultimate objective is to make accurate
predictions for new, unseen data. While supervised learning often requires human effort to create
the training set, it subsequently automates and frequently expedites a task that would otherwise
be arduous or impractical.
Classification and Regression
There are two major types of supervised machine learning problems, called classification and regression.
Classification
Classification is defined as the systematic arrangement of objects into groups or categories according to fixed criteria. It is part of a fundamental pre-number learning concept. Comparing items according to similarities and differences falls under classification. There are various aspects that we can teach kids with the help of classification.
For example, you can classify the apples into one category, the bananas into another, and so on. Similarly, geometric shapes can be classified as triangles, quadrilaterals, and so on. Let us understand this with another example: if you are asked to identify the relation between the given pairs on either side of a :: symbol and to find the missing figure from the four options given, can you do it?
• In classification tasks, the objective is to predict a class label from a predefined set of possibilities. In Chapter 1, we used the example of classifying irises into three different species.
• Classification is sometimes separated into binary classification, which is the special case of distinguishing between exactly two classes, and multiclass classification, which is classification between more than two classes. You can think of binary classification as trying to answer a yes/no question. Classifying emails as either spam or not spam is an example of a binary classification problem. In this binary classification task, the yes/no question being asked would be “Is this email spam?”
In binary classification we often speak of one class being the positive class and the other class being the negative class. Here, positive doesn’t represent having benefit or value, but rather what the object of the study is. So, when looking for spam, “positive” could mean the spam class. Which of the two classes is called positive is often a subjective matter, specific to the domain.
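As a hedged illustration of a binary classification task, the following sketch treats spam detection as the yes/no question above, using scikit-learn; the tiny email dataset is invented, and spam is taken as the positive class.

```python
# Binary classification: spam vs. not spam on a toy, invented dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money claim prize", "project report attached"]
labels = ["spam", "not spam", "spam", "not spam"]

# Turn words into counts, then fit a simple Naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(emails, labels)
print(clf.predict(["claim your free prize"]))   # -> ['spam']
```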
For regression tasks, the goal is to predict a continuous number, or a floating‐point number in
programming terms (or real number in mathematical terms). Predicting a person’s annual income
from their education, their age, and where they live is an example of a regression task. When
predicting income, the predicted value is an amount, and can be any number in a given range.
Another example of a regression task is predicting the yield of a corn farm given attributes such as
previous yields, weather, and number of employees working on the farm. The yield again can be an
arbitrary number.
An easy way to distinguish between classification and regression tasks is to ask whether there is some kind of continuity
in the output. If there is continuity between possible outcomes, then the problem is a regression problem. Think about
predicting annual income. There is a clear continuity in the output. Whether a person makes $40,000 or $40,001 a year
does not make a tangible difference, even though these are different amounts of money; if our algorithm predicts
$39,999 or $40,001 when it should have predicted $40,000, we don’t mind that much.
By contrast, for the task of recognizing the language of a website (which is a classification problem), there is no matter of degree. A website is in one language, or it is in another. There is no continuity between languages, and there is no language that is between English and French.
Generalization, Overfitting, and Underfitting
• In supervised learning, we want to build a model on the training data and then be able to make accurate predictions on new, unseen data that has the same characteristics as the training set that we used. If a model is able to make accurate predictions on unseen data, we say it is able to generalize from the training set to the test set. We want to build a model that is able to generalize as accurately as possible.
• Usually we build a model in such a way that it can make accurate predictions on
the training set. If the training and test sets have enough in common, we expect the
model to also be accurate on the test set. However, there are some cases where this
can go wrong. For example, if we allow ourselves to build very complex models,
we can always be as accurate as we like on the training set.
• Let’s take a look at a made-up example to illustrate this point. Say a novice data scientist wants to predict whether a customer will buy a boat, given records of previous boat buyers and customers who we know are not interested in buying a boat. The goal is to send out promotional emails to people who are likely to actually make a purchase, but not bother those customers who won’t be interested.
• After looking at the data for a while, our novice data scientist comes up with the following rule: “If the customer is older than 45, and has less than 3 children or is not divorced, then they want to buy a boat.” When asked how well this rule of his does, our data scientist answers, “It’s 100 percent accurate!” And indeed, on the data that is in the table, the rule is perfectly accurate. There are many possible rules we could come up with that would explain perfectly if someone in this dataset wants to buy a boat. No age appears twice in the data, so we could say people who are 66, 52, 53, or 58 years old want to buy a boat, while all others don’t. While we can make up many rules that work well on this data, remember that we are not interested in making predictions for this dataset; we already know the answers for these customers. We want to know if new customers are likely to buy a boat. We therefore want to find a rule that will work well for new customers, and achieving 100 percent accuracy on the training set does not help us there. We might not expect that the rule our data scientist came up with will work very well on new customers. It seems too complex, and it is supported by very little data. For example, the “or is not divorced” part of the rule hinges on a single customer.
The only measure of whether an algorithm will perform well on new data is the evaluation on the test set. However, intuitively we expect simple models to generalize better to new data. If the rule was “People older than 50 want to buy a boat,” and this would explain the behavior of all the customers, we would trust it more than the rule involving children and marital status in addition to age. Therefore, we always want to find the simplest model. Building a model that is too complex for the amount of information we have, as our novice data scientist did, is called overfitting. Overfitting occurs when you fit a model too closely to the particularities of the training set and obtain a model that works well on the training set but is not able to generalize to new data. On the other hand, if your model is too simple—say, “Everybody who owns a house buys a boat”—then you might not be able to capture all the aspects of and variability in the data, and your model will do badly even on the training set. Choosing too simple a model is called underfitting.
The more complex we allow our model to be, the better we will be able to predict on the training data. However, if our
model becomes too complex, we start focusing too much on each individual data point in our training set, and the model will not generalize well to new data.
There is a sweet spot in between that will yield the best generalization performance. This is the model we want to find.
The trade-off between overfitting and underfitting is illustrated in Figure.
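One way to see this trade-off empirically is to vary a single complexity knob and compare training and test accuracy. The sketch below does this with decision trees of increasing depth on scikit-learn's built-in breast cancer dataset; the specific depth values chosen are arbitrary.

```python
# Deep trees memorize the training set (overfit); depth-1 trees underfit.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [1, 3, 5, None]:        # None = grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")
```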
Relation of Model Complexity to Dataset Size
It’s important to note that model complexity is intimately tied to the variation of inputs contained in your training dataset: the larger variety of data points your dataset contains, the more complex a model you can use without overfitting. Usually, collecting more data points will yield more variety, so larger datasets allow building more complex models. However, simply duplicating the same data points or collecting very similar data will not help.
Going back to the boat selling example, if we saw 10,000 more rows of customer data, and all of them complied
with the rule “If the customer is older than 45, and has less than 3 children or is not divorced, then they want to
buy a boat,” we would be much more likely to believe this to be a good rule than when it was developed using
only the 12 rows in Table.
Having more data and building appropriately more complex models can often work wonders for supervised learning tasks. In this book, we will focus on working with datasets of fixed sizes. In the real world, you often have the ability to decide how much data to collect, which might be more beneficial than tweaking and tuning your model.