
What is Machine Learning?

Machine Learning is a subset of artificial intelligence (AI) that focuses on learning from data to develop algorithms that can be used to make predictions.
Features of Machine Learning:
▪ Machine learning uses data to detect various patterns in a given dataset.
▪ It can learn from past data and improve automatically.
▪ It is a data-driven technology.
▪ Machine learning is similar to data mining, as both deal with huge amounts of data.
▪ Machine learning is used in self-driving vehicles, online fraud detection, face recognition, friend suggestions on Facebook, and so on.
▪ Top organizations such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interests and recommend products accordingly.
Some key points which show the importance of Machine Learning:
▪ Rapid increase in the production of data
▪ Solving complex problems that are difficult for a human
▪ Decision making in various sectors, including finance
▪ Finding hidden patterns and extracting useful information from data
TYPES OF MACHINE LEARNING
Classification of Machine Learning
Machine learning can be classified into three types:
▪ Supervised learning
▪ Unsupervised learning
▪ Reinforcement learning
SUPERVISED MACHINE LEARNING
▪ In supervised machine learning, sample labeled data are provided to the machine learning system for training, and the system then predicts the output based on the training data.
▪ The system uses the labeled data to build a model that understands the datasets and learns about each one.
▪ After training and processing are done, we test the model with sample data to see if it can accurately predict the output.
Supervised Machine Learning
▪ The mapping of input data to output data is the objective of supervised learning.
▪ Supervised learning depends on supervision; it is comparable to a student learning under the guidance of a teacher.
▪ Spam filtering is an example of supervised learning.
▪ Supervised learning is a type of machine learning in which the algorithm is trained on a labeled dataset.
▪ It learns to map input features to targets based on labeled training data.
▪ In supervised learning, the algorithm is provided with input features and corresponding output labels, and it learns to generalize from this data to make predictions on new, unseen data.
▪ Supervised learning is a subcategory of machine learning and artificial intelligence.
▪ It is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately.
▪ In supervised learning, the training data provided to the machine works as a supervisor that teaches the machine to predict the output correctly. Supervised learning is the process of providing input data as well as correct output data to the machine learning model.

▪ The aim of a supervised learning algorithm is to find a mapping function to map the input variable (x) to the output variable (y).
▪ In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.
HOW SUPERVISED ML WORKS
▪ Suppose we have a dataset of different types of shapes, including squares, rectangles, triangles, and polygons. The first step is to train the model for each shape, as in the rule sketch below.
▪ If the given shape has four sides, and all the sides are equal, it will be labelled as a square.
▪ If the given shape has three sides, it will be labelled as a triangle.
▪ If the given shape has six equal sides, it will be labelled as a hexagon.
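A minimal sketch of this rule-based labelling in Python (the function name and the side-length representation are assumptions for illustration):

def label_shape(sides):
    """Label a shape from its list of side lengths, following the rules above."""
    n = len(sides)
    if n == 4 and len(set(sides)) == 1:
        return "Square"        # four sides, all equal
    if n == 3:
        return "Triangle"      # three sides
    if n == 6 and len(set(sides)) == 1:
        return "Hexagon"       # six equal sides
    return "Unknown"

print(label_shape([2, 2, 2, 2]))           # Square
print(label_shape([3, 4, 5]))              # Triangle
print(label_shape([1, 1, 1, 1, 1, 1]))     # Hexagon

In a real supervised learning setting, the model would learn these rules from labelled examples instead of having them hand-coded.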
Steps Involved in Supervised Learning:
▪ First, determine the type of training dataset.
▪ Collect/gather the labelled training data.
▪ Split the dataset into a training set, a test set, and a validation set.
▪ Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
▪ Determine a suitable algorithm for the model, such as a support vector machine, decision tree, etc.
▪ Execute the algorithm on the training dataset. Sometimes we need the validation set to tune control parameters; it is a subset of the training data.
▪ Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.
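A minimal sketch of these steps using scikit-learn, assuming the built-in Iris dataset and a decision tree as the chosen algorithm:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Steps 1-2: gather a labelled dataset (features X, labels y)
X, y = load_iris(return_X_y=True)

# Step 3: split into training and test sets (a validation set can be
# carved out of the training portion in the same way)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Steps 4-5: choose a suitable algorithm and train it
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Step 6: evaluate accuracy on the held-out test set
y_pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))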
Types of supervised Machine learning Algorithms:
Supervised learning can be further divided into two types of problems:
Regression
▪ Regression algorithms are used when there is a relationship between the input variable and the output variable.
▪ They are used for the prediction of continuous variables, such as weather forecasting, market trends, etc.
Some popular regression algorithms that come under supervised learning:
•Linear Regression
•Regression Trees
•Non-Linear Regression
•Bayesian Linear Regression
•Polynomial Regression
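As a small illustration of the last algorithm in this list, here is a sketch of polynomial regression on synthetic data (the quadratic data and the chosen degree are assumptions for the example):

import numpy as np

# Synthetic data: a quadratic trend with noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 0.5 * x**2 - x + 1 + rng.normal(scale=0.5, size=x.shape)

# Fit a degree-2 polynomial (polynomial regression)
coeffs = np.polyfit(x, y, deg=2)
print("Fitted coefficients:", coeffs)

# Predict a continuous value for a new input
print("Prediction at x = 1.5:", np.polyval(coeffs, 1.5))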
Classification
Classification algorithms are used when the output variable is categorical, meaning the output falls into discrete classes such as Yes/No, Male/Female, or True/False.
A classic example is spam filtering.
TYPES OF CLASSIFICATION ALGORITHMS
•Random Forest
•Decision Trees
•Logistic Regression
•Support vector Machines
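A minimal sketch of binary classification with logistic regression (the synthetic two-class dataset is an assumption; any of the algorithms listed above could be swapped in):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class data, e.g. spam (1) vs. not spam (0)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))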
Advantages of Supervised learning:
• With the help of supervised learning, the model can predict the output on the basis of prior experience.
• In supervised learning, we can have an exact idea about the classes of objects.
• Supervised learning models help us solve various real-world problems such as fraud detection, spam filtering, etc.
Disadvantages of supervised learning:
• Supervised learning models are not suitable for handling very complex tasks.
• Supervised learning cannot predict the correct output if the test data differs from the training dataset.
• Training requires a lot of computation time.
• In supervised learning, we need sufficient knowledge about the classes of objects.
What is Regression?
▪ Regression is a statistical method used in machine learning to
model and analyze the relationships between a dependent
variable (output) and one or more independent variables
(inputs).

▪ It aims to predict the dependent variable’s value based on the


independent variables’ values.
Regression in Machine Learning
▪ In machine learning, regression is a type of supervised learning
in which the model learns from a dataset of input-output pairs.
▪ The model identifies patterns in the input features to predict
continuous numerical values of the output variable.
▪ Regression algorithms help solve regression problems by
finding the relationship between the data points and fitting a
regression model.
Characteristics of Regression
▪ Dependent and Independent Variables: Regression models the relationship between the dependent variable (target) and the independent variables (predictors).
▪ Regression Coefficients: These are the parameters of the
regression model that are estimated from the data.
▪ Regression Line: In linear regression, this is the line that best
fits the data points.
▪ Residuals: The differences between the predicted values and
the actual values.
• Loss Function: Measures the model’s error. Examples include
mean squared error (MSE) and mean absolute error (MAE).
• Overfitting and Underfitting: Regression models must
balance complexity and simplicity to generalize well on
unseen data.
• Regularization Techniques: Methods like ridge and lasso regression are used to avoid overfitting by penalizing large coefficients.
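A brief sketch contrasting ordinary least squares with ridge and lasso regularization (the synthetic data and the alpha values are illustrative assumptions):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic regression data with several features
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=1.0)):
    model.fit(X, y)
    # Ridge shrinks coefficients toward zero; lasso can set some exactly to zero
    print(type(model).__name__, model.coef_.round(2))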
What are Regression Algorithms?
▪ Regression algorithms are a subset of machine learning
algorithms that predict a continuous output variable based on
one or more input features.
▪ Regression aims to model the relationship between the
dependent variable (output) and one or more independent
variables (inputs).
▪ These algorithms attempt to find the best-fit line, curve, or
surface that minimizes the difference between predicted and
actual values.
Applications of Regression Algorithms
Regression algorithms are versatile tools used to predict continuous outcomes across
various domains. Here are some detailed applications:
1. Finance and Economics:
o Stock Price Prediction: Predicting future stock prices based on historical data,
market trends, and economic indicators.
o Risk Management: Estimating the risk of investment portfolios and calculating
Value at Risk (VaR).
o Economic Forecasting: Modeling economic indicators like GDP growth,
unemployment rates, and inflation trends.
o Credit Scoring: Assessing the creditworthiness of individuals or companies by
predicting default probabilities.
2. Healthcare:
o Disease Progression: Predicting the progression of diseases
such as diabetes or cancer based on patient history and
medical data.
o Patient Outcomes: Estimating patient survival rates,
recovery times, and treatment effectiveness.
o Healthcare Costs: Forecasting hospital readmission rates
and healthcare expenditures.
3. Marketing and Sales:
o Customer Lifetime Value (CLV): Estimating the total value a customer will bring to a business over the course of their relationship.
o Sales Forecasting: Predicting future sales based on historical
sales data, market conditions, and promotional activities.
o Market Response Modeling: Understanding and predicting
consumer responses to marketing campaigns and changes in
pricing.
4. Engineering and Manufacturing:
o Predictive Maintenance: Forecasting equipment failures and
maintenance needs to reduce downtime and repair costs.
5. Environmental Science:
o Weather Forecasting: Predicting weather conditions such as
temperature, rainfall, and wind speed.
o Climate Change Modeling: Estimating the impacts of climate change
on various environmental factors.
o Pollution Levels: Forecasting air and water pollution levels based on
industrial activities, traffic, and meteorological data.
6. Retail and E-commerce:
o Demand Forecasting: Predicting future product demand to
optimize inventory levels and supply chain management.
o Price Optimization: Estimating the optimal pricing strategy
to maximize revenue and profit.
7. Transportation and Logistics:
o Delivery Time Estimation: Forecasting delivery times in
logistics and supply chain operations based on various
factors, such as distance, traffic, and weather conditions.
Benefits and Drawbacks of Regression Algorithms
Advantages:
▪ Simplicity: Many regression algorithms, especially linear regression, are
easy to understand and implement.
▪ Interpretability: Regression models, particularly linear ones, provide clear
insights into the relationships between variables.
▪ Efficiency: Regression algorithms can be computationally efficient,
particularly for linear models.
▪ Versatility: Applicable to a wide range of problems across different fields.
▪ Predictive Power: Can be very accurate for predicting continuous
outcomes when the model is well-fitted.
Drawbacks:
▪ Overfitting: Complex models (e.g., polynomial regression) can
overfit the training data, capturing noise instead of the
underlying pattern.
▪ Underfitting: Simple models may underfit the data, failing to
capture important patterns.
▪ Assumptions: Many regression methods rely on assumptions
(e.g., linearity, normality, independence of errors) that may not
hold for all datasets.
▪ Sensitivity to Outliers: Outliers can heavily influence
regression models, leading to inaccurate predictions.
▪ Multicollinearity: When independent variables are highly
correlated, it can cause instability in the coefficient estimates.
▪ Scalability: Some regression techniques (e.g., neural network
regression) can become computationally expensive with large
datasets.
Linear Regression
▪ Linear Regression is an ML algorithm used for supervised learning. It predicts a dependent variable (target) based on the given independent variable(s).
▪ This regression technique models a linear relationship between a dependent variable and the given independent variables.
▪ Hence the name linear regression. It has two types: simple linear regression and multiple linear regression.
In a typical linear regression plot, the independent variable is on the X-axis and the output is on the Y-axis.
The regression line is the best-fit line for the model, and our main objective in this algorithm is to find this best-fit line, as sketched below.
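A minimal sketch of finding this best-fit line with scikit-learn (the synthetic one-feature data, roughly following y = 2x + 1, is an assumption):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data roughly following y = 2x + 1
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(scale=1.0, size=50)

model = LinearRegression().fit(X, y)
# The best-fit line: y = intercept + slope * x
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)
print("Prediction at x = 4:", model.predict([[4.0]])[0])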
Pros:
❖ The Linear Regression model is simple to implement.
❖ It has less complexity compared to other algorithms.
❖ Linear Regression may lead to over-fitting, but this can be avoided by using dimensionality reduction techniques, regularization techniques, and cross-validation.
Cons:
❖ Outliers affect this algorithm badly.
❖ A linear regression model oversimplifies real-world problems by assuming a linear relationship among the variables; hence it is not recommended for practical use cases where the relationship is clearly non-linear.
Regression Evaluation Metrics
Here are three common evaluation metrics for regression problems:
• Mean Absolute Error (MAE) is the mean of the absolute values of the errors: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$
• Mean Squared Error (MSE) is the mean of the squared errors: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
Comparing these metrics:
•MAE is the easiest to understand, because it's the average error.
•MSE is more popular than MAE, because MSE "punishes" larger errors, which tends
to be useful in the real world.
•RMSE is even more popular than MSE, because RMSE is interpretable in the "y"
units.
All of these are loss functions, because we want to minimize them.
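These metrics are straightforward to compute; here is a minimal sketch with scikit-learn and NumPy (the sample arrays are illustrative):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}")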
Assumptions We Make in a Linear Regression Model:
Given below are the basic assumptions that a linear regression
model makes regarding a dataset on which it is applied:
• Linear relationship: The relationship between the response and the feature variables should be linear.
• The linearity assumption can be tested using scatter plots.
• In such a plot, linearly related variables fall along a roughly straight line, whereas non-linearly related variables show clear curvature.
• Linear regression will therefore give better predictions when there is a genuinely linear relationship in the feature space.
• Little or no multi-collinearity: It is assumed that there is little or no
multicollinearity in the data. Multicollinearity occurs when the features
(or independent variables) are not independent of each other.
• Little or no autocorrelation: Another assumption is that there is little or no autocorrelation in the data. Autocorrelation occurs when the residual errors are not independent of each other.
• No outliers: We assume that there are no outliers in the data. Outliers
are data points that are far away from the rest of the data. Outliers can
affect the results of the analysis.
• Homoscedasticity:
• Homoscedasticity describes a situation in which the error
term (that is, the “noise” or random disturbance in the
relationship between the independent variables and the
dependent variable) is the same across all values of the
independent variables.
• In a residual plot, homoscedastic errors show a constant spread across all fitted values, while heteroscedastic errors fan out or contract as the fitted values change.
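A rough sketch of checking two of these assumptions numerically with NumPy (the synthetic data and the median-split heuristic are assumptions, not a standard diagnostic):

import numpy as np

# Synthetic data: three features and a linear target with noise
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)

# Multicollinearity check: pairwise correlations between features
print("Feature correlations:\n", np.corrcoef(X, rowvar=False).round(2))

# Fit by ordinary least squares and inspect the residuals
A = np.c_[np.ones(len(X)), X]          # add an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
residuals = y - fitted

# Homoscedasticity check (rough): residual spread should be similar
# in the low and high halves of the fitted values
low = fitted < np.median(fitted)
print("Residual std (low fitted): ", residuals[low].std().round(3))
print("Residual std (high fitted):", residuals[~low].std().round(3))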
