
Subject: Machine Learning (CS-704)
Write notes on the following topics, with necessary diagrams where needed.

Machine Learning (CS-704) Notes
1. Applications and Limitations of Machine Learning
Applications:
 Healthcare: Predictive analytics for disease diagnosis,
personalized medicine, and medical imaging analysis.
 Finance: Fraud detection, algorithmic trading, risk assessment,
and customer segmentation.
 Marketing: Customer behavior analysis, recommendation
systems, and targeted advertising.
 Autonomous Vehicles: Object detection, path planning, and real-
time decision-making.
 Natural Language Processing (NLP): Sentiment analysis, chatbots,
and language translation.
 Manufacturing: Predictive maintenance, quality control, and
supply chain optimization.
Limitations:
 Data Quality: Machine learning models require high-quality,
relevant data; poor data can lead to inaccurate predictions.
 Overfitting: Models may perform well on training data but poorly
on unseen data if they are too complex.
 Interpretability: Many machine learning models (especially deep
learning) are often seen as "black boxes," making it difficult to
understand their decision-making process.
 Computational Resources: Training complex models can require
significant computational power and memory.
 Bias: If training data is biased, the model will likely produce biased
results, leading to unfair or unethical outcomes.

2. Regression and Classification


Regression:
 Definition: A type of predictive modeling technique that estimates
the relationships among variables. It predicts a continuous output.
 Example: Predicting house prices based on features like size,
location, and number of bedrooms.
Classification:
 Definition: A type of predictive modeling that assigns labels to
instances based on input features. It predicts categorical outputs.
 Example: Email spam detection, where emails are classified as
"spam" or "not spam."
Diagram:
[Input Features] --> [Model] --> [Output: Continuous (Regression) or Categorical (Classification)]
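To make the contrast concrete, here is a minimal sketch (assuming scikit-learn is available; the features and targets are invented toy data): the same kind of input features feed a regressor, which outputs a continuous value, and a classifier, which outputs a class label.

```python
# Hypothetical sketch: same inputs, two task types (data is invented).
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1200, 2], [1500, 3], [2000, 4], [900, 1]]   # e.g. house size, bedrooms

# Regression: continuous target (price, in some assumed unit)
reg = LinearRegression().fit(X, [200.0, 260.0, 340.0, 150.0])
print(reg.predict([[1600, 3]]))                   # a continuous value

# Classification: categorical target (spam/not-spam would be analogous)
clf = LogisticRegression().fit(X, [0, 0, 1, 0])
print(clf.predict([[1600, 3]]))                   # a class label (0 or 1)
```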

3. Process of Machine Learning


The machine learning process can be broken down into the following
steps:
1. Data Collection: Gathering relevant data from various sources.
2. Data Preprocessing: Cleaning and preparing data for analysis
(handling missing values, normalization, etc.).
3. Feature Engineering: Selecting and transforming variables to
improve model performance.
4. Model Selection: Choosing the appropriate machine learning
algorithm based on the problem type.
5. Training: Feeding the model with training data to learn patterns.
6. Evaluation: Testing the model with validation data to assess its
performance (using metrics like accuracy, precision, etc.).
7. Hyperparameter Tuning: Adjusting model parameters to optimize
performance.
8. Deployment: Implementing the model in a production
environment for real-world use.
9. Monitoring and Maintenance: Continuously evaluating model
performance and updating it as necessary.
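The sketch below walks through several of these steps with scikit-learn on a built-in toy dataset; the preprocessing choice and the hyperparameter grid are illustrative assumptions, not prescriptions.

```python
# A minimal sketch of steps 1-7 of the ML process with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                      # 1. data collection
X_tr, X_te, y_tr, y_te = train_test_split(X, y)        # hold out evaluation data

pipe = Pipeline([
    ("scale", StandardScaler()),                       # 2. preprocessing
    ("model", LogisticRegression(max_iter=1000)),      # 4. model selection
])
grid = GridSearchCV(pipe, {"model__C": [0.1, 1, 10]})  # 7. hyperparameter tuning
grid.fit(X_tr, y_tr)                                   # 5. training
print(grid.score(X_te, y_te))                          # 6. evaluation (accuracy)
```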

4. Reinforcement Learning and Semi-Supervised Learning


Reinforcement Learning (RL):
 Definition: A type of machine learning where an agent learns to
make decisions by taking actions in an environment to maximize
cumulative reward.
 Key Components:
 Agent: The learner or decision maker.
 Environment: The context in which the agent operates.
 Actions: Choices made by the agent.
 Rewards: Feedback from the environment based on the
agent's actions.
Semi-Supervised Learning:
 Definition: A type of machine learning that uses both labeled and
unlabeled data for training. It is particularly useful when acquiring
a fully labeled dataset is expensive or time-consuming.
 Example: A model trained with a small amount of labeled images
and a large amount of unlabeled images for image classification
tasks.
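As a toy illustration of the RL loop (agent, environment, actions, rewards), here is a tabular Q-learning sketch on a hypothetical 5-state corridor where moving right eventually earns a reward; the environment and all parameters are assumptions for demonstration only.

```python
# Toy sketch: tabular Q-learning on an invented 5-state chain.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration

def greedy(s):
    m = max(Q[s])                      # best known action, ties broken randomly
    return random.choice([a for a in range(n_actions) if Q[s][a] == m])

for episode in range(200):
    s = 0                              # agent starts at the left end
    while s != n_states - 1:
        a = random.randrange(n_actions) if random.random() < eps else greedy(s)
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0    # reward from the environment
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print(Q)   # action 1 (move right) should come to dominate in every state
```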

5. Simple Linear Regression and Multiple Linear Regression


Simple Linear Regression:
 Definition: A statistical method that models the relationship
between two variables by fitting a linear equation to observed
data.
 Equation: \( Y = b_0 + b_1 X + \epsilon \)
 \( Y \): Dependent variable
 \( X \): Independent variable
 \( b_0 \): Intercept
 \( b_1 \): Slope
 \( \epsilon \): Error term
Multiple Linear Regression:
 Definition: An extension of simple linear regression that models
the relationship between one dependent variable and multiple
independent variables.
 Equation: \( Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \epsilon \)
 Here, \( X_1, X_2, \dots, X_n \) are the independent variables.
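A short sketch of fitting a multiple linear regression by ordinary least squares with NumPy; the data is made up (generated from \( Y = 1 + 2X_1 + X_2 \)), and np.linalg.lstsq recovers the coefficients that minimize the squared error.

```python
# Sketch: ordinary least squares for Y = b0 + b1*X1 + b2*X2 (invented data).
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])  # X1, X2
y = np.array([5.0, 6.0, 11.0, 12.0])        # generated from Y = 1 + 2*X1 + X2

X1 = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for b0
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
b0, b1, b2 = coef
print(b0, b1, b2)                           # approximately 1.0, 2.0, 1.0
```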

6. Linear Regression and Logistic Regression


Linear Regression:
 Purpose: Used for predicting continuous outcomes.
 Output: A continuous value.
 Assumptions: Assumes a linear relationship between the independent and dependent variables.
Logistic Regression:
 Purpose: Used for binary classification problems.
 Output: A probability value that is transformed into a binary
outcome (0 or 1).
 Equation: Uses the logistic function to model the probability:
\[ P(Y=1 \mid X) = \frac{1}{1 + e^{-(b_0 + b_1 X)}} \]
 Assumptions: Assumes a linear relationship between the independent variables and the log-odds of the dependent variable.
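A minimal sketch of the logistic link in code, with assumed (not fitted) coefficients, showing how a linear score becomes a probability and then a binary class:

```python
# Sketch of the logistic (sigmoid) function; b0, b1 are illustrative values.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = -4.0, 2.0                   # assumed coefficients for illustration
x = 2.5
p = sigmoid(b0 + b1 * x)             # P(Y=1 | X=x), about 0.73 here
print(p, 1 if p >= 0.5 else 0)       # probability, then thresholded class
```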

7. Machine Learning
Machine Learning is a subset of artificial intelligence that focuses on the
development of algorithms that allow computers to learn from and
make predictions based on data. It involves training models on data to
recognize patterns and make decisions without being explicitly
programmed for each task.

8. Different Types of Machine Learning


1. Supervised Learning: The model is trained on labeled data, where
the input-output pairs are known. Examples include regression
and classification tasks.
2. Unsupervised Learning: The model is trained on unlabeled data,
and it tries to find patterns or groupings in the data. Examples
include clustering and dimensionality reduction.
3. Semi-Supervised Learning: Combines both labeled and unlabeled
data for training, often used when labeling data is expensive.
4. Reinforcement Learning: The model learns by interacting with an
environment and receiving feedback in the form of rewards or
penalties.

9. Bias, Variance, Bias-Variance Trade-off, Regularization, Overfitting,


and Underfitting
Bias: The error due to overly simplistic assumptions in the learning algorithm. High bias can lead to underfitting.
Variance: The error due to a model's sensitivity to small fluctuations in the training data; overly complex models tend to have high variance, which can lead to overfitting.
Bias-Variance Trade-off: The balance between bias and variance is crucial for model performance. Decreasing one typically increases the other, so a good model balances the two to minimize total error on unseen data.
Regularization: Techniques used to reduce overfitting by adding a
penalty to the loss function. Common methods include L1 (Lasso) and
L2 (Ridge) regularization.
Overfitting: When a model learns the training data too well, capturing
noise and outliers, leading to poor performance on unseen data.
Underfitting: When a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test data.
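The following sketch contrasts the L2 and L1 penalties using scikit-learn's Ridge and Lasso on synthetic data where only the first feature matters; the alpha values (penalty strengths) are illustrative.

```python
# Sketch: L2 (Ridge) vs L1 (Lasso) regularization on invented data.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=50)   # only feature 0 matters

print(Ridge(alpha=1.0).fit(X, y).coef_)   # L2: shrinks all weights smoothly
print(Lasso(alpha=0.1).fit(X, y).coef_)   # L1: drives irrelevant weights to 0
```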

10. Ensemble Model


An ensemble model combines multiple individual models to produce a
better overall model. The idea is that by aggregating the predictions of
several models, the ensemble can achieve better accuracy and
robustness than any single model. Common ensemble methods include
bagging, boosting, and stacking.

11. Bagging and Boosting


Bagging (Bootstrap Aggregating):
 Definition: A technique that reduces variance by training multiple
models on different subsets of the training data (created through
bootstrapping) and averaging their predictions.
 Example: Random Forest is a popular bagging algorithm.
Boosting:
 Definition: A technique that reduces bias by sequentially training models, where each new model focuses on the errors made by the previous ones. The final prediction is a weighted sum of the predictions from all models.
 Example: AdaBoost and Gradient Boosting are common boosting
algorithms.
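A brief sketch comparing one bagging ensemble and one boosting ensemble from scikit-learn on a built-in dataset; the estimator counts are illustrative defaults, not tuned values.

```python
# Sketch: bagging (Random Forest) vs boosting (AdaBoost) on a toy dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

bagging = RandomForestClassifier(n_estimators=100)   # variance reduction
boosting = AdaBoostClassifier(n_estimators=100)      # bias reduction

print(cross_val_score(bagging, X, y).mean())
print(cross_val_score(boosting, X, y).mean())
```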

12. Support Vector Machine (SVM)


Support Vector Machine is a supervised learning algorithm used for
classification and regression tasks. It works by finding the hyperplane
that best separates the data points of different classes in a high-
dimensional space. SVM aims to maximize the margin between the
closest points of the classes (support vectors).
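A minimal SVM sketch with scikit-learn's SVC; the linear kernel and the C value (which trades margin width against misclassification) are assumptions chosen for clarity.

```python
# Sketch: a linear-kernel SVM classifier on a built-in dataset.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_.shape)   # the support vectors defining the margin
print(clf.predict(X[:3]))
```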

13. Short Notes on:


a) Clustering: An unsupervised learning technique that groups similar data points together based on their features. Common algorithms include K-means and hierarchical clustering.
b) Naïve Bayes: A family of probabilistic algorithms based on Bayes' theorem, assuming independence among predictors. It is commonly used for text classification tasks.
c) K-NN Algorithm: K-Nearest Neighbors is a simple, instance-based learning algorithm that classifies a data point based on the majority class of its K nearest neighbors in the feature space.
d) K-means Algorithm: A popular clustering algorithm that partitions
data into K distinct clusters by minimizing the variance within each
cluster.
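To make the K-NN majority vote explicit, here is a from-scratch sketch with invented 2-D points and k = 3:

```python
# Sketch: K-NN with Euclidean distance and majority vote (invented points).
from collections import Counter
import math

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.0), "B"), ((4.2, 3.9), "B"), ((3.8, 4.1), "B")]

def knn_predict(x, k=3):
    nearest = sorted(train, key=lambda p: math.dist(x, p[0]))  # nearest first
    votes = Counter(label for _, label in nearest[:k])         # k nearest labels
    return votes.most_common(1)[0][0]                          # majority class

print(knn_predict((1.1, 0.9)))   # -> "A"
```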

14. Hierarchical Agglomerative Clustering


Hierarchical Agglomerative Clustering is a bottom-up approach to
clustering where each data point starts as its own cluster. The algorithm
iteratively merges the closest pairs of clusters until a single cluster is
formed or a specified number of clusters is reached. The result can be
visualized using a dendrogram, which illustrates the merging process
and the distances at which clusters are combined.
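A small sketch with SciPy, assuming invented 2-D points: linkage performs the bottom-up merging, fcluster cuts the resulting tree at a chosen number of clusters, and dendrogram (with matplotlib available) would draw the merge hierarchy.

```python
# Sketch: agglomerative clustering with SciPy on invented points.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1, 1], [1.2, 0.9], [5, 5], [5.1, 4.8], [9, 1]])
Z = linkage(X, method="ward")                     # each row records one merge
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
print(labels)
# dendrogram(Z)  # with matplotlib, this plots the merging process
```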

15. K-Means Partitioned Clustering


K-Means Partitioned Clustering is an iterative algorithm that partitions
the dataset into K distinct clusters. The steps involved are:
1. Initialization: Randomly select K centroids from the data points.
2. Assignment: Assign each data point to the nearest centroid,
forming K clusters.
3. Update: Recalculate the centroids as the mean of all data points
assigned to each cluster.
4. Repeat: Continue the assignment and update steps until the
centroids no longer change significantly or a maximum number of
iterations is reached.
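The four steps map directly onto a short from-scratch NumPy sketch; K, the random seed, and the sample points are assumptions for illustration (empty clusters are not handled, for brevity).

```python
# Sketch: the four K-means steps implemented directly in NumPy.
import numpy as np

def kmeans(X, k=2, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]      # 1. initialization
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)                            # 2. assignment
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):                      # 4. stop on convergence
            break
        centroids = new                                      # 3. update
    return labels, centroids

X = np.array([[1, 1], [1.5, 2], [8, 8], [9, 9], [1, 0.5], [8.5, 9.5]])
print(kmeans(X, k=2))
```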

16. Precision, Recall, Accuracy, F1 Score


 Precision: The ratio of true positive predictions to the total predicted positives. It measures the accuracy of positive predictions.
\[ \text{Precision} = \frac{TP}{TP + FP} \]
 Recall: The ratio of true positive predictions to the total actual positives. It measures the ability of a model to find all relevant cases.
\[ \text{Recall} = \frac{TP}{TP + FN} \]
 Accuracy: The ratio of correctly predicted instances (both true positives and true negatives) to the total instances.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
 F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics.
\[ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
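The sketch below evaluates all four formulas from assumed confusion-matrix counts:

```python
# Sketch: the four metrics computed from illustrative TP/FP/TN/FN counts.
TP, FP, TN, FN = 40, 10, 45, 5        # assumed counts for demonstration

precision = TP / (TP + FP)                     # 40/50 = 0.80
recall    = TP / (TP + FN)                     # 40/45 ~ 0.89
accuracy  = (TP + TN) / (TP + TN + FP + FN)    # 85/100 = 0.85
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, accuracy, f1)
```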

17. Perceptron in ANN & Types of Perceptron


A perceptron is the simplest type of artificial neural network (ANN) and
serves as a binary classifier. It consists of input features, weights, a bias,
and an activation function. The perceptron computes a weighted sum of
the inputs and applies an activation function to produce an output.
Types of Perceptron:
1. Single-Layer Perceptron: Consists of a single layer of output nodes
connected directly to the input features. It can only classify
linearly separable data.
2. Multi-Layer Perceptron (MLP): Contains one or more hidden
layers between the input and output layers, allowing it to learn
complex patterns and classify non-linearly separable data.
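A from-scratch single-layer perceptron sketch, trained with the classic perceptron update rule on the linearly separable AND function; the learning rate and epoch count are arbitrary choices.

```python
# Sketch: a single-layer perceptron learning the AND function.
def step(z):
    return 1 if z >= 0 else 0        # threshold activation function

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]                     # AND truth table
w, b, lr = [0.0, 0.0], 0.0, 0.1      # weights, bias, learning rate

for epoch in range(20):
    for (x1, x2), target in zip(X, y):
        out = step(w[0] * x1 + w[1] * x2 + b)   # weighted sum + activation
        err = target - out
        w[0] += lr * err * x1                   # perceptron update rule
        w[1] += lr * err * x2
        b += lr * err

print([step(w[0] * x1 + w[1] * x2 + b) for x1, x2 in X])   # -> [0, 0, 0, 1]
```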

18. Multilayer Networks and Back Propagation


Multilayer Networks:
Multilayer networks, or Multi-Layer Perceptrons (MLPs), consist of
multiple layers of neurons, including an input layer, one or more hidden
layers, and an output layer. Each neuron in a layer is connected to every
neuron in the subsequent layer, allowing the network to learn complex
functions.
Back Propagation:
Back propagation is a supervised learning algorithm used for training
neural networks. It involves the following steps:
1. Forward Pass: Input data is passed through the network to obtain
the output.
2. Loss Calculation: The difference between the predicted output
and the actual output is calculated using a loss function.
3. Backward Pass: The error is propagated back through the
network, and the weights are updated using gradient descent to
minimize the loss.
This process is repeated for multiple iterations (epochs) until the model
converges to an optimal set of weights.
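A minimal NumPy sketch of these three steps on the XOR problem, using one hidden layer and sigmoid activations; the layer width, learning rate, and epoch count are illustrative assumptions, and convergence can depend on the random initialization.

```python
# Sketch: forward pass, loss, and backward pass for a tiny MLP on XOR.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)        # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)        # hidden -> output
sig = lambda z: 1 / (1 + np.exp(-z))

for epoch in range(5000):
    h = sig(X @ W1 + b1)                  # 1. forward pass
    out = sig(h @ W2 + b2)
    # 2. loss: squared error (out - y); 3. backward pass via the chain rule
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)   # gradient descent
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(3).ravel())               # approaches [0, 1, 1, 0]
```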
