Machine Learning - Easy & Complete Notes
UNIT 2: DATA REPRESENTATION
1. Introduction to Data in ML
- Data = Raw facts (numbers, text, images, etc.).
- Without data, machine learning models cannot be trained.
Example: Facebook bought WhatsApp mainly to access user data and improve services.
2. Why is Data Important?
- Good data = Good model.
- Data improves AI, Automation, and Analytics.
3. Properties of Data
- Volume, Variety, Velocity, Value, Veracity, Viability, Security, Accessibility, Integrity, Usability.
4. Types of Data (Based on Structure)
- Structured: Tables (Eg: Excel files, SQL database)
- Unstructured: No fixed format (Eg: Videos, Tweets)
- Semi-structured: Mix of both (Eg: JSON files, XML)
5. Types of Data (Based on Representation)
- Numerical Data: Numbers like Age, Income.
- Categorical Data: Categories like Male/Female.
- Ordinal Data: Ordered categories like Size (S, M, L).
6. Types of Data (Based on Labelling)
- Labelled: Input + Correct Output (Eg: Dog Image + 'Dog' label)
- Unlabelled: Only Input (Eg: Dog Image without label)
7. Data to Information to Knowledge (Example)
- Data: Raw survey responses.
- Information: Summary reports.
- Knowledge: Use information to improve services.
8. How Data is Split
- Training Set: Model learns here.
- Validation Set: Model tuned here.
- Testing Set: Model checked here.
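Example (a minimal sketch using scikit-learn; the 60/20/20 ratio and the built-in Iris toy data are just illustrative choices):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                 # small built-in toy dataset

# Split off 20% for testing, then 25% of the rest for validation (60/20/20 overall).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)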
9. Advantages of Using Data
- Better Accuracy, Automation, Personalization, Cost-saving.
10. Challenges with Data
- Poor quality data, Small data, Bias, Overfitting, Privacy issues.
11. Importance of Data Preparation
- Clean data improves model predictions.
12. Data Preparation Process
- Define problem -> Collect data -> Clean -> Analyze -> Feature engineer -> Train -> Evaluate -> Deploy -> Monitor.
13. Handling Missing Data
- Fill with Mean/Median/Mode, or predict the missing values with KNN (see the sketch after the example below).
14. Example (Handling Missing Data)
import pandas as pd

df = pd.read_csv("data.csv")                 # load the dataset
print(df.isnull().sum())                     # count missing values in each column
df = df.fillna(df.mean(numeric_only=True))   # fill numeric gaps with each column's mean
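For the KNN option from point 13, a minimal sketch using scikit-learn's KNNImputer (n_neighbors=5 is just an illustrative choice; it works on the numeric columns only):
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.read_csv("data.csv")                               # same illustrative file as above
numeric_cols = df.select_dtypes(include="number").columns  # KNN imputation needs numeric data
df[numeric_cols] = KNNImputer(n_neighbors=5).fit_transform(df[numeric_cols])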
15. Visualizing Data
- Bar Chart, Pie Chart, Line Plot, Scatter Plot, Heatmap (Use Seaborn).
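Example (a minimal Seaborn sketch; data.csv and the correlation heatmap are just illustrative choices):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")                          # load the dataset to visualize
sns.heatmap(df.corr(numeric_only=True), annot=True)   # heatmap of correlations between numeric columns
plt.show()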
UNIT 3: SUPERVISED MACHINE LEARNING
1. What is Supervised Learning?
- Learning from labelled data.
- Example: Spam email detection.
2. How it Works
- Train model on inputs and correct outputs.
Example: Input = Shape with 4 equal sides -> Output = Square.
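A minimal sketch of this idea (the shape data is made up for illustration; any classifier works, a decision tree is used here):
from sklearn.tree import DecisionTreeClassifier

# Made-up labelled data: [number of sides, all sides equal? (1/0)] -> shape name
X = [[4, 1], [4, 0], [3, 1], [3, 0]]
y = ["Square", "Rectangle", "Equilateral triangle", "Triangle"]

model = DecisionTreeClassifier().fit(X, y)   # learn from inputs + correct outputs
print(model.predict([[4, 1]]))               # -> ['Square']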
3. Types of Supervised Learning
- Regression (Predict numbers)
- Classification (Predict class/label)
4. Popular Regression Algorithms
- Linear, Polynomial, Ridge, Lasso, Decision Tree, Random Forest Regression.
5. Popular Classification Algorithms
- Logistic Regression, SVM, Decision Tree, Random Forest, KNN, Neural Networks, Naive Bayes.
6. Regression vs Classification
- Regression: Predict numbers (Eg: Salary prediction).
- Classification: Predict class (Eg: Spam detection).
7. Algorithms Quick Summary
- Linear Regression: Predict house price.
- Logistic Regression: Spam detection.
- Decision Tree: Loan approval.
- Random Forest: Disease detection.
- KNN: Movie recommendation.
- SVM: Digit recognition.
- Naive Bayes: Spam filter.
8. Example: KNN
- Find the 5 nearest neighbors -> Pick the most common class among them.
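Example (a minimal scikit-learn sketch; the Iris toy data and k = 5 are just illustrative choices):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)    # vote among the 5 nearest neighbors
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))             # fraction of test samples classified correctly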
9. Example: Decision Tree
- Ask questions -> Split data -> Predict outcome.
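Example (a minimal sketch; max_depth=2 just keeps the printed tree small):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
print(export_text(tree))   # prints the questions (splits) the tree asks before predicting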
10. Example: Naive Bayes
- Uses Bayes' theorem to pick the class with the highest probability, assuming features are independent.
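Example (a minimal sketch with GaussianNB on the Iris toy data):
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)
print(nb.predict_proba(X[:1]))   # probability of each class for the first sample
print(nb.predict(X[:1]))         # the class with the highest probability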
11. Logistic Regression
- Used for binary classification; outputs a probability between 0 and 1.
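Example (a minimal sketch on scikit-learn's built-in breast-cancer data, which has two classes):
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)          # binary labels: malignant vs benign
clf = LogisticRegression(max_iter=5000).fit(X, y)   # max_iter raised so training converges
print(clf.predict_proba(X[:1]))                     # probability of each of the two classes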
12. Random Forest
- Combines multiple decision trees to improve accuracy.
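Example (a minimal sketch; 100 trees and the breast-cancer toy data are just illustrative choices):
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)   # 100 decision trees vote together
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))                                  # test-set accuracy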
13. Simple Formulae
- Linear Regression: y = mx + b
- Logistic Regression: sigmoid(z) = 1 / (1 + e^(-z)) turns the linear score (mx + b) into a probability.
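A tiny sketch of both formulae (the 1-D data is made up so that y = 2x + 1):
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4]])       # made-up inputs
y = np.array([3, 5, 7, 9])               # generated from y = 2x + 1
model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # recovers m = 2.0 and b = 1.0

def sigmoid(z):
    return 1 / (1 + np.exp(-z))          # the sigmoid used by logistic regression

print(sigmoid(np.array([-2, 0, 2])))     # ~0.12, 0.5, ~0.88 -> always between 0 and 1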