Fill in the Blanks
1. ______ is the field of study that enables computers to learn from data.
Answer: Machine Learning
2. In ______ learning, data is labeled.
Answer: Supervised
3. In ______ learning, the model explores and learns from rewards.
Answer: Reinforcement
4. ______ occurs when a model performs well on training data but poorly on test data.
Answer: Overfitting
5. ______ is the process of converting raw data into features for ML models.
Answer: Feature Engineering
6. ______ values can be filled using mean, median, or mode.
Answer: Missing
7. One-hot encoding is used to handle ______ data.
Answer: Categorical
8. Log transformation is only applicable to ______ values.
Answer: Positive
9. ______ is the process of grouping continuous values into intervals.
Answer: Binning
10. ______ and variance are two sources of error in ML models.
Answer: Bias
11. The process of analyzing datasets using statistics and visualizations is called ______.
Answer: Exploratory Data Analysis (EDA)
12. ______ is used to create a standard for comparing model performance.
Answer: Benchmarking
13. The process of converting complex features into simpler parts is called ______.
Answer: Feature Split
14. ______ is used to fill missing numerical values with mean or median.
Answer: Imputation
15. ______ values are those that deviate significantly from the rest of the data.
Answer: Outlier
16. One-hot encoding creates ______ columns for each category.
Answer: Binary
17. PCA stands for ______.
Answer: Principal Component Analysis
18. Handling missing values and outliers falls under ______.
Answer: Feature Engineering
19. Grouping numerical data into categories is called ______.
Answer: Binning
20. ______ is the first step in the feature engineering process.
Answer: Data Preparation
Match the Columns
Match the items in Column A with the correct items in Column B:
Column A                      Column B
1. Supervised Learning        A. No labeled output
2. Unsupervised Learning      B. Rewards and penalties
3. Reinforcement Learning     C. Labeled data
4. Feature Engineering        D. Converts raw data to features
5. Overfitting                E. Performs poorly on new data
Correct Answers: 1→C, 2→A, 3→B, 4→D, 5→E
Column A         Column B
1. Imputation    A. Fill missing values
2. PCA           B. Feature Extraction
3. Binning       C. Group values into intervals
4. Z-score       D. Detect Outliers
5. EDA           E. Analyze data visually/statistically
Correct Answers: 1→A, 2→B, 3→C, 4→D, 5→E
1-Mark Questions
1. What is a hyperparameter?
Answer: A setting configured before training that influences learning.
2. What is a validation set?
Answer: A dataset used to tune hyperparameters and check performance.
3. Define bias in ML.
Answer: Error from overly simple assumptions in the model.
4. What is variance in ML?
Answer: Error due to sensitivity to small data fluctuations.
5. What is imputation?
Answer: Filling missing values with suitable replacements.
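The idea behind imputation (Q5 and Q14) can be sketched in plain Python; the values below are toy data chosen for illustration:

```python
# Toy numeric column with missing entries (None stands in for NaN).
values = [10.0, None, 30.0, None, 50.0]

# Mean imputation: replace each missing entry with the mean of the
# observed values. Median or mode imputation works the same way.
observed = [v for v in values if v is not None]
mean = sum(observed) / len(observed)
imputed = [mean if v is None else v for v in values]

print(imputed)  # [10.0, 30.0, 30.0, 30.0, 50.0]
```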
6. What is the purpose of log transformation?
Answer: To normalize skewed data and reduce outlier effects.
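A minimal sketch of the log transformation on right-skewed toy data; `log1p` (log of 1 + x) is used so that zero values stay defined:

```python
import math

# Right-skewed values: the largest entry dwarfs the rest.
data = [0, 1, 10, 100, 10000]

# log1p compresses the large values far more than the small ones,
# pulling the distribution toward a similar scale.
transformed = [math.log1p(x) for x in data]

print([round(t, 2) for t in transformed])
```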
7. What does one-hot encoding do?
Answer: Converts categorical data into binary format.
8. Define outliers.
Answer: Data points significantly distant from others.
9. What is feature selection?
Answer: Picking only the most relevant features for training.
10. What is the goal of feature engineering?
Answer: Improve model accuracy and performance.
11. What is Exploratory Data Analysis (EDA)?
Answer: It involves analyzing and summarizing datasets using statistical and visualization
techniques.
12. What is Benchmarking in ML?
Answer: Setting a baseline for model accuracy to compare future improvements.
13. What is Feature Split?
Answer: Dividing a complex feature into simpler parts to improve learning.
14. Why do we use Binning in ML?
Answer: To reduce noise and overfitting by grouping values into intervals.
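Binning can be sketched as mapping each continuous value to an interval label; the age cut-offs below are arbitrary and only for illustration:

```python
# Interval edges and labels chosen purely for illustration.
bins = [(0, 18, "child"), (18, 40, "adult"), (40, 120, "senior")]

def to_bin(age):
    # Return the label of the half-open interval [lo, hi) containing age.
    for lo, hi, label in bins:
        if lo <= age < hi:
            return label
    return "unknown"

ages = [5, 17, 18, 35, 64]
print([to_bin(a) for a in ages])  # ['child', 'child', 'adult', 'adult', 'senior']
```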
15. How are Outliers identified?
Answer: Using Z-score or standard deviation methods.
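The Z-score method from this answer can be sketched with the standard library; the data and the threshold of 2 are illustrative choices:

```python
import statistics

data = [10, 12, 11, 13, 12, 95]  # 95 is an obvious outlier

mean = statistics.mean(data)
stdev = statistics.pstdev(data)  # population standard deviation

# Flag points whose z-score exceeds 2 in absolute value.
outliers = [x for x in data if abs((x - mean) / stdev) > 2]
print(outliers)  # [95]
```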
16. Give an example of Feature Extraction technique.
Answer: Principal Component Analysis (PCA).
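PCA can be sketched with NumPy alone: center the data, take the SVD, and read the principal axes off the right singular vectors. The tiny dataset below is nearly collinear, so one component captures almost all the variance:

```python
import numpy as np

# Toy 2-D dataset whose points lie almost on a line.
X = np.array([[2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 4.8]])

# Center, then SVD: rows of Vt are the principal axes, and the
# squared singular values are proportional to explained variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = (S ** 2) / (S ** 2).sum()
reduced = Xc @ Vt[0]  # project onto the first principal component

print(explained)  # first component explains nearly all the variance
```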
17. State one benefit of using Feature Engineering.
Answer: Improves model accuracy and performance.
18. What is the role of Data Preparation?
Answer: To clean and convert raw data into usable format for ML models.
19. What happens if irrelevant features are not removed?
Answer: Model performance degrades and overfitting may occur.
20. Which technique is used for handling categorical variables?
Answer: One-hot encoding.
2-Mark Questions
1. Explain the difference between Supervised and Unsupervised Learning.
Answer: Supervised learning uses labeled data to predict outcomes, while unsupervised
learning finds hidden patterns in unlabeled data.
2. Explain the steps involved in the machine learning process.
Answer: Source → Feature Extraction → Feature Correlation → Transformation → Model
Training → Ensemble → Evaluation → Handle Overfitting/Underfitting.
3. What is the purpose of using Hyperparameters and a Validation Set?
Answer: Hyperparameters guide training; validation sets evaluate model during training
to prevent overfitting.
4. Explain the Bias-Variance Tradeoff.
Answer: High bias leads to underfitting; high variance leads to overfitting. A balance gives
optimal performance.
5. List the main steps in Feature Engineering.
Answer: Data Preparation, Exploratory Data Analysis, Benchmarking.
6. What is Feature Extraction and why is it used?
Answer: It creates new features from raw data to reduce volume and complexity while
retaining essential information.
7. Compare Feature Selection and Feature Extraction.
Answer: Selection chooses relevant features from existing ones; extraction creates new
ones from raw data.
8. Explain the benefit of using better features in ML.
Answer: Better features lead to simpler models, better accuracy, and improved
generalization.
9. Describe the technique of Handling Outliers.
Answer: Outliers are identified using z-score or standard deviation methods and then
removed, capped, or transformed to improve model accuracy.
10. What are the benefits of Feature Selection?
Answer: Reduces dimensionality, simplifies model, improves accuracy, reduces training
time and overfitting.
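One simple feature-selection heuristic consistent with this answer is dropping near-constant columns; the column names and threshold below are illustrative only:

```python
import statistics

# Toy feature columns; "flag" is constant and carries no information.
columns = {
    "age":    [25, 32, 47, 51],
    "salary": [40, 55, 62, 80],
    "flag":   [1, 1, 1, 1],
}

# Keep only columns whose variance exceeds the threshold.
threshold = 0.0
selected = [name for name, vals in columns.items()
            if statistics.pvariance(vals) > threshold]

print(selected)  # ['age', 'salary']
```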
11. What are the key steps in Data Preparation?
Answer: Cleaning, augmentation, ingestion, fusion, and formatting raw data for ML use.
12. Explain One-hot Encoding with an example.
Answer: It converts categories into binary columns. Example: Red, Green → [1,0], [0,1].
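The Red/Green example can be sketched in plain Python; categories are taken in first-seen order so the mapping matches the answer above:

```python
colors = ["Red", "Green", "Green", "Red"]

# Preserve first-seen order so Red maps to [1, 0] and Green to [0, 1].
categories = list(dict.fromkeys(colors))  # ['Red', 'Green']

# One binary column per category: 1 where the value matches, else 0.
encoded = [[1 if c == cat else 0 for cat in categories] for c in colors]

print(encoded)  # [[1, 0], [0, 1], [0, 1], [1, 0]]
```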
13. Why is Feature Selection important?
Answer: It simplifies the model, improves accuracy, reduces training time, and avoids
overfitting.
14. Compare Data Cleaning and Feature Engineering.
Answer: Data cleaning removes errors; feature engineering creates useful variables from
raw data.
15. List any two Feature Engineering Techniques and their use.
Answer: Imputation – fills missing data; Binning – reduces noise by grouping values.
16. Describe the importance of EDA in feature engineering.
Answer: Helps understand data patterns and relationships to select relevant features.
17. What does Log Transformation help with?
Answer: It reduces skewness and impact of outliers, normalizing data distribution.
18. How does Feature Engineering reduce overfitting?
Answer: By selecting only meaningful features and removing noise, it helps generalization.
19. Why is Feature Extraction critical for high-dimensional data?
Answer: It reduces data size and complexity while keeping important information.
20. Explain how categorical variables affect model performance.
Answer: They must be encoded properly; otherwise, they may confuse the model and
reduce accuracy.