Machine Learning Framework and Techniques

The document discusses key concepts in machine learning, emphasizing the importance of probability and Bayesian statistics for modeling uncertainty and updating beliefs with new data. It outlines a typical framework for developing machine learning models, covering stages from problem definition to deployment and maintenance. Additionally, it explores feature reduction techniques like PCA and feature selection methods, as well as feature construction strategies to enhance model performance.

Machine Learning Using Python

Home Assignment 1

Prepared by: Center for Online Education (CDOE)


CO1 – Question 1
Question: Explain the role of probability and Bayesian statistics in machine learning. Provide a brief overview
of key concepts from these areas that are foundational to machine learning algorithms.

Answer:

1. Probability theory forms the mathematical basis for modelling uncertainty in data and predictions. In
machine learning, it quantifies how likely an event or an outcome is to occur given observed evidence.

2. Bayesian statistics extend this foundation by providing a systematic approach to update beliefs or model
parameters as new data become available. This is achieved through Bayes’ theorem, which combines prior
knowledge with observed evidence to compute a posterior probability distribution.

3. Key probabilistic concepts include random variables, probability distributions (such as Gaussian or
Bernoulli), expectation, variance, and conditional probability. These are used in modelling data-generating
processes and in defining likelihood functions for parameter estimation.

4. Bayesian reasoning supports algorithms such as the Naïve Bayes classifier, Bayesian networks, and
probabilistic graphical models, all of which represent uncertainty explicitly.

5. This probabilistic approach allows models to attach uncertainty estimates (such as credible intervals) to predictions, perform robust inference under noisy or incomplete data, and adapt as additional observations arrive, making probability and Bayesian principles indispensable to modern machine learning.
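The Bayesian update described in point 2 can be sketched in plain Python. The spam-filtering numbers below are made-up for illustration, not taken from the text:

```python
# Illustrative sketch of Bayes' theorem: updating a prior belief with evidence.

def posterior(prior, likelihood, likelihood_given_not):
    """P(H|E) = P(E|H) P(H) / P(E), with P(E) from the law of total probability."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical values: prior belief that an email is spam, and how often
# the word "offer" appears in spam vs. non-spam messages.
prior_spam = 0.2
p_word_given_spam = 0.7
p_word_given_ham = 0.1

p_spam_given_word = posterior(prior_spam, p_word_given_spam, p_word_given_ham)
print(round(p_spam_given_word, 3))  # posterior rises well above the 0.2 prior
```

Observing the word shifts the belief from the 0.2 prior to a posterior of roughly 0.64, which is exactly the prior-to-posterior update that Bayes' theorem formalizes.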

CO1 – Question 2
Question: Outline the typical framework for developing machine learning models. Describe the key stages
involved, from problem definition to deployment and maintenance.

Answer:

1. Problem Definition: The process begins by clearly specifying the objective, determining whether it is a
classification, regression, or clustering problem, and identifying measurable success criteria.

2. Data Collection: Relevant and reliable data are gathered from databases, sensors, APIs, or surveys,
ensuring ethical sourcing and representativeness.

3. Data Pre-processing: The dataset is cleaned, missing values are handled, categorical variables are
encoded, and numerical features are normalized or standardized.

4. Exploratory Data Analysis (EDA): Statistical summaries and visualisations reveal distributions, correlations,
and potential anomalies.

5. Feature Engineering: Domain knowledge is applied to create, transform, or select informative features to
improve predictive capability.

6. Model Selection and Training: Appropriate algorithms (e.g., logistic regression, decision trees, or neural
networks) are trained using training data, often with cross-validation to mitigate overfitting.

7. Evaluation: Models are assessed using metrics such as accuracy, precision, recall, F1-score, or RMSE.

8. Deployment: The chosen model is integrated into production systems for real-time or batch inference.

9. Maintenance: Continuous monitoring detects performance drift; periodic retraining and updates maintain
accuracy as data distributions evolve.
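Stages 2 through 7 above can be sketched end-to-end with scikit-learn (assumed installed); the Iris dataset and logistic-regression choice here are illustrative, not prescriptive:

```python
# Minimal pipeline sketch: data, split, pre-processing, cross-validation,
# training, and evaluation in one flow.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                      # data collection
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)  # hold-out split

# Pre-processing (standardization) and the model chained in one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv_scores = cross_val_score(model, X_tr, y_tr, cv=5)   # cross-validation
model.fit(X_tr, y_tr)                                  # training
test_acc = accuracy_score(y_te, model.predict(X_te))   # evaluation

print(f"cv mean={cv_scores.mean():.3f}, test accuracy={test_acc:.3f}")
```

Deployment and maintenance (stages 8 and 9) would then wrap this fitted pipeline in a serving layer and monitor its metrics for drift.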

CO2 – Question 1
Question: Explain the goal of feature reduction. Discuss two common feature reduction techniques (e.g.,
Principal Component Analysis (PCA), feature selection methods) and their underlying principles.

Answer:

1. Goal: Feature reduction seeks to decrease the dimensionality of data while retaining the maximum amount
of relevant information. Reducing redundant or noisy attributes simplifies models, shortens training time, and
enhances interpretability.

2. Principal Component Analysis (PCA): PCA is an unsupervised, linear-algebra-based method that transforms correlated features into a smaller number of uncorrelated variables called principal components. Each component is a linear combination of the original features, ordered by the amount of variance it explains. By retaining only the leading components, dimensionality is reduced with minimal information loss.

3. Feature Selection Methods: These techniques identify and keep only the most informative variables. Filter
methods use statistical criteria such as correlation or mutual information. Wrapper methods evaluate subsets
using model performance metrics. Embedded methods (e.g., LASSO regression) incorporate selection during
training through regularization penalties.

4. Effective feature reduction enhances generalization, prevents overfitting, and often yields faster, more
stable algorithms, particularly beneficial for high-dimensional datasets in image or text analytics.
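The principle behind PCA in point 2 can be sketched directly with NumPy: center the data, eigendecompose the covariance matrix, and project onto the top components. The synthetic data below (a third feature that is nearly the sum of the first two) is hypothetical:

```python
# PCA from first principles on 3 correlated features, keeping k = 2 components.
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
# Third feature is almost a linear combination of the first two, so most of
# the variance should survive projection onto two components.
X = np.column_stack([base, base.sum(axis=1) + 0.05 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                       # center the data
cov = np.cov(Xc, rowvar=False)                # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]             # sort by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
X_reduced = Xc @ eigvecs[:, :k]               # project onto top-k components
explained = eigvals[:k].sum() / eigvals.sum()
print(X_reduced.shape, f"variance retained: {explained:.3f}")
```

Because the third feature is nearly redundant, two components retain almost all of the variance, which is precisely the "minimal information loss" property described above.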

CO2 – Question 2
Question: Describe the process of feature construction. Provide two examples of how new features can be
derived from existing ones to potentially improve model performance.

Answer:

1. Feature construction involves creating new, informative attributes from existing data to better capture
relationships or patterns that the model may otherwise miss. It combines domain expertise with statistical
insight to enhance model learning.

2. The process typically includes analysing variable interactions, applying mathematical transformations, and
aggregating related variables. Properly engineered features can significantly increase predictive accuracy
without changing the algorithm.

3. Example 1 – Interaction Features: Multiplying or combining two variables can capture nonlinear
relationships, such as combining 'advertising spend' × 'seasonal index' to measure campaign impact under
seasonal effects.

4. Example 2 – Aggregated Features: Creating summary variables, such as average purchase value or total
transactions per customer, provides temporal or behavioural context that improves classification or
forecasting.

5. Additional transformations such as logarithmic scaling, polynomial expansion, or ratio computation can
reveal new patterns. Effective feature construction is iterative: engineered features are evaluated using model
performance metrics and refined continually. This practice often yields greater performance gains than
switching algorithms, highlighting its central role in applied machine learning.
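Both example constructions above can be sketched with pandas (assumed installed); the column names and values are hypothetical:

```python
# Interaction and aggregated feature construction on a toy sales table.
import pandas as pd

sales = pd.DataFrame({
    "customer":   ["a", "a", "b", "b", "b"],
    "ad_spend":   [100, 200, 150, 120, 300],
    "season_idx": [1.0, 1.5, 0.8, 1.2, 1.0],
    "purchase":   [20, 45, 10, 15, 60],
})

# Example 1 - interaction feature: advertising spend x seasonal index.
sales["effective_spend"] = sales["ad_spend"] * sales["season_idx"]

# Example 2 - aggregated features per customer: average purchase value
# and total number of transactions.
agg = (sales.groupby("customer")["purchase"]
            .agg(avg_purchase="mean", n_transactions="count")
            .reset_index())
sales = sales.merge(agg, on="customer", how="left")

print(sales[["effective_spend", "avg_purchase", "n_transactions"]])
```

Each row now carries both its own interaction feature and its customer-level context, the kind of derived signal a downstream classifier or forecaster can exploit.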
