Assignment Title:
Mastering Learning with T, P, and E: Developing a Gender Classification Model from Names
Total Points: 100
Assignment Overview
In this assignment, students will build and refine a machine learning model capable of
predicting the gender of a person based solely on their given name. The objective is to
understand the concepts of learning in terms of Training (T), Prediction (P), and Evaluation
(E). Students will develop, train, test, and improve a machine learning model using data
provided for gender classification. This assignment will also help students grasp the
fundamentals of machine learning, including data preprocessing, feature extraction, model
training, testing, and re-training to improve performance.
Learning Objectives
- Gain practical experience in building a machine learning model for classification tasks.
- Understand the process of feature engineering and data preprocessing for textual data.
- Develop a systematic approach for training and testing models to improve their predictive
performance.
- Practice using training, testing, and additional data sets to enhance model accuracy.
- Evaluate model performance using appropriate metrics.
---
Assignment Tasks and Deliverables
Task 1: Understanding and Preprocessing the Data (15 Points)
- Description: You will be provided with a dataset containing a list of names along with their
corresponding gender labels (Male/Female).
- Steps:
- Load the data and perform an initial exploration.
- Clean the data by removing duplicates, handling missing values (if any), and converting all
names to a consistent format (e.g., lowercasing).
- Consider feature extraction approaches (e.g., length of the name, first/last letter analysis,
n-grams, etc.) to convert names into a suitable format for machine learning.
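The cleaning and feature extraction steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a prescribed solution. The function names `clean_records` and `extract_features`, the feature keys, and the sample rows are hypothetical, and the sketch assumes the data can be read as (name, gender) pairs.

```python
def clean_records(records):
    """Normalize (name, gender) pairs, dropping missing values and duplicates."""
    seen = set()
    cleaned = []
    for name, gender in records:
        # Skip rows where either field is missing or blank.
        if not name or not gender or not name.strip() or not gender.strip():
            continue
        # Consistent format: lowercase names, capitalized labels.
        row = (name.strip().lower(), gender.strip().capitalize())
        if row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def extract_features(name, n=2):
    """Map a cleaned, non-empty name to a dictionary of simple features."""
    features = {
        "length": len(name),
        "first_letter": name[0],
        "last_letter": name[-1],
        "ends_in_vowel": name[-1] in "aeiou",
    }
    # Character n-grams, e.g. "asha" gives "as", "sh", "ha" for n=2.
    for i in range(len(name) - n + 1):
        features[f"{n}gram_{name[i:i + n]}"] = 1
    return features


# Hypothetical rows; the layout of the provided dataset may differ.
raw = [("Asha ", "Female"), ("asha", "female"), (None, "Male"), ("Ravi", "Male")]
cleaned = clean_records(raw)
features = [extract_features(name) for name, _ in cleaned]
```

A dictionary of features like this can later be turned into a numeric matrix, for example with scikit-learn's `DictVectorizer`, if you choose to use that library.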
Task 2: Splitting Data and Building an Initial Model (15 Points)
- Description: Split the data into a training set (80%) and a testing set (20%) using a random
split.
- Steps:
- Train a baseline model (e.g., using Logistic Regression, Decision Trees, or any simple
classifier).
- Evaluate the model's performance using metrics such as accuracy, precision, recall, and
F1-score on the testing set.
- Deliverable: A report of your initial model's performance, including details of feature
engineering choices and evaluation metrics.
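To make the split and the four metrics concrete, here is a small standard-library sketch. The `LastLetterBaseline` class is a deliberately simple stand-in for a real classifier such as Logistic Regression, and every name in the block (including the toy rows and the choice of "Female" as the positive class) is an assumption made for illustration.

```python
import random
from collections import Counter, defaultdict


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


class LastLetterBaseline:
    """Predict the majority label seen in training for each final letter."""

    def fit(self, rows):
        votes = defaultdict(Counter)
        overall = Counter()
        for name, label in rows:
            votes[name[-1]][label] += 1
            overall[label] += 1
        # Fall back to the overall majority label for unseen final letters.
        self.default = overall.most_common(1)[0][0]
        self.table = {ch: c.most_common(1)[0][0] for ch, c in votes.items()}
        return self

    def predict(self, names):
        return [self.table.get(name[-1], self.default) for name in names]


def binary_metrics(y_true, y_pred, positive="Female"):
    """Accuracy, precision, recall, and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical cleaned rows standing in for the provided dataset.
rows = [("asha", "Female"), ("priya", "Female"), ("divya", "Female"),
        ("meena", "Female"), ("kavita", "Female"), ("ravi", "Male"),
        ("arjun", "Male"), ("kiran", "Male"), ("suresh", "Male"), ("anil", "Male")]
train, test = split_train_test(rows)
model = LastLetterBaseline().fit(train)
report = binary_metrics([label for _, label in test],
                        model.predict([name for name, _ in test]))
```

With a library such as scikit-learn, `train_test_split` and `classification_report` cover the same ground; the hand-written version is shown only to expose what each metric counts.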
Task 3: Model Improvement - Training on Additional Data (20 Points)
Dr Lokhande, Osmania University, Hyd. Email : SURESH.L@[Link] 1
- Description: Additional labeled data will be provided to simulate real-world scenarios
where more data becomes available to enhance model accuracy.
- Steps:
- Integrate the additional data with the original training set.
- Retrain the model with the combined data.
- Evaluate the new model's performance on the testing set.
- Compare and document any improvements observed compared to the baseline model.
- Deliverable: Detailed documentation of the integration process, model retraining steps,
and a comparison of performance metrics before and after including the additional data.
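One way to approach the integration and comparison steps is sketched below. It assumes the original training and testing rows are already cleaned (name, gender) pairs, and it drops any additional row whose name also appears in the testing set so that the test stays a fair check. That filtering rule, and the helper names `merge_training_data` and `compare_metrics`, are assumptions of this sketch rather than requirements of the assignment.

```python
def merge_training_data(train_rows, extra_rows, test_rows):
    """Append cleaned extra rows to the training set without duplicates or test overlap."""
    test_names = {name for name, _ in test_rows}
    merged = list(train_rows)
    seen = set(merged)
    for name, label in extra_rows:
        # Apply the same normalization used for the original data.
        row = (name.strip().lower(), label.strip().capitalize())
        if row[0] in test_names or row in seen:
            continue
        seen.add(row)
        merged.append(row)
    return merged


def compare_metrics(before, after):
    """Return the per-metric change (after minus before), rounded for reporting."""
    return {key: round(after[key] - before[key], 4) for key in before}


# Hypothetical rows and metric values, for illustration only.
train = [("asha", "Female")]
test = [("ravi", "Male")]
extra = [(" Asha", "female"), ("Kiran", "Male"), ("ravi", "Male")]
combined = merge_training_data(train, extra, test)
delta = compare_metrics({"accuracy": 0.80, "f1": 0.75},
                        {"accuracy": 0.85, "f1": 0.75})
```

Keeping the testing set unchanged between the baseline and the retrained model is what makes the before/after comparison meaningful.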
Task 4: Hyperparameter Tuning and Model Optimization (15 Points)
- Description: Optimize your model by tuning hyperparameters and experimenting with
different algorithms or feature engineering techniques.
- Steps:
- Use methods such as Grid Search or Random Search to identify the optimal
hyperparameters.
- Experiment with at least one additional machine learning algorithm (e.g., Support Vector
Machines, Random Forests).
- Evaluate the new model's performance using the testing set.
- Deliverable: A summary of hyperparameter tuning, choice of algorithms, and a comparison
of the models' performance.
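The idea behind Grid Search can be sketched in a few lines: try every combination of candidate values and keep the one with the best score. The block below is an illustrative standard-library version, not a replacement for a library routine such as scikit-learn's `GridSearchCV`. The suffix-voting model, its `suffix_len` hyperparameter, and the toy rows are hypothetical. The sketch also assumes candidates are scored on a validation split carved out of the training data, so the testing set is kept for the final evaluation.

```python
from collections import Counter, defaultdict
from itertools import product


def fit_suffix_model(rows, suffix_len):
    """Learn the majority label for each name suffix of the given length."""
    votes = defaultdict(Counter)
    overall = Counter()
    for name, label in rows:
        votes[name[-suffix_len:]][label] += 1
        overall[label] += 1
    table = {suffix: c.most_common(1)[0][0] for suffix, c in votes.items()}
    return table, overall.most_common(1)[0][0]


def accuracy(model, rows, suffix_len):
    """Fraction of rows whose label matches the model's prediction."""
    table, default = model
    hits = sum(table.get(name[-suffix_len:], default) == label for name, label in rows)
    return hits / len(rows)


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return (best_params, best_score)."""
    keys = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        score = score_fn(**params)
        # Strict comparison keeps the first combination when scores tie.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical training and validation rows.
train = [("asha", "Female"), ("priya", "Female"), ("divya", "Female"),
         ("ravi", "Male"), ("arjun", "Male")]
valid = [("meena", "Female"), ("kiran", "Male")]


def validation_score(suffix_len):
    model = fit_suffix_model(train, suffix_len)
    return accuracy(model, valid, suffix_len)


best_params, best_score = grid_search({"suffix_len": [1, 2, 3]}, validation_score)
```

Random Search follows the same pattern but samples a fixed number of combinations instead of enumerating all of them, which helps when the grid is large.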
Task 5: Final Model Evaluation and Reporting (15 Points)
- Description: Provide a comprehensive evaluation of your final model, including its
strengths, limitations, and potential areas for improvement.
- Steps:
- Perform cross-validation on the final model.
- Discuss the implications of overfitting/underfitting observed during the process.
- Reflect on how additional data improved or did not improve the model's accuracy.
- Deliverable: A detailed report (2-3 pages) summarizing the final model's performance,
insights gained during the process, and a reflection on the training, prediction, and evaluation
cycle (T, P, and E).
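For the cross-validation step, the following standard-library sketch shows how k-fold splitting and score aggregation fit together. The helper names, the majority-class model used in the demonstration, and the toy rows are assumptions for illustration; in practice `fit_fn` and `score_fn` would wrap whichever final model you selected.

```python
import random
from collections import Counter
from statistics import mean, stdev


def k_fold_splits(rows, k=5, seed=0):
    """Yield (training, held_out) pairs so each row is held out exactly once."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    for i in range(k):
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield training, folds[i]


def cross_validate(rows, fit_fn, score_fn, k=5, seed=0):
    """Return the mean and standard deviation of the k held-out scores."""
    scores = [score_fn(fit_fn(training), held_out)
              for training, held_out in k_fold_splits(rows, k, seed)]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


def fit_majority(training):
    """Toy model: remember the most common label in the training rows."""
    return Counter(label for _, label in training).most_common(1)[0][0]


def majority_accuracy(model, held_out):
    """Fraction of held-out rows whose label equals the remembered label."""
    return sum(label == model for _, label in held_out) / len(held_out)


# Hypothetical cleaned rows standing in for the full labeled dataset.
rows = [("asha", "Female"), ("priya", "Female"), ("divya", "Female"),
        ("meena", "Female"), ("kavita", "Female"), ("lata", "Female"),
        ("ravi", "Male"), ("arjun", "Male"), ("kiran", "Male"), ("anil", "Male")]
cv_mean, cv_spread = cross_validate(rows, fit_majority, majority_accuracy, k=5)
```

A training score that is much higher than the cross-validated mean suggests overfitting, while low scores on both suggest underfitting; a large spread across folds indicates the estimate is sensitive to which names end up in each fold.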
---
Assignment Submission Guidelines
- All code and analysis should be submitted as a Jupyter Notebook or Python script file
(.ipynb or .py).
- Include a PDF report summarizing your results, model insights, and reflections.
- Submission Deadline: 30 Nov 2024 by 5pm
- Total Points: 100
---
Grading Rubric
- Task 1: Data Preprocessing (15 Points)
- Data cleaning and formatting: 5 points
- Feature engineering: 10 points
- Task 2: Initial Model Building (15 Points)
- Splitting data correctly: 5 points
- Model training and evaluation: 10 points
- Task 3: Training with Additional Data (20 Points)
- Data integration: 5 points
- Retraining and evaluation: 15 points
- Task 4: Hyperparameter Tuning and Optimization (15 Points)
- Hyperparameter tuning methods: 7 points
- Model experimentation: 8 points
- Task 5: Final Evaluation (15 Points)
- Model evaluation and reporting: 10 points
- Reflection on learning cycle (T, P, E): 5 points
- Presentation Component (20 Points)
- Clarity and Depth of Explanation: 5 Points
- Understanding of Concepts and Approach: 5 Points
- Originality and Independent Effort Demonstrated: 5 Points
- Visual and Verbal Communication Quality: 5 Points