CampusX Data Science Mentorship Program Curriculum
● Array vs List
● Variable scope
● Deletion of function
● Returning of function
● Advantages of functions
● Lambda functions
● Methods vs Functions
● Class diagram
● Magic/Dunder methods
● Concept of ‘self’
● Reference Variables
● Mutability of Object
● Encapsulation
● Collection of objects
● Constructor example
● Method Overriding
● Super keyword
● Super constructor
● Hybrid Inheritance
● Polymorphism
● Operator Overloading
4. Session on Abstraction
● What is Abstraction?
● Abstract class
● What is open()?
● append()
● Saving a file
● Pickling
● Pickle vs JSON
2. Session 11: Exception Handling
● Syntax Error with Examples
● Raise Exception
● Yield vs Return
● Generator Expression
● Practical Examples
● Benefits of generator
6. Session on Resume Building
7. Session on GUI Development using Python
● GUI development using tkinter
8. Week 4 Interview Questions
Week 5: Numpy
1. Session 13: Numpy Fundamentals
● Numpy Theory
● Numpy array
● Matrix in numpy
● Array operations
● Broadcasting
● Sigmoid in numpy
● Plotting graphs
3. Session 15: Numpy Tricks
● Various numpy functions like sort, append, concatenate, percentile, flip, Set functions, etc.
4. Session on Web Development using Flask
● What is Flask library
Week 6: Pandas
1. Session 16: Pandas Series
● What is Pandas?
● Series Methods
● Filtering a Dataframe
● Sort, index, reset_index, isnull, dropna, fillna, drop_duplicates, value_counts, apply, etc.
4. Session on API Development using Flask
● What is API?
● Hands-on project
5. Session on Numpy Interview Questions
● Practical implementations
3. Session on Streamlit
● Introduction to Streamlit
● Features of Streamlit
● Benefits of Streamlit
● Flask vs Streamlit
● What is VCS/SCM?
● Types of VCS
● Advantages
● Installing git
● Merging branches
● Undoing changes
● Multiindex DataFrames
● Transpose Dataframes
● Swaplevel
● Pandas-melt
2) Session 22: Vectorized String Operations | Datetime in Pandas
● Pivot table
● Agg functions
● Common functions
● Pandas Datetime
3) Session on Pandas Case Study – Time Series Analysis
4) Session on Pandas Case Study – Working with textual data
● Bar chart
● Histogram
● Pie chart
● Subplots
● 3D plots
● Contour plots
● Heatmaps
● Pandas plot()
3) Session on Plotly (Express)
● About Plotly
● Disadvantages
● Hands-on Plotly
4) Session on Plotly Graph Objects (go)
5) Session on Plotly Dash
● Basic introduction to Dash
6) Making a COVID-19 dashboard using Plotly and Dash
7) Deploying a Dash app on Heroku
8) Session on Project using Plotly
● Project using Indian Census Data with Geospatial indexing Dataset
● Seaborn roadmap
● Relational plots
● Distribution plots
● KDE plot
● Matrix plot
2) Session 26: Plotting Using Seaborn- Part 2
● Categorical Plots
● Stripplot
● Boxplot
● Violinplot
● Barplot
● Pointplot
● Countplot
● Faceting
● Regression Plots
● Regplot
● Lmplot
● Residual Plot
● FacetGrid
● Blog idea
3) Session on Open-Source Software – Part 1
4) Session on Open-Source Software – Part 2
● Import Data from various sources (CSV, excel, JSON, text, SQL)
● Types of Assessment
● Data Cleaning
3) Session on ETL using AWS RDS
● Introduction to the Extract, Transform, Load (ETL) pipeline
● Tidiness issues
● Data Cleaning
2) Session 29: Exploratory Data Analysis (EDA)
● Introduction to EDA
● Why EDA?
● Univariate Analysis
● Bivariate Analysis
● Feature Engineering
3) Session on Data Cleaning – Part 2
● Data Cleaning on Smartphone Dataset – Continued
4) Session on EDA Case Study – Smartphone Dataset
Week 13: SQL Basics
1) Session 30: Database Fundamentals
● Introduction to Data and Database
● CRUD operations
● Properties of database
● Types of Database
● DBMS
● Keys
● Cardinality of Relationship
● Drawbacks of Database
2) Session 31: SQL DDL Commands
● XAMPP Software
● DDL commands
3) Session on Tableau – Olympics Dataset (Part 1)
● Download and Install Tableau
● INSERT
● SELECT
● UPDATE
● DELETE
● Functions in SQL
2) Session 33: SQL Grouping and Sorting
● Sorting Data
● ORDER BY
● GROUP BY
● HAVING clause
● Common filters
● SET operations
● SELF join
● Practice questions
2) Session on SQL Case Study 1 – Zomato Dataset
● Understanding Dataset through diagram
● Types of Subqueries
● Database Engines
● Components of DBMS
● What is Collation?
● COUNT(*) vs COUNT(col)
● DELETE Vs TRUNCATE
● Anti joins
● Non-equi joins
● Natural joins
● Metadata Queries
● LAG(), LEAD()
2) Session 37: Window Functions Part 2
● Ranking
● Cumulative sum and average
● Running average
● Percent of total
3) Session 37: Window Functions Part 3
● Percent Change
● Quantiles/Percentiles
● Segmentation
● Cumulative Distribution
● Wildcards
● String Functions
● Data Cleaning
5) Session on EDA using SQL | Laptop Dataset
● EDA on numerical and categorical columns
● Plotting
● Types of Statistics
● Population vs Sample
● Types of Data
● Measure of Dispersion
● Coefficient of variation
● Graphs for Univariate Analysis
● DATETIME Functions
● Datetime Formatting
● Type Conversion
● DATETIME Arithmetic
● TIMESTAMP VS DATETIME
● Boxplots
● Scatterplots
● Covariance
● Correlation
● Correlation vs Causation
● Probability Distributions
● Probability Distribution Functions and its types
● Density Estimation
● Database Normalization
● ER Diagram
● 2D density plots
● Skewness
● QQ plot
● Uniform Distribution
● Log-normal distribution
● Pareto Distribution
● Transformations
i. Mathematical Transformation
ii. Function Transformer
iii. Log Transform
iv. Reciprocal Transform / Square or sqrt Transform
v. Power Transformer
vi. Box-Cox Transform
vii. Yeo-Johnson Transformation
3) Session on Views and User Defined Functions in SQL
● What are views?
● Types of views
● Binomial Distribution
i. PDF formula
ii. Graph of PDF
iii. Examples
iv. Criteria
v. Application in Data Science
● Sampling Distribution
● CLT in code
● Case study
● Parameter vs Estimate
● Point Estimate
● Confidence Interval
i. Ways to calculate CI
ii. Applications of CI
iii. Assumptions of z-procedure
iv. Formula and Intuition of z-procedure
v. Interpreting CI
vi. T-procedure and t-distribution
vii. Confidence Intervals in code
● Performing z-test
● Interpreting p-value
● T-test
● Types of t-test
i. Single sample t-Test
ii. Independent 2-sample t-Test
iii. Paired 2 sample t-Test
iv. Code examples of all of the above
3) Session on Chi-square test
● Chi-square distribution (Definition and Properties)
● Chi-square test
● F-distribution
● One-way ANOVA
i. Steps
ii. Geometric Intuition
iii. Assumptions
iv. Python Example
● Post-hoc test
● Nd tensors
● Vector example in ML
● Euclidean Distance
● Vector Addition/Subtraction
● Dot product
● Angle between 2 vectors
● Equation of a Hyperplane
3) Linear Algebra Part 2 | Matrices (computation)
● What are matrices?
● Types of Matrices
● Matrix Equality
● Scalar Operation
● Transpose of a Matrix
● Determinant
● Minor
● Cofactor
● Adjoint
● Inverse of Matrix
● Linear Transformations
● Linear Transformation in 3D
● Types of ML
i. Supervised Machine Learning
ii. Unsupervised Machine Learning
iii. Semi supervised Machine Learning
iv. Reinforcement Learning
● Batch/Offline Machine Learning
● Model-based learning
● Challenges in ML
i. Data collection
ii. Insufficient/Labelled data
iii. Non-representative data
iv. Poor quality data
v. Irrelevant features
vi. Overfitting and Underfitting
vii. Offline learning
viii. Cost
● Machine Learning Development Life-cycle
● Code example
● Regression Metrics
i. MAE
ii. MSE
iii. RMSE
iv. R2 score
v. Adjusted R2 score
3) Session 50: Multiple Linear Regression
● Introduction to Multiple Linear Regression (MLR)
● Code of MLR
● Minimizing error
● Multivariable Functions
● Parameters in a Function
● Loss Function
● Gradient Descent
● Derivative of a constant
● Cheatsheet
● Power Rule
● Sum Rule
● Product Rule
● Quotient Rule
● Chain Rule
● Partial Differentiation
● Matrix Differentiation
● Intuition
● Mathematical Formulation
● Visualization 1
● Effect of Data
2) Session 52 (part 1): Batch Gradient Descent
● Types of Gradient Descent
● Mathematical formulation
● Stochastic GD
● Time comparison
● Visualization
● When to use stochastic GD
● Learning schedules
● Sklearn documentation
4) Session 52 (part 3): Mini-batch Gradient Descent
● Introduction
● Code
● Visualization
5) Doubt Clearance session on Linear Regression
● Inference vs Prediction
● Degrees of freedom
● Adjusted R-squared
● T-statistic
● Standard Error
5) Session 53: Multicollinearity
● What is multicollinearity?
● Multicollinearity (Mathematically)
● Types of multicollinearity
● Correlation
● Condition Number
● Diagram
● Analogy
● Code Example
● What is Regularization?
● Geometric Intuition
● Sklearn Implementation
4) Ridge Regression Part 2
● Ridge Regression for 2D data
● Code example
● Code example
10) Doubt Clearance session on regularization
● Code Example
● How to select K?
● Decision Surface
● Limitations of KNN
2) Session on coding K nearest Neighbors from scratch
3) Session on How to draw Decision Boundary for Classification problems
4) Session on Advanced KNN Part 2
● KNN Regressor
● Hyperparameters
● Weighted KNN
● KD-Tree
5) Classification Metrics Part 1
● Accuracy
● Confusion matrix
● Type 1 error
● Type 2 error
● Recall
● F1 score
● Visualization
● Properties
● Matrix Composition
● Matrix Decomposition
● Eigen decomposition
● Kernel PCA
● What is SVD
● Applications of SVD
● SVD in PCA
Capstone Project:
1. Session 1 on Capstone Project | Data Gathering
a. Project overview in detail
b. Gather data for the project
c. Details of the data
2. Session 2 on Capstone Project | Data Cleaning
3. Session 3 on Capstone Project | Feature Engineering
4. Session 4 on Capstone Project | EDA
5. Session 5 on Capstone Project | Outlier Detection and Removal
6. Session 6 on Capstone Project | Missing Value Imputation