R2023 KIT-CBE (An Autonomous Institution)
B.Tech
B23ADT302 FOUNDATIONS OF DATA SCIENCE
(Common to AI&DS & CSE(AI&ML))
T P TU C
3 0 0 3
Course Objectives
1. To introduce the basic concepts of Vector spaces in linear algebra.
2. To introduce the basic concepts of Data Science.
3. To develop mathematical skills in statistics.
4. To acquire the skills in data pre-processing steps.
5. To learn the concept of clustering approaches and to visualize the processed data
using visualization techniques.
UNIT – I VECTOR SPACES 9
Vector spaces and subspaces – Linear independence and dependence – Basis and
Dimension – Null spaces, column spaces and Linear transformations – LU
decomposition method – Singular Value Decomposition method.
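As an illustration of the LU decomposition topic listed above, the following is a minimal sketch of the Doolittle scheme in plain Python. It assumes a square matrix whose pivots are all nonzero (no row exchanges); the function names `lu_decompose` and `matmul` and the sample matrix are illustrative choices, not part of the syllabus.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting: returns (L, U) with A = L U."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = 1.0  # L has a unit diagonal in the Doolittle scheme
        # Fill row i of U
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        if upper[i][i] == 0:
            raise ValueError("zero pivot; this sketch does not perform row exchanges")
        # Fill column i of L
        for j in range(i + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


def matmul(x, y):
    """Plain matrix product, used only to check that L U reproduces A."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


# Sample matrix chosen for illustration
A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
L, U = lu_decompose(A)
```

Multiplying the factors back with `matmul(L, U)` should reproduce `A`. A practical implementation would normally add partial pivoting for numerical stability.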
UNIT – II INTRODUCTION TO DATA SCIENCE 9
Need for Data Science – Benefits and uses – Facets of data – Types of data –
Organization of data – Data Science process – Data Science life cycle – Role of Data
Science – Big Data – Sources and characteristics of Big Data.
UNIT – III DESCRIBING DATA 9
Frequency distributions – Outliers – Relative frequency distributions – Cumulative
frequency distributions – Frequency distributions for nominal data – Interpreting
distributions – Graphs – Averages – Mode – Median – Mean – Averages for
qualitative and ranked data – Describing variability – Range – Variance –
Standard deviation – Degrees of freedom – Interquartile range – Variability for
qualitative and ranked data.
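The averages and variability measures listed above can be sketched with Python's standard `statistics` module. The `scores` list is made-up sample data, and the interquartile range depends on the quartile convention; this sketch relies on the default "exclusive" method of `statistics.quantiles`, which is an assumption rather than a convention stated in the syllabus.

```python
import statistics

# Hypothetical sample data for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Averages
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability
value_range = max(scores) - min(scores)
population_variance = statistics.pvariance(scores)  # divides by n
sample_variance = statistics.variance(scores)       # divides by n - 1 (degrees of freedom)
population_sd = statistics.pstdev(scores)

# Quartiles with the module's default ("exclusive") method
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
```

The difference between `pvariance` and `variance` is the divisor: the sample version divides by n - 1, which is where degrees of freedom enter.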
UNIT – IV DATA PREPROCESSING 9
Data pre-processing: Data cleaning - Data integration and Data transformation - Data
Reduction - Data Discretization - Exploratory Data Analysis - Basic tools (plots,
graphs and summary statistics) of EDA, Philosophy of EDA - The Data Science
Process
UNIT – V CLUSTERING AND DATA VISUALIZATION 9
Clustering: Choosing distance metrics – Different clustering approaches – Hierarchical
and agglomerative clustering – k-means – Applications – Visual Analytics –
Visualization with Matplotlib – Line plots – Scatter plots – Visualizing errors – Density
and contour plots – Histograms, Binnings and density – Three-dimensional plotting.
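As an illustration of the k-means topic listed above, here is a minimal sketch of Lloyd's algorithm in plain Python using Euclidean distance. The function name `kmeans`, the sample points, and the choice of starting centers are all assumptions made for the example; plotting the result with Matplotlib is left out so the sketch has no third-party dependencies.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm with Euclidean distance; `centroids` are the starting centers."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters


# Two visually separated groups of hypothetical 2-D points
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

The result depends on the starting centers; practical implementations usually try several random initializations and keep the best clustering.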
Course Outcomes:
Students will be able to
CO1: Understand the concepts of Vector spaces.
CO2: Summarize the data science basics and its life cycle.
CO3: Outline the relationship between data dependencies using statistics.
CO4: Summarize the data pre-processing methods and implement the EDA.
CO5: Build the visualization of data using the visualization tools.
Text Books:
1. David C. Lay, “Linear Algebra and its Applications”, Pearson Education Asia,
New Delhi, 5th Edition, 2016.
2. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data
Science”, Manning Publications, 2016.
3. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley
Publications, 2017.
Reference Books:
1. Kreyszig E., “Advanced Engineering Mathematics”, 10th Edition, John Wiley
and Sons, 2011.
2. Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly
Media, 2017.
3. Mario Dobler and Tim Großmann, “The Data Visualization Workshop”, O’Reilly
Media, 2020.
4. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2017.
5. Cathy O'Neil and Rachel Schutt, “Doing Data Science, Straight Talk from The
Frontline”, O'Reilly, 2014.
BOS CHAIRMAN