In an era defined by data, the convergence of mathematics and data science is reshaping
the landscape of research and industry alike. This project, titled "Computational
Mathematics in the Era of Data Science," investigates how fundamental mathematical
principles underpin modern data-driven methodologies. By exploring the theoretical
underpinnings of linear algebra, calculus, probability, and optimization, the project
demonstrates how these concepts are harnessed to solve complex problems across diverse
domains.
Overall, this work emphasizes the critical role of mathematics in driving data science
advancements and provides a framework for future exploration at the intersection of these
disciplines.
INDEX
1 Introduction
2 Data Science
3.1 Introduction
3.5 Calculus
3.9 Conclusion
4 Principal Component Analysis
4.7 Conclusion
5 Mathematical Explanation and Python Coding for PCA
5.1 Explanation for 2-Dimension
5.2 Mathematical Explanation for 3D
6 Application and Future Directions
6.4 Conclusion
7 Conclusion
References
CHAPTER 1
INTRODUCTION:
In today’s data-driven world, the fusion of mathematics and data science has
revolutionized how we analyze, interpret, and leverage vast amounts of information. The project
titled "Computational Mathematics in the Era of Data Science" delves into the critical role that
mathematical techniques play in transforming raw data into actionable insights. With the rapid
advancement of computational power and the explosion of data availability, traditional
mathematical concepts have found new applications in solving complex problems across diverse
fields such as healthcare, finance, artificial intelligence, and climate science.
At its core, this project explores how computational mathematics underpins key data
science methodologies, enabling more efficient and accurate decision-making processes. From
linear algebra and calculus to probability, statistics, and optimization, the mathematical
foundations discussed herein are pivotal for developing algorithms that drive machine learning,
pattern recognition, and predictive modeling.
CHAPTER 2
2.1 Data Science:
Data science is an interdisciplinary field that combines statistical techniques, mathematical models, and computational methods to analyze and interpret complex data. It aims to extract meaningful insights, patterns, and knowledge from structured and unstructured data.
Structured Data:
Structured data refers to data that is organized in a defined manner, typically in rows and columns, making it easy to store, query, and analyze. It follows a specific schema or format, which allows for easy access and manipulation. This type of data is often found in relational databases or spreadsheets.
Example:
Relational database tables and spreadsheets, such as a customer table with columns for name, age, and purchase amount.
Unstructured Data:
Unstructured data refers to data that does not have a predefined format or structure. It is often text-heavy and can come in various formats such as text files, images, audio, and video. Unstructured data lacks the organization of structured data, which makes it more challenging to process and analyze directly.
Example:
Text documents (articles, blogs, social media posts)
Images and video (photos, movies, YouTube)
Audio (podcasts, music)
Web pages (HTML, JavaScript content)
Evolution:
The field has evolved from statistics and data mining into a full-fledged discipline involving big data, machine learning, artificial intelligence (AI), and data visualization. The rise of digital technology and the explosion of data availability have driven the growth of data science.
Importance:
Data science helps businesses, governments, and scientists make informed decisions by analyzing large datasets, identifying trends, making predictions, and optimizing processes.
2.2 Key Components of Data Science:
Data science is a combination of several core components, which include:
Data Collection:
Data can come from various sources such as sensors, databases, social media, surveys, and transactional records. In the modern world, large-scale data generation has led to the concept of big data.
Data Preprocessing:
Common techniques include handling missing data, outlier detection, normalization, and encoding categorical variables.
Data Analysis:
Data analysis involves statistical methods, hypothesis testing, and exploratory data analysis (EDA) to uncover patterns, trends, and correlations.
Descriptive Statistics:
Measures of central tendency (mean, median), dispersion (variance, standard deviation), and distribution (skewness, kurtosis).
Inferential Statistics:
Estimating population parameters from sample data, testing hypotheses, and making predictions about future trends.
Data Visualization:
Visualization tools and techniques help to communicate findings from the data analysis
effectively.
2.3 Tools and Technologies in Data Science:
Data science requires a set of powerful tools for collecting, processing, and analyzing data. Some of the commonly used tools include:
Programming Languages:
Python, R, and SQL are the most widely used languages in data science. Python is particularly favored for its rich ecosystem of libraries such as pandas, NumPy, and SciPy.
Statistical Analysis:
SciPy and statsmodels (for Python) and R provide extensive libraries for statistical analysis.
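To make the descriptive and inferential measures above concrete, the short sketch below (using a small, made-up sample of exam scores) computes them with NumPy and SciPy and runs a simple one-sample t-test:

import numpy as np
from scipy import stats

# Hypothetical sample of exam scores (illustrative values only).
scores = np.array([62, 71, 75, 78, 80, 83, 85, 90, 94, 97])

# Descriptive statistics: central tendency, dispersion and shape.
print("Mean:", np.mean(scores))
print("Median:", np.median(scores))
print("Variance:", np.var(scores, ddof=1))       # sample variance
print("Std deviation:", np.std(scores, ddof=1))
print("Skewness:", stats.skew(scores))
print("Kurtosis:", stats.kurtosis(scores))

# Inferential statistics: one-sample t-test against a hypothesised mean of 75.
t_stat, p_value = stats.ttest_1samp(scores, popmean=75)
print("t-statistic:", t_stat, "p-value:", p_value)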
2.4 Common Tasks in Data Science:
Classification:
Assigning data to predefined categories (e.g., spam vs. not spam email).
Regression:
Predicting continuous values based on historical data (e.g., predicting housing prices).
Clustering:
Grouping similar data points together (e.g., customer segmentation); a minimal sketch appears after this list.
Recommendation:
Suggesting items to users based on their preferences (e.g., movie recommendations on
Netflix).
Anomaly Detection:
Identifying unusual patterns that deviate from expected behavior (e.g., flagging fraudulent credit card transactions).
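As a small illustration of the clustering task mentioned above, the sketch below groups made-up customer points into three segments with scikit-learn's KMeans (the data values and the number of clusters are assumptions for the example):

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: (annual spend, visits per month).
points = np.array([[200, 2], [220, 3], [800, 10],
                   [780, 12], [400, 5], [420, 6]])

# Group the customers into three clusters (segments).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)
print("Cluster labels:", labels)
print("Cluster centres:\n", kmeans.cluster_centers_)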
2.5 Applications of Data Science:
Healthcare:
Predictive modeling for disease outbreaks, patient diagnosis, personalised treatment plans, and medical image analysis.
Finance:
Fraud detection, stock market prediction, credit scoring, risk management and
algorithmic trading.
E-Commerce:
Recommendation systems, inventory management, sales forecasting and personalised
marketing.
Social Media:
Sentiment analysis, trend prediction and social network analysis.
Manufacturing:
Predictive maintenance, process optimization and quality control.
2.6. Challenges in Data Science
Data Quality:
Raw data may be noisy, inconsistent or incomplete. Data scientists must ensure the data is
clean and reliable before analysis.
Scalability:
Managing and processing large datasets efficiently.
Interpretability:
Making machine learning models understandable to non-experts and ensuring their decisions are transparent and explainable.
Summary of Chapter 2
This chapter introduced the fundamental concepts of data science, emphasizing the importance of mathematical, computational, and statistical techniques in extracting value from data. By understanding the key components, tools, and techniques used in data science, students and practitioners can appreciate how data science applies to real-world problems and industries.
Chapter 3
3.1 Introduction
Overview:
Mathematics plays a central role in data science, forming the foundation for algorithms
and models used in data analysis, machine learning, and statistical inference. In this chapter, we
explore the mathematical concepts and techniques essential for data science.
Linear Algebra:
Provides the tools for dealing with data in matrix and vector form, which is central to
machine learning and data manipulation.
Calculus:
Used for optimizing models, particularly in machine learning algorithms, where we need
to minimize or maximize objective functions (e.g., gradient descent).
Optimization:
Mathematical optimization techniques are crucial in finding the best solution for various
problems like classification, regression or clustering.
Discrete Mathematics:
Relevant for graph theory, combinatorics, and algorithms that deal with non-continuous data (such as network analysis and pathfinding).
3.3 Linear Algebra in Data Science
Linear algebra is one of the most fundamental branches of mathematics used in data
science, especially in machine learning, deep learning and data processing.
Applications:
Linear regression, machine learning algorithms like support vector machines (SVMs) and
neural networks.
Example:
Representing data points as vectors in a high-dimensional space (e.g., a dataset with 10 features becomes a 10-dimensional vector).
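For instance, a minimal sketch (with made-up numbers) of how a dataset becomes a matrix and how a linear model's predictions reduce to a matrix-vector product:

import numpy as np

# Three samples, each described by 10 features (illustrative random values).
X = np.random.rand(3, 10)

# A linear model is just a weight vector w and a bias b.
w = np.random.rand(10)
b = 0.5

# Predictions for all samples at once via a matrix-vector product.
y_pred = X @ w + b
print("Predictions:", y_pred)  # one value per sample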
3.4 Probability and Statistics in Data Science
Descriptive Statistics:
Includes measures such as mean, median, mode, variance, and standard deviation that summarize the properties of a dataset.
Probability Distributions:
Probability distributions like Normal, Poisson and Binomial are used to model the
randomness and uncertainty inherent in data.
Bayesian Inference:
A method of statistical inference in which Bayes' theorem is used to update the probability
estimate for a hypothesis as more evidence or data becomes available.
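As a small worked illustration (the disease prevalence and test accuracies below are made-up numbers), Bayes' theorem updates a prior probability into a posterior once a positive test result is observed:

# Hypothetical setting: a disease affects 1% of a population; a test detects it 95% of
# the time (sensitivity) and gives a false positive 5% of the time.
prior = 0.01           # P(disease)
sensitivity = 0.95     # P(positive | disease)
false_positive = 0.05  # P(positive | no disease)

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
evidence = sensitivity * prior + false_positive * (1 - prior)
posterior = sensitivity * prior / evidence
print("P(disease | positive test) =", round(posterior, 3))  # about 0.161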
Hypothesis Testing:
A procedure used to make inferences or decisions about a population based on sample data, using tests like t-tests, chi-square tests, and ANOVA.
Regression Analysis:
A statistical method for modeling the relationship between a dependent variable and one or
more independent variables (e.g., linear regression).
Applications:
Predictive modeling, hypothesis testing and risk assessment.
Example:
Using a normal distribution to model the heights of people in a population and
calculating the probability of selecting someone taller than 6 feet.
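A quick sketch of that example with SciPy, assuming (for illustration only) a mean height of 5.6 feet and a standard deviation of 0.3 feet:

from scipy import stats

# Assumed population parameters (hypothetical values).
mean_height, std_height = 5.6, 0.3

# P(height > 6 ft) is the survival function of the normal distribution at 6.
p_taller_than_6 = stats.norm.sf(6, loc=mean_height, scale=std_height)
print("P(height > 6 ft) =", round(p_taller_than_6, 4))  # roughly 0.09 for these values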
3.5 Calculus
Differentiation:
In machine learning algorithms like gradient descent, differentiation helps in adjusting model parameters to minimize the loss function. The gradient represents the direction of the steepest increase, and we move against it to minimize the function.
Integration:
Helps in calculating areas under curves, which is essential in probability theory and in
determining certain aspects of data distributions.
Optimization:
Finding minima or maxima of functions (e.g., the least squares method in regression,
minimizing error in classification).
Example:
Using the derivative of the loss function to adjust weights in a neural network during
training.
Convex Optimization:
A subset of optimization where the function being optimized is convex (i.e., it has a
single global minimum).
Gradient Descent:
An iterative optimization technique used to minimize a loss function in many machine
learning algorithms. It adjusts parameters (such as weights in a neural network) based on the
gradient of the function.
Example:
Using gradient descent to minimize the loss function in a logistic regression model to
predict binary outcomes.
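A minimal NumPy sketch of this idea, using made-up data and plain batch gradient descent on the logistic (cross-entropy) loss, might look as follows:

import numpy as np

# Hypothetical binary-classification data: 100 samples with 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # labels 0 or 1

w = np.zeros(2)
b = 0.0
learning_rate = 0.1

for _ in range(500):
    # Model prediction: sigmoid of the linear combination.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # Gradient of the average cross-entropy loss with respect to w and b.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    # Move against the gradient to decrease the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("Learned weights:", w, "bias:", b)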
Combinatorics:
The study of counting, arrangement and combination of objects. It is used in algorithm
analysis, data structure design, and optimization problems.
Example:
Analyzing social media networks using graph theory to identify influencers (nodes) and
their connections (edges).
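For instance, a minimal sketch using the NetworkX library (with a made-up follower graph) ranks users by degree centrality as a rough proxy for influence:

import networkx as nx

# Hypothetical follower relationships: each edge connects two users.
G = nx.Graph()
G.add_edges_from([("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
                  ("bob", "carol"), ("eve", "dave")])

# Degree centrality: users with many connections are candidate influencers.
centrality = nx.degree_centrality(G)
for user, score in sorted(centrality.items(), key=lambda kv: kv[1], reverse=True):
    print(user, round(score, 2))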
Supervised Learning:
Involves learning from labeled data. Mathematics is used to compute the best-fit line (in linear regression) or decision boundary (in SVMs).
Unsupervised Learning:
Involves finding hidden patterns or structures in data. Concepts like clustering and
dimensionality reduction (e.g., PCA) rely on mathematical foundations.
Deep Learning:
Involves complex mathematical computations, especially in neural networks, where
calculus and linear algebra are used to adjust the weights and biases.
3.9. Conclusion
In this chapter, we have explored the essential mathematical tools and concepts used in
data science. Linear algebra, probability, statistics, calculus and optimization are foundational to
the development and understanding of data science algorithms and models. A strong grasp of
these mathematical principles is essential for anyone wishing to excel in data science and its
applications.
Chapter 4
Principal Component Analysis (PCA)
4. 1. Introduction to PCA
Overview:
Principal Component Analysis (PCA) is a statistical technique used for
dimensionality reduction while preserving as much variance (information) as possible.
It is widely used in data science for reducing the complexity of high-dimensional data.
Purpose of PCA:
PCA helps to transform a large set of variables into a smaller one, called
principal components, without losing significant information. This is especially useful
in machine learning for improving model performance, reducing overfitting and
speeding up computation.
Standardization:
Each feature is standardized as Z = (X − μ) / σ,
where X is the original data, μ is the mean of the feature, and σ is the standard deviation.
Covariance Matrix:
The covariance matrix of the standardized data is
C = (1 / (n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)(xᵢ − x̄)ᵀ,
where xᵢ is a data point and x̄ is the mean of the data.
The data is projected onto these principal components to reduce the dimensionality while retaining as much variance as possible.
1.Original Data:
Consider the following eight (X, Y) data points:
X Y
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2.0 1.6
1.0 1.1
2.Standardized Data:
After standardization, each feature is rescaled to have a mean of 0 and a standard
deviation of 1.
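As a quick check of this step, the small sketch below standardizes the (X, Y) table above with NumPy; the column means come out as approximately zero and the standard deviations as one:

import numpy as np

# The eight (X, Y) points from the table above.
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])

# Standardize: subtract each column's mean and divide by its standard deviation.
standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
print(standardized.mean(axis=0))          # approximately 0 for each feature
print(standardized.std(axis=0, ddof=1))   # exactly 1 for each feature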
3.Covariance Matrix:
The covariance matrix indicates the relationship between the two features, showing if
there’s a linear relationship between them. PCA seeks to find directions that explain the
maximum variance, which in this case might be a line or vector that best captures the data's
spread.
X Y
X 1.0 0.9
Y 0.9 1.0
4.Eigenvalues and Eigenvectors:
The eigenvalues and eigenvectors of the covariance matrix are computed next; the eigenvector with the largest eigenvalue defines the first principal component, the direction of maximum variance.
5.Projecting Data onto Principal Components:
By projecting the data onto the principal component (the eigenvector with the largest eigenvalue), we can reduce the dimensionality of the data from 2D to 1D while retaining as much of the variance as possible.
Projected Data:
The data points are now projected onto the principal component (a single line in the case of 2D data).
Applications of PCA:
Image Compression:
Reducing the size of images while retaining key features.
Face Recognition:
Using PCA to extract key facial features and reduce the number of dimensions.
Speech Recognition:
Reducing dimensionality in audio data for more efficient processing.
Data Visualization:
Visualizing high dimensional data in lower dimensions (e.g., 2D or 3D) for easier
interpretation.
Genetics:
Analyzing gene expression data and reducing noise for clearer insights.
Advantages:
Dimensionality Reduction:
PCA reduces the number of variables while preserving essential
information.
Noise Reduction:
By focusing on the principal components with the highest variance, PCA helps to reduce noise in the data.
Improved Computation:
Reduced dimensions often lead to faster processing times in machine learning
models.
Disadvantages:
Interpretability:
The transformed data (principal components) may be harder to interpret because
they are linear combinations of the original features.
Assumption of Linearity:
PCA assumes that the data is linearly related, which may not be the case in all
scenarios.
Loss of Information:
Although PCA aims to preserve variance, some information is inevitably lost when
reducing dimensions.
4.7. Conclusion
Principal Component Analysis (PCA) is a powerful technique for dimensionality
reduction, enabling the extraction of key features from complex datasets. Its ability to reduce
data while retaining essential information makes it an invaluable tool in data science, particularly
in machine learning, pattern recognition and data visualization. However, it is important to
understand its limitations and the assumptions behind the method.
Chapter 5
Mathematical Explanation and Python Coding for PCA
5.1 Explanation for 2-Dimension
Consider the Math and English marks of four students:
Student    Math   English
Student A   85      78
Student B   90      88
Student C   75      70
Student D   95      92
Each student is represented by a 2D point (x,y) where x is the Math mark and y is the
English mark.
The mean mark in each subject is
μ_Math = (85 + 90 + 75 + 95) / 4 = 86.25,
μ_English = (78 + 88 + 70 + 92) / 4 = 82.
Subtracting these means gives the centered data:
Student A: (85 − 86.25, 78 − 82) = (−1.25, −4)
Student B: (90 − 86.25, 88 − 82) = (3.75, 6)
Student C: (75 − 86.25, 70 − 82) = (−11.25, −12)
Student D: (95 − 86.25, 92 − 82) = (8.75, 10)
The sample covariance matrix of the centered data is
S = (1 / (n − 1)) Σᵢ₌₁ⁿ xᵢ xᵢᵀ,
where the xᵢ are the centered data points and n = 4.
For Math, the sum of squared deviations is
(−1.25)² + (3.75)² + (−11.25)² + (8.75)² = 1.5625 + 14.0625 + 126.5625 + 76.5625 = 218.75,
so σ²_Math = 218.75 / 3 ≈ 72.92.
For English,
(−4)² + (6)² + (−12)² + (10)² = 16 + 36 + 144 + 100 = 296.
Thus,
σ²_English = 296 / 3 ≈ 98.67.
For the covariance between Math and English,
(−1.25)(−4) + (3.75)(6) + (−11.25)(−12) + (8.75)(10) = 5 + 22.5 + 135 + 87.5 = 250.
Dividing by 3: Cov(Math, English) = 250 / 3 ≈ 83.33.
The covariance matrix is therefore
S ≈ [ 72.92  83.33
      83.33  98.67 ].
Eigenvalues:
Solving det(S − λI) = 0 gives the characteristic equation λ² − 171.58λ + 250 ≈ 0, so
λ = (171.58 ± √((171.58)² − 4 × 250)) / 2.
The discriminant is (171.58)² − 1000 ≈ 28439.5, so
√28439.5 ≈ 168.66.
Thus, the eigenvalues are approximately
λ₁ ≈ (171.58 + 168.66) / 2 ≈ 170.12,  λ₂ ≈ (171.58 − 168.66) / 2 ≈ 1.46.
Eigenvectors:
For each eigenvalue λ, the corresponding eigenvector v satisfies
(S − λI)v = 0.
For λ₁ ≈ 170.12, the first row gives (72.92 − 170.12)v_x + 83.33 v_y = 0, so v_y ≈ 1.1665 v_x. Taking v = (1, 1.1665), its norm is
‖v‖ = √(1² + (1.1665)²) ≈ √(1 + 1.3606) ≈ 1.5365,
so the normalized first eigenvector is v₁ ≈ (0.6508, 0.7592).
The second eigenvector (for λ₂) is orthogonal to v₁; it is approximately v₂ ≈ (0.7588, −0.6506).
For each centered data point x, the coordinate along PC1 is given by the dot product
t = v₁ᵀ x.
This projection gives the new coordinate along the principal axis that explains
most of the variance in marks.
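For example, using Student A's centered marks and the normalized v₁ obtained above (values are rounded, so the result is approximate):

# Student A's centered marks and the first eigenvector computed earlier.
x_A = (-1.25, -4.0)
v1 = (0.6508, 0.7592)

# Coordinate of Student A along PC1 (dot product).
t_A = v1[0] * x_A[0] + v1[1] * x_A[1]
print(round(t_A, 2))  # about -3.85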
import numpy as np
import matplotlib.pyplot as plt

# Data: rows represent students; columns represent marks in Math and English.
# Students: A, B, C, D
X = np.array([[85, 78],
              [90, 88],
              [75, 70],
              [95, 92]])

# Center the data by subtracting the column means.
mean_marks = X.mean(axis=0)
X_centered = X - mean_marks

# Covariance matrix of the centered data (columns are variables).
cov_matrix = np.cov(X_centered.T, bias=False)

# Eigen decomposition of the covariance matrix.
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Sort the eigenvalues (and corresponding eigenvectors) in descending order.
idx = eigenvalues.argsort()[::-1]
eigenvalues, eigenvectors = eigenvalues[idx], eigenvectors[:, idx]
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

# Project the centered data onto the principal components.
T = X_centered @ eigenvectors
print("Projected data:\n", T)

# Plot the centered data with the PC1 (red) and PC2 (green) directions.
PC1, PC2 = eigenvectors[:, 0], eigenvectors[:, 1]
plt.scatter(X_centered[:, 0], X_centered[:, 1])
plt.quiver(0, 0, PC1[0], PC1[1], color='r', angles='xy', scale_units='xy', scale=0.2)
plt.quiver(0, 0, PC2[0], PC2[1], color='g', angles='xy', scale_units='xy', scale=0.2)
plt.grid(True)
plt.axis('equal')
plt.show()
OUTPUT:
(The script prints eigenvalues of approximately 170.12 and 1.46, matching the hand calculation above, and shows the centered data with the PC1 and PC2 directions.)
1.Data Preparation:
The marks are stored in a NumPy array where each row is a student’s marks in Math and
English.
2.Centering:
The mean of each column is subtracted so that the data is centered at the origin.
3.Covariance Matrix:
We compute the covariance matrix using the centered data.
4.Eigen Decomposition:
We use NumPy's np.linalg.eig to obtain the eigenvalues and eigenvectors of the covariance matrix. We then sort them in descending order of eigenvalue magnitude.
5.Projection:
The centered data is projected onto the principal components by computing the dot product
with the eigenvectors.
6.Visualization:
The original centered data is plotted along with the PC1 and PC2 axes to show the
direction of maximum variance.
5.2 Mathematical Explanation for 3D
Consider the marks of the same four students in three subjects (the first and third being Tamil and Social). For Social, for example, the mean is
μ_Social = (75 + 85 + 65 + 80) / 4 = 76.25.
For Tamil and Social:
Cov(Tamil, Social) = [(−1.25)(−1.25) + (8.75)(8.75) + (−11.25)(−11.25) + (3.75)(3.75)] / 3 ≈ 72.92
Note: Because the first and third subjects (Tamil and Social) have identical centered deviations,
the matrix is singular (one eigenvalue will be nearly zero), indicating redundancy between those
subjects.
We then solve the characteristic equation det(S − λI) = 0 to obtain the eigenvalues λ₁, λ₂, λ₃ and corresponding eigenvectors v₁, v₂, v₃.
Without going through all algebraic details (which involve solving a cubic equation), one typically finds:
A dominant eigenvalue λ₁ that captures most of the variance, a much smaller eigenvalue (e.g. λ₂ ≈ 16.3), and a zero (or nearly zero) eigenvalue λ₃ ≈ 0, reflecting the redundancy between Tamil and Social.
The first principal component (PC1) captures the maximum variance (here, nearly all the
variation), while the remaining components capture little additional information.
Step 4. Projection onto Principal Components
Each centered data point x_c is projected onto the new axes by computing
t = vᵀ x_c.
For the first principal component,
t⁽¹⁾ = v₁ᵀ x_c.
Repeating this for each data point gives the transformed (or score) coordinates in the PCA
space.
import numpy as np

# Marks of four students in three subjects (the first and third are Tamil and Social).
# The original table is not reproduced in the source: only the Social column (75, 85, 65, 80)
# appears in the text, so the other columns here are illustrative placeholders and the
# printed numbers will differ from those quoted in the text.
X = np.array([[75, 78, 75],
              [85, 88, 85],
              [65, 70, 65],
              [80, 92, 80]])
mean_marks = X.mean(axis=0)     # mean of each subject
X_centered = X - mean_marks     # center the data
print("Mean Marks:", mean_marks)
print("Centered Data:\n", X_centered)

# We use np.cov with rowvar=False because each column is a variable.
cov_matrix = np.cov(X_centered, rowvar=False)
print("Covariance Matrix:\n", cov_matrix)
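The eigen decomposition, projection, and 3D plotting steps described in the explanation further below are not shown in the code above; a sketch of how they could continue from X_centered and cov_matrix is:

import matplotlib.pyplot as plt

# Eigen decomposition of the covariance matrix.
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Sort by decreasing eigenvalue so that PC1 comes first.
idx = eigenvalues.argsort()[::-1]
eigenvalues, eigenvectors = eigenvalues[idx], eigenvectors[:, idx]
print("Eigenvalues:", eigenvalues)

# Project the centered data onto the principal components.
T = X_centered @ eigenvectors
print("Projected data:\n", T)

# 3D scatter of the centered data with the principal axes drawn as arrows.
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_centered[:, 0], X_centered[:, 1], X_centered[:, 2])
for i in range(3):
    v = eigenvectors[:, i] * 10  # scaled for visibility
    ax.quiver(0, 0, 0, v[0], v[1], v[2])
    ax.text(v[0], v[1], v[2], "PC" + str(i + 1))
plt.show()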
# A more general example: PCA on an n-dimensional dataset using scikit-learn.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Example dataset: 100 random samples with 5 features.
X = np.random.rand(100, 5)

# This scales each feature to have zero mean and unit variance.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# You can set n_components to any number <= p, or a fraction of explained variance.
n_components = 2
pca = PCA(n_components=n_components)
T = pca.fit_transform(X_std)
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
print("Shape of the transformed data:", T.shape)

# Scatter plot of the data in the space of the first two principal components.
plt.scatter(T[:, 0], T[:, 1])
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('Principal Components')
plt.show()
OUTPUT:
Mathematical Explanation:
1.Standardization:
Standardize each variable to have a mean of zero and a standard deviation of one. This ensures that each variable contributes equally to the analysis.
2.Covariance Matrix:
Compute the covariance matrix Σ of the standardized data, which captures how the features vary together.
3.Eigen Decomposition:
Perform eigen decomposition on the covariance matrix Σ to obtain eigenvalues and eigenvectors.
The eigenvalues indicate the amount of variance captured by each principal component, while the eigenvectors represent the directions of these components in the feature space.
Explanation of the Python Code
1. Data Preparation:
The marks are stored in a NumPy array where each row is a student's marks in the three subjects.
2. Centering:
The mean of each column is subtracted so that the data is centered at the origin.
3. Eigen Decomposition:
We extract eigenvalues and eigenvectors with np.linalg.eig and sort them so that the first principal component corresponds to the largest eigenvalue.
4. Projection:
The centered data is projected onto the principal components by taking the dot product
with the eigenvector matrix.
5. Visualization:
A 3D scatter plot displays the centered data points. The principal axes (PC1, PC2, PC3)
are overlaid as arrows (scaled for visibility) originating at the origin.
Summary
Mathematically:
The process starts by centering the 3-dimensional student marks data. The covariance matrix is computed from the centered data, and its eigen decomposition yields the principal components. The eigenvector with the highest eigenvalue (PC1) points in the direction of greatest variance, while the other eigenvectors capture the remaining (often redundant) variance.
In Python:
The provided code shows how to compute the mean, covariance, eigenvalues/eigenvectors,
and finally project the data into the new PCA space. A 3D plot helps visualize both the data
and the principal directions.
from sklearn.datasets import load_breast_cancer

# instantiating
cancer = load_breast_cancer(as_frame=True)
# creating dataframe
df = cancer.frame
# checking shape
print('Original Dataframe shape :', df.shape)
# Input features
X = df[cancer['feature_names']]
print('Inputs Dataframe shape :', X.shape)
# Mean
X_mean = X.mean()
# Standard deviation
X_std = X.std()
# Standardization
Z = (X - X_mean) / X_std
# covariance
c = Z.cov()
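The explanation below also refers to a PCA transformation step with scikit-learn that is not shown in the code above; a sketch of how it could continue from the standardized data Z is (n_components = 2 is an assumption here):

from sklearn.decomposition import PCA

# Keep the first two principal components of the standardized data.
pca = PCA(n_components=2)
Z_pca = pca.fit_transform(Z)

print("Transformed shape:", Z_pca.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Principal axes (components_):\n", pca.components_)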
OUTPUT
Explanation:
Data Standardization:
The dataset is standardized (with StandardScaler, or equivalently by subtracting the mean and dividing by the standard deviation as in the code above) so that each feature has a mean of zero and unit variance.
PCA Transformation:
The PCA class from scikit-learn is utilized to perform PCA. The n_components parameter specifies the number of principal components to retain. The fit_transform method computes the principal components and returns the transformed data.
Output:
explained_variance_ratio_ gives the proportion of the dataset's variance captured by each principal component.
components_ contains the principal axes in the feature space, representing the directions of maximum variance.
This implementation can be extended to larger datasets, with n_components set to retain the desired number of principal components. It's important to note that while PCA reduces dimensionality, it may also lead to some loss of information. Therefore, it's crucial to balance dimensionality reduction with the amount of variance retained in the data.
Chapter 6
Application and Future Directions
6.1.1 Healthcare
Medical Imaging: PCA is widely used to enhance medical images such as MRI scans by
reducing noise while preserving crucial features.
6.1.2 Finance
Risk Assessment: Statistical models, Monte Carlo simulations and optimization techniques are
used to assess financial risks.
Fraud Detection: Machine learning algorithms based on probability and statistics detect
fraudulent transactions by analyzing transaction patterns.
Portfolio Optimization: Techniques such as convex optimization and linear programming help
investors allocate resources efficiently to maximize returns.
6.1.4 Climate Science and Environmental Monitoring
Climate Prediction: Computational mathematics helps in modeling climate change using large-scale simulations and PCA-based data reduction techniques.
Disaster Forecasting: Data-driven models predict natural disasters like hurricanes and earthquakes, helping mitigate their impact.
6.2.4 Ethical AI and Bias Reduction
Mathematical frameworks will be developed to minimize bias in AI models, ensuring
fairness in decision-making processes such as hiring, credit scoring and law enforcement
applications.
6.4 Conclusion
Computational mathematics will continue to be a driving force in data science, enabling
new technological breakthroughs across various industries. The integration of advanced
mathematical techniques with modern computational frameworks will pave the way for a more
efficient, transparent and intelligent future.
In this project, we have explored the critical intersection between mathematics and data
science, demonstrating how computational techniques are essential in extracting meaningful
insights from vast datasets. The study has highlighted the role of fundamental mathematical
concepts—ranging from linear algebra and calculus to probability and optimization—in driving
innovations in data science. A focal point of the research was Principal Component Analysis
(PCA), which was examined both from a theoretical perspective and through practical
applications.
7. CONCLUSION:
By delving into the mathematical foundations and computational processes behind PCA,
the project showcased how dimensionality reduction can simplify complex datasets without
significant loss of information. This not only enhances computational efficiency but also
improves the interpretability of data, which is crucial in various real-world applications such as
healthcare, finance, and artificial intelligence.
Overall, this work underscores the indispensable role of mathematics in advancing data science
methodologies and sets the stage for future research. The integration of advanced mathematical
techniques with emerging computational frameworks is not only enhancing current applications
but is also paving the way for new innovations that will continue to shape our data-driven world.
REFERENCES:
2. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical
Learning.
A comprehensive resource on statistical learning methods that are critical in data science.
5. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical
Learning.
An accessible guide to statistical modeling and inference, with applications in data science.
6. Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine
Learning.
Focuses on the mathematical tools essential for machine learning and data analysis.