Data science is an amalgamation of different scientific methods, algorithms and systems which enable us to gain insights and derive knowledge from data in various forms. Various organizations
like Google, Facebook, Uber, Netflix, etc. are already leveraging data science to provide better
experiences to their end users.
Although data science techniques have been conceptualized and in use for several decades, the current demand for data science is fueled by the high availability of digital data and computational resources.
This course serves as an introduction to various Data Science concepts such as Probability,
Statistics, Linear Algebra, Machine Learning and Computer Science. At the end of this course, you
should be familiar with the key ideas behind these concepts.
In addition to these, we must acquire knowledge about the domain or industry vertical in which we plan to apply Data Science, such as retail, banking & finance, healthcare, e-commerce, life sciences, telecom, etc.
What is Probability?
Probability is a mathematical subject which enables us to determine or predict how likely it is that an event will happen. The probability of occurrence is assigned a value from 0 to 1. When the value assigned is 1, it implies that the event will happen with complete certainty. On the other hand, when it is 0, it implies that the event will not take place. Thus, we can be more certain of an event's occurrence when its probability is higher.
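As a minimal illustration (the die-rolling scenario and the number of simulated rolls are our own, not from the course material), the probability of rolling a six with a fair die is 1/6, and a simulation should produce an estimate close to that value:

import random

# Theoretical probability of rolling a six with a fair die is 1/6 (a value between 0 and 1).
rolls = 100_000
sixes = sum(1 for _ in range(rolls) if random.randint(1, 6) == 6)

print(1 / 6)          # theoretical probability
print(sixes / rolls)  # estimated probability from the simulation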
What is Statistics?
Statistics is another mathematical subject which deals primarily with data. It helps us draw
inferences from data by having procedures in place for collecting, classifying and presenting the
data in an organized manner. The analysis and interpretation of the refined data provide further insights.
Since both statistics and probability have their roots in mathematics, computation is needed as a tool to perform quantitative analysis. The use of computers is also necessary to perform complex calculations while processing statistical data.
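For instance, a few basic summary statistics can be computed with Python's standard library; the sales figures below are invented purely for illustration:

import statistics

# Hypothetical monthly sales figures (in thousands of units).
sales = [23, 19, 31, 25, 22, 28, 35, 27, 21, 30]

print(statistics.mean(sales))    # central tendency of the data
print(statistics.median(sales))  # middle value after ordering the data
print(statistics.stdev(sales))   # how spread out the data points are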
Linear Algebra works as a computational engine for most data science problems because of its performance advantages over iterative methods. Let us discuss a simple example to understand the difference between the two methods.
Problem statement
● We need to transmit a message over the network: “PREPARE to NEGOTIATE”.
● When transmitting, we need to encrypt the message and at the receiving end, we need to
decrypt the message.
● To encrypt and decrypt, we need to use a confidential piece of information, usually referred
to as a key.
● The prime objective is to ensure confidentiality and privacy of data during transmission.
Solution
Step 1: The message is encrypted by assigning a number for each letter in the message. Thus, the
message becomes:
Step 4: At the receiving end, the message is decrypted by multiplying this matrix with the inverse of
the encoding matrix. The inverse of the encoding matrix is:
Step 5: After multiplication, we will get back the original enumerated matrix. The original message
can now be decoded from this matrix.
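Since the actual encoding matrix and the intermediate matrices are not reproduced above, the following sketch uses an invertible 2 x 2 matrix of our own choosing to show the same encrypt-and-decrypt idea with matrix multiplication:

import numpy as np

message = "PREPARE TO NEGOTIATE"

# Step 1: assign a number to each letter (A=1 ... Z=26, space=0).
numbers = [0 if ch == " " else ord(ch) - ord("A") + 1 for ch in message]

# Steps 2-3 (assumed): arrange the numbers into a 2-row matrix (padding with
# zeros if needed) and encrypt it by multiplying with an invertible key matrix.
if len(numbers) % 2:
    numbers.append(0)
plain = np.array(numbers).reshape(-1, 2).T
encode = np.array([[2, 1],
                   [1, 1]])        # illustrative key matrix, determinant 1
cipher = encode @ plain

# Step 4: decrypt by multiplying with the inverse of the encoding matrix.
recovered = np.rint(np.linalg.inv(encode) @ cipher).astype(int)

# Step 5: map the numbers back to letters to decode the original message.
decoded = "".join(" " if n == 0 else chr(n + ord("A") - 1)
                  for n in recovered.T.flatten())
print(decoded)   # PREPARE TO NEGOTIATE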
Problem statement
Currents I1, I2 and I3 need to be determined for the following electrical network:
Solution
Step 1: The equations for current are written based on Kirchhoff’s Law.
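The circuit diagram and its Kirchhoff equations are not reproduced here, so the coefficients below are purely illustrative; the point is that the resulting simultaneous equations form a linear system that can be solved in one shot:

import numpy as np

# Assumed equations (for illustration only):
#   I1 - I2 - I3 = 0
#   2*I1 + 3*I2  = 12
#   3*I2 - 4*I3  = 0
A = np.array([[1, -1, -1],
              [2,  3,  0],
              [0,  3, -4]], dtype=float)
b = np.array([0, 12, 0], dtype=float)

I1, I2, I3 = np.linalg.solve(A, b)   # solve A x = b for the three currents
print(I1, I2, I3)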
How can we use these relationships to extract more information about them and predict their
proposed activities?
Solution
Step 1: These relationships can be converted into a relationship chart in which “1” indicates related
and “0” indicates unrelated:
Step 2: From the chart created in the previous step, the adjacency matrix for the directed graph is:
Step 3: The adjacency matrix may be used as a data structure for representing graphs in computer
programs for manipulation.
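A small sketch of how such an adjacency matrix can be built and used in a program follows; the four people and their directed relations are assumed for illustration, since the original relationship chart is not reproduced here:

import numpy as np

people = ["A", "B", "C", "D"]
relations = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]   # assumed edges

index = {name: i for i, name in enumerate(people)}
adjacency = np.zeros((len(people), len(people)), dtype=int)
for src, dst in relations:
    adjacency[index[src], index[dst]] = 1      # "1" = related, "0" = unrelated

print(adjacency)

# Powers of the adjacency matrix count walks of a given length - one way to
# extract more information about indirect connections between the people.
print(np.linalg.matrix_power(adjacency, 2))    # two-step connections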
● Linear Algebra makes scientific computing easy, as most complex equations can be converted into linear equations with the help of vectors and matrices, where vectors can be viewed as single-dimensional matrices.
● Linear Algebra helps represent large sets of data as matrices, enabling us to better visualize the given data.
● All the operations/processes performed on matrices are batch processes. This means we can process millions of data points simultaneously instead of processing each data point individually.
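As a small illustration of this batch behavior (the data and weights below are made up), one vectorized matrix operation scores a million data points at once:

import numpy as np

data = np.random.rand(1_000_000, 3)    # one million 3-feature data points
weights = np.array([0.2, 0.5, 0.3])

scores = data @ weights                # a single matrix-vector product
print(scores.shape)                    # (1000000,) - every point processed together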
● Machine Learning is the field of scientific study that concentrates on induction algorithms
and on other algorithms that can be said to "learn". (Ref. Stanford glossary of terms)
● A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P if its performance at tasks in T, as measured by P, improves
with experience E. (Ref. Tom M. Mitchell)
● Analysis of data by a human being involves huge cost, time and effort. Note that we are talking about data that is huge in volume, comes in a lot of variety and arrives with high velocity.
● Human intervention is not sustainable (e.g. If we want to navigate on Mars and we don’t have
the expertise available, we can make a machine learn and let it navigate on unknown territory
without any human intervention).
● Human expertise cannot always be explained (e.g. speech recognition, image processing).
● A solution needs to be adapted to a particular case (e.g. user biometrics).
In our example, out of eight test images, the machine was able to classify 6 images correctly and 2 images incorrectly. Hence, the accuracy of this supervised machine learning model is 6/8, i.e. 75%.
● In the first step, we train the machine with known data so that it learns something from it.
● In the second step we expect the machine to utilize the knowledge it gained in the previous
step and classify a new unknown data point.
● In the third step, the model is evaluated on the basis of how accurately it has classified the
unknown data.
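The three steps above can be sketched with scikit-learn; the digits data set and the k-nearest-neighbors classifier are our own choices for illustration, not the image data used in the course example:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)                   # labelled images
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = KNeighborsClassifier()
model.fit(X_train, y_train)                 # step 1: train on known data
predictions = model.predict(X_test)         # step 2: classify unseen data
print(accuracy_score(y_test, predictions))  # step 3: evaluate the accuracy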
For example, assume that a company wants to predict whether the budget period of a new project it has acquired will be 'short-term' or 'long-term', based on various input attributes of the project such as the number of resources required, software requirements, hardware requirements, etc. We will need to use the classification machine learning technique here.
For example, if we are trying to predict the approximate budget requirement of a new project that the company has acquired, as an actual quantifiable figure, based on various input attributes of the project such as the number of resources required, software requirements, hardware requirements, etc., then we use the regression technique.
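A minimal regression sketch follows; the project attributes and budget figures below are invented purely for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: [resources required, software licences, hardware units];
# target: actual budget spent (in lakhs) on past projects.
X = np.array([[ 5, 2, 10],
              [10, 4, 25],
              [ 3, 1,  8],
              [ 8, 3, 20]])
y = np.array([12.0, 30.5, 8.0, 24.0])

model = LinearRegression().fit(X, y)

new_project = np.array([[6, 2, 15]])
print(model.predict(new_project))   # estimated budget as a quantifiable figure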
Reinforcement learning
Reinforcement learning is a reward-based technique with immediate feedback. Here, the machine's goal is to maximize the numerical reward at each and every step. In the process of learning, the machine is not provided any supervision, unlike the ML algorithms we have discussed so far. Instead, the machine is expected to figure out, all on its own and without any interference, the optimum actions that will reap the maximum reward at each step.
The actions that the machine takes at each step might not only affect the immediate reward but may also affect all the subsequent rewards. The ultimate aim is to reach the maximum possible reward in the fewest possible steps. Thus, a trial-and-error search methodology and immediate feedback in the form of a numerical reward are the two main characteristics of reinforcement learning.
An example of reinforcement learning is a machine learning to play chess: it decides whether a move is right by planning the possible moves, anticipating the corresponding counter-moves, and finally choosing one based on the reward associated with a particular position or set of moves. Another example is a trash-collecting bot whose charge is about to reach a critical level and which must decide whether to clean one more room before heading to the charging station or to rush to the nearest charging station immediately. The decision taken by the bot depends on the ease with which it can reach the charging station, based on its prior knowledge.
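A toy sketch of these two characteristics (trial-and-error search plus an immediate numerical reward) is an epsilon-greedy agent choosing among three actions; the reward probabilities below are invented for illustration:

import random

true_reward_probability = [0.2, 0.5, 0.8]   # unknown to the agent
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
epsilon = 0.1                               # how often the agent explores

for step in range(10_000):
    # Trial and error: mostly exploit the best-known action, sometimes explore.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])

    # Immediate feedback: a numerical reward of 1 or 0.
    reward = 1 if random.random() < true_reward_probability[action] else 0

    # Update the running average reward estimate for the chosen action.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)   # should approach [0.2, 0.5, 0.8]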
Some famous machine learning algorithms
We shall focus on supervised and unsupervised learning algorithms in the forthcoming courses.
Computer Science provides us with the necessary programming languages, database management
systems, statistical analysis and machine learning tools.
● Why is the project being started? What is missing currently and what exactly is required?
● What are they currently doing to fix the problem, and why isn’t it working?
● What resources are needed? What kind of data is available? Is domain expertise available within the team? What computational resources are available or required?
● How does the business organization plan to deploy the derived results? What kind of
problems need to be addressed for successful deployment?
● Classification: Determining which among the given categories a data point falls under
● Scoring: Predicting or estimating a quantifiable value
● Ranking: Ordering the data points depending on the priorities involved
● Clustering: Grouping similar items based on certain parameters
● Finding relations: Finding associations between various features of the data
● Characterization: Creating plots, graphs and various reports for understanding the data
better
If the answer to either of the above questions is NO, we need to revisit the previous steps.
● Present the details of the model to all the collaborators, clients and sponsors.
● Provide everyone in charge of usage and maintenance of the model, once deployed, with
documentation that covers all aspects of the working of the model.
We must keep in mind that each group of people involved in the project requires a different kind of treatment when it comes to presentations and documentation. Hence, specific data visualization techniques must be used for each of them. What works for one audience might not work for another.
Churn Prediction
Churn implies loss of customers to competition. For any company, it costs more to acquire new
customers than to retain the old ones. As churn prediction aids in customer retention, it is extremely
important especially for businesses with a repeat customer base. The application of this model cuts
across domains such as Banking, E-Retail, Telecom, Energy and Utilities.
Sentiment Analysis
Also referred to as opinion mining, it is the process of computationally identifying what customers
like and dislike about a product or a brand. A domain which relentlessly makes use of sentiment
analysis is the Retail industry. Companies like Amazon, Flipkart, Reliance, Paytm use customer
feedback from social networking sites like Facebook, Twitter, etc. or their own company websites to
find out what their customers are talking about and how they feel i.e. positive, negative, or neutral.
They leverage this information to reposition their products and provide better/new services.
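A bare-bones sketch of the idea (the reviews and labels below are invented, and real systems are far more sophisticated) classifies short pieces of feedback as positive or negative:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = ["great product, loved it", "terrible quality, very disappointed",
           "excellent service", "bad experience, will not buy again"]
labels = ["positive", "negative", "positive", "negative"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)          # bag-of-words features

classifier = MultinomialNB().fit(X, labels)

new_feedback = ["loved the fast service"]
print(classifier.predict(vectorizer.transform(new_feedback)))   # ['positive']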
Online Advertisement
The incremental growth in the complexity of the advertising industry is due to the ease of access to the internet via a wide variety of devices around the world. This gives advertisers an opportunity to study user preferences and online trends. The insights offered through these analyses translate into actionable items on issues and opportunities such as reducing ad-blindness or optimizing cost-per-action (CPA) and click-through rates (CTR).
Recommendations
Many companies like Amazon, Netflix, Spotify, Best Buy and YouTube, among many others, use recommender systems to improve a customer's shopping experience. This offers the companies a chance to gather information on customers' preferences, purchases and other browsing patterns, which lends insights that can amplify their return on investment.
News Aggregation
A news aggregator gathers and clusters stories on the same topic from several leading news websites, and also traces the original source of a news item and how the story has developed. It has a special interactive timeline that allows the reader to flip swiftly between headlines, refining their search by country or by specific news sites. Notable examples include Google News, Reddit, Flipboard and Pulse.
Scalability
Scalability refers to an enterprise's ability to handle increased demand. In the corporate environment, a scalable company is one that can maintain or improve profit margins while sales volume increases. Many a time, the process is slowed down by human intervention in decision making. For example, the credit operations teams in banks invest substantial time in assessing the creditworthiness of a client. Client management teams take a long time to suggest the right product or alternatives to the customer. The client help desk takes a long time to provide the desired information to the client. If these processes can be automated, the business can scale up. Data science helps build systems such as recommender systems and chatbots to achieve scalability.
A few more platforms that use content discovery algorithms are Facebook and YouTube. The content that appears in an individual's Facebook news feed and the videos that appear in the "Recommended for You" section of a YouTube user's account are both tailored according to each user's past behavior and personal preferences.
Intelligent Learning
Intelligent learning has become a part of our day-to-day lives in various forms. For example, Google Maps uses anonymized location data from various smart devices to predict the flow of traffic in real time. It also utilizes user-submitted reports on incidents that might affect traffic, such as road construction and accidents, to help suggest the fastest travel routes to users.
Another example would be ride-sharing apps like Uber and Ola. They optimize the ride experience not only by minimizing the ride time but also by matching users with other passengers so that shared rides involve the fewest detours.
Personalized Medicine
In many cases, the success of a particular treatment for a patient's condition cannot be predicted beforehand. Thus, many medical practitioners follow a non-optimal trial-and-error approach.
In personalized medicine, a doctor studies an individual's genes, environment and lifestyle. This helps tailor treatments for specific medical conditions, as opposed to a trial-and-error approach. It also enables pharmaceutical researchers to create combination drugs targeting a specific genomic profile, which in turn increases safety and efficacy.
Companies that are active in the field of personalized medicine are Roche, Novartis, Johnson &
Johnson among others.
Data Science solutions are offered as products by various vendors in the market today. A few of the popular vendors are as follows:
We can observe that in both the Data Analytics and Machine Learning fields, IBM Watson emerges as the top player.
Platform comparison
A consolidated view of the platform comparison based on Data Science technologies is shown below.