Data Science Pipeline, EDA & Data Preparation
2019
All rights reserved. No part of this unit may be reproduced, transmitted or utilised in any form or by
any means, electronic or mechanical, including photocopying, recording or by any information storage
or retrieval system without written permission from the publisher.
Acknowledgement
Every attempt has been made to trace the copyright holders of materials reproduced in this unit. Should any infringement have occurred, SCDL apologises for the same and will be pleased to make the necessary corrections in future editions of this unit.
Objectives
Data science pipelines are sequences of processing and analysis steps applied to data for a specific purpose. They are useful in production projects, and also when one expects to encounter the same type of business question in the future, since a reusable pipeline saves design and coding time.
1) Problem Definition
Contrary to common belief, the hardest part of data science isn’t building an accurate model
or obtaining good, clean data. It is much harder to define feasible problems and come up with
reasonable ways of measuring solutions. Problem definition aims at understanding, in depth,
a given problem at hand.
Multiple brainstorming sessions are organised to define the problem correctly, because the end goal depends entirely on what problem you are trying to solve. Hence, if you go wrong during the problem definition phase itself, you will end up delivering a solution to a problem that never existed in the first place.
2) Hypothesis Testing
Data collection is the process of gathering and measuring information on variables of interest,
in an established systematic fashion that enables one to answer stated research questions,
test hypotheses, and evaluate outcomes. Moreover, the data collection component of
research is common to all fields of study including physical and social sciences, humanities,
business, etc.
While methods vary by discipline, the emphasis on ensuring accurate and honest collection
remains the same.
Data processing is a series of actions or steps performed on data to verify, organise, transform, integrate, and extract it in an appropriate output form for subsequent use. Methods of processing must be rigorously documented to ensure the utility and integrity of the data.
Once you have clean and transformed data, the next step for machine learning projects is to
become intimately familiar with the data using exploratory data analysis (EDA).
EDA is about numeric summaries, plots, aggregations, distributions, densities, reviewing all
the levels of factor variables and applying general statistical methods.
A clear understanding of the data provides the foundation for model selection, i.e. choosing
the correct machine learning algorithm to solve your problem.
Feature engineering is the process of determining which predictor variables will contribute
the most to the predictive power of a machine learning algorithm.
The process of feature engineering is as much of an art as a science. Often feature engineering
is a give-and-take process with exploratory data analysis to provide much-needed intuition
about the data. It’s good to have a domain expert around for this process, but it’s also good
to use your imagination.
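As a small illustration, the sketch below derives a few candidate features from a hypothetical retail transactions table using pandas. The column names (order_date, quantity, unit_price) and the derived features are assumptions made purely for the example, not part of any particular dataset.

import pandas as pd

# Hypothetical transactions data; in practice this would come from your own source.
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2019-01-05", "2019-02-14", "2019-02-20"]),
    "quantity": [2, 5, 1],
    "unit_price": [250.0, 99.0, 1200.0],
})

# Derived features that often carry more predictive power than the raw columns.
df["revenue"] = df["quantity"] * df["unit_price"]        # interaction of two columns
df["order_month"] = df["order_date"].dt.month            # seasonality signal
df["order_dayofweek"] = df["order_date"].dt.dayofweek    # weekday vs weekend behaviour
df["is_bulk_order"] = (df["quantity"] >= 5).astype(int)  # simple domain-driven flag

print(df.head())

Which of these engineered columns actually helps is exactly the kind of question that EDA and domain expertise answer together.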
Machine learning can be used to make predictions about the future. You provide a model with
a collection of training instances, fit the model on this data set, and then apply the model to
new instances to make predictions.
Predictive modelling is useful because you can make products that adapt based on expected
user behaviour. For example, if a viewer consistently watches the same broadcaster on a
streaming service, the application can load that channel on application start-up.
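A minimal sketch of this fit-then-predict workflow is given below using scikit-learn. The synthetic data and the choice of logistic regression are assumptions made only for illustration; any other supervised learning algorithm follows the same pattern.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic training instances standing in for historical user behaviour.
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_new, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model on the training set.
model = LogisticRegression()
model.fit(X_train, y_train)

# Apply the fitted model to new instances to make predictions.
predictions = model.predict(X_new)
print(predictions[:10])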
6) Data Visualisation
Interpreting the data is about communicating your findings to the interested parties. If you cannot explain your findings to someone, then whatever you have done is of little use. Hence, this step becomes very crucial.
The objective of this step is to first identify the business insight and then correlate it to your
data findings. Secondly, you might need to involve domain experts in correlating the findings
with business problems.
Domain experts can help you in visualising your findings according to the business dimensions
which will also aid in communicating facts to a non-technical audience.
The first step in analytics is gathering data. Then, as you begin to analyse and dig deep for answers, it often becomes necessary to connect to and mash up information from a variety of data sources.
Data can be messy, disorganised, and contain errors. As soon as you start working with it, you will see
the need for enriching or expanding it, adding groupings and calculations. Sometimes it is difficult to
understand what changes have already been implemented.
Moving between data wrangling and analytics tools slows the analytics process—and can introduce
errors. It’s important to find a data wrangling function that lets you easily make adjustments to data
without leaving your analysis.
This is also called Data Munging. It follows certain steps: after the data is extracted from different data sources, it is sorted using a suitable algorithm, decomposed into a different, structured format, and finally stored in another database.
Typical steps include, among others, dropping unnecessary columns (such as those containing IDs or names) and removing outliers. Each of these steps has a special importance with respect to Data Science.
If you have two rows like Bombay and Mumbai representing the same city, this could lead to wrong results. One of the values has to be changed manually by the data analyst; this is done by creating a mapping on the fly in the visualisation tool and applying it to every row of data, and the process is repeated for other cities to detect more such issues.
Therefore, data is converted to a proper, feasible format before applying any model to it. By filtering, grouping and selecting appropriate data, the accuracy and performance of the model can be increased.
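A hedged pandas sketch of this kind of on-the-fly mapping, filtering and grouping is shown below. The Bombay/Mumbai values mirror the example above, while the column names and revenue figures are invented for illustration.

import pandas as pd

sales = pd.DataFrame({
    "city": ["Bombay", "Mumbai", "Delhi", "Delhi"],
    "revenue": [1200, 1500, 900, 1100],
})

# Map multiple representations of the same city to one canonical value.
city_mapping = {"Bombay": "Mumbai"}
sales["city"] = sales["city"].replace(city_mapping)

# Filter, group and aggregate to get the data into a feasible format for modelling.
high_value = sales[sales["revenue"] > 1000]
revenue_by_city = high_value.groupby("city")["revenue"].sum()
print(revenue_by_city)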
1. Cleaning the data - finding junk values and removing them, finding outliers and replacing them appropriately (with the 95th percentile, for example), etc.
2. Summary Statistics - finding the summary statistics - mean, median and if necessary, mode,
along with the standard deviation and variance of the particular distribution
3. Univariate analysis - a simple histogram that shows the frequency of a particular variable's occurrence, or a line chart that shows how a particular variable changes over time, used to look at all the variables in the data and understand them.
The idea is that, after performing Exploratory Data Analysis, you should have a sound understanding of the data you are about to dive into; a short sketch of the three steps above is given below. Further hypothesis-based analysis (post EDA) could involve statistical testing, bivariate analysis, etc.
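The following sketch illustrates these three steps on a hypothetical numeric column using pandas and matplotlib; the column values, the -999 junk code and the 95th-percentile capping rule are assumptions chosen to match the example in step 1.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily revenue figures with a junk value, a missing value and an outlier.
revenue = pd.Series([1200, 1350, np.nan, 1280, -999, 1310, 25000, 1295])

# 1. Cleaning: treat -999 as junk, drop missing values, cap outliers at the 95th percentile.
revenue = revenue.replace(-999, np.nan).dropna()
cap = revenue.quantile(0.95)
revenue = revenue.clip(upper=cap)

# 2. Summary statistics.
print(revenue.describe())            # mean, std, quartiles, etc.
print("median:", revenue.median())

# 3. Univariate analysis: a simple histogram of the variable.
revenue.hist(bins=10)
plt.xlabel("Daily revenue")
plt.ylabel("Frequency")
plt.show()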
We have all seen our mother taking a spoonful of soup to judge whether or not the salt in it is right. The act of tasting a spoonful of soup to check the salt level and to better understand its taste is exploratory data analysis. Based on that, our mothers decide the salt level; this is where they make inferences, and the validity of those inferences depends on whether or not the soup is well stirred, that is to say, whether or not the sample represents the whole population.
Say we are given some sales data with daily revenue numbers for a big retail chain, and we need to make sense of it. The question that arises now is: in what ways can we achieve this? What will you look for? Do you know what to look for? Will you immediately run code to find the mean, median, mode and other statistics?
The main objective is to understand the data inside out. The first step in any EDA is asking the right questions, the ones to which we want the answers; if our questions go wrong, the whole EDA goes wrong. So, the first step of any EDA is to list down as many questions as you can on a piece of paper. These are the questions that need to be asked before deciding on the next steps.
Exploratory data analysis (EDA) is very different from classical statistics. It is not about fitting models,
parameter estimation, or testing hypotheses, but is about finding information in data and generating
ideas.
So, this is the background of EDA. Technically, it involves steps like cleaning the data, calculating
summary statistics and then making plots to better understand the data at hand to make meaningful
inferences.
The next step after data cleaning is data reduction. This includes defining and extracting attributes,
decreasing the dimensions of data, representing the problems to be solved, summarising the data,
and selecting portions of the data for analysis.
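One common way of decreasing the dimensions of data is principal component analysis. The brief sketch below uses scikit-learn's PCA on synthetic data purely as an illustration of the idea; the numbers of samples, features and components are arbitrary.

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic data with 20 attributes, standing in for a wide real-world table.
X, _ = make_classification(n_samples=300, n_features=20, random_state=0)

# Reduce the 20 attributes to 5 principal components.
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (300, 5)
print(pca.explained_variance_ratio_.sum())  # share of variance retained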
There are multiple data cleansing practices in vogue to clean and standardise bad data and make it effective, usable and relevant to business needs.
Organisations relying heavily on data-driven business strategies need to choose a practice that best fits their operational working. A standard practice is outlined below.
Detailed steps of this process are as follows:
1. Store the data:
Put together the data collected from all sources and create a data warehouse. Once your data is stored in one place, it is ready to be put through the cleansing process.
2. Identify errors:
Multiple problems contribute to lowering the quality of data and making it dirty: inaccuracy, invalid data, incorrect data entry, missing values, spelling errors, incorrect data ranges, and multiple representations of the same data. These are some of the common errors which should be taken care of when creating a cleansed data regime.
3. Remove duplication/redundancy:
Multiple employees work on a single file where they collect and enter data. Most of the time, they do not realise they are entering the same data collected by some other employee at some other time. Such duplicate data corrupts the results and must be weeded out.
4. Validate data accuracy:
Effective marketing occurs with high-quality data, and thus validating its accuracy is the utmost priority organisations aim for. However, the method of collection is independent of the cleansing process. A triple verification of the data will enhance the dataset and build trustworthiness among marketers and sales professionals to utilise the power of data.
5. Standardise the data:
Now that the data is validated, it is important to put all of it in a standardised and accessible format. This ensures the entered data is clean, enriched and ready to use.
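A minimal pandas sketch of these cleansing steps (standardising formats, removing duplicates and identifying invalid values) is shown below. The column names, sample values and validation rule are assumptions made only for the example.

import pandas as pd

customers = pd.DataFrame({
    "name": ["Asha", "asha ", "Ravi", "Meera"],
    "age": [29, 29, -5, 41],                  # -5 is an invalid entry
    "city": ["Mumbai", "Mumbai", "Delhi", None],
})

# Standardise formats so duplicates can be detected reliably.
customers["name"] = customers["name"].str.strip().str.title()
customers["city"] = customers["city"].str.title()

# Remove duplication/redundancy introduced by multiple data-entry points.
customers = customers.drop_duplicates(subset=["name", "age", "city"])

# Identify errors: mark invalid ranges as missing so they can be corrected later.
customers["age"] = customers["age"].where(customers["age"] >= 0)

print(customers.isna().sum())   # count of missing/invalid values per column
print(customers)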
There are several other best practices which also need to be followed while cleansing data.
Keep in mind, however, that all these are standard practices, and they may or may not apply every time to a given problem. For example, if we have numerical data, we might first want to handle missing values and NAs.
For textual data, tokenisation, removal of whitespace, punctuation and stopwords, and stemming can all be possible steps towards cleaning the data for further analysis.
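A hedged sketch of these text-cleaning steps using NLTK is given below. It assumes the punkt and stopwords resources can be downloaded, and the sample sentence is invented for illustration.

import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time downloads of the required NLTK resources.
nltk.download("punkt")
nltk.download("stopwords")

text = "  The customers were happily buying, returning, and reviewing products!  "

tokens = word_tokenize(text.lower().strip())                 # tokenisation + whitespace removal
tokens = [t for t in tokens if t not in string.punctuation]  # remove punctuation
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]          # remove stopwords
stemmer = PorterStemmer()
tokens = [stemmer.stem(t) for t in tokens]                   # stemming

print(tokens)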
Thus, Data Cleansing is imperative for model building. If the data is garbage, then the output will also be garbage, no matter how great a statistical analysis is applied to it.
Statistical modelling is, literally, building statistical models; a linear regression, for example, is a statistical model (a minimal fitting sketch is given after the list below). To do any kind of statistical modelling, it is essential to know the basics of statistics, such as:
Basic statistics: Mean, Median, Mode, Variance, Standard Deviation, Percentile, etc.
Probability Distribution: Geometric Distribution, Binomial Distribution, Poisson
distribution, Normal Distribution, etc.
Population and Sample: understanding the basic concepts, the concept of sampling
Confidence Interval and Hypothesis Testing: How to Perform Validation Analysis
Correlation and Regression Analysis: Basic Models for General Data Analysis
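As a minimal sketch tying these basics together, the example below fits a simple linear regression with statsmodels on synthetic data and reads off the coefficients, confidence intervals and p-values. Every number here is invented for illustration.

import numpy as np
import statsmodels.api as sm

# Synthetic data: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

# Fit an ordinary least squares regression (a statistical model).
X = sm.add_constant(x)             # adds the intercept term
model = sm.OLS(y, X).fit()

print(model.params)                # estimated intercept and slope
print(model.conf_int())            # confidence intervals for the coefficients
print(model.pvalues)               # hypothesis tests on the coefficients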
Statistical modelling is a step which comes after data cleansing. The most important parts are model selection, configuration, prediction, evaluation and presentation.
1) Model Selection
One among many machine learning algorithms may be appropriate for a given predictive
modeling problem. The process of selecting one method as the solution is called model
selection.
This may involve a suite of criteria both from stakeholders in the project and the careful
interpretation of the estimated skill of the methods evaluated for the problem.
As with model configuration, two classes of statistical methods can be used to interpret the estimated skill of different models for the purposes of model selection. They are listed below, followed by a brief comparison sketch:
o Statistical Hypothesis Tests. Methods that quantify the likelihood of observing the
result given an assumption or expectation about the result (presented using critical
values and p-values).
o Estimation Statistics. Methods that quantify the uncertainty of a result using
confidence intervals.
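The sketch below compares the cross-validated skill of two candidate models and applies a paired t-test to the fold scores. This is only one simple way to operationalise the ideas above, and the data and models are synthetic stand-ins chosen for the example.

from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=1)

# Estimated skill of each candidate model across the same 10 cross-validation folds.
scores_lr = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
scores_dt = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=10)

# Statistical hypothesis test: is the observed difference in skill plausibly due to chance?
t_stat, p_value = ttest_rel(scores_lr, scores_dt)
print("mean skill:", scores_lr.mean(), scores_dt.mean())
print("p-value for the difference:", p_value)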
2) Model Configuration
A given machine learning algorithm often has a suite of hyperparameters (parameters passed
to the statistical model which can be changed) that allow the learning method to be tailored
to a specific problem.
The configuration of the hyperparameters is often empirical in nature, rather than analytical,
requiring large suites of experiments in order to evaluate the effect of different
hyperparameter values on the skill of the model.
Hyperparameters can make or break a model, and hyperparameter tuning is a very common practice in the world of Data Science.
The two methods by which we can do hyperparameter tuning are listed below, followed by a short sketch:
o Grid Search
o Random Search
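A hedged sketch of both approaches using scikit-learn is shown below; the random-forest estimator and the parameter ranges are arbitrary choices made only for the example.

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=400, n_features=8, random_state=2)

# Grid search: exhaustively evaluate every combination in a fixed grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=2),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
    cv=5,
)
grid.fit(X, y)
print("grid search best:", grid.best_params_, grid.best_score_)

# Random search: sample a fixed number of combinations from distributions.
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=2),
    param_distributions={"n_estimators": randint(50, 200), "max_depth": [3, 5, None]},
    n_iter=10,
    cv=5,
    random_state=2,
)
random_search.fit(X, y)
print("random search best:", random_search.best_params_, random_search.best_score_)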
3) Model Evaluation
4) Model Presentation
Once a final model has been trained, it can be presented to stakeholders prior to being used
or deployed to make actual predictions on real data.
A part of presenting a final model involves presenting the estimated skill of the model.
Methods from the field of estimation statistics can be used to quantify the uncertainty in the
estimated skill of the machine learning model through the use of tolerance intervals and
confidence intervals.
o Estimation Statistics. Methods that quantify the uncertainty in the skill of a model via confidence intervals.
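One simple way to produce such a confidence interval is to bootstrap the model's score on a held-out test set, as in the hedged sketch below; the data and model are again synthetic placeholders used only to show the idea.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Bootstrap the test set to quantify uncertainty in the estimated skill.
rng = np.random.default_rng(3)
scores = []
for _ in range(1000):
    idx = rng.integers(0, len(y_test), len(y_test))   # resample with replacement
    scores.append(accuracy_score(y_test[idx], y_pred[idx]))

low, high = np.percentile(scores, [2.5, 97.5])
print(f"estimated accuracy {np.mean(scores):.3f}, 95% CI [{low:.3f}, {high:.3f}]")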
Data science is useless if you can’t communicate your findings to others, and visualisations are
imperative if you’re speaking to a non-technical audience. If you come into a board room without
presenting any visuals, you’re going to run out of work pretty soon.
More than that, visualisations are very helpful for data scientists themselves. Visual representations are much more intuitive to grasp than numerical abstractions.
Consider a chart showing total air passengers across time for a particular airline. Just by glancing at the chart for two seconds, we immediately recognise a seasonal pattern and a long-term trend. Identifying those patterns by analysing the numbers alone would require decomposing the signal in several steps.
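A small matplotlib sketch for producing such a chart is given below; it assumes a hypothetical CSV file airline_passengers.csv with Month and Passengers columns, in the style of the classic air-passengers dataset.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file with monthly totals, e.g. columns: Month, Passengers.
df = pd.read_csv("airline_passengers.csv", parse_dates=["Month"])

plt.plot(df["Month"], df["Passengers"])
plt.title("Total air passengers over time")
plt.xlabel("Month")
plt.ylabel("Passengers")
plt.show()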
You need to understand the data yourself, so you need to create visualisations which will probably never be shared. You also need to get the data's story across, and visualisation is usually the best way to do that.
Visualisations are helpful both in pre-processing and post-processing stages. They help us understand
our datasets and results in the form of shapes and objects which is somehow more real to the human
brain.
There are currently three key trends that are probably going to shape the future of data visualisation:
Interactivity, Automation, and storytelling (VR).
1) Interactivity
Interactivity has been a key element of online data visualisation for numerous years. But it is now beginning to overtake static visualisation as the predominant manner in which visualisations are presented, particularly in news media. It is increasingly expected that every online map, chart, and graph is interactive as well as animated.
The challenge of interactivity is to provide options accommodating an extensive range of users and their corresponding needs, without overcomplicating the user interface of the data visualisation. There are 7 key types of interactivity, as shown below:
Reconfigure
Choosing features
Encode
Abstract/elaborate
Explore
Connect
Filter
2) Automation
In the past, data visualisation was a tedious and troublesome process. The current challenge is to automate Big Data visualisation so that big-picture trends can be monitored without losing sight of the details of interest.
Best-practice visualisation and design standards are vital, but there should also be a match between the kind of visualisation and the purpose for which it will be used.
3) Storytelling and VR
Storytelling with data is popular, and rightfully so. Data visualisations are empty of meaning without a story, and stories can be enormously enhanced when supplemented with data visualisation.
The future of storytelling might be virtual reality. The human visual system is optimised for seeing and interacting in three dimensions. The full storytelling potential of data visualisation can be explored once it is no longer confined to flat screens.
Some of the best Data Visualisation tools for Data Science are:
1) Tableau
2) QlikView
3) PowerBi
4) QlikSense
5) FusionCharts
6) HighCharts
7) Plotly
But the most important one if you are working with R is ggplot2, and for Python it is Seaborn or Matplotlib.
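As a small illustration on the Python side, the sketch below draws a histogram and a scatter plot with Seaborn (which builds on Matplotlib), using Seaborn's tips example dataset; loading that dataset requires an internet connection the first time.

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")   # small example dataset shipped for Seaborn demos

# Distribution of a single variable.
sns.histplot(data=tips, x="total_bill", bins=20)
plt.show()

# Relationship between two variables, split by a categorical column.
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
plt.show()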
What is GGPLOT?
Ggplot2 is a data visualisation package for the statistical programming language R, which tries to take
the good parts of base and lattice graphics and none of the bad parts.
It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as
providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.
There are several good reasons to explore ggplot2 for your day-to-day plotting.
Data visualisation will change the manner in which our analysts work with data. They will be expected to respond to issues more quickly, and required to dig for more insights - to look at information differently and more creatively.
Activity 1
Find and list more data visualisation tools.
Summary
Data Science is a combination of multiple fields which involves creation, preparation,
transformation, modelling, and visualisation of data.
Data Science pipeline consists of Data Wrangling, Data Cleansing & Extraction, EDA, Statistical
Model Building, and Data Visualisation.
Data Wrangling is a step in which the data needs to be transformed and aggregated into a usable format from which insights can be derived.
Data Cleansing is an important step in which the data needs to be cleansed: replacing missing values, replacing NaNs in the data, and removing outliers, along with standardisation and normalisation.
Data Visualisation is a process of visualising the data so as to derive insights from it at a glance.
It is also used to present results of the data science problem.
Statistical modelling is the core of a Data Science solution. It is the fitting of statistical equations to the data at hand in order to predict a certain value for future observations.
Keywords
Data Science Pipeline: The 7 major stages of solving a Data Science problem.
Data Wrangling: The art of transforming the data into a format through which it is easier to
draw insights from.
Data Cleansing: The process of cleaning the data of missing values, garbage values, NaNs and outliers.
Data Visualisation: The art of building graphs and charts so as to understand data easily and
find insights into it.
Statistical Modelling: The implementation of statistical equations on existing data.
Self-Assessment Questions
1. What is Data Science Pipeline?
2. Why is there a need for Data Wrangling?
3. What are the steps involved in Data Cleansing?
4. What are the basics required to perform statistical modelling?
5. What do you mean by Data Visualisation and where is it used?
Suggested Reading
1. Jeffrey Stanton, An Introduction to Data Science.
2. Field Cady, The Data Science Handbook.
3. Frank Kane, Hands-On Data Science and Python Machine Learning.
4. Data Science in Practice.