Data Analytics Lifecycle Explained

The Data Analytics Lifecycle consists of six phases: Discovery, Data Preparation, Model Planning, Model Building, Communicate Results, and Operationalize. The Discovery phase focuses on understanding the business domain, identifying stakeholders, and developing hypotheses, while the Data Preparation phase involves creating an analytic sandbox, performing ETLT, and conditioning data. Tools like Alpine Miner and OpenRefine are commonly used for data preparation to facilitate advanced analytics projects.

Uploaded by

Vignesh U

Data Analytics Lifecycle

Phase 1: Discovery

Phase 2: Data Preparation

Phase 3: Model Planning

Phase 4: Model Building

Phase 5: Communicate Results

Phase 6: Operationalize
Overview of Data Analytics Lifecycle

Phase 1: Discovery

Learning the Business Domain

Resources Available: Time, People, Technology, Data

Framing the Problem

Identifying Key Stakeholders

Interviewing the Analytics Sponsor

Developing Initial Hypotheses

Identifying Potential Data Sources


Phase 2: Data Preparation

Preparing the Analytic Sandbox

Performing ETLT

Learning about the Data

Data Conditioning

Survey and Visualize

Common Tools for Data Preparation


Preparing the Analytic Sandbox
● Create the analytic sandbox (also called a workspace)
● The sandbox lets the team explore data without interfering with live production data
● The sandbox collects all kinds of data
● The sandbox allows organizations to undertake ambitious projects beyond traditional data analysis and BI, and to perform advanced predictive analytics
Performing ETLT (Extract, Transform, Load, Transform)
● In ETL, users extract the data, transform it, and then load it
● In the sandbox, the process is often ELT: loading early preserves the raw data, which can be useful to examine
● Example: in credit card fraud detection, outliers can represent high-risk transactions that might be inadvertently filtered out or transformed before being loaded into the database
Figure: Outlier
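The ELT idea above can be sketched in a few lines. This is only an illustration, not the deck's own method: the in-memory SQLite database stands in for the sandbox, and the table names, sample transactions, and the 1000.0 threshold are all assumptions made for the example. The point it shows is that the raw rows are loaded untouched, and the outlier flag is derived afterwards in a separate table.

```python
import sqlite3

# Hypothetical raw transactions: (transaction_id, card_id, amount)
RAW_TRANSACTIONS = [
    (1, "card_a", 25.00),
    (2, "card_a", 31.50),
    (3, "card_a", 9800.00),  # unusually large: a possible fraud signal
    (4, "card_b", 12.75),
    (5, "card_b", 18.20),
]


def load_raw(conn, rows):
    """Extract + Load: store every row as-is, with no filtering."""
    conn.execute(
        "CREATE TABLE raw_transactions (id INTEGER, card_id TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_transactions VALUES (?, ?, ?)", rows)


def transform_in_sandbox(conn, threshold):
    """Transform: derive a flagged copy; the raw table is left intact."""
    conn.execute(
        "CREATE TABLE flagged_transactions "
        "(id INTEGER, card_id TEXT, amount REAL, is_outlier INTEGER)"
    )
    conn.execute(
        "INSERT INTO flagged_transactions "
        "SELECT id, card_id, amount, amount > ? FROM raw_transactions",
        (threshold,),
    )


def outlier_ids(conn):
    """Return the ids of transactions flagged as outliers."""
    cursor = conn.execute(
        "SELECT id FROM flagged_transactions WHERE is_outlier = 1 ORDER BY id"
    )
    return [row[0] for row in cursor]


def raw_row_count(conn):
    """Count rows in the untouched raw table."""
    return conn.execute("SELECT COUNT(*) FROM raw_transactions").fetchone()[0]


sandbox = sqlite3.connect(":memory:")  # stand-in for the analytic sandbox
load_raw(sandbox, RAW_TRANSACTIONS)
transform_in_sandbox(sandbox, threshold=1000.0)
print("raw rows kept:", raw_row_count(sandbox))
print("flagged as outliers:", outlier_ids(sandbox))
```

Had the large transaction been filtered out before loading, as can happen in a plain ETL flow, it would no longer be available to examine in the sandbox.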
Learning about the Data
● Determines the data available to the team early in the project
● Highlights gaps: identifies data not currently available
● Identifies data outside the organization that might be useful
Learning about the Data
Sample Dataset Inventory
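A dataset inventory can be kept as a simple list of datasets with a status for each. The sketch below is only one possible shape: the dataset names and the three status labels ("available", "not_available", "external") are hypothetical and chosen to mirror the three points listed under Learning about the Data. They are not taken from the original inventory table.

```python
from collections import defaultdict

# Hypothetical inventory entries: (dataset_name, status)
INVENTORY = [
    ("sales_transactions", "available"),
    ("call_center_logs", "not_available"),
    ("customer_surveys", "available"),
    ("third_party_demographics", "external"),
]


def group_by_status(inventory):
    """Group dataset names under their status label."""
    groups = defaultdict(list)
    for name, status in inventory:
        groups[status].append(name)
    return dict(groups)


def find_gaps(inventory):
    """List datasets the team cannot use yet (missing or held externally)."""
    return [name for name, status in inventory if status != "available"]


for status, names in group_by_status(INVENTORY).items():
    print(status, "->", names)
print("gaps to follow up:", find_gaps(INVENTORY))
```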
Data Conditioning
● Cleaning data
● Normalizing datasets
● Managing missing data, outliers, and unwanted data
● Performing transformation
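The conditioning steps can be illustrated on a small column of numbers. The slides do not prescribe specific techniques, so the choices here are assumptions: missing values are filled with the median, outliers are flagged with a median-absolute-deviation (MAD) rule, and the remaining values are min-max normalized. The sample values and the multiplier k=5 are made up for the example.

```python
from statistics import median


def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers_mad(values, k=5.0):
    """Flag values more than k median absolute deviations from the median.

    If the MAD is zero, every value that differs from the median is flagged.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return [abs(v - center) > k * mad for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw = [10, 12, None, 11, 13, 500]  # hypothetical measurements
filled = fill_missing(raw)
flags = flag_outliers_mad(filled)
kept = [v for v, is_outlier in zip(filled, flags) if not is_outlier]
print("filled:", filled)
print("outlier flags:", flags)
print("normalized (outliers set aside):", min_max_normalize(kept))
```

Outliers are flagged and set aside here, not deleted, since in cases such as fraud detection they may be the records of most interest.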
Survey and Visualize
● Leverage data visualization tools to gain an overview of the data
● “Overview first, zoom and filter, then details-on-demand”
○ This enables the user to find areas of interest, zoom and filter to find more detailed information about a particular area, then find the detailed data in that area
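The "overview first, zoom and filter, then details-on-demand" sequence can also be followed without a visualization tool. The sketch below uses plain Python on a handful of hypothetical sales records; the field names (region, month, amount) and the values are assumptions for illustration only.

```python
# Hypothetical sales records
SALES = [
    {"region": "east", "month": "2024-01", "amount": 120.0},
    {"region": "east", "month": "2024-02", "amount": 80.0},
    {"region": "west", "month": "2024-01", "amount": 300.0},
    {"region": "west", "month": "2024-02", "amount": 40.0},
]


def overview(records):
    """Overview first: total amount per region."""
    totals = {}
    for record in records:
        region = record["region"]
        totals[region] = totals.get(region, 0.0) + record["amount"]
    return totals


def zoom_and_filter(records, region):
    """Zoom and filter: keep only the records for one region of interest."""
    return [r for r in records if r["region"] == region]


def details_on_demand(records, month):
    """Details on demand: return the individual records for one month."""
    return [r for r in records if r["month"] == month]


print("overview:", overview(SALES))
west = zoom_and_filter(SALES, "west")
print("west only:", west)
print("west, 2024-02:", details_on_demand(west, "2024-02"))
```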
Survey and Visualize
Guidelines and Considerations

● Assess the granularity of the data, the range of values, and the level of aggregation of the data
● Does the data represent the population of interest?
● Check time-related variables: is the data daily, weekly, or monthly? Is this good enough?
● Is the data standardized/normalized? Are the scales consistent?
● For geospatial datasets, are state/country abbreviations consistent?
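Some of these checks are easy to automate. The following is a rough sketch under stated assumptions: the "two uppercase letters" rule for state codes and the gap-based guess at time granularity are simplifications invented for this example, and the sample values are hypothetical.

```python
from datetime import date


def inconsistent_state_codes(values):
    """Return the distinct values that are not two-letter uppercase codes."""
    bad = {
        v for v in values
        if not (len(v) == 2 and v.isalpha() and v.isupper())
    }
    return sorted(bad)


def value_range(values):
    """Return the (minimum, maximum) of a numeric column."""
    return min(values), max(values)


def infer_granularity(dates):
    """Guess the time granularity from the smallest gap between dates."""
    ordered = sorted(set(dates))
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return "unknown"
    smallest = min(gaps)
    if smallest <= 1:
        return "daily"
    if smallest <= 7:
        return "weekly"
    return "monthly or coarser"


states = ["CA", "Calif.", "ca", "NY"]
timestamps = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15)]
print("inconsistent codes:", inconsistent_state_codes(states))
print("range:", value_range([3.5, 10.0, 7.25]))
print("granularity:", infer_granularity(timestamps))
```

Whether the detected granularity is "good enough" still depends on the question being asked, so checks like these only flag candidates for a closer look.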
Common Tools for Data Preparation
● Hadoop: performs parallel ingest and analysis
● Alpine Miner: provides a GUI for creating analytic workflows
● OpenRefine: free, open source tool for working with messy data
● Data Wrangler: tool for data cleansing & transformation
