Data Science: Lesson 4
Part 1
Module Objectives
Data science projects differ from most traditional Business Intelligence projects and many data analysis projects in
that data science projects are more exploratory in nature.
Many problems that appear huge and daunting at first can be broken down into smaller pieces or actionable phases
that can be more easily addressed.
Having a good process ensures a comprehensive and repeatable method for conducting analysis. In addition, it helps
focus time and energy early in the process to get a clear grasp of the business problem to be solved.
A common mistake made in data science projects is rushing into data collection and analysis without spending
sufficient time to plan and scope the amount of work involved, understand requirements, or even frame the business
problem properly.
Consequently, participants may discover mid-stream that the project sponsors are actually trying to achieve an
objective that may not match the available data, or they are attempting to address an interest that differs from what
has been explicitly communicated.
When this happens, the project may need to revert to the initial phases of the process for a proper discovery phase,
or the project may be canceled.
Creating and documenting a process helps demonstrate rigor, which provides additional credibility to the project
when the data science team shares its findings.
A well-defined process also offers a common framework for others to adopt, so the methods and analysis can be
repeated in the future or as new members join a team.
The Data Analytics Lifecycle is designed specifically for Big Data problems and data science projects. The lifecycle
has six phases, and project work can occur in several phases at once.
The following describes the various roles and key stakeholders of an analytics project. Each plays a critical part in a
successful analytics project.
Although seven roles are listed, fewer or more people can accomplish the work depending on the scope of the
project, the organizational structure, and the skills of the participants.
Business User: Someone who understands the domain area and usually benefits from the results. This person can
consult and advise the project team on the context of the project, the value of the results, and how the outputs will be
operationalized. Usually a business analyst, line manager, or deep subject matter expert in the project domain fulfills
this role.
Project Sponsor: Responsible for the genesis of the project. Provides the impetus and requirements for the project
and defines the core business problem. Generally provides the funding and gauges the degree of value from the final
outputs of the working team. This person sets the priorities for the project and clarifies the desired outputs.
Database Administrator (DBA): Provisions and configures the database environment to support the analytics needs
of the working team. These responsibilities may include providing access to key databases or tables and ensuring the
appropriate security levels are in place related to the data repositories.
Data Engineer: Leverages deep technical skills to assist with tuning SQL queries for data management and data
extraction, and provides support for data ingestion into the analytic sandbox. Whereas the DBA sets up and
configures the databases to be used, the data engineer executes the actual data extractions and performs substantial
data manipulation to facilitate the analytics. The data engineer works closely with the data scientist to help shape
data in the right ways for analyses.
Data Scientist: Provides subject matter expertise for analytical techniques, data modeling, and applying valid
analytical techniques to given business problems. Ensures overall analytics objectives are met. Designs and
executes analytical methods and approaches with the data available to the project.
Phase 1—Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether
the organization or business unit has attempted similar projects in the past from which they can learn. The team
assesses the resources available to support the project in terms of people, technology, time, and data. Important
activities in this phase include framing the business problem as an analytics challenge that can be addressed in
subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 2—Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work
with data and perform analytics for the duration of the project. The team needs to execute extract, load, and
transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox; the combination is sometimes
abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In
this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
Phase 3—Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and
workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the
relationships between variables and subsequently selects key variables and the most suitable models.
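To make this concrete, the brief sketch below shows one way a team might explore relationships between variables during model planning; the DataFrame, file path, and column names (including the "churn" target) are hypothetical.

    # Minimal sketch of exploring variable relationships during model planning.
    # The file path and column names below are illustrative assumptions.
    import pandas as pd

    df = pd.read_csv("sandbox/customer_data.csv")   # hypothetical extract

    # Pairwise correlations among numeric variables point to candidate predictors.
    correlations = df.corr(numeric_only=True)
    print(correlations["churn"].sort_values(ascending=False))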
Phase 4—Model building: In Phase 4, the team develops datasets for testing, training, and production purposes. In
addition, in this phase the team builds and executes models based on the work done in the model planning phase.
The team also considers whether its existing tools will suffice for running the models, or if it will need a more robust
environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).
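As a rough illustration of this phase, the sketch below splits a hypothetical dataset into training and test sets and fits a simple model; the file, the columns, and the choice of logistic regression are assumptions rather than a prescribed approach.

    # Illustrative model building step on a hypothetical churn dataset.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    df = pd.read_csv("sandbox/customer_data.csv")   # hypothetical dataset
    X = df.drop(columns=["churn"])                  # assumes numeric feature columns
    y = df["churn"]                                 # "churn" is an assumed target column

    # Hold out a test set so the model can be evaluated on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("Holdout accuracy:", model.score(X_test, y_test))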
Phase 5—Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the
results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify
key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.
Phase 6—Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In
addition, the team may run a pilot project to implement the models in a production environment.
Once team members have run models and produced findings, it is critical to frame these results in a way that is tailored to the
audience that engaged the team. Moreover, it is critical to frame the results of the work in a manner that demonstrates clear
value.
If the team performs a technically accurate analysis but fails to translate the results into a language that resonates with the
audience, people will not see the value, and much of the time and effort on the project will have been wasted.
Phase 1: Discovery
In this phase, the data science team must learn and investigate the problem, develop context and understanding, and
learn about the data sources needed and available for the project.
In addition, the team formulates initial hypotheses that can later be tested with data.
At this early stage in the process, the team needs to determine how much business or domain knowledge the
data scientist needs to develop models in Phases 3 and 4.
The earlier the team can make this assessment, the better, because the decision helps dictate the resources
needed for the project team and ensures the team has the right balance of domain knowledge and technical
expertise.
RESOURCES
As part of the discovery phase, the team needs to assess the resources available to support the project. In this
context, resources include technology, tools, systems, data, and people.
During this scoping, there is a need to consider the available tools and technology the team will be using and the
types of systems needed for later phases to operationalize the models. In addition, try to evaluate the level of
analytical sophistication within the organization and gaps that may exist related to tools, technology, and skills.
In addition to the skills and computing resources, it is advisable to take inventory of the types of data available to
the team for the project. Consider if the data available is sufficient to support the project’s goals. The team will
need to determine whether it must collect additional data, purchase it from outside sources, or transform existing
data.
Often, projects are started by looking only at the data available. When the data is less than hoped for, the size and
scope of the project are reduced to work within the constraints of the existing data.
Ensure the project team has the right mix of domain experts, customers, analytic talent, and project management
to be effective. In addition, evaluate how much time is needed and if the team has the right breadth and depth of
skills.
Framing the problem well is critical to the success of the project. Framing is the process of stating the analytics
problem to be solved. At this point, it is a best practice to write down the problem statement and share it with the
key stakeholders.
As part of this activity, it is important to identify the main objectives of the project, identify what needs to be
achieved in business terms, and identify what needs to be done to meet the needs. Additionally, consider the
objectives and the success criteria for the project.
Perhaps equally important is to establish failure criteria. The failure criteria will guide the team in understanding
when it is best to stop trying or settle for the results that have been gleaned from the data. Establishing criteria for
both success and failure helps the participants avoid unproductive effort and remain aligned with the project
sponsors.
Another important step is to identify the key stakeholders and their interests in the project. During these
discussions, the team can identify the success criteria, key risks, and stakeholders, which should include anyone
who will benefit from the project or will be significantly impacted by the project. The team should plan to
collaborate with the stakeholders to clarify and frame the analytics problem.
Depending on the number of stakeholders and participants, the team may consider outlining the type of activity
and participation expected from each stakeholder and participant. This will set clear expectations with the
participants and avoid delays later.
When interviewing the main stakeholders, the team needs to take time to thoroughly interview the project
sponsor, who tends to be the one funding the project or providing the high-level requirements.
This person understands the problem and usually has an idea of a potential working solution. It is critical to
thoroughly understand the sponsor’s perspective to guide the team in getting started on the project.
Following is a brief list of common questions that are helpful to ask during the discovery phase when
interviewing the project sponsor. The responses will begin to shape the scope of the project and give the team an
idea of the goals and objectives of the project.
Developing a set of IHs is a key facet of the discovery phase. This step involves forming ideas that the team can test
with data. These IHs form the basis of the analytical tests the team will use in later phases and serve as the
foundation for the findings in Phase 5.
Another part of this process involves gathering and assessing hypotheses from stakeholders and domain experts
who may have their own perspective on what the problem is, what the solution should be, and how to arrive at a
solution. These stakeholders would know the domain area well and can offer suggestions on ideas to test as the
team formulates hypotheses during this phase.
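For illustration only, the sketch below turns one hypothetical IH (customers who contact support frequently churn at a higher rate) into a quick statistical check; the data source and column names are assumptions.

    # Quick check of a hypothetical initial hypothesis (IH).
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("sandbox/customer_data.csv")   # hypothetical extract

    # IH: customers with frequent support calls churn at a higher rate.
    frequent = df[df["support_calls"] > 2]["churned"]
    infrequent = df[df["support_calls"] <= 2]["churned"]

    # A two-sample t-test on the churn indicator gives a first read on the IH;
    # later phases would apply more rigorous modeling.
    t_stat, p_value = stats.ttest_ind(frequent, infrequent, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")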
Identify data sources: Make a list of candidate data sources the team may need to test the initial hypotheses
outlined in this phase. Make an inventory of the datasets currently available and those that can be purchased or
otherwise acquired for the tests the team wants to perform.
Capture aggregate data sources: This is for previewing the data and providing high-level understanding. It enables
the team to gain a quick overview of the data and perform further exploration on specific areas. It also points the
team to possible areas of interest within the data.
Review the raw data: Obtain preliminary data from initial data feeds. Begin understanding the interdependencies
among the data attributes, and become familiar with the content of the data, its quality, and its limitations.
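A minimal sketch of this kind of preview, assuming an initial data feed has been landed as a CSV file (the path and column names are hypothetical):

    # First-pass review of a raw data feed in the sandbox.
    import pandas as pd

    raw = pd.read_csv("sandbox/initial_feed.csv")   # hypothetical initial feed

    print(raw.shape)          # volume: number of rows and columns
    print(raw.dtypes)         # data type of each attribute
    print(raw.head())         # a first look at the content
    print(raw.describe())     # high-level summary statistics
    print(raw.isna().sum())   # where data is missing or of limited quality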
Evaluate the data structures and tools needed: The data type and structure dictate which tools the team can use
to analyze the data. This evaluation gets the team thinking about which technologies may be good candidates for the
project and how to start getting access to these tools.
Scope the sort of data infrastructure needed for this type of problem: In addition to the tools needed, the data
influences the kind of infrastructure that's required, such as disk storage and network capacity.
The team can move to the next phase when it has enough information to draft an analytics plan and share it for peer
review. Although a peer review of the plan may not actually be required by the project, creating the plan is a good test
of the team’s grasp of the business problem and the team’s approach to addressing it.
Creating the analytic plan also requires a clear understanding of the domain area, the problem to be solved, and
scoping of the data sources to be used. Developing success criteria early in the project clarifies the problem definition
and helps the team when it comes time to make choices about the analytical methods being used in later phases.
The second phase of the Data Analytics Lifecycle involves data preparation, which includes the steps to explore,
preprocess, and condition data prior to modeling and analysis. In this phase, the team needs to create a robust
environment in which it can explore the data that is separate from a production environment.
Usually, this is done by preparing an analytics sandbox. To get the data into the sandbox, the team needs to perform
ETLT, a combination of extracting, transforming, and loading data into the sandbox. Once the data is in the
sandbox, the team needs to learn about the data and become familiar with it. Understanding the data in detail is
critical to the success of the project.
The team also must decide how to condition and transform data to get it into a format to facilitate subsequent
analysis. The team may perform data visualizations to help team members understand the data, including its trends,
outliers, and relationships among data variables. Each of these steps of the data preparation phase is discussed
throughout this section.
Data preparation tends to be the most labor-intensive step in the analytics lifecycle. In fact, it is common for teams to
spend at least 50% of a data science project’s time in this critical phase. If the team cannot obtain enough data of
sufficient quality, it may be unable to perform the subsequent steps in the lifecycle process.
The data preparation phase is generally the most iterative and the one that teams tend to underestimate most often.
This is because most teams and leaders are anxious to begin analyzing the data, testing hypotheses, and getting
answers to some of the questions posed in Phase 1.
Many tend to jump into Phase 3 or Phase 4 to begin rapidly developing models and algorithms without spending the
time to prepare the data for modeling. Consequently, teams come to realize the data they are working with does not
allow them to execute the models they want, and they end up back in Phase 2 anyway.
Preparing the Analytic Sandbox
The first subphase of data preparation requires the team to obtain an analytic sandbox (also commonly referred to as
a workspace), in which the team can explore the data without interfering with live production databases.
Consider an example in which the team needs to work with a company’s financial data. The team should access a
copy of the financial data from the analytic sandbox rather than interacting with the production version of the
organization’s main database, because that will be tightly controlled and needed for financial reporting.
When developing the analytic sandbox, it is a best practice to collect all kinds of data there, as team members need
access to high volumes and varieties of data for a Big Data analytics project.
This expansive approach to collecting data of all kinds differs considerably from the approach advocated by many
information technology (IT) organizations. Many IT groups provide access to only a particular subsegment of the
data for a specific purpose. Often, the mindset of the IT group is to provide the minimum amount of data required to
allow the team to achieve its objectives.
Conversely, the data science team wants access to everything. From its perspective, more data is better, as
oftentimes data science projects are a mixture of purpose-driven analyses and experimental approaches to test a
variety of ideas.
During these discussions, the data science team needs to give IT a justification to develop an analytics sandbox,
which is separate from the traditional IT-governed data warehouses within an organization.
The analytic sandbox enables organizations to undertake more ambitious data science projects and move beyond
doing traditional data analysis and Business Intelligence to perform more robust and advanced predictive analytics.
Performing ETLT
As the team looks to begin data transformations, make sure the analytics sandbox has ample bandwidth and reliable
network connections to the underlying data sources to enable uninterrupted reads and writes.
In ETL, users perform extract, transform, load processes to extract data from a datastore, perform data
transformations, and load the data back into the datastore. However, the analytic sandbox approach differs slightly; it
advocates extract, load, and then transform.
In this case, the data is extracted in its raw form and loaded into the datastore, where analysts can choose to
transform the data into a new state or leave it in its original, raw condition. The reason for this approach is that there
is significant value in preserving the raw data and including it in the sandbox before any transformations take place.
For instance, consider an analysis for fraud detection on credit card usage. Many times, outliers in this data
population can represent higher-risk transactions that may be indicative of fraudulent credit card activity.
Using ETL, these outliers may be inadvertently filtered out or transformed and cleaned before being loaded into the
datastore. In this case, the very data that would be needed to evaluate instances of fraudulent activity would be
inadvertently cleansed, preventing the kind of analysis that a team would want to do.
Following the ELT approach gives the team clean data to analyze after the data has been loaded into the database,
as well as access to the data in its original form for finding hidden nuances.
This approach is part of the reason that the analytic sandbox can quickly grow large. The team may want clean data
and aggregated data and may need to keep a copy of the original data to compare against or look for hidden patterns
that may have existed in the data before the cleaning stage. This process can be summarized as ETLT to reflect the
fact that a team may choose to perform ETL in one case and ELT in another.
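The sketch below illustrates the ELT idea under stated assumptions: SQLite stands in for the sandbox datastore, and the file, table, and column names are hypothetical. The raw extract is loaded untouched, and a cleaned copy is created alongside it rather than in place of it.

    # Minimal ELT sketch: load the raw extract as-is, then transform a copy.
    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("analytic_sandbox.db")   # stand-in for the sandbox datastore

    # Extract and Load: land the raw credit card transactions untouched,
    # preserving outliers that may matter for fraud analysis.
    raw = pd.read_csv("extracts/transactions.csv")
    raw.to_sql("transactions_raw", conn, if_exists="replace", index=False)

    # Transform: build a cleaned copy for modeling, keeping the raw table intact.
    clean = raw.dropna(subset=["amount"])
    clean = clean[clean["amount"] > 0]
    clean.to_sql("transactions_clean", conn, if_exists="replace", index=False)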
In addition, it is important to catalog the data sources that the team has access to and identify additional data sources
that the team can leverage but perhaps does not have access to today. Some of the activities in this step may
overlap with the initial investigation of the datasets that occur in the discovery phase.
The following table demonstrates one way to organize this type of data inventory.
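One illustrative layout (the dataset names and categories shown here are examples, not a prescribed template):

    Dataset                  Available and   Available but    To Be       To Obtain from
                             Accessible      Not Accessible   Collected   Third Party
    Customer transactions         X
    Call center logs                              X
    Web clickstream                                               X
    Demographic overlays                                                       X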
Data Conditioning
Data conditioning refers to the process of cleaning data, normalizing datasets, and performing transformations on
the data. A critical step within the Data Analytics Lifecycle, data conditioning can involve many complex steps to join
or merge datasets or otherwise get datasets into a state that enables analysis in further phases.
Data conditioning is often viewed as a preprocessing step for the data analysis because it involves many operations
on the dataset before developing models to process or analyze the data. This can create the impression that the data
conditioning step should be performed only by IT, the data owners, a DBA, or a data engineer.
However, it is also important to involve the data scientist in this step because many decisions are made in the data
conditioning phase that affect subsequent analysis. Part of this phase involves deciding which aspects of particular
datasets will be useful to analyze in later steps.
Typically, data science teams would rather have too much data than too little for the analysis. Additional questions and
considerations for the data conditioning step include the following; a brief sketch of such checks appears after the list.
What are the data sources? What are the target fields (for example, columns of the tables)?
How consistent are the contents and files? Determine to what degree the data contains missing or inconsistent
values and if the data contains values deviating from normal.
Assess the consistency of the data types. For instance, if the team expects certain data to be numeric, confirm
whether it is numeric or a mixture of alphanumeric strings and text.
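A brief sketch of such checks, assuming a pandas DataFrame loaded from the sandbox (the file path and column names are illustrative):

    # Common data-conditioning checks on a sandbox extract.
    import pandas as pd

    df = pd.read_csv("sandbox/initial_feed.csv")   # hypothetical extract

    # Missing or inconsistent values per column.
    print(df.isna().sum())

    # Confirm that a field expected to be numeric really is numeric; coercion
    # exposes any alphanumeric strings mixed into the column.
    amounts = pd.to_numeric(df["amount"], errors="coerce")
    print("Non-numeric entries:", (amounts.isna() & df["amount"].notna()).sum())

    # Values deviating from normal, for example outside an expected range.
    print(df[(amounts < 0) | (amounts > 100_000)].head())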
After the team has collected and obtained at least some of the datasets needed for the subsequent analysis, a useful
step is to leverage data visualization tools to gain an overview of the data.
Seeing high-level patterns in the data enables one to understand characteristics of the data very quickly. One
example is using data visualization to examine data quality, such as whether the data contains many unexpected
values or other indicators of dirty data. Another example is skewness, such as if the majority of the data is heavily
shifted toward one value or end of a continuum.
Shneiderman [9] is well known for his mantra for visual data analysis of “overview first, zoom and filter, then details-
on-demand.”
This is a pragmatic approach to visual data analysis. It enables the user to find areas of interest, zoom and filter to
find more detailed information about a particular area of the data, and then find the detailed data behind a particular
area.
This approach provides a high-level view of the data and a great deal of information about a given dataset in a
relatively short period of time.
When pursuing this approach with a data visualization tool or statistical package, the following guidelines and considerations
are recommended; a short sketch of a few such checks appears after the list.
Review data to ensure that calculations remained consistent within columns or across tables for a given data field.
Does the data distribution stay consistent over all the data?
Assess the granularity of the data, the range of values, and the level of aggregation of the data.
Does the data represent the population of interest?
Etc.
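A short sketch of a few of these checks, assuming pandas and matplotlib and illustrative column names:

    # Overview-first visual and summary checks on a sandbox extract.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("sandbox/initial_feed.csv")   # hypothetical extract

    # Skewness: is the bulk of the data shifted toward one end of the continuum?
    df["amount"].hist(bins=50)
    plt.title("Distribution of transaction amounts")
    plt.show()

    # Granularity, range of values, and level of aggregation of a field.
    print(df["amount"].describe())

    # Does the distribution stay consistent across the data, for example by month?
    print(df.groupby("month")["amount"].mean())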