Ch1-Introduction to data analytics & LifeCycle

Chapter 1 Introduction to data analytics and lifecycle

Uploaded by

ANIKET LOHKARE
Ch1-Introduction to Data Analytics & Life Cycle

Prepared by
Prof. Rashmi Mahajan
Key Roles for a Data Analytics Project
• Seven key roles are needed for a data science team to execute
analytics projects successfully.
• Each role plays a crucial part in developing a successful analytics
project. There is no hard and fast rule that each of the seven roles
needs its own person; fewer or more people may be involved
depending on the scope of the project, the skills of the participants,
and the organizational structure.
• Example –
For a small, versatile team, the seven roles may be fulfilled by only
three or four people, whereas a large project may require 20 or more
people to fulfil them.
1. Business User :
1. The business user is the one who understands
the domain area of the project and usually
benefits from the results.
2. This user advises and consults the team
working on the project about the value of the
results obtained and how the outputs will be
operationalized.
3. A business manager, line manager, or deep
subject matter expert in the project domain
typically fulfils this role.
2. Project Sponsor :
1. The Project Sponsor is responsible for initiating the project. The
Project Sponsor provides the actual requirements for the project
and presents the core business problem.
2. This person generally provides the funding and gauges the degree
of value from the final output of the team working on the project.
3. This person sets the priorities for the project and frames the
desired output.

3. Project Manager :
1. This person ensures that the key milestones and objectives of the
project are met on time and at the expected quality.
4. Business Intelligence Analyst :
1. The Business Intelligence Analyst provides business domain expertise
based on a detailed and deep understanding of the data, key
performance indicators (KPIs), key metrics, and business intelligence
from a reporting point of view.
2. This person generally creates dashboards and reports and knows
about the data feeds and sources.

5. Database Administrator (DBA) :
1. The DBA provisions and configures the database environment to
support the analytics needs of the team working on a project.
2. The DBA’s responsibilities may include providing access to key
databases or tables and making sure that the appropriate security
levels are in place for the data repositories.
6. Data Engineer :
1. The data engineer applies deep technical skills to assist with tuning
SQL queries for data management and data extraction, and
provides support for data ingestion into the analytic sandbox.
2. The data engineer works jointly with the data scientist to help
shape data in the correct ways for analysis.

7. Data Scientist :
1. The data scientist provides subject matter expertise for
analytical techniques and data modelling, and applies the correct
analytical techniques to a given business problem.
2. The data scientist ensures that the overall analytical objectives are met.
3. Data scientists design and apply analytical methods and
approaches to the data available for the project.
What is the Data Analytics Life Cycle?
In today’s digital world, data is of great
importance. It passes through many stages during
its life: creation, testing, processing,
consumption, and reuse. The Data Analytics Life
Cycle maps out these stages for professionals
working on data analytics projects. The phases
are arranged in a circular structure that forms
the Data Analytics Life Cycle. Each phase has its
own significance and characteristics.
Why is the Data Analytics Life Cycle Essential?

The Data Analytics Life Cycle is designed to be used
with significant big data projects. The cycle is iterative,
which lets it portray how a real project actually proceeds.
A step-by-step approach is needed to organize the activities
and tasks involved in gathering, processing, analyzing, and
reusing data, and to address the distinct requirements of
analyzing big data. Data analysis is the process of modifying,
processing, and cleaning raw data to obtain valuable,
significant information that supports business
decision-making.
Importance of the Data Analytics Life Cycle
• The Data Analytics Life Cycle defines the roadmap of how
data is generated, collected, processed, used, and analyzed to
achieve business goals. It offers a systematic way to manage
data and convert it into information that can be used to fulfil
organizational and project goals. The process provides the
direction and methods for extracting information from the data
and moving towards those business goals.
• Data professionals use the life cycle’s circular form to proceed
with data analytics in either a forward or a backward direction.
Based on newly received insights, they can decide whether
to proceed with their existing research or scrap it and redo the
complete analysis. The Data Analytics Life Cycle guides them
throughout this process.
Phase 1: Data Discovery and Formation
• Every good journey begins with a purpose in mind. In this phase, you
identify your data objectives and how best to attain them by following
the Data Analytics Life Cycle. Evaluations and assessments are also
carried out during this initial phase to develop a basic hypothesis
capable of solving the business issue or problem.
• In this initial step, the data is evaluated for its potential uses and
demands: where it comes from, what you want it to tell you, and how
the incoming information benefits your business.
• As a data analyst, you will need to explore case studies that used similar
data analytics and, most crucially, examine current company trends. Then
you must evaluate all in-house infrastructure and resources, as well as
time and technology needs, against the data acquired so far.
• Once the evaluations are complete, the team closes this stage with
hypotheses that will be tested with data later on. This is the first and
most critical step in the life cycle of big data analytics.
Key Takeaways:
1. The data science team investigates and learns about
the challenge.
2. The team develops context and understanding.
3. The team learns about the data sources that will be
required and available for the project.
4. The team develops preliminary hypotheses that can
later be tested with data.
Phase 2: Data Preparation and Processing
Data preparation and processing involves gathering, sorting,
processing, and cleaning the collected information so that it can
be used by the subsequent steps of analysis. An important element
of this step is making sure all necessary information is readily
accessible before processing it further.
The following are methods of data acquisition:
• Data Collection: Drawing information from external sources.
• Data Entry: Creating new data points within an organization,
using either digital technologies or manual input procedures.
• Signal Reception: Accumulating data from digital devices such as
Internet of Things (IoT) devices and control systems.
Key Takeaways :
• This phase covers the steps to explore, preprocess, and
condition data prior to modeling and analysis.
• It requires an analytic sandbox; the team performs extract,
load, and transform (ELT) to get data into the sandbox.
• Data preparation tasks are likely to be performed
multiple times and not in a predefined order.
• Several tools commonly used for this phase are –
Hadoop, Alpine Miner, OpenRefine, etc.
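The conditioning step described above can be illustrated with a small Python sketch. It is only an illustration, not part of the original slides: the records, the field names (`customer`, `amount`), and the cleaning rules are all assumptions chosen to show what exploring, preprocessing, and conditioning can look like in practice.

```python
# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "Bob", "amount": ""},          # missing amount
    {"customer": "alice", "amount": "120.50"},  # duplicate after normalization
    {"customer": "Carol", "amount": "n/a"},     # unparseable amount
    {"customer": "Dave", "amount": "75"},
]


def condition(records):
    """Normalize names, drop rows with unusable amounts, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is missing or not numeric
        key = (name, amount)
        if key in seen:
            continue  # skip rows that duplicate an earlier cleaned row
        seen.add(key)
        cleaned.append({"customer": name, "amount": amount})
    return cleaned


cleaned_records = condition(raw_records)
for row in cleaned_records:
    print(row)
```

In a real project these rules would be revisited several times, which matches the point that preparation tasks are repeated and not performed in a fixed order.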
Phase 3: Design a Model
After you’ve defined your business goals and gathered a large amount of
data (structured, unstructured, or semi-structured), it’s time to create a
model that uses the data to achieve the goal. Model planning is the name
given to this stage of the data analytics process.
There are several methods for loading data into the system and starting
to analyze it:
• ETL (Extract, Transform, and Load) transforms the data using a set of
business rules before loading it into a system.
• ELT (Extract, Load, and Transform) loads raw data into the sandbox
before transforming it.
• ETLT (Extract, Transform, Load, Transform) combines the two, with
one layer of transformation before loading and another after it.
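The difference between ETL and ELT is mostly a matter of where the transformation happens. The sketch below is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the analytic sandbox, and the table names (`sales_etl`, `sales_raw`, `sales_elt`) and sample rows are invented. ETLT would simply apply a light transformation before loading and a second one inside the sandbox.

```python
import sqlite3

# Hypothetical source rows: (region, sales as text). Invented for illustration.
source_rows = [("north", "100"), ("south", "250"), ("north", "50")]

sandbox = sqlite3.connect(":memory:")  # stands in for the analytic sandbox

# ETL: transform in application code first, then load the result.
etl_rows = [(region.upper(), int(sales)) for region, sales in source_rows]
sandbox.execute("CREATE TABLE sales_etl (region TEXT, sales INTEGER)")
sandbox.executemany("INSERT INTO sales_etl VALUES (?, ?)", etl_rows)

# ELT: load the raw rows as-is, then transform inside the sandbox with SQL.
sandbox.execute("CREATE TABLE sales_raw (region TEXT, sales TEXT)")
sandbox.executemany("INSERT INTO sales_raw VALUES (?, ?)", source_rows)
sandbox.execute(
    "CREATE TABLE sales_elt AS "
    "SELECT UPPER(region) AS region, CAST(sales AS INTEGER) AS sales "
    "FROM sales_raw"
)

# Both routes should give the same totals per region.
query = "SELECT region, SUM(sales) FROM {} GROUP BY region ORDER BY region"
etl_totals = sandbox.execute(query.format("sales_etl")).fetchall()
elt_totals = sandbox.execute(query.format("sales_elt")).fetchall()
print(etl_totals)
print(elt_totals)
```

One practical consequence of the ELT route is that the raw table stays available in the sandbox, so the transformation can be changed and rerun without extracting the data again.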
This step also involves teamwork to identify the approaches, techniques,
and workflow to be used in the succeeding phase to develop the model.
Developing a model begins with finding the relationships between data
points in order to choose the essential variables and, subsequently,
to create a suitable model.
Key Takeaways :
• The team explores the data to learn about relationships between
variables and subsequently selects the key variables and the
most suitable models.
• In this phase, the data science team develops data sets for
training, testing, and production purposes.
• The team builds and executes models based on the work done
in the model planning phase.
• Several tools commonly used for this phase are – Matlab,
STATISTICA.
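One simple way to explore relationships between variables is to compute the correlation of each candidate variable with the target and keep the strongly related ones. The sketch below is an illustration only: the variable names, the values, and the 0.8 cut-off are assumptions, and Pearson correlation is just one of many possible selection criteria.

```python
from math import sqrt

# Hypothetical observations; all names and values are invented for illustration.
candidates = {
    "ad_spend": [10, 20, 30, 40, 50],
    "store_visits": [12, 25, 29, 41, 52],
    "day_of_month": [5, 28, 14, 2, 19],
}
target = [100, 210, 290, 405, 500]  # e.g. sales


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


correlations = {name: pearson(values, target) for name, values in candidates.items()}

THRESHOLD = 0.8  # assumed cut-off; a real project would justify this choice
selected = [name for name, r in correlations.items() if abs(r) >= THRESHOLD]

for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
print("selected variables:", selected)
```

Correlation only captures linear relationships, so a variable rejected here might still matter to a non-linear model.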
Phase 4: Model Building
This stage of the data analytics life cycle involves creating
datasets for testing, training, and production. The data
analytics professionals build and run the model they
designed in the previous stage.
They use tools and methods such as decision trees,
regression techniques (for example, logistic regression),
and neural networks to create and run the model. The
experts also put the model through a trial run to see
whether it fits the datasets.
This helps them determine whether the tools they currently
have will be enough to execute the model or whether a more
robust system is required for it to function successfully.
Key Takeaways:
• The team creates datasets for use in testing, training,
and production.
• The team also examines whether its present tools will
suffice for running the models or whether a more robust
environment is required for model execution.
• R and PL/R, Octave, and WEKA are examples of free
or open-source tools.
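The idea of separate training and testing datasets can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the data is synthetic, the 75/25 split is arbitrary, and a simple least-squares line is swapped in for the decision trees, logistic regression, or neural networks that the slides mention, purely to keep the example short.

```python
import random

# Hypothetical data: y is roughly 3*x + 5 plus a little noise (invented).
random.seed(42)
points = [(x, 3 * x + 5 + random.uniform(-1, 1)) for x in range(20)]

# Shuffle, then hold out 25% of the points for testing.
random.shuffle(points)
split = int(len(points) * 0.75)
train, test = points[:split], points[split:]


def fit_line(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
        (x - mean_x) ** 2 for x, _ in pairs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Train on the training set only, then measure error on the unseen test set.
slope, intercept = fit_line(train)
mae = sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MAE={mae:.2f}")
```

Evaluating on points the model never saw during fitting is what makes the trial run a meaningful check of whether the model fits the data.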
Phase 5: Result Communication and Publication
Recall the objective you set for your company in phase 1.
Now is the time to see whether the results of the tests you
ran in the previous phase meet those criteria.
The communication process begins with working together
with key stakeholders to decide whether the project’s
outcomes are a success or not.
The project team is responsible for identifying the major
conclusions of the analysis, calculating the business value
associated with the outcome, and creating a narrative to
summarize and communicate the results to stakeholders.
Key Takeaways:
• After executing the model, the team needs to compare
the outcomes of modeling with the criteria established
for success and failure.
• The team considers how best to articulate findings and
outcomes to the various team members and
stakeholders, taking into account caveats and
assumptions.
• The team should identify key findings, quantify business
value, and develop a narrative to summarize and
convey findings to stakeholders.
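Comparing modeling outcomes with the success criteria can be made explicit in code. The sketch below is purely illustrative: the metric names, the thresholds, and the model results are invented, and a real project would use whatever criteria were agreed in phase 1.

```python
# Hypothetical success criteria (minimum acceptable values) and model results.
success_criteria = {"accuracy": 0.85, "recall": 0.70}
model_results = {"accuracy": 0.88, "recall": 0.64}


def evaluate(results, criteria):
    """Return, per metric, whether the result meets its minimum threshold."""
    return {
        metric: results.get(metric, 0.0) >= minimum
        for metric, minimum in criteria.items()
    }


report = evaluate(model_results, success_criteria)
project_successful = all(report.values())

for metric, passed in report.items():
    status = "met" if passed else "not met"
    print(
        f"{metric}: {model_results[metric]:.2f} "
        f"vs target {success_criteria[metric]:.2f} -> {status}"
    )
print("all criteria met:", project_successful)
```

A per-metric report like this gives the team a concrete starting point for the narrative it presents to stakeholders, including which caveats need to be explained.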
Phase 6: Measuring Effectiveness / Operationalize

As your data analytics life cycle comes to an end, the final
stage is to give stakeholders a complete report that includes
the key results, code, briefings, and technical papers or
documents.
Furthermore, to assess the effectiveness of the study, the data
is moved from the sandbox to a live environment and
observed to see whether the results match the desired business aim.
If the findings meet the objectives, the reports and outcomes
are finalized. However, if the outcome differs from the
purpose stated in phase 1, you can go back in the data
analytics life cycle to any of the previous phases, adjust your
input, and get a different result.
Key Takeaways :
• The team delivers the benefits of the project to a wider
audience. It sets up a pilot project that deploys the work in a
controlled manner before expanding the project to the entire
enterprise of users.
• This approach allows the team to gain insight into the
performance and constraints of the model within a
production setting at a small scale and then make the necessary
adjustments before full deployment.
• The team produces the final reports, presentations, and code.
• Open-source or free tools such as WEKA, SQL, MADlib, and
Octave can be used in this phase.
Thank You
Any Questions?
