Data Science
Phase 1—Discovery: Before you begin the project, it is important to understand the
various specifications, requirements, priorities and required budget. You must be able
to ask the right questions. Here, you assess whether you have the resources needed to
support the project in terms of people, technology, time and data. In this phase, you
also need to frame the business problem and formulate initial hypotheses (IH) to test.
Phase 2—Data preparation: In this phase, you require an analytical sandbox in which
you can perform analytics for the entire duration of the project. You need to explore,
preprocess and condition data prior to modeling. Further, you will perform ETLT
(extract, transform, load and transform) to get data into the sandbox. Let’s have a look
at the Statistical Analysis flow below.
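As a rough illustration of the ETLT step described above, the sketch below extracts rows from a small CSV export, conditions them, loads them into an in-memory SQLite database standing in for the analytical sandbox, and then runs a second transform inside that sandbox. The data, column names and function names are all hypothetical, and a real project would likely use dedicated ETL tooling rather than this minimal version.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """customer_id,region,amount
1, north ,120.50
2,South,
3,north,80.00
4,SOUTH,45.25
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """First transform: trim whitespace, normalise case, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip records with a missing amount
        region = row["region"].strip().lower()
        cleaned.append((int(row["customer_id"]), region, float(amount)))
    return cleaned


def load(conn, rows):
    """Load: write the conditioned rows into the sandbox."""
    conn.execute("CREATE TABLE sales (customer_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def transform_in_sandbox(conn):
    """Second transform: aggregate inside the sandbox, after loading."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


sandbox = sqlite3.connect(":memory:")  # in-memory database as a stand-in sandbox
load(sandbox, transform(extract(RAW_CSV)))
totals = transform_in_sandbox(sandbox)
print(totals)
```

Keeping the second transform inside the sandbox means the loaded data can be reshaped repeatedly during the project without re-extracting it from the source.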
Phase 3—Model planning: Here, you will determine the methods and techniques to
draw the relationships between variables.
These relationships will form the basis for the algorithms which you will implement
in the next phase. You will apply Exploratory Data Analysis (EDA) using various
statistical formulas and visualization tools.
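To make the idea of exploring relationships between variables concrete, here is a minimal sketch that computes summary statistics and the Pearson correlation coefficient for two variables. The observations (advertising spend and sales) are invented for illustration, and Pearson correlation is only one of many statistics an analyst might choose at this stage.

```python
import statistics

# Hypothetical observations: advertising spend and resulting sales.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 41.0, 62.0, 79.0, 103.0]


def summarize(values):
    """Return basic descriptive statistics for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson(xs, ys):
    """Pearson correlation: co-variation of xs and ys scaled to [-1, 1]."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


print(summarize(ad_spend))
print(summarize(sales))
r = pearson(ad_spend, sales)
print(round(r, 3))
```

A coefficient close to 1 or -1 suggests a strong linear relationship worth modeling in the next phase, while a value near 0 suggests that a linear model of these two variables is unlikely to help.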
Phase 4—Model building: In this phase, you will develop datasets for training and
testing purposes. You will consider whether your existing tools will suffice for running
the models or whether you will need a more robust environment (such as fast, parallel
processing). You will analyze various learning techniques like classification, association
and clustering to build the model.
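The training and testing split described above can be sketched as follows. The example holds out part of a small, invented dataset for testing and fits a nearest-centroid classifier on the rest. Nearest-centroid is used here only because it is short enough to write from scratch; the text does not prescribe a particular classification technique, and the data, labels and function names are assumptions.

```python
import random

# Hypothetical labelled data: ((feature_1, feature_2), label).
DATA = [
    ((1.0, 1.2), "low"), ((0.8, 1.0), "low"), ((1.3, 0.9), "low"),
    ((1.1, 1.4), "low"), ((0.9, 0.7), "low"),
    ((5.0, 5.2), "high"), ((4.8, 5.5), "high"), ((5.3, 4.9), "high"),
    ((5.1, 5.1), "high"), ((4.7, 4.8), "high"),
]


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train):
    """Training: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for (x, y), label in train:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (sx / counts[label], sy / counts[label])
        for label, (sx, sy) in sums.items()
    }


def predict(centroids, point):
    """Classify a point by its nearest centroid (squared Euclidean distance)."""
    def distance(label):
        cx, cy = centroids[label]
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return min(centroids, key=distance)


train, test = train_test_split(DATA)
model = fit_centroids(train)
correct = sum(predict(model, features) == label for features, label in test)
accuracy = correct / len(test)
print(accuracy)
```

Evaluating only on the held-out rows gives an estimate of how the model behaves on data it has not seen, which is the reason for separating the two datasets in the first place.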
Phase 5—Operationalize: In this phase, you deliver final reports, briefings, code and
technical documents. In addition, a pilot project is sometimes implemented in a
real-time production environment. This gives you a clear picture of performance
and other related constraints on a small scale before full deployment.