Research Methodology (Data Analysis)

Data Processing

Data processing is concerned with editing, coding, classifying, tabulating, charting and diagramming
research data. The essence of data processing in research is data reduction. Data reduction involves winnowing
out the irrelevant from the relevant data, establishing order out of chaos and giving shape to a mass of
data. Data processing in research consists of the following important steps:

1.Editing of Data

Editing is the first step in data processing. Editing is the process of examining the data collected in
questionnaires/schedules to detect errors and omissions and to see that they are corrected and the schedules are
ready for tabulation. There are different types of editing. They are:

1. Editing for quality asks the following questions: Are the data forms complete? Are the data free of bias?
Are the recordings free of errors? Are inconsistencies in responses within limits? Is there evidence of
dishonesty by enumerators or interviewers? Is there any wanton manipulation of data? (A simple sketch of
such checks follows this list.)
2. Editing for tabulation makes certain accepted modifications to the data, or even rejects certain pieces of
data, in order to facilitate tabulation. For instance, an extremely high or low value may be ignored or
bracketed within a suitable class interval.
3. Field Editing is done by the enumerator. The schedule filled in by the enumerator or the respondent
might contain abbreviated or illegible writing and the like. These are rectified by the enumerator, and this
should be done soon after the enumeration or interview, before memory fades. Field editing should not
extend to guessing data to fill in omissions.
4. Central Editing is done by the researcher after all schedules, questionnaires or forms are received from
the enumerators or respondents. Obvious errors can be corrected. For missing data or information, the
editor may substitute data by reviewing the information provided by other similarly placed respondents.
A clearly inappropriate answer is removed and “no answer” is entered when reasonable attempts to obtain
the appropriate answer fail.
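As a rough illustration of editing for quality, the sketch below scans hypothetical questionnaire records for omissions and out-of-range answers. The field names and the valid range are assumptions made for this example, not part of any standard.

```python
# A minimal editing-for-quality sketch: flag incomplete or
# inconsistent questionnaire records (field names are hypothetical).
records = [
    {"id": 1, "age": 34, "satisfaction": 4},
    {"id": 2, "age": None, "satisfaction": 5},   # omission
    {"id": 3, "age": 29, "satisfaction": 9},     # out of the 1-5 range
]

for rec in records:
    if rec["age"] is None:
        print(f"Record {rec['id']}: missing age (omission)")
    if rec["satisfaction"] is not None and not 1 <= rec["satisfaction"] <= 5:
        print(f"Record {rec['id']}: satisfaction out of range (inconsistency)")
```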

2.Coding of Data

Coding is necessary for efficient analysis; through it, the many replies can be reduced to a small number of
classes that contain the critical information required for analysis. Coding decisions should usually be taken at
the design stage of the questionnaire.

Coding is the process/operation by which data/responses are organized into classes/categories and numerals or
other symbols are assigned to each item according to the class in which it falls. In other words, coding involves
two important operations (illustrated after the list):

(a) Deciding the categories to be used and

(b) Allocating individual answers to them.
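For instance, the two operations might look like the sketch below, where the category scheme and the numeric codes are invented for illustration.

```python
# Coding sketch: (a) decide the categories, (b) allocate answers to them.
# The category scheme and codes below are hypothetical.
codebook = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
            "agree": 4, "strongly agree": 5}

responses = ["agree", "neutral", "strongly agree", "disagree"]
coded = [codebook[r] for r in responses]
print(coded)  # [4, 3, 5, 2]
```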

3.Classification of Data

Classification or categorization is the process of grouping the data into understandable, homogeneous
groups for the purpose of convenient interpretation. Uniformity of attributes is the basic criterion for
classification, and the grouping of data is made according to similarity. Classification becomes necessary when
there is diversity in the collected data, which would otherwise be meaningless for presentation and analysis. A
good classification should have the characteristics of clarity, homogeneity, equality of scale, purposefulness and
accuracy.
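One common way to classify numeric data into homogeneous groups with equality of scale is interval binning. The sketch below uses pandas with invented ages and class limits.

```python
import pandas as pd

# Classification sketch: group respondents' ages (hypothetical data)
# into homogeneous class intervals of equal width.
ages = pd.Series([19, 23, 31, 37, 42, 48, 55, 61])
classes = pd.cut(ages, bins=[18, 30, 42, 54, 66],
                 labels=["18-30", "31-42", "43-54", "55-66"])
print(classes.value_counts().sort_index())
```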

4.Tabulation of Data

Tabulation is the process of summarizing raw data and displaying it in compact form for further analysis.
Therefore, preparing tables is a very important step. Tabulation may be done by hand, mechanically, or
electronically. The choice is made largely on the basis of the size and type of study, alternative costs, time
pressures, and the availability of computers and computer programmes. If the number of questionnaires is small
and their length short, hand tabulation is quite satisfactory. (A tabulation sketch follows the list of table parts
below.)

Generally a research table has the following parts:

• table number
• title of the table
• caption
• stub (row heading)
• body
• head note
• foot note
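For electronic tabulation, a cross-tabulation of two variables can be produced in a single call. The sketch below is a minimal example using pandas with invented survey data; the column names are assumptions.

```python
import pandas as pd

# Tabulation sketch: summarize raw (hypothetical) survey data
# into a compact two-way table for further analysis.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "response": ["yes", "no", "yes", "yes", "no", "no"],
})
table = pd.crosstab(df["gender"], df["response"], margins=True)
print(table)
```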

Stages of Data Processing

1. Data collection

Collecting data is the first step in data processing. Data is pulled from available sources, including data lakes and
data warehouses. It is important that the data sources available are trustworthy and well-built so the data collected
(and later used as information) is of the highest possible quality.

2. Data preparation

Once the data is collected, it enters the data preparation stage. Data preparation, often referred to as “pre-
processing”, is the stage at which raw data is cleaned up and organized for the following stage of data processing.
During preparation, raw data is diligently checked for errors. The purpose of this step is to eliminate bad data
(redundant, incomplete, or incorrect data) and begin to create high-quality data for the best business intelligence.
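A minimal preparation sketch, assuming the raw data has already been loaded into a pandas DataFrame: it removes incomplete and redundant rows and trims stray whitespace. The column names and values are invented.

```python
import pandas as pd

# Pre-processing sketch: eliminate redundant, incomplete, or
# badly formatted rows (column names are hypothetical).
raw = pd.DataFrame({
    "name": [" Ann ", "Bob", "Bob", None],
    "score": [88, 92, 92, 75],
})
clean = (raw
         .dropna(subset=["name"])                        # drop incomplete rows
         .drop_duplicates()                              # drop redundant rows
         .assign(name=lambda d: d["name"].str.strip()))  # fix whitespace
print(clean)
```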

3. Data input

The clean data is then entered into its destination (perhaps a CRM like Salesforce or a data warehouse
like Redshift) and translated into a form that the destination system can understand. Data input is the first stage
at which raw data begins to take the form of usable information.
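As a rough sketch of data input, the example below writes cleaned records into a local SQLite table; SQLite merely stands in for a real destination such as a warehouse, and the table and column names are assumptions.

```python
import sqlite3
import pandas as pd

# Data-input sketch: write cleaned data into a destination store.
# A local SQLite file stands in here for a real warehouse.
clean = pd.DataFrame({"customer": ["Ann", "Bob"], "score": [88, 92]})
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("customers", conn, if_exists="replace", index=False)
    print(pd.read_sql("SELECT * FROM customers", conn))
```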

4. Processing

During this stage, the data input to the computer in the previous stage is actually processed for interpretation.
Processing is often done using machine learning algorithms, though the process itself may vary slightly depending
on the source of the data being processed (data lakes, social networks, connected devices, etc.) and its intended
use (examining advertising patterns, medical diagnosis from connected devices, determining customer needs, etc.).

5. Data output/interpretation

The output/interpretation stage is the stage at which data finally becomes usable to non-data scientists. It is
translated and readable, often in the form of graphs, videos, images, or plain text. Members of the company or
institution can now begin to self-serve the data for their own data analytics projects.

6. Data storage

The final stage of data processing is storage. After all of the data is processed, it is then stored for future use.
While some information may be put to use immediately, much of it will serve a purpose later on. When data is
properly stored, it can be quickly and easily accessed by members of the organization when needed.

Challenges in Data Processing

1.Collection of Data

The very first challenge in data processing comes in the collection or acquisition of the correct data for the
input. The challenge here is to collect the exact data needed to get the proper result, since the result directly
depends on the input data. Hence, it is vital to collect correct and exact data to get the desired result.

2.Duplication of Data

Because data is collected from different sources, it often happens that there is duplication in the data. The
same entries and entities may appear a number of times during the data encoding stage. This duplicate data
is redundant and may produce an incorrect result.
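Once the data is in tabular form, removing such duplicates is usually mechanical. The sketch below keeps the first occurrence per key using pandas; the records and key column are invented.

```python
import pandas as pd

# Duplication sketch: the same entity collected from two sources.
# Keeping the first occurrence per key removes the redundancy.
df = pd.DataFrame({
    "customer_id": [101, 102, 101],
    "source": ["crm", "web", "web"],
    "email": ["a@x.com", "b@x.com", "a@x.com"],
})
deduped = df.drop_duplicates(subset=["customer_id"], keep="first")
print(deduped)
```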

3.Inconsistency of Data

When we collect a huge amount of data, there is no guarantee that it will be complete or that all the
fields we need are filled in correctly, so the data may be ambiguous. Because the input/raw data is heterogeneous
in nature and collected from autonomous data sources, the data may conflict at three different levels (a
normalization sketch follows the list):

• Schema Level: Different data sources have different data models and different schemas within the
same data model.
• Data representation level: Data in different sources are represented in different structures, languages,
and measurements.
• Data value level: Sometimes, the same data objects have factual discrepancies among various data
sources. This occurs when two data objects obtained from different sources are identified as versions
of each other, but the values of their attributes differ.
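As a concrete illustration of a representation-level conflict, the sketch below normalizes a quantity reported in different measurement units by two hypothetical sources.

```python
# Representation-level conflict sketch: two sources report the same
# quantity in different units, so values are normalized to one unit.
source_a = [{"city": "Pune", "temp_c": 31.0}]   # Celsius
source_b = [{"city": "Pune", "temp_f": 87.8}]   # Fahrenheit

unified = [{"city": r["city"], "temp_c": r["temp_c"]} for r in source_a]
unified += [{"city": r["city"], "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
            for r in source_b]
print(unified)  # both records are now in Celsius
```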
4.Variety of Data

The input data, as it is collected from different sources, can come in different forms. The data is not limited
to the rows and columns of a relational database; it varies from application to application and source to
source. Much of this data is unstructured and cannot fit into a spreadsheet or a relational database.

The collected data may be in text or tabular format. On the other hand, it may be a collection of
photographs and videos, or sometimes just audio. To get the desired result, there is often a need to process
different forms of data together.

5.Data Integration

Data integration means combining the data from various sources and presenting it in a unified view. With the
increased variety and differing formats of data, the challenge of integrating the data grows.

Data integration involves several challenges, as follows (a merge sketch follows the list):

• Isolation: The majority of applications are developed and deployed in isolation, which makes it difficult to
integrate data across applications.
• Technological Advancements: As technology advances, the ways of storing and retrieving data change.
The problem here is integrating newer data with legacy data.
• Data Problems: The challenge in data integration arises when the data is incorrect, incomplete or in the
wrong format.
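A minimal integration sketch, assuming two sources share a common identifier: pandas merges them into a unified view. The sources and column names are invented.

```python
import pandas as pd

# Integration sketch: combine two hypothetical sources into a
# unified view keyed on a shared identifier.
orders = pd.DataFrame({"customer_id": [1, 2], "amount": [250, 99]})
profiles = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bob"]})
unified = profiles.merge(orders, on="customer_id", how="left")
print(unified)
```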

Data Analysis

Data Analysis is a process of collecting, transforming, cleaning, and modeling data with the goal of discovering
the required information. The purpose of Data Analysis is to extract useful information from data and to make
decisions based upon that analysis.

Whenever we take a decision in our day-to-day life, we do so by thinking about what happened last time or what
will happen if we choose that particular option. This is nothing but analyzing our past or future and making
decisions based on it. For that, we gather memories of our past or dreams of our future; that, too, is data analysis.
When an analyst does the same thing for business purposes, it is called Data Analysis.

Types of Data Analysis

1.Text Analysis

Text Analysis is also referred to as Data Mining. It is a method of discovering patterns in large data sets using
databases or data mining tools, and it is used to transform raw data into business information. Business Intelligence
tools on the market are used to take strategic business decisions. Overall, it offers a way to extract
and examine data, derive patterns, and finally interpret the data.
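At its simplest, discovering patterns in text can start from term frequencies. The sketch below counts words in a few invented customer comments; it is only a first step toward real text mining.

```python
from collections import Counter
import re

# Text-analysis sketch: find the most frequent terms in raw text
# (the comments are invented for illustration).
comments = ["delivery was late", "great product", "late delivery again"]
words = re.findall(r"[a-z]+", " ".join(comments).lower())
print(Counter(words).most_common(3))  # e.g. [('delivery', 2), ('late', 2), ...]
```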

2.Statistical Analysis

Statistical Analysis shows "What happened?" by using past data, often in the form of dashboards. Statistical
Analysis includes the collection, analysis, interpretation, presentation, and modeling of data. It analyses a set of
data or a sample of data. There are two categories of this type of Analysis, Descriptive Analysis and Inferential
Analysis (a short descriptive sketch follows the list):

• Descriptive Analysis: analyses complete data or a sample of summarized numerical data. It shows the mean and
deviation for continuous data, and percentages and frequencies for categorical data.

• Inferential Analysis: analyses a sample drawn from the complete data. In this type of Analysis, you can reach
different conclusions from the same data by selecting different samples.
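As a minimal sketch of Descriptive Analysis, assuming the data sits in a pandas DataFrame with invented column names, the following computes the mean and deviation of a continuous variable and the frequencies of a categorical one.

```python
import pandas as pd

# Descriptive-analysis sketch on invented data: mean and standard
# deviation for a continuous column, frequencies for a categorical one.
df = pd.DataFrame({
    "income": [42000, 55000, 61000, 48000],
    "segment": ["retail", "retail", "corporate", "retail"],
})
print(df["income"].mean(), df["income"].std())
print(df["segment"].value_counts(normalize=True))  # percentages
```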

3.Diagnostic Analysis

Diagnostic Analysis shows "Why did it happen?" by finding the cause from the insights found in Statistical
Analysis. This Analysis is useful for identifying behavior patterns in data. If a new problem arises in your business
process, you can look into this Analysis to find similar patterns of that problem, and you may be able to apply
similar prescriptions to the new problem.

4.Predictive Analysis

Predictive Analysis shows "what is likely to happen" by using previous data. The simplest example: if last
year I bought two dresses based on my savings, and this year my salary doubles, then I can buy four
dresses. But of course it is not as easy as this, because you have to think about other circumstances, such as the
chance that clothing prices increase this year, or that instead of dresses you want to buy a new bike, or that you
need to buy a house!

So here, this Analysis makes predictions about future outcomes based on current or past data. Forecasting is just
an estimate; its accuracy depends on how much detailed information you have and how deeply you dig into it.
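As an illustration only, a naive version of such a prediction can be a straight-line trend fitted to past figures; the yearly sales numbers below are invented.

```python
import numpy as np

# Predictive-analysis sketch: fit a straight-line trend to past
# (invented) yearly sales and extrapolate one year ahead.
years = np.array([2019, 2020, 2021, 2022])
sales = np.array([100, 112, 125, 138])
slope, intercept = np.polyfit(years, sales, deg=1)
print(round(slope * 2023 + intercept))  # naive forecast for 2023
```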

5.Prescriptive Analysis

Prescriptive Analysis combines the insights from all the previous Analyses to determine which action to take on a
current problem or decision. Most data-driven companies utilize Prescriptive Analysis because predictive
and descriptive Analysis alone are not enough to improve performance. Based on current situations and problems,
they analyze the data and make decisions.
Data Analysis Process

[Figure: the Data Analysis Process shown as a cycle: Data Requirement Gathering → Data Collection → Data Cleaning → Data Analysing → Data Interpretation → Data Visualisation]

Data Requirement Gathering

First of all, we have to think about why we want to do this data analysis. We need to find out the purpose
or aim of doing the Analysis and decide which type of data analysis we want to do. In this phase, we
have to decide what to analyze and how to measure it; we have to understand why we are investigating and what
measures we should use to do this Analysis.

Data Collection

After requirement gathering, we will have a clear idea of what we have to measure and what our findings should
be. Now it is time to collect data based on the requirements. Once we collect data, remember that it must be
processed or organized for Analysis. As we collect data from various sources, we must keep a log with the
collection date and source of the data.

Data Cleaning

Whatever data is collected may not be useful, or may be irrelevant to the aim of the Analysis, hence it should be
cleaned. The collected data may contain duplicate records, white spaces or errors. The data should be cleaned and
made error free. This phase must be done before Analysis because, based on the data cleaning, the output of the
Analysis will be closer to the expected outcome.

Data Analysis

Once the data is collected, cleaned, and processed, it is ready for Analysis. As we manipulate the data, we may
find that we have exactly the information we need, or that we need to collect more data. During this phase, we
can use data analysis tools and software which help us understand, interpret, and derive conclusions based on the
requirements.
Data Interpretation

After analyzing the data, it is finally time to interpret the results. We can choose how to express or communicate
the data analysis: simply in words, or perhaps with a table or chart. Then we use the results of the data analysis
process to decide the best course of action.

Data Visualization

Data visualization is very common in our day-to-day life; visualizations often appear in the form of charts and
graphs. In other words, data is shown graphically so that it is easier for the human brain to understand and
process. Data visualization is often used to discover unknown facts and trends. By observing relationships and
comparing datasets, we can find meaningful information.
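A minimal sketch of such a visualization, using matplotlib with invented monthly figures:

```python
import matplotlib.pyplot as plt

# Visualization sketch: a simple bar chart of (invented) monthly
# sales, so trends are easier for the eye to pick out.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]
plt.bar(months, sales)
plt.title("Monthly Sales")
plt.ylabel("Units sold")
plt.show()
```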
