Example of Research Process

The document outlines a research process to develop a framework for data warehouse development that incorporates data quality considerations. It describes 4 phases: literature review to identify key elements, exploratory studies and expert validation, data analysis to achieve research objectives, and developing the proposed framework. The final section describes a case study protocol for applying the framework.

Uploaded by

Teddy Iswahyudi

DATA WAREHOUSE DEVELOPMENT FRAMEWORK

WITH DATA QUALITY CONSIDERATION

The research problem undertaken in this study can be defined in the question: ‘How
can organisations build a framework for the design and development process of DW projects
to effectively incorporate DQ issues and accordingly improve data warehouse quality?’ The
aim articulated in the research question is achieved by fulfilling the following objectives:

Objective 1: To investigate the common practices of DW development stages in some
organisations;
Objective 2: To identify DQ dimensions related to DW development stages that are
applicable to all types of organisation;
Objective 3: To identify the correlation between DQ dimensions and DW benefits;
Objective 4: To develop a DQ-oriented framework for the design and development
process of DWs to improve DW quality.

Research Process

Phase 1. Literature study (Chapter 2)
Finding primary elements to achieve research objectives based on literature review
• Understanding DW development process (Section 2.1)
• Investigating DQ aspects in DW development (Section 2.2)
• Finding DQ dimensions in the entire DW development stages (Section 2.2.2)
• Discovering DW benefits and DW quality measurement (Section 2.3)
• Initial acquisition of DW development process (Section 3.4)

Phase 2. Interviews & data collection (Chapter 3)
Confirmation of primary elements based on literature review through exploratory study and expert judgement
• Exploratory study (5 organizations) and expert judgement (3 consultants) (Sections 3.6.2.1 and 3.6.2.2)

Phase 3. Data analysis (Chapters 4–6)
Data analysis based on findings in exploratory study and expert judgement
• Descriptive qualitative analysis: finding the commonly practised DW development stages (Chapter 4); Objective 1
• Obtaining DQ dimensions in the commonly practised DW development stages (Chapter 5); Objective 2
• Discovering the effect of DQ in DW and obtaining DW benefits (Section 6.2)
• Bivariate analysis: assessing correlation between DQ dimensions in the entire DW development stages and DW benefits (Section 6.3); Objective 3

Phase 4. Building a framework to incorporate DQ as an integral part of DW development (Chapter 7); Objective 4
• Develop a framework for handling DQ issues in the entire DW development stages
• Adopting the proposed framework into the real case
• Validation by DW users, comparison, and refinement (if any)
• Final framework: DQ consideration in the entire DW development stages
(Section 7.2.3.3; Section 7.3, step 6)
• End

Figure 3.1 Research process


A detailed description of the research process follows.
Phase 1.
This research begins with finding the primary elements needed to achieve the research
objectives, based on the reviewed literature, as follows:
• Before investigating the common practices of DW development stages in organisations
(objective 1), the DW development process as described in the literature should be
understood.
• Before identifying DQ dimensions that may be correlated with DW development
stages (objective 2), DQ aspects in DW development should be investigated in the
literature, followed by finding DQ dimensions across the entire DW development
stages.
• Before determining DW benefits (objective 3), DW benefits and measures of DW
quality should be identified from the literature.
Before moving to phase 2, an initial acquisition of the DW development process was carried
out through a brainstorming session with two DW experts. The DW development stages were
refined to justify the DW development process derived from the reviewed literature.

Phase 2.
Once the primary elements needed to achieve the research objectives have been
identified from the literature, empirical evidence should be gathered to confirm these
elements through an exploratory study in five organisations, with the findings then
justified by three DW consultants.

Phase 3.
Data collected in the previous phase is then analysed using descriptive qualitative and
bivariate analysis to achieve the research objectives. Descriptive qualitative analysis is the
method of choice when straight descriptions of phenomena are desired, and its ultimate goal
is a comprehensive summarisation (Sandelowski, 2000). Bivariate analysis is one of the
simplest techniques for defining the empirical relationship between two variables (Babbie,
2009). This technique can help in testing hypotheses of correlation (Que, 1988).
Objective 1 can be achieved by investigating the common practices of DW
development in five organisations that are applicable to all types of organisations. Objective 2
can be achieved through DQ identification in five organisations, followed by confirmation or
negation from the experts. Finally, the correlation between DQ dimensions and DW benefits
(objective 3) can be determined through bivariate analysis between DQ dimensions in the
entire DW development stages and DW benefits.
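The bivariate step described above can be illustrated with a minimal sketch. The source does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, and the scores, variable names and the helper `pearson_r` are hypothetical illustrations rather than study data or the study's actual procedure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical seven-point scores from five respondents (NOT study data).
accuracy_scores = [5, 6, 4, 7, 3]    # one DQ dimension, e.g. Accuracy
better_decisions = [5, 6, 5, 7, 4]   # one DW benefit, e.g. "supports better decisions"

r = pearson_r(accuracy_scores, better_decisions)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 would suggest a strong linear association between the DQ dimension and the DW benefit; with ordinal seven-point data a rank-based coefficient could be a reasonable alternative.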
Phase 4.
Based on the achievement of objectives 1-3 in the previous phase, the proposed
framework was developed. The reviewed literature and the practices reviewed in the case
studies can be used to justify the proposed framework. To check whether the proposed
framework works in a real-world setting, a case study on the student admission process was
conducted, followed by end-user validation to assess the quality of the DW resulting from the
proposed framework. A comparison between the DW quality measurement in the case studies
(without the proposed framework) and in the case study on student admission (using the
proposed framework) can be used to assess whether objective 4 has been reached.
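As a rough sketch of the comparison described above, the five quality criteria that the appendix questionnaires score from 0 to 100 (Interpretability, Accessibility, Usefulness, Believability, Validity) can be averaged per case and the two means compared. The function name `overall_quality` and all scores below are hypothetical assumptions for illustration, not results from the study.

```python
from statistics import mean

# The five quality criteria listed in the appendix questionnaires.
QUALITY_CRITERIA = (
    "Interpretability",
    "Accessibility",
    "Usefulness",
    "Believability",
    "Validity",
)


def overall_quality(scores):
    """Return the mean of the five 0-100 quality scores for one case."""
    missing = [c for c in QUALITY_CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    for criterion in QUALITY_CRITERIA:
        if not 0 <= scores[criterion] <= 100:
            raise ValueError(f"{criterion} score must be between 0 and 100")
    return mean([scores[c] for c in QUALITY_CRITERIA])


# Hypothetical scores (NOT study data): one case without and one with the framework.
without_framework = {
    "Interpretability": 60, "Accessibility": 70, "Usefulness": 65,
    "Believability": 55, "Validity": 50,
}
with_framework = {
    "Interpretability": 80, "Accessibility": 85, "Usefulness": 75,
    "Believability": 70, "Validity": 65,
}

difference = overall_quality(with_framework) - overall_quality(without_framework)
print(f"Change in mean DW quality score: {difference}")
```

A positive difference would be read as an improvement in perceived DW quality; whether it is meaningful depends on the number of respondents and cases, which this sketch does not model.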

Case Study Protocol

A Case Study Protocol (CSP) is a set of guidelines that can be used to structure and
govern a case research project (Yin, 2003) to ensure uniformity in data collection and
analysis even though data is to be collected in multiple locations over an extended period. A
CSP can be particularly useful in research projects involving multiple researchers (Pervan
and Maimbo, 2005). According to Miles and Huberman (1994), such a protocol should
outline the procedures and rules that govern the conduct of the researcher and the research
project. The following table outlines the case study protocol for this research:

Table 1 Outline of case study protocol for this research (adopted from Pervan and Maimbo,
2005)

Section: General overview
Contents: It has been widely accepted that data quality issues can emerge at any stage of
data warehouse development. However, little work has yet been done on formulating a
framework for data quality consideration in data warehouse development. Poor data quality in
a data warehouse can lead to bad strategic decisions and is associated with a significant
failure rate. Thus, data quality in the data warehouse needs to be assured.
This study reviewed the most common practices of data warehouse development in some
organisations that are applicable to all types of organisations. It then sought to confirm
whether the data quality aspects of data warehousing identified in the literature review are
practised in those organisations, followed by confirmation or negation from the experts, in
order to determine the specific data quality dimensions that are correlated with data
warehouse development.

Section: Data collection procedures
Contents: Organisations on a list based on recommendations from some consultants were
contacted. The organisations should represent government, semi-government and private
organisations, and many industry types should be represented in the list. The final list
includes five organisations engaged in a range of businesses, namely education, government,
insurance, hospital services and banking.
To ensure uniformity in the data collection process, and consequently facilitate both
within-case and cross-case analyses, a set of procedures should be followed.
Field visits were conducted from November 2013 to April 2014. The interviews were held in
person, on-site, and lasted between 60 and 120 minutes, with 4-5 visits to each organisation.
Expert judgement was used to confirm the findings of the field visits in the five
organisations.
Items needed during a visit are a list of questions, a questionnaire, a notepad and a tape
recorder. Interview results were recorded.

Section: Research instrument(s)
Contents: Research instruments were made highly structured to facilitate the data collection
process and uniformity in the collection of data. Research instruments in this research are:
a) Qualitative: interview guides were made using open-ended or closed-ended questions. The
interview questions were divided into 2 groups to address each group of stakeholders in the
DW development: the DW manager/teams and the business users.
b) Quantitative: a survey questionnaire was administered in face-to-face interviews, in which
participants filled out a seven-point graded questionnaire for each of these questions.

Section: Data analysis
Contents: Three sources of evidence were used in this research: documents, interviews and
questionnaires.
• Document collection and interview results were used to determine the common practices of
DW development in five organisations that are applicable to all types of organisations.
• Questionnaire results from the five organisations were summarised to obtain the mean score
and variance of each DQ dimension across the entire DW development stages. The overall
averages of the means and variances were used as the centre point of the x-axis (mean) and
y-axis (variance) for the quadrant analysis technique. To determine the relationships between
DQ dimensions in every phase of DW development and DW benefits, bivariate analysis was
performed along with descriptive analysis.

Section: Appendix
Contents: Research instrument
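The quadrant analysis mentioned in the data analysis section of Table 1 can be sketched as follows. This is only an illustration under stated assumptions: the source does not say whether population or sample variance was used (population variance is assumed here), the quadrant labels and the function name `quadrant_analysis` are hypothetical, and the scores are invented rather than taken from the study.

```python
from statistics import mean, pvariance


def quadrant_analysis(scores_by_dimension):
    """Place each DQ dimension in a quadrant around the overall mean and variance.

    scores_by_dimension maps a dimension name to its list of 1-7 scores.
    Returns the centre point (average mean, average variance) and, for each
    dimension, a (mean position, variance position) pair.
    """
    stats = {
        name: (mean(scores), pvariance(scores))  # assumption: population variance
        for name, scores in scores_by_dimension.items()
    }
    # Centre of the plot: x = average of the means, y = average of the variances.
    centre_mean = mean([m for m, _ in stats.values()])
    centre_var = mean([v for _, v in stats.values()])
    quadrants = {}
    for name, (m, v) in stats.items():
        horizontal = "high mean" if m >= centre_mean else "low mean"
        vertical = "high variance" if v >= centre_var else "low variance"
        quadrants[name] = (horizontal, vertical)
    return (centre_mean, centre_var), quadrants


# Hypothetical seven-point scores from five organisations (NOT study data).
example_scores = {
    "Accuracy": [6, 6, 7, 6, 5],
    "Timeliness": [3, 5, 2, 6, 4],
    "Security": [4, 4, 4, 4, 4],
    "Clarity": [7, 4, 7, 5, 7],
}

centre, quadrants = quadrant_analysis(example_scores)
for name, position in quadrants.items():
    print(name, position)
```

Dimensions with a high mean and low variance are those that respondents rate consistently highly; how each quadrant should be interpreted for the framework is left to the analysis in the study itself.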
APPENDIX A - DESIGN OF RESEARCH INSTRUMENTS

This appendix discusses the design of the research instruments used to conduct this
empirical research. The case studies were conducted using structured interviews and
questionnaires. This appendix presents the interview questions and the design of the
questionnaires based on the literature review.
The questions (below) were divided into 2 groups to address each group of
stakeholders in the data warehousing development process: the DW manager/teams and the
business users. In addition to these open-ended questions, each participant filled out a
seven-point graded questionnaire for each of these questions. This served to capture the
degree of agreement or disagreement with the research element and the degree of satisfaction
or dissatisfaction of the interview participants.

A.1 Design of Interview Instruments for Data Warehouse Manager / Teams

At the outset of the interview, questions were asked of the data warehouse manager/team
to gather an overview of the IT infrastructure and the data warehouse infrastructure.
• Please describe your organization’s data warehouse architecture and data warehouse
development strategies.
• What are the applications supported by the data warehouse?
• What are the tools you are using in the data warehouse and why?
• How many users currently access the data warehouse?

A.1.1. Questions for Data Warehouse Managers

A.1.1.1. Data Warehouse Architecture


1. How was the data warehouse architecture selected (enterprise-wide DW, data mart,
other)?
2. What data mart construction strategy do you use to develop your full data
warehouse?
3. Has the data warehouse architecture changed over the last 5 years and, if so, how?
4. How is the data integrated from different systems across the organization?
5. How is the data warehouse architecture integrated into existing IT systems’ architecture?

A.1.1.2. Data Warehouse Development Methodology

1. How was the data warehouse development methodology selected, and why? Describe step
by step how your organization uses it to develop your data warehouse.
2. What kind of framework do you use for your data warehouse development?
3. What are the organization’s motivations for developing a data warehouse?
4. What kind of requirement collection techniques do you use for your data warehouse?
5. What, in your opinion, are your users’ key requirements now?
6. What kind of inputs, processes and deliverables do you get in every stage of data
warehouse development?

A.1.1.3. Data Warehouse Quality


1. How is the performance of the data warehouse evaluated? (Does your evaluation
primarily depend on feedback by users?)
2. In your opinion, how flexible is the data warehouse to accommodate new business needs
or changes?
3. In your opinion, what data quality dimensions are responsible for the data warehouse
success?

Overall questions
Given an option, what are the things you would like to improve or change in the data
warehouse? What need would these changes fulfil?
In your opinion, what factors are responsible for the data warehouse success?

Table A.1. Questionnaire to be completed by Data Warehouse Manager/Team

Score each indicator from 1 (Strongly Disagree) to 7 (Strongly Agree) for each stage: Conceptual (MD), Logical (SC), ETL and Physical. All constructs are grouped under Information Quality.

Group | Construct | Label | Indicator (description) | Conceptual (MD) | Logical (SC) | ETL | Physical
Content: Relevance | Comprehensiveness | IQ1 | Is the scope of information adequate (not too much nor too little)? | | | |
Content: Relevance | Accuracy | IQ2 | Is the information precise enough and close enough to reality? | | | |
Content: Relevance | Clarity | IQ3 | Is the information understandable or comprehensible to the target group? | | | |
Content: Relevance | Applicability | IQ4 | Can the information be directly applied? Is it useful? | | | |
Content: Soundness | Conciseness | IQ5 | Is the information to the point, void of unnecessary elements? | | | |
Content: Soundness | Consistency | IQ6 | Is the information free of contradictions or convention breaks? | | | |
Content: Soundness | Correctness | IQ7 | Is the information free of distortion, bias or error? | | | |
Content: Soundness | Currency | IQ8 | Is the information up-to-date and not obsolete? | | | |
Access: Process | Convenience | IQ9 | Does the information provision correspond to the user's needs and habits? | | | |
Access: Process | Timeliness | IQ10 | Is the information processed and delivered rapidly without delays? | | | |
Access: Process | Traceability | IQ11 | Is the background of the information visible (author, date etc.)? | | | |
Access: Process | Interactivity | IQ12 | Can the information process be adapted by the information consumer? | | | |
Access: Infrastructure | Accessibility | IQ13 | Is there a continuous and unobstructed way to get the information? | | | |
Access: Infrastructure | Security | IQ14 | Is the information protected against loss or unauthorized access? | | | |
Access: Infrastructure | Maintainability | IQ15 | Can all the information be organized and updated on an on-going basis? | | | |
Access: Infrastructure | Speed | IQ16 | Can the infrastructure match the user's working pace? | | | |

Legend
MD : Multidimensional Model
SC : Star Schema
ETL : Extract Transform Loading
Physical : Physical design

A.2. Design of Interview Instruments for Business Users

At the outset of the interview, questions were asked of the business users to gather an
overview of the organization, such as:
• Name, locations and subsidiaries of the organization
• History
• Industry/business function
• Organizational structure

A.2.1. Questions for Business Users

A.2.1.1. Data Warehouse Development


1. How are your needs communicated to the data warehouse team and vice versa?
2. Is there a specific strategy you follow when you have to prioritize work in a data
warehouse development project?

A.2.1.2. Data Warehouse Quality


1. How well does the data warehouse support your needs?
2. How does the data warehouse respond to a change in business needs?
3. Does the data warehouse provide accurate information?
4. Does the data warehouse provide consistent and reliable information?
5. Does the data warehouse provide timely information?
6. Is the data warehouse easy to use?
7. Does the data warehouse enable day-to-day decisions?
8. Are the data warehouse functions and technical features easy to understand?
9. Finally, in your opinion, has the data warehouse been successful? What data quality
dimensions do you think are responsible for the data warehouse quality or data
warehouse benefits?

Overall question
In your opinion, what data quality dimensions are responsible for the data warehouse
benefits?

Table A.2.1. Questionnaire to be completed by business users

Score each indicator from 1 (Strongly Disagree) to 7 (Strongly Agree) for each requirement analysis approach (UD, GD, ED, DD, PD) and for OLAP. All constructs are grouped under Information Quality.

Group | Construct | Label | Indicator (description) | Req. Analysis: UD | GD | ED | DD | PD | OLAP
Content: Relevance | Comprehensiveness | IQ1 | Is the scope of information adequate (not too much nor too little)? | | | | | |
Content: Relevance | Accuracy | IQ2 | Is the information precise enough and close enough to reality? | | | | | |
Content: Relevance | Clarity | IQ3 | Is the information understandable or comprehensible to the target group? | | | | | |
Content: Relevance | Applicability | IQ4 | Can the information be directly applied? Is it useful? | | | | | |
Content: Soundness | Conciseness | IQ5 | Is the information to the point, void of unnecessary elements? | | | | | |
Content: Soundness | Consistency | IQ6 | Is the information free of contradictions or convention breaks? | | | | | |
Content: Soundness | Correctness | IQ7 | Is the information free of distortion, bias or error? | | | | | |
Content: Soundness | Currency | IQ8 | Is the information up-to-date and not obsolete? | | | | | |
Access: Process | Convenience | IQ9 | Does the information provision correspond to the user's needs and habits? | | | | | |
Access: Process | Timeliness | IQ10 | Is the information processed and delivered rapidly without delays? | | | | | |
Access: Process | Traceability | IQ11 | Is the background of the information visible (author, date etc.)? | | | | | |
Access: Process | Interactivity | IQ12 | Can the information process be adapted by the information consumer? | | | | | |
Access: Infrastructure | Accessibility | IQ13 | Is there a continuous and unobstructed way to get the information? | | | | | |
Access: Infrastructure | Security | IQ14 | Is the information protected against loss or unauthorized access? | | | | | |
Access: Infrastructure | Maintainability | IQ15 | Can all the information be organized and updated on an on-going basis? | | | | | |
Access: Infrastructure | Speed | IQ16 | Can the infrastructure match the user's working pace? | | | | | |

Legend
UD : User driven
GD : Goal driven
ED : External driven
DD : Data driven
PD : Process driven
OLAP : Online Analytical Processing

Table A.2.2. Rate the following statements with a score (1 = strongly disagree ... 7 = strongly
agree)

# | Statement | 1 | 2 | 3 | 4 | 5 | 6 | 7
1 | The data warehouse supports heterogeneous data integration (internal/external) | | | | | | |
2 | The data warehouse supports time savings | | | | | | |
3 | The data warehouse provides more and better information | | | | | | |
4 | The data warehouse supports better decisions | | | | | | |
5 | The data warehouse can improve/redesign business processes | | | | | | |
6 | The data warehouse supports accomplishing strategic business objectives | | | | | | |
7 | The data warehouse can be a single source of truth | | | | | | |
8 | Applications in the data warehouse are easy to use | | | | | | |

Table A.2.3. In your opinion, rate the existing data warehouse quality (with a score between
0 and 100) against each of the following expressions and give your reason.

Quality | Expression | Score | Reason
Interpretability | Adequately suitable for interpretation with respect to requirements on a given type of target, in terms of quality and scale. | |
Accessibility | Ability to obtain at least as much information from the DW as from the current system. | |
Usefulness | Ability to give crucial information to decision makers that will allow them to make better decisions. | |
Believability | Believability of the data in the DW depends on how the DW design process interprets the data from the sources. Believability can be measured in terms of drill-down capability. | |
Validity | Measurement of DW reports to validate the documentation of data sources. | |
Mean | | |

A.3 Design of Interview Instruments for End Users after Implementation of the
Proposed Framework

A.3.1 Questions for End Users


1. How well does the data warehouse support your needs?
2. How does the data warehouse respond to a change in business needs?
3. Does the data warehouse provide useful information?
4. Does the data warehouse provide consistent and reliable information?
5. Is the data warehouse easy to access?
6. Is information in the data warehouse easy to interpret?
7. Is information in the data warehouse easy to validate?
8. Does the data warehouse enable day-to-day decisions?
9. Are the data warehouse functions and technical features easy to understand?
10. Finally, in your opinion, has the data warehouse been successful? What data quality
dimensions do you think are responsible for the data warehouse quality?

Table A.3.1. Rate the following statements with a score (1 = strongly disagree ... 7 = strongly
agree)

# | Statement | 1 | 2 | 3 | 4 | 5 | 6 | 7
1 | The data warehouse supports heterogeneous data integration (internal/external) | | | | | | |
2 | The data warehouse supports time savings | | | | | | |
3 | The data warehouse provides more and better information | | | | | | |
4 | The data warehouse supports better decisions | | | | | | |
5 | The data warehouse can improve/redesign business processes | | | | | | |
6 | The data warehouse supports accomplishing strategic business objectives | | | | | | |
7 | The data warehouse can be a single source of truth | | | | | | |
8 | Applications in the data warehouse are easy to use | | | | | | |

Table A.3.2. In your opinion, rate the existing data warehouse quality (with a score between
0 and 100) against each of the following expressions and give your reason.

Quality | Expression | Score | Reason
Interpretability | Adequately suitable for interpretation with respect to requirements on a given type of target, in terms of quality and scale. | |
Accessibility | Ability to obtain at least as much information from the DW as from the current system. | |
Usefulness | Ability to give crucial information to decision makers that will allow them to make better decisions. | |
Believability | Believability of the data in the DW depends on how the DW design process interprets the data from the sources. Believability can be measured in terms of drill-down capability. | |
Validity | Measurement of DW reports to validate the documentation of data sources. | |
Mean | | |
