MI0036-Business Intelligence Tools - F1
Q.1 Define the term business intelligence tools. Discuss the roles in a Business Intelligence project.
Analysis software such as OLAP (On-Line Analytical Processing) tools and data mining tools are
discussed further below. Whatever the type, the Business Intelligence capability of the
system is to let its users slice and dice the information from their organization's numerous
databases without having to wait for their IT departments to develop complex queries and
elicit answers.
Roles in Business Intelligence project:
A typical BI project consists of the following roles; the responsibilities of each are
detailed below:
Project Manager: Monitors progress on a continuous basis and is responsible for
the success of the project.
Database Administrator (DBA): Keeps the database available for the applications
to run smoothly, and is also involved in planning and executing a backup/recovery
plan, as well as performance tuning.
Data Modeler: Is responsible for taking the data structure that exists in the
enterprise and modeling it into a schema that is suitable for OLAP analysis.
Trainer: Works with the end users to make them familiar with how the front end
is set up so that the end users can get the most benefit out of the system.
Q.2. What do you mean by a data warehouse? What are the major concepts and
terminology used in the study of data warehousing?
A data warehouse maintains its functions in three layers: staging, integration, and
access. Staging is used to store raw data for use by developers. The integration layer is
used to integrate data and to have a level of abstraction from users. The access layer is for
getting data out for users.
Data warehouses can be subdivided into data marts. Data marts store subsets of data from
a warehouse.
This definition of the data warehouse focuses on data storage. The main source of the
data is cleaned, transformed, catalogued and made available for use by managers and
other business professionals for data mining, online analytical processing, market
research and decision support (Marakas & O'Brien 2009). However, the means to retrieve
and analyze data, to extract, transform and load data, and to manage the data
dictionary are also considered essential components of a data warehousing system. Many
references to data warehousing use this broader context. Thus, an expanded definition for
data warehousing includes business intelligence tools, tools to extract, transform and
load data into the repository, and tools to manage and retrieve metadata.
Subject Oriented
Integrated
Nonvolatile
Time Variant
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about
your company's sales data, you can build a warehouse that concentrates on sales. Using
this warehouse, you can answer questions like "Who was our best customer for this item
last year?" This ability to define a data warehouse by subject matter, sales in this case,
makes the data warehouse subject oriented.
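As an illustration, the following is a minimal sketch of such a subject-oriented query, written in Python with the built-in sqlite3 module; the sales table, its columns and the sample rows are assumptions made for the example, not part of the course material.

import sqlite3

# Hypothetical, simplified sales subject area (not a schema from the text above).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales (
    customer TEXT, item TEXT, sale_year INTEGER, amount REAL)""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Acme Ltd", "Widget", 2010, 1200.0),
     ("Bolt Inc", "Widget", 2010, 900.0),
     ("Acme Ltd", "Widget", 2009, 300.0)])

# "Who was our best customer for this item last year?"
row = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM sales
    WHERE item = 'Widget' AND sale_year = 2010
    GROUP BY customer
    ORDER BY total DESC
    LIMIT 1""").fetchone()
print(row)   # ('Acme Ltd', 1200.0)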
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming
conflicts and inconsistencies among units of measure. When they achieve this, they are
said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is
logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very
much in contrast to online transaction processing (OLTP) systems, where performance
requirements demand that historical data be moved to an archive. A data warehouse's
focus on change over time is what is meant by the term time variant.
DATA WAREHOUSE TERMINOLOGY
Bruce W. Johnson, M.S.
Ad Hoc Query:
Aggregation:
Catalog:
A component of a data dictionary that describes and organizes the various aspects of a
database such as its folders, dimensions, measures, prompts, functions, queries and other
database objects. It is used to create queries, reports, analyses and cubes.
Cross Tab:
Dashboard:
A data visualization method and workflow management tool that brings together useful
information on a series of screens and/or web pages. Some of the information that may
be contained on a dashboard includes reports, web links, calendar, news, tasks, e-mail,
etc. When incorporated into a DSS or EIS key performance indicators may be
represented as graphics that are linked to various hyperlinks, graphs, tables and other
reports. The dashboard draws its information from multiple sources applications, office
products, databases, Internet, etc.
Cube:
Data-based Knowledge:
Factual information used in the decision making process that is derived from data marts
or warehouses using business intelligence tools. Data warehousing organizes information
into a format so that it represents an organization's knowledge with respect to a particular
subject area, e.g. finance or clinical outcomes.
Data Cleansing:
The process of cleaning or removing errors, redundancies and inconsistencies in the data
that is being imported into a data mart or data warehouse. It is part of the quality
assurance process.
Data Mart:
A database that is similar in structure to a data warehouse, but is typically smaller and is
focused on a more limited area. Multiple, integrated data marts are sometimes referred to
as an Integrated Data Warehouse. Data marts may be used in place of a larger data
warehouse or in conjunction with it. They are typically less expensive to develop and
faster to deploy and are therefore becoming more popular with smaller organizations.
Data Migration:
The transfer of data from one platform to another. This may include conversion from one
language, file structure and/or operating environment to another.
Data Mining:
The process of researching data marts and data warehouses to detect specific patterns in
the data sets. Data mining may be performed on databases and multi-dimensional data
cubes with ad hoc query tools and OLAP software. The queries and reports are typically
designed to answer specific questions to uncover trends or hidden relationships in the
data.
Data Scrubbing:
Data Transformation:
The modification of transaction data extracted from one or more data sources before it is
loaded into the data mart or warehouse. The modifications may include data cleansing,
translation of data into a common format so that it can be aggregated and compared,
summarizing the data, etc.
Data Warehouse:
Database Management System (DBMS):
The software that is used to create data warehouses and data marts. For the purposes of
data warehousing, they typically include relational database management systems.
Decision Support System (DSS):
A set of queries, reports, rule-based analyses, tables and charts that are designed to aid
management with their decision-making responsibilities. These functions are typically
“wrapped around” a data mart or data warehouse. The DSS tends to employ more
detailed level data than an EIS.
Dimension:
Drill Down:
The ability of a data-mining tool to move down into increasing levels of detail in a data
mart, data warehouse or multi-dimensional data cube.
Drill Up:
The ability of a data-mining tool to move back up into higher levels of data in a data
mart, data warehouse or multi-dimensional data cube.
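As an illustration of moving between levels of detail, here is a minimal drill-down/drill-up sketch in plain Python; the sales records and field layout are assumptions made for the example.

from collections import defaultdict

# Hypothetical detail-level records: (year, quarter, region, amount).
records = [(2010, "Q1", "East", 100), (2010, "Q1", "West", 150),
           (2010, "Q2", "East", 120), (2011, "Q1", "East", 90)]

def roll_up(rows, key_fn):
    """Aggregate the amount measure at the level selected by key_fn."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)

# Drill up: summary by year only.
print(roll_up(records, lambda r: r[0]))          # {2010: 370.0, 2011: 90.0}
# Drill down: more detail, year and quarter.
print(roll_up(records, lambda r: (r[0], r[1])))  # {(2010, 'Q1'): 250.0, ...}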
Executive Information System (EIS):
A type of decision support system designed for executive management that reports
summary level information as opposed to the greater detail derived in a decision support
system.
Extraction, Transformation and Loading (ETL) Tool:
Software that is used to extract data from a data source like an operational system or data
warehouse, modify the data and then load it into a data mart, data warehouse or multi-
dimensional data cube.
Granularity:
Hierarchy:
The organization of data, e.g. a dimension, into an outline or logical tree structure. The
strata of a hierarchy are referred to as levels. The individual elements within a level are
referred to as categories. The next lower level in a hierarchy is the child; the next higher
level containing the children is their parent.
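For example, a time hierarchy can be sketched in Python as parent/child links between categories; the levels and category names below are assumptions for illustration only.

# Levels of a hypothetical Time dimension hierarchy: Year -> Quarter -> Month.
# Each category points to its parent in the next higher level.
parent = {
    "Jan-2010": "Q1-2010", "Feb-2010": "Q1-2010", "Mar-2010": "Q1-2010",
    "Q1-2010": "2010", "Q2-2010": "2010",
    "2010": None,          # the top level has no parent
}

def ancestors(category):
    """Walk from a child category up through its parents to the top level."""
    chain = []
    while category is not None:
        chain.append(category)
        category = parent[category]
    return chain

print(ancestors("Feb-2010"))   # ['Feb-2010', 'Q1-2010', '2010']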
Legacy System:
Older systems developed on platforms that tend to be one or more generations behind the
current state-of-the-art applications. Data marts and warehouses were developed in large
part due to the difficulty in extracting data from these systems and the inconsistencies and
incompatibilities among them.
Level:
Measure:
Member:
Meta Data:
Information in a data mart or warehouse that describes the tables, fields, data types,
attributes and other objects in the data warehouse and how they map to their data sources.
Meta data is contained in database catalogs and data dictionaries.
Multi-dimensional OLAP (MOLAP) Software:
Software that creates and analyzes multi-dimensional cubes to store its information.
Non-Volatile Data:
Data that is static or that does not change. In transaction processing systems the data is
updated on a continual basis. In a data warehouse the database is added to or
appended, but the existing data seldom changes.
Normalization:
ODBC (Open Database Connectivity):
A database standard developed by Microsoft and the SQL Access Group Consortium that
defines the “rules” for accessing or retrieving data from a database.
Relational Database Management System (RDBMS):
Database management systems that have the ability to link tables of data through a
common or key field. Most databases today use relational technologies and support a
standard programming language called Structured Query Language (SQL).
Relational OLAP (ROLAP) Software:
OLAP software that employs a relational strategy to organize and store the data in its
database.
Replication:
Scalable:
Synchronization:
The process by which the data in two or more separate databases are synchronized so that
the records contain the same information. If the fields and records are updated in one
database the same fields and records are updated in the other.
Q.3 What are the data modeling techniques used in a data warehousing environment?
Answer. Two data modeling techniques that are relevant in a data warehousing
environment are ER modeling and dimensional modeling.
ER modeling produces a data model of the specific area of interest, using two basic
concepts: entities and the relationships between those entities. Detailed
ER models also contain attributes, which can be properties of either the entities or the
relationships. The ER model is an abstraction tool because it can be used to understand
and simplify the ambiguous data relationships in the business world and complex systems
environments.
Dimensional modeling uses three basic concepts: measures, facts, and dimensions.
Dimensional modeling is powerful in representing the requirements of the business user
in the context of database tables.
Both ER and dimensional modeling can be used to create an abstract model of a specific
subject. However, each has its own limited set of modeling concepts and associated
notation conventions. Consequently, the techniques look different, and they are indeed
different in terms of semantic representation. The following sections describe the
modeling concepts and notation conventions for both ER modeling and dimensional
modeling.
ER Modeling
A basic knowledge of ER modeling is assumed here, so this traditional technique is not
covered in depth. We simply define the necessary terms to form some consensus and
present the notation conventions used below.
Basic Concepts
An ER model is represented by an ER diagram, which uses three basic graphic symbols
to conceptualize the data: entity, relationship, and attribute.
Entity
An entity is defined to be a person, place, thing, or event of interest to the business or the
organization. An entity represents a class of objects, which are things in the real world
that can be observed and classified by their properties and characteristics. In some books
on Information Engineering (IE), the term entity type is used to represent classes of objects
and entity for an instance of an entity type; here, the two terms are used interchangeably.
Relationship
A relationship is represented with lines drawn between entities. It depicts the structural
interaction and association among the entities in a model. A relationship is designated
grammatically by a verb, such as owns, belongs, and has. The relationship between two
entities can be defined in terms of the cardinality. This is the maximum number of
instances of one entity that are related to a single instance of another entity, and vice versa.
The possible cardinalities are: one-to-one (1:1), one-to-many (1:M), and many-to-many
(M:M).
In a detailed (normalized) ER model, any M:M relationship is not shown because it is
resolved to an associative entity.
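As a brief sketch of how an M:M relationship is resolved into an associative entity, the following Python code creates an assumed order/product example with the built-in sqlite3 module; the entity and attribute names are illustrative, not drawn from the text.

import sqlite3

conn = sqlite3.connect(":memory:")
# Two entities with a many-to-many relationship between them.
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT)")
conn.execute("CREATE TABLE customer_order (order_id INTEGER PRIMARY KEY, order_date TEXT)")
# The associative entity resolves the M:M relationship into two 1:M relationships
# and can carry its own attributes (quantity).
conn.execute("""CREATE TABLE order_line (
    order_id   INTEGER REFERENCES customer_order(order_id),
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id))""")
print("schema created")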
Attributes
Attributes describe the characteristics or properties of the entities. For example,
Product ID, Description, and Picture could be attributes of a PRODUCT entity. For
clarification, attribute naming conventions are very important. An attribute name should
be unique in an entity and should be self-explanatory. For example, simply saying date1
or date2 is not sufficient; each must be clearly defined. As examples, they could be defined
as the order date and delivery date.
Dimensional Modeling
In some respects, dimensional modeling is simpler, more expressive, and easier to
understand than ER modeling. However, dimensional modeling is a relatively new concept and
is not yet as firmly defined in detail as ER modeling techniques.
This section presents the terminology used here when discussing dimensional
modeling.
Basic Concepts
Dimensional modeling is a technique for conceptualizing and visualizing data models as
a set of measures that are described by common aspects of the business. It is especially
useful for summarizing and rearranging the data and presenting views of the data to
support data analysis. Dimensional modeling focuses on numeric data, such as values,
counts, weights, balances, and occurrences.
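To make the three concepts concrete, the following is a minimal sketch of a fact table of numeric measures surrounded by dimension tables, built with Python's sqlite3 module; all table and column names are assumptions made for the example.

import sqlite3

conn = sqlite3.connect(":memory:")
# Dimensions describe the business context of the measures.
conn.execute("CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER)")
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, description TEXT)")
conn.execute("CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT)")
# The fact table holds the numeric measures (sales_amount, units_sold),
# keyed by the surrounding dimensions.
conn.execute("""CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    sales_amount REAL,
    units_sold INTEGER)""")
print("star-style schema created")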
Q.4 Discuss the categories into which data is divided before structuring it into a data
warehouse.
Data Warehouses and Data Warehouse applications are designed primarily to support
executives, senior managers, and business analysts in making complex business
decisions. Data Warehouse applications provide the business community with access to
accurate, consolidated information from various internal and external sources.
Managing the scope of each subject area which will be implemented into the
Warehouse on an iterative basis
Establishing a refresh program that is consistent with business needs, timing and
cycles
Providing user-friendly, powerful tools at the desktop to access the data in the
Warehouse
Educating the business community about the realm of possibilities that are
available to them through Data Warehousing
Establishing a Data Warehouse Help Desk and training users to effectively utilize
the desktop tools
Until the advent of Data Warehouses, enterprise databases were expected to serve
multiple purposes, including online transaction processing, batch processing, reporting,
and analytical processing. In most cases, the primary focus of computing resources was
on satisfying operational needs and requirements. Information reporting and analysis
needs were secondary considerations. As the use of PCs, relational databases, 4GL
technology and end-user computing grew and changed the complexion of information
processing, more and more business users demanded that their needs for information be
addressed. Data Warehousing has evolved to meet those needs without disrupting
operational processing.
In the Data Warehouse model, operational databases are not accessed directly to perform
information processing. Rather, they act as the source of data for the Data Warehouse,
which is the information repository and point of access for information processing. There
are sound reasons for separating operational and informational databases, as described
below.
The technology used for operational processing frequently differs from the
technology required to support informational needs.
The Data Warehouse functions as a Decision Support System (DSS) and an Executive
Information System (EIS), meaning that it supports informational and analytical needs by
providing integrated and transformed enterprise-wide historical data from which to do
management analysis. A variety of sophisticated tools are readily available in the
marketplace to provide user-friendly access to the information stored in the Data
Warehouse.
The data entering the Warehouse comes from the operational environment and external sources. Data
Warehouses are physically separated from operational systems, even though the
operational systems feed the Warehouse with source data.
Subject Orientation
Data Warehouses are designed around the major subject areas of the enterprise; the
operational environment is designed around applications and functions. This difference in
orientation (data vs. process) is evident in the content of the database. Data Warehouses
do not contain information that will not be used for informational or analytical
processing; operational databases contain detailed data that is needed to satisfy
processing requirements but which has no relevance to management or analysis.
Integration
The data within the Data Warehouse is integrated. This means that there is consistency
among naming conventions, measurements of variables, encoding structures, physical
attributes, and other salient data characteristics. An example of this integration is the
treatment of codes such as gender codes. Within a single corporation, various
applications may represent gender codes in different ways: male vs. female, m vs. f, and
1 vs. 0, etc. In the Data Warehouse, gender is always represented in a consistent way,
regardless of the many ways by which it may be encoded and stored in the source data.
As the data is moved to the Warehouse, it is transformed into a consistent representation
as required.
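A minimal sketch of this gender-code consolidation, written in Python with the three source encodings mentioned above (the default code 'U' for unknown values is an added assumption):

# Each source application encodes gender differently; the warehouse load
# transforms them all into one consistent representation.
GENDER_MAP = {
    "male": "M", "female": "F",   # source A: spelled out
    "m": "M", "f": "F",           # source B: single letters
    "1": "M", "0": "F",           # source C: numeric codes
}

def standardize_gender(raw_value):
    """Return the warehouse's consistent gender code for any source encoding."""
    return GENDER_MAP.get(str(raw_value).strip().lower(), "U")  # U = unknown

print(standardize_gender("Female"))  # F
print(standardize_gender(1))         # M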
Time Variance
Non-Volatility
Data in the Warehouse is static, not dynamic. The only operations that occur in Data
Warehouse applications are the initial loading of data, access of data, and refresh of data.
For these reasons, the physical design of a Data Warehouse optimizes the access of data,
rather than focusing on the requirements of data update and delete processing.
A Data Warehouse configuration, also known as the logical architecture, includes the
following components:
One Enterprise Data Store (EDS) - a central repository which supplies atomic
(detail level) integrated information to the whole organization.
The EDS is the cornerstone of the Data Warehouse. It can be accessed for both
immediate informational needs and for analytical processing in support of strategic
decision making, and can be used for drill-down support for the Data Marts which
contain only summarized data. It is fed by the existing subject area operational systems
and may also contain data from external sources. The EDS in turn feeds individual Data
Marts that are accessed by end-user query tools at the user's desktop. It is used to
consolidate related data from multiple sources into a single source, while the Data Marts
are used to physically distribute the consolidated data into logical categories of data, such
as business functional departments or geographical regions. The EDS is a collection of
daily "snapshots" of enterprise-wide data taken over an extended time period, and thus
retains and makes available for tracking purposes the history of changes to a given data
element over time. This creates an optimum environment for strategic analysis. However,
access to the EDS can be slow, due to the volume of data it contains, which is a good
reason for using Data Marts to filter, condense and summarize information for specific
business areas. In the absence of the Data Mart layer, users can access the EDS directly.
Metadata is "data about data," a catalog of information about the primary data that
defines access to the Warehouse. It is the key to providing users and developers with a
road map to the information in the Warehouse. Metadata comes in two different forms:
end-user and transformational. End-user metadata serves a business purpose; it translates
a cryptic name code that represents a data element into a meaningful description of the
data element so that end-users can recognize and use the data. For example, metadata
would clarify that the data element "ACCT_CD" represents "Account Code for Small
Business." Transformational metadata serves a technical purpose for development and
maintenance of the Warehouse. It maps the data element from its source system to the
Data Warehouse.
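As a sketch of how the two forms of metadata might be looked up together, using the ACCT_CD example above; the dictionary contents beyond that example are assumptions for illustration.

# End-user metadata: cryptic element names mapped to business descriptions.
end_user_metadata = {
    "ACCT_CD": "Account Code for Small Business",
}
# Transformational metadata: where each warehouse element comes from.
transformational_metadata = {
    "ACCT_CD": {"source_system": "billing", "source_field": "acct_cd",
                "transformation": "uppercase, trimmed"},
}

def describe(element):
    """Combine both kinds of metadata for a given data element."""
    return {"description": end_user_metadata.get(element, "unknown"),
            **transformational_metadata.get(element, {})}

print(describe("ACCT_CD"))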
While an Enterprise Data Store and Metadata Store(s) are always included in a sound
Data Warehouse design, the specific number of Data Marts (if any) and the need for an
Operational Data Store are judgment calls. Potential Data Warehouse configurations
should be evaluated and a logical architecture determined according to business
requirements.
The james martin + co Data Warehouse Process does not encompass the analysis and
identification of organizational value streams, strategic initiatives, and related business
goals, but it is a prescription for achieving such goals through a specific architecture. The
Process is conducted in an iterative fashion after the initial business requirements and
architectural foundations have been developed with the emphasis on populating the Data
Warehouse with "chunks" of functional subject-area information each iteration. The
Process guides the development team through identifying the business requirements,
developing the business plan and Warehouse solution to business requirements, and
implementing the configuration, technical, and application architecture for the overall
Data Warehouse. It then specifies the iterative activities for the cyclical planning, design,
construction, and deployment of each population project. The following is a description
of each stage in the Data Warehouse Process. (Note: The Data Warehouse Process also
includes conventional project management, startup, and wrap-up activities which are
detailed in the Plan, Activate, Control and End stages, not described here.)
A variety of kinds of strategic analysis, including Value Stream Assessment, have likely
already been done by the customer organization at the point when it is necessary to
develop a Business Case. The Business Case Development stage launches the Data
Warehouse development in response to previously identified strategic business initiatives
and "predator" (key) value streams of the organization. The organization will likely have
identified more than one important value stream. In the long term it is possible to
implement Data Warehouse solutions that address multiple value streams, but it is the
predator value stream or highest priority strategic initiative that usually becomes the
focus of the short-term strategy and first run population projects resulting in a Data
Warehouse.
At the conclusion of the relevant business reengineering, strategic visioning, and/or value
stream assessment activities conducted by the organization, a Business Case can be built
to justify the use of the Data Warehouse architecture and implementation approach to
solve key business issues directed at the most important goals. The Business Case defines
the activities, costs, benefits, and critical success factors for a multi-generation
implementation plan that results in a Data Warehouse framework of an information
storage/access system. The Warehouse is an iteratively designed/developed/refined solution
to the tactical and strategic business requirements. The Business Case addresses both the
short-term and long-term Warehouse strategies (how multiple data stores will work
together to fulfill primary and secondary business goals) and identifies both immediate
and extended costs so that the organization is better able to plan its short and long-term
budget appropriation.
Once a Business Case has been developed, the short-term strategy for implementing the
Data Warehouse is mapped out by means of the Business Question Assessment (BQA)
stage. The purpose of BQA is to:
Define and prioritize the business requirements and the subsequent information
(data) needs the Warehouse will address
Identify the business directions and objectives that may influence the required
data and application architectures
Determine which business subject areas provide the most needed information;
prioritize and sequence implementation projects accordingly
Drive out the logical data model that will direct the physical implementation
model
Measure the quality, availability, and related costs of needed source data at a high
level
Define the iterative population projects based on business needs and data
validation
The prioritized predator value stream or most important strategic initiative is analyzed to
determine the specific business questions that need to be answered through a Warehouse
implementation. Each business question is assessed to determine its overall importance to
the organization, and a high-level analysis of the data needed to provide the answers is
undertaken. The data is assessed for quality, availability, and cost associated with
bringing it into the Data Warehouse. The business questions are then revisited and
prioritized based upon their relative importance and the cost and feasibility of acquiring
the associated data. The prioritized list of business questions is used to determine the
scope of the first and subsequent iterations of the Data Warehouse, in the form of
population projects. Iteration scoping is dependent on source data acquisition issues and
is guided by determining how many business questions can be answered in a three to six
month implementation time frame. A "business question" is a question deemed by the
business to provide useful information in determining strategic direction. A business
question can be answered through objective analysis of the data that is available.
The Architecture is the logical and physical foundation on which the Data Warehouse
will be built. The Architecture Review and Design stage, as the name implies, is both a
requirements analysis and a gap analysis activity. It is important to assess what pieces of
the architecture already exist in the organization (and in what form) and to assess what
pieces are missing which are needed to build the complete Data Warehouse architecture.
During the Architecture Review and Design stage, the logical Data Warehouse
architecture is developed. The logical architecture is a configuration map of the necessary
data stores that make up the Warehouse; it includes a central Enterprise Data Store, an
optional Operational Data Store, one or more (optional) individual business area Data
Marts, and one or more Metadata stores. In the metadata store(s) are two different kinds
of metadata that catalog reference information about the primary data.
Once the logical configuration is defined, the Data, Application, Technical and Support
Architectures are designed to physically implement it. Requirements of these four
architectures are carefully analyzed so that the Data Warehouse can be optimized to serve
the users. Gap analysis is conducted to determine which components of each architecture
already exist in the organization and can be reused, and which components must be
developed (or purchased) and configured for the Data Warehouse.
The Data Architecture organizes the sources and stores of business information and
defines the quality and management standards for data and metadata.
The Application Architecture is the software framework that guides the overall
implementation of business functionality within the Warehouse environment; it controls
the movement of data from source to user, including the functions of data extraction, data
cleansing, data transformation, data loading, data refresh, and data access (reporting,
querying).
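The source-to-user flow described above can be sketched as a chain of Python functions; the function bodies and sample records are illustrative assumptions, not a prescribed implementation.

def extract(source_rows):
    """Pull raw records from a source system."""
    return list(source_rows)

def cleanse(rows):
    """Drop records with missing key fields."""
    return [r for r in rows if r.get("customer")]

def transform(rows):
    """Convert each record into the warehouse's common format."""
    return [{"customer": r["customer"].title(), "amount": float(r["amount"])}
            for r in rows]

def load(rows, warehouse):
    """Append the transformed records to the warehouse store."""
    warehouse.extend(rows)

warehouse = []
source = [{"customer": "acme ltd", "amount": "120.50"},
          {"customer": "", "amount": "10"}]
load(transform(cleanse(extract(source))), warehouse)
print(warehouse)   # [{'customer': 'Acme Ltd', 'amount': 120.5}]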
The Technical Architecture provides the underlying computing infrastructure that enables
the data and application architectures. It includes platform/server, network,
communications and connectivity hardware/software/middleware, DBMS, client/server
2-tier vs. 3-tier approach, and end-user workstation hardware/software. Technical
architecture design must address the requirements of scalability, capacity and volume
handling (including sizing and partitioning of tables), performance, availability, stability,
chargeback, and security.
The Support Architecture includes the software components (e.g., tools and structures for
backup/recovery, disaster recovery, performance monitoring, reliability/stability
compliance reporting, data archiving, and version control/configuration management) and
organizational functions necessary to effectively manage the technology investment.
Architecture Review and Design applies to the long-term strategy for development and
refinement of the overall Data Warehouse, and is not conducted merely for a single
iteration. This stage develops the blueprint of an encompassing data and technical
structure, software application configuration, and organizational support structure for the
Warehouse. It forms a foundation that drives the iterative Detail Design activities. Where
Detail Design tells you what to do, Architecture Review and Design tells you what pieces
you need in order to do it.
The Architecture Review and Design stage can be conducted as a separate project that
runs mostly in parallel with the Business Question Assessment stage, because the technical,
data, application and support infrastructure that enables and supports the storage and
access of information is generally independent of the business requirements that determine
which data is needed to drive the Warehouse. However, the data architecture is dependent on
receiving input from certain BQA activities (data source system identification and data
modeling), so the BQA stage must conclude before the Architecture stage can conclude.
Tool Selection
The purpose of this stage is to identify the candidate tools for developing and
implementing the Data Warehouse data and application architectures, and for performing
technical and support architecture functions where appropriate. Select the candidate tools
that best meet the business and technical requirements as defined by the Data Warehouse
architecture, and recommend the selections to the customer organization. Procure the
tools upon approval from the organization.
It is important to note that the process of selecting tools is often dependent on the existing
technical infrastructure of the organization. Many organizations feel strongly for various
reasons about using tools for the Data Warehouse applications that they already have in
their "arsenal" and are reluctant to purchase new application packages. It is recommended
that a thorough evaluation of existing tools and the feasibility of their reuse be done in the
context of all tool evaluation activities. In some cases, existing tools can be form-fitted to
the Data Warehouse; in other cases, the customer organization may need to be convinced
that new tools would better serve their needs.
It may even be feasible to skip this series of activities altogether if the
organization is insistent that particular tools be used (no room for negotiation), or if tools
have already been assessed and selected in anticipation of the Data Warehouse project.
Data Cleansing
Data Load
Data Refresh
Data Access
Security Enforcement
Disaster Recovery
Performance Monitoring
Database Management
Platform
Data Modeling
Metadata Management
The Data Warehouse is implemented (populated) one subject area at a time, driven by
specific business questions to be answered by each implementation cycle. The first and
subsequent implementation cycles of the Data Warehouse are determined during the
BQA stage. At this point in the Process the first (or next if not first) subject area
implementation project is planned. The business requirements discovered in BQA and, to
a lesser extent, the technical requirements of the Architecture Design stage are now
refined through user interviews and focus sessions to the subject area level. The results
are further analyzed to yield the detail needed to design and implement a single
population project, whether initial or follow-on. The Data Warehouse project team is
expanded to include the members needed to construct and deploy the Warehouse, and a
detailed work plan for the design and implementation of the iteration project is developed
and presented to the customer organization for approval.
Detail Design
In the Detail Design stage, the physical Data Warehouse model (database schema) is
developed, the metadata is defined, and the source data inventory is updated and
expanded to include all of the necessary information needed for the subject area
implementation project, and is validated with users. Finally, the detailed design of all
procedures for the implementation project is completed and documented. Procedures to
achieve the following activities are designed:
Data Extraction/Transformation/Cleansing
Data Load
Security
Data Refresh
Data Access
Disaster Recovery
Data Archiving
Configuration Management
Testing
Transition to Production
User Training
Help Desk
Change Management
Implementation
Once the Planning and Design stages are complete, the project to implement the current
Data Warehouse iteration can proceed quickly. Necessary hardware, software and
middleware components are purchased and installed, the development and test
environment is established, and the configuration management processes are
implemented. Programs are developed to extract, cleanse, transform and load the source
data and to periodically refresh the existing data in the Warehouse, and the programs are
individually unit tested against a test database with sample source data. Metrics are
captured for the load process. The metadata repository is loaded with transformational
and business user metadata. Canned production reports are developed and sample ad-hoc
queries are run against the test database, and the validity of the output is measured. User
access to the data in the Warehouse is established. Once the programs have been
developed and unit tested and the components are in place, system functionality and user
acceptance testing is conducted for the complete integrated Data Warehouse system.
System support processes of database security, system backup and recovery, system
disaster recovery, and data archiving are implemented and tested as the system is
prepared for deployment. The final step is to conduct the Production Readiness Review
prior to transitioning the Data Warehouse system into production. During this review, the
system is evaluated for acceptance by the customer organization.
Transition to Production
The Transition to Production stage moves the Data Warehouse development project into
the production environment. The production database is created, and the
extraction/cleanse/transformation routines are run on the operations system source data.
The development team works with the Operations staff to perform the initial load of this
data to the Warehouse and execute the first refresh cycle. The Operations staff is trained,
and the Data Warehouse programs and processes are moved into the production libraries
and catalogs. Rollout presentations and tool demonstrations are given to the entire
customer community, and end-user training is scheduled and conducted. The Help Desk
is established and put into operation. A Service Level Agreement is developed and
approved by the customer organization. Finally, the new system is positioned for ongoing
maintenance through the establishment of a Change Management Board and the
implementation of change control procedures for future development cycles.
Characteristics of an EIS
Executive Information Systems typically share the following characteristics:
They are able to access data about specific issues and problems as well as
aggregate reports
They provide extensive on-line analysis tools including trend analysis, exception
reporting & "drill-down" capability
They access a broad range of internal and external data
They are particularly easy to use (typically mouse or touchscreen driven)
They are used directly by executives without assistance
They present information in a graphical form
Purpose of EIS
The primary purpose of an Executive Information System is to support
managerial learning about an organization, its work processes, and its interaction with the
external environment. Informed managers can ask better questions and make better
decisions. Vandenbosch and Huff (1992) from the University of Western Ontario found
that Canadian firms using an EIS achieved better business results if their EIS promoted
managerial learning. Firms with an EIS designed to maintain managers' "mental models"
were less effective than firms with an EIS designed to build or enhance managers'
knowledge.
This distinction is supported by Peter Senge in The Fifth Discipline. He
illustrates the benefits of learning about the behaviour of systems versus simply learning
more about their states. Learning more about the state of a system leads to reactive
management fixes. Typically these reactions feed into the underlying system behaviour
and contribute to a downward spiral. Learning more about system behaviour and how
various system inputs and actions interrelate will allow managers to make more proactive
changes to create long-term improvement.
A secondary purpose for an EIS is to allow timely access to information. All of
the information contained in an EIS can typically be obtained by a manager through
traditional methods. However, the resources and time required to manually compile
information in a wide variety of formats, and in response to ever changing and ever more
specific questions usually inhibit managers from obtaining this information. Often, by the
time a useful report can be compiled, the strategic issues facing the manager have
changed, and the report is never fully utilized.
Timely access also influences learning. When a manager obtains the answer to a
question, that answer typically sparks other related questions in the manager's mind. If
those questions can be posed immediately, and the next answer retrieved, the learning
cycle continues unbroken. Using traditional methods, by the time the answer is produced,
the context of the question may be lost, and the learning cycle will not continue. An
executive in Rockart & Treacy's 1982 study noted that:
Your staff really can't help you think. The problem with giving a question to the
staff is that they provide you with the answer. You learn the nature of the real question
you should have asked when you muck around in the data (p. 9).
A third purpose of an EIS is commonly misperceived. An EIS has a powerful
ability to direct management attention to specific areas of the organization or specific
business problems. Some managers see this as an opportunity to discipline subordinates.
Some subordinates fear the directive nature of the system and spend a great deal of time
trying to outwit or discredit it. Neither of these behaviours is appropriate or productive.
Rather, managers and subordinates can work together to determine the root causes of
issues highlighted by the EIS.
The powerful focus of an EIS is due to the maxim "what gets measured gets
done." Managers are particularly attentive to concrete information about their
performance when it is available to their superiors. This focus is very valuable to an
organization if the information reported is actually important and represents a balanced
view of the organization's objectives.
Misaligned reporting systems can result in inordinate management attention to
things that are not important or to things which are important but to the exclusion of other
equally important things. For example, a production reporting system might lead
managers to emphasize volume of work done rather than quality of work. Worse yet,
productivity might have little to do with the organization's overriding customer service
objectives.
Contents of EIS
A general answer to the question of what data is appropriate for inclusion in an
Executive Information System is "whatever is interesting to executives." While this
advice is rather simplistic, it does reflect the variety of systems currently in use.
Executive Information Systems in government have been constructed to track data about
Ministerial correspondence, case management, worker productivity, finances, and human
resources to name only a few. Other sectors use EIS implementations to monitor
information about competitors in the news media and databases of public information in
addition to the traditional revenue, cost, volume, sales, market share and quality
applications.
Frequently, EIS implementations begin with just a few measures that are clearly
of interest to senior managers, and then expand in response to questions asked by those
managers as they use the system. Over time, the presentation of this information becomes
stale, and the information diverges from what is strategically important for the
organization. A "Critical Success Factors" approach is recommended by many
management theorists (Daniel, 1961, Crockett, 1992, Watson and Frolick, 1992).
Practitioners such as Vandenbosch (1993) found that:
While our efforts usually met with initial success, we often found that after six
months to a year, executives were almost as bored with the new information as they had
been with the old. A strategy we developed to rectify this problem required organizations
to create a report of the month. That is, in addition to the regular information provided for
management committee meetings, the CEO was charged with selecting a different
indicator to focus on each month (Vandenbosch, 1993, pp. 8-9).
While the above indicates that selection of data for inclusion in an EIS is difficult,
there are several guidelines that help to make that assessment. A practical set of
principles to guide the design of measures and indicators to be included in an EIS is
presented below (Kelly, 1992b). For a more detailed discussion of methods for selecting
measures that reflect organizational objectives, see the section "EIS and Organizational
Objectives."
EIS measures must be easy to understand and collect. Wherever possible, data
should be collected naturally as part of the process of work. An EIS should not add
substantially to the workload of managers or staff.
EIS measures must be based on a balanced view of the organization's objectives.
Data in the system should reflect the objectives of the organization in the areas of
productivity, resource management, quality and customer service.
Performance indicators in an EIS must reflect everyone's contribution in a fair and
consistent manner. Indicators should be as independent as possible from variables outside
the control of managers.
EIS measures must encourage management and staff to share ownership of the
organization's objectives. Performance indicators must promote both team-work and
friendly competition. Measures must be meaningful for all staff; people must feel that
they, as individuals, can contribute to improving the performance of the organization.
EIS information must be available to everyone in the organization. The objective
is to provide everyone with useful information about the organization's performance.
Information that must remain confidential should not be part of the EIS or the
management system of the organization.
EIS measures must evolve to meet the changing needs of the organization.
Barriers to Effectiveness
There are many ways in which an EIS can fail. Dozens of high profile, high cost
EIS projects have been cancelled, implemented and rarely used, or implemented and used
with negative results. An EIS is a high risk project precisely because it is intended for use
by the most powerful people in an organization. Senior managers can easily misuse the
information in the system with strongly detrimental effects on the organization. Senior
managers can refuse to use a system if it does not respond to their immediate personal
needs or is too difficult to learn and use.
To make effective use of an EIS, managers must have the self-confidence to accept
negative results and focus on the resolution of problems rather than on denial and blame.
Since organizations with limited exposure to planning and targeting, data-based decision-
making, statistical process control, and team-based work models may not have dealt with
these behavioural issues in the past, they are more likely to react defensively and reject an
EIS.
Technical Excellence
An interesting result from the Vandenbosch & Huff (1988) study was that the
technical excellence of an EIS has an inverse relationship with effectiveness. Systems
that are technical masterpieces tend to be inflexible, and thus discourage innovation,
experimentation and mental model development.
Flexibility is important because an EIS has such a powerful ability to direct
attention to specific issues in an organization. A technical masterpiece may accurately
direct management attention when the system is first implemented, but continue to direct
attention to issues that were important a year ago on its first anniversary. There is
substantial danger that the exploration of issues necessary for managerial learning will be
limited to those subjects that were important when the EIS was first developed. Managers
must understand that as the organization and its work changes, an EIS must continually
be updated to address the strategic issues of the day.
A number of explanations as to why technical masterpieces tend to be less
flexible are possible. Developers who create a masterpiece EIS may become attached to
the system and consciously or unconsciously dissuade managers from asking for changes.
Managers who are uncertain that the benefits outweigh the initial cost of a masterpiece
EIS may not want to spend more on system maintenance and improvements. The time
required to create a masterpiece EIS may mean that it is outdated before it is
implemented.
While usability and response time are important factors in determining whether
executives will use a system, cost and flexibility are paramount. A senior manager will be
more accepting of an inexpensive system that provides 20% of the needed information
within a month or two than with an expensive system that provides 80% of the needed
information after a year of development. The manager may also find that the inexpensive
system is easier to change and adapt to the evolving needs of the business. Changing a
large system would involve throwing away parts of a substantial investment. Changing
the inexpensive system means losing a few weeks of work. As a result, fast, cheap,
incremental approaches to developing an EIS increase the chance of success.
Technical Problems
Paradoxically, technical problems are also frequently reported as a significant
barrier to EIS success. The most difficult technical problem -- that of integrating data
from a wide range of data sources both inside and outside the organization -- is also one
of the most critical issues for EIS users. A marketing vice-president, who had spent
several hundred thousand dollars on an EIS, attended a final briefing on the system. The
technical experts demonstrated the many graphs and charts of sales results, market share
and profitability. However, when the vice-president asked for a graph of market share
and advertising expense over the past ten years, the system was unable to access
historical data. The project was cancelled in that meeting.
The ability to integrate data from many different systems is important because it
allows managerial learning that is unavailable in other ways. The president of a
manufacturing company can easily get information about sales and manufacturing from
the relevant VPs. Unfortunately, the information the president receives will likely be
incompatible, and learning about the ways in which sales and manufacturing processes
influence each other will not be easy. An EIS will be particularly effective if it can
overcome this challenge, allowing executives to learn about business processes that cross
organizational boundaries and to compare business results in disparate functions.
Another technical problem that can kill EIS projects is usability. Senior managers
simply have the choice to stop using a system if they find it too difficult to learn or use.
They have very little time to invest in learning the system, a low tolerance for errors, and
initially may have very little incentive to use it. Even if the information in the system is
useful, a difficult interface will quickly result in the manager assigning an analyst to
manipulate the system and print out the required reports. This is counter-productive
because managerial learning is enhanced by the immediacy of the question - answer
learning cycle provided by an EIS. If an analyst is interacting with the system, the analyst
will acquire more learning than the manager, but will not be in a position to put that
learning to its most effective use.
Usability of Executive Information Systems can be enhanced through the use of
prototyping and usability evaluation methods. These methods ensure that clear
communication occurs between the developers of the system and its users. Managers
have an opportunity to interact with systems that closely resemble the functionality of the
final system and thus can offer more constructive criticism than they might be able to
after reading an abstract specification document. Systems developers also are in a
position to listen more openly to criticisms of a system since a prototype is expected to be
disposable. Several evaluation protocols are available including observation and
monitoring, software logging, experiments and benchmarking, etc. (Preece et al, 1994).
The most appropriate methods for EIS design are those with an ethnographic flavour
because the experience base of system developers is typically so different from that of
their user population (senior executives).
There are many examples of this sort of destructive reporting. Grant, Higgins and Irving (1988) report the
account of an employee working under a misaligned reporting system.
I like the challenge of solving customer problems, but they get in the way of
hitting my quota. I'd like to get rid of the telephone work. If (the company) thought
dealing with customers was important, I'd keep it; but if it's just going to be production
that matters, I'd gladly give all the calls to somebody else.
Traditional cost accounting systems are also often misaligned with organizational
objectives, and placing these measures in an EIS will continue to draw attention to the
wrong things. Cost accounting allocates overhead costs to direct labour hours. In some
cases the overhead burden on each direct labour hour is as much as 1000%. A manager
operating under this system might decide to sub-contract 100 hours of direct labor at $20
per hour. On the books, this $2,000 saving is accompanied by $20,000 of savings in
overhead. If the sub-contractor charges $5,000 for the work, the book savings are $2,000
+ $20,000 - $5,000 = $17,000. In reality, however, the overhead costs for an idle machine
in a factory do not go down much at all. The sub-contract actually ends up costing $5,000
- $2,000 = $3,000. (Peters, 1987)
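The arithmetic in that example can be restated as a short Python calculation for clarity; the figures are taken directly from the paragraph above.

direct_hours = 100
rate = 20.0
overhead_burden = 10.0             # 1000% of direct labour cost
subcontract_charge = 5000.0

direct_saving = direct_hours * rate                      # 2,000
book_overhead_saving = direct_saving * overhead_burden   # 20,000
book_saving = direct_saving + book_overhead_saving - subcontract_charge
real_cost = subcontract_charge - direct_saving           # overhead does not actually fall

print(book_saving)   # 17000.0 saved on the books
print(real_cost)     # 3000.0 real extra cost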
Some organizations prefer to assign blame rather than discover the true root cause of problems. The culture
of an organization can have a dramatic influence on the adoption and use of an Executive
Information System. The following cultural characteristics will contribute directly to the
success or failure of an EIS project.
Learning vs Blaming
A learning organization is one that seeks first to understand why a problem
occurred, and not who is to blame. It is a common and natural response for managers to
try to deflect responsibility for a problem on to someone else. An EIS can help to do this
by indicating very specifically who failed to meet a statistical target, and by how much. A
senior manager, armed with EIS data, can intimidate and blame the appropriate person.
The blamed person can respond by questioning the integrity of the system, blaming
someone else, or even reacting in frustration by slowing work down further.
In a learning organization, any unusual result is seen as an opportunity to learn
more about the business and its processes. Managers who find an unusual statistic explore
it further, breaking it down to understand its components and comparing it with other
numbers to establish cause and effect relationships. Together as a team, management uses
numerical results to focus learning and improve business processes across the
organization. An EIS facilitates this approach by allowing instant exploration of a
number, its components and its relationship to other numbers.
An EIS can be helpful in responding to this sort of crisis by providing instant data about the
actual facts of the situation. However, this use of the EIS does little to prevent future
crises.
An organizational culture in which continual improvement is the norm can use the
EIS as an early warning system pointing to issues that have not yet reached the crisis
point, but are perhaps the most important areas on which to focus management attention
and learning. Organizations with a culture of continuous improvement already have an
appetite for the sort of data an EIS can provide, and thus will exhibit less resistance.
Crockett (1992) suggests a four step process for developing EIS information
requirements based on a broader understanding of organizational objectives. The steps
are: (1) identify critical success factors and stakeholder expectations, (2) document
performance measures that monitor the critical success factors and stakeholder
expectations, (3) determine reporting formats and frequency, and (4) outline information
flows and how information can be used. Crockett begins with stakeholders to ensure that
all relevant objectives and critical success factors are reflected in the EIS.
Kaplan and Norton (1992) suggest that goals and measures need to be developed
from each of four perspectives: financial, customer, internal business, and innovation and
learning. These perspectives help managers to achieve a balance in setting objectives, and
presenting them in a unified report exposes the tough tradeoffs in any management
system. An EIS built on this basis will not promote productivity while ignoring quality,
or customer satisfaction while ignoring cost.
Meyer (1994) raises several questions that should be asked about measurement
systems for teams. Four of these are appropriate for evaluating the objectives and measures
represented in an EIS.
Methodology
Implementation of an effective EIS requires clear consensus on the objectives and
measures to be monitored in the system and a plan for obtaining the data on which those
measures are based. The sections below outline a methodology for achieving these two
results. As noted earlier, successful EIS implementations generally begin with a simple
prototype rather than a detailed planning process. For that reason, the proposed planning
methodologies are as simple and scope-limited as possible.
Q.6 Discuss the challenges involved in the data integration and coordination process.
Answer. One of the most fundamental challenges in the data integration process is setting realistic expectations. The term data integration conjures up an image of diverse databases, software, equipment, and personnel coordinated into a smoothly functioning alliance, free of the persistent headaches that mark less comprehensive approaches to information management. In practice, this ideal is rarely achieved.
The requirements analysis stage offers one of the best opportunities in the process
to recognize and digest the full scope of complexity of the data integration task.
Thorough attention to this analysis is possibly the most important ingredient in creating a
system that will live to see adoption and maximum use.
Heterogeneous Data
Challenges
A common challenge is heterogeneity in data format. Legacy systems may have been created around flat file, network, or hierarchical databases, unlike newer generations of databases, which use the relational model.
Data in different formats from external sources continue to be added to the legacy
databases to improve the value of the information. Each generation, product, and home-
grown system has unique demands to fulfill in order to store or extract data. So data
integration can involve various strategies for coping with heterogeneity. In some cases,
the effort becomes a major exercise in data homogenization, which may not enhance the
quality of the data offered.
Strategies
Bad Data
Challenges
Data quality is a top concern in any data integration strategy. Legacy data must be
cleaned up prior to conversion and integration, or an agency will almost certainly face
serious data problems later. Legacy data impurities have a compounding effect; by
nature, they tend to concentrate around high volume data users.
If this information is corrupt, so, too, will be the decisions made from it. It is not
unusual for undiscovered data quality problems to emerge in the process of cleaning
information for use by the integrated system. The issue of bad data leads to procedures
for regularly auditing the quality of information used. But who holds the ultimate
responsibility for this job is not always clear.
Strategies:
The issue of data quality exists throughout the life of any data integration system.
So it is best to establish both practices and responsibilities right from the start, and
make provisions for each to continue in perpetuity.
The best processes result when developers and users work together to determine
the quality controls that will be put in place in both the development phase and
the ongoing use of the system.
Challenges
The unanticipated need for additional performance and capacity is one of the most
common challenges to data integration, particularly in data warehousing. Two storage-
related requirements generally come into play: extensibility and scalability. The need for storage can grow exponentially once a system is in use, and anticipating the extent of that growth drives fears that storage costs will exceed the benefits of data integration. Such massive quantities of data can also push the limits of hardware and software, forcing developers into costly fixes if an architecture for processing much larger volumes of data must be retrofitted into the planned system.
Strategies
Alternative storage is becoming routine for data warehouses that are likely to
grow in size. Planning for such options helps keep expanding databases
affordable.
The cost per gigabyte of storage on disk drives continues to decline as technology
improves. From 2000 to 2004, for instance, the cost of data storage declined ten-
fold. High-performance storage disks are expected to follow the downward
pricing spiral.
Unanticipated Costs
Challenges
Data integration costs are fueled largely by items that are difficult for the
uninitiated to quantify, and thus predict. These might include:
Labor costs for initial planning, evaluation, programming and additional data
acquisition
Both labor and the direct costs of data storage and maintenance
The investment in time and labor required to extract, clean, load, and maintain data
can creep if the quality of the data presented is weak. It is not unusual for this to produce
unanticipated labor costs that are rather alarmingly out of proportion to the total project
budget.
Strategies
The approach to estimating project costs must be both far-sighted and realistic.
This requires an investment in experienced analysts, as well as cooperation, where
possible, among sister agencies on lessons learned.
Special effort should be made to identify items that may seem unlikely but could
dramatically impact total project cost.
A viable data integration approach must recognize that the better data integration
works for users, the more fundamental it will become to business processes. This
level of use must be supported by consistent maintenance. It might be tempting to
think that a well designed system will, by nature, function without much upkeep
or tweaking. In fact, the best systems and processes tend to thrive on the routine
care and support of well-trained personnel, a fact that wise managers generously
anticipate in the data integration plan and budget.
Challenges
User groups within an agency may have developed databases on their own,
sometimes independently from information systems staff, that are highly responsive to
the users' particular needs. It is natural that owners of these functioning standalone units
might be skeptical that the new system would support their needs as effectively.
Other proprietary interests may come into play. For example, division staff may
not want the data they collect and track to be at all times transparently visible to
headquarters staff without the opportunity to address the nuances of what the data appear
to show. Owners or users may fear that higher-ups, without appreciation of the peculiarities of a given method of operation, will gain more control over how data is collected and accessed organization-wide.
Strategies
Informing and involving the diversity of players during the crucial requirements
analysis stage, and then in each subsequent phase and step, is probably the single
most effective way to gain buy-in, trust, and cooperation. Collecting and
addressing each user's concerns may be a daunting proposition, particularly for
knowledgeable information professionals who prefer to "cut to the chase."
However, without a personal stake in the process and a sense of ownership of the
final product, the long-term health of this major investment is likely to be
compromised by users who feel that change has been enforced upon them rather
than designed to advance their interests.
Peer Perspectives...
At least three conditions were required for the success of Virginia DOT's
development effort:
Upper management had to support the business objectives of the project and the
creation of a new system to meet the objectives
Project managers had to receive the budget, staff, and IT resources necessary to
initiate and complete the process
All stakeholders and eventual system users from the agency's districts and
headquarters had to cooperate with the project team throughout the process(22)
Challenges
The process of transferring historical data from its independent source to the integrated system may benefit from the knowledge of the manager who originally captured and stored the information. High turnover in such positions, along with early retirements and other personnel shifts driven by a historically tight budget environment, may complicate the mining and preparation of this data for convergence with the new system.
Strategies
A seasoned and highly knowledgeable data integration project leader and a data
manager with state of the practice experience are the minimum required to design
a viable approach to integration. Choosing this expertise very carefully can help
ensure that the resulting architecture is sufficiently modular, can be maintained,
and is robust enough to support a wide range of owner and user needs while
remaining flexible enough to accommodate changing transportation decision-
support requirements over a period of years.
Challenges
Strategies
A business goes through stages of development much like the human life cycle. Parenting strategies that work for your toddler cannot be applied to your teenager, and the same goes for your small business: it will face a different set of challenges at each stage of its life. What you focus on today will change and will require different approaches to be successful.
Seed
The seed stage of your business life cycle is when your business is just a thought or an
idea. This is the very conception or birth of a new business.
Challenge: Most seed stage companies will have to overcome the challenge of
market acceptance and pursue one niche opportunity. Do not spread money and time
resources too thin.
Focus: At this stage of the business the focus is on matching the business opportunity
with your skills, experience and passions. Other focal points include: deciding on a
business ownership structure, finding professional advisors, and business planning.
Money Sources: Early in the business life cycle with no proven market or customers
the business will rely on cash from owners, friends and family. Other potential
sources include suppliers, customers, government grants and banks.
Start-Up
Your business is born and now exists legally. Products or services are in production and
you have your first customers.
Challenge: If your business is in the start-up life cycle stage, it is likely you have
overestimated money needs and the time to market. The main challenge is not to
burn through what little cash you have. You need to learn what profitable needs
your clients have and do a reality check to see if your business is on the right track.
Focus: Start-ups require establishing a customer base and market presence along with
tracking and conserving cash flow.
Money Sources: Owner, friends, family, suppliers, customers, grants, and banks.
Growth
Your business has made it through the toddler years and is now a child. Revenues and
customers are increasing with many new opportunities and issues. Profits are strong, but
competition is surfacing.
Challenge: The biggest challenge growth companies face is dealing with the constant range of issues bidding for more time and money. Effective management is required, and possibly a new business plan. Learn how to train and delegate to conquer this stage of development.
Focus: Growth life cycle businesses are focused on running the business in a more
formal fashion to deal with the increased sales and customers. Better accounting and
management systems will have to be set-up. New employees will have to be hired to
deal with the influx of business.
Established
Your business has now matured into a thriving company with a place in the market and
loyal customers. Sales growth is not explosive but manageable. Business life has become
more routine.
Challenge: It is far too easy to rest on your laurels during this life stage. You have
worked hard and have earned a rest but the marketplace is relentless and
competitive. Stay focused on the bigger picture. Issues like the economy, competitors, or changing customer tastes can quickly end all you have worked for.
Expansion
This life cycle is characterized by a new period of growth into new markets and
distribution channels. This stage is often the choice of the business owner to gain a larger
market share and find new revenue and profit channels.
Challenge: Moving into new markets requires the planning and research of a seed or
start-up stage business. Focus should be on businesses that complement your
existing experience and capabilities. Moving into unrelated businesses can be
disastrous.
Focus: Add new products or services to existing markets or expand existing business
into new markets and customer types.
Money Sources: Joint ventures, banks, licensing, new investors and partners.
Mature
Year over year, sales and profits tend to be stable; however, competition remains fierce.
Eventually sales start to fall off and a decision is needed whether to expand or exit the
company.
Challenge: Businesses in the mature stage of the life cycle will be challenged with dropping sales, profits, and negative cash flow. The biggest issue is how long the business can support a negative cash flow. Ask whether it is time to move back to the expansion stage or move on to the final life cycle stage: exit.
Focus: Search for new opportunities and business ventures. Cutting costs and finding
ways to sustain cash flow are vital for the mature stage.
Exit
This is the big opportunity for your business to cash out on all the effort and years of hard
work. Or it can mean shutting down the business.
Challenge: Selling a business requires a realistic valuation. It may have taken years of hard work to build the company, but what is its real value in the current marketplace? If you decide to close your business, the challenge is to deal with the financial and psychological aspects of a business loss.
Focus: Get a proper valuation on your company. Look at your business operations,
management and competitive barriers to make the company worth more to the
buyer. Set-up legal buy-sell agreements along with a business transition plan.
Money Sources: Find a business valuation partner. Consult with your accountant and financial advisors for the best tax strategy to sell or close down the business.
Complex query requirements, including aggregates, multi-table joins and drill-downs, have become drivers for different technological approaches to the data warehouse database. These approaches include:
Parallel relational database designs for scalability that include shared-memory,
shared disk, or shared-nothing models implemented on various multiprocessor
configurations (symmetric multiprocessors or SMP, massively parallel processors
or MPP, and/or clusters of uni- or multiprocessors).
An innovative approach to speed up a traditional RDBMS by using new index
structures to bypass relational table scans.
Multidimensional databases (MDDBs) that are based on proprietary database
technology; conversely, a dimensional data model can be implemented using a
familiar RDBMS. Multi-dimensional databases are designed to overcome any
limitations placed on the warehouse by the nature of the relational data model.
MDDBs enable on-line analytical processing (OLAP) tools that architecturally
belong to a group of data warehousing components jointly categorized as the data
query, reporting, analysis and mining tools.
Meta data
Meta data is data about data that describes the data warehouse. It is used for
building, maintaining, managing and using the data warehouse. Meta data can be
classified into:
Technical meta data, which contains information about warehouse data for use by
warehouse designers and administrators when carrying out warehouse
development and management tasks.
Business meta data, which contains information that gives users an easy-to-
understand perspective of the information stored in the data warehouse.
Access Tools
The principal purpose of data warehousing is to provide information to business
users for strategic decision-making. These users interact with the data warehouse using
front-end tools. Many of these tools require an information specialist, although many end
users develop expertise in the tools. Tools fall into four main categories: query and
reporting tools, application development tools, online analytical processing tools, and
data mining tools.
Query and Reporting tools can be divided into two groups: reporting tools and
managed query tools. Reporting tools can be further divided into production reporting
tools and report writers. Production reporting tools let companies generate regular
operational reports or support high-volume batch jobs such as calculating and printing
paychecks. Report writers, on the other hand, are inexpensive desktop tools designed for
end-users.
Managed query tools shield end users from the complexities of SQL and database
structures by inserting a metalayer between users and the database. These tools are
designed for easy-to-use, point-and-click operations that either accept SQL or generate
SQL database queries.
Often, the analytical needs of the data warehouse user community exceed the
built-in capabilities of query and reporting tools. In these cases, organizations will often
rely on the tried-and-true approach of in-house application development using graphical
development environments such as PowerBuilder, Visual Basic and Forte. These
application development platforms integrate well with popular OLAP tools and access all
major database systems including Oracle, Sybase, and Informix.
OLAP tools are based on the concepts of dimensional data models and
corresponding databases, and allow users to analyze the data using elaborate,
multidimensional views. Typical business applications include product performance and
profitability, effectiveness of a sales program or marketing campaign, sales forecasting
and capacity planning. These tools assume that the data is organized in a
multidimensional model.
A critical success factor for any business today is the ability to use information
effectively. Data mining is the process of discovering meaningful new correlations,
patterns and trends by digging into large amounts of data stored in the warehouse using
artificial intelligence, statistical and mathematical techniques.
Data Marts
The concept of a data mart is causing a lot of excitement and attracts much
attention in the data warehouse industry. Mostly, data marts are presented as an
alternative to a data warehouse that takes significantly less time and money to build.
However, the term data mart means different things to different people. A rigorous
definition of this term is a data store that is subsidiary to a data warehouse of integrated
data. The data mart is directed at a partition of data (often called a subject area) that is
created for the use of a dedicated group of users. A data mart might, in fact, be a set of
denormalized, summarized, or aggregated data. Sometimes, such a set could be placed on
the data warehouse rather than in a physically separate store of data. In most instances, however, the data mart is a physically separate store of data, resident on a separate database server, often on a local area network serving a dedicated user group. Sometimes the data mart simply comprises relational OLAP technology that creates a highly denormalized dimensional model (e.g., a star schema) implemented on a relational
database. The resulting hypercubes of data are used for analysis by groups of users with a
common interest in a limited portion of the database.
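As a rough illustration of what such a denormalized dimensional model looks like, the following sketch (with hypothetical table and column names) defines a small star schema in SQL: a central sales fact table surrounded by time and product dimensions.
CREATE TABLE dim_time (
  time_id       NUMBER PRIMARY KEY,
  calendar_date DATE,
  month_name    VARCHAR2(20),
  calendar_year NUMBER
);
CREATE TABLE dim_product (
  product_id   NUMBER PRIMARY KEY,
  product_name VARCHAR2(100),
  category     VARCHAR2(50)
);
-- The central fact table references each dimension; queries join the fact
-- table to the dimensions and aggregate the measures (quantity, amount).
CREATE TABLE fact_sales (
  time_id    NUMBER REFERENCES dim_time (time_id),
  product_id NUMBER REFERENCES dim_product (product_id),
  quantity   NUMBER,
  amount     NUMBER(12,2)
);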
These types of data marts, called dependent data marts because their data is
sourced from the data warehouse, have a high value because no matter how they are
deployed and how many different enabling technologies are used, different users are all
accessing the information views derived from the single integrated version of the data.
Unfortunately, the misleading statements about the simplicity and low cost of data
marts sometimes result in organizations or vendors incorrectly positioning them as an
alternative to the data warehouse. This viewpoint defines independent data marts that in
fact, represent fragmented point solutions to a range of business problems in the
enterprise. This type of implementation should be rarely deployed in the context of an
overall technology or applications architecture. Indeed, it is missing the ingredient that is
at the heart of the data warehousing concept -- that of data integration. Each independent
data mart makes its own assumptions about how to consolidate the data, and the data
across several data marts may not be consistent.
Moreover, the concept of an independent data mart is dangerous -- as soon as the
first data mart is created, other organizations, groups, and subject areas within the
enterprise embark on the task of building their own data marts. As a result, you create an
environment where multiple operational systems feed multiple non-integrated data marts
that are often overlapping in data content, job scheduling, connectivity and management.
In other words, you have transformed a complex many-to-one problem of building a data
warehouse from operational and external data sources to a many-to-many sourcing and
management nightmare.
Q.3. Discuss data extraction process? What are the various methods being used for
data extraction?
The source systems for a data warehouse are typically transaction processing
applications. For example, one of the source systems for a sales analysis data warehouse
might be an order entry system that records all of the current order activities.
Designing and creating the extraction process is often one of the most time-
consuming tasks in the ETL process and, indeed, in the entire data warehousing process.
The source systems might be very complex and poorly documented, and thus determining
which data needs to be extracted can be difficult. The data has to be extracted normally
not only once, but several times in a periodic manner to supply all changed data to the
data warehouse and keep it up-to-date. Moreover, the source system typically cannot be
modified, nor can its performance or availability be adjusted, to accommodate the needs
of the data warehouse extraction process.
These are important considerations for extraction and ETL in general. This
chapter, however, focuses on the technical considerations of having different kinds of
sources and extraction methods. It assumes that the data warehouse team has already
identified the data that will be extracted, and discusses common techniques used for
extracting data from source databases.
Designing this process means making decisions about the following two main aspects:
Which extraction method to choose. This influences the source system, the transportation process, and the time needed for refreshing the warehouse.
How to provide the extracted data for further processing. This influences the transportation method, and the need for cleaning and transforming the data.
The extraction method you should choose is highly dependent on the source system and also on the business needs in the target data warehouse environment. Very
often, there is no possibility of adding additional logic to the source systems to support incremental extraction of data, because of the performance impact or the increased workload on these systems. Sometimes the customer is not even allowed to add anything to an out-of-the-box application system.
The estimated amount of the data to be extracted and the stage in the ETL process
(initial load or maintenance of data) may also impact the decision of how to extract, from
a logical and a physical perspective. Basically, you have to decide how to extract data
logically and physically.
The logical extraction method is either a full extraction or an incremental extraction.
Full Extraction
The data is extracted completely from the source system. Because this extraction
reflects all the data currently available on the source system, there's no need to keep track
of changes to the data source since the last successful extraction. The source data will be
provided as-is and no additional logical information (for example, timestamps) is
necessary on the source site. An example for a full extraction may be an export file of a
distinct table or a remote SQL statement scanning the complete source table.
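A minimal sketch of such a remote full extraction, assuming a hypothetical database link named source_db and a staging table created on the warehouse side:
-- Full extraction: copy the complete source table across the database link
CREATE TABLE stg_orders AS
  SELECT * FROM orders@source_db;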
Incremental Extraction
At a specific point in time, only the data that has changed since a well-defined
event back in history will be extracted. This event may be the last time of extraction or a
more complex business event like the last booking day of a fiscal period. To identify this delta change, it must be possible to identify all the information that has changed since this specific time event. This information can be provided either by the source data itself, such as an application column reflecting the last-changed timestamp, or by a change table in which an appropriate additional mechanism keeps track of the changes alongside the originating transactions. In most cases, using the latter method means adding extraction logic to the
source system.
Many data warehouses do not use any change-capture techniques as part of the
extraction process. Instead, entire tables from the source systems are extracted to the data
warehouse or staging area, and these tables are compared with a previous extract from the
source system to identify the changed data. This approach may not have significant
impact on the source systems, but it clearly can place a considerable burden on the data
warehouse processes, particularly if the data volumes are large.
Oracle's Change Data Capture mechanism can extract and maintain such delta
information. See Chapter 16, " Change Data Capture" for further details about the Change
Data Capture framework.
Depending on the chosen logical extraction method and the capabilities and
restrictions on the source side, the extracted data can be physically extracted by two
mechanisms. The data can either be extracted online from the source system or from an
offline structure. Such an offline structure might already exist or it might be generated by
an extraction routine.
Online Extraction
Offline Extraction
Online Extraction
The data is extracted directly from the source system itself. The extraction process can
connect directly to the source system to access the source tables themselves or to an
intermediate system that stores the data in a preconfigured manner (for example, snapshot
logs or change tables). Note that the intermediate system is not necessarily physically
different from the source system.
With online extractions, you need to consider whether the distributed transactions are
using original source objects or prepared source objects.
Offline Extraction
The data is not extracted directly from the source system but is staged explicitly outside
the original source system. The data already has an existing structure (for example, redo
logs, archive logs or transportable tablespaces) or was created by an extraction routine.
Flat files
Dump files
Transportable tablespaces
When it is possible to efficiently identify and extract only the most recently
changed data, the extraction process (as well as all downstream operations in the ETL
process) can be much more efficient, because it must extract a much smaller volume of
data. Unfortunately, for many source systems, identifying the recently modified data may
be difficult or intrusive to the operation of the system. Change Data Capture is typically
the most challenging technical issue in data extraction.
Because change data capture is often desirable as part of the extraction process and it
might not be possible to use the Change Data Capture mechanism, this section describes
several techniques for implementing a self-developed change capture on Oracle Database
source systems:
Timestamps
Partitioning
Triggers
These techniques are based upon the characteristics of the source systems, or may require
modifications to the source systems. Thus, each of these techniques must be carefully
evaluated by the owners of the source system prior to implementation.
Each of these techniques can work in conjunction with the data extraction technique
discussed previously. For example, timestamps can be used whether the data is being
unloaded to a file or accessed through a distributed query. See Chapter 16, " Change Data
Capture" for further details.
Timestamps
The tables in some operational systems have timestamp columns. The timestamp
specifies the time and date that a given row was last modified. If the tables in an
operational system have columns containing timestamps, then the latest data can easily be
identified using the timestamp columns. For example, the following query might be
useful for extracting today's data from an orders table:
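A minimal sketch of such a query, assuming the orders table carries a hypothetical last_modified timestamp column maintained by the operational application:
SELECT *
  FROM orders
 WHERE last_modified >= TRUNC(SYSDATE)        -- midnight at the start of today
   AND last_modified <  TRUNC(SYSDATE) + 1;   -- midnight at the start of tomorrow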
Partitioning
Some source systems might use range partitioning, such that the source tables are
partitioned along a date key, which allows for easy identification of new data. For
example, if you are extracting from an orders table, and the orders table is partitioned by
week, then it is easy to identify the current week's data.
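A sketch of such a partition-wise read, assuming a hypothetical weekly partition name:
-- Reads only the partition holding the current week's orders
SELECT * FROM orders PARTITION (orders_week_40);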
Triggers
Triggers can be created in operational systems to keep track of recently updated records. They can then be used in conjunction with timestamp columns to identify the exact time and date when a given row was last modified. You do this by creating a trigger on each source table that requires change data capture. Following each DML statement that is executed on the source table, this trigger updates the timestamp column with the current time. Thus, the timestamp column provides the exact time and date when a given row was last modified.
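A minimal sketch of such a trigger, assuming the hypothetical orders table and last_modified column used above; the trigger name is illustrative:
CREATE OR REPLACE TRIGGER orders_set_last_modified
  BEFORE INSERT OR UPDATE ON orders
  FOR EACH ROW
BEGIN
  -- Stamp the row with the time of the change
  :NEW.last_modified := SYSDATE;
END;
/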
Materialized view logs rely on triggers, but they provide an advantage in that the
creation and maintenance of this change-data system is largely managed by the database.
Most database systems provide mechanisms for exporting or unloading data from
the internal database format into flat files. Extracts from mainframe systems often use
COBOL programs, but many databases, as well as third-party software vendors, provide
export or unload utilities.
Data extraction does not necessarily mean that entire database structures are
unloaded in flat files. In many cases, it may be appropriate to unload entire database
tables or objects. In other cases, it may be more appropriate to unload only a subset of a
given table such as the changes on the source system since the last extraction or the
results of joining multiple tables together. Different extraction techniques vary in their
capabilities to support these two scenarios.
When the source system is an Oracle database, several alternatives are available for
extracting data into files:
The most basic technique for extracting data is to execute a SQL query in
SQL*Plus and direct the output of the query to a file. For example, to extract a flat file,
country_city.log, with the pipe sign as delimiter between column values, containing a list
of the cities in the US in the tables countries and customers, the following SQL script
could be run:
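A minimal sketch of such a script, assuming country_id, country_name and cust_city columns patterned on Oracle's sample schema:
SET ECHO OFF
SET PAGESIZE 0
SPOOL country_city.log
SELECT DISTINCT t1.country_name || '|' || t2.cust_city
  FROM countries t1, customers t2
 WHERE t1.country_id = t2.country_id
   AND t1.country_name = 'United States of America';
SPOOL OFF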
The exact format of the output file can be specified using SQL*Plus system variables. This extraction can be parallelized by initiating multiple concurrent SQL*Plus sessions, each running a separate query that extracts a different portion of the data. For example, if the orders table is range partitioned by month, each of twelve sessions could spool one month's partition to its own file:
SPOOL order_jan.dat
SELECT * FROM orders PARTITION (orders_jan1998);
SPOOL OFF
The resulting spool files can be used as-is for a parallel load with 12 SQL*Loader sessions. See Chapter 13, "Transportation in Data Warehouses" for an example.
Even if the orders table is not partitioned, it is still possible to parallelize the
extraction either based on logical or physical criteria. The logical method is based on
logical ranges of column values, for example:
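A sketch of one such range-restricted query, assuming a hypothetical order_date column; each concurrent session would cover a different, non-overlapping range:
SELECT *
  FROM orders
 WHERE order_date BETWEEN TO_DATE('01-JAN-1999', 'DD-MON-YYYY')
                      AND TO_DATE('31-JAN-1999', 'DD-MON-YYYY');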
Note that all parallel techniques can use considerably more CPU and I/O
resources on the source system, and the impact on the source system should be evaluated
before parallelizing any extraction technique.
OCI programs (or other programs using Oracle call interfaces, such as Pro*C
programs), can also be used to extract data. These techniques typically provide improved
performance over the SQL*Plus approach, although they also require additional
programming. Like the SQL*Plus approach, an OCI program can extract the results of
any SQL query. Furthermore, the parallelization techniques described for the SQL*Plus
approach can be readily applied to OCI programs as well.
When using OCI or SQL*Plus for extraction, you need additional information
besides the data itself. At minimum, you need information about the extracted columns. It
is also helpful to know the extraction format, which might be the separator between
distinct columns.
The Export utility allows tables (including data) to be exported into Oracle Database
export files. Unlike the SQL*Plus and OCI approaches, which describe the extraction of
the results of a SQL statement, Export provides a mechanism for extracting database
objects. Thus, Export differs from the previous approaches in several important ways:
The export files contain metadata as well as data. An export file contains not only
the raw data of a table, but also information on how to re-create the table,
potentially including any indexes, constraints, grants, and other attributes
associated with that table.
A single export file may contain a subset of a single object, many database
objects, or even an entire schema.
Export cannot be directly used to export the results of a complex SQL query.
Export can be used only to extract subsets of distinct database objects.
The output of the Export utility must be processed using the Import utility.
Oracle provides the original Export and Import utilities for backward compatibility
and the Data Pump export/import infrastructure for high-performance, scalable and parallel
extraction. See Oracle Database Utilities for further details.
In addition to the Export Utility, you can use external tables to extract the results
from any SELECT operation. The data is stored in the platform-independent, Oracle-internal data pump format and can be processed as a regular external table on the target
system. The following example extracts the result of a join operation in parallel into the
four specified files. The only allowed external table type for extracting data is the Oracle-
internal format ORACLE_DATAPUMP.
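A minimal sketch of such an unload, with hypothetical directory, file and column names (four files, matching a degree of parallelism of four):
-- Directory object pointing at the file system location for the dump files
CREATE DIRECTORY dump_dir AS '/stage/extract';

CREATE TABLE extract_cust
  ORGANIZATION EXTERNAL
  ( TYPE ORACLE_DATAPUMP
    DEFAULT DIRECTORY dump_dir
    LOCATION ('cust1.exp', 'cust2.exp', 'cust3.exp', 'cust4.exp') )
  PARALLEL 4
AS
SELECT c.cust_id, c.cust_last_name, co.country_name
  FROM customers c, countries co
 WHERE co.country_id = c.country_id;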
The total number of extraction files specified limits the maximum degree of
parallelism for the write operation. Note that the parallelizing of the extraction does not
automatically parallelize the SELECT portion of the statement.
Unlike using any kind of export/import, the metadata for the external table is not
part of the created files when using the external table data pump unload. To extract the
appropriate metadata for the external table, use the DBMS_METADATA package, as
illustrated in the following statement:
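A sketch of such a call, assuming the extract_cust external table created above:
SET LONG 20000
SELECT DBMS_METADATA.GET_DDL('TABLE', 'EXTRACT_CUST') FROM DUAL;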
Using distributed-query technology, one Oracle database can directly query tables
located in various different source systems, such as another Oracle database or a legacy
system connected with the Oracle gateway technology. Specifically, a data warehouse or
staging database can directly access tables and data located in a connected source system.
Gateways are another form of distributed-query technology. Gateways allow an Oracle
database (such as a data warehouse) to access database tables stored in remote, non-
Oracle databases. This is the simplest method for moving data between two Oracle
databases because it combines the extraction and transformation into a single step, and
requires minimal programming. However, this is not always feasible.
Suppose that you wanted to extract a list of US cities, together with their country name, from the countries and customers tables in a source database and store this data in the data warehouse. Using an Oracle Net connection and distributed-query technology, this can be achieved using a single SQL statement:
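A minimal sketch of such a statement, assuming a hypothetical database link named source_db and the same country and city columns used in the flat-file example above:
CREATE TABLE country_city AS
SELECT DISTINCT t1.country_name, t2.cust_city
  FROM countries@source_db t1, customers@source_db t2
 WHERE t1.country_id = t2.country_id
   AND t1.country_name = 'United States of America';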
This statement creates a local table in a data mart, country_city, and populates it
with data from the countries and customers tables on the source system.
This technique is ideal for moving small volumes of data. However, the data is
transported from the source system to the data warehouse through a single Oracle Net
connection. Thus, the scalability of this technique is limited. For larger data volumes,
file-based data extraction and transportation techniques are often more scalable and thus
more appropriate.
Answer. OLAP tools take you a step beyond query and reporting tools. Via OLAP tools,
data is represented using a multidimensional model rather than the more traditional
tabular data model. The traditional model defines a database schema that focuses on
modeling a process or function, and the information is viewed as a set of transactions, each of which occurred at some single point in time. The multidimensional model usually defines a star schema, viewing data not as a single event but rather as the cumulative effect of events over some period of time, such as weeks, then months, then years. With OLAP tools, the user generally views the data in grids or crosstabs that can be pivoted to
offer different perspectives on the data. OLAP also enables interactive querying of the
data. For example, a user can look at information at one aggregation (such as a sales
region) and then drill down to more detail information, such as sales by state, then city,
then store.
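A rough SQL illustration of this kind of drill-down, assuming a hypothetical sales_by_location table with region, state, city and amount columns:
-- ROLLUP produces subtotals at each level of the region-state-city hierarchy;
-- rows with NULL in city (or in both state and city) carry the higher-level subtotals.
SELECT region, state, city, SUM(amount) AS total_sales
  FROM sales_by_location
 GROUP BY ROLLUP (region, state, city);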
OLAP tools do not indicate how the data is actually stored. Given that, it’s not
surprising that there are multiple ways to store the data, including storing the data in a
dedicated multidimensional database (also referred to as MOLAP or MDD). Examples
include Arbors Software’s Essbase and Oracle Express Server. The other choice involves
storing the data in relational databases and having an OLAP tool work directly against the
data, referred to as relational OLAP (also referred to as ROLAP or RDBMS). Examples
include MicroStrategy’s DSS server and related products, Informix’s Informix-
MetaCube, Information Advantage’s Decision Suite, and Platinum Technologies’
Platinum InfoBeacon. (Some also include Red Brick's Warehouse in this category, but it isn't really an OLAP tool. Rather, it is a relational database optimized for performing the types of operations that ROLAP tools need.)
Relational Databases
ROLAP servers contain both numeric and textual data, serving a much wider purpose than their MOLAP counterparts. Unlike MOLAP DBMSs, which are supported by specialized database management systems, ROLAP DBMSs (or RDBMSs) are supported by relational technology. RDBMSs support numeric, textual, spatial, audio, graphic, and video data, general-purpose DSS analysis, freely structured data, numerous indexes, and star schemas. ROLAP servers can have both disciplined and ad hoc usage and can contain both detailed and summarized data.
ROLAP supports large databases while enabling good performance, platform portability, and exploitation of hardware advances such as parallel processing.
Multidimensional Databases
MDDs deliver impressive query performance by pre-calculating or pre-
consolidating transactional data rather than calculating on-the-fly. (MDDs pre-calculate
and store every measure at every hierarchy summary level at load time and store them in
efficiently indexed cells for immediate retrieval.) However, to fully preconsolidate
incoming data, MDDs require an enormous amount of overhead both in processing time
and in storage. An input file of 200MB can easily expand to 5GB; obviously, a file this
size takes many minutes to load and consolidate. As a result, MDDs do not scale, making
them a lackluster choice for the enterprise atomic-level data in the data warehouse.
However, MDDs are great candidates for the <50GB department data marts.
To manage large amounts of data, MDD servers aggregate data along hierarchies.
Not only do hierarchies provide a mechanism for aggregating data, they also provide a
technique for navigation. The ability to navigate data by zooming in and out of detail is
key. With MDDs, application design is essentially the definition of dimensions and
calculation rules, while the RDBMS requires that the database schema be a star or
snowflake. With MDDs, for example, it is common to see the structure of time separated
from the repetition of time. One dimension may be the structure of a year: month, quarter, half-year, and year. A separate dimension might be the different years: 1996, 1997, and so
on. Adding a new year to the MDD simply means adding a new member to the calendar
dimension. Adding a new year to a RDBMS usually requires that each month, quarter,
half-year and year also be added.
In General
Usually, a scalable, parallel database is used for the large, atomic, organizationally structured data warehouse, and subsets or summarized data from the warehouse are
extracted and replicated to proprietary MDDs. Because MDD vendors have enabled drill-
through features, when a user reaches the limit of what is actually stored in the MDD and
seeks more detail data, he/she can drill through to the detail stored in the enterprise
database. However, the drill through functionality usually requires creating views for
every possible query.
As relational database vendors incorporate sophisticated analytical
multidimensional features into their core database technology, the resulting capacity for
higher performance, scalability, and parallelism will enable more sophisticated analysis. Proprietary database and non-integrated relational OLAP query tool vendors will find it
difficult to compete with this integrated ROLAP solution.
Both storage methods have strengths and weaknesses -- the weaknesses, however,
are being rapidly addressed by the respective vendors. Currently, data warehouses are
predominantly built using RDBMSs. If you have a warehouse built on a relational
database and you want to perform OLAP analysis against it, ROLAP is a natural fit. This
isn’t to say that MDDs can’t be a part of your data warehouse solution. It’s just that
MDDs aren’t currently well-suited for large volumes of data (10-50GB is fine, but
anything over 50GB is stretching their capabilities). If you really want the functionality benefits that come with MDD, consider subsetting the data into smaller MDD-based data
marts.
1) Performance: How fast will the system appear to the end-user? MDD server vendors
believe this is a key point in their favor. MDD server databases typically contain indexes
that provide direct access to the data, making MDD servers quicker when trying to solve
a multidimensional business problem. However, MDDs have significant performance
differences due to the differing ability of data models to be held in memory, sparsity handling, and use of data compression. And, the relational database vendors argue that
they have developed performance improvement techniques, such as IBM’s DB2 Starburst
optimizer and Red Brick’s Warehouse VPT STARindex capabilities. (Before you use
performance as an objective measure for selecting an OLAP server, remember that OLAP
systems are about effectiveness (how to make better decisions), not efficiency (how to
make faster decisions).)
2) Data volume and scalability: While MDD servers can handle up to 50GB of storage,
RDBMS servers can handle hundreds of gigabytes and terabytes. And, although MDD
servers can require up to 50% less disk space than relational databases to store the same
amount of data (because of relational indexes and overhead), relational databases have
more capacity. MDD advocates believe that you should perform multidimensional
modeling on summary, not detail, information, thus mitigating the need for large
databases.
In addition to performance, data volume, and scalability, you should consider which
architecture better supports systems management and data distribution, which vendors
have a better user interface and functionality, which architecture is easier to understand,
which architecture better handles aggregation and complex calculations, and your
perception of open versus proprietary architectures. Besides these issues, you must also
consider which architecture will be a more strategic technology. In fact, MDD servers
and RDBMS products can be used together -- one for fast responses, the other for access to large databases.
What if?
IF
A. You require write access for What if? analysis
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. You don’t have a DBA or data modeler personnel
E. You’re developing a general-purpose application for inventory movement or assets
management
THEN
Consider an MDD solution for your data mart (like Oracle Express, Arbor’s Essbase, and
Pilot’s Lightship)
IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
THEN
Consider an RDBMS for your data mart.
IF
A. Your data is over 1TB
B. Need data mining at a detail level
If you've decided to build a data mart using an MDD, you don't need a data modeler. Rather, you need an MDD data mart application builder who will design the business model (identifying dimensions and defining business measures based on the source systems identified).
Prior to building separate stovepipe data marts, understand that at some point you will need to: 1) integrate and consolidate these data marts at the detail enterprise level; 2) load the MDD data marts; and 3) drill through from the data marts to the detail. Note that your data mart may outgrow the storage limitations of an MDD, creating the need for an RDBMS (in turn, requiring data modeling similar to constructing the detailed, atomic, enterprise-level RDBMS).
Q.5 What do you understand by the term statistical analysis? Discuss the most important statistical techniques?
Answer. Data mining is a relatively new data analysis technique. It is very different from
query and reporting and multidimensional analysis in that it uses what is called a discovery technique. That is, you do not ask a particular question of the data but rather
use specific algorithms that analyze the data and report what they have discovered.
Unlike query and reporting and multidimensional analysis where the user has to create
and execute queries based on hypotheses, data mining searches for answers to questions
that may have not been previously asked. This discovery could take the form of finding
significance in relationships between certain data elements, a clustering together of
specific data elements, or other patterns in the usage of specific sets of data elements.
After finding these patterns, the algorithms can infer rules. These rules can then be used
to generate a model that can predict a desired behavior, identify relationships among the
data, discover patterns, and group clusters of records with similar attributes.
Data mining is most typically used for statistical data analysis and knowledge discovery.
Statistical data analysis detects unusual patterns in data and applies statistical and
mathematical modeling techniques to explain the patterns. The models are then used to
forecast and predict. Types of statistical data analysis techniques include linear and
nonlinear analysis, regression analysis, multivariate analysis, and time series analysis.
Knowledge discovery extracts implicit, previously unknown information from the data.
This often results in uncovering unknown business facts.
Data mining is data driven. There is a high level of complexity
in stored data and data interrelations in the data warehouse that are difficult to discover
without data mining. Data mining offers new insights into the business that may not be
discovered with query and reporting or multidimensional analysis. Data mining can help
discover new insights about the business by giving us answers to questions we might
never have thought to ask.
Even within the scope of your data warehouse project, when mining data you want to
define a data scope, or possibly multiple data scopes. Because patterns are based on
various forms of statistical analysis, you must define a scope in which a statistically
significant pattern is likely to emerge. For example, buying patterns that show different
products being purchased together may differ greatly in different geographical locations.
To simply lump all of the data together may hide all of the patterns that exist in each
location. Of course, by imposing such a scope you are defining some, though not all, of
the business rules. It is therefore important that data scoping be done in concert with
someone knowledgeable in both the business and in statistical analysis so that artificial
patterns are not imposed and real patterns are not lost.
Data architecture modeling and advanced modeling techniques, such as those suitable for multimedia databases and statistical databases, are beyond the scope of this discussion.
Q.6 What are the methods for determining the executive needs?
Answer. An EIS is a tool that provides direct on-line access to relevant information about
aspects of a business that are of particular interest to the senior manager.
Contents of EIS
An EIS typically includes information about competitors from the news media and databases of public information, in addition to the traditional revenue, cost, volume, sales, market share and quality applications.
Frequently, EIS implementations begin with just a few measures that are clearly
of interest to senior managers, and then expand in response to questions asked by those
managers as they use the system. Over time, the presentation of this information becomes
stale, and the information diverges from what is strategically important for the
organization. A "Critical Success Factors" approach is recommended by many
management theorists (Daniel, 1961, Crockett, 1992, Watson and Frolick, 1992).
Practitioners such as Vandenbosch (1993) found that:
While our efforts usually met with initial success, we often found that after six
months to a year, executives were almost as bored with the new information as they had
been with the old. A strategy we developed to rectify this problem required organizations
to create a report of the month. That is, in addition to the regular information provided for
management committee meetings, the CEO was charged with selecting a different
indicator to focus on each month (Vandenbosch, 1993, pp. 8-9).
While the above indicates that selection of data for inclusion in an EIS is difficult,
there are several guidelines that help to make that assessment. A practical set of
principles to guide the design of measures and indicators to be included in an EIS is
presented below (Kelly, 1992b). For a more detailed discussion of methods for selecting
measures that reflect organizational objectives, see the section "EIS and Organizational
Objectives."
EIS measures must be easy to understand and collect. Wherever possible, data
should be collected naturally as part of the process of work. An EIS should not add
substantially to the workload of managers or staff.
EIS measures must encourage management and staff to share ownership of the
organization's objectives. Performance indicators must promote both team-work and
friendly competition. Measures must be meaningful for all staff; people must feel that
they, as individuals, can contribute to improving the performance of the organization.
EIS measures must evolve to meet the changing needs of the organization.
Barriers to Effectiveness
There are many ways in which an EIS can fail. Dozens of high profile, high cost
EIS projects have been cancelled, implemented and rarely used, or implemented and used
with negative results. An EIS is a high risk project precisely because it is intended for use
by the most powerful people in an organization. Senior managers can easily misuse the
information in the system with strongly detrimental effects on the organization. Senior
managers can refuse to use a system if it does not respond to their immediate personal
needs or is too difficult to learn and use.
Issues of organizational behaviour and culture are perhaps the most deadly
barriers to effective Executive Information Systems. Because an EIS is typically
positioned at the top of an organization, it can create powerful learning experiences and
lead to drastic changes in organizational direction. However, there is also great potential
for misuse of the information. Green, Higgins and Irving (1988) found that performance
monitoring can promote bureaucratic and unproductive behaviour, can unduly focus
organizational attention to the point where other important aspects are ignored, and can
have a strongly negative impact on morale.
Technical Excellence
An interesting result from the Vandenbosch & Huff (1988) study was that the
technical excellence of an EIS has an inverse relationship with effectiveness. Systems
that are technical masterpieces tend to be inflexible, and thus discourage innovation,
experimentation and mental model development.
An organization that has invested heavily in a technically excellent EIS may not want to spend more on system maintenance and improvements. The time required to create a masterpiece EIS may also mean that it is outdated before it is implemented.
While usability and response time are important factors in determining whether
executives will use a system, cost and flexibility are paramount. A senior manager will be
more accepting of an inexpensive system that provides 20% of the needed information within a month or two than of an expensive system that provides 80% of the needed information after a year of development. The manager may also find that the inexpensive
system is easier to change and adapt to the evolving needs of the business. Changing a
large system would involve throwing away parts of a substantial investment. Changing
the inexpensive system means losing a few weeks of work. As a result, fast, cheap,
incremental approaches to developing an EIS increase the chance of success.
Methodology
EIS planning begins with the organization's objectives and with identifying the data needed to support the measures that monitor those objectives. Objectives must be specific and measurable, and data availability is critical to measuring progress against objectives.
Since there is little use in defining measures for which data is not available, it is
recommended that an EIS project team including technical staff be established at the
outset. This cross-functional team can provide early warning if data is not available to
support objectives or if senior manager's expectations for the system are impractical.
A preliminary EIS project team might consist of as few as three people. An EIS
Project Leader organizes and directs the project. An Executive Sponsor promotes the
project in the organization, contributes senior management requirements on behalf of the
senior management team, and reviews project progress regularly. A Technical Leader
participates in requirements gathering, reviewing plans, and ensuring technical feasibility
of all proposals during EIS definition.
As the focus of the project becomes more technical, the EIS project team may be
complemented by additional technical staff who will be directly involved in extracting
data from legacy systems and constructing the EIS data repository and user interface.
Measures and EIS requirements are best established through a three-stage process.
First, the EIS team solicits the input of the most senior executives in the organization in
order to establish a broad, top-down perspective on EIS requirements. Second, interviews
are conducted with the managers who will be most directly involved in the collection,
analysis, and monitoring of data in the system to assess bottom-up requirements. Third, a
summary of results and recommendations is presented to senior executives and
operational managers in a workshop where final decisions are made.
Interview Format
The focus of the interviews would be to establish all of the measures managers
require in the EIS. Questions would include the following:
What are the five most important pieces of information you need to do your job?
What results do you think the general public expects you to accomplish?