Trends in Data Warehousing and
Business Intelligence
Trends in Data Warehousing
Data Warehousing is Becoming
Mainstream
In the early stages, four significant factors drove many
companies to move into data warehousing:
Fierce competition
Government deregulation
Need to revamp internal processes
Imperative for customized marketing
Walmart vs. Amazon.com
Walmart is the US company most often cited for its successful
application and deployment of data warehousing technology.
Walmart filed a lawsuit against Amazon.com, alleging that Amazon.com
unlawfully obtained its DW technology by hiring away its DA
personnel with offers of hefty stock options.
Significant Factors
These significant factors reflect the new trends in data
warehousing:
Multiple Data Types
Data Visualization
Parallel Processing
Query Tools
Browser Tools
Data Fusion
Multidimensional Analysis
Agent Technology
E-Business: ERP, KM, CRM
Decision Making
and Data Warehousing
“A data warehouse is the data, processes, tools, and facilities to
manage and deliver complete, timely, accurate, and
understandable business information to authorized individuals
for effective decision making.”
Structured Data
– Includes traditional relational databases
– Typically internal and enterprise-owned
– Predetermined
Unstructured Data
– Includes articles, reports, images, and videos
– Utilizes external data and expert opinion
– Ad hoc
Decision Making
and Data Warehousing
Management Systems
Extend relational databases to store and support multimedia
User-defined types (UDT) and functions (UDF) in SQL-3
Specialized Servers
Used for data that is incompatible with relational databases (e.g.,
streaming video servers)
Objects may be linked to a relational database
Search Engines
Query by Image Content (shape, color, texture, etc)
Text retrieval on free-text documents
Audio and video searching
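As a rough illustration of the user-defined function (UDF) idea mentioned above, the sketch below registers a Python function with SQLite so that SQL statements can call it. SQLite is used here only as a convenient stand-in for an SQL-3 DBMS; the table media_objects, its columns, and the function size_mb are hypothetical names chosen for the example.

```python
import sqlite3

def size_mb(num_bytes):
    """Convert a byte count to megabytes, rounded to two decimals."""
    return round(num_bytes / (1024 * 1024), 2)

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it as size_mb(x).
conn.create_function("size_mb", 1, size_mb)

# Hypothetical catalog of multimedia objects kept alongside relational data.
conn.execute(
    "CREATE TABLE media_objects (name TEXT, kind TEXT, num_bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO media_objects VALUES (?, ?, ?)",
    [("intro.mp4", "video", 52428800), ("logo.png", "image", 1048576)],
)

# The UDF is evaluated by the database engine for every selected row.
rows = conn.execute(
    "SELECT name, size_mb(num_bytes) FROM media_objects ORDER BY name"
).fetchall()
print(rows)
```

A full SQL-3 system would also let the schema declare user-defined types; that part is not shown here.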
Decision Making and Data
Warehousing
~ The trend is toward unstructured data and ad hoc warehouses. ~
~ Trend toward multimedia. ~
Types of Decision Support Tools
Data Inquiry
A request for a set of data based on some search criteria
Data Interpretation
Manipulation and visualization of a set of data (statistical analysis)
Multidimensional Analysis (OLAP)
n-dimensional spreadsheet analysis
Information Discovery
Pattern recognition, trends
Browsers
Search metadata catalogs
Search information object lists
Launch analysis tools
File-based Processing
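To make the multidimensional analysis item more concrete, here is a minimal sketch of two common OLAP-style operations, roll-up and slice, on a tiny in-memory fact table. The dimension names, the sample rows, and the helper names roll_up and slice_rows are assumptions made for illustration, not part of any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
DIMENSIONS = ("region", "product", "quarter")
facts = [
    ("South", "Widgets", "Q1", 120),
    ("South", "Widgets", "Q2", 90),
    ("South", "Gadgets", "Q1", 60),
    ("North", "Widgets", "Q1", 200),
    ("North", "Gadgets", "Q2", 80),
]

def roll_up(rows, keep):
    """Sum the sales measure over every dimension not listed in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_rows(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]

by_region = roll_up(facts, ["region"])
# Drill down into the South region, broken out by quarter.
south_by_quarter = roll_up(slice_rows(facts, "region", "South"), ["quarter"])
print(by_region)
print(south_by_quarter)
```

Dedicated OLAP servers precompute or index such aggregates; the loop here only shows what the operations mean.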
Types of Decision Support Tools
~ Trend toward utilization of the Web, facilitated by Java. ~
Data Warehouse Architectures
Single Level
Decision support tools access operational data directly
Feasible only with “clean” data
Valid for unstructured data
Two Level Reconciled
Scrubbed operational data supporting ad hoc queries
Two Level Derived
Summarized data
Three Level
Maintains both scrubbed operational data, and summarized data.
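The three-level idea can be sketched as two transformation steps over operational rows: a reconcile step that scrubs the data, and a derive step that summarizes it. The record layout and the function names reconcile and derive are hypothetical, and the scrubbing rules are deliberately simplistic.

```python
# Hypothetical operational records with inconsistent formatting.
operational = [
    {"customer": " acme ", "amount": "100.50"},
    {"customer": "ACME", "amount": "20"},
    {"customer": "Globex", "amount": None},  # incomplete row
    {"customer": "globex", "amount": "5.25"},
]

def reconcile(rows):
    """Scrub: drop incomplete rows, normalize names, convert amounts."""
    clean = []
    for row in rows:
        if row["amount"] is None:
            continue
        clean.append(
            {
                "customer": row["customer"].strip().upper(),
                "amount": float(row["amount"]),
            }
        )
    return clean

def derive(rows):
    """Summarize: total amount per customer."""
    summary = {}
    for row in rows:
        name = row["customer"]
        summary[name] = summary.get(name, 0.0) + row["amount"]
    return summary

reconciled = reconcile(operational)  # level 2: supports ad hoc queries
derived = derive(reconciled)         # level 3: summarized data
print(derived)
```

A two-level reconciled warehouse would stop after reconcile, while a two-level derived warehouse would keep only the output of derive.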
Data Warehouse Architectures
~ Trend toward multidimensional data. ~
Data Stores and Access Enablers
Specialized Multidimensional Databases
Data is pre-aggregated and loaded into multidimensional databases
Long loading times but quick response
Relational-like Stores
Indexing is used to provide pseudo-multidimensional functionality
Relational Data Stores
An extra semantic layer generates multidimensional data on the fly
Hybrids
Details are stored in a traditional relational format
A subset is cached in a multidimensional data structure
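One possible reading of the hybrid approach is sketched below: detail rows stay in a relational table, and aggregates that have been requested are kept in a small in-memory structure keyed by dimension values. SQLite, the sales table, and the names cube_cache and total_sales are assumptions for the example; real hybrid products decide what to cache in more sophisticated ways.

```python
import sqlite3

# Details live in a traditional relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("South", "Widgets", 120.0),
        ("South", "Widgets", 90.0),
        ("North", "Gadgets", 80.0),
    ],
)

# A subset of aggregates is cached, keyed by (region, product).
cube_cache = {}

def total_sales(region, product):
    """Return the cached total if present, else compute it from the details."""
    key = (region, product)
    if key not in cube_cache:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales "
            "WHERE region = ? AND product = ?",
            key,
        ).fetchone()
        cube_cache[key] = row[0]
    return cube_cache[key]

print(total_sales("South", "Widgets"))  # computed from the relational store
print(total_sales("South", "Widgets"))  # answered from the cache
```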
Database Management System
(DBMS)
Data Stores and Access Enablers
~ Trend toward multidimensional data. ~
Metadata
Integrated Components
All components (sources, stores, etc) use a common metadata
repository to maintain their metadata
Standardized Metadata Interchange
Components keep their own metadata
Components use a common interchange information model and
syntax to share metadata
Synchronized Metadata Interchange
Metadata changes are updated automatically across all components
Building of Business Metadata
Manually entered, free-text, plain language descriptions
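The standardized interchange option can be illustrated with two components that keep metadata in their own shapes but exchange it through one agreed model and syntax. Everything here is hypothetical: the two native layouts, the common model (a table name plus a list of typed columns), and JSON as the interchange syntax.

```python
import json

# Each component keeps its own metadata in its own native shape.
etl_tool_meta = {"tbl": "sales", "cols": [("amount", "REAL"), ("region", "TEXT")]}

# Hypothetical mapping from the ETL tool's types to the common model's types.
TYPE_MAP = {"REAL": "number", "TEXT": "string"}

def export_from_etl(meta):
    """Translate the ETL tool's metadata into the common interchange model."""
    model = {
        "table": meta["tbl"],
        "columns": [{"name": n, "type": TYPE_MAP[t]} for n, t in meta["cols"]],
    }
    return json.dumps(model, sort_keys=True)

def import_into_query_tool(document):
    """Translate the common model into the query tool's native shape."""
    model = json.loads(document)
    return {
        "dataset": model["table"],
        "fields": {c["name"]: c["type"] for c in model["columns"]},
    }

shared_document = export_from_etl(etl_tool_meta)
query_tool_meta = import_into_query_tool(shared_document)
print(query_tool_meta)
```

Neither component needs to know the other's internal format; each only needs a translator to and from the shared model.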
Metadata
~ Trend toward better metadata, exchanged between systems. ~
Middleware - Gluing the Warehouse
Together
Definition: software that shields users and developers
from differences in services and resources used by
applications
Data warehouses often have heterogeneous databases,
operating systems, networks, hardware, applications
Business Issues for Middleware
Role of middleware
Assist developer in data extraction/transformation and
populating DW
Assist business user in accessing DW
Therefore needed at different points in life cycle
Types
Copy management: data extraction, transformation,
replication, and propagation
Gateways: DB and independent gateways
Program-to-program: RPCs, TP monitors, ORBs
Message-oriented
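As a small sketch of the message-oriented type, the example below lets a source application publish change records onto a queue and lets a warehouse loader consume them later, so neither side calls the other directly. An in-process queue.Queue stands in for a real message broker, and the record fields and function names are hypothetical.

```python
import queue

message_queue = queue.Queue()  # stand-in for a message broker
warehouse = []                 # stand-in for a warehouse table

def publish_change(record):
    """Source side: hand the record to the middleware and return at once."""
    message_queue.put(record)

def load_pending():
    """Warehouse side: drain queued records, transform them, and load them."""
    loaded = 0
    while True:
        try:
            record = message_queue.get_nowait()
        except queue.Empty:
            break
        warehouse.append(
            {"order_id": record["id"], "amount": float(record["amt"])}
        )
        loaded += 1
    return loaded

publish_change({"id": 1, "amt": "19.99"})
publish_change({"id": 2, "amt": "5"})
loaded_count = load_pending()
print(loaded_count, warehouse)
```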
Data Quality
Preprocessing Ownership
Source application owners know their data
Warehouse owners still must integrate the entire system
Automated Preprocessing Tools
Specialized packages
Generalized tools using pattern processing, lexical analysis, and
statistical matching to reconcile a wide range of data sources
Custom programming
Reliability and Credibility of External Data
Quality ratings
Posted statistical meta-information (sample size, randomness, etc)
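The generalized-tool approach (lexical analysis plus statistical matching) can be approximated in a few lines: normalize each name lexically, then score similarity and accept the best candidate above a threshold. The suffix list, the 0.8 threshold, and the function names are assumptions for illustration; commercial tools use far richer rules.

```python
import difflib
import re

# Hypothetical list of company suffixes to ignore when comparing names.
NOISE_TOKENS = {"inc", "corp", "co", "ltd"}

def normalize(name):
    """Lexical cleanup: lowercase, drop punctuation and noise tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in NOISE_TOKENS)

def best_match(name, candidates, threshold=0.8):
    """Return the most similar candidate, or None if nothing is close enough."""
    target = normalize(name)
    scored = [
        (difflib.SequenceMatcher(None, target, normalize(c)).ratio(), c)
        for c in candidates
    ]
    score, match = max(scored)
    return match if score >= threshold else None

known_customers = ["Acme, Inc.", "Globex Ltd"]
print(best_match("ACME Corp.", known_customers))
print(best_match("Initech", known_customers))
```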
Data Quality
~ Trend toward better understanding of data quality. ~
Significant Trends- Multiple Data Types
The data warehouse repository holds multiple data types:
Structured Numeric
Structured Text
Unstructured Documents
Image
Spatial
Video
Audio
Significant Trends- Data
Visualization
More Chart Types: pie chart, scatter plot
Interactive Visualization
Chart Manipulation
Drill Down
Significant Trends- Parallel
Processing
Aims to solve decision-support problems using
multiple nodes working on the same problem.
Performs many database operations simultaneously,
splitting individual tasks into smaller parts so that
tasks can be spread across multiple processors.
Parallel DBMSs must be capable of running parallel
queries, parallel data loading, table scanning, data
archiving, and backup.
Significant Trends- Parallel
Processing
Shared memory architecture (SMP)
– All the processors share the same memory and disks, so each can reach all the data
Shared nothing architecture (MPP)
– Each server has its own memory and disks and holds its own partition of the data
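The shared-nothing idea of splitting one query across partitions can be sketched as follows: the data is divided into disjoint partitions, each worker aggregates only its own partition, and a coordinator merges the partial results. Threads stand in for separate nodes here, so this shows only the partition-and-merge logic, not real hardware isolation; the sample data and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

rows = list(range(1, 101))  # hypothetical measure values to be summed

def partition(data, n_nodes):
    """Deal the rows round-robin so each node owns a disjoint partition."""
    return [data[i::n_nodes] for i in range(n_nodes)]

def local_sum(part):
    """Work done by one node, using only its own partition."""
    return sum(part)

def parallel_total(data, n_nodes=4):
    """Coordinator: fan the query out, then merge the partial sums."""
    parts = partition(data, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, parts))
    return sum(partials)

total = parallel_total(rows)
print(total)
```

The same pattern applies to parallel loading and scanning: the work is split by partition and the coordinator combines the pieces.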
Significant Trends- Query Tools,
Browse Tools
Flexible Presentation: online results and report generator
Aggregate Awareness
Crossing Subject Areas
Multiple Heterogeneous Sources
Integration
Overcoming SQL Limitations
Data Fusion
Significant Trends- Integrating ERP
and Data Warehouse
Option 1: Companies implement the data warehouse
solutions of the ERP vendor with the currently available
functionality and await the enhancements.
Option 2: Companies implement customized data
warehouse and use third-party tools to extract data from
the ERP datasets. Retrieving and loading data from the
proprietary ERP datasets is not easy.
Option 3: Companies take a hybrid approach that combines the
functionality provided by the vendor’s data warehouse
with additional functionality from third-party tools.
Significant Trends- Integrating KM
and Data Warehouse
What’s KM?
It is a systematic process for capturing, integrating,
organizing, and communicating knowledge
accumulated by employees.
It is a vehicle for sharing corporate knowledge so that
employees can be more effective and more
productive in their work.
A knowledge management system must store all such
knowledge in a knowledge repository.
Significant Trends- Integrating KM
and Data Warehouse
A specific corporate scenario:
Sales have dropped in the South region.
Your marketing VP is able to discern this from your data warehouse by
running some queries and doing some preliminary analysis. The numbers
become far more useful if he or she also has access to a document
prepared by an analyst explaining why sales are low and suggesting
remedial action.
Knowledge must be linked to the sales result to provide context to the
sales numbers from the data warehouse.
Significant Trends- Integrating KM
and Data Warehouse
An airplane sales scenario: The following information is essential
for a successful airplane sales pitch.
Model configuration
Production schedule (Delivery schedule)
Part replacement
Warranty
Knowledge obtained from the knowledge management system
can provide context to the information received from the data
warehouse to understand the story behind the above
information.
Trends in Business Intelligence
Business Intelligence (BI)
Comprehensive, cohesive, integrated set of tools and
processes
Captures, collects, integrates, stores, and analyzes data
Purpose - Generate and present information to
support business decision making
Allows a business to transform:
Data into information
Information into knowledge
Knowledge into wisdom
Business Intelligence Framework
Business Intelligence Tools
Dashboards and business activity monitoring
Dashboards: Shows key business performance
indicators in a single integrated view
Portals: Integrate data using web browser from
multiple sources into a single webpage
Data analysis and reporting tools
Data-mining tools
Data warehouses (DW)
OLAP tools and data visualization
Practices to Manage Data
Master data management (MDM): Collection of
concepts, techniques, and processes for identification,
definition, and management of data elements
Governance: Method or process of government; here, a means of
controlling and monitoring business health and supporting
consistent decision making
Key performance indicators (KPI): Numeric or
scale-based measurements that assess company’s
effectiveness in reaching its goals
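A KPI of the kind described above can be numeric (a ratio of actual to target) or scale-based (a rating derived from that ratio). The sketch below shows both; the thresholds of 1.0 and 0.9 and the green/yellow/red scale are assumptions picked for the example, not a standard.

```python
def kpi_attainment(actual, target):
    """Numeric KPI: fraction of the target that was achieved."""
    return actual / target

def kpi_rating(attainment):
    """Scale-based KPI: map attainment onto a hypothetical three-step scale."""
    if attainment >= 1.0:
        return "green"
    if attainment >= 0.9:
        return "yellow"
    return "red"

# Hypothetical quarterly sales figures against their targets.
quarterly = {"Q1": (120, 100), "Q2": (95, 100), "Q3": (50, 100)}
ratings = {
    quarter: kpi_rating(kpi_attainment(actual, target))
    for quarter, (actual, target) in quarterly.items()
}
print(ratings)
```

A dashboard would typically show several such indicators side by side in a single view.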
Practices to Manage Data
Data visualization: Abstracting data to provide
information in a visual format
Enhances the user’s ability to efficiently comprehend
the meaning of the data
Techniques
Pie charts and bar charts
Line graphs
Scatter plots
Gantt charts
Heat maps
Reporting Styles of a Modern BI
System
Advanced reporting
Monitoring and alerting
Advanced data analytics
Business Intelligence Benefits
Improved decision making
Integrating architecture
Common user interface for data reporting and analysis
Common data repository fosters single version of company data
Improved organizational performance
Business Intelligence Evolution
Evolution of BI Information
Dissemination Formats
Business Intelligence Technology
Trends
Data storage improvements
Business intelligence appliances
Business intelligence as a service
Big Data analytics
Personal analytics