
DATABASE 4

Give me free pdf

Uploaded by

akashchauhan8321
Copyright
© © All Rights Reserved

Data Warehousing and Data Mining

Last Updated : 19 Sep, 2024



Data warehousing is the process of compiling information into a data
warehouse. The main purpose of data warehousing is to consolidate and store
large datasets from various sources for efficient retrieval and analysis,
supporting reporting and decision-making.

What is Data Warehousing?
A data warehouse is a large-capacity store where data is collected for
analysis and mining. Data from an organization's various systems is loaded
into the warehouse, where it can be fetched as needed.

Source 🡪 Extract 🡪 Transform 🡪 Load 🡪 Target

(Data warehouse process)
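
The Source, Extract, Transform, Load, Target flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the target warehouse.

```python
import sqlite3

# Hypothetical raw rows standing in for an export from an operational system.
SOURCE_ROWS = [
    {"order_id": "1", "customer": " alice ", "amount": "19.99"},
    {"order_id": "2", "customer": "BOB", "amount": "5.00"},
    {"order_id": "3", "customer": "carol", "amount": "not-a-number"},
]


def extract():
    """Extract: read the raw rows from the source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: clean names, convert types, drop rows that fail validation."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        name = row["customer"].strip().title()
        cleaned.append((int(row["order_id"]), name, amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three stages in order and return the target connection."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite as the target
    load(conn, transform(extract()))
    return conn
```

Calling `run_etl()` returns a connection whose `orders` table holds only the two rows that passed cleansing; the row with an unparseable amount is dropped in the transform step.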

Data warehouses consolidate data from several sources and ensure data
accuracy, quality, and consistency. System performance improves because
analytics processing is kept apart from the traditional (transactional)
databases. In a data warehouse, data is organized into a structured format
by type and as needed, and query tools examine it in several ways.

A data warehouse improves system performance by separating analytics
processing from transactional databases. Data flows into a data warehouse from
the various databases. A data warehouse works by organizing data into a schema
that describes the layout and type of data. Query tools analyze the data tables
using the schema.
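
As a rough sketch of what "organizing data into a schema" can look like, the example below builds a tiny star-style layout (one fact table, one dimension table) and runs an analytical query against it. The table names, columns, and sample rows are assumptions made for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_warehouse():
    """Create a hypothetical schema with one dimension and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT
        );
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES dim_product(product_id),
            sale_date  TEXT,
            amount     REAL
        );
        INSERT INTO dim_product VALUES
            (1, 'Pen', 'Stationery'),
            (2, 'Lamp', 'Home');
        INSERT INTO fact_sales VALUES
            (1, 1, '2024-01-05', 2.0),
            (2, 1, '2024-02-10', 3.0),
            (3, 2, '2024-02-11', 20.0);
        """
    )
    return conn


def sales_by_category(conn):
    """Analytical query: total sales per product category."""
    return conn.execute(
        """
        SELECT p.category, SUM(s.amount)
        FROM fact_sales AS s
        JOIN dim_product AS p ON p.product_id = s.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
```

The schema tells a query tool which columns exist and how the tables join, which is what lets `sales_by_category` aggregate historical rows without touching the operational systems.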

Data warehouses store historical data and answer analytical queries faster,
supporting online analytical processing (OLAP), whereas an operational
database stores the current transactions of a business process, which is
called online transaction processing (OLTP).

FEATURES OF DATA WAREHOUSES:


• Subject Oriented:
It provides you with important data about a specific subject like suppliers,
products, promotion, customers, etc. Data warehousing usually handles the
analysis and modeling of data that assist any organization to make data-
driven decisions.

• Integrated:
A data warehouse is built by combining data from different heterogeneous
sources, such as flat files or relational databases.

• Time-Variant:
The data collected in a data warehouse is identified with a specific period.

• Nonvolatile:
This means the earlier data is not deleted when new data is added to the data
warehouse. The operational database and data warehouse are kept separate
and thus continuous changes in the operational database are not shown in
the data warehouse.

APPLICATIONS OF DATA WAREHOUSES:


Data warehouses help analysts or senior executives analyze, organize, and
use data for decision making.

It is used in the following fields:

• Consumer goods
• Banking services
• Financial services
• Manufacturing
• Retail sectors
Advantages of Data Warehousing
• The data warehouse’s job is to make any form of corporate data easier
to understand. The majority of the user’s job will consist of inputting
raw data.
• The capacity to update continuously and frequently is the key benefit
of this technology. As a result, data warehouses are perfect for
organizations and entrepreneurs who want to stay current with their
target audience and customers.
• It makes data more accessible to businesses and organizations.
• A data warehouse holds a large volume of historical data that users
can use to evaluate different periods and trends in order to create
predictions for the future.
Disadvantages of Data Warehousing
• There is a great risk of accumulating irrelevant and useless data. Data
loss and erasure are other potential issues.
• Data is gathered from various sources in a data warehouse. Cleansing
and transformation of the data are required. This could be a difficult
task.
What is Data Mining?
In data mining, data is extracted and analyzed to obtain useful information.
Hidden patterns in the dataset are studied in order to predict future behavior.
Data mining is used to discover and describe relationships in the data.

Data mining uses statistics, artificial intelligence, machine learning systems,
and some databases to find hidden patterns in the data. It supports
business-related queries that are time-consuming to resolve.

It is the process of finding patterns and correlations within large data sets to
identify relationships between data. Data mining tools allow a business
organization to predict customer behavior, build risk models, and detect fraud.
Data mining is used in market analysis and management, fraud detection,
corporate analysis, and risk management.
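
One simple way to see "finding patterns and correlations" in practice is to measure how often items occur together. The sketch below computes the support and confidence of an association rule over a handful of made-up shopping baskets. The data and function names are hypothetical, and real data mining tools use far more scalable algorithms.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule is undefined
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical market-basket data.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
```

Here `support(BASKETS, {"bread", "milk"})` is the share of baskets containing both items, and `confidence(BASKETS, {"bread"}, {"milk"})` estimates how often milk is bought when bread is bought, which is the kind of relationship a retailer might act on.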

FEATURES OF DATA MINING:


• It is good with large databases and datasets
• It predicts future results
• It creates actionable insights
• It utilizes the automated discovery of patterns

Figure: Data Mining process


Advantages of Data Mining
• Data mining aids in a variety of data analysis and sorting procedures.
The identification and detection of any undesired fault in a system is
one of the best implementations here. This method permits any
dangers to be eliminated sooner.
• In comparison to other statistical data applications, data mining
methods are both cost-effective and efficient.
• Companies can take advantage of this analytical tool by providing
appropriate and easily accessible knowledge-based data.
• Detecting and identifying undesirable faults in a system is one of the
most valuable uses of data mining techniques.
Disadvantages of Data Mining
• Data mining isn’t always 100 percent accurate, and if done incorrectly,
it can lead to data breaches.
• Organizations must devote a significant amount of resources to
training and implementation. Furthermore, the algorithms used in the
creation of data mining tools cause them to work in different ways.

Difference Between Data Mining and Data Warehousing


| Basis of Comparison | Data Warehousing | Data Mining |
|---|---|---|
| Definition | A data warehouse is a database system that is designed for analytical analysis instead of transactional work. | Data mining is the process of analyzing data patterns. |
| Process | Data is stored periodically. | Data is analyzed regularly. |
| Purpose | Data warehousing is the process of extracting and storing data to allow easier reporting. | Data mining is the use of pattern recognition logic to identify patterns. |
| Managing Authorities | Data warehousing is solely carried out by engineers. | Data mining is carried out by business users with the help of engineers. |
| Data Handling | Data warehousing is the process of pooling all relevant data together. | Data mining is considered as a process of extracting data from large data sets. |
| Functionality | Subject-oriented, integrated, time-varying and non-volatile constitute data warehouses. | AI, statistics, databases, and machine learning systems are all used in data mining technologies. |
| Task | Data warehousing is the process of extracting and storing data in order to make reporting more efficient. | Pattern recognition logic is used in data mining to find patterns. |
| Uses | It extracts data and stores it in an orderly format, making reporting easier and faster. | This procedure employs pattern recognition tools to aid in the identification of access patterns. |
| Examples | When a data warehouse is connected with operational business systems like CRM (Customer Relationship Management) systems, it adds value. | Data mining aids in the creation of suggestive patterns of key parameters. Customer purchasing behavior, items, and sales are examples. As a result, businesses will be able to make the required adjustments to their operations and production. |

Emerging database technologies

Emerging database technologies are rapidly evolving to meet the demands of
modern applications, large-scale data processing, and real-time analytics.
Here are some key trends and technologies to watch:

### 1. **Cloud-Native Databases**

- **Serverless Databases**: Fully managed databases that automatically scale
  based on usage, such as AWS Aurora Serverless and Google BigQuery.

- **Distributed SQL Databases**: Databases like CockroachDB, YugabyteDB, and
  Google Spanner provide high availability and strong consistency across
  global deployments.

### 2. **Graph Databases**

- Tools like **Neo4j**, **TigerGraph**, and **Amazon Neptune** are gaining
  traction for applications requiring complex relationships, such as social
  networks, fraud detection, and recommendation engines.

### 3. **Multi-Model Databases**

- Systems like **ArangoDB** and **MarkLogic** support multiple data models
  (e.g., document, graph, key-value) in a single engine, providing flexibility
  for diverse use cases.

### 4. **Time-Series Databases**

- Specialized databases like **InfluxDB**, **TimescaleDB**, and
  **Prometheus** are optimized for handling time-series data, critical for
  IoT, DevOps, and real-time analytics.

### 5. **NewSQL**

- Combines the scalability of NoSQL with the ACID guarantees of traditional
  SQL databases. Examples include **Google Spanner**, **MemSQL
  (SingleStore)**, and **CockroachDB**.

### 6. **AI-Enhanced Databases**

- Databases like **Oracle Autonomous Database** use machine learning to
  automate tasks like tuning, security patching, and scaling, improving
  efficiency and reliability.

### 7. **Edge Databases**

- Designed for edge computing environments, these databases (e.g.,
  **Dgraph**, **FaunaDB**) enable low-latency processing and offline
  capabilities close to the data source.

### 8. **Blockchain Databases**

- Distributed ledger technologies such as **BigchainDB** integrate blockchain
  principles with database functionalities for transparency and immutability.

### 9. **Real-Time Analytics Databases**

- Tools like **Apache Pinot**, **Druid**, and **ClickHouse** are optimized for
  real-time, high-throughput analytics, catering to modern BI applications.

### 10. **Data Lakehouses**

- Combining the best of data lakes and warehouses, platforms like
  **Databricks Delta Lake** and **Snowflake** allow seamless processing of
  structured and unstructured data for analytics.

### 11. **Privacy-Preserving Databases**

- Focused on enhancing security and compliance, these databases use
  technologies like homomorphic encryption and differential privacy to
  safeguard sensitive data.
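
To give a feel for one of the techniques named in the last item, the sketch below applies the Laplace mechanism, a standard building block of differential privacy, to a counting query. It is a minimal illustration and not how any particular database product implements privacy. The function names and the `epsilon` parameter are assumptions chosen for the example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy `predicate`.

    A counting query changes by at most 1 when one record is added or
    removed (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller `epsilon` adds more noise and therefore more privacy, while a larger one keeps the answer closer to the true count. For instance, `private_count(ages, lambda a: a >= 40, epsilon=0.5)` would report roughly how many people in a hypothetical `ages` list are 40 or older without returning the exact figure.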

Internet database

An **internet database** is a collection of data that is stored on servers connected to the
internet and can be accessed, updated, or managed remotely through websites, apps, or
software. It acts as a central hub of information that multiple users or systems can interact
with simultaneously, in real time.

### **How Does an Internet Database Work?**

1. **Storage**: Data is organized in a structured way (like tables, documents, or files) on
servers that are always online.

2. **Access**: Users or applications connect to the database through the internet using
websites, apps, or APIs (Application Programming Interfaces).

3. **Management**: A Database Management System (DBMS) ensures the data is stored
securely, retrieved efficiently, and updated accurately.

### **Examples of Internet Databases**

- **Google Search**: The data about billions of websites is stored in an internet database.

- **Online Shopping**: Platforms like Amazon use databases to manage product catalogs,
user accounts, and orders.

- **Social Media**: Facebook and Instagram store data like posts, messages, and user
profiles in internet databases.

- **Streaming Services**: Netflix and Spotify use databases to manage their vast libraries
of movies, TV shows, and music.
### **Key Features**

1. **Remote Access**: Data can be accessed from anywhere in the world using the
internet.

2. **Real-Time Updates**: Changes to the database are instantly reflected for all users.

3. **Multi-User Access**: Many users can access and use the database at the same time.

4. **Scalability**: Databases can grow to store more data as needed.

### **Common Technologies**

Internet databases are powered by technologies like:

- **SQL Databases**: MySQL, PostgreSQL, Microsoft SQL Server.

- **NoSQL Databases**: MongoDB, Firebase, DynamoDB.

### **Why is it Important?**

An internet database is essential for running modern online services and applications. It
allows:

- Quick access to data.

- Sharing of information between users and systems.

- The ability to process and store large amounts of data efficiently.

In simple terms, an internet database is the backbone of most online services, ensuring
data is always available and up-to-date for users around the globe.

Digital library databases

A **digital library database** is a system in which a digital library's
resources (such as books, articles, images, or videos) are stored, organized,
and managed using a database. The database acts as the backbone of the digital
library, ensuring efficient storage, searching, and retrieval of information.

Here’s a simple and detailed explanation:

### **What is a Digital Library Database?**

A **digital library database** is a structured collection of data that holds
all the content of a digital library. It organizes information like titles,
authors, publication dates, and file formats, making it easy to search and
access.

For example:

- If you search for a book in a digital library, the database finds the file
  and shows it to you.

- If you browse by category (e.g., "science fiction"), the database filters
  and lists all related items.

### **How Does a Digital Library Database Work?**

1. **Storing Data**:

   - The database stores digital files (e.g., PDFs, images, videos) along
     with metadata (details about the files like author, subject, and
     keywords).

2. **Organizing Data**:

   - The database organizes resources into categories or collections for
     easy navigation (e.g., by topic, author, or publication date).

3. **Searching**:

   - Users search for resources by entering keywords. The database quickly
     matches the keywords with its stored metadata to find relevant results.

4. **Retrieving Data**:

   - Once a user selects a resource, the database retrieves the digital file
     for viewing or downloading.
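
The four steps above (storing, organizing, searching, retrieving) can be sketched with a single metadata table. Everything in this example is an assumption for illustration: the `resources` table, its columns, the sample titles, and the file paths are hypothetical, and SQLite is used only because it is bundled with Python.

```python
import sqlite3


def build_library():
    """Storing: create a metadata table and add a few hypothetical records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE resources (
               id INTEGER PRIMARY KEY,
               title TEXT, author TEXT, category TEXT,
               keywords TEXT, file_path TEXT)"""
    )
    conn.executemany(
        "INSERT INTO resources (title, author, category, keywords, file_path) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("Star Voyage", "A. Writer", "science fiction", "space travel", "files/star.pdf"),
            ("Garden Notes", "B. Author", "gardening", "plants soil", "files/garden.pdf"),
            ("Robot Dawn", "C. Novelist", "science fiction", "robots future", "files/robot.pdf"),
        ],
    )
    return conn


def browse(conn, category):
    """Organizing: list titles in one category."""
    rows = conn.execute(
        "SELECT title FROM resources WHERE category = ? ORDER BY title",
        (category,),
    )
    return [row[0] for row in rows]


def search(conn, keyword):
    """Searching: match a keyword against the stored metadata."""
    pattern = f"%{keyword}%"
    return conn.execute(
        "SELECT id, title FROM resources "
        "WHERE title LIKE ? OR author LIKE ? OR keywords LIKE ? "
        "ORDER BY title",
        (pattern, pattern, pattern),
    ).fetchall()


def locate(conn, resource_id):
    """Retrieving: look up where the selected resource's file is stored."""
    row = conn.execute(
        "SELECT file_path FROM resources WHERE id = ?", (resource_id,)
    ).fetchone()
    return row[0] if row else None
```

Note that the database holds only metadata and a file location here; the file itself would live on a server or in cloud storage and be fetched using the path that `locate` returns.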

### **Key Components of a Digital Library Database**

1. **Metadata**:

   - Descriptive information about resources, such as title, author,
     publication year, file type, and subject.

2. **Indexing**:

   - The database indexes resources to make searching faster and more
     efficient.

3. **Search Engine**:

   - A tool that helps users search and filter the library's database.

4. **Storage System**:

   - The actual digital files are stored on servers or cloud storage, while
     the database keeps track of where each file is located.
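
To illustrate why indexing makes searching faster, here is a small inverted index built over metadata: each word maps directly to the resources that mention it, so a search does not have to scan every record. The record layout and sample entries are hypothetical, and production search engines add ranking, stemming, and much more.

```python
from collections import defaultdict

# Hypothetical metadata records, keyed by resource id.
RECORDS = {
    1: {"title": "Database Systems", "author": "A. Writer", "subject": "computing"},
    2: {"title": "Spatial Database Design", "author": "B. Author", "subject": "geography"},
}


def build_index(records):
    """Map each lowercase word in the metadata to the ids that contain it."""
    index = defaultdict(set)
    for resource_id, meta in records.items():
        text = " ".join([meta["title"], meta["author"], meta["subject"]])
        for word in text.lower().split():
            index[word].add(resource_id)
    return index


def search(index, query):
    """Return the ids whose metadata contains every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

With the index built once via `build_index(RECORDS)`, a query such as `search(index, "spatial database")` only intersects two small sets instead of reading every metadata record.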

**Examples of Digital Library Databases**

1. **Online Academic Libraries**: Store research papers, journals, and theses
   (e.g., JSTOR, IEEE Xplore).

2. **E-book Databases**: Collections of eBooks like Kindle Library or Project
   Gutenberg.

3. **Institutional and Subject Repositories**: Databases for universities,
   organizations, or research communities to store research outputs (e.g.,
   arXiv, a preprint repository).

**Advantages of Using Databases in Digital Libraries**

1. **Efficient Search**: Users can quickly find resources using filters or
   keywords.

2. **Scalability**: Databases can grow to accommodate millions of resources.

3. **Multi-User Access**: Many users can access the same resource
   simultaneously without issues.

4. **Data Security**: Databases ensure that digital files are stored securely
   and are backed up.

5. **Easy Updates**: Administrators can easily add, update, or remove
   resources in the database.

**Why Are Databases Important in Digital Libraries?**

- **Organization**: They keep the digital library well-structured and easy to
  navigate.

- **Accessibility**: Databases ensure that users can access the information
  they need anytime.

- **Speed**: They allow instant retrieval of resources, even for large
  collections.

- **Automation**: Databases automate tasks like cataloging, searching, and
  managing resources.

Multimedia database
A **multimedia database** is a type of
database that stores and manages
multimedia content such as images, videos,
audio, animations, and text. It helps
organize, retrieve, and share these files
easily and efficiently.

Here’s a detailed but **simple** explanation:

**What is a Multimedia Database?**

A multimedia database is a system designed to:

- Store large files like pictures, songs, or videos.

- Organize these files using information (metadata) like names, dates, or
  categories.

- Allow users to search and access files quickly.

It's like a digital library, but instead of just books, it stores various
types of media.

**What Types of Multimedia Does It Store?**

1. **Images**: Pictures, graphics, or illustrations (e.g., JPG, PNG).

2. **Audio**: Music, sound effects, or voice recordings (e.g., MP3, WAV).

3. **Videos**: Movies, video clips, or animations (e.g., MP4, AVI).

4. **Text**: Subtitles, captions, or descriptions (e.g., TXT, DOC).

5. **Mixed Media**: A combination of images, videos, and audio in
   presentations or interactive files.

**How Does It Work?**

1. **Storage**:

   - Files are saved in a secure location like a server or cloud.

   - Metadata (information about the files, like "file name" or "date
     created") is saved in the database for easy searching.

2. **Organization**:

   - Files are grouped or tagged by categories such as type, subject, or
     keywords.

   - Example: A music database might sort songs by artist or genre.

3. **Search and Retrieval**:

   - Users can find files by searching for keywords or using filters like
     file type or date.

   - Advanced multimedia databases can even identify content, like finding
     images with specific objects.

4. **Access**:

   - Users view, download, or stream files directly from the database.
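
The storage, organization, search, and access steps above can be sketched as a small in-memory catalog. This is a toy model under stated assumptions: the `MediaItem` and `MediaCatalog` names, the fields, and the sample locations are all hypothetical, and a real multimedia database would persist the metadata and keep the files on a server or in cloud storage.

```python
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """Metadata for one file; the file itself lives at `location`."""
    name: str
    media_type: str  # e.g. "image", "audio", "video"
    location: str
    tags: set = field(default_factory=set)


class MediaCatalog:
    """Keeps metadata records and answers simple filter queries."""

    def __init__(self):
        self._items = []

    def add(self, item):
        """Storage: register the metadata for a stored file."""
        self._items.append(item)

    def find(self, media_type=None, tag=None):
        """Search: filter by media type and/or tag; None means no filter."""
        results = []
        for item in self._items:
            if media_type is not None and item.media_type != media_type:
                continue
            if tag is not None and tag not in item.tags:
                continue
            results.append(item)
        return results

    def locate(self, name):
        """Access: return where the named file is stored, or None."""
        for item in self._items:
            if item.name == name:
                return item.location
        return None
```

For example, after adding a few items, `catalog.find(media_type="audio", tag="jazz")` returns only the audio records tagged "jazz", and `catalog.locate(...)` gives the location from which the file could be streamed or downloaded.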

**Examples of Multimedia Databases**

1. **Google Photos**: Stores and organizes images and videos.

2. **YouTube**: Manages billions of videos with metadata for easy searching.

3. **Spotify**: A database for audio files, playlists, and artists.

4. **Netflix**: Stores and streams movies and TV shows.

**Advantages**

1. **Easy Organization**: Keeps multimedia files in a neat and searchable
   format.

2. **Quick Access**: Users can find what they need in seconds.

3. **Multi-User Support**: Many people can access the same content at the same
   time.

4. **Scalability**: Can grow to store millions of files.

**Challenges**

1. **Large Files**: Multimedia files take up a lot of storage space.

2. **Complex Searches**: Searching for specific content in videos or images is
   harder than searching text.

3. **Expensive Storage**: High-quality multimedia requires advanced storage
   solutions.

Mobile database

A **mobile database** is a type of database system designed to work with
mobile computing devices like smartphones, tablets, and laptops. It is
optimized to function efficiently on these devices, which often have limited
processing power, memory, and storage compared to desktop or server systems.
Here's a detailed overview:

### 1. **Purpose of a Mobile Database**

- To store, retrieve, and manage data locally on mobile devices.
- To synchronize data with a central server or cloud when connectivity is
  available.
- To enable offline functionality for mobile applications.

### 2. **Key Features**

- **Lightweight:** Mobile databases are designed to use minimal resources,
  making them suitable for devices with constrained hardware.
- **Synchronization:** They support synchronization with a central database to
  ensure data consistency across devices.
- **Offline Access:** They allow users to access and update data without
  requiring an active internet connection.
- **Cross-platform Support:** Many mobile databases are designed to work
  across various operating systems like Android, iOS, and Windows.

### 3. **Types of Mobile Databases**

- **Embedded Databases:** Installed directly on the mobile device, used by
  apps for local data storage (e.g., SQLite).
- **Cloud Databases:** Accessible via the internet, data is stored on remote
  servers (e.g., Firebase, AWS DynamoDB).
- **Hybrid Databases:** Combine local storage and cloud synchronization (e.g.,
  Couchbase Mobile).

### 4. **Examples of Mobile Databases**

1. **SQLite**
   - Open-source, lightweight, and serverless.
   - Most commonly used mobile database.
   - Integrated into Android and iOS.

2. **Realm**
   - Designed specifically for mobile devices.
   - High performance and offline-first approach.
   - Supports complex queries and real-time updates.

3. **Firebase Realtime Database**
   - Cloud-hosted, NoSQL database by Google.
   - Syncs data across all connected clients in real-time.

4. **Couchbase Lite**
   - Embedded NoSQL database with synchronization capabilities.
   - Ideal for apps needing strong offline functionality.

### 5. **Components of a Mobile Database System**

- **Database Engine:** Manages storage, queries, and data manipulation.
- **API/SDK:** Allows developers to interact with the database easily.
- **Synchronization Module:** Ensures data consistency between local storage
  and remote/cloud databases.

### 6. **Advantages**

- **Portability:** Enables apps to function across various devices and
  environments.
- **Efficiency:** Optimized for small devices with limited resources.
- **Reliability:** Supports offline capabilities and data recovery
  mechanisms.

### 7. **Challenges**

- **Resource Limitations:** Mobile devices have limited storage and processing
  capabilities.
- **Security:** Ensuring data is encrypted and secure during storage and
  transmission.
- **Synchronization Issues:** Managing conflicts when multiple devices update
  the same data.
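
The synchronization challenge can be made concrete with a small sketch of one common conflict policy, "last write wins": when the same key was changed both on the device and on the server, the copy with the newer timestamp is kept. This is an assumption chosen for illustration, not a description of how SQLite, Realm, Firebase, or Couchbase resolve conflicts; the data layout and the `sync` function are hypothetical.

```python
def sync(local, remote):
    """Merge two stores shaped like {key: (value, timestamp)}.

    For a key present on both sides, the entry with the newer timestamp
    wins; on a tie the remote copy is kept.
    """
    merged = dict(remote)
    for key, (value, timestamp) in local.items():
        if key not in merged or timestamp > merged[key][1]:
            merged[key] = (value, timestamp)
    return merged


# Hypothetical state after the device has been offline for a while.
LOCAL = {"note": ("draft v2", 20), "todo": ("buy milk", 5)}
REMOTE = {"note": ("draft v1", 10), "title": ("Trip plan", 7)}
```

Calling `sync(LOCAL, REMOTE)` keeps the newer local edit of `note` and carries over the keys that exist on only one side. Last write wins is simple but silently discards the older edit, so apps that cannot afford that usually need a merge strategy or a prompt to the user.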

Spatial database
A **spatial database** is a database designed to store,
query, and manage spatial data, which is information related
to the position, shape, and size of objects in space. Spatial
data is used to represent geographic objects, such as roads,
rivers, buildings, or even entire regions.

### Key Concepts of Spatial Databases

1. **Spatial Data Types**: These represent geographic objects in the
database. Common spatial data types include:
- **Point**: Represents a single location (e.g., a city).
- **Line**: Represents a series of connected points (e.g., a road or river).
- **Polygon**: Represents an area defined by a closed shape (e.g., a country
  or lake).

2. **Spatial Indexing**: Efficient spatial querying is achieved using
specialized indexes, such as **R-trees** or **Quadtrees**, which allow the
database to quickly find objects based on their spatial properties.

3. **Coordinate Systems**: Spatial databases use coordinate systems (e.g.,
latitude and longitude) to represent the position of geographic objects.

4. **Spatial Queries**: These are queries used to interact with spatial data.
Examples include:
- **Containment**: Does one object contain another (e.g., does a city contain
  a park)?
- **Intersection**: Do two objects intersect (e.g., does a river cross a
  road)?
- **Distance**: How far apart are two objects?
- **Within a range**: Find objects within a specific area or distance.

5. **Spatial Operations**: These involve operations like:
- **Buffering**: Creating a region around an object (e.g., a 10 km buffer
  around a lake).
- **Union/Intersection**: Combining or finding common areas between spatial
  objects.
- **Geocoding**: Converting addresses into geographic coordinates.
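
Three of the spatial queries listed above (distance, within a range, and containment) can be sketched in plain Python. This is a simplified illustration under stated assumptions: points are `(latitude, longitude)` tuples for the distance functions, the polygon test treats coordinates as a flat plane, and the function names are hypothetical. A real spatial database would use indexed geometry types rather than scanning lists.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Distance: great-circle distance between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_range(center, points, radius_km):
    """Within a range: keep the points no farther than radius_km."""
    return [p for p in points if haversine_km(center, p) <= radius_km]


def point_in_polygon(point, polygon):
    """Containment: ray-casting test on planar (x, y) coordinates.

    Counts how many polygon edges a horizontal ray from the point
    crosses; an odd number of crossings means the point is inside.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

`within_range` checks every point in turn, which is exactly the linear scan that spatial indexes such as R-trees are designed to avoid on large datasets.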

### Example of Spatial Data Usage


- **Mapping**: Creating digital maps with points for
landmarks, lines for roads, and polygons for regions.
- **Geospatial Analytics**: Analyzing patterns like traffic
flow, environmental changes, or urban planning.
