
Bi L3

The document provides an overview of various types of databases, including operational, commercial, open-source, hierarchical, network, relational, object, centralized, distributed, and cloud databases, highlighting their characteristics, advantages, and disadvantages. It discusses the importance of understanding database types for efficient business operations and introduces concepts like ACID and BASE properties, as well as the CAP theorem related to distributed databases. Additionally, it emphasizes the challenges posed by Big Data and the necessity for new data management models like NoSQL.

Uploaded by

Houcem Damak
Copyright
© All Rights Reserved

Business Intelligence and Database Administration
• A database is a collection of information stored within a
computer.
• Databases are utilized for everything from storing pictures on
your computer to buying items online and analyzing the stock
market.

• Databases allow computers to store essential information in an organized, structured and easily searchable way.
• There are many different types of databases, each with its strengths and weaknesses based on how they are designed and for what purpose they will be used.
• It's especially important for businesses and developers to understand the different types of databases, to ensure that they have the most efficient and adequate setup.

Operational database
• The purpose of an operational database is to allow users to modify data in
real time.
• Operational databases are critical in business analytics and data
warehousing.
• They can be set up either as relational databases or NoSQL, depending on
needs.

• Operational databases allow users to add, edit and remove data at any moment.
→ They create and update the database in real time.
→ They are designed for executing and handling the daily data operations of businesses.
• For example, an organization uses operational databases for managing its day-to-day transactions.

Commercial database

• A commercial database is designed by a commercial business.
• Businesses develop feature-rich databases, which they then sell to their customers.
• Commercial databases can vary in terms of composition or the technology they use.
• The defining trait of commercial databases is that users pay to use them, unlike open-source databases.

Open-source database

• An open-source database is designed for the public to use for free.
• Unlike commercial databases, users can download or sign up for open-source databases without paying a fee.
• The term "open source" refers to a program whose source code users can inspect to see how it was written and constructed, and which they are free to modify.
• Open-source databases are typically much cheaper than commercial databases, but they can also lack some of the more advanced features found in commercial databases.

Hierarchical database
• The hierarchical model is historically the first form of data modelling on a computer system.
• It is the type of database that stores data in the form of parent-child relationship nodes.
• It organizes data in a tree-like structure:
• A parent can have multiple children.
• A child can only have a single parent.

Limitations:
• Only supports one-to-many (1:n) relationships.
• It makes data modelling difficult.
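
A minimal sketch of the parent-child rule, assuming an in-memory tree (the Node class and the university/faculty names are hypothetical, not taken from any real product). Each node keeps exactly one parent reference, so one-to-many links are natural, while a record belonging to two parents cannot be expressed without duplicating it.

```python
class Node:
    """One record in a hierarchical (tree) database: exactly one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # a single parent reference (None for the root)
        self.children = []        # a parent may have many children (1:n)
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Walk up the tree from this node to the root."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))


root = Node("University")
science = Node("Science", root)
arts = Node("Arts", root)
course = Node("Statistics", science)

print(course.path())                     # University/Science/Statistics
print([c.name for c in root.children])   # ['Science', 'Arts']
```

If Statistics were also taught by the Arts faculty, the tree would need a second copy of that node, which is exactly the 1:n limitation of this model.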

Network database
• It is the type of database that follows the network data model: data is represented as nodes connected via links between them.
• Unlike the hierarchical database, it allows each record to have multiple children and multiple parent nodes, forming a generalized graph structure.
• It represents one-to-one as well as many-to-many relationships.

Limitations:
• Very complex system.
• Lack of structural independence.
• Structural changes to the database are very difficult.
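
A minimal sketch of the network model, assuming records are plain strings and links are kept in two dictionaries of sets (the NetworkDB class is illustrative only, not the API of any network DBMS). Because links are stored in both directions, a record such as Statistics can have two parents at once, which a strict tree cannot represent.

```python
from collections import defaultdict


class NetworkDB:
    """Toy network-model store: records are nodes, links connect them freely."""

    def __init__(self):
        self.parents = defaultdict(set)    # child record -> set of parent records
        self.children = defaultdict(set)   # parent record -> set of child records

    def link(self, parent, child):
        """Add an owner/member link; a child may have several parents."""
        self.children[parent].add(child)
        self.parents[child].add(parent)


db = NetworkDB()
db.link("Science", "Statistics")
db.link("Arts", "Statistics")      # second parent: impossible in a strict tree
db.link("Science", "Physics")

print(sorted(db.parents["Statistics"]))  # ['Arts', 'Science']
print(sorted(db.children["Science"]))    # ['Physics', 'Statistics']
```

The price of this flexibility is the complexity listed among the limitations: every link must be maintained in both directions, and changing the structure means rewiring many links.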

Relational database
• A relational database uses SQL for storing, manipulating and maintaining data.
• It is based on the relational data model, which stores data in the form of rows (tuples) and columns (attributes) that together form a table (relation).
• Each table in the database carries a key that makes each row unique.
• Relational databases are often preferred when you are concerned about the integrity of your data, or when you're not particularly focused on scalability.

Examples: MySQL, Microsoft SQL Server, Oracle, etc.

• The relational model has shaped our view of what a database is and should be for decades.

Characteristics:
• logical and physical separation,
• declarative language,
• strong data structuring,
• tabular representation,
• defined constraints,
• strong transactional consistency, etc.
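
A minimal sketch of a relation with a key, a declarative query and a defined constraint, using SQLite through Python's standard sqlite3 module. The student table and its rows are hypothetical; any RDBMS listed above would accept very similar SQL.

```python
import sqlite3

# In-memory SQLite database: a small relational DBMS bundled with Python.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,   -- key that makes each row unique
        name  TEXT NOT NULL,
        major TEXT
    )
""")
conn.executemany(
    "INSERT INTO student (id, name, major) VALUES (?, ?, ?)",
    [(1, "Amira", "BI"), (2, "Sami", "Networks"), (3, "Lina", "BI")],
)

# Declarative query: we state WHAT we want; the DBMS decides HOW to fetch it.
rows = conn.execute(
    "SELECT name FROM student WHERE major = ? ORDER BY name", ("BI",)
).fetchall()
print(rows)  # [('Amira',), ('Lina',)]

# Defined constraint: a second row with an existing key is rejected.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate', 'BI')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```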

There are four commonly known properties of the relational model, known as the ACID properties:

• “A” means Atomic: a data operation either completes entirely with success or fails entirely. It follows the 'all or nothing' strategy.
For example, a transaction will either be committed or aborted.

• “C” means Consistent: every operation takes the data from one valid state to another, so the rules (constraints) that hold before the operation still hold after it.
For example, in a transfer between two accounts, the total balance before and after the transaction must be the same, i.e., it is conserved.

• “I” means Isolated: several users can access the database at the same time, so concurrent transactions must be isolated from one another.
For example, when multiple transactions occur at the same time, the effects of one transaction should not be visible to the other transactions until it completes.

• “D” means Durable: once the operation is completed and the data validated, the changes remain permanent.
• Financial institutions almost exclusively use ACID-compliant relational databases, because money transfers depend on the atomic nature of ACID.
• An interrupted transaction that is not immediately removed from the database can cause a lot of issues → money could be debited from one account and, due to an error, never credited to another.
• One safe way to make sure your database is ACID compliant is to choose a relational database management system.
• Examples of RDBMS: MySQL, PostgreSQL, Oracle, SQLite and Microsoft SQL Server.
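
The money-transfer scenario can be sketched with SQLite (Python's sqlite3 module), assuming a hypothetical account table whose CHECK constraint forbids negative balances. The credit is applied first on purpose: when the debit then violates the constraint, the whole transaction is rolled back, so the credit never survives on its own (atomicity) and the total balance is conserved (consistency).

```python
import sqlite3

# In-memory database; the CHECK constraint says a balance may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, then debit, so a failed debit must undo the credit.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back as well


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False: alice would go negative
print(dict(conn.execute("SELECT name, balance FROM account ORDER BY name")))
# {'alice': 70, 'bob': 80}: the total of 150 is conserved
```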

Object database
• An object database is managed by an object-oriented database management system (OODBMS).
• The database combines object-oriented programming concepts with relational database principles:
• Objects are the basic building blocks; each object is an instance of a class whose type is either built-in or user-defined.
• Classes provide a schema or plan for objects, defining their structure and behaviour.
• Methods determine the behaviour of a class.
• Pointers help access elements of an object database and establish relations between objects.
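
Python's standard shelve module is not an OODBMS, but it persists whole objects, including the references between them, which makes it a convenient way to sketch the ideas above: classes as the schema, methods as behaviour and pointers between objects. The Department and Employee classes are hypothetical, and the store lives in a temporary directory.

```python
import os
import shelve
import tempfile


class Department:
    def __init__(self, name):
        self.name = name
        self.employees = []          # references ("pointers") to Employee objects

    def hire(self, employee):        # a method defines the class's behaviour
        employee.department = self   # back-pointer to the owning object
        self.employees.append(employee)


class Employee:
    def __init__(self, name):
        self.name = name
        self.department = None

    def describe(self):
        return f"{self.name} works in {self.department.name}"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "objects")

    sales = Department("Sales")
    sales.hire(Employee("Nour"))
    sales.hire(Employee("Youssef"))

    with shelve.open(path) as db:    # persist the whole object graph under one key
        db["sales"] = sales

    with shelve.open(path) as db:    # reopen the store and navigate the objects
        loaded = db["sales"]
        print(loaded.employees[0].describe())            # Nour works in Sales
        print(loaded.employees[1].department is loaded)  # True: pointer preserved
```

A real OODBMS adds what this sketch lacks: queries over objects, concurrent transactions and recovery.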
• GemStone/S is an object database system based on Smalltalk, an object-oriented programming language that influenced Java.
• ObjectDatabase++ is a real-time embeddable object database designed for server-side applications. It requires minimal external maintenance.
• ObjectDatabase++ supports:
• Multi-process, multi-threaded server applications.
• Full transaction control.
• Real-time recovery.
• C++, VB.NET and C#.
Advantages
• Supports complex data and a wider variety of data types than MySQL.
• Easy to save and retrieve data quickly.
• Seamless integration with object-oriented programming languages.
• Easier to model complex real-world problems.
• Extensible with custom data types.

Disadvantages
• Not as widely adopted as relational databases.
• No universal data model.
• Lacks theoretical foundations and standards.
• Does not support views.
• High complexity causes performance issues.
• Adequate security mechanisms and access rights to objects are lacking.

Centralized database
• It is the type of database that stores data in a single, centralized database system.
• It allows users to access the stored data from different locations through several applications.
• These applications include an authentication process so that users can access data securely.
• For example, a college or university may run a central library database that holds the data of each of its libraries.

Advantages:
• It reduces data management risks, i.e., manipulating the data does not affect the core data.
• Data consistency is maintained because data is managed in a central repository.
• It provides better data quality, which enables organizations to establish data standards.
• It is less costly because fewer vendors are required to handle the data sets.

Disadvantages:
• The centralized database is large, which increases the response time for fetching data.
• It is not easy to update such an extensive database system.
• If the server fails, the entire data set can be lost, which could be a huge loss.

Distributed database
• A distributed database is one that is spread out over multiple devices → data is distributed among the different database systems of an organization.
• These database systems are connected via communication links → end users can access the data easily.
• Rather than having all information stored on a single device, distributed databases operate across multiple machines, such as different computers within the same location or across a network.
• Examples: Apache Cassandra, HBase, Ignite, etc.


• We can further divide distributed database systems into:
• Homogeneous DDB: database systems that run on the same operating system, use the same application processes and have the same hardware devices.
• Heterogeneous DDB: database systems that run on different operating systems, under different application procedures, with different hardware devices.

Advantages:
• Increased speed.
• Better reliability: one server failure will not affect the entire data set.
• Ease of expansion: modular development is possible in a distributed database, i.e., the system can be expanded by adding new computers and connecting them to the distributed system.
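
The reliability advantage can be sketched with a toy store that spreads keys over several hypothetical servers and keeps each key on two of them. The DistributedStore class, the node names and the replication factor are all illustrative assumptions; zlib.crc32 is used because it gives the same hash in every process.

```python
import zlib


class DistributedStore:
    """Toy key-value store spread over several nodes; each key lives on `replicas` nodes."""

    def __init__(self, node_names, replicas=2):
        self.order = list(node_names)
        self.nodes = {name: {} for name in self.order}   # one dict per "server"
        self.replicas = replicas
        self.down = set()                                # simulated failed servers

    def owners(self, key):
        # A deterministic hash picks a start node; the following nodes hold the replicas.
        start = zlib.crc32(key.encode()) % len(self.order)
        return [self.order[(start + i) % len(self.order)] for i in range(self.replicas)]

    def put(self, key, value):
        for name in self.owners(key):
            self.nodes[name][key] = value

    def get(self, key):
        for name in self.owners(key):
            if name not in self.down:        # skip a failed server, try the replica
                return self.nodes[name][key]
        raise KeyError(key)


store = DistributedStore(["node-a", "node-b", "node-c"])
for i in range(10):
    store.put(f"order:{i}", {"id": i})

store.down.add("node-b")                     # one server fails...
print(all(store.get(f"order:{i}")["id"] == i for i in range(10)))  # True: data still reachable
```

With this simple modulo placement, adding a node would move many keys; real systems typically use schemes such as consistent hashing so that expansion only moves a fraction of them.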

Cloud database
• A cloud database is one that runs over the Internet.
• The data is stored on the hard drives of servers, typically in a provider's data center, and the information is available online.
• It makes it easy to access your files from anywhere, as long as you have an Internet connection.
• To use a cloud database, users can either build one themselves or pay for a service to store their data.
• Encryption is an essential part of any cloud database, as all information needs to be protected while it is transmitted online → security.
• Cloud providers offer users various cloud computing services (SaaS, PaaS, IaaS, etc.) for accessing the database.
• There are numerous cloud platforms, such as:
• Amazon Web Services (AWS)
• Microsoft Azure
• Kamatera
• phoenixNAP
• ScienceSoft
• Google Cloud SQL, etc.

• ACID properties are difficult to guarantee in a distributed context such as NoSQL.
• For example, consider a transaction of five read/write operations on data spread across five servers: ensuring atomicity, consistency and isolation requires synchronizing all five servers → transactions remain pending, blocked by ongoing and concurrent transactions.
• The problem gets worse when the data is distributed, because each piece of data has to be replicated → if a server crashes, it must still be possible to find all the data that was on that server, hence the need for replication.
• All updates then have to be synchronized with every replica!
• The CAP theorem states that it is impossible to achieve both consistency and availability in a partition-tolerant distributed system (i.e., a system that continues to work in cases of temporary communication breakdowns).
• The fundamental difference between the ACID and BASE database models is the way they deal with this limitation.
• The ACID model provides a consistent system.
• The BASE model provides high availability (distributed systems).

• BASE properties were proposed to characterize NoSQL databases:
→ Basically Available: whatever the load on the database (data/queries), the system guarantees a certain level of data availability.
→ Soft-state: the state of the database may change when servers are updated, added or removed.
→ Eventually consistent: the NoSQL database is not consistent at all times, but it will eventually reach a consistent state.

• Just as SQL databases are almost uniformly ACID compliant, NoSQL databases tend to conform to BASE principles.
• Most popular NoSQL solutions:
• MongoDB
• Cassandra
• Redis
• Amazon DynamoDB
• Couchbase

• In 2000, Eric A. Brewer presented the CAP theorem as a conjecture from theoretical computer science about distributed data stores (it was later proved by Seth Gilbert and Nancy Lynch in 2002). It claims that, in the event of a network failure on a distributed database, it is possible to provide either consistency or availability, but not both.
• CAP is based on 3 fundamental properties that characterize databases (relational, NoSQL and others):
• Consistency: a piece of data has only one visible state, regardless of the number of replicas.
• Availability: as long as the system is running (distributed or not), the data must be available.
• Partition Tolerance (Distribution): regardless of the number of servers, and even when communication between them breaks down, any query must provide a correct result.
• Any database can guarantee at most 2 of the 3 properties (consistency, availability and distribution):
→ CA (Consistency-Availability)
→ CP (Consistency-Partition)
→ AP (Availability-Partition)

A. Consistency-Availability (CA): during concurrent operations on the same data, two queries both return the new version (v2), without any waiting time → this combination is only possible in the context of transactional databases such as RDBMS.
B. Consistency-Partition Tolerance (CP): data is distributed across multiple servers with fault tolerance (replication). At the same time, the consistency of the data must be checked by guaranteeing the returned value despite concurrent updates. Managing this consistency requires a replica synchronization protocol, which introduces latency in response times (L1 and L2 wait for synchronization to see v2).
→ This is the case of the NoSQL database MongoDB.
C. Availability-Partition Tolerance (AP): provides a fast response time while distributing data and replicas. Updates are propagated asynchronously over the network, and the data is "Eventually Consistent" (L1 sees version v2, while L2 sees version v1).
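
The CP and AP behaviours during a network partition can be sketched with a single data item held on two replicas, r1 (read by L1) and r2 (read by L2). The ReplicatedRegister class is a deliberate simplification under stated assumptions: in CP mode it simply refuses writes while partitioned, and in AP mode it accepts them locally and replays them to r2 once the partition heals. Real systems use far richer protocols.

```python
class Replica:
    def __init__(self, value):
        self.value = value


class ReplicatedRegister:
    """Two replicas of one data item; `mode` is 'CP' or 'AP' (illustrative only)."""

    def __init__(self, mode, value="v1"):
        self.mode = mode
        self.r1, self.r2 = Replica(value), Replica(value)
        self.partitioned = False
        self.pending = []                 # updates not yet delivered to r2 (AP mode)

    def write(self, value):
        if not self.partitioned:
            self.r1.value = self.r2.value = value   # synchronous replication
            return "ok"
        if self.mode == "CP":
            return "unavailable"          # refuse rather than let replicas diverge
        self.r1.value = value             # AP: accept locally, replicate later
        self.pending.append(value)
        return "ok"

    def heal(self):
        """The partition ends: replay pending updates so the replicas converge."""
        self.partitioned = False
        for value in self.pending:
            self.r2.value = value
        self.pending.clear()


cp, ap = ReplicatedRegister("CP"), ReplicatedRegister("AP")
for reg in (cp, ap):
    reg.partitioned = True

print(cp.write("v2"), cp.r1.value, cp.r2.value)  # unavailable v1 v1
print(ap.write("v2"), ap.r1.value, ap.r2.value)  # ok v2 v1  (L1 sees v2, L2 sees v1)
ap.heal()
print(ap.r2.value)                               # v2: eventually consistent
```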
• Data comes from all sources: the Internet, sensors, satellites...
• Data types are also varied: text, sound, video...
• The increasing computerization of all kinds of data processing leads to an exponential growth in data volume, which is now counted in petabytes.
• Managing and processing these volumes of data is the new challenge known as Big Data.
• It is seen as a new IT challenge, for which traditional, highly transactional relational database engines seem completely outdated.
• Moving to a new data management model has become a necessity → NoSQL.
• The rise of the major platforms and Web applications (Google, Facebook, Twitter, LinkedIn, Amazon, ...) has led to:
→ A considerable amount of data managed by these applications, requiring data distribution and processing on many servers: data centers.
→ Data that is often associated with complex and heterogeneous objects.
→ Limitations of traditional (relational and transactional) SQL-based DBMS.

New approaches to data storage and management have emerged:
• Enabling greater scalability in highly distributed contexts.
• Allowing the management of complex and heterogeneous objects without having to declare in advance the set of fields representing an object.
• Grouped behind the term NoSQL (first used by Carlo Strozzi, now read as "Not Only SQL"), these approaches do not replace relational DBMS but supplement them by filling in their weaknesses.

• Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets.
• A NoSQL database has a hierarchy similar to a file folder system, and the data within it is unstructured, or non-relational.
• This lack of structure allows NoSQL databases to process larger amounts of data at speed and makes them easier to expand in the future.
• Cloud computing regularly makes use of NoSQL databases.
• It is not a relational database, as it stores data not only in tabular form but in several different ways.
• It came into existence when the demand for building modern applications increased.
• NoSQL covers a wide variety of database technologies.
• It came to support the idea of Not Only SQL.
• It is divided into four types: key-value, document-oriented, wide-column and graph databases.

• It enables good productivity in application development, as data does not have to be stored in a structured format → NoSQL caters to specific application needs.
• It is a better option for managing and handling large data sets.
• It provides high scalability.
• Users can quickly access data from the database through key-value access.

Four main types:
• Key-value storage: the simplest type of database storage, where every single item is stored as a key (or attribute name) together with its value.
• Document-oriented database: a type of database used to store data as JSON-like documents. It helps developers store data using the same document-model format as the one used in the application code.
• Graph databases: used for storing vast amounts of data in a graph-like structure. Social networking websites most commonly use graph databases.
• Wide-column stores: similar to the data represented in relational databases, but data is stored in large columns together instead of in rows.
• The aim of the key-value family is efficiency and simplicity.
• A key-value system acts as a huge hash table distributed over the network.
• Everything is based on the key/value combination:
• The key identifies the data uniquely and allows it to be managed.
• The value contains any type of data.
• Since the value can be anything, there is no imposed layout or structure for storage.

• From a database point of view, there is no way to exploit or control the structure of the data and, indeed, no SQL language.
• In itself, this is not a problem if you know what you are looking for (the key) and manipulate the value directly.
• Only CRUD operations can be used:
• Create (key, value)
• Read (key)
• Update (key, value)
• Delete (key)
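
A minimal in-memory sketch of the four CRUD operations, assuming a Python dict plays the role of the hash table. The KeyValueStore class is hypothetical; a real system such as Redis distributes this table over servers and exposes its own commands (e.g., SET, GET, DEL), which this class does not reproduce.

```python
class KeyValueStore:
    """Minimal key-value store exposing only the four CRUD operations."""

    def __init__(self):
        self._data = {}                     # hash table: key -> opaque value

    def create(self, key, value):
        if key in self._data:
            raise KeyError(f"{key!r} already exists")
        self._data[key] = value

    def read(self, key):
        return self._data[key]              # lookup by key only; no query on the value

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(key)
        self._data[key] = value

    def delete(self, key):
        del self._data[key]


kv = KeyValueStore()
kv.create("session:42", {"user": "amira", "cart": ["book"]})   # the value can be anything
kv.update("session:42", {"user": "amira", "cart": ["book", "pen"]})
print(kv.read("session:42")["cart"])   # ['book', 'pen']
kv.delete("session:42")
```

Note that the store never inspects the value: finding, say, every session containing a pen would require reading every key, which is the trade-off described above.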

Examples of this type of model:
• Redis (VMware): Vodafone, TripAdvisor, Nokia, Samsung, Docker
• SimpleDB (Amazon)

Application types that could fit this solution:
• real-time fraud detection,
• IoT, e-commerce,
• fast transactions,
• log files, chat...
• Document-oriented databases are probably the closest to what a traditional database can do for complex queries.
• The purpose of this storage is to manipulate documents containing information with a complex structure (types, lists, nesting).
• It is based on the key/value principle, but extends the value into the fields that make up the document.

• Advantages:
• A structured approach to each value, which thus forms a document.
• Rich query languages allowing complex manipulations on each attribute of the document (and sub-documents), as in a traditional database, while scaling out in a distributed context.
• Examples:
• MongoDB (MongoDB Inc.): ADP, Adobe, Bosch, Cisco, eBay, Electronic Arts, Expedia
• Couchbase (Apache, Hadoop): AOL, AT&T, Comcast, Disney, PayPal, Ryanair
• DynamoDB (Amazon): BMW, Dropcam, Duolingo, Supercell, Zynga
• Application examples:
• digital libraries, product collections, software repositories, multimedia collections or even managing user histories on social networks.
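
A minimal sketch of querying nested documents by attribute, assuming a hypothetical product collection held as Python dicts (documents need not share the same fields). The get_path and find helpers are illustrative only; in MongoDB a comparable query would be written db.products.find({"specs.ram_gb": {"$gte": 8}}), using dot notation to reach the sub-document.

```python
# A "collection" of JSON-like documents; they do not all share the same fields.
products = [
    {"_id": 1, "name": "Laptop", "price": 900,
     "specs": {"ram_gb": 16, "ports": ["usb-c", "hdmi"]}},
    {"_id": 2, "name": "Mouse", "price": 25, "tags": ["wireless"]},
    {"_id": 3, "name": "Tablet", "price": 400,
     "specs": {"ram_gb": 8, "ports": ["usb-c"]}},
]


def get_path(doc, dotted):
    """Follow a dotted path such as 'specs.ram_gb' into nested sub-documents."""
    for part in dotted.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None                 # the field is absent in this document
        doc = doc[part]
    return doc


def find(collection, path, predicate):
    """Return the documents whose field at `path` exists and satisfies `predicate`."""
    results = []
    for doc in collection:
        value = get_path(doc, path)
        if value is not None and predicate(value):
            results.append(doc)
    return results


big_ram = find(products, "specs.ram_gb", lambda ram: ram >= 8)
print([d["name"] for d in big_ram])   # ['Laptop', 'Tablet']
with_hdmi = find(products, "specs.ports", lambda ports: "hdmi" in ports)
print([d["name"] for d in with_hdmi])  # ['Laptop']
```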
• Traditionally, data is represented in rows, each row holding the full set of attributes.
• Column-oriented storage changes this paradigm by focusing on each attribute and distributing the attributes separately.
• You can then focus on one or more columns without having to process unnecessary information (the other columns).
• This solution is suitable for column-level processing such as aggregates (counting, averages, co-occurrence...).
• Suitable for large analytical calculations.

Examples:
• BigTable (Google)
• HBase (Apache, Hadoop)
• Spark SQL (Apache)
• Elasticsearch (Elastic) search engine

Some examples of applications:
• counting (online voting),
• product search in a category,
• large-scale reporting.
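
A minimal sketch of the row-to-column change, assuming a hypothetical online-voting table held in Python lists. It only illustrates the layout idea: once each attribute is stored contiguously, counting or averaging reads a single column instead of every full row.

```python
# Row-oriented layout: each record keeps all of its attributes together.
rows = [
    {"voter": 1, "region": "North", "candidate": "A", "age": 34},
    {"voter": 2, "region": "South", "candidate": "B", "age": 51},
    {"voter": 3, "region": "North", "candidate": "A", "age": 27},
    {"voter": 4, "region": "East",  "candidate": "A", "age": 45},
]

# Column-oriented layout: each attribute is stored on its own, contiguously.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# Aggregates only touch the columns they need.
votes_for_a = columns["candidate"].count("A")
average_age = sum(columns["age"]) / len(columns["age"])
print(votes_for_a, average_age)   # 3 39.25
```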
• Graph databases are databases that focus equally on the data and on the connections between them.
• In this database, data is not constrained to predefined models.
• Most other databases only find connections between data when you run a query.
• With a graph database, these connections are stored inside the database right alongside the original data.
• Advantages:
• A more efficient and faster database when your primary goal is to manage the connections between your data.
• The first three NoSQL families do not address the problem of correlations between elements.
• Take the example of a social network: in some cases, it becomes difficult to calculate the distance between people who are not directly connected.
• This is the kind of problem that graph-oriented databases solve.
• In a graph-oriented database, the stored data are nodes, links, and properties on these nodes and links.
• Queries are based on the management of paths, propagations, aggregations and even recommendations.
• OrientDB (Apache): Comcast, Warner Music Group, Cisco, Sky, United Nations
• FlockDB (Twitter): Twitter
• Applications based on graph-oriented databases: social networks (recommendation, shortest path, clustering...), the social web (Linked Data).
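
The social-network distance problem can be sketched as a breadth-first search over an adjacency list. The friends graph is hypothetical, and the code is a plain Python illustration of the path query, not how any particular graph database implements it (Neo4j, for instance, exposes such queries through Cypher's shortestPath function).

```python
from collections import deque

# Hypothetical social network: each person is a node, each friendship a link.
friends = {
    "Amira": ["Sami", "Lina"],
    "Sami": ["Amira", "Omar"],
    "Lina": ["Amira", "Omar"],
    "Omar": ["Sami", "Lina", "Ines"],
    "Ines": ["Omar"],
}


def distance(graph, start, goal):
    """Breadth-first search: number of hops between two people, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == goal:
            return hops
        for friend in graph.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


print(distance(friends, "Amira", "Ines"))  # 3 (Amira -> Sami -> Omar -> Ines)
```

Because the links are stored with the data, each hop is a direct lookup rather than a join over a whole table.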
Difference between SQL and NoSQL
• Database types
SQL: one type (SQL); each RDBMS has minor differences.
NoSQL: four types: key-value pair, column-based, document and graph databases.
• Schemas
SQL: predefined before inserting any data. Changing the data type of a field requires altering the database.
NoSQL: schema-less. The structure does not have to be defined, and dissimilar data can be stored together.
• Examples
SQL: SQL Server, MySQL, Oracle, Postgres and SQL Compact.
NoSQL: MongoDB, CouchDB, Cassandra, Neo4j and HBase.
• Replication
SQL: replication is a difficult task for SQL databases, because these systems were not designed for horizontal scaling.
NoSQL: supports automatic replication.
Difference between SQL and NoSQL
• Scaling
SQL: scales up (vertically): resources are added within the same logical unit to increase capacity, for example adding a CPU or memory to a single server, or adding external storage devices to increase storage capacity.
NoSQL: scales out: new nodes (servers) are added to the system so that the load is distributed over all the servers.
• Data manipulation
SQL: SQL commands such as SELECT, UPDATE and DELETE, for example DELETE FROM and SELECT *.
NoSQL: object-oriented APIs, for example in MongoDB db.collection_name.find() and db.collection_name.remove().
Difference between SQL and NoSQL
• Transaction management
SQL: supports transaction management: either the entire set of queries is committed successfully or none of it is.
NoSQL: supports transaction management only in certain circumstances and at certain levels (for example, document level vs. database level).
• Development model
SQL: some RDBMSs are open-source, like Postgres and MySQL, and some are closed-source, like SQL Server and Oracle.
NoSQL: most NoSQL systems are open-source, although some, such as Amazon DynamoDB, are proprietary.
• Consistency
SQL: a high level of consistency.
NoSQL: depends on the system: MongoDB, for example, favors consistency (CP), while others, such as Cassandra, are eventually consistent.
