DATA MANAGEMENT IN CLOUD
Dr. Abdur Rahman, AP/CSE
Syllabus
Unit –I
DISTRIBUTED DATABASE
Distributed Database
• Database: an organized collection of structured data.
• Distributed: shared and spread out.
• A distributed database is a shared collection of data that is physically distributed over a computer network across different sites.
• Logically it is a single database, divided into a number of pieces called fragments.
• Distributed databases commonly use a client/server architecture to process information requests.
Types of Distributed Database
• Homogeneous Distributed Database Systems
• Heterogeneous Distributed Database Systems
• Generic connectivity
• ODBC
• OLE DB Protocols
Concepts
• Database Links
• Defines a one-way communication path from one database server to another database server.
Types of DB Links
• Public
• Private
• Connected
Distributed Data Storage
• Replication
• Stored redundantly at 2 or more sites.
• Increased Availability
• Parallel Processing
• Overhead: every update must be applied to every copy
• Fragmentation
• Divided into smaller parts.
• Consistent
• No redundancy
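The contrast between replication and fragmentation can be sketched with in-memory "sites". This is a hypothetical illustration only: `Site`, `replicated_write`, `replicated_read` and `fragmented_write` are made-up names, and placing a row by `key % number_of_sites` is an arbitrary choice for the sketch. It shows the update overhead of replication (one write per copy), its availability benefit (any surviving copy can answer), and the single-copy placement of fragmentation.

```python
class Site:
    """A hypothetical storage site: just a name and a dict of key -> row."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


def replicated_write(sites, key, row):
    """Replication: every site stores a copy, so one update touches every site."""
    for site in sites:
        site.rows[key] = row
    return len(sites)  # number of site writes = the update overhead


def replicated_read(sites, key, failed):
    """Any surviving replica can answer the read (increased availability)."""
    for site in sites:
        if site.name not in failed and key in site.rows:
            return site.rows[key]
    return None


def fragmented_write(sites, key, row):
    """Fragmentation: each row lives at exactly one site (no redundancy).
    Placement by key modulo the number of sites is an arbitrary choice here."""
    sites[key % len(sites)].rows[key] = row
    return 1  # only one site is written


if __name__ == "__main__":
    replicas = [Site("S1"), Site("S2"), Site("S3")]
    print(replicated_write(replicas, 1, ("A", 21, 20)))        # 3
    print(replicated_read(replicas, 1, failed={"S1", "S2"}))  # ('A', 21, 20)

    fragments = [Site("S1"), Site("S2")]
    rows = [("A", 21, 20), ("B", 22, 25), ("C", 23, 30), ("D", 24, 35)]
    for k, row in enumerate(rows, start=1):
        fragmented_write(fragments, k, row)
    print([len(s.rows) for s in fragments])  # [2, 2]
```

With two of three replicas down, the replicated read still succeeds; the fragmented layout stores each row once, so losing a site would lose the rows it holds.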
Features of Distributed Databases
• Location independent
• Distributed query processing
• Distributed transaction management
• Hardware independent
• Operating system independent
• Network independent
• Transaction transparency
• DBMS independent
Examples of distributed databases
• Apache Ignite,
• Apache Cassandra,
• Apache HBase,
• Couchbase Server,
• Amazon SimpleDB,
• Clusterpoint, and
• FoundationDB.
Data Allocation
• Deciding which site stores each piece (fragment or copy) of the data.
• Goals: performance and availability.
• Types: centralized, partitioned, and replicated.
• Strategies:
• Data Fragmentation
• Dividing a table into parts (sub-tables).
• Horizontal fragmentation
• Vertical fragmentation
• Mixed or hybrid fragmentation
• Data Replication
• Copying data to multiple locations.
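The three allocation types can be compared as plans that map each site to the fragments it stores. A minimal sketch, assuming fragments and sites are plain labels; the function name `allocate`, the labels `F1`…`S2`, and the round-robin rule for the partitioned case are hypothetical choices for illustration, not a prescribed algorithm.

```python
def allocate(fragments, sites, strategy):
    """Return {site: [fragments stored there]} for one allocation type."""
    plan = {site: [] for site in sites}
    if strategy == "centralized":
        # All data at a single site (here, the first one).
        plan[sites[0]] = list(fragments)
    elif strategy == "partitioned":
        # Each fragment at exactly one site; round-robin is only for illustration.
        for i, fragment in enumerate(fragments):
            plan[sites[i % len(sites)]].append(fragment)
    elif strategy == "replicated":
        # Full replication: every site holds every fragment.
        for site in sites:
            plan[site] = list(fragments)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return plan


if __name__ == "__main__":
    frags, sites = ["F1", "F2", "F3"], ["S1", "S2"]
    for strategy in ("centralized", "partitioned", "replicated"):
        print(strategy, allocate(frags, sites, strategy))
```

For example, the partitioned plan places F1 and F3 at S1 and F2 at S2, while the replicated plan stores all three fragments at both sites.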
Horizontal Fragmentation
Table: student
ID  Name  Age  Marks
1   A     21   20
2   B     22   25
3   C     23   30
4   D     24   35
T1: SELECT * FROM student WHERE marks < 35;
ID  Name  Age  Marks
1   A     21   20
2   B     22   25
3   C     23   30
T2: SELECT * FROM student WHERE marks >= 35;
ID  Name  Age  Marks
4   D     24   35
Types:
• Primary
• Derived
• Complete
T = T1 ∪ T2 ∪ … ∪ Tn
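The horizontal fragments can be checked with Python's built-in `sqlite3` module. This sketch assumes T2 is defined with `marks >= 35`, so that the row with marks = 35 falls in exactly one fragment; the helper names `make_student_db` and `horizontal_fragments` are hypothetical. It verifies that the union of the fragments rebuilds the table (completeness) and that the fragments share no rows (disjointness).

```python
import sqlite3

ROWS = [(1, "A", 21, 20), (2, "B", 22, 25), (3, "C", 23, 30), (4, "D", 24, 35)]


def make_student_db():
    """Create an in-memory student table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER, name TEXT, age INTEGER, marks INTEGER)")
    conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", ROWS)
    return conn


def horizontal_fragments(conn):
    """Primary horizontal fragmentation: each fragment is a selection on marks."""
    conn.execute("CREATE TABLE t1 AS SELECT * FROM student WHERE marks < 35")
    conn.execute("CREATE TABLE t2 AS SELECT * FROM student WHERE marks >= 35")
    t1 = conn.execute("SELECT * FROM t1 ORDER BY id").fetchall()
    t2 = conn.execute("SELECT * FROM t2 ORDER BY id").fetchall()
    return t1, t2


if __name__ == "__main__":
    conn = make_student_db()
    t1, t2 = horizontal_fragments(conn)
    # Reconstruction: T = T1 ∪ T2
    rebuilt = conn.execute("SELECT * FROM t1 UNION SELECT * FROM t2 ORDER BY id").fetchall()
    print(rebuilt == ROWS)  # True: the fragmentation is complete
    # Disjointness: no row appears in both fragments
    print(conn.execute("SELECT * FROM t1 INTERSECT SELECT * FROM t2").fetchall())  # []
```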
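Vertical Fragmentation
Table: student
ID  Name  Age  Marks
1   A     21   20
2   B     22   25
3   C     23   30
4   D     24   35
V1: SELECT ID, Name FROM student;
ID  Name
1   A
2   B
3   C
4   D
V2: SELECT ID, Age, Marks FROM student;
ID  Age  Marks
1   21   20
2   22   25
3   23   30
4   24   35
• Each vertical fragment keeps the key (ID), so the table can be rebuilt by joining the fragments on ID.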
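Vertical fragments can be rejoined only if they share a key. This sketch assumes each fragment keeps the `id` column and that the remaining columns are split as (`name`) and (`age`, `marks`); the helper names are hypothetical. It builds the fragments with `sqlite3` and checks that joining them on `id` reproduces the original table.

```python
import sqlite3

ROWS = [(1, "A", 21, 20), (2, "B", 22, 25), (3, "C", 23, 30), (4, "D", 24, 35)]


def make_student_db():
    """Create an in-memory student table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER, name TEXT, age INTEGER, marks INTEGER)")
    conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", ROWS)
    return conn


def vertical_fragments(conn):
    """Vertical fragmentation: column projections that each keep the key id."""
    conn.execute("CREATE TABLE v1 AS SELECT id, name FROM student")
    conn.execute("CREATE TABLE v2 AS SELECT id, age, marks FROM student")


def rebuild(conn):
    """Reconstruct the original table by joining the fragments on id."""
    return conn.execute(
        "SELECT v1.id, v1.name, v2.age, v2.marks "
        "FROM v1 JOIN v2 ON v1.id = v2.id ORDER BY v1.id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_student_db()
    vertical_fragments(conn)
    print(rebuild(conn) == ROWS)  # True: lossless reconstruction
```

Without the shared `id` column, the Name and Age columns alone could not be matched back to the right rows.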
Hybrid Fragmentation
Table: student
ID  Name  Age  Marks
1   A     21   20
2   B     22   25
3   C     23   30
4   D     24   35
SELECT Name FROM student WHERE age = 22;
Name
B
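Hybrid fragmentation applies a horizontal selection and a vertical projection together. The sketch below, using `sqlite3` with hypothetical helper names, performs the two steps separately (rows with age = 22 first, then only the name column) to show that they produce the same single-cell fragment as the one-line query.

```python
import sqlite3

ROWS = [(1, "A", 21, 20), (2, "B", 22, 25), (3, "C", 23, 30), (4, "D", 24, 35)]


def make_student_db():
    """Create an in-memory student table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER, name TEXT, age INTEGER, marks INTEGER)")
    conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", ROWS)
    return conn


def hybrid_fragment(conn):
    """Hybrid fragmentation: a horizontal step followed by a vertical step."""
    conn.execute("CREATE TABLE h AS SELECT * FROM student WHERE age = 22")  # horizontal: select rows
    conn.execute("CREATE TABLE hv AS SELECT name FROM h")                    # vertical: project a column
    return conn.execute("SELECT name FROM hv").fetchall()


if __name__ == "__main__":
    print(hybrid_fragment(make_student_db()))  # [('B',)]
```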
Types of Data Replication
• Transactional Replication
• Starts with a complete copy of the database
• Then copies each new data change as it occurs
• Databases are kept in sync in (near) real time
• Snapshot Replication
• Simplest type of data replication
• Copies the current state of the data at a specific point in time
• Merge Replication
• Tracks subsequent data changes and schema modifications
• Synchronizes using merge agents
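The difference between snapshot and transactional replication can be sketched with a change log. All class and function names here are hypothetical, and `sync()` is called explicitly, whereas a real transactional replication setup would push changes continuously; merge replication (with schema changes and merge agents) is not modelled.

```python
class Publisher:
    """Source database: current data plus an ordered log of changes."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


def snapshot(publisher):
    """Snapshot replication: copy the whole current state at one point in time."""
    return dict(publisher.data)


class TransactionalSubscriber:
    """Transactional replication: start from a full copy, then apply each new change in order."""

    def __init__(self, publisher):
        self.data = snapshot(publisher)
        self.position = len(publisher.log)  # how far into the change log we have applied

    def sync(self, publisher):
        for key, value in publisher.log[self.position:]:
            self.data[key] = value
        self.position = len(publisher.log)


if __name__ == "__main__":
    pub = Publisher()
    pub.write("A", 20)
    snap = snapshot(pub)
    sub = TransactionalSubscriber(pub)
    pub.write("B", 25)
    pub.write("A", 21)
    sub.sync(pub)
    print(snap)      # {'A': 20}  (frozen at snapshot time)
    print(sub.data)  # {'A': 21, 'B': 25}
```

The snapshot stays frozen at the moment it was taken, while the transactional subscriber catches up by replaying only the changes it has not yet applied.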
Structure of Distributed Database
Architecture Models of Distributed Database Systems
• Client-Server Architecture
• Peer-to-peer Architecture
• Multi DBMS Architecture
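Trade-offs in Distributed Databases
• CAP Theorem
• Consistency: every read receives the most recent write or an error.
• Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write.
• Partition tolerance: the system continues to function even when messages between nodes are dropped or delayed (a network partition).
• When a partition occurs, a system must give up either consistency or availability.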
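The CAP trade-off can be illustrated with one value stored on two replicas. This is a toy sketch, not a real protocol: the class name `TwoReplicaRegister` and the mode labels "CP" and "AP" are hypothetical. During a simulated partition, the CP mode returns errors to stay consistent, while the AP mode keeps answering but may return a stale value.

```python
class TwoReplicaRegister:
    """One value stored on replicas a and b; mode is 'CP' or 'AP' (hypothetical labels)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = None            # value held by replica a (receives writes)
        self.b = None            # value held by replica b (serves reads here)
        self.partitioned = False

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse a write that cannot reach b.
            raise RuntimeError("unavailable: replica b unreachable")
        self.a = value
        if not self.partitioned:
            self.b = value       # normal case: copy the write to b

    def read_b(self):
        if self.partitioned and self.mode == "CP":
            # b might be stale, so a CP system answers with an error instead.
            raise RuntimeError("unavailable: replica b may be stale")
        return self.b            # AP: always answer, possibly with a stale value


if __name__ == "__main__":
    ap = TwoReplicaRegister("AP")
    ap.write(1)
    ap.partitioned = True
    ap.write(2)
    print(ap.read_b())  # 1: available, but not the most recent write
```

With the partition healed (`partitioned = False`), both modes behave identically; the choice only matters while the replicas cannot communicate.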
Practical Implications
• Understand the requirements
• Choose the appropriate replication scheme
• Use appropriate data structures
• Plan for failures
“The fate of your distributed system rests on your ability to make the right
trade-offs. Choose wisely!”
Objectives of the Design of Data Distribution
• Processing locality: placing data as close as possible to the
applications that use it.
• Availability and reliability of distributed data.
• Workload distribution.
• Storage costs and availability.
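Processing locality can be turned into a simple allocation rule: put each fragment at the site that accesses it most. A minimal sketch, assuming access frequencies per site are known; the function name and the counts in the usage note are hypothetical, and a fuller design would also weigh workload distribution, storage cost, and availability.

```python
def allocate_by_locality(access_counts):
    """Place each fragment at the site that accesses it most often.

    access_counts maps fragment -> {site: number of accesses}.
    Ties go to the alphabetically first site (sorted() then max() keeps the first maximum).
    """
    return {
        fragment: max(sorted(counts), key=lambda site: counts[site])
        for fragment, counts in access_counts.items()
    }


if __name__ == "__main__":
    counts = {"T1": {"S1": 120, "S2": 15}, "T2": {"S1": 10, "S2": 90}}
    print(allocate_by_locality(counts))  # {'T1': 'S1', 'T2': 'S2'}
```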
Distributed Database Design - Concept
• Issues also found in centralized DB design
• Designing the conceptual schema: a high-level description of the main concepts and their
relationships.
• Designing the physical database, i.e., mapping the conceptual schema to storage
areas and determining appropriate access methods.
• Additional issues in distributed DB design
• Designing the fragmentation.
• Designing the allocation of fragments: mapping them to sites (physical images).
Two Strategies
• Top Down
• Designing systems from scratch
• Mostly in homogeneous systems
• Bottom Up
• When the databases already exist at a number of sites
Design of Distributed Database
• Top Down Design
• Designing the global schema
• Designing the fragmentation of the database
• Allocating the fragments to the sites
• Creating the physical images
• Bottom Up Design
• 1. The selection of a common database model for describing the global schema of the
database.
• 2. The translation of each local schema into the common data model.
• 3. The integration of the local schemata into a common global schema.
• Loosely coupled system
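The three bottom-up steps can be sketched for two pre-existing sites. Everything here is hypothetical: the site names, the local column names, and the choice of "a set of global attribute names" as the common data model (step 1). Step 2 translates local rows into that model, and step 3 integrates the translated schemas into one global schema.

```python
# Hypothetical local schemas of two existing sites. Each maps a local column
# name to an attribute of the common (global) model chosen in step 1.
LOCAL_SCHEMAS = {
    "site1": {"sid": "ID", "sname": "Name", "sage": "Age"},
    "site2": {"roll_no": "ID", "full_name": "Name", "score": "Marks"},
}


def translate_row(site, row):
    """Step 2: rewrite a local row using the common data model's attribute names."""
    mapping = LOCAL_SCHEMAS[site]
    return {mapping[column]: value for column, value in row.items()}


def integrate(schemas):
    """Step 3: the global schema is the union of all translated local attributes."""
    global_attributes = set()
    for mapping in schemas.values():
        global_attributes.update(mapping.values())
    return sorted(global_attributes)


if __name__ == "__main__":
    print(integrate(LOCAL_SCHEMAS))  # ['Age', 'ID', 'Marks', 'Name']
    print(translate_row("site2", {"roll_no": 4, "full_name": "D", "score": 35}))
```

Real schema integration also has to resolve conflicts such as differing data types or units for the same attribute, which this sketch ignores.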
