DDB Slides
Centralized DB systems
[Figure: a single machine (processor P, memory M) running the software stack below]
• Software:
– Application
– SQL Front End
– Query Processor
– Transaction Proc.
– File Access
• Simplifications:
– single front end
– one place to keep data, locks
– if processor fails, system fails, ...
Distributed Database Systems
Why do we need Distributed Databases?
Data Access Pattern
• Mostly, employee data is managed at the
office where the employee works
– E.g., payroll, benefits, hire and fire
• Periodically, company needs consolidated
access to employee data
– E.g., company changes benefit plans and that
affects all employees.
– E.g., Annual bonus depends on global net profit.
[Figure: a single EMP table at London; payroll apps in London, New York, and Hong Kong all reach it over the Internet]
Problem: NY and HK payroll apps run very slowly!
[Figure: EMP split by office: London Emp stored in London, NY Emp in New York, HK Emp in Hong Kong; each payroll app reads its local fragment, with sites linked over the Internet]
Much better!
[Figure: the same fragmented layout, plus an Annual Bonus app that needs the Emp fragments from all three sites]
Distribution provides opportunities for parallel execution.
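A minimal sketch of the parallel execution this slide points to, in Python. It assumes each site can compute a local aggregate over its own Emp fragment and ship only that small result to a coordinator. The fragment contents, the function names, and the 5% bonus rule are hypothetical, and threads stand in for remote sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical Emp fragments, one per site: (emp_id, name, salary).
FRAGMENTS = {
    "London": [(1, "Ann", 50_000), (2, "Bob", 60_000)],
    "New York": [(3, "Cid", 70_000)],
    "Hong Kong": [(4, "Dee", 55_000), (5, "Eve", 65_000)],
}


def local_totals(site: str) -> tuple[int, int]:
    """Work done at one site: (employee count, salary sum) over its local fragment."""
    rows = FRAGMENTS[site]
    return len(rows), sum(salary for _, _, salary in rows)


def global_totals() -> tuple[int, int]:
    """Coordinator: query all sites in parallel (threads stand in for remote sites)."""
    with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
        partials = list(pool.map(local_totals, FRAGMENTS))
    return sum(c for c, _ in partials), sum(s for _, s in partials)


def bonus_per_employee(global_net_profit: float, share: float = 0.05) -> float:
    """Hypothetical rule: a fixed share of global net profit, split evenly."""
    count, _ = global_totals()
    return global_net_profit * share / count


print(global_totals())  # (5, 300000)
print(bonus_per_employee(1_000_000))
```

Only two numbers per site cross the network instead of every Emp tuple; combining is cheap because count and sum can be computed per fragment and then added.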
[Figure: each site stores its own Emp fragment plus a replica of another site's: London holds Lon and NY Emp, New York holds NY and HK Emp, Hong Kong holds HK and Lon Emp]
Replication improves availability.
Distributed Database Features
Distributed Database Types
• In a homogeneous distributed database
– All sites have identical software
– Sites are aware of each other and agree to cooperate in processing user requests
– Each site surrenders part of its autonomy in terms of the right to change schemas or software
– Appears to the user as a single system
• In a heterogeneous distributed database
– Different sites may use different schemas and software
  – Difference in schema is a major problem for query processing
  – Difference in software is a major problem for transaction processing
– Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing
[Figure: homogeneous distributed database]
[Figure: heterogeneous distributed database]
Distributed Database Advantages and Disadvantages
Distributed Database Challenges
• Distributed Database Design
– Deciding what data goes where
– Depends on data access patterns of major applications
– Two subproblems:
  – Fragmentation: partition tables into fragments
  – Allocation: allocate fragments to nodes
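A minimal sketch of the two design subproblems, in Python. Fragmentation here is horizontal, splitting a hypothetical EMP table by its office attribute. Allocation uses a simple greedy rule (an assumption, not a method from these slides): place each fragment at the site that accesses it most often. The access counts are made-up inputs, and real allocation methods also weigh update traffic, storage, and replication, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical EMP rows: (emp_id, name, office).
EMP = [
    (1, "Ann", "London"),
    (2, "Bob", "New York"),
    (3, "Cid", "Hong Kong"),
    (4, "Dee", "London"),
]

# Assumed access statistics: ACCESSES[fragment][site] = accesses per day.
ACCESSES = {
    "London": {"London": 900, "New York": 50, "Hong Kong": 10},
    "New York": {"London": 40, "New York": 800, "Hong Kong": 30},
    "Hong Kong": {"London": 20, "New York": 60, "Hong Kong": 700},
}


def fragment_by(rows, key):
    """Fragmentation: split a table into horizontal fragments by a key function."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[key(row)].append(row)
    return dict(fragments)


def allocate(accesses):
    """Allocation (greedy): put each fragment at the site that accesses it most."""
    return {frag: max(per_site, key=per_site.get) for frag, per_site in accesses.items()}


fragments = fragment_by(EMP, key=lambda row: row[2])  # one fragment per office
placement = allocate(ACCESSES)
print(placement)
```

With access patterns like these, each office's employee data lands at that office, which is the layout the payroll example arrives at.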
Distributed Data Storage
• Advantages of Replication
– Availability: failure of a site containing relation r does not result in unavailability of r if replicas exist.
– Parallelism: queries on r may be processed by several nodes in parallel.
– Reduced data transfer: relation r is available locally at each site containing a replica of r.
• Disadvantages of Replication
– Increased cost of updates: each replica of relation r must be updated.
– Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
  – One solution: choose one copy as the primary copy and apply concurrency control operations to the primary copy
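A minimal sketch of replication with a primary copy, in Python. It assumes a single process simulating three sites, a per-key lock table held only at the primary, and writes pushed to every reachable replica; the class and method names are hypothetical. It shows the availability benefit (a read falls back to another replica when the local site is down) and both costs (every replica is written, and all locking goes through the primary).

```python
class ReplicatedRelation:
    """Relation r replicated at several sites; concurrency control at the primary only."""

    def __init__(self, sites, primary):
        if primary not in sites:
            raise ValueError("primary must be one of the sites")
        self.primary = primary
        self.replicas = {site: {} for site in sites}  # site -> {key: value}
        self.up = set(sites)                           # sites currently reachable
        self.locks = {}                                # primary's lock table: key -> txn

    def lock(self, key, txn):
        """Acquire an exclusive lock at the primary copy; False means 'wait'."""
        if self.primary not in self.up:
            raise RuntimeError("primary copy unavailable")
        holder = self.locks.get(key)
        if holder is not None and holder != txn:
            return False
        self.locks[key] = txn
        return True

    def unlock(self, key, txn):
        if self.locks.get(key) == txn:
            del self.locks[key]

    def write(self, key, value, txn):
        """Update every reachable replica: the increased cost of updates."""
        if self.locks.get(key) != txn:
            raise RuntimeError("write requires the primary-copy lock")
        # Simplification: replicas that are down miss this update and would
        # need to catch up on recovery; that step is not modeled here.
        for site in self.up:
            self.replicas[site][key] = value

    def read(self, key, local_site):
        """Prefer the local replica; otherwise use any replica that is still up."""
        if local_site in self.up:
            return self.replicas[local_site].get(key)
        live = sorted(self.up)
        if not live:
            raise RuntimeError("no replica of r is available")
        return self.replicas[live[0]].get(key)


r = ReplicatedRelation(["London", "New York", "Hong Kong"], primary="London")
assert r.lock("emp1", "T1")
assert not r.lock("emp1", "T2")  # T2 must wait for T1
r.write("emp1", 50_000, "T1")
r.unlock("emp1", "T1")
r.up.discard("New York")  # the New York site fails
print(r.read("emp1", local_site="New York"))  # 50000, served by another replica
```

Because every lock request goes to one site, conflicting updates are serialized there, at the price of the primary becoming a bottleneck and a point of failure for writes.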
Data Fragmentation
• Horizontal:
– allows parallel processing on fragments of a relation
– allows a relation to be split so that tuples are located
where they are most frequently accessed
• Vertical:
– allows tuples to be split so that each part of the tuple is
stored where it is most frequently accessed
– a tuple-id attribute, kept in every vertical fragment, allows efficient joining of vertical fragments
– allows parallel processing on a relation
• Vertical and horizontal fragmentation can be mixed.
– Fragments may be successively fragmented to an
arbitrary depth.
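A minimal sketch of both fragmentation styles on a hypothetical EMP relation, in Python. Horizontal fragments are selections, reconstructed by union; vertical fragments are projections that each keep the tuple-id (column 0 here), reconstructed by joining on it. Relations are plain lists of tuples, and all names and data are assumptions for illustration.

```python
# Hypothetical EMP relation: (tuple_id, name, office, salary).
EMP = [
    (1, "Ann", "London", 50_000),
    (2, "Bob", "New York", 60_000),
    (3, "Cid", "Hong Kong", 70_000),
]


def horizontal(rows, predicate):
    """Horizontal fragment: whole tuples that satisfy a selection predicate."""
    return [row for row in rows if predicate(row)]


def vertical(rows, columns):
    """Vertical fragment: projection onto `columns`, always keeping the tuple-id."""
    return [tuple(row[i] for i in (0, *columns)) for row in rows]


def join_on_tuple_id(left, right):
    """Rebuild full tuples from two vertical fragments via the tuple-id."""
    right_by_id = {row[0]: row[1:] for row in right}
    return [left_row + right_by_id[left_row[0]]
            for left_row in left if left_row[0] in right_by_id]


# Horizontal: London tuples vs. everything else; reconstruct by union.
emp_london = horizontal(EMP, lambda r: r[2] == "London")
emp_rest = horizontal(EMP, lambda r: r[2] != "London")
rebuilt_h = set(emp_london) | set(emp_rest)

# Vertical: (tuple_id, name, office) at one site, (tuple_id, salary) at another.
emp_public = vertical(EMP, (1, 2))
emp_salary = vertical(EMP, (3,))
rebuilt_v = join_on_tuple_id(emp_public, emp_salary)

# Mixed: a vertical fragment of a horizontal fragment (London salaries only).
london_salaries = vertical(emp_london, (3,))
print(london_salaries)  # [(1, 50000)]
```

The last line shows fragments being fragmented again: a salary-only projection of the London-only selection.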
Data Transparency
• Disadvantages of distributed databases:
– Architectural complexity
– Cost
– Security
– Integrity control more difficult
– Lack of standards
– Lack of experience
– Database design more complex