CPS 216: Advanced Database Systems
Shivnath Babu Fall 2006
Outline for Today
What this class is about: Data management What we will cover in this class Logistics
Data Management
Application
Query Query
Query
Data
DataBase Management System (DBMS)
Example: At a Company
Query 1: Is there an employee named Nemo? Query 2: What is Nemos salary? Query 3: How many departments are there in the company? Query 4: How many employees have Salary >= 80K? Query 5: What is the name of Nemos department? Query 6: How many employees are there in the Accounts department? Employee
ID 10 20 40 52 Name Nemo Dory Gill Ray DeptID 12 156 89 34 Salary 120K 79K 76K 85K
Department
ID 12 34 89 156 Name IT Accounts HR Marketing
DataBase Management System (DBMS)
High-level Query Q
Answer Translates Q into best execution plan for current conditions, runs plan
DBMS
Data
Example: Store that Sells Cars
Make Model OwnerID ID Name Owners of 12 12 Nemo Honda Accords Honda Accord who are <= Honda Accord 156 156 Dory 23 years old Join (Cars.OwnerID = Owners.ID) Filter (Make = Honda and Model = Accord) Age 22 21
Filter (Age <= 23)
Cars
Make Honda Toyota Mini Honda Model Accord Camry Cooper Accord OwnerID 12 34 89 156
Owners
ID 12 34 89 156 Name Nemo Ray Gill Dory Age 22 42 36 21
DataBase Management System (DBMS)
High-level Query Q
Answer Translates Q into best execution plan for current conditions, runs plan
DBMS
Keeps data safe and correct despite failures, concurrent updates, online processing, etc.
Data
DBMS is multi-user
Example
Get account balance from database; If balance > amount of withdrawal then balance = balance - amount of withdrawal; dispense cash; store new balance into database;
Homer at ATM1 withdraws $100 Marge at ATM2 withdraws $50 Initial balance = $400, final balance = ? Should be $250 no matter who goes first
Final balance = $300
Homer withdraws $100: Marge withdraws $50:
read balance; $400
read balance; $400 if balance > amount then balance = balance - amount; $3 write balance; $350
if balance > amount then balance = balance - amount; $300 write balance; $300
Final balance = $350
Homer withdraws $100: Marge withdraws $50:
read balance; $400
read balance; $400
if balance > amount then balance = balance - amount; $300 write balance; $300 if balance > amount then balance = balance - amount; $3 write balance; $350
Concurrency control in DBMS
Similar to concurrent programming problems But data is not all in main-memory Appears similar to file system concurrent access? Approach taken by MySQL initially; now MySQL offers better alternatives But want to control at much finer granularity Or else one withdrawal would lock up all accounts!
Recovery in DBMS
Example: balance transfer decrement the balance of account X by $100; increment the balance of account Y by $100; Scenario 1: Power goes out after the first instruction Scenario 2: DBMS buffers and updates data in memory (for efficiency); before they are written back to disk, power goes out Log updates; undo/redo during recovery
DataBase Management System (DBMS)
High-level Query Q
Answer Translates Q into best execution plan for current conditions, runs plan
DBMS
Keeps data safe and correct despite failures, concurrent updates, online processing, etc.
Data
Summary of modern DBMS features
Persistent storage of data Logical data model; declarative queries and updates ! physical data independence Multi-user concurrent access Safety from system failures Performance, performance, performance Massive amounts of data (terabytes ~ petabytes) High throughput (thousands ~ millions transactions per minute) High availability ( 99.999% uptime)
Modern DBMS Architecture
Applications SQL DBMS Parser Logical query plan Query Optimizer Physical query plan Query Executor Access method API calls Storage Manager
File system API calls Storage system API calls OS Disk(s)
Course Outline
50% of the class is about core DBMS concepts
Query execution, query optimization, transactions, recovery, etc. Textbook material
50% of the class is on what is happening today in data management
Data streams Web search Google, Yahoo! XML and data integration Data mining Sensor data management
Using a Traditional DBMS
User/Application Query Query Result Result
Loader
Table R Table S
New Approach for Data Streams
User/Application Register Continuous Query (Standing Query)
Result
Input streams
Stream Query Processor
Example Continuous (Standing) Queries
Web Amazons best sellers over last hour Network Intrusion Detection Track HTTP packets with destination address matching a prefix in given table and content matching *\.ida Finance Monitor NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes
Course Outline
50% of the class is about core DBMS concepts
Query execution, query optimization, transactions, recovery, etc. Textbook material
50% of the class is on what is happening today in data management
Data streams Web search Google, Yahoo! XML and data integration Data mining Sensor data management
New Challenges in DBMSs
High-level Query Q Answer
DBMS
TeraBytes PetaBytes
Data
<CD> <TITLE>Empire B.</TITLE> <ARTIST>Bob Dylan</ARTIST> <COUNTRY>USA</COUNTRY> <COMPANY>Columbia </COMPANY> <PRICE>10.90</PRICE> </CD>
Course Logistics
Recommended reference: Database Systems: The Complete Book, by H. Garcia-Molina, J. D. Ullman, and J. Widom Web site: http://www.cs.duke.edu/education/courses/fall06/cps216
Grading:
Homework Assignments 15% Project 25% Midterm 25% Final 35%
Summary: Data Management is Important
Core aspect of most sciences and engineering today Core need in industry Cool mix of theory and systems Chances are you will find something interesting even if you primary interest is elsewhere