CSE 412
Jia Zou
Arizona State University
Instruction Team
• Jia Zou (Instructor)
• Assistant Professor; joined ASU in fall 2019
• Six years' industrial experience at IBM Research
• Office hours: Wed 10-11am on Zoom or in Room 414
https://asu.zoom.us/j/2534367426
• Will also answer questions after each lecture
Graduate TAs
• Saif Masood (TA)
• Email: [email protected]
• Office hours: Wed 4-5pm
• Soham Nag (TA)
• Email: [email protected]
• Office hours: Thu 11am-12pm
The Zoom links for TA office hours are on the Canvas course homepage.
Agenda
• Why do you need to learn about database systems?
• Course logistics
• A brief history of database systems
Discussion
• Would anyone like to share their learning goals for CSE 412?
• Why do you want to learn about database systems?
• What do you expect to learn by the end of the semester?
Databases are everywhere
Expected Learning Outcome of this Course
• Understand “What” and “Why”
• Relational algebra
• SQL
• database design
• functional dependency
• data storage
• query execution/optimization/compilation
• transaction
• Know “How”
• Set up a database on a computer server
• Design and create efficient relational data models that fit the application
• Load data into the database
• Write SQL programs that issue queries to the database to read and analyze data
• Build applications that access the database to retrieve relevant data and present it to end users
Course Logistics
Mon/Wed 12pm – 1:15pm
SCOB 210
Textbooks (Pick one that you like)
• [SKS] Database System Concepts, 7th edition, https://www.db-book.com/db7/
Grading Policy
Grading Scheme
• A+: Grade >= 97%
• A: 97% > Grade >= 94%
• A-: 94% > Grade >= 89%
• B+: 89% > Grade >= 86%
• B: 86% > Grade >= 80%
• B-: 80% > Grade >= 77%
• C+: 77% > Grade >= 73%
• C: 73% > Grade >= 70%
• D: 70% > Grade >= 60%
• E: < 60%
The final grade may not be based solely on the absolute percentage; the class mean and standard deviation will be considered to curve the final grade when needed.
Assignments (20%)
• There will be four assignments that constitute 20% of the total grade.
• All assignments should be submitted to GradeScope.
• All deadlines are hard deadlines; no extension is allowed without acceptable proof of a university-approved excuse.
Group Project (30%)
• There will be a group project that constitutes 30% of the total grade.
• The TA will release the spreadsheet for forming groups.
• Groups should be finalized by the end of Sept 6th (two weeks from now).
• Each group has three to four students.
• The project consists of three phases.
Proposal Phase (8pt) Due Sept 18th
• First, select an information management application that you want to develop (not limited to the following examples):
• https://github.com/topics/dbms-project
• https://github.com/yogeshk4/sales-and-inventory-management-system
• https://github.com/mayankpadhi/Hospital-Management
• https://github.com/chawat/BankingApp
• https://github.com/chiragchevli/Animal_Species_Repository_System
• https://github.com/dancancro/great-big-example-application
• https://github.com/dbs/postgresql-full-text-search-engine
• https://github.com/dushan14/books-store
• https://github.com/jai-singhal/croma
• https://github.com/carlalmadureira/Django-PostgreSQL-Application
• ER Diagram (5 pt): Turn in a complete ER diagram that models your application's entities and relationships.
• Implementation Plans (in plain English) (0.5 pt): One or two paragraphs explaining your project implementation plans. Every group must use PostgreSQL as the main DBMS for its project. However, each group needs to explain which application development technology it plans to use (PHP/JSP for a web app, iOS/Android for a mobile app, …). This choice is entirely yours; you can select whatever technology you prefer.
• Presentation (1.5 pt): Each group must give a five-minute video presentation explaining its application and ER diagram. Record the presentation as a video clip (screen capture), upload it to YouTube, and provide the YouTube link in the report.
• Peer review is due on Sept 25th and will be automatically assigned in Canvas. For each assigned review, the reviewer is expected to give a score from 0 to 10 as the overall rating for the demo. The review must list (1) the rating (0-10), (2) at least one strength, (3) at least one weakness, and (4) at least one suggestion for the proposal, and must be submitted as a comment in the Canvas Peer Review platform.
Midterm-Report Phase (12 pts) Due Oct 16th
• ER-to-Relational (4 pt): Transform the ER diagram into a relational model using the SQL data definition language (DDL).
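As an illustration of the ER-to-relational step, here is a minimal sketch for a hypothetical hospital application (Department, Doctor, Patient, and a many-to-many "treats" relationship; none of these names come from the course). Each entity set becomes a table with a primary key, a many-to-one relationship becomes a foreign key, and a many-to-many relationship becomes its own table keyed by both participants. It runs on Python's built-in sqlite3 so it is self-contained; the project itself must use PostgreSQL, where these CREATE TABLE statements should work largely unchanged (the PRAGMA line is SQLite-specific).

```python
import sqlite3

DDL = """
-- Entity set Department -> table with its own primary key.
CREATE TABLE Department (
    dept_id INTEGER PRIMARY KEY,
    name    VARCHAR(100) NOT NULL
);

-- Entity set Doctor; the many-to-one "works_in" relationship becomes a foreign key.
CREATE TABLE Doctor (
    doctor_id INTEGER PRIMARY KEY,
    name      VARCHAR(100) NOT NULL,
    dept_id   INTEGER NOT NULL REFERENCES Department(dept_id)
);

-- Entity set Patient.
CREATE TABLE Patient (
    patient_id INTEGER PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    birth_date DATE
);

-- Many-to-many "treats" relationship -> its own table keyed by both participants.
CREATE TABLE Treats (
    doctor_id  INTEGER REFERENCES Doctor(doctor_id),
    patient_id INTEGER REFERENCES Patient(patient_id),
    start_date DATE,
    PRIMARY KEY (doctor_id, patient_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript(DDL)

conn.execute("INSERT INTO Department VALUES (1, 'Cardiology')")
conn.execute("INSERT INTO Doctor VALUES (10, 'Dr. Lee', 1)")
conn.execute("INSERT INTO Patient VALUES (100, 'Sam', '1990-04-01')")
conn.execute("INSERT INTO Treats VALUES (10, 100, '2024-09-01')")

rejected = False
try:
    # Patient 999 does not exist, so the foreign key should reject this row.
    conn.execute("INSERT INTO Treats VALUES (10, 999, '2024-09-02')")
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```

The composite primary key on Treats also prevents recording the same doctor-patient pair twice, which mirrors the set semantics of a relationship in the ER diagram.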
• Populate the database (4 pt) with data that is synthetically generated, scraped from the web, or integrated from open data. This should be done through a sequence of SQL INSERT statements.
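For the synthetic-data option, one possible approach is a small script that generates the INSERT statements. The sketch below assumes the hypothetical Patient(patient_id, name, birth_date) table and made-up name lists; the key details are a fixed random seed (so the data set is reproducible) and doubling single quotes inside string literals (so a name like O'Neil does not break the SQL).

```python
import random

# Hypothetical value pools for synthetic rows.
FIRST = ["Ana", "Ben", "Chloe", "Dev", "O'Neil"]
LAST = ["Smith", "Garcia", "Nguyen", "Patel"]

def sql_literal(value):
    """Render a Python value as a SQL literal; strings get single quotes doubled."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def synthetic_patient_inserts(n, seed=412):
    """Return n INSERT statements for the hypothetical Patient table."""
    rng = random.Random(seed)  # fixed seed -> same data every run
    statements = []
    for patient_id in range(1, n + 1):
        name = f"{rng.choice(FIRST)} {rng.choice(LAST)}"
        birth = f"{rng.randint(1940, 2010)}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
        values = ", ".join(sql_literal(v) for v in (patient_id, name, birth))
        statements.append(
            f"INSERT INTO Patient (patient_id, name, birth_date) VALUES ({values});"
        )
    return statements

script = "\n".join(synthetic_patient_inserts(5))
print(script)
```

Writing `script` to a file such as inserts.sql lets you load it with `psql -d <your_db> -f inserts.sql`; the database name is a placeholder.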
• SQL Queries (4 pt): Prepare examples of SQL queries that cover the application description (as described in the submitted proposal). The SQL queries should include multiple SELECT queries and INSERT/UPDATE/DELETE queries.
• Each group will have to submit a report as well as demonstrate the
aforementioned tasks to the instructor (and/or the TA) using a
PostgreSQL console. Note that this phase doesn’t require any GUI
demonstration.
Final-Report Phase (10 pts) Due Dec 4th
• 6 pt will be used to evaluate how well the final interface matches the application requirements and database schema you created in Phases 1 and 2.
• 2 pt will be used to evaluate how intuitive and clean your interface is. These 2 points will be based on the judgment of both the instructor and the TA.
• 2 pt will be used to evaluate how good your final application is. However, this time your peers (classmates) will rank all projects, and these 2 points will be based on the overall ranking.
• Give a five-minute video demonstration as a video clip (screen capture), upload it to YouTube, and provide the YouTube link in the report. In the demonstration, go through as much of the proposed application functionality as possible.
• Turn in the final code as well as a database dump.
• Turn in a user manual, with screenshots, that explains how to use the application.
• Peer review is due on Dec 7th and will be automatically assigned in Canvas. For each assigned review, the reviewer is expected to give a score from 0 to 10 as the overall rating for the demo. The review must list (1) the rating (0-10), (2) at least one strength, (3) at least one weakness, and (4) at least one suggestion for the project, and must be submitted as a comment in the Canvas Peer Review platform.
Warning: You fail if you miss any assignment or score lower than 50% on an exam.
Please refer to the Syllabus for more details. The due dates may be subject to adjustment.
A History of Database Systems
1961: “We choose to go to the Moon”
• The Apollo program needed an automated system to manage purchasing information for a large number of rocket parts, and IBM took the job.
Why doesn't a flat-file strawman work?
• Why not simply write all of the information into a flat file that sits on a local disk?
• Would anyone like to volunteer their opinion?
• Integrity: when multiple parties modify the file, how do we ensure integrity and correctness?
• Implementation: each time we need to read, search, or update information in the file, do we have to implement an application from scratch?
• Durability: how do we handle crashes and failures?
• Security: how do we secure access to different information in the file for different users, i.e., how do we enforce access control policies for different users?
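To make the integrity problem concrete, here is a deterministic simulation of a lost update on a flat file. The file name, part, and quantities are hypothetical: two clerks each read the whole file, each subtracts the bolts they shipped, and the second write silently overwrites the first. The sketch only shows the failure; preventing it is the job of a DBMS's concurrency control.

```python
import csv
import os
import tempfile

def write_inventory(path, inventory):
    """Overwrite the flat file with one 'part,quantity' row per part."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for part, qty in inventory.items():
            writer.writerow([part, qty])

def read_inventory(path):
    """Load the whole flat file into a dict."""
    with open(path, newline="") as f:
        return {part: int(qty) for part, qty in csv.reader(f)}

def lost_update_demo():
    path = os.path.join(tempfile.mkdtemp(), "parts.csv")  # hypothetical file
    write_inventory(path, {"bolt": 100})

    # Both clerks read the file before either one writes it back.
    clerk_a = read_inventory(path)
    clerk_b = read_inventory(path)

    clerk_a["bolt"] -= 30  # clerk A ships 30 bolts
    clerk_b["bolt"] -= 50  # clerk B ships 50 bolts

    write_inventory(path, clerk_a)
    write_inventory(path, clerk_b)  # silently discards clerk A's change

    return read_inventory(path)["bolt"]

print(lost_update_demo())  # 50, although 100 - 30 - 50 = 20
```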
1966: IBM Information Management System (IMS)
• IBM designed IMS with Rockwell and Caterpillar
• To inventory the very large bill of materials and keep track of purchase orders for the Saturn V moon rocket and Apollo space vehicle
• Hierarchical data model
Hierarchical Data Model
• It stores data as a tree: parent nodes contain pointers to their child nodes.
[Figure: hierarchical schema and instances]
• However, this approach does not avoid redundancy: information shared by several instances has to be stored repeatedly.
[Figure: hierarchical schema and instances, with redundant data highlighted]
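A minimal Python sketch of the redundancy problem, using hypothetical Supplier (parent) and Part (child) records: a part supplied by two suppliers must appear under both parents in the tree, so updating one copy leaves the other stale. Plain object references stand in for IMS pointers; this illustrates the data model, not how IMS actually lays out records.

```python
from dataclasses import dataclass, field

@dataclass
class Part:  # child record
    pno: int
    name: str
    color: str

@dataclass
class Supplier:  # parent record holding pointers to its children
    sno: int
    name: str
    parts: list = field(default_factory=list)

# Part 999 is supplied by both suppliers, so the tree stores it twice.
acme = Supplier(1001, "Acme", [Part(999, "batteries", "black")])
apex = Supplier(1002, "Apex", [Part(999, "batteries", "black")])
root = [acme, apex]

# Updating the part under one parent does not touch the other copy.
acme.parts[0].color = "white"

colors = {s.name: [p.color for p in s.parts if p.pno == 999] for s in root}
print(colors)  # {'Acme': ['white'], 'Apex': ['black']}
```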
Network Data Model (CODASYL)
• Complex queries
• Easily corrupted
1970s: Relational Model
• Edgar Frank Codd was a mathematician working at IBM Research. He saw developers express a query like “Find the employees who earn more than their managers” in CODASYL as five pages of complex code that navigated through a labyrinth of pointers.
• He proposed the relational model as a new database abstraction. It is still in wide use today and won him the Turing Award in 1981.
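For contrast with the five pages of pointer-chasing code, here is how Codd's example query looks in the relational model: a single self-join. The Employees table, its columns, and the rows are hypothetical, and the query runs on Python's built-in sqlite3 only to keep the sketch self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        eid        INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        salary     INTEGER NOT NULL,
        manager_id INTEGER REFERENCES Employees(eid)
    );
    INSERT INTO Employees VALUES
        (1, 'Ada',  200, NULL),
        (2, 'Bob',  150, 1),
        (3, 'Cleo', 250, 1),
        (4, 'Dan',  120, 2),
        (5, 'Eve',  160, 2);
""")

# One declarative statement: join Employees with itself on the manager link.
rows = conn.execute("""
    SELECT e.name
    FROM Employees AS e
    JOIN Employees AS m ON e.manager_id = m.eid
    WHERE e.salary > m.salary
    ORDER BY e.name
""").fetchall()

names = [name for (name,) in rows]
print(names)  # ['Cleo', 'Eve']
```

The query states only what to find; how to walk the manager links is left to the DBMS.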
Relational Data Model: Conceptual Schema
Relational Data Model: Relations
In each relation, a set of attributes is a key if it uniquely and minimally identifies a tuple: no two tuples agree on all of those attributes, and no proper subset of them has this property.
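The two conditions in the key definition (uniqueness and minimality) can be checked mechanically against a sample of tuples. This is a minimal sketch with hypothetical Supplies rows. A key is really a constraint declared on the schema, so sample data can only refute a candidate key or be consistent with it; it cannot prove one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on all of attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """Key = unique attribute set none of whose proper subsets is unique."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical Supplies tuples.
supplies = [
    {"sno": 1001, "pno": 999, "qty": 20},
    {"sno": 1001, "pno": 998, "qty": 5},
    {"sno": 1002, "pno": 999, "qty": 20},
]

print(is_key(supplies, ("sno", "pno")))         # True: unique and minimal
print(is_key(supplies, ("sno",)))               # False: 1001 repeats
print(is_key(supplies, ("sno", "pno", "qty")))  # False: unique but not minimal
```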
Queries in Relational Model
• Examples:
• Give me the names of all suppliers whose qty for at least one part is larger than 10
• Give me the names and states of all suppliers whose price for the part with pno=999 is lower than 100
• Give me the names of all parts that have more than one supplier
• Insert one supplier whose name is XXXX, city is XXXX, and state is XXXX
• Remove the supplier whose sno is 1001, and remove all part information related to this supplier
• Update the price for sno=1001 and pno=999 to 200
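The six example requests above can be written as SQL roughly as follows. The schema is an assumption, since the original slide figure is not available: Suppliers(sno, name, city, state), Parts(pno, name), and Supplies(sno, pno, qty, price). The sample rows and the inserted supplier's values are hypothetical stand-ins for the XXXX placeholders. Python's built-in sqlite3 is used only to make the sketch runnable; the SQL itself is standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Suppliers (sno INTEGER PRIMARY KEY, name TEXT, city TEXT, state TEXT);
    CREATE TABLE Parts     (pno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Supplies  (
        sno   INTEGER REFERENCES Suppliers(sno),
        pno   INTEGER REFERENCES Parts(pno),
        qty   INTEGER,
        price INTEGER,
        PRIMARY KEY (sno, pno)
    );
    INSERT INTO Suppliers VALUES (1001, 'Acme', 'Tempe', 'AZ'), (1002, 'Apex', 'Austin', 'TX');
    INSERT INTO Parts     VALUES (999, 'bolt'), (998, 'nut');
    INSERT INTO Supplies  VALUES (1001, 999, 20, 80), (1001, 998, 5, 10), (1002, 999, 8, 120);
""")

# 1. Suppliers with qty > 10 for at least one part (DISTINCT: one row per supplier).
q1 = [n for (n,) in conn.execute("""
    SELECT DISTINCT s.name FROM Suppliers s JOIN Supplies sp ON s.sno = sp.sno
    WHERE sp.qty > 10""")]

# 2. Name and state of suppliers whose price for part 999 is below 100.
q2 = conn.execute("""
    SELECT s.name, s.state FROM Suppliers s JOIN Supplies sp ON s.sno = sp.sno
    WHERE sp.pno = 999 AND sp.price < 100""").fetchall()

# 3. Parts with more than one supplier.
q3 = [n for (n,) in conn.execute("""
    SELECT p.name FROM Parts p JOIN Supplies sp ON p.pno = sp.pno
    GROUP BY p.pno, p.name HAVING COUNT(*) > 1""")]

with conn:  # commit the modifications together as one transaction
    # 4. Insert a supplier ('?' parameters avoid manual quoting).
    conn.execute("INSERT INTO Suppliers (sno, name, city, state) VALUES (?, ?, ?, ?)",
                 (1003, "Zenith", "Phoenix", "AZ"))
    # 6. Run the update before the delete so it still has a row to change.
    conn.execute("UPDATE Supplies SET price = 200 WHERE sno = 1001 AND pno = 999")
    # 5. Remove supplier 1001 and its Supplies rows (child rows first).
    conn.execute("DELETE FROM Supplies WHERE sno = 1001")
    conn.execute("DELETE FROM Suppliers WHERE sno = 1001")

print(q1, q2, q3)  # ['Acme'] [('Acme', 'AZ')] ['bolt']
print(conn.execute("SELECT * FROM Supplies").fetchall())  # [(1002, 999, 8, 120)]
```

In PostgreSQL, declaring the foreign key with ON DELETE CASCADE is an alternative to deleting the Supplies rows explicitly.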
• Query Language
• Relational Calculus (a formal language based on mathematical logic)
• Relational Algebra (based on a collection of operators for manipulating relations)
• SQL (Structured Query Language, a practical query language based on relational algebra)
• Data Manipulation Language (DML)
• Insert, delete, update, and query relations
• The query language is part of DML
• Data Definition Language (DDL)
• Define relations and schemas
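To show what "a collection of operators for manipulating relations" means, here is a minimal sketch of three relational algebra operators (selection σ, projection π, natural join ⋈) over relations represented as lists of dicts. The sample Suppliers and Supplies tuples are hypothetical. Note that projection removes duplicates because relational algebra uses set semantics, which is why the equivalent SQL needs DISTINCT.

```python
def select(relation, predicate):
    """σ (selection): keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """π (projection): keep only attrs and drop duplicate tuples (set semantics)."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result

def natural_join(r, s):
    """⋈ (natural join): pair tuples that agree on every shared attribute name."""
    joined = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                joined.append({**t, **u})
    return joined

# Hypothetical sample relations (a subset of the Suppliers/Supplies attributes).
suppliers = [{"sno": 1001, "name": "Acme"}, {"sno": 1002, "name": "Apex"}]
supplies = [
    {"sno": 1001, "pno": 999, "qty": 20},
    {"sno": 1001, "pno": 998, "qty": 15},
    {"sno": 1002, "pno": 999, "qty": 8},
]

# π_name( σ_qty>10 ( Suppliers ⋈ Supplies ) )
result = project(select(natural_join(suppliers, supplies), lambda t: t["qty"] > 10), ["name"])
print(result)  # [{'name': 'Acme'}]
```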
Example
• Give me the names of all suppliers whose qty for at least one part is larger than 10
SELECT DISTINCT Suppliers.name
FROM Suppliers, Supplies
WHERE Suppliers.sno = Supplies.sno AND Supplies.qty > 10;
Relational Data Model: Physical Schema
Relational Data Model
• Main concept: the relation, basically a table with rows and columns
• Every relation has a conceptual schema, which describes its logical structure (columns, primary keys, foreign keys, etc.)
• Every relation also has a physical schema, which describes how the relation is stored (sorting, indexing, partitioning, column/row storage)
• Every relation can have many external schemas (views)
[Figure: layers labeled Queries, Tables, and Storage]
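An external schema (view) can be sketched in a few lines. The Suppliers table and rows are hypothetical, and Python's built-in sqlite3 stands in for PostgreSQL so the example runs on its own. The view is a stored query over the base table: its users see only the columns and rows it exposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema: the base table.
    CREATE TABLE Suppliers (sno INTEGER PRIMARY KEY, name TEXT, city TEXT, state TEXT);
    INSERT INTO Suppliers VALUES
        (1001, 'Acme',   'Tempe',   'AZ'),
        (1002, 'Apex',   'Austin',  'TX'),
        (1003, 'Zenith', 'Phoenix', 'AZ');

    -- External schema: a view exposing only Arizona suppliers' names and cities.
    CREATE VIEW ArizonaSuppliers AS
        SELECT name, city FROM Suppliers WHERE state = 'AZ';
""")

# Users of the view query it like a table; they never see sno or state.
az = conn.execute("SELECT name, city FROM ArizonaSuppliers ORDER BY name").fetchall()
print(az)  # [('Acme', 'Tempe'), ('Zenith', 'Phoenix')]
```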
Benefits of Relational Model: Data Independence
• A query only specifies what to compute, not how to compute it.
• The database management system (DBMS) is responsible for parsing the query, optimizing the query execution plan, and executing the query.
• The logical schema is decoupled from the physical schema
• Changing the physical schema does not require any changes to the logical schema or to the applications (queries)
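Data independence can be observed directly: below, a physical-schema change (adding an index) is made while the query text stays the same, and the answer does not change. The table, data, and index name are hypothetical, and SQLite via Python's sqlite3 stands in for PostgreSQL. SQLite's planner may or may not choose the index for a table this small, so the printed plans are for inspection only; the point is that the result is identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplies (sno INTEGER, pno INTEGER, qty INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO Supplies VALUES (?, ?, ?, ?)",
    [(sno, pno, (sno * pno) % 50, 100 + pno % 7)
     for sno in range(1000, 1020) for pno in range(990, 1000)],
)

QUERY = "SELECT sno, pno FROM Supplies WHERE qty > 40 ORDER BY sno, pno"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before_rows, before_plan = conn.execute(QUERY).fetchall(), plan(QUERY)

# Physical-schema change only: add an index. The query text is untouched.
conn.execute("CREATE INDEX supplies_qty ON Supplies(qty)")

after_rows, after_plan = conn.execute(QUERY).fetchall(), plan(QUERY)

print(before_rows == after_rows)  # True: same answer
print(before_plan)
print(after_plan)
```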
Then there were years of debate about which data model was better, CODASYL or relational:
Relational camp:
• CODASYL is too complex!
• Set-oriented queries in CODASYL are too difficult to develop!
• CODASYL lacks a theoretical foundation!
CODASYL camp:
• The relational model is hard to implement!
• Most applications can work with tuple-at-a-time queries.
• Set-oriented queries are not popular.
Early implementations of relational DBMS
1990s: Boring Days
• No major advancements in database systems or application
workloads
• Microsoft creates SQL Server based on Sybase
• MySQL, an open-source database, is written
• Postgres gets SQL support
• SQLite started in early 2000
2000s: Internet Boom
• All the big players were heavyweight
and expensive
• Open-source databases were missing
important features
• Companies wrote their own middleware to scale out databases across single-node DBMS instances
2000s: NoSQL
• Focus on high availability and high scalability:
• Schemaless
• Non-relational data models: document, key/value, etc.
2000s: Data Warehouses
• Rise of special-purpose OLAP DBMSs
• Distributed/Shared-Nothing
• Relational/SQL
• Closed-source
• Column-storage
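A conceptual sketch of why OLAP systems favor column storage, using a hypothetical three-row orders table. In row storage each record is kept together, so an aggregate over one attribute still walks every whole record; in column storage each attribute lives in its own array, so the aggregate touches only the array it needs. Python lists do not model disk pages or compression, so this shows only the access pattern, not real performance.

```python
# Row storage: each record kept together (good for reading or updating one order).
rows = [
    (1, "AZ", 120.0),  # (order_id, state, amount)
    (2, "TX", 80.0),
    (3, "AZ", 45.5),
]

# Column storage: each attribute kept in its own array.
columns = {
    "order_id": [1, 2, 3],
    "state":    ["AZ", "TX", "AZ"],
    "amount":   [120.0, 80.0, 45.5],
}

def total_row_store(rows):
    # Walks every whole record even though only 'amount' is needed.
    return sum(amount for _, _, amount in rows)

def total_column_store(columns):
    # Reads only the one array the aggregate needs.
    return sum(columns["amount"])

print(total_row_store(rows), total_column_store(columns))  # 245.5 245.5
```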
2010s: Big Data/IoT Era
• Numerous specialized database systems
• NewSQL
• Hybrid Transactional-Analytical Processing
• Cloud database
• Shared-disk database (on distributed storage)
• Graph database
• Timeseries database
• Embedded DBMSs
• Multi-model DBMSs
• Blockchain DBMSs
• Hardware Acceleration
Today: AI + Database
• Databases with a natural language interface?
• Databases for retrieving context for large language model (LLM) tasks?
https://dbdb.io
• Embedded DBMSs, multi-model DBMSs, blockchain systems, hardware acceleration
A lot of systems are still extending SQL interfaces and relational models. Even NoSQL DBMSs use a lot of RDBMS ideas.
TO DO List
1. Reading
5. Please think about what project you want to do, write it down (no need to submit it to me), and send it to your team members once the groups are announced.
Next Class
• We will learn the basics of relational algebra, the foundation of SQL, and ER diagrams, which you need for Phase 1 of your group project