Cse 412

This document outlines the syllabus for a database management course. It introduces the instructor, TAs, course agenda, expected learning outcomes, logistics, textbooks, grading policy, assignments including a group project with multiple phases, and project requirements.

Uploaded by

Faruk Karagoz
Copyright
© All Rights Reserved

CSE 412 Database Management

Jia Zou
Arizona State University

1
Instruction Team
• Jia Zou (Instructor)
• Assistant Professor, joined ASU in fall 2019
• Six years of industrial experience at IBM Research
• Office hours: Wed 10am-11am on Zoom or in Room 414
https://asu.zoom.us/j/2534367426
• Will also answer questions after each lecture

2
Graduate TAs
• Saif Masood (TA)
• Email: [email protected]
• Office hours: Wed 4-5pm
• Soham Nag (TA)
• Email: [email protected]
• Office hours: Thu 11am-12pm

You can find the Zoom link for TA office hours on the Canvas course homepage.

3
Agenda
• Why do you need to learn about databases?
• Course logistics
• A brief history of database systems

4
Discussion
• Would anyone like to share their learning goals for CSE 412?
• Why do you want to learn about database systems?
• What do you expect to have learned by the end of the semester?

7
Database is everywhere

Facebook uses MySQL as the primary database management system for all
structured data storage, such as posts, user information, timelines,
and so on.

8
Database is everywhere

Instagram uses PostgreSQL for handling users, media, friendships, etc.

9
Database is everywhere

WordPress uses MySQL as its database management system. MySQL is
responsible for managing user data, user meta, posts, comments, and so on.

10
Database is everywhere

11
Expected Learning Outcomes of this Course
• Understand “What” and “Why”
  • Relational algebra
  • SQL
  • Database design
  • Functional dependency
  • Data storage
  • Query execution/optimization/compilation
  • Transactions
• Know “How”
  • Set up a database on a computer server
  • Design and create efficient relational data models that fit the application
  • Load data into the database
  • Write SQL programs that query the database to read and analyze data
  • Build applications that access the database to retrieve relevant data and present it to
    end users
12
Course Logistics
Mon/Wed 12pm – 1:15pm
SCOB 210

13
Textbooks (Pick one that you like)
• [SKS] Database System Concepts, 7th edition: https://www.db-book.com/db7/

14
Grading Policy

Component                                            Count   Weight
Assignments                                          4       20%
Group Project                                        1       30%
Mid-term Exams                                       2       20%
Final Exam                                           1       25%
Participation (peer reviews, quizzes, questions)     -       5%

15
Grading Scheme
• A+: Grade >= 97%
• A: 97% > Grade >= 94%
• A-: 94% > Grade >= 89%
• B+: 89% > Grade >= 86%
• B: 86% > Grade >= 80%
• B-: 80% > Grade >= 77%
• C+: 77% > Grade >= 73%
• C: 73% > Grade >= 70%
• D: 70% > Grade >= 60%
• E: Grade < 60%

The final grade may not be based solely on the absolute percentage: the
mean and standard deviation of the class will be considered to curve the
final grade when needed.
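
The weights and cutoffs above can be combined into a quick self-check of where a set of scores would land. This is a minimal sketch, not the official grading procedure: it ignores any curve and the automatic-fail rules stated elsewhere in the syllabus, and the sample scores are hypothetical.

```python
# Weights from the Grading Policy slide (fractions of the total grade).
WEIGHTS = {
    "assignments": 0.20,
    "project": 0.30,
    "midterms": 0.20,
    "final": 0.25,
    "participation": 0.05,
}

# Lower bounds from the Grading Scheme slide, highest first.
CUTOFFS = [
    (97, "A+"), (94, "A"), (89, "A-"),
    (86, "B+"), (80, "B"), (77, "B-"),
    (73, "C+"), (70, "C"), (60, "D"),
]


def weighted_total(scores):
    """Combine per-component percentages (0-100) into one percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a percentage to a letter, before any curve is applied."""
    for lower_bound, letter in CUTOFFS:
        if total >= lower_bound:
            return letter
    return "E"


# Hypothetical student scores, for illustration only.
example = {"assignments": 90, "project": 85, "midterms": 80,
           "final": 88, "participation": 100}
total = weighted_total(example)
print(round(total, 2), letter_grade(total))
```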

16
Assignments (20%)
• There will be four assignments that constitute 20% of the total grade.
• All assignments should be submitted to GradeScope.
• All deadlines are hard deadlines, and no extension is allowed without
acceptable proof of a university-approved excuse.

17
Group Project (30%)
• There will be a group project that constitutes 30% of the total grade.
• The TA will release the spreadsheet for forming groups.
• Groups should be finalized by the end of Sept 6th (two weeks from
now).
• Three to four students in one group
• The project consists of three phases.

18
Proposal Phase (8pt) Due Sept 18th
• First select an information management application that you want to
develop (not limited to the following)
• https://github.com/topics/dbms-project
• https://github.com/yogeshk4/sales-and-inventory-management-system
• https://github.com/mayankpadhi/Hospital-Management
• https://github.com/chawat/BankingApp
• https://github.com/chiragchevli/Animal_Species_Repository_System
• https://github.com/dancancro/great-big-example-application
• https://github.com/dbs/postgresql-full-text-search-engine
• https://github.com/dushan14/books-store
• https://github.com/jai-singhal/croma
• https://github.com/carlalmadureira/Django-PostgreSQL-Application
19
Proposal Phase (8pt) Due Sept 18th

• Detailed Application Requirements (in plain English) (1pt): This
explains the application you chose in detail, including how
different objects/users interact with each other.

20
Proposal Phase (8pt) Due Sept 18th
• ER Diagram (5pt) : You will have to turn in a complete ER diagram
that models your application entities and relationships.

21
Proposal Phase (8pt) Due Sept 18th
• Implementation Plans (in plain English) (0.5 pt): One or two
paragraphs to explain your project implementation plans. Every
group must use PostgreSQL as the main DBMS for its project.
However, each group needs to explain what kind of application
development technology it plans to use (PHP/JSP for a web app,
an iOS/Android app, …). That choice is entirely yours, and you can
select whatever technology you prefer.

22
Proposal Phase (8pt) Due Sept 18th
• Presentation (1.5 pt): Each group must give a five-minute video
presentation explaining its application and ER diagram. Record the
presentation as a video clip (screen capture), upload it to YouTube,
and provide the YouTube link in the report.

23
Proposal Phase (8pt) Due Sept 18th
• Peer reviews, which will be automatically assigned in Canvas, are due
on Sept 25th. For each assigned review, the reviewer is expected to
give a score between 0 and 10 as the overall rating for the demo. The
review must list (1) the rating (0-10), (2) at least one strength,
(3) at least one weakness, and (4) at least one suggestion for the
proposal, and must be submitted as a comment in the Canvas Peer
Review platform.

24
Midterm-Report Phase (12 pts) Due Oct 16th
• ER-to-Relational (4 pt): Transform the ER diagram into a relational
model using the SQL data definition language (DDL)
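
As a rough illustration of what an ER-to-relational translation can look like, the sketch below turns a hypothetical ER diagram (entities Supplier and Part, joined by a many-to-many "supplies" relationship) into DDL. The table and column names are assumptions made for this example, and SQLite is used only so the sketch runs without a server; the project itself requires PostgreSQL, where the same CREATE TABLE statements should work with little or no change.

```python
import sqlite3

# SQLite (in memory) is only a stand-in so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

# Each entity becomes a table; the many-to-many relationship becomes
# its own table whose key combines the keys of both entities.
conn.executescript("""
CREATE TABLE suppliers (
    sno   INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    city  TEXT,
    state TEXT
);
CREATE TABLE parts (
    pno   INTEGER PRIMARY KEY,
    name  TEXT NOT NULL
);
CREATE TABLE supplies (
    sno   INTEGER NOT NULL REFERENCES suppliers(sno),
    pno   INTEGER NOT NULL REFERENCES parts(pno),
    qty   INTEGER NOT NULL CHECK (qty >= 0),
    price NUMERIC NOT NULL,
    PRIMARY KEY (sno, pno)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The foreign keys are what encode the relationship edges of the diagram: a row in supplies cannot reference a supplier or part that does not exist.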

25
Midterm-Report Phase (12 pts) Due Oct 16th
• Fill in the database with data (synthetically generated, scraped from
web data, or integrated from open data) (4 pt): This should be done
through a sequence of SQL INSERT statements.
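
One possible way to produce such a sequence of INSERT statements for synthetic data is to generate them with a short script. The sketch below assumes a hypothetical suppliers table and made-up values; SQLite stands in for PostgreSQL only so the generated script can be replayed and checked here.

```python
import random
import sqlite3

random.seed(412)  # fixed seed so the synthetic data is reproducible

# Hypothetical table used only for this illustration.
schema = "CREATE TABLE suppliers (sno INTEGER PRIMARY KEY, name TEXT, state TEXT);"
states = ["AZ", "CA", "NM", "TX"]

# Build one INSERT statement per synthetic row. The resulting text could be
# saved as a .sql file and replayed in a database console.
statements = [
    f"INSERT INTO suppliers (sno, name, state) "
    f"VALUES ({sno}, 'Supplier {sno}', '{random.choice(states)}');"
    for sno in range(1001, 1011)
]
script = "\n".join(statements)

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
conn.executescript(schema + "\n" + script)
row_count = conn.execute("SELECT COUNT(*) FROM suppliers").fetchone()[0]
print(statements[0])
print(row_count)
```

Pasting values into SQL text is acceptable here only because the script generates every value itself; data scraped from the web should be escaped or passed as query parameters instead.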

26
Midterm-Report Phase (12 pts) Due Oct 16th
• SQL Queries (4 pt): Prepare examples of SQL queries that cover the
application description (as described by the proposal submitted). The
SQL queries should include multiple SELECT queries and
INSERT/UPDATE/DELETE queries.

27
Midterm-Report Phase (12 pts) Due Oct 16th
• Each group will have to submit a report as well as demonstrate the
aforementioned tasks to the instructor (and/or the TA) using a
PostgreSQL console. Note that this phase doesn’t require any GUI
demonstration.

• This will be arranged on the weekend of Oct 22/Oct 23

Please let me know the times of games, so that you can enjoy both!

28
Final-Report Phase (10 pts) Due Dec 4th
• 6pt will be used to evaluate how well the final interface matches the
application requirements and database schema you created in Phases
1 and 2.
• 2pt will be used to evaluate how intuitive and neat your interface
is. These 2 points will be based on the judgement of both the
instructor and the TA.
• 2pt will be used to evaluate how good your final application is.
This time, however, your peers (classmates) will rank all
projects, and the 2 points will be based on the overall ranking.

29
Final-Report Phase (10 pts) Due Dec 4th
• You need to give a five-minute video demonstration as a video clip
(screen capture), upload it to YouTube, and provide the YouTube link in
the report. In the demonstration you should go through as much
of the proposed application functionality as possible.
• Turn in the final code as well as a database dump.
• Turn in a user manual (with screenshots) that guides the user through
the application.

30
Final-Report Phase (10 pts) Due Dec 4th
• Peer reviews, which will be automatically assigned in Canvas, are due
on Dec 7th. For each assigned review, the reviewer is expected to
give a score between 0 and 10 as the overall rating for the demo. The
review must list (1) the rating (0-10), (2) at least one strength,
(3) at least one weakness, and (4) at least one suggestion for the
proposal, and must be submitted as a comment in the Canvas Peer
Review platform.

31
Examples
• https://github.com/topics/dbms-project
• https://github.com/yogeshk4/sales-and-inventory-management-system
• https://github.com/mayankpadhi/Hospital-Management
• https://github.com/chawat/BankingApp
• https://github.com/chiragchevli/Animal_Species_Repository_System
• https://github.com/dancancro/great-big-example-application
• https://github.com/dbs/postgresql-full-text-search-engine
• https://github.com/dushan14/books-store
• https://github.com/jai-singhal/croma
• https://github.com/carlalmadureira/Django-PostgreSQL-Application

More examples will be posted on Canvas

32


Grading Appeal
• Any questions, corrections, or appeals on grades of programs or tests
must be submitted in writing within one week after the work has been
returned to the class. State the problem and the rationale for any
change in your grade in your appeal.

Honor Codes
• For the 4 individual assignments, you should not copy other people’s
solutions or use solutions that you find on the internet
• For the group project, you should not use another group’s project, past
years’ projects for this or other courses, or projects you find on the
internet
• If we identify such plagiarism, we will not hesitate to fail
you in this course
• Generative AI is not allowed!

34
Warning: You fail if you miss any
assignment or score lower than 50% on
an exam

35
Please refer to the Syllabus for
more details. The due dates may be
subject to adjustment.
A History of Database Systems

37
1962: “We choose to go to the Moon”
• The Apollo program needed an automated system to manage the purchase
information for a large number of rocket parts, and IBM took the job.

39
Why doesn’t a flat-file strawman work?
• Why not simply write all of the information into a flat file that sits
on a local disk?
• Would anyone volunteer to share their opinion?

41
Why doesn’t a flat-file strawman work?
• Integrity: when multiple parties modify the file, how do we ensure
integrity and correctness?
• Implementation: each time we need to read, search, or update
information in the file, do we need to implement an application from
scratch?
• Durability: how do we handle crashes and failures?
• Security: how do we secure access to different information in the
file for different users, that is, how do we enforce access control
policies?

42
1966: IBM Information Management System
(IMS)
• IBM designed IMS with Rockwell and Caterpillar
• To inventory the very large bill of materials and keep track of
purchase orders for the Saturn V moon rocket and Apollo space
vehicle
• Hierarchical data model

43
Hierarchical Data Model
• It stores data as a tree, so parent nodes contain pointers to child nodes

[Figure: schema and instances]

44
Hierarchical Data Model
• However, this approach doesn’t consider how to avoid redundant information
shared by instances.
[Figure: schema and instances, with the redundancy highlighted]

45
Hierarchical Data Model

[Figure: two alternative hierarchies for the same data]

Lacks data independence: performance depends on the choice of hierarchy
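
Both weaknesses of the hierarchical model can be seen in a toy version built from nested Python dictionaries. This is only an illustrative sketch with made-up suppliers and parts; it is not how IMS stores data.

```python
# Hypothetical data: two suppliers that both sell the same part.
# Hierarchy 1: suppliers are parents, parts are children.
by_supplier = {
    "Acme": {"city": "Tempe", "parts": [
        {"pno": 999, "name": "bolt", "price": 90},
    ]},
    "Bolt Co": {"city": "Mesa", "parts": [
        {"pno": 999, "name": "bolt", "price": 95},
        {"pno": 1000, "name": "nut", "price": 5},
    ]},
}

# Redundancy: the part name "bolt" is stored once per supplier that sells it,
# so renaming the part means touching every copy.
copies_of_bolt = sum(
    1 for s in by_supplier.values() for p in s["parts"] if p["pno"] == 999
)


def suppliers_of(pno):
    """'Who supplies this part?' must scan every supplier's children."""
    return sorted(
        name for name, s in by_supplier.items()
        if any(p["pno"] == pno for p in s["parts"])
    )


# Hierarchy 2: parts are parents. Now the same question is one lookup,
# but supplier details (such as city) would become the duplicated data.
by_part = {
    999: {"name": "bolt", "suppliers": ["Acme", "Bolt Co"]},
    1000: {"name": "nut", "suppliers": ["Bolt Co"]},
}

print(copies_of_bolt)
print(suppliers_of(999), by_part[999]["suppliers"])
```

Whichever entity is chosen as the root, one kind of question is cheap and the other requires a full traversal, which is why application code ends up tied to the chosen hierarchy.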


46
1969: CODASYL Data Model
• Based on GE’s Integrated Data Store (IDS), released
in 1964
• Advocates for a network data model
• IDS’ developer Charles Bachman won the Turing
Award in 1973

47
Network Data Model

48
Network Data Model

49
Network Data Model

• Complex queries
• Easily corrupted

50
1970s: Relational Model
• Edgar Frank Codd was a mathematician working
at IBM Research. He saw developers express a
query like “Find the employees who earn more
than their managers” in CODASYL using complex
code, five pages long, that navigated through
a labyrinth of pointers.
• He proposed the relational model as a new database
abstraction, which is still in wide use today and
won him the Turing Award in 1981
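
For comparison, the same question fits in a few lines of SQL once the data is relational. The sketch below assumes a hypothetical employees table in which each row stores the employee number of its manager; the names and salaries are made up, and SQLite is used only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    eno INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER NOT NULL,
    manager_eno INTEGER REFERENCES employees(eno)
);
INSERT INTO employees VALUES (1, 'Ada', 150, NULL);
INSERT INTO employees VALUES (2, 'Bob', 120, 1);
INSERT INTO employees VALUES (3, 'Cy', 130, 2);
INSERT INTO employees VALUES (4, 'Dee', 100, 2);
""")

# Self-join: pair each employee row (e) with its manager's row (m),
# then keep the pairs where the employee earns more.
rows = conn.execute("""
SELECT e.name
FROM employees AS e
JOIN employees AS m ON e.manager_eno = m.eno
WHERE e.salary > m.salary
ORDER BY e.name
""").fetchall()
print(rows)
```

The query states the condition to satisfy and says nothing about which pointers to follow, which is the contrast with the navigational CODASYL code.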

51
Relational Data Model: Conceptual Schema

52
Relational Data Model: Relations

• A database is a collection of relations (or tables)
• Each relation has a set of attributes (or columns)
• Each attribute has a name and a domain (or type)
• Each relation contains a set of tuples (or rows)

In each relation, a set of attributes is a key if it minimally and
uniquely identifies a tuple.
53
Queries in Relational Model
• Examples:
• Give me the names of all Suppliers whose qty for at least one part is larger
than 10
• Give me the names and states of all Suppliers whose price for the part with
pno=999 is lower than 100
• Give me the names of all parts that have more than one supplier
• Insert one Supplier, whose name is XXXX, city is XXXX, state is XXXX
• Remove the Supplier whose sno is 1001, and remove all part information
related to this Supplier
• Update the price for sno=1001 and pno=999 to 200
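
The requests above can be written in SQL roughly as follows. The schema is an assumption reconstructed from the attribute names in the examples (sno, pno, qty, price, name, city, state), the rows are made up, and SQLite stands in for a full DBMS so the sketch is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE suppliers (sno INTEGER PRIMARY KEY, name TEXT, city TEXT, state TEXT);
CREATE TABLE parts (pno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE supplies (
    sno INTEGER REFERENCES suppliers(sno) ON DELETE CASCADE,
    pno INTEGER REFERENCES parts(pno),
    qty INTEGER, price NUMERIC,
    PRIMARY KEY (sno, pno)
);
INSERT INTO suppliers VALUES (1001, 'Acme', 'Tempe', 'AZ'), (1002, 'Bolt Co', 'Mesa', 'AZ');
INSERT INTO parts VALUES (998, 'nut'), (999, 'bolt');
INSERT INTO supplies VALUES (1001, 999, 20, 90), (1002, 999, 5, 120), (1002, 998, 50, 3);
""")

# Suppliers whose qty for at least one part is larger than 10.
q1 = [r[0] for r in conn.execute("""
    SELECT DISTINCT s.name FROM suppliers s
    JOIN supplies sp ON s.sno = sp.sno
    WHERE sp.qty > 10 ORDER BY s.name""")]
# Names and states of suppliers whose price for part 999 is below 100.
q2 = conn.execute("""
    SELECT s.name, s.state FROM suppliers s
    JOIN supplies sp ON s.sno = sp.sno
    WHERE sp.pno = 999 AND sp.price < 100""").fetchall()
# Parts that have more than one supplier.
q3 = [r[0] for r in conn.execute("""
    SELECT p.name FROM parts p
    JOIN supplies sp ON p.pno = sp.pno
    GROUP BY p.pno, p.name HAVING COUNT(*) > 1""")]

# Insert, update, then delete (the cascade removes the supplier's supplies rows).
conn.execute("INSERT INTO suppliers (sno, name, city, state) VALUES (?, ?, ?, ?)",
             (1003, "XXXX", "XXXX", "XXXX"))
conn.execute("UPDATE supplies SET price = 200 WHERE sno = 1001 AND pno = 999")
new_price = conn.execute(
    "SELECT price FROM supplies WHERE sno = 1001 AND pno = 999").fetchone()[0]
conn.execute("DELETE FROM suppliers WHERE sno = 1001")
left = conn.execute("SELECT COUNT(*) FROM supplies WHERE sno = 1001").fetchone()[0]
print(q1, q2, q3, new_price, left)
```

Declaring ON DELETE CASCADE is one design choice for "remove all part information related to this Supplier"; an explicit DELETE on supplies before the DELETE on suppliers would work as well.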

54
Queries in Relational Model
• Query Language
• Relational Calculus (a formal language based on mathematical logic)
• Relational Algebra (based on a collection of operators for manipulating relations)
• SQL (Structured Query Language for Relational Algebra)
• Data Manipulation Language (DML)
• Insert, delete, update, query relations
• Query Language is part of DML
• Data Definition Language (DDL)
• Define relations and schemas

55
Example
• Give me the names of all Suppliers, whose qty for at least one part is
larger than 10

-- DISTINCT lists a supplier once even if several of its parts qualify
SELECT DISTINCT Suppliers.Name
FROM Suppliers, Supplies
WHERE Suppliers.sno = Supplies.sno AND Supplies.qty > 10

56
Relational Data Model: Physical Schema

• Store all relations as unsorted files
• Create an index on the Suppliers relation’s sno column, and on the
Supplies relation’s qty column

57
Relational Data Model
• Main concept: relation, basically a table with rows and
columns
• every relation has a conceptual schema,
which describes the logical structure
(columns, primary keys, foreign keys etc.)
• every relation also has a physical schema,
which describes how the relation is stored
(sorting, indexing, partitioning,
column/row storage)
• every relation can have many external
schemas (views)

[Figure: three layers, Queries / Tables / Storage]
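
An external schema can be sketched with an ordinary SQL view. In the example below, the suppliers table, its rows, and the view name az_suppliers are all hypothetical; the view exposes only some rows and columns of the underlying relation, and SQLite is used just to make the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (sno INTEGER PRIMARY KEY, name TEXT, city TEXT, state TEXT);
INSERT INTO suppliers VALUES (1001, 'Acme', 'Tempe', 'AZ'), (1002, 'Bolt Co', 'Austin', 'TX');

-- External schema: a view exposing only Arizona suppliers, without sno.
CREATE VIEW az_suppliers AS
    SELECT name, city FROM suppliers WHERE state = 'AZ';
""")

# Applications can query the view exactly like a table.
view_rows = conn.execute("SELECT * FROM az_suppliers").fetchall()
print(view_rows)
```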

58
Benefits of Relational Model: Data
Independence
• A query only specifies what to compute, not how to compute it.
• The database management system (DBMS) is responsible for parsing the
query, evaluating it, optimizing its execution plan, and executing
it.
• The logical schema is decoupled from the physical schema
• Changing the physical schema does not require any changes to the logical
schema or to the applications (queries)
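
Data independence can be observed directly: adding an index changes only the physical schema, so the query text and its result stay the same while the plan chosen by the DBMS is free to change. The sketch below uses a hypothetical supplies table with made-up rows and SQLite's EXPLAIN QUERY PLAN; PostgreSQL offers EXPLAIN for the same purpose, with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplies (sno INTEGER, pno INTEGER, qty INTEGER, price NUMERIC)")
conn.executemany("INSERT INTO supplies VALUES (?, ?, ?, ?)",
                 [(1000 + i, 999, i, 100 - i) for i in range(1, 21)])

query = "SELECT sno FROM supplies WHERE qty > 10 ORDER BY sno"


def plan(sql):
    """Return the textual steps SQLite reports for executing sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before_rows, before_plan = conn.execute(query).fetchall(), plan(query)

# Physical schema change only: add an index on qty.
conn.execute("CREATE INDEX supplies_qty_idx ON supplies (qty)")

after_rows, after_plan = conn.execute(query).fetchall(), plan(query)
print(before_plan, after_plan)
```

Whether the optimizer actually uses the new index depends on the engine and the data; the point of the sketch is that the application's query did not have to change either way.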

59
Then there were years of debate about which data model is better,
CODASYL or relational…

Relational camp:
• CODASYL is too complex!
• Set-oriented queries in CODASYL are too difficult to develop!
• CODASYL lacks a theoretical foundation!

CODASYL camp:
• The relational model is hard to implement!
• Most applications can work with a-tuple-at-a-time queries.
• Set-oriented queries are not popular.

60
Early implementations of relational DBMS
• System R – IBM Research (Jim Gray, 1998 Turing Award winner)
• INGRES – U.C. Berkeley (Michael Stonebraker, 2014 Turing Award winner)
• Oracle – Larry Ellison

[Photos: Jim Gray, Michael Stonebraker, Larry Ellison]

62


1980s: Relational Model Wins!
• “SEQUEL” becomes the standard
(SQL)
• Many new “enterprise” DBMSs,
but Oracle wins the marketplace

63
1990s: Boring Days
• No major advancements in database systems or application
workloads
• Microsoft creates SQL Server based on Sybase
• An open source database MySQL is written
• Postgres gets SQL support
• SQLite started in early 2000

65
2000s: Internet Boom
• All the big players were heavyweight
and expensive
• Open-source databases were missing
important features
• Companies wrote their own
middleware to scale out database
across single-node DBMS instances

66
2000s: NoSQL Systems
• Focus on high-availability & high-scalability:
  • Schemaless
  • Non-relational data model: document, key/value, etc.
  • No ACID transactions
  • Custom APIs instead of SQL
  • Usually open-source

67
2000s: Data Warehouses
• Rise of special purpose OLAP DBMSs
• Distributed/Shared-Nothing
• Relational/SQL
• Closed-source
• Column-storage

68
2010s: Big Data/IoT Era
• Numerous specialized database systems
• NewSQL
• Hybrid Transactional-Analytical Processing
• Cloud database
• Shared-disk database (on distributed storage)
• Graph database
• Timeseries database
• Embedded DBMSs
• Multi-model DBMSs
• Blockchain DBMSs
• Hardware Acceleration

69
Today: AI + Database
• Database with a Natural Language interface?
• Database for retrieving contexts for Large Language Model (LLM)
tasks?

70
https://dbdb.io

71
2010s: Specialized Systems
• Embedded DBMSs
• Multi-Model DBMSs
• Blockchain DBMSs
• Hardware Acceleration

A lot of systems are still extending SQL interfaces and relational models.
Even the NoSQL systems use a lot of RDBMS ideas.

(Slide from 15-721, Spring 2020)

72


Reading
• Required
• SKS 1.1~1.10
• Optional
• Stonebraker, Michael, and Joey Hellerstein.
"What goes around comes around." Readings
in database systems 4 (2005): 1724-1735.
• Making Databases Work

73
TO DO List
1. Reading

2. Take a look at example projects

3. Finish grouping in one week

4. Finish Assignment 0 in 1-2 weeks

5. Please think about what project you want to do, write it down (no need to submit it to
me), and send it to your team members once the groups are announced.

74
Next Class
• We will learn the basics of relational algebra, which is the foundation of
SQL, and ER diagrams, which you need for phase 1 of your group project

75
