First Course in Database Systems 3rd

Uploaded by

Juan Murillo
Copyright
© All Rights Reserved
Table of Contents

1 The Worlds of Database Systems
    1.1 The Evolution of Database Systems
        1.1.1 Early Database Management Systems
        1.1.2 Relational Database Systems
        1.1.3 Smaller and Smaller Systems
        1.1.4 Bigger and Bigger Systems
    1.2 The Architecture of a DBMS
        1.2.1 Overview of DBMS Components
        1.2.2 The Storage Manager
        1.2.3 The Query Manager
        1.2.4 The Transaction Manager
        1.2.5 Client-Server Architecture
    1.3 The Future of Database Systems
        1.3.1 Types, Classes, and Objects
        1.3.2 Constraints and Triggers
        1.3.3 Multimedia Data
        1.3.4 Data Integration
    1.4 Outline of the Book
        1.4.1 Design
        1.4.2 Programming
    1.5 Summary of Chapter 1
    1.6 References for Chapter 1

2 Database Modeling
    2.1 Introduction to ODL
        2.1.1 Object-Oriented Design
        2.1.2 Interface Declarations
        2.1.3 Attributes in ODL
        2.1.4 Relationships in ODL
        2.1.5 Inverse Relationships
        2.1.6 Multiplicity of Relationships
        2.1.7 Types in ODL
        2.1.8 Exercises for Section 2.1
    2.2 Entity-Relationship Diagrams
        2.2.1 Multiplicity of E/R Relationships
        2.2.2 Multiway Relationships
        2.2.3 Roles in Relationships
        2.2.4 Attributes on Relationships
        2.2.5 Converting Multiway Relationships to Binary
        2.2.6 Exercises for Section 2.2
    2.3 Design Principles
        2.3.1 Faithfulness
        2.3.2 Avoiding Redundancy
        2.3.3 Simplicity Counts
        2.3.4 Picking the Right Kind of Element
        2.3.5 Exercises for Section 2.3
    2.4 Subclasses
        2.4.1 Subclasses in ODL
        2.4.2 Multiple Inheritance in ODL
        2.4.3 Subclasses in Entity-Relationship Diagrams
        2.4.4 Inheritance in the E/R Model
        2.4.5 Exercises for Section 2.4
    2.5 The Modeling of Constraints
        2.5.1 Keys
        2.5.2 Declaring Keys in ODL
        2.5.3 Representing Keys in the E/R Model
        2.5.4 Single-Value Constraints
        2.5.5 Referential Integrity
        2.5.6 Referential Integrity in E/R Diagrams
        2.5.7 Other Kinds of Constraints
        2.5.8 Exercises for Section 2.5
    2.6 Weak Entity Sets
        2.6.1 Causes of Weak Entity Sets
        2.6.2 Requirements for Weak Entity Sets
        2.6.3 Weak Entity Set Notation
        2.6.4 Exercises for Section 2.6
    2.7 Models of Historical Interest
        2.7.1 The Network Model
        2.7.2 Representing Network Schemas
        2.7.3 The Hierarchical Model
        2.7.4 Exercises for Section 2.7
    2.8 Summary of Chapter 2
    2.9 References for Chapter 2

3 The Relational Data Model
    3.1 Basics of the Relational Model
        3.1.1 Attributes
        3.1.2 Schemas
        3.1.3 Tuples
        3.1.4 Domains
        3.1.5 Equivalent Representations of a Relation
        3.1.6 Relation Instances
        3.1.7 Exercises for Section 3.1
    3.2 From ODL Designs to Relational Designs
        3.2.1 From ODL Attributes to Relational Attributes
        3.2.2 Nonatomic Attributes in Classes
        3.2.3 Representing Other Type Constructors
        3.2.4 Representing Single-Valued Relationships
        3.2.5 Representing Multivalued Relationships
        3.2.6 What If There Is No Key?
        3.2.7 Representing a Relationship and Its Inverse
        3.2.8 Exercises for Section 3.2
    3.3 From E/R Diagrams to Relational Designs
        3.3.1 From Entity Sets to Relations
        3.3.2 From E/R Relationships to Relations
        3.3.3 Handling Weak Entity Sets
        3.3.4 Exercises for Section 3.3
    3.4 Converting Subclass Structures to Relations
        3.4.1 Relational Representation of ODL Subclasses
        3.4.2 Representing Isa in the Relational Model
        3.4.3 Comparison of Approaches
        3.4.4 Using Null Values to Combine Relations
        3.4.5 Exercises for Section 3.4
    3.5 Functional Dependencies
        3.5.1 Definition of Functional Dependency
        3.5.2 Keys of Relations
        3.5.3 Superkeys
        3.5.4 Discovering Keys for Relations
        3.5.5 Keys for Relations Derived from ODL
        3.5.6 Exercises for Section 3.5
    3.6 Rules About Functional Dependencies
        3.6.1 The Splitting/Combining Rule
        3.6.2 Trivial Dependencies
        3.6.3 Computing the Closure of Attributes
        3.6.4 The Transitive Rule
        3.6.5 Closing Sets of Functional Dependencies
        3.6.6 Exercises for Section 3.6
    3.7 Design of Relational Database Schemas
        3.7.1 Anomalies
        3.7.2 Decomposing Relations
        3.7.3 Boyce-Codd Normal Form
        3.7.4 Decomposition into BCNF
        3.7.5 Projecting Functional Dependencies
        3.7.6 Recovering Information from a Decomposition
        3.7.7 Third Normal Form
        3.7.8 Exercises for Section 3.7
    3.8 Multivalued Dependencies
        3.8.1 Attribute Independence and Its Consequent Redundancy
        3.8.2 Definition of Multivalued Dependencies
        3.8.3 Reasoning About Multivalued Dependencies
        3.8.4 Fourth Normal Form
        3.8.5 Decomposition into Fourth Normal Form
        3.8.6 Relationships Among Normal Forms
        3.8.7 Exercises for Section 3.8
    3.9 An Example Database Schema
    3.10 Summary of Chapter 3
    3.11 References for Chapter 3

4 Operations in the Relational Model
    4.1 An Algebra of Relational Operations
        4.1.1 Set Operations on Relations
        4.1.2 Projection
        4.1.3 Selection
        4.1.4 Cartesian Product
        4.1.5 Natural Joins
        4.1.6 Theta-Joins
        4.1.7 Combining Operations to Form Queries
        4.1.8 Renaming
        4.1.9 Dependent and Independent Operations
        4.1.10 Exercises for Section 4.1
    4.2 A Logic for Relations
        4.2.1 Predicates and Atoms
        4.2.2 Arithmetic Atoms
        4.2.3 Datalog Rules and Queries
        4.2.4 Meaning of Datalog Rules
        4.2.5 Extensional and Intensional Predicates
        4.2.6 Exercises for Section 4.2
    4.3 From Relational Algebra to Datalog
        4.3.1 Intersection
        4.3.2 Union
        4.3.3 Difference
        4.3.4 Projection
        4.3.5 Selection
        4.3.6 Product
        4.3.7 Joins
        4.3.8 Simulating Multiple Operations with Datalog
        4.3.9 Exercises for Section 4.3
    4.4 Recursive Programming in Datalog
        4.4.1 The Fixedpoint Operator
        4.4.2 Computing the Least Fixedpoint
        4.4.3 Fixedpoint Equations in Datalog
        4.4.4 Negation in Recursive Rules
        4.4.5 Exercises for Section 4.4
    4.5 Constraints on Relations
        4.5.1 Relational Algebra as a Constraint Language
        4.5.2 Referential Integrity Constraints
        4.5.3 Additional Constraint Examples
        4.5.4 Exercises for Section 4.5
    4.6 Relational Operations on Bags
        4.6.1 Why Bags?
        4.6.2 Union, Intersection, and Difference of Bags
        4.6.3 Projection of Bags
        4.6.4 Selection on Bags
        4.6.5 Product of Bags
        4.6.6 Joins of Bags
        4.6.7 Datalog Rules Applied to Bags
        4.6.8 Exercises for Section 4.6
    4.7 Other Extensions to the Relational Model
        4.7.1 Modifications
        4.7.2 Aggregations
        4.7.3 Views
        4.7.4 Null Values
    4.8 Summary of Chapter 4
    4.9 References for Chapter 4

5 The Database Language SQL
    5.1 Simple Queries in SQL
        5.1.1 Projection in SQL
        5.1.2 Selection in SQL
        5.1.3 Comparison of Strings
        5.1.4 Comparing Dates and Times
        5.1.5 Ordering the Output
        5.1.6 Exercises for Section 5.1
    5.2 Queries Involving More than One Relation
        5.2.1 Products and Joins in SQL
        5.2.2 Disambiguating Attributes
        5.2.3 Tuple Variables
        5.2.4 Interpreting Multirelation Queries
        5.2.5 Union, Intersection, and Difference of Queries
        5.2.6 Exercises for Section 5.2
    5.3 Subqueries
        5.3.1 Subqueries that Produce Scalar Values
        5.3.2 Conditions Involving Relations
        5.3.3 Conditions Involving Tuples
        5.3.4 Correlated Subqueries
        5.3.5 Exercises for Section 5.3
    5.4 Duplicates
        5.4.1 Eliminating Duplicates
        5.4.2 Duplicates in Unions, Intersections, and Differences
        5.4.3 Exercises for Section 5.4
    5.5 Aggregation
        5.5.1 Aggregation Operators
        5.5.2 Grouping
        5.5.3 HAVING Clauses
        5.5.4 Exercises for Section 5.5
    5.6 Database Modifications
        5.6.1 Insertion
        5.6.2 Deletion
        5.6.3 Updates
        5.6.4 Exercises for Section 5.6
    5.7 Defining a Relation Schema in SQL
        5.7.1 Data Types
        5.7.2 Simple Table Declarations
        5.7.3 Deleting Tables
        5.7.4 Modifying Relation Schemas
        5.7.5 Default Values
        5.7.6 Domains
        5.7.7 Indexes
        5.7.8 Exercises for Section 5.7
    5.8 View Definitions
        5.8.1 Declaring Views
        5.8.2 Querying Views
        5.8.3 Renaming Attributes
        5.8.4 Modifying Views
        5.8.5 Interpreting Queries Involving Views
        5.8.6 Exercises for Section 5.8
    5.9 Null Values and Outerjoins
        5.9.1 Operations on Nulls
        5.9.2 The Truth-Value UNKNOWN
        5.9.3 SQL2 Join Expressions
        5.9.4 Natural Joins
        5.9.5 Outerjoins
        5.9.6 Exercises for Section 5.9
    5.10 Recursion in SQL3
        5.10.1 Defining IDB Relations in SQL3
        5.10.2 Linear Recursion
        5.10.3 Use of Views in With-Statements
        5.10.4 Stratified Negation
        5.10.5 Problematic Expressions in Recursive SQL3
        5.10.6 Exercises for Section 5.10
    5.11 Summary of Chapter 5
    5.12 References for Chapter 5

6 Constraints and Triggers in SQL
    6.1 Keys in SQL
        6.1.1 Declaring Keys
        6.1.2 Enforcing Key Constraints
        6.1.3 Exercises for Section 6.1
    6.2 Referential Integrity and Foreign Keys
        6.2.1 Declaring Foreign-Key Constraints
        6.2.2 Maintaining Referential Integrity
        6.2.3 Exercises for Section 6.2
    6.3 Constraints on the Values of Attributes
        6.3.1 Not-Null Constraints
        6.3.2 Attribute-Based CHECK Constraints
        6.3.3 Domain Constraints
        6.3.4 Exercises for Section 6.3
    6.4 Global Constraints
        6.4.1 Tuple-Based CHECK Constraints
        6.4.2 Assertions
        6.4.3 Exercises for Section 6.4
    6.5 Modification of Constraints
        6.5.1 Giving Names to Constraints
        6.5.2 Altering Constraints on Tables
        6.5.3 Altering Domain Constraints
        6.5.4 Altering Assertions
        6.5.5 Exercises for Section 6.5
    6.6 Triggers in SQL3
        6.6.1 Triggers and Constraints
        6.6.2 SQL3 Triggers
        6.6.3 Assertions in SQL3
        6.6.4 Exercises for Section 6.6
    6.7 Summary of Chapter 6
    6.8 References for Chapter 6

7 System Aspects of SQL
    7.1 SQL in a Programming Environment
        7.1.1 The Impedance Mismatch Problem
        7.1.2 The SQL/Host Language Interface
        7.1.3 The DECLARE Section
        7.1.4 Using Shared Variables
        7.1.5 Single-Row Select Statements
        7.1.6 Cursors
        7.1.7 Modifications by Cursor
        7.1.8 Cursor Options
        7.1.9 Ordering Tuples for Fetching
        7.1.10 Protecting Against Concurrent Updates
        7.1.11 Scrolling Cursors
        7.1.12 Dynamic SQL
        7.1.13 Exercises for Section 7.1
    7.2 Transactions in SQL
        7.2.1 Serializability
        7.2.2 Atomicity
        7.2.3 Transactions
        7.2.4 Read-Only Transactions
        7.2.5 Dirty Reads
        7.2.6 Other Isolation Levels
        7.2.7 Exercises for Section 7.2
    7.3 The SQL Environment
        7.3.1 Environments
        7.3.2 Schemas
        7.3.3 Catalogs
        7.3.4 Clients and Servers in the SQL Environment
        7.3.5 Connections
        7.3.6 Sessions
        7.3.7 Modules
    7.4 Security and User Authorization in SQL2
        7.4.1 Privileges
        7.4.2 Creating Privileges
        7.4.3 The Privilege-Checking Process
        7.4.4 Granting Privileges
        7.4.5 Grant Diagrams
        7.4.6 Revoking Privileges
        7.4.7 Exercises for Section 7.4
    7.5 Summary of Chapter 7
    7.6 References for Chapter 7

8 Object-Oriented Query Languages
    8.1 Query-Related Features of ODL
        8.1.1 Operations on ODL Objects
        8.1.2 Declaring Method Signatures in ODL
        8.1.3 The Extent of a Class
        8.1.4 Exercises for Section 8.1
    8.2 Introduction to OQL
        8.2.1 An Object-Oriented Movie Example
        8.2.2 The OQL Type System
        8.2.3 Path Expressions
        8.2.4 Select-From-Where Expressions in OQL
        8.2.5 Eliminating Duplicates
        8.2.6 Complex Output Types
        8.2.7 Subqueries
        8.2.8 Ordering the Result
        8.2.9 Exercises for Section 8.2
    8.3 Additional Forms of OQL Expressions
        8.3.1 Quantifier Expressions
        8.3.2 Aggregation Expressions
        8.3.3 Group-By Expressions
        8.3.4 HAVING Clauses
        8.3.5 Set Operators
        8.3.6 Exercises for Section 8.3
    8.4 Object Assignment and Creation in OQL
        8.4.1 Assigning Values to Host-Language Variables
        8.4.2 Extracting Elements of Collections
        8.4.3 Obtaining Each Member of a Collection
        8.4.4 Creating New Objects
        8.4.5 Exercises for Section 8.4
    8.5 Tuple Objects in SQL3
        8.5.1 Row Types
        8.5.2 Declaring Relations with a Row Type
        8.5.3 Accessing Components of a Row Type
        8.5.4 References
        8.5.5 Following References
        8.5.6 Scopes of References
        8.5.7 Object Identifiers as Values
        8.5.8 Exercises for Section 8.5
    8.6 Abstract Data Types in SQL3
        8.6.1 Defining ADT's
        8.6.2 Defining Methods for ADT's
        8.6.3 External Functions
        8.6.4 Exercises for Section 8.6
    8.7 A Comparison of the ODL/OQL and SQL3 Approaches
    8.8 Summary of Chapter 8
    8.9 References for Chapter 8

Index

Chapter 1

The Worlds of Database Systems

In this book the reader will learn the effective use of database management systems, including the design of databases and the programming of operations on databases. This chapter serves to introduce a number of important database concepts. After a brief history of the subject, we learn what makes database systems different from other software genres. This chapter also provides background concerning the implementation of the database management systems that support databases and their use. An understanding of what goes on “behind the scenes” is important if we are to have an appreciation of why databases are designed as they are or why there are limits on the way operations can be performed on databases. Finally, we review some ideas, such as object-oriented programming, with which the reader may be familiar but that are essential in the chapters to follow.

1.1 The Evolution of Database Systems

What is a database? In essence a database is nothing more than a collection of information that exists over a long period of time, often many years. In common parlance, the term database refers to a collection of data that is managed by a database management system, also called a DBMS, or just database system. A DBMS is expected to:

1. Allow users to create new databases and specify their schema (logical structure of the data), using a specialized language called a data-definition language.

2. Give users the ability to query the data (a “query” is database lingo for a question about the data) and modify the data, using an appropriate language, often called a query language or data-manipulation language.

3. Support the storage of very large amounts of data (gigabytes or more) over a long period of time, keeping it secure from accident or unauthorized use and allowing efficient access to the data for queries and database modifications.

4. Control access to data from many users at once, without allowing the actions of one user to affect other users and without allowing simultaneous accesses to corrupt the data accidentally.

1.1.1 Early Database Management Systems

The first commercial database management systems appeared in the late 1960's. These evolved from file systems, which provide some of item (3) above; file systems store data over a long period of time, and they allow the storage of large amounts of data. However, file systems do not generally guarantee that data cannot be lost if it is not backed up, and they don't support efficient access to data items whose location in a particular file is not known.

Further, file systems do not directly support item (2), a query language for the data in files. Their support for (1), a schema for the data, is limited to the creation of directory structures for files. Finally, file systems do not satisfy (4). When they allow concurrent access to files by several users or processes, a file system generally will not prevent situations such as two users modifying the same file at about the same time, so the changes made by one user fail to appear in the file.

The first important applications of DBMS's were ones where data was composed of many small items, and many queries or modifications were made. Here are some of these applications.

Airline Reservations Systems

Here, the items of data include:

1. Reservations by a single customer on a single flight, including such information as assigned seat or meal preference.

2. Information about flights: the airports they fly from and to, their departure and arrival times, or the aircraft flown, for example.

3. Information about ticket prices, requirements, and availability.
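
As one way to picture the three kinds of data items listed above, here is a minimal sketch that lays them out as tables in a present-day relational system, using Python's built-in sqlite3 module. Every table name, column name, and sample value is an assumption made for illustration; none comes from the text, and the early systems described in this section were not relational.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE Flights (
    flightNo    TEXT PRIMARY KEY,
    origin      TEXT,               -- airport flown from
    destination TEXT,               -- airport flown to
    departs     TEXT,               -- departure time, 'HH:MM'
    arrives     TEXT,               -- arrival time, 'HH:MM'
    aircraft    TEXT
);
CREATE TABLE Reservations (         -- one customer on one flight
    customer    TEXT,
    flightNo    TEXT REFERENCES Flights(flightNo),
    seat        TEXT,
    meal        TEXT,
    PRIMARY KEY (customer, flightNo)
);
CREATE TABLE Fares (                -- prices and availability
    flightNo    TEXT REFERENCES Flights(flightNo),
    fareClass   TEXT,
    price       REAL,
    seatsLeft   INTEGER
);
"""


def build_demo():
    """Create an in-memory database holding one made-up row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO Flights VALUES ('XY101', 'SFO', 'JFK', '08:15', '16:40', 'B757')"
    )
    conn.execute(
        "INSERT INTO Reservations VALUES ('A. Traveler', 'XY101', '14C', 'vegetarian')"
    )
    conn.execute("INSERT INTO Fares VALUES ('XY101', 'coach', 329.00, 42)")
    return conn


conn = build_demo()
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
row = conn.execute(
    "SELECT seat, meal FROM Reservations "
    "WHERE customer = 'A. Traveler' AND flightNo = 'XY101'").fetchone()
print(tables)  # ['Fares', 'Flights', 'Reservations']
print(row)     # ('14C', 'vegetarian')
```

Keeping reservations, flights, and fares in separate tables mirrors the three-way split of the list; a real reservation system would need many more attributes and constraints than this sketch shows.
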

Typical queries ask for flights leaving about a certain time from one given city to another, what seats are available, and at what prices. Typical data modifications include the booking of a flight for a customer, assigning a seat, or indicating a meal preference. Many agents will be accessing parts of the data at any given time. The DBMS must allow such concurrent accesses, prevent problems such as two agents assigning the same seat simultaneously, and protect against loss of records if the system suddenly fails.

Banking Systems

Data items include names and addresses of customers, accounts, loans, and their balances, and the connection between customers and their accounts and loans, e.g., who has signature authority over which accounts. Queries for account balances are common, but far more common are modifications representing a single payment from or deposit to an account.

As with the airline reservation system, we expect that many tellers and customers (through ATM machines) will be querying and modifying the bank's data at once. It is vital that simultaneous accesses to an account not cause the effect of an ATM transaction to be lost. Failures cannot be tolerated. For example, once the money has been ejected from an ATM machine, the bank must record the debit, even if the power immediately fails. On the other hand, it is not permissible for the bank to record the debit and then not deliver the money because the power fails. The proper way to handle this operation is far from obvious and can be regarded as one of the significant achievements in DBMS architecture.

Corporate Records

Many early applications concerned corporate records, such as a record of each sale, information about accounts payable and receivable, or information about employees: their names, addresses, salary, benefit options, tax status, and so on. Queries include the printing of reports such as accounts receivable or employees' weekly paychecks.
Each sale, purchase, bill, receipt, employee hired, fired, or promoted, and so on, results in a modification to the database.

The early DBMS's, evolving from file systems, encouraged the user to visualize data much as it was stored. These database systems used several different data models for describing the structure of the information in a database, chief among them the “hierarchical” or tree-based model and the graph-based “network” model. The latter was standardized in the late 1960's through a report of CODASYL (Committee on Data Systems and Languages).¹ We shall introduce the reader to both the network and hierarchical models in Section 2.7, although today they are only of historical interest.

A problem with these early models and systems was that they did not support high-level query languages. For example, the CODASYL query language had statements that allowed the user to jump from data element to data element, through a graph of pointers among these elements. There was considerable effort needed to write such programs, even for very simple queries.

¹ CODASYL Data Base Task Group April 1971 Report, ACM, New York.

1.1.2 Relational Database Systems

Following a famous paper written by Ted Codd in 1970,² database systems changed significantly. Codd proposed that database systems should present the user with a view of data organized as tables called relations. Behind the scenes, there might be a complex data structure that allowed rapid response to a variety of queries. But, unlike the user of earlier database systems, the user of a relational system would not be concerned with the storage structure. Queries could be expressed in a very high-level language, which greatly increased the efficiency of database programmers.

We shall cover the relational model of database systems throughout most of this book, starting with the basic relational concepts in Chapter 3.
SQL (Structured Query Language), the most important query language based on the relational model, will be covered starting in Chapter 5. However, a brief introduction to relations will give the reader a hint of the simplicity of the model, and an SQL sample will suggest how the relational model promotes queries written at a very high level, avoiding details of “navigation” through the database.

Example 1.1: Relations are tables. Their columns are headed by attributes, which describe the entries in the column. For instance, a relation named Accounts, recording bank accounts, their balance, and type might look like:

    accountNo | balance | type
    ----------+---------+---------
    12345     | 1000.00 | savings
    67890     | 2846.92 | checking
    ...       | ...     | ...

Heading the columns are the three attributes: accountNo, balance, and type. Below the attributes are the rows, or tuples. Here we show two tuples of the relation explicitly, and the dots below them suggest that there would be many more tuples, one for each account at the bank. The first tuple says that account number 12345 has a balance of one thousand dollars, and it is a savings account. The second tuple says that account 67890 is a checking account with $2846.92.

Suppose we wanted to know the balance of account 67890. We could ask this query in SQL as follows:

    SELECT balance
    FROM Accounts
    WHERE accountNo = 67890;

For another example, we could ask for the savings accounts with negative balances by:

    SELECT accountNo
    FROM Accounts
    WHERE type = 'savings' AND balance < 0;

We do not expect that these two examples are enough to make the reader an expert SQL programmer, but they should convey the high-level nature of the SQL select-from-where statement. In principle, they ask the database system to

1. Examine all the tuples of the relation Accounts mentioned in the FROM-clause,

2. Pick out those tuples that satisfy some criterion indicated in the WHERE-clause, and

3. Produce as an answer certain attributes of those tuples, as indicated in the SELECT-clause.

In practice, the system must “optimize” the query and find an efficient way to answer the query, even though the relations involved in the query may be very large. □

IBM was an early vendor of both relational and prerelational DBMS's. In addition, new companies were formed to implement and sell relational DBMS's. Today, some of these companies are among the largest software vendors in the world.

² Codd, E. F., “A relational model for large shared data banks,” Comm. ACM, 13:6, pp. 377-387.

1.1.3 Smaller and Smaller Systems

Originally, DBMS's were large, expensive software systems running on large computers. The size was necessary, because to store a gigabyte of data required a large computer system. Today, a gigabyte fits on a single disk, and it is quite feasible to run a DBMS on a personal computer. Thus, database systems based on the relational model have become available for even very small machines, and they are beginning to appear as a common tool for computer applications, much as spreadsheets and word processors did before them.

1.1.4 Bigger and Bigger Systems

On the other hand, a gigabyte isn't much data. Corporate databases often occupy hundreds of gigabytes. Further, as storage becomes cheaper people find new reasons to store greater amounts of data. For example, retail chains often store a terabyte (1000 gigabytes, or 10^12 bytes) or more of information recording the history of every sale made over a long period of time (for planning inventory; we shall have more to say about this matter in Section 1.3.4). Databases no longer focus on storing simple data items such as integers or short character strings. They can store images, audio, video, and many other kinds of data that take comparatively huge amounts of space. For instance, an hour of video consumes about a gigabyte. Databases storing images from satellites are expected, by the year 2000, to hold several petabytes (1000 terabytes, or 10^15 bytes).

Handling such large databases required several technological advances. For example, databases of modest size are today stored on arrays of disks, which are called secondary storage devices (compared to main memory, which is “primary” storage). One could even argue that what distinguishes database systems from other software is, more than anything else, the fact that database systems routinely assume data is too big to fit in main memory and must be located primarily on disk at all times. The following two trends allow database systems to deal with larger amounts of data, faster.

Tertiary Storage

The largest databases today require more than disks. Several kinds of tertiary storage devices have been developed. Tertiary devices, perhaps storing a terabyte each, require much more time to access a given item than does a disk. While typical disks can access any item in 10-20 milliseconds, a tertiary device may take several seconds. Tertiary storage devices involve transporting an object, upon which the desired data item is stored, to a reading device. This movement is performed by a robotic conveyance of some sort.

For example, compact disks (CD's) may be the storage medium in a tertiary device. An arm mounted on a track goes to a particular CD, picks it up, carries it to a CD reader, and loads the CD into the reader.

Parallel Computing

The ability to store enormous volumes of data is important, but it would be of little use if we could not access large amounts of that data quickly. Thus, very large databases also require speed enhancers. One important speedup is through index structures, which we shall mention in Sections 1.2.1 and 5.7.7. Another way to process more data in a given time is to use parallelism. This parallelism manifests itself in various ways.
For example, since the rate at which data can be read from a given disk is fairly low, a few megabytes per second, we can speed processing if we use many disks and read them in parallel (even if the data originates on tertiary storage, it is “cached” on disks before being accessed by the DBMS). These disks may be part of an organized parallel machine, or they may be components of a distributed system, in which many machines, each responsible for a part of the database, communicate over a high-speed network when needed.

Of course, the ability to move data quickly, like the ability to store large amounts of data, does not by itself guarantee that queries can be answered quickly. We still need to use algorithms that break queries up in ways that allow parallel computers or networks of distributed computers to make effective use of all the resources. Thus, parallel and distributed management of very large databases remains an active area of research and development.

1.2 The Architecture of a DBMS

In this section, we shall sketch the structure of a typical database management system. We shall also look at what the DBMS does to process user queries and other database operations. Finally, we shall consider some of the problems that come up in designing a DBMS that can maintain large amounts of data and process a high rate of queries. The technology for implementing a DBMS is not the subject of this book, however; we concentrate on how databases are designed and used effectively.

[Figure 1.1: Major components of a DBMS. Inputs shown at the top: schema modifications, queries, modifications.]

1.2.1 Overview of DBMS Components

Figure 1.1 shows the essential parts of a DBMS. At the bottom, we see a representation of the place where data is stored. By convention, disk-shaped components indicate a place for storage of data. Note that we have indicated that this component contains not only data, but metadata, which is information about the structure of the data.
For example, if this DBMS is relational, the metadata includes the names of the relations, the names of the attributes of those relations, and the data types for those attributes (e.g., integer or character string of length 20).

Often, a DBMS maintains indexes for the data. An index is a data structure that helps us find data items quickly, given a part of their value; the most common example is an index that will find those tuples of a particular relation that have a given value for one of the attributes.

    How Indexes Are Implemented

    The reader may have learned in a course on data structures that a hash table is a very efficient way to build an index. Early DBMS's did use hash tables extensively. Today, the most common data structure is called a B-tree; the “B” stands for “balanced.” A B-tree is a generalization of a balanced binary search tree. However, while each node of a binary tree has up to two children, the B-tree nodes have a large number of children. Given that B-trees normally appear on disk rather than in main memory, the B-tree is designed so that each node occupies a full disk block. Since typical systems use disk blocks on the order of 2^12 bytes (4096 bytes), there can be hundreds of pointers to children in a single block of a B-tree. Thus, search of a B-tree rarely involves more than three levels.

    The true cost of disk operations generally is proportional to the number of disk blocks accessed. Thus, searches of a B-tree, which typically examine only three disk blocks, are much more efficient than would be a binary-tree search, which typically visits nodes found on many different disk blocks. This distinction, between B-trees and binary search trees, is but one of many examples where the most appropriate data structure for data stored on disk is different from the data structures used for algorithms that run in main memory.
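
The claim in “How Indexes Are Implemented” that a B-tree search rarely touches more than three levels can be checked with some rough arithmetic. The sketch below is a simplification under stated assumptions: the 8-byte key and 4-byte pointer sizes are illustrative guesses (the text gives only the 4096-byte block size), each node is assumed to hold n keys and n + 1 child pointers, and every node is treated as completely full.

```python
import math

BLOCK_BYTES = 4096    # block size quoted in the text (2^12 bytes)
KEY_BYTES = 8         # assumption: size of one search key
POINTER_BYTES = 4     # assumption: size of one child pointer


def btree_fanout(block=BLOCK_BYTES, key=KEY_BYTES, ptr=POINTER_BYTES):
    """Children per node when a node stores n keys and n + 1 pointers in one block."""
    n_keys = (block - ptr) // (key + ptr)
    return n_keys + 1


def btree_levels(num_records, fanout):
    """Smallest number of levels whose combined fan-out reaches num_records."""
    levels, reach = 1, fanout
    while reach < num_records:
        reach *= fanout
        levels += 1
    return levels


def binary_tree_levels(num_records):
    """Levels in a perfectly balanced binary search tree holding num_records keys."""
    return math.ceil(math.log2(num_records + 1))


fanout = btree_fanout()
print(fanout)                              # 342
print(btree_levels(10_000_000, fanout))    # 3
print(binary_tree_levels(10_000_000))      # 24
```

Under these assumptions, locating one of ten million records costs about three block reads through the B-tree, whereas a balanced binary search tree may visit up to 24 nodes, each of which could lie on a different disk block.
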
For instance, a relation storing account numbers and balances might have an index on account-number, so that we can find the balance, given an account number, quickly. Indexes are part of the stored data, and a description of which attributes have indexes is part of the metadata.

In Fig. 1.1 we also see a storage manager, whose job it is to obtain requested information from the data storage and to modify the information there when requested to by the levels of system above it.

We also see a component that we have called the query processor, although that name is somewhat of a misnomer. It handles not only queries but requests for modification of the data or the metadata. Its job is to find the best way to carry out a requested operation and to issue commands to the storage manager that will carry them out.

The transaction manager component is responsible for the integrity of the system. It must assure that several queries running simultaneously do not interfere with each other and that the system will not lose data even if there is a system failure. It interacts with the query manager, since it must know what data is being operated upon by the current queries (in order to avoid conflicting actions), and it may need to delay certain queries or operations so that these conflicts do not occur. It interacts with the storage manager because schemes for protection of data usually involve storing a log of changes to the data. By properly ordering operations, the log will contain a record of changes so that after a system failure even those changes that never reached the disk can be reexecuted.

At the top of Fig. 1.1 we see three types of inputs to the DBMS:

1. Queries. These are questions about the data. They are generated in two different ways:

(a) Through a generic query interface. For example, a relational DBMS allows the user to type SQL queries that are passed to the query processor and answered.
(b) Through application program interfaces. A typical DBMS allows programmers to write application programs that, through calls to the DBMS, query the database. For example, an agent using an airline reservation system is running an application program that queries the database about flight availabilities. The queries are submitted through a specialized interface that might include boxes to be filled in with cities, times, and so on. One cannot ask arbitrary queries through this interface, but it is generally easier to ask an appropriate query through this interface than to write the query directly in SQL.

2. Modifications. These are operations to modify the data. Like queries, they can be issued either through the generic interface or through the interface of an application program.

3. Schema Modifications. These commands are usually issued by authorized personnel, sometimes called database administrators, who are allowed to change the schema of the database or create a new database. For example, when the IRS started requiring banks to report interest payments along with each customer's Social Security number, a bank may have had to add a new attribute socialSecurityNo to a relation that stored information about customers.

1.2.2 The Storage Manager

In a simple database system, the storage manager might be nothing more than the file system of the underlying operating system. However, for efficiency purposes, DBMS's normally control storage on the disk directly, at least under some circumstances. The storage manager consists of two components, the buffer manager and the file manager.

1. The file manager keeps track of the location of files on the disk and obtains the block or blocks containing a file on request from the buffer manager. Recall that disks are generally divided into disk blocks, which are regions of contiguous storage containing a large number of bytes, perhaps 2^12 or 2^14 (about 4000 to 16,000 bytes).

2. The buffer manager handles main memory. It obtains blocks of data from the disk, via the file manager, and chooses a page of main memory in which to store that block. The buffer manager may keep a disk block in main memory for a while, but returns it to the disk if its page of main memory is needed for another block. Pages are also returned to disk when the transaction manager requires it; see Section 1.2.4.

1.2.3 The Query Manager

The job of the query manager is to turn a query or database manipulation, which may be expressed at a very high level (e.g., as an SQL query), into a sequence of requests for stored data such as specific tuples of a relation or parts of an index on a relation. Often the hardest part of the query-processing task is query optimization, that is, the selection of a good query plan or sequence of requests to the storage system that will answer the query.

Example 1.2: Suppose that a bank has a database with two relations:

1. Customers is a table giving, for each customer, their name, Social Security number, and address.

2. Accounts is a table giving, for each account, its account number, balance, and the Social Security number of its owner. Note that each account has a principal owner, whose Social Security number is used for tax-reporting purposes; there may be other owners of an account, but these cannot be known from the two relations given here.

Suppose also that the query “find the balances of all accounts of which Sally Jones is the principal owner” is asked. The query manager must find a query plan to perform on these relations, a plan that will yield the answer to the query. The fewer steps taken to answer the query, the better the query plan is. In general, the costly steps are those in which a disk block is copied from the disk into a page of the buffer pool by the storage manager, or a page is written back onto the disk. Thus, it is reasonable to count only these disk-block operations in evaluating the cost of a query plan.
In order to answer the query, we need to examine the Customers relation to find the Social Security number of Sally Jones (we assume there is only one customer with that name, although in practice there could be several). We then need to examine the Accounts relation to find every account with that Social Security number and print the balances of those accounts.

A simple but expensive plan is to examine all the tuples (rows) of the Customers relation until we find one with Sally Jones as the customer name. On average, we shall have to look at half of the tuples before we find the one we want. Since a bank will have many customers, the Customers relation will occupy many disk blocks, and this step will be very expensive.

Once we have Sally Jones' Social Security number, we are not yet done. Now we have to look at the Accounts tuples and find those that have the selected Social Security number. Since there may be several such accounts, we have to look at all the tuples. A typical bank will have many accounts, so the Accounts relation will also occupy many disk blocks. Examining them all will be quite expensive.

If there is an index on the customer name for relation Customers, then a better plan exists. Instead of looking at the whole Customers relation, we use the index to find only the disk block containing the tuple for Sally Jones. As we mentioned in the box in Section 1.2.1, a typical B-tree index requires that we look at three disk blocks of the index in order to find what we want.² One more block access gets us the tuple for Sally Jones.

Of course we still need to do the second step: finding the accounts with that Social Security number in the Accounts relation. That step will require many disk accesses, typically. However, if there is an index on the Social Security number for relation Accounts, then we can find each of the blocks containing one of the accounts with a given Social Security number by going through this index.
To do so, we must make 2 or 3 disk accesses to go through the index, as we discussed for indexed access to the Customers relation. If all the desired tuples are on different disk blocks, then we shall have to access each of these blocks. But there probably aren't too many accounts for one person, so this step probably uses only a few disk accesses.

If these two indexes exist, then we can answer the query with perhaps 6-10 disk accesses. If one or both of them do not exist, and we have to use one of the poorer query plans, then the number of disk accesses might be in the hundreds or thousands, as we scan an entire, large relation. □

² In fact, since the root node of the B-tree is used in every search involving that index, its block is often found in main memory, occupying one of the buffer pages, so two disk-block accesses usually suffice.

It might appear from Example 1.2 that all there is to query optimization is to use indexes if they exist. In fact, there is a great deal more to the subject. Complex queries often allow us to reorder operations, and there may be a very large number of possible query plans, often exponentially many in the size of the query. Sometimes we have a choice of two indexes to use, but we cannot use both. A study of this important part of DBMS implementation is beyond the scope of this book.

1.2.4 The Transaction Manager

As we discussed in Section 1.1, there are some special guarantees that a DBMS must make to those performing operations on a database. For example, we discussed the importance that the effect of an operation never be lost, even in the face of a severe system failure. The typical DBMS allows the user to group one or more queries and/or modifications into a transaction, which informally is a group of operations that must appear to have been executed together sequentially, as a unit.
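As a rough illustration of a group of operations treated as a unit, the following sketch uses Python's standard sqlite3 module. The Accounts table layout, the account numbers, and the helper names make_bank, transfer and balances are hypothetical, chosen only to echo the bank example; the non-negative-balance check is an assumption added so that one step of the group can fail. It is a sketch of the idea, not a description of how any particular DBMS implements transactions.

```python
import sqlite3

def make_bank():
    """Create a small in-memory database with a hypothetical Accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Accounts ("
        " accountNo INTEGER PRIMARY KEY,"
        " balance REAL NOT NULL CHECK (balance >= 0))"  # assumed rule
    )
    conn.executemany(
        "INSERT INTO Accounts VALUES (?, ?)",
        [(12345, 100.0), (67890, 50.0)],
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE accountNo = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE accountNo = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK rule, so the earlier credit is undone too.
        return False

def balances(conn):
    """Return a dictionary mapping account number to balance."""
    return dict(conn.execute("SELECT accountNo, balance FROM Accounts"))
```

In this sketch a transfer that would overdraw the source account fails at the second update, and the first update is rolled back with it, so the two modifications either both take effect or neither does.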
Frequently, a database system allows many transactions to execute concurrently; e.g., something may be going on at each of a bank's ATM machines simultaneously. The role of assuring that all these transactions are executed properly is the job of the transaction manager component of the DBMS. In more detail, “proper” execution of transactions requires what are often called the ACID properties, after the initials of the four principal requirements on transaction execution. These properties are:

• Atomicity. We require that either all of a transaction be executed or none of it is. For instance, withdrawal of money from an ATM machine and the associated debit to the customer's account should be a single, atomic transaction. It is not acceptable if the money is dispensed but the debit is not made, or if the debit is made and the money not dispensed.

• Consistency. A database generally has a notion of a “consistent state,” in which the data meets any expectations we may have. For example, an appropriate consistency condition for an airline database is that no seat be assigned to two different customers. While this condition might be violated for a brief moment during a transaction, as people are moved among seats, the transaction manager must assure that after transactions have completed, the database satisfies any consistency conditions assumed.

• Isolation. When two or more transactions run concurrently, their effects must be isolated from one another. That is, we must not see effects caused by the two transactions running at the same time that would not occur if one ran before the other. For instance, when two airline agents are selling seats on the same flight, and only one seat remains, one request should be granted and the other denied. It is unacceptable if the same seat were sold twice or not at all, because of concurrent operations.

• Durability. If a transaction has completed its work, its effect should not get lost should the system fail, even if it fails immediately after the transaction completes.

How to implement transactions so that they have the ACID properties could be the subject of a book itself, and we shall not attempt to cover the matter here. However, Section 7.2 discusses how, in the language SQL, one specifies the operations that belong in a transaction, and what guarantees the SQL programmer can expect from having grouped operations into transactions. Also, we shall in this section outline very briefly the common techniques for enforcing the ACID properties.

Box: Granularity of Locks

Different DBMS's may differ on what sorts of items have locks. For instance, one might lock individual tuples of a relation, individual disk blocks, or even whole relations. The bigger the thing that has a lock, the more likely one transaction is to have to wait for another, even when the two transactions really don't access the same data. However, the smaller the lockable item, the larger and more complex the locking mechanism is.

Locking

The principal cause of nonisolation among transactions arises when two or more transactions read or write the same item in the database. For example, if two transactions try to write a new balance for the same account at the same time, one will overwrite the other, and the effect of the first to write will be lost. Thus, in most DBMS's the transaction manager is able to lock the items that the transaction accesses. While one transaction has a lock on an item, other transactions cannot access it. Thus, for example, the first transaction to lock the balance on account 12345 would get both to read it and to write the new value, before another transaction would be allowed to access it. A second transaction would read the new balance, rather than the old balance, and the two transactions would not interact badly.
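The locking idea described above can be sketched in a few lines of Python, with threads standing in for concurrent transactions and an in-memory object standing in for a database item. The Account class and the deposit and run_deposits helpers are hypothetical names invented for this illustration; a real lock manager in a DBMS is far more elaborate (lock modes, deadlock handling, granularity choices).

```python
import threading

class Account:
    """A hypothetical lockable item holding a single balance."""
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()  # one lock per item

def deposit(account, amount):
    """Read the balance and write the new value while holding the item's lock."""
    with account.lock:                  # other "transactions" wait here
        old = account.balance           # read the current value
        account.balance = old + amount  # write the new value

def run_deposits(account, n_threads, per_thread, amount):
    """Run many concurrent deposits and return the final balance."""
    def work():
        for _ in range(per_thread):
            deposit(account, amount)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance
```

Because each read-then-write pair runs entirely under the lock, a second deposit always sees the balance written by the first, so no update is lost regardless of how the threads are scheduled.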
Logging

A “log” of all transactions initiated, the changes to the database caused by each transaction, and the end of each transaction is recorded by the transaction manager. The log is always written to nonvolatile storage, that is, a storage medium like disk where the data will survive a power failure. Thus, while the transaction itself may use volatile main memory for part of its work, the log is always written immediately to disk. Logging of all operations is an important factor in assuring durability.

Transaction Commitment

For durability and atomicity, transactions are ordinarily done in a “tentative” way, in which the changes to the database are computed but not actually made in the database itself. By the time the transaction is ready to complete, or commit, the changes have been copied to a log. This log record is first copied to disk. Only then are the changes entered into the database itself. Even if the system fails in the middle of the two steps, when the system comes back up we can read the log and see that the changes still need to be made to the database. If the system fails before all changes have been entered in the log, we can redo the transaction, sure that we are not accidentally booking an airline seat twice or debiting a bank account twice, for example.

1.2.5 Client-Server Architecture

Many varieties of modern software use a client-server architecture, in which requests by one process (the client) are sent to another process (the server) for execution. Database systems are no exception, and it is common to divide the work of the components shown in Fig. 1.1 into a server process and one or more client processes.

In the simplest client-server architecture, the entire DBMS is a server, except for the query interfaces that interact with the user and send queries or other commands across to the server.
For example, relational systems generally use the SQL language for representing requests from the client to the server. The database server then sends the answer, in the form of a table or relation, back to the client. The relationship between client and server can get more complex, especially when answers are extremely large. We shall have more to say about this matter in Section 1.3.3. There is also a trend to put more work in the client, since the server will be a bottleneck if there are many simultaneous database users.

1.3 The Future of Database Systems

There are many currents in the database stream today, and they lead the discipline in a variety of new directions. Some of these are new technologies (object-oriented programming, constraints and triggers, multimedia data, or the World Wide Web, for example) that are changing the nature of conventional DBMS's. Other currents involve new applications, such as warehousing of data or information integration. In this section we give brief introductions to the major trends for future database systems.

1.3.1 Types, Classes, and Objects

Object-oriented programming has been widely regarded as a tool for better program organization and, ultimately, more reliable software implementation. First popularized in the language Smalltalk, object-oriented programming received a big boost with the development of C++ and the migration to C++ of much software development that was formerly done in C. More recently, the language Java, suitable for sharing programs across the World Wide Web, has also increased attention on object-oriented programming. The database world has likewise been attracted to the object-oriented paradigm, and several companies are selling DBMS's dubbed “object-oriented.” In this section we shall review the ideas behind object orientation.

The Type System

An object-oriented programming language offers the user a rich collection of types.
Starting with base types, commonly integers, real numbers, booleans, and character strings, one may build new types by using type constructors. Typically, the type constructors let us build:

1. Record structures. Given a list of types T_1, T_2, ..., T_n and a corresponding list of field names (called instance variables in Smalltalk) f_1, f_2, ..., f_n, one can construct a record type consisting of n components. The ith component has type T_i and is referred to by its field name f_i. Record structures are exactly what C or C++ calls “structs.”

2. Collection types. Given a type T, one can construct new types by applying a collection operator to type T. Different languages use different collection operators, but there are several common ones, including arrays, lists, and sets. Thus, if T were the base type integer, we might build the collection types “array of integers,” “list of integers,” or “set of integers.”

3. Reference types. A reference to a type T is a type whose values are suitable for locating a value of the type T. In C or C++, a reference is a “pointer” to a value, that is, a location in which is held the virtual-memory address of the value pointed to. The model of pointers is often suitable for understanding references. However, in database systems, where data is stored on many disks, perhaps distributed over many hosts, a reference is of necessity something more complex than a pointer. It might, for example, include the name of a host, a disk number, a block within that disk, and a position within the block where the referenced value is held.

Of course, record-structure and collection operators can be applied repeatedly to build ever more complex types. For instance, we might define a type that is a record structure with a first component named customer of type string and whose second component is of type set of integers and named accounts. Such a type is suitable for associating bank customers with the set of their accounts.
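The three kinds of type constructor can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names CustomerAccounts and DiskRef, the field names host, disk, block and offset, and the in-memory dictionary standing in for stored data are all hypothetical, and an actual DBMS would represent references quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class CustomerAccounts:
    """Record structure: a string field plus a set-of-integers field."""
    customer: str                                   # base type: string
    accounts: Set[int] = field(default_factory=set)  # collection type

@dataclass(frozen=True)
class DiskRef:
    """Hypothetical reference type: enough information to locate a value."""
    host: str    # name of the machine holding the data
    disk: int    # disk number on that host
    block: int   # block within the disk
    offset: int  # position within the block

# A toy stand-in for stored data: maps each reference to the value it locates.
store: Dict[DiskRef, CustomerAccounts] = {}

def dereference(ref: DiskRef) -> CustomerAccounts:
    """Follow a reference to the value it locates."""
    return store[ref]
```

Two DiskRef values with the same four components compare equal and locate the same stored record, which is the sense in which a reference is "suitable for locating a value" without being a virtual-memory pointer.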
Classes and Objects

A class consists of a type and possibly one or more functions or procedures (called methods; see below) that can be executed on objects of that class. The objects of a class are either values of that type (called immutable objects) or variables whose value is of that type (called mutable objects). For example, if we define a class C whose type is “set of integers,” then {2, 5, 7} is an immutable object of class C, while variable s could be declared to be of class C and assigned the value {2, 5, 7}.

Object Identity

Objects are assumed to have an object identity (OID). No two objects can have the same OID, and no object has two different OID's. The OID is the value that a reference to the object has. We may often think of the OID as a pointer in virtual memory to the object but, as we discussed in connection with reference types, in a database system the OID may actually be something more complex: a sequence of bits sufficient to locate the object on secondary or tertiary memory of any of a large number of different machines. Further, since data is persistent, the OID must be valid for all time, as long as the data exists.

Methods

Associated with a class there are usually certain functions, often called methods. A method for a class C has at least one argument that is an object of class C; it may have other arguments of any class, including C. For example, associated with a class whose type is “set of integers,” we might have a method to compute the power set of a given set, to take the union of two sets, or to return a boolean indicating whether or not the set is empty.

Abstract Data Types

In many cases, classes are also “abstract data types,” meaning that they encapsulate, or restrict access to objects of the class so that only the methods defined for the class can modify objects of the class directly.
This restriction assures that the objects of the class cannot be changed in ways that were not anticipated by the implementor of the class. This concept is regarded as one of the key tools for reliable software development.

Class Hierarchies

It is possible to declare one class C to be a subclass of another class D. If so, then class C inherits all the properties of class D, including the type of D and any functions defined for class D. However, C may also have additional properties. For example, new methods may be defined for objects of class C, and these may be either in addition to or in place of methods of D. It may even be possible to extend the type of D in certain ways. In particular, if the type of D is a record-structure type, then we can add new fields to this type that are present only in objects of type C.

Box: Why Objects?

Object-oriented programming offers a number of capabilities of importance in database systems.

• Through a rich type system, we can deal with data in forms that are more natural than relations or earlier data models. Note that the relational model has a rather restricted type system. Relations are sets of records, and these records have a record structure in which the fields (called “attributes” in the relational model) have base types.

• Through classes and a class hierarchy, we can share or reuse software and schemas more readily than with conventional systems.

• Through abstract data types we can protect against misuse of our data by preventing access except through some carefully designed functions that are known to use the data properly.

Example 1.3: Consider a class of bank account objects. We might describe the type for this class informally as:

    CLASS Account = {accountNo: integer;
                     balance: real;
                     owner: REF Customer;
                    }
That is, the type for the Account class is a record structure with three fields: an integer account number, a real-number balance, and an owner that is a reference to an object of class Customer (another class that we'd need for a banking database, but whose type we have not introduced here).

We would also define some methods for the class. For example, we might have a method

    deposit(a: Account, m: real)

that increases the balance for Account object a by amount m.

Finally, we might wish to have several subclasses of the Account class. For instance, a time-deposit account might have an additional field dueDate, the date at which the account balance may be withdrawn by the owner. There might also be an additional method for the subclass TimeDeposit

    penalty(a: TimeDeposit)

that takes an account a belonging to the subclass TimeDeposit and calculates the penalty for early withdrawal, as a function of the dueDate field in object a and the current date; the latter would be obtainable from the system on which the method is run. □

We shall consider object-oriented aspects of database systems extensively in this book. In Section 2.1 we introduce the object-oriented database-design language ODL. Chapter 8 is devoted to object-oriented query languages. There we cover both the OQL query language that is becoming a standard for object-oriented DBMS's and the proposed object-oriented features for SQL, the standard query language for relational DBMS's.

1.3.2 Constraints and Triggers

Another recent trend in database systems has been the extensive use of active elements in commercial systems. By “active” we mean that the component of the database is available at all times, ready to execute whenever it becomes appropriate for it to do so. There are two common kinds of active elements found in database systems:

1. Constraints. These are boolean-valued functions whose value is required to be true.
For instance, we might place in a banking database the constraint that a balance cannot be less than 0. A database modification that violates this constraint, such as a withdrawal that would leave the account negative, is rejected by the DBMS.

2. Triggers. A trigger is a piece of code that waits for an event to occur; possible events are the insertion or deletion of a certain kind of data item. When the event occurs, an associated sequence of actions is executed, or triggered. For instance, an airline reservation system could have a rule that is triggered when a flight status is changed to cancelled. The action part of the rule might be a query that asks for the phone number of all customers booked on that flight, so these customers may be notified. A more complex action would be to rebook the customers on alternative flights automatically.

Active elements are not a new idea. They appeared as “ON-conditions” in the programming language PL/I. They have also appeared in artificial-intelligence systems for many years, and they are akin to “daemons” that are used in operating systems. However, when the size of the data on which the active elements operate is very large, or the number of active elements is very large, there are severe technical problems in implementing active elements efficiently. For that reason, active elements did not appear as standard components of a DBMS until the early 1990's. We discuss active elements in Chapter 6.

1.3.3 Multimedia Data

Another important trend in database systems is the inclusion of multimedia data. By “multimedia” we mean information that represents a signal of some sort. Common forms of multimedia data include video, audio, radar signals, satellite images, and documents or pictures in various encodings. These forms have in common that they are much larger than the earlier forms of data (integers, character strings of fixed length, and so on) and of vastly varying sizes.

The storage of multimedia data has forced DBMS's to expand in several ways. For example, the operations that one performs on multimedia data are not the simple ones suitable for traditional data forms. Thus, while one might search a bank database for accounts that have a negative balance, comparing each balance with the real number 0.0, it is not feasible to search a database of pictures for those that show a face that “looks like” a particular image. Thus, DBMS's have had to incorporate the ability of users to introduce functions of their own choosing, which they may apply to multimedia data. Often, the object-oriented approach is used for such extensions, even in relational systems.

The size of multimedia objects also forces the DBMS to modify the storage manager so that objects or tuples of a gigabyte or more can be accommodated. Among the many problems that such large elements present is the delivery of answers to queries. In a conventional, relational database, an answer is a set of tuples. These would be delivered to the client by the database server as a whole.

However, suppose the answer to a query is a video clip a gigabyte long. It is not feasible for the server to deliver the gigabyte to the client as a whole. For one reason, it takes too long and will prevent the server from handling other requests. For another, the client may want only a small part of the film clip, but doesn't have a way to ask for exactly what it wants without seeing the initial portion of the clip. For a third reason, even if the client wants the whole clip, perhaps in order to play it on a screen, it is sufficient to deliver the clip at a fixed rate over the course of an hour (the amount of time it takes to play a gigabyte of compressed video). Thus, the storage system of a multimedia DBMS has to be prepared to deliver answers in an interactive mode, passing a piece of the answer to the client on request or at a fixed rate.
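The idea of passing an answer to the client a piece at a time, or only the part the client asks for, can be sketched with ordinary Python file-like objects. The function names deliver_in_pieces and read_range and the piece_size parameter are hypothetical, and a byte stream in memory stands in for a stored clip; pacing at a fixed rate and network transfer are deliberately left out of this sketch.

```python
import io

def deliver_in_pieces(source, piece_size):
    """Yield successive pieces of a large object instead of returning it whole."""
    while True:
        piece = source.read(piece_size)
        if not piece:   # an empty result means the object is exhausted
            break
        yield piece     # the caller asks for the next piece when ready

def read_range(source, start, length):
    """Return only the requested part of the object, such as a middle section."""
    source.seek(start)
    return source.read(length)
```

Because deliver_in_pieces is a generator, nothing beyond the current piece is read until the consumer requests more, so the server side never needs to hold or send the whole object at once. For example, wrapping ten bytes in io.BytesIO and asking for pieces of size 4 produces three pieces of lengths 4, 4 and 2.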
1.3.4 Data Integration

As information becomes ever more essential in our work and play, we find that existing information resources are being used in many new ways. For instance, consider a company that wants to provide on-line catalogs for all its products, so that people can use the World Wide Web to browse its products and place on-line orders. A large company has many divisions. Each division may have built its own database of products independently of other divisions. These divisions may use different DBMS's, different structures for information, perhaps even different terms to mean the same thing or the same term to mean different things.

Example 1.4: Imagine a company with several divisions that manufacture disks. One division's catalog might represent rotation rate in revolutions per second, another in revolutions per minute. Another might have neglected to represent rotation speed at all. A division manufacturing floppy disks might refer to them as “disks,” while a division manufacturing hard disks might call them “disks” as well. The number of tracks on a disk might be referred to as “tracks” in one division, but “cylinders” in another. □

Central control is not always the answer. Divisions may have invested large amounts of money in their database long before integration across divisions was recognized as a problem. A division may have been an independent company, recently acquired. For these or other reasons, these so-called legacy databases cannot be replaced easily. Thus, the company must build some structure on top of the legacy databases to present to customers a unified view of products across the company.

One popular approach is the creation of data warehouses, where information from many legacy databases is copied, with the appropriate translation, to a central database. As the legacy databases change, the warehouse is updated, but not necessarily instantaneously updated.
A common scheme is for the warehouse to be reconstructed each night, when the legacy databases are likely to be less busy.

The legacy databases are thus able to continue serving the purposes for which they were created. New functions, such as providing an on-line catalog service through the Web, are done at the data warehouse. We also see data warehouses serving needs for planning and analysis. For example, company analysts may run queries against the warehouse looking for sales trends, in order to better plan inventory and production. Data mining, the search for interesting and unusual patterns in data, has also been enabled by the construction of data warehouses, and there are claims of enhanced sales through exploitation of patterns discovered in this way.

1.4 Outline of the Book

Ideas related to database systems can be divided into three broad categories:

1. Design of databases. How does one build a useful database? What kinds of information go into the database? How is the information structured? What assumptions are made about types or values of data items? How do data items connect?

2. Database programming. How does one express queries and other operations on the database? How does one use other capabilities of a DBMS, such as transactions or triggers?

3. Database implementation. How does one build a DBMS, including such matters as query processing, transaction processing and organizing storage for efficient access?

While database implementation is a major segment of the software industry, the number of people who will design or use databases far exceeds the number that will build them. This book is intended for a first course in database systems, so it is appropriate to concentrate on the first two aspects: design and programming. In this chapter we have tried to give the reader a glimpse of the third aspect, implementation, but we shall not return to the subject in this
