Imperial College London - SQL Data Definition
Naranker Dulay
https://www.doc.ic.ac.uk/~nd/databases
Data definition
SQL’s Data Definition Language (DDL) is concerned with creating and modifying
schemas, as well as specifying constraints and performance options such as
materialised views and indexes.
We’ll mainly look at base relations (stored tables) and briefly at derived relations
(computed views).
The create table statement creates a new named relation and declares its schema.
The relation is persistent: it is stored on disk in specially organised files.
The order in which attributes are declared matters to some other SQL statements. For
example, select * returns attributes in that order, and an insert with no attribute list
must supply values in that order.
We can remove a relation with drop table, e.g. drop table movie.
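As a minimal sketch of create table and drop table, the following uses Python’s built-in sqlite3 module with an in-memory SQLite database (SQLite’s dialect differs from standard SQL in places). The movie attributes and the sample tuple are assumptions for illustration. It shows that select * and an insert without an attribute list both follow the declared attribute order.

```python
import sqlite3

# In-memory database: nothing is written to disk in this sketch.
con = sqlite3.connect(":memory:")

# Hypothetical movie schema; the attribute names are assumptions.
con.execute("""
    create table movie (
        title  varchar(120),
        year   int,
        length int,
        genre  varchar(20),
        primary key (title, year)
    )
""")

# No attribute list: values are matched to attributes in declaration order.
con.execute("insert into movie values ('Film A', 2009, 96, 'comedy')")

# select * also returns attributes in declaration order.
cur = con.execute("select * from movie")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['title', 'year', 'length', 'genre']
print(rows)     # [('Film A', 2009, 96, 'comedy')]

# drop table removes the relation (its schema and its tuples).
con.execute("drop table movie")
remaining = con.execute(
    "select count(*) from sqlite_master where type = 'table' and name = 'movie'"
).fetchone()[0]
print(remaining)  # 0
```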
N. Dulay Databases : 08 : SQL Data Definition 3
Modifying the Schema
The alter table statement is used to add attributes to, or remove attributes from, a
relation. For example, if we add a studio attribute to movie with a default value of ''
(the empty string), each tuple of movie will now have a studio attribute set to the
empty string. We can then set different studio values by following alter table with one
or more update statements.
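A sketch of adding studio and then filling it in, again with Python’s sqlite3 module; the varchar(40) length and the value 'Studio X' are hypothetical. Removing an attribute (alter table movie drop studio in standard SQL) is left out because SQLite only supports drop column from version 3.35.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table movie (title varchar(120), year int, primary key (title, year))")
con.executemany("insert into movie values (?, ?)", [("Film A", 2009), ("Film B", 2010)])

# Add an attribute: every existing tuple takes the default value ('').
con.execute("alter table movie add studio varchar(40) default ''")
before = con.execute("select title, studio from movie order by title").fetchall()

# Then set real values with ordinary update statements.
con.execute("update movie set studio = 'Studio X' where title = 'Film A'")
after = con.execute("select title, studio from movie order by title").fetchall()

print(before)  # [('Film A', ''), ('Film B', '')]
print(after)   # [('Film A', 'Studio X'), ('Film B', '')]
```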
Each relation should have a primary key (a candidate key) that determines the other
attributes in the relation and is used to uniquely identify each tuple of the relation.
The primary key is also used to enforce foreign key constraints on the relation
(see later)
If there is a choice of candidate keys, then choose one whose values will never, or only
very rarely, be updated; otherwise choose one with few (and small) attributes.
Other candidate keys can be defined using unique constraints (see later).
A primary key imposes two constraints:
1. Primary key attributes cannot be null.
2. Primary key values (as a whole) must be unique, i.e. no two tuples can have the
same primary key value (all primary key attributes the same).
If any database operation attempts to violate one of these constraints, then the
operation will fail: for example, if we attempt to assign null to a primary key
attribute, or to insert a tuple when a tuple with the same primary key value already
exists. An insertion into movie that supplies no value for year will fail, since insert
will attempt to assign null to year, and year is an attribute of the primary key.
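Both primary key constraints can be seen failing in a short sketch using Python’s sqlite3 module; the movie schema and tuples are assumptions. One caveat: SQLite, unlike standard SQL, does not automatically make primary key attributes not null in ordinary tables (a documented legacy quirk), so the sketch declares not null explicitly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Standard SQL implies not null for primary key attributes; SQLite does not
# (in ordinary rowid tables), so it is declared explicitly here.
con.execute("""
    create table movie (
        title varchar(120) not null,
        year  int          not null,
        genre varchar(20),
        primary key (title, year)
    )
""")
con.execute("insert into movie values ('Film A', 2009, 'comedy')")

def try_insert(sql):
    """Run an insert; return None if it succeeds, else the error message."""
    try:
        con.execute(sql)
        return None
    except sqlite3.IntegrityError as e:
        return str(e)

# Constraint 1: year is omitted, so insert would assign it null.
null_error = try_insert("insert into movie (title, genre) values ('Film B', 'drama')")
# Constraint 2: the primary key value ('Film A', 2009) already exists.
dup_error = try_insert("insert into movie values ('Film A', 2009, 'drama')")
# Same title but a different year is a different primary key value.
ok_error = try_insert("insert into movie values ('Film A', 2010, 'drama')")

print(null_error)
print(dup_error)
print(ok_error)  # None
```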
Additional candidate keys (or superkeys or any attributes for that matter) can be
declared using unique constraints, which ensure that no two tuples can have the
same set of values for the attributes listed in unique, i.e. that every tuple must have
a unique value for the attributes listed in unique (as a whole).
We can add a not null constraint to any attribute to have the database prohibit
assignment of null to that attribute.
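A sketch of unique and not null together, using a hypothetical staff schema in which login is a second candidate key (the attribute names are assumptions, not the staff relation used in the exercises).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: id is the primary key, login is another candidate key.
con.execute("""
    create table staff (
        id    int primary key,
        login varchar(8)  not null unique,
        name  varchar(60) not null
    )
""")
con.execute("insert into staff values (1, 'ab12', 'Ann')")

def rejected(sql):
    """True if the statement violates a constraint and is rejected."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_login = rejected("insert into staff values (2, 'ab12', 'Bob')")  # login already used
null_name = rejected("insert into staff values (3, 'cd34', null)")   # name is not null
valid = rejected("insert into staff values (4, 'cd34', 'Cy')")       # satisfies both
print(dup_login, null_name, valid)  # True True False
```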
Q. Declare the SQL schema for the following relation; don’t worry about getting all the
details right, make ‘educated’ guesses.
Attribute types and not null constraints allow us to limit the values that we store.
check constraints allow us to define predicates that must be satisfied when a tuple is
inserted or updated.
Check constraints are typically simple checks on a single attribute, but they can be
arbitrary expressions involving several attributes and/or a query (many RDBMSs do not,
however, allow a query inside a check constraint).
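A sketch of single-attribute and multi-attribute check constraints on a hypothetical screening relation (its attributes are assumptions). SQLite is one of the RDBMSs that rejects a query inside a check constraint, so that form is not shown.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table screening (
        film       varchar(120),
        capacity   int check (capacity > 0),      -- single attribute
        seats_sold int check (seats_sold >= 0),   -- single attribute
        check (seats_sold <= capacity)            -- involves two attributes
    )
""")

def accepted(values):
    """Try to insert one screening; True if every check constraint holds."""
    try:
        con.execute("insert into screening values (?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False

ok = accepted(("Film A", 100, 80))         # satisfies all three checks
bad_capacity = accepted(("Film B", 0, 0))  # capacity > 0 fails
oversold = accepted(("Film C", 50, 60))    # seats_sold <= capacity fails
print(ok, bad_capacity, oversold)          # True False False
```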
Q. For the staff relation add a check constraint for validfrom and validto and one for
room given the following additional relation:
building(id, ..., room, area, desks, occupancy)
It’s also possible to declare assertions, which are check constraints over data in
several relations.
Although assertions are a very powerful feature of SQL, they are hard to implement
efficiently. Triggers are an alternative, more powerful and more operational approach
for letting the programmer deal with constraint checking when data is modified.
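SQLite does not support create assertion, but a trigger can enforce a similar cross-relation rule. This sketch uses simplified, hypothetical versions of building and staff and enforces that a room never holds more staff than it has desks. It only checks inserts (an update of staff.room would need a second trigger), which illustrates how triggers are more operational than a declarative assertion.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Simplified, hypothetical versions of building and staff.
    create table building (room varchar(10) primary key, desks int not null);
    create table staff (id int primary key, room varchar(10));

    -- Cross-relation rule: a room never holds more staff than it has desks.
    -- Only inserts are checked here.
    create trigger staff_fits_room
    before insert on staff
    when (select count(*) from staff where room = new.room)
         >= (select desks from building where room = new.room)
    begin
        select raise(abort, 'room has no free desk');
    end;
""")
con.execute("insert into building values ('101', 2)")
con.execute("insert into staff values (1, '101')")
con.execute("insert into staff values (2, '101')")

try:
    con.execute("insert into staff values (3, '101')")  # would exceed 2 desks
    third_accepted = True
except sqlite3.DatabaseError as e:
    third_accepted = False
    print(e)  # room has no free desk
```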
Naming constraints
It’s good practice to name constraints. A named constraint can be dropped using alter
table, and its name makes error messages clearer when a constraint violation is reported.
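A sketch of named constraints; the names movie_pk and positive_length are hypothetical. SQLite reports a violated check constraint by its name, but its alter table cannot drop constraints, so the drop (alter table movie drop constraint positive_length in standard SQL) is not shown.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table movie (
        title  varchar(120),
        year   int,
        length int,
        constraint movie_pk primary key (title, year),
        constraint positive_length check (length > 0)
    )
""")

try:
    con.execute("insert into movie values ('Film A', 2009, -5)")
    message = None
except sqlite3.IntegrityError as e:
    message = str(e)

# The error message names the violated constraint.
print(message)
```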
Foreign key constraints specify that the value of one or more attributes in a relation
must match (reference) values of a primary key or unique constraint (candidate key)
in another (referenced) relation. This is an example of referential integrity.
The default policy when referential integrity is violated in SQL is to reject the
modification. However, other policies can be declared for deletes and updates.
Cascade policy: any update to the referenced attribute(s) is cascaded to the foreign
key of the referencing tuples. Similarly, deleting a referenced tuple will result in the
referencing tuples being deleted as well (which might cascade again!)
For example:
create table actor (
  title varchar(120),
  year int,
  name varchar(60),
  foreign key (title, year) references movie(title, year)
    on update cascade on delete cascade
)
Updating the title (or year) of a movie will change the title (or year) of the movie for
all actors in the movie. Deleting a movie will delete all actors who appeared in the movie.
Rather than cascading updates or deletes to the referencing relation, we can instead
set the value of the foreign key to null using a set null policy, or to the default
value using a set default policy. This leaves referencing tuples that no longer match
any referenced tuple, however! ☹
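The reject and cascade policies can be observed with Python’s sqlite3 module; the schemas and tuples are assumptions. Note that SQLite enforces foreign keys only after pragma foreign_keys = on.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (off by default).
con.execute("pragma foreign_keys = on")

con.execute("create table movie (title varchar(120), year int, primary key (title, year))")
con.execute("""
    create table actor (
        title varchar(120),
        year  int,
        name  varchar(60),
        foreign key (title, year) references movie (title, year)
            on update cascade
            on delete cascade
    )
""")
con.execute("insert into movie values ('Film A', 2009)")
con.executemany("insert into actor values ('Film A', 2009, ?)", [("Ann",), ("Bob",)])

# Default policy: a tuple whose foreign key matches no movie is rejected.
try:
    con.execute("insert into actor values ('No Such Film', 2000, 'Cy')")
    dangling_accepted = True
except sqlite3.IntegrityError:
    dangling_accepted = False

# Cascade on update: the actor tuples follow the movie's new title.
con.execute("update movie set title = 'Film A2' where title = 'Film A'")
titles = con.execute("select distinct title from actor").fetchall()

# Cascade on delete: deleting the movie deletes its actor tuples too.
con.execute("delete from movie where title = 'Film A2'")
actors_left = con.execute("select count(*) from actor").fetchone()[0]

print(dangling_accepted)  # False
print(titles)             # [('Film A2',)]
print(actors_left)        # 0
```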
Views are relations that are defined using a query (a select). Views are not
physically stored on disk unless they are materialised (see later).
Views can be queried just like stored relations (tables); a view used in a query
behaves like a subquery. For example, for views named comedies and actorgenre:
select * from comedies where year=2010;
select * from actorgenre where genre='comedy';
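The definitions of comedies and actorgenre are not given here, so the sketch below assumes plausible ones (hypothetical, like the sample data) and runs both queries with Python’s sqlite3 module. The last query inlines the assumed comedies definition as a subquery and returns the same tuples as the view.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table movie (title varchar(120), year int, genre varchar(20),
                        primary key (title, year));
    create table actor (title varchar(120), year int, name varchar(60));

    -- Hypothetical definitions of the two views.
    create view comedies as
        select title, year from movie where genre = 'comedy';
    create view actorgenre as
        select a.name, m.genre
        from actor a join movie m on a.title = m.title and a.year = m.year;
""")
con.executemany("insert into movie values (?, ?, ?)",
                [("Film A", 2010, "comedy"), ("Film B", 2010, "drama"),
                 ("Film C", 2009, "comedy")])
con.executemany("insert into actor values (?, ?, ?)",
                [("Film A", 2010, "Ann"), ("Film B", 2010, "Bob"),
                 ("Film C", 2009, "Cy")])

# Query the views exactly as if they were stored tables.
comedies_2010 = con.execute("select * from comedies where year = 2010").fetchall()
comic_actors = con.execute(
    "select * from actorgenre where genre = 'comedy' order by name").fetchall()

# The same result written with the view's query as a subquery.
inline = con.execute("""
    select * from (select title, year from movie where genre = 'comedy')
    where year = 2010
""").fetchall()

print(comedies_2010)  # [('Film A', 2010)]
print(comic_actors)   # [('Ann', 'comedy'), ('Cy', 'comedy')]
```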
Exercise
Q. Write a view on staff for all staff who are currently here (i.e. currently valid).
Q. Write a view on staff who were here in the academic year 2008 to 2009 (October to
September). Tricky.
5. Restrict access to a relation by providing access to a view not the whole relation.
...
Views are normally recomputed each time they are needed. If a view is used
sufficiently often then it might be more efficient to materialise (store) the view at the
cost of extra storage and extra time to keep the view up-to-date when the underlying
relations are changed by insertions, updates, and deletions.
Materialised view maintenance can be expensive, however: for example, when there are
lots of changes to the underlying relations but few queries on the view. The decision
is a tradeoff between the extra storage and view maintenance costs and the faster
speed of querying a materialised view.
Rather than keeping a materialised view eagerly up-to-date, some RDBMSs allow a
materialised view to be brought up-to-date only when the view is accessed (lazy
maintenance). Others allow the view to become “stale” and only update it
periodically, e.g. when database activity is low (overnight). Others create and remove
materialised views transparently as a query optimisation.
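SQLite has no materialised views (PostgreSQL, for example, provides create materialized view and refresh materialized view), so this sketch simulates one with an ordinary table holding the query result plus an explicit refresh, i.e. periodic rather than eager maintenance. The comedies_mv table and the refresh function are hypothetical names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table movie (title varchar(120), year int, genre varchar(20))")
con.executemany("insert into movie values (?, ?, ?)",
                [("Film A", 2010, "comedy"), ("Film B", 2010, "drama")])

# "Materialise" the comedies query by storing its result in a table.
con.execute("create table comedies_mv as "
            "select title, year from movie where genre = 'comedy'")

def refresh():
    """Bring the stored copy up to date by recomputing it in full."""
    con.execute("delete from comedies_mv")
    con.execute("insert into comedies_mv "
                "select title, year from movie where genre = 'comedy'")

# A change to the underlying relation is not reflected until a refresh.
con.execute("insert into movie values ('Film C', 2011, 'comedy')")
stale = con.execute("select count(*) from comedies_mv").fetchone()[0]
refresh()
fresh = con.execute("select count(*) from comedies_mv").fetchone()[0]
print(stale, fresh)  # 1 2
```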
Views are normally read-only, used to retrieve data. One could consider updatable
views: views that allow inserts, deletes and updates directly on the view.
In general, updatable views don’t make sense, e.g. deleting a tuple from actorgenre
(should the actor tuple, the movie tuple, or both be deleted?).
Even in simple cases, e.g. inserting a tuple into comedies, there is the issue of
missing attributes: attributes in the underlying relations that are not in the view.
In this case we could set them to null or to their default values.
SQL has a complex set of rules that determine whether a view is updatable.
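SQLite treats every view as read-only, but an instead of trigger can spell out how an insert on a view is translated into the base relation. In this sketch (the comedies definition and the trigger name are hypothetical) the trigger sets genre to 'comedy' and leaves the missing length attribute null.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table movie (title varchar(120), year int, length int, genre varchar(20),
                        primary key (title, year));
    -- Hypothetical definition of comedies.
    create view comedies as select title, year from movie where genre = 'comedy';
""")

# Without a trigger, SQLite rejects an insert into the view.
try:
    con.execute("insert into comedies values ('Film A', 2010)")
    plain_insert_ok = True
except sqlite3.DatabaseError:
    plain_insert_ok = False

# The trigger translates the insert into an insert on movie.
con.executescript("""
    create trigger comedies_insert instead of insert on comedies
    begin
        insert into movie (title, year, genre) values (new.title, new.year, 'comedy');
    end;
""")
con.execute("insert into comedies values ('Film A', 2010)")
row = con.execute("select title, year, length, genre from movie").fetchone()
print(plain_insert_ok)  # False
print(row)              # ('Film A', 2010, None, 'comedy')
```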
When relations have many tuples it can be very slow to scan the relation tuple-by-
tuple in order to satisfy a query.
If there were 10,000 movies then reading and testing all 10,000 tuples might be a
little slow. Imagine if there were 100 million tuples in the relation.
Indexes are auxiliary data structures, holding copies of an attribute’s data, that are
maintained automatically by the RDBMS and can be searched very quickly. For example,
for a query that selects movies by year and genre, we could create an index on both
year and genre together, or separate indexes on one or both, which might be more
flexible:
create index yearindex on movies(year);
create index genreindex on movies(genre);
Ideally we shouldn’t have to create indexes at all.
Like materialised views, there is a tradeoff between the space needed for indexes and
the cost of maintaining them vs the greater speed of access to the indexed data.
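To see an index being used, SQLite’s explain query plan output can be compared before and after creating the two indexes above. The movies schema is an assumption, and the exact plan text varies between SQLite versions, so the sketch only checks for a scan and for whether yearindex is mentioned.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table movies (title varchar(120), year int, genre varchar(20))")

def plan(sql):
    """Return SQLite's plan description (last column of each explain query plan row)."""
    return " | ".join(str(row[-1]) for row in con.execute("explain query plan " + sql))

query = "select * from movies where year = 2010"
before = plan(query)  # a full scan of movies
con.execute("create index yearindex on movies(year)")
con.execute("create index genreindex on movies(genre)")
after = plan(query)   # a search using yearindex

print(before)
print(after)
```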