Chapter 5
Advanced Data Modeling
Source: Coronel, C. & Morris, S. 2019. Database Systems: Design, Implementation, and Management (13th ed.).
Cengage Learning Inc. Canada.
Objectives
Describe the main extended entity relationship (EER) model constructs and how they are
represented in ERDs and EERDs
Use entity clusters to represent multiple entities and relationships in an entity relationship
diagram (ERD)
Describe the characteristics of good primary keys and how to select them
Apply flexible solutions for special data-modeling cases
5-1 The Extended Entity Relationship Model
Extended entity relationship model (EERM): Sometimes referred to as the enhanced entity
relationship model; the result of adding more semantic constructs, such as entity supertypes,
entity subtypes, and entity clustering, to the original entity relationship (ER) model.
A diagram that uses the EERM is called an EER diagram (EERD).
EERD: The entity relationship diagram resulting from the application of extended entity
relationship concepts that provide additional semantic content in the ER model.
Entity supertype: In a generalization or specialization hierarchy, a generic entity type that
contains the common characteristics of entity subtypes.
Entity subtype: In a generalization or specialization hierarchy, a subset of an entity supertype.
The entity supertype contains the common characteristics and the subtypes contain the unique
characteristics of each entity.
[Figure: the EMPLOYEE supertype with the subtypes Pilot, Mechanic, and Accounting]
5-1a Entity Supertypes and Subtypes
Organizations often group employees based on their characteristics. For example:
◦ A retail company could group employees as salaried and hourly.
◦ A university could group employees as faculty, staff, and administrators.
The grouping of employees into various types provides two important benefits:
◦ It avoids unnecessary nulls in attributes when some employees have characteristics that are not
shared by other employees.
◦ It enables a particular employee type to participate in relationships that are unique to that
employee type
Two criteria help the designer determine when to use subtypes and supertypes:
◦ There must be different, identifiable kinds or types of the entity in the user’s environment.
◦ The different kinds or types of instances should each have one or more attributes that are
unique to that kind or type of instance.
5-1b Specialization Hierarchy
Entity supertypes and subtypes are organized in a specialization hierarchy, which
depicts the arrangement of higher-level entity supertypes (parent entities) and
lower-level entity subtypes (child entities).
Specialization hierarchy: A hierarchy based on the top-down process of identifying lower-level, more specific entity
subtypes from a higher-level entity supertype. Specialization is based on grouping unique characteristics and
relationships of the subtypes.
Specialization hierarchies enable the data model to capture additional semantic content (meaning) in the ERD.
A specialization hierarchy provides the means to:
• Support attribute inheritance.
• Define a special supertype attribute known as the subtype discriminator.
• Define disjoint or overlapping constraints and complete or partial constraints.
5-1c Inheritance
The property of inheritance enables an entity subtype to inherit the attributes and relationships of the
supertype.
In addition to inheriting the relationships of their supertypes, subtypes can have relationships of their own.
Figure 5.4 illustrates a 1:M relationship between EMPLOYEE, a subtype of PERSON, and OFFICE.
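One common way to realize attribute inheritance in a relational database is to give the subtype table the same primary key as the supertype table and to join the two when the full set of attributes is needed. The sketch below is an illustration under that assumption, not the textbook's implementation; the column names EMP_LNAME and PIL_LICENSE and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (            -- supertype: common attributes
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_LNAME TEXT NOT NULL,       -- hypothetical attribute
    EMP_TYPE  TEXT                 -- subtype discriminator
);
CREATE TABLE PILOT (               -- subtype: unique attributes only
    EMP_NUM     INTEGER PRIMARY KEY REFERENCES EMPLOYEE (EMP_NUM),
    PIL_LICENSE TEXT               -- hypothetical attribute
);
INSERT INTO EMPLOYEE VALUES (100, 'Smith', 'P');
INSERT INTO EMPLOYEE VALUES (101, 'Jones', NULL);
INSERT INTO PILOT VALUES (100, 'ATP');
""")

# A pilot "inherits" EMP_LNAME by joining on the shared primary key.
pilot_rows = conn.execute("""
    SELECT E.EMP_NUM, E.EMP_LNAME, P.PIL_LICENSE
    FROM PILOT P JOIN EMPLOYEE E ON E.EMP_NUM = P.EMP_NUM
""").fetchall()
print(pilot_rows)
```

Employee 101 has no PILOT row, so the subtype-only attribute never has to be stored as a null for that employee.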
5-1d Subtype Discriminator
A subtype discriminator is the attribute in the supertype entity that determines to
which subtype the supertype occurrence is related. In Figure 5.2, the subtype discriminator is the employee type
(EMP_TYPE).
Using Figure 5.2, the supertype is related to a PILOT subtype if the EMP_TYPE has a value of “P.”
If the EMP_TYPE value is “M,” the supertype is related to a MECHANIC subtype.
If the EMP_TYPE value is “A,” the supertype is related to the ACCOUNTANT subtype.
Note that the default comparison condition for the subtype discriminator attribute is the equality comparison.
However, in some situations the subtype discriminator is not necessarily based on an equality comparison.
For example, based on business requirements, you might create two new pilot subtypes:
pilot-in-command (PIC)-qualified and copilot-qualified only.
A PIC-qualified pilot must have more than 1,500 PIC flight hours. In this case, the subtype discriminator would be
“Flight_Hours,” and the criteria would be > 1,500 or <= 1,500, respectively
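The two kinds of discriminator test described above can be sketched as small functions. This is only an illustration: the function names and the returned labels PIC_QUALIFIED and COPILOT_ONLY are hypothetical, while the EMP_TYPE codes and the 1,500-hour threshold come from the text.

```python
def employee_subtype(emp_type):
    """Equality comparison on EMP_TYPE, following the Figure 5.2 codes."""
    return {"P": "PILOT", "M": "MECHANIC", "A": "ACCOUNTANT"}.get(emp_type)


def pilot_subtype(flight_hours):
    """Non-equality comparison: more than 1,500 PIC flight hours means PIC-qualified."""
    return "PIC_QUALIFIED" if flight_hours > 1500 else "COPILOT_ONLY"


print(employee_subtype("M"), pilot_subtype(2000))
```

A code that matches no subtype returns None in this sketch.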
5-1e Disjoint and Overlapping Constraints
Disjoint subtypes (nonoverlapping subtypes), marked d in the EERD
In a specialization hierarchy, these are unique and nonoverlapping subtype entity sets.
Example: an employee who is a pilot can appear only in the PILOT subtype.
Overlapping subtypes, marked o in the EERD
In a specialization hierarchy, a condition in which each entity instance (row) of the supertype can appear in
more than one subtype.
For example, in a university environment, a person may be an employee, a student, or both. In turn, an
employee may be a professor as well as an administrator. Because an employee may also be a student,
STUDENT and EMPLOYEE are overlapping subtypes.
Toby J. Teorey popularized the use of G and Gs to indicate disjoint and overlapping subtypes.
The implementation of disjoint subtypes is based on the value of the subtype discriminator attribute in the
supertype.
Implementing overlapping subtypes requires the use of one discriminator attribute for
each subtype.
For example, in the case of the Tiny College database design in Chapter 4, Entity Relationship (ER)
Modeling, a professor can also be an administrator. Therefore, the EMPLOYEE supertype would have the
subtype discriminator attributes and values shown in Table 5.1.
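A minimal sketch of the "one discriminator attribute per subtype" idea follows. Table 5.1 is not reproduced here, so the attribute names IS_PROFESSOR and IS_ADMINISTRATOR and the "Y"/"N" values are assumptions made for illustration.

```python
def overlapping_subtypes(employee):
    """Return every subtype whose own discriminator attribute is set to 'Y'."""
    # One hypothetical discriminator attribute per subtype.
    flags = {"IS_PROFESSOR": "PROFESSOR", "IS_ADMINISTRATOR": "ADMINISTRATOR"}
    return {subtype for flag, subtype in flags.items() if employee.get(flag) == "Y"}


both = overlapping_subtypes({"IS_PROFESSOR": "Y", "IS_ADMINISTRATOR": "Y"})
neither = overlapping_subtypes({"IS_PROFESSOR": "N", "IS_ADMINISTRATOR": "N"})
print(sorted(both), sorted(neither))
```

Because each subtype has its own flag, an employee can belong to both subtypes, to one, or to none, which a single discriminator attribute could not express.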
5-1f Completeness Constraint
Completeness constraint
A constraint that specifies whether each entity supertype occurrence must also be a member of at least
one subtype. The completeness constraint can be partial or total.
Partial completeness
In a generalization or specialization hierarchy, a condition in which some supertype occurrences might not
be members of any subtype.
Total completeness
In a generalization or specialization hierarchy, a condition in which every supertype occurrence must be a
member of at least one subtype.
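The disjoint/overlapping and partial/total constraints combine into four scenarios. The hypothetical helper below checks one supertype occurrence against a chosen combination; it is a sketch of the rules as defined above, not a feature of any particular DBMS.

```python
def check_membership(subtypes, disjoint, total):
    """Return the constraint violations for one supertype occurrence.

    subtypes: set of subtype names the occurrence belongs to
    disjoint: True for disjoint subtypes, False for overlapping subtypes
    total:    True for total completeness, False for partial completeness
    """
    problems = []
    if disjoint and len(subtypes) > 1:
        problems.append("disjoint constraint violated")
    if total and len(subtypes) == 0:
        problems.append("total completeness violated")
    return problems


print(check_membership({"PILOT", "MECHANIC"}, disjoint=True, total=False))
print(check_membership(set(), disjoint=False, total=True))
```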
5-1g Specialization and Generalization
Specialization is the top-down process of identifying lower-level, more specific entity subtypes from a
higher-level entity supertype. Specialization is based on grouping the unique characteristics and
relationships of the subtypes.
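Generalization is the bottom-up process of identifying a higher-level, more generic entity supertype from
lower-level entity subtypes. Generalization is based on grouping the common attributes of the subtypes into
a supertype entity.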
For example, you might identify multiple types of musical instruments: piano, violin, and guitar. Using the
generalization approach, you could identify a “string instrument” entity supertype to hold the common
characteristics of the multiple subtypes
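Generalization can be pictured as a set intersection: the attributes shared by every subtype move up to the supertype, and what remains stays with each subtype. The attribute names below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical attribute lists for the three instrument subtypes.
subtype_attributes = {
    "PIANO": {"INSTR_ID", "MAKER", "NUM_STRINGS", "NUM_KEYS"},
    "VIOLIN": {"INSTR_ID", "MAKER", "NUM_STRINGS", "BOW_TYPE"},
    "GUITAR": {"INSTR_ID", "MAKER", "NUM_STRINGS", "NUM_FRETS"},
}

# Common attributes go to the "string instrument" supertype.
common = set.intersection(*subtype_attributes.values())

# Each subtype keeps only the attributes that are unique to it.
unique = {name: attrs - common for name, attrs in subtype_attributes.items()}

print(sorted(common))
print({name: sorted(attrs) for name, attrs in unique.items()})
```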
5-2 Entity Clustering
As the design approaches completion, the ERD will contain hundreds of entities and relationships that crowd the
diagram to the point of making it unreadable and inefficient as a communication tool. In those cases, you can use
entity clusters to minimize the number of entities shown in the ERD
Entity cluster
A “virtual” entity type used to represent multiple entities and relationships in the ERD. An entity cluster is formed
by combining multiple interrelated entities into a single abstract entity object. An entity cluster is
considered “virtual” or “abstract” in the sense that it is not actually an entity in the final ERD. Instead, it is a
temporary entity used to represent multiple entities and relationships, with the purpose
of simplifying the ERD and thus enhancing its readability.
Figure 5.6 illustrates the use of entity clusters based on the Tiny College example in Chapter 4. Note that the ERD
contains two entity clusters:
• OFFERING, which groups the SEMESTER, COURSE, and CLASS entities and
relationships
• LOCATION, which groups the ROOM and BUILDING entities and relationships
Note also that the ERD in Figure 5.6 does not show attributes for the entities. When using entity clusters,
the key attributes of the combined entities are no longer available. Attributes are therefore left out of the
diagram to avoid problems with primary key inheritance and referential integrity.
5-3 Entity Integrity: Selecting Primary Keys
The primary key’s function is to guarantee entity integrity.
Primary keys and foreign keys work together to implement relationships in the relational model.
Properly selecting the primary key has a direct bearing on the efficiency and effectiveness of the
database implementation.
5-3a Natural Keys and Primary Keys
A natural key or natural identifier is a real-world,
generally accepted identifier used to distinguish—that is, uniquely identify—real-world objects.
As its name implies, a natural key is familiar to end users and forms part of their day-to-day business vocabulary.
5-3b Primary Key Guidelines
1. First, you should understand the function of a primary key. Its main function is to uniquely identify an
entity instance or row within a table. The function of the primary key is to guarantee entity integrity, not
to “describe” the entity
2. Second, primary keys and foreign keys are used to implement relationships among entities.
◦ The implementation of such relationships is done mostly behind the scenes, hidden from end users.
◦ In the real world, end users identify objects based on the characteristics they know about the
objects. For example, when shopping at a grocery store, you select products by taking them from a
display shelf and reading the labels, not by looking at the stock number.
◦ Database applications should let the end user choose among multiple descriptive narratives of
different objects, while using primary key values behind the scenes.
5-3c When to Use Composite Primary Keys
Composite primary keys are particularly useful in two cases:
1. As identifiers of composite entities, in which each primary key combination is
allowed only once in the M:N relationship
2. As identifiers of weak entities, in which the weak entity has a strong identifying relationship with the
parent entity
For the first case, consider the M:N relationship between the STUDENT and CLASS entity sets, which is
implemented via the ENROLL entity set.
As shown in Figure 5.7, the composite primary key automatically provides the benefit of ensuring that there
cannot be duplicate values; that is, it ensures that the same student cannot enroll more than once in the same
class.
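The duplicate-prevention benefit can be tried out with a small in-memory SQLite table. This is a sketch only: the column names STU_NUM, CLASS_CODE, and ENROLL_GRADE and the sample values are assumptions, since Figure 5.7 is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ENROLL (
        STU_NUM      INTEGER NOT NULL,
        CLASS_CODE   TEXT    NOT NULL,
        ENROLL_GRADE TEXT,
        PRIMARY KEY (STU_NUM, CLASS_CODE)   -- composite primary key
    )
""")
conn.execute("INSERT INTO ENROLL VALUES (321452, '10014', 'C')")

# The same student enrolling again in the same class repeats the key combination.
try:
    conn.execute("INSERT INTO ENROLL VALUES (321452, '10014', 'A')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

The same student can still enroll in a different class, because only the combination of the two values must be unique.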
The second case, a weak entity in a strong identifying relationship with a parent
entity, is normally used to represent one of two situations:
◦ A real-world object that is existence-dependent on another real-world object, such as a DEPENDENT
of an EMPLOYEE
◦ A real-world object that is represented in the data model as two separate entities in a
strong identifying relationship
For example, the real-world invoice object is represented
by two entities in a data model: INVOICE and LINE. Clearly, the LINE entity does not
exist in the real world as an independent object but as part of an INVOICE.
Having a strong identifying relationship ensures that the dependent entity can exist only when it is related to the
parent entity.
In summary, the selection of a composite primary key for composite and weak entity types provides benefits that
enhance the integrity and consistency of the model.
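The INVOICE and LINE case can be sketched in SQLite as follows. The column names are hypothetical, and foreign key enforcement has to be switched on explicitly for a SQLite connection, so treat this as an illustration rather than a definitive schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE INVOICE (INV_NUMBER INTEGER PRIMARY KEY);
CREATE TABLE LINE (
    INV_NUMBER  INTEGER NOT NULL REFERENCES INVOICE (INV_NUMBER),
    LINE_NUMBER INTEGER NOT NULL,
    LINE_UNITS  INTEGER,
    PRIMARY KEY (INV_NUMBER, LINE_NUMBER)  -- parent key is part of the weak entity's key
);
""")
conn.execute("INSERT INTO INVOICE VALUES (1001)")
conn.execute("INSERT INTO LINE VALUES (1001, 1, 2)")

# A line that points at a nonexistent invoice cannot be stored.
try:
    conn.execute("INSERT INTO LINE VALUES (9999, 1, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```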
5-3d When to Use Surrogate (Substitute) Primary Keys
In some instances a primary key doesn’t exist in the real world or the existing natural key might not be a
suitable primary key. In these cases, it is standard practice to create a surrogate key.
A surrogate key is a primary key created by the database designer to simplify the identification of entity
instances. The surrogate key has no meaning in the user’s environment; it exists only to distinguish one
entity instance from another (just like any other primary key). It is a system-assigned primary key,
generally numeric and auto-incremented.
See Table 5.4 and consider the following entity:
EVENT (DATE, TIME_START, TIME_END, ROOM, EVENT_NAME, PARTY_OF)
Possible composite primary keys are (DATE, TIME_START, ROOM) or (DATE, TIME_END, ROOM).
Assume that (DATE, TIME_START, ROOM) is selected as the primary key of EVENT:
EVENT (DATE, TIME_START, TIME_END, ROOM, EVENT_NAME, PARTY_OF)
RESOURCE (RSC_ID, RSC_DESCRIPTION, RSC_TYPE, RSC_QTY, RSC_PRICE)
Given the business rules, the M:N relationship between RESOURCE and EVENT would
be represented via the EVNTRSC composite entity with a composite primary key as follows:
EVNTRSC (DATE, TIME_START, ROOM, RSC_ID, QTY_USED), which has a lengthy, four-attribute composite primary
key.
What if the EVNTRSC entity’s primary key were inherited by another existence-dependent entity? At this point,
you can see that the composite primary key could make the database implementation and program coding
unnecessarily complex.
The preferred alternative is to use a numeric, single-attribute surrogate primary key.
When you do, ensure that the candidate key of the entity in question still performs properly through the use of
“unique index” and “not null” constraints.
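A sketch of the surrogate-key alternative in SQLite follows. EVENT_ID is a hypothetical name for the surrogate key, the sample booking is made up, and the text column types are assumptions. The original candidate key (DATE, TIME_START, ROOM) stays protected by "not null" and "unique" constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EVENT (
    EVENT_ID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key (hypothetical name)
    DATE       TEXT NOT NULL,
    TIME_START TEXT NOT NULL,
    TIME_END   TEXT NOT NULL,
    ROOM       TEXT NOT NULL,
    EVENT_NAME TEXT,
    PARTY_OF   INTEGER,
    UNIQUE (DATE, TIME_START, ROOM)                -- candidate key is still enforced
);
CREATE TABLE EVNTRSC (
    EVENT_ID INTEGER NOT NULL REFERENCES EVENT (EVENT_ID),
    RSC_ID   TEXT    NOT NULL,
    QTY_USED INTEGER,
    PRIMARY KEY (EVENT_ID, RSC_ID)                 -- two attributes instead of four
);
""")

booking = ("2024-06-17", "11:00", "14:00", "Room A", "Wedding", 60)
insert_sql = (
    "INSERT INTO EVENT (DATE, TIME_START, TIME_END, ROOM, EVENT_NAME, PARTY_OF) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)
first_id = conn.execute(insert_sql, booking).lastrowid

# Booking the same room at the same date and start time violates the unique constraint.
try:
    conn.execute(insert_sql, booking)
    double_booking_rejected = False
except sqlite3.IntegrityError:
    double_booking_rejected = True

print(first_id, double_booking_rejected)
```

Without the unique constraint, the surrogate key alone would happily accept two events in the same room at the same time.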
5-4 Design Cases: Learning Flexible Database Design
The design cases in this section illustrate flexible designs, proper identification of primary keys, and
placement of foreign keys.
5-4a Design Case 1: Implementing 1:1 Relationships
Foreign keys work with primary keys to properly implement relationships in the relational model. The basic
rule is very simple: put the primary key of the “one” side (the parent entity) on the “many” side (the
dependent entity) as a foreign key.
However, where do you place the foreign key when you are working with a 1:1 relationship?
Consider EMPLOYEE and DEPARTMENT, based on the business rule “one EMPLOYEE is the manager of one
DEPARTMENT, and one DEPARTMENT is managed by one EMPLOYEE.” There are two options for placing the
foreign key:
1. Place a foreign key in both entities.
For example, place EMP_NUM as a foreign key in DEPARTMENT, and place DEPT_ID as a foreign key in EMPLOYEE.
However, this solution is not recommended because it duplicates work, and it could conflict with other
existing relationships. (Remember that DEPARTMENT and EMPLOYEE also participate in a 1:M relationship—
one department employs many employees.)
2. Place a foreign key in one of the entities.
In that case, the primary key of one of the two entities appears as a foreign key in the other entity. That is
the preferred solution, but a question remains: which primary key should be used as a foreign key? The answer
is found in Table 5.5, which shows the
rationale for selecting the foreign key in a 1:1 relationship based on the relationship properties in the ERD.
1:1 relationships exist in the real world;
therefore, they should be supported in the data model.
In fact, a 1:1 relationship is used to ensure that two entity sets are not placed in the same table. In other
words, EMPLOYEE and DEPARTMENT are clearly separate and unique entity types that do not belong
together in a single entity. If you grouped them together in one entity, what would you name that entity?
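The preferred option, a foreign key in only one of the two entities, might look like the SQLite sketch below. Table 5.5 is not reproduced here, so the sketch assumes that every department must have a manager while most employees manage no department, which places EMP_NUM in DEPARTMENT. The UNIQUE constraint is what keeps the relationship 1:1. DEPT_NAME, EMP_LNAME, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_LNAME TEXT);
CREATE TABLE DEPARTMENT (
    DEPT_ID   INTEGER PRIMARY KEY,
    DEPT_NAME TEXT,
    -- manager: NOT NULL makes it mandatory, UNIQUE limits each employee to one department
    EMP_NUM   INTEGER NOT NULL UNIQUE REFERENCES EMPLOYEE (EMP_NUM)
);
INSERT INTO EMPLOYEE VALUES (1, 'Smith');
INSERT INTO EMPLOYEE VALUES (2, 'Jones');
INSERT INTO DEPARTMENT VALUES (10, 'Accounting', 1);
""")

# Employee 1 already manages a department, so a second assignment is rejected.
try:
    conn.execute("INSERT INTO DEPARTMENT VALUES (20, 'Marketing', 1)")
    second_department_rejected = False
except sqlite3.IntegrityError:
    second_department_rejected = True

print(second_department_rejected)
```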
5-4b Design Case 2: Maintaining History of Time-Variant Data
Normally, data changes are managed by replacing the existing attribute value with the new value, without
regard to the previous value. However, in some situations the history of values for a given attribute must
be preserved. From a data-modeling point of view,
time-variant data refers to data whose values change over time and for which you must keep a history of
the data changes.
You could argue that all data in a database is subject to change over time and is therefore time variant.
However, some attribute values, such as your date of birth or your Social Security number, are not time
variant.
On the other hand, attributes such as your student GPA or your bank account balance are subject to
change over time. Sometimes the data changes are externally originated and event driven, such as a
product price change. On other occasions, changes are based on well-defined schedules, such as the daily
stock quote “open” and “close” values.
The storage of time-variant data requires changes in the data model;
the type of change depends on the nature of the data.
Some time-variant data is equivalent to having a multivalued attribute in your entity. To model this type of
time-variant data, you must create a new entity in a 1:M relationship with the original entity.
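As an illustration of the "new entity in a 1:M relationship" approach, the sketch below keeps a price history for a product, one of the time-variant examples mentioned earlier. The table and column names (PRODUCT, PRICE_HIST, PRICE_START_DATE, and so on) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (PROD_CODE TEXT PRIMARY KEY, PROD_DESCRIPT TEXT);
CREATE TABLE PRICE_HIST (                          -- "many" side: one row per price change
    PROD_CODE        TEXT NOT NULL REFERENCES PRODUCT (PROD_CODE),
    PRICE_START_DATE TEXT NOT NULL,
    PRICE_AMOUNT     REAL NOT NULL,
    PRIMARY KEY (PROD_CODE, PRICE_START_DATE)
);
INSERT INTO PRODUCT VALUES ('P100', 'Hammer');
INSERT INTO PRICE_HIST VALUES ('P100', '2023-01-01', 9.50);
INSERT INTO PRICE_HIST VALUES ('P100', '2024-01-01', 10.25);
""")

# The full history is preserved; the latest row gives the current value.
history = conn.execute(
    "SELECT PRICE_START_DATE, PRICE_AMOUNT FROM PRICE_HIST "
    "WHERE PROD_CODE = ? ORDER BY PRICE_START_DATE",
    ("P100",),
).fetchall()
current_price = history[-1][1]

print(history, current_price)
```

A price change becomes a new PRICE_HIST row instead of an overwrite, so earlier values remain available.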
5-4c Design Case 3: Fan Traps
Creating a data model requires proper identification of the data relationships among entities. However, due
to miscommunication or incomplete understanding of the business rules or processes, it is not uncommon
to misidentify relationships among entities. Under those circumstances, the ERD may contain a design trap.
A design trap occurs when a relationship is improperly or incompletely identified and is therefore
represented in a way that is not consistent with the real world. The most common design trap is known as
a fan trap.
A fan trap occurs when you have one entity in two 1:M relationships to other entities, thus producing an
association among the other entities that is not expressed in the model.
For example, assume that the JCB basketball league has many divisions. Each division has many players,
and each division has many teams. Given those “incomplete” business rules, you might create an ERD that
looks like the one in Figure 5.12.
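The problem with such a model can be seen with a few made-up rows. In the sketch below, DIVISION has one 1:M relationship to TEAM and another to PLAYER, and nothing links PLAYER to TEAM. All names and values are hypothetical.

```python
# DIVISION 1:M TEAM (each team belongs to one division)
team_division = {"Team A": "Division 1", "Team B": "Division 1"}

# DIVISION 1:M PLAYER (each player belongs to one division)
player_division = {"Player 1": "Division 1", "Player 2": "Division 1"}


def candidate_teams(player):
    """The best this model can do: list every team in the player's division."""
    division = player_division[player]
    return sorted(team for team, div in team_division.items() if div == division)


print(candidate_teams("Player 1"))
```

The model can only narrow a player down to all the teams in the division. The association between a player and one specific team is not expressed, which is the fan trap.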
5-4d Design Case 4: Redundant Relationships
Redundancy is seldom good in the database environment. (As discussed in the chapter on The Relational
Database Model, redundancies can cause data anomalies in a database.)
Redundant relationships occur when there are multiple relationship paths between related entities.
The main concern with redundant relationships is that they remain consistent across the model. However,
it is important to note that some designs use redundant relationships as a way to simplify the design.
An example of redundant relationships was first introduced in Figure 5.10 during the discussion of
maintaining a history of time-variant data. However, the use of the redundant “manages” and “employs”
relationships was justified by the fact that such relationships dealt with current data rather than historic
data.
In Figure 5.14, note the transitive 1:M relationship between DIVISION and PLAYER through the TEAM entity
set. Therefore, the relationship that connects DIVISION and PLAYER is redundant, for all practical purposes.
In that case, the relationship could be safely deleted without losing any information-generation capabilities
in the model.
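The reason the direct DIVISION-to-PLAYER relationship can be dropped is that it can always be derived through TEAM. A small sketch with hypothetical data:

```python
# DIVISION 1:M TEAM and TEAM 1:M PLAYER; there is no direct DIVISION-PLAYER link.
team_division = {"Team A": "Division 1", "Team B": "Division 1", "Team C": "Division 2"}
player_team = {"Player 1": "Team A", "Player 2": "Team B", "Player 3": "Team C"}


def players_in_division(division):
    """Derive a division's players by going through the TEAM entity set."""
    return sorted(
        player for player, team in player_team.items()
        if team_division[team] == division
    )


print(players_in_division("Division 1"))
```

Storing the division on each player as well would add a second path that has to be kept consistent with this one.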