Data Modeling Training
• Advanced Relationships
• Modeling many-to-many relationships
• Model multiple relationships between the same two entities
• Model self-referencing relationships
• Model ternary relationships
• Identify redundant relationships
Day 5 – Agenda
A functional decomposition contains the whole function or project along with all of the necessary
sub-tasks needed to complete it. Functional decomposition is a problem-solving tool used in
several contexts, from business and industry to computer programming and AI.
Context-level data flow diagram
Context diagrams focus on how external entities interact with your system. It's the most basic
form of a data flow diagram, providing a broad view of the system and external entities in an
easily digestible way. Because of its simplicity, it's sometimes called a level 0 data flow diagram.
Sources of requirements
In data modeling, requirements can be gathered from various sources, including business needs,
existing systems, and user requirements. These sources help define the data model's structure,
ensuring it accurately reflects the organization's data landscape and supports its processes.
• Business Requirements
• Existing Systems
• User Requirements
• Other Sources
• KPIs
• Reporting needs
Data flow diagrams
A data flow diagram (DFD) maps out the flow of information for any process or system. It uses
defined symbols like rectangles, circles and arrows, plus short text labels, to show data inputs,
outputs, storage points and the routes between each destination.
Use case models
A use-case model is a model of how different types of users interact with the system to solve a
problem. As such, it describes the goals of the users, the interactions between the users and the
system, and the required behavior of the system in satisfying these goals.
Workflow models
Workflow models allow for the standardization of organizational processes. They define a
structured approach to executing tasks, ensuring consistency and adherence to predefined
guidelines. This consistency is crucial for maintaining quality, compliance, and achieving desired
outcomes.
Business rules
Business rules in data modeling are constraints, policies, and logic that define how data behaves and relates
within a database. They ensure data quality, consistency, and alignment with business goals. These rules are
typically derived from a detailed description of the organization's operations. By incorporating business rules,
data models can reflect real-world data environments accurately and lead to better database designs.
State diagrams
State diagrams provide an abstract description of a system's behavior. This behavior is analyzed and
represented by a series of events that can occur in one or more possible states. In this approach, "each diagram usually
represents objects of a single class and track the different states of its objects through the system".
Class diagrams
The class diagram is the main building block of object-oriented modeling. It is used for general conceptual
modeling of the structure of the application, and for detailed modeling, translating the models into
programming code.
Types of modeling projects
A transactional business system, also known as an Online Transaction Processing (OLTP) system, handles the
recording and processing of a company's daily transactions. These systems are designed to manage a high
volume of transactions efficiently and reliably, focusing on data integrity and accuracy. Examples include
systems for online banking, e-commerce, and inventory management.
Business Intelligence (BI) and data warehousing are related but distinct concepts. While data warehousing
focuses on storing and organizing large amounts of data for analysis, BI encompasses the processes and
technologies used to analyze that data and extract actionable insights. Essentially, a data warehouse provides
the foundation for BI by serving as a centralized repository of data, while BI tools and techniques are used to
query, analyze, and visualize that data to drive decision-making.
Integrating and consolidating existing systems involves combining different, often disparate, systems to
create a unified whole. This can include merging data from multiple sources into a single repository
(consolidation) or connecting them for real-time data exchange (integration). The goal is to streamline
operations, improve data accessibility, and enhance overall efficiency.
Maintenance of existing systems
Maintaining existing systems involves ongoing activities to keep them operational and meet evolving user
needs. This includes fixing bugs, enhancing functionality, and adapting to changes in the environment.
Effective maintenance is crucial for extending the life of a system and reducing its long-term costs.
Enterprise analysis
A commercial off-the-shelf (COTS) application is a pre-packaged software program that's available for
purchase and use without extensive customization. COTS applications are also known as off-the-shelf
software.
Day-3
Conceptual Data Modeling
Conceptual data modeling is a high-level representation of an organization's data needs, focusing on the core
business concepts and their relationships, rather than specific technical details. It serves as a blueprint for
developing more detailed logical and physical data models, ensuring that the database structure aligns with
the business requirements. Essentially, it's a way to understand and document the essential data elements
and their interactions in a business context.
Discovering entities
Discovering entities in data modeling involves identifying the core objects or concepts for which data is
collected. These entities represent the building blocks of a data model, often corresponding to tables in a
relational database. The process typically involves analyzing data sources and business requirements, and can
be facilitated by tools like Entity-Relationship Diagrams (ERDs).
Defining entities
In data modeling, defining entities involves identifying the core objects or concepts about which data will be
stored and managed within a database or system. These entities represent real-world objects, people, places,
concepts, or events that are of interest to the application or domain.
Documenting an entity
Documenting an entity in data modeling involves capturing details about an object of interest, its attributes,
and relationships with other entities. This documentation is crucial for understanding the data structure and
ensures consistency and accuracy.
Identifying attributes
In data modeling, identifying attributes involves recognizing and defining the characteristics or properties of
entities, which are the core objects or concepts being modeled. These attributes describe the entity and are
the most fundamental building blocks of a data model. They represent the data that is stored for each entity.
In data modeling, an entity represents a real-world object or concept, like a "Student" or a "Course."
Attributes, on the other hand, are characteristics or properties that describe an entity, such as a "Student's
Name" or a "Course's Credit Hours". Entities are the fundamental building blocks of a data model, and
attributes provide details about those entities.
Model fundamental relationships
Modeling fundamental relationships means representing the connections between entities, such as a customer
placing an order. A relationship records how instances of one entity are associated with instances of
another, and it typically becomes a foreign key when the model is implemented in a database.
Cardinality of relationships
Cardinality, in the context of relationships in databases and data modeling, describes the numerical relationship
between entities or tables. It essentially defines how many instances of one entity can be related to instances of
another entity.
Types of Cardinality:
One-to-One (1:1):
Each instance of one entity is related to exactly one instance of another entity. For example, a person might have
exactly one passport.
One-to-Many (1:N):
Each instance of one entity can be related to multiple instances of another entity. For instance, one
customer can place many orders.
Many-to-Many (N:M):
Each instance of one entity can be related to multiple instances of another entity, and vice versa.
For example, many students can enroll in many courses.
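The three cardinalities above can be sketched as table definitions. This is a minimal illustration, assuming SQLite (through Python's built-in sqlite3 module) and hypothetical table and column names that follow the passport, order, and enrollment examples; a real model would carry more attributes.

```python
import sqlite3

# Hypothetical schema illustrating 1:1, 1:N, and N:M relationships.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- 1:1: UNIQUE on the foreign key allows at most one passport per person
CREATE TABLE passport (
    passport_id INTEGER PRIMARY KEY,
    person_id   INTEGER NOT NULL UNIQUE REFERENCES person(person_id)
);

CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- 1:N: many orders may reference the same customer
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);

CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- N:M: resolved through an associative table with a composite key
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO person   VALUES (1, 'Ada');
INSERT INTO passport VALUES (10, 1);
INSERT INTO student  VALUES (1, 'Sam'), (2, 'Lee');
INSERT INTO course   VALUES (1, 'Math'), (2, 'Art');
INSERT INTO enrollment VALUES (1, 1), (1, 2), (2, 1);
""")


def second_passport_rejected() -> bool:
    """Try to give person 1 a second passport; the UNIQUE constraint should refuse."""
    try:
        conn.execute("INSERT INTO passport VALUES (11, 1)")
    except sqlite3.IntegrityError:
        return True
    return False


enrollment_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```

The many-to-many case is the only one that needs an extra table: each enrollment row pairs one student with one course.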
Is the relationship mandatory or optional?
In the context of relationships between entities in data modeling, a relationship can be either mandatory or optional.
A mandatory relationship requires that an entity instance must participate in the relationship with another entity. In
contrast, an optional relationship allows an entity instance to participate in the relationship with another entity, but
this participation is not compulsory.
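One common way to express this difference in a relational database is the nullability of the foreign key. The sketch below assumes SQLite and hypothetical tables: an order must have a customer (mandatory), while an employee may or may not belong to a department (optional).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer   (customer_id   INTEGER PRIMARY KEY);
CREATE TABLE department (department_id INTEGER PRIMARY KEY);
-- Mandatory: every order must reference a customer (NOT NULL foreign key)
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
-- Optional: an employee may exist without a department (NULL is allowed)
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    department_id INTEGER REFERENCES department(department_id)
);
""")


def try_insert(sql: str) -> bool:
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


order_without_customer = try_insert("INSERT INTO orders VALUES (1, NULL)")
employee_without_department = try_insert("INSERT INTO employee VALUES (1, NULL)")
```

Here the order without a customer is rejected, while the employee without a department is accepted.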
In a data model, relationships between entities should be named clearly and consistently to ensure understanding
and facilitate data manipulation. A good naming convention uses verbs to describe the relationship between two
entities. For example, "CUSTOMER places ORDER" describes the relationship between a customer and an order.
Active verbs are generally preferred, and inverse names (e.g., "ORDER placed by CUSTOMER") can be helpful for
readability.
In data modeling, subject areas are groups of related entities and their attributes that represent specific business
functions or domains within an organization. To discover attributes for a subject area, you need to identify the key
characteristics and properties of the entities within that area. This involves understanding the business needs,
processes, and data requirements related to the subject area.
Assign attributes to the appropriate entity
To assign attributes to an entity, you generally use tools or methods specific to the data modeling or database
system being used. This typically involves selecting the entity, then adding attributes and specifying their data types
and properties.
General Steps:
Identify the Entity: Determine the specific entity (e.g., a table, class, or object) you want to modify.
Open the Entity Editor or Tool: Use the appropriate tool (e.g., Attribute Editor, Data Modeler, or database
management system) to access the entity's attributes.
Add Attributes: Use the tool's interface to add new attributes to the entity.
Define Attribute Properties: Specify the data type (e.g., string, integer, date), name, and other relevant properties for
each attribute.
Save Changes: Confirm the changes to the entity's attributes.
Specific Tools/Methods:
Name attributes using established naming conventions
Look at the database model below. I went a bit overboard and removed as many traces of a naming convention as I
could. This proves my first point: a naming convention is an important part of a well-built data model. The model is
not very fun to look at, to try to understand, or to code around.
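A naming convention should cover at least these objects:
• Tables
• Views
• Columns
• Keys, including the primary key, alternate keys, and foreign keys
• Schemas
Example names that follow a convention:
• IX_TableName_ColumnName
• Table_Name_BIS_TRG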
Day-4
Advanced Relationships
Self-referencing relationships, also known as recursive relationships, occur when a record in a table references
another record within the same table. This is common when modeling hierarchical or self-associative structures, such
as an employee reporting to another employee, or a category having sub-categories.
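The employee example can be sketched with a single table whose foreign key points back at its own primary key. This is a minimal illustration, assuming SQLite and hypothetical names (employee, manager_id, and the helper reporting_chain); the recursive query is just one way to walk the hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    -- Self-reference: NULL marks the top of the hierarchy
    manager_id  INTEGER REFERENCES employee(employee_id)
);
INSERT INTO employee VALUES (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Carol', 2);
""")


def reporting_chain(employee_id: int) -> list[str]:
    """Walk up the hierarchy from one employee to the top manager."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(employee_id, name, manager_id, depth) AS (
            SELECT employee_id, name, manager_id, 0
            FROM employee WHERE employee_id = ?
            UNION ALL
            SELECT e.employee_id, e.name, e.manager_id, c.depth + 1
            FROM employee e JOIN chain c ON e.employee_id = c.manager_id
        )
        SELECT name FROM chain ORDER BY depth
        """,
        (employee_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling reporting_chain(3) returns the names from Carol up to Alice, following manager_id one level at a time.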
Model ternary relationships
An association between 3 entities is called a ternary association. A typical example is an association between an
employee, the project they are working on, and their role in that project. If the role is a complex object, you might
decide to model this as 3 entity classes.
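A ternary relationship is usually implemented as one associative table that references all three entities. The sketch below assumes SQLite and hypothetical names; the assignment table and its three-column primary key are one possible design, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (project_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE role     (role_id     INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (employee, project, role) combination
CREATE TABLE assignment (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    project_id  INTEGER NOT NULL REFERENCES project(project_id),
    role_id     INTEGER NOT NULL REFERENCES role(role_id),
    PRIMARY KEY (employee_id, project_id, role_id)
);
INSERT INTO employee VALUES (1, 'Alice');
INSERT INTO project  VALUES (1, 'Billing'), (2, 'Reporting');
INSERT INTO role     VALUES (1, 'Developer'), (2, 'Reviewer');
INSERT INTO assignment VALUES (1, 1, 1), (1, 2, 2);
""")

# Resolve the three foreign keys back into readable names
assignments = conn.execute("""
    SELECT e.name, p.name, r.name
    FROM assignment a
    JOIN employee e ON e.employee_id = a.employee_id
    JOIN project  p ON p.project_id  = a.project_id
    JOIN role     r ON r.role_id     = a.role_id
    ORDER BY p.name
""").fetchall()
```

The same employee holds a different role on each project, which two separate binary relationships could not record without ambiguity.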
In data modeling, supertypes and subtypes are used to represent hierarchical relationships between
entities, facilitating the organization of data by capturing both commonalities and differences among
related entities. A supertype is a generalized entity, and subtypes are specialized entities that inherit
attributes from the supertype while also having their own unique attributes.
Supertypes:
A supertype is a general entity that encompasses a broader category or concept. It serves as a parent entity, and its
attributes are shared by all of its subtypes. For example, in a vehicle database, "Vehicle" could be a supertype, as it
encompasses various types of vehicles like cars, trucks, and motorcycles.
Subtypes:
Subtypes are specialized versions of the supertype. They inherit the attributes of the supertype but also have
additional, specific attributes unique to themselves. For instance, "Car", "Truck", and "Motorcycle" could be subtypes
of the "Vehicle" supertype, each with its own attributes specific to its type (e.g., "Car" might have attributes like
"numberOfDoors", while "Truck" might have "payloadCapacity").
Constraints in supertype subtype relationship
Constraints on generalization define the rules governing the relationship between supertypes and subtypes. They
specify which entities can belong to specific subtypes, whether an entity can belong to multiple subtypes, and
whether a supertype entity must belong to at least one subtype.
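One possible implementation of the Vehicle example is a table per type, where each subtype table shares the supertype's primary key. The sketch assumes SQLite and hypothetical column names (the slide's numberOfDoors and payloadCapacity are written in snake_case here). The vehicle_type column records which subtype a row belongs to; the NOT NULL and CHECK on it mean every vehicle is labelled with exactly one known subtype, but nothing in this sketch forces the matching subtype row to exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: attributes shared by every vehicle
CREATE TABLE vehicle (
    vehicle_id   INTEGER PRIMARY KEY,
    manufacturer TEXT NOT NULL,
    vehicle_type TEXT NOT NULL
        CHECK (vehicle_type IN ('car', 'truck', 'motorcycle'))
);
-- Subtypes: reuse the supertype's key and add their own attributes
CREATE TABLE car (
    vehicle_id      INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id),
    number_of_doors INTEGER NOT NULL
);
CREATE TABLE truck (
    vehicle_id       INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id),
    payload_capacity REAL NOT NULL
);
INSERT INTO vehicle VALUES (1, 'Acme', 'car'), (2, 'Acme', 'truck');
INSERT INTO car   VALUES (1, 4);
INSERT INTO truck VALUES (2, 7.5);
""")

# A car's full description comes from joining supertype and subtype
cars = conn.execute("""
    SELECT v.vehicle_id, v.manufacturer, c.number_of_doors
    FROM vehicle v JOIN car c ON c.vehicle_id = v.vehicle_id
""").fetchall()


def unknown_type_rejected() -> bool:
    """A vehicle outside the listed subtypes violates the CHECK constraint."""
    try:
        conn.execute("INSERT INTO vehicle VALUES (3, 'Acme', 'boat')")
    except sqlite3.IntegrityError:
        return True
    return False
```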
When you write constraint statements on a subtype table, you can refer to all of the following without joining with
another table:
Example 1
Suppose Persons can be Guides or Office Staff (subtype set Occupation), and they can be Male or Female (subtype
set Gender).
Maternity leave is not possible for guides, only for office staff. This business rule can be enforced by a restrictive
constraint with the following statement:
SELECT ' '
FROM female
WHERE maternity_leave = 'Y'
AND guide = 'Y'
In this example, Female is a subtype table, Maternity_leave is a column of this subtype table, and Guide is a subtype
indicator of the constellation.
In the following example, you need to join tables only because the business rule refers to subtype columns in
different subtype tables:
Example 2
Office staff can only get maternity leave if a number of conditions are met. These conditions refer to office staff
attributes such as the person's hire date.
This business rule can be enforced by a constraint with the following statement:
Normalization
Key Concepts:
Redundancy: Storing the same information multiple times in a database.
Data Integrity: Ensuring the accuracy and consistency of data within the database.
Anomalies: Problems that redundancy causes when data is inserted, updated, or deleted, such as an
update that changes one copy of a fact but leaves other copies unchanged.
Normal Forms (1NF, 2NF, 3NF, etc.): A set of rules or levels that define how well data is structured and
normalized, with each level building upon the previous one.
Functional Dependencies: Relationships between attributes in a table, where one attribute (or set of
attributes) determines the value of another.
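A functional dependency can be checked against sample data: X determines Y if no two rows agree on X but differ on Y. The helper below is a small sketch with a hypothetical name (holds) and made-up sample rows. Sample data can only disprove a dependency; whether it truly holds is a business rule.

```python
from typing import Iterable, Mapping, Sequence


def holds(rows: Iterable[Mapping[str, object]],
          lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the dependency lhs -> rhs is not violated by the rows."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows
scores = [
    {"student_id": 1, "subject_id": 10, "teacher_name": "Rao", "marks": 70},
    {"student_id": 2, "subject_id": 10, "teacher_name": "Rao", "marks": 85},
    {"student_id": 1, "subject_id": 20, "teacher_name": "Sen", "marks": 60},
]

subject_determines_teacher = holds(scores, ["subject_id"], ["teacher_name"])
student_determines_marks = holds(scores, ["student_id"], ["marks"])
```

In this sample, subject_id determines teacher_name, while student_id alone does not determine marks because student 1 has two different scores.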
First Normal Form (1NF)
For a table to be in the First Normal Form, it should follow the following 4 rules:
All the columns hold values of the same type like emp_name has all the names, emp_mobile has all
the contact numbers, etc.
But the emp_skills column holds multiple comma-separated values, whereas the First Normal Form requires
each column to hold a single value.
Hence the above table fails the First Normal Form.
So how do you fix the above table?
You can also simply add multiple rows to add multiple skills. This will lead to repetition of the data, but
that can be handled as you further normalize your data using the Second Normal Form and the Third
Normal Form.
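The "one row per skill" fix can be sketched in a few lines. The rows below are hypothetical sample data: emp_name, emp_mobile, and emp_skills come from the text above, while emp_id and the output column emp_skill are assumptions made for the illustration.

```python
# Hypothetical rows that violate 1NF: emp_skills holds several values
employees = [
    {"emp_id": 1, "emp_name": "Asha", "emp_mobile": "555-0101",
     "emp_skills": "Python, JavaScript"},
    {"emp_id": 2, "emp_name": "Ravi", "emp_mobile": "555-0102",
     "emp_skills": "HTML, CSS, JavaScript"},
]


def to_first_normal_form(rows):
    """Emit one row per (employee, skill) so every column holds a single value."""
    result = []
    for row in rows:
        for skill in row["emp_skills"].split(","):
            flat = {k: v for k, v in row.items() if k != "emp_skills"}
            flat["emp_skill"] = skill.strip()
            result.append(flat)
    return result


normalized = to_first_normal_form(employees)
```

The name and mobile number now repeat on every skill row, which is exactly the repetition the later normal forms remove.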
Second Normal Form (2NF)
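For a table to be in the Second Normal Form, it must be in the First Normal Form and must have no partial
dependency: every non-key column must depend on the whole primary key, not on just a part of it.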
But in the Score table, we have a column teacher_name, which depends on the subject information or
just the subject_id, so we should not keep that information in the Score table.
The column teacher_name should be in the Subjects table. And then the entire system will be
Normalized as per the Second Normal Form.
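The move of teacher_name into the Subjects table can be sketched as follows. This assumes SQLite; the Score and Subjects tables and the student_id, subject_id, and teacher_name columns come from the text, while subject_name, marks, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subjects (
    subject_id   INTEGER PRIMARY KEY,
    subject_name TEXT NOT NULL,
    teacher_name TEXT NOT NULL      -- depends on subject_id alone, so it lives here
);
CREATE TABLE score (
    student_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL REFERENCES subjects(subject_id),
    marks      INTEGER NOT NULL,    -- depends on the whole key
    PRIMARY KEY (student_id, subject_id)
);
INSERT INTO subjects VALUES (1, 'Java', 'Mr. J'), (2, 'C++', 'Mr. C');
INSERT INTO score VALUES (1, 1, 70), (1, 2, 82), (2, 1, 65);
""")

# The teacher is still available to any report through a join
report = conn.execute("""
    SELECT sc.student_id, su.subject_name, su.teacher_name, sc.marks
    FROM score sc JOIN subjects su ON su.subject_id = sc.subject_id
    ORDER BY sc.student_id, sc.subject_id
""").fetchall()
```

Each teacher name is now stored once per subject instead of once per score row.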
Third Normal Form (3NF)
A table is in the Third Normal Form when it satisfies the First Normal Form and the Second Normal Form,
and has no transitive dependency.
Let's take an example. We had the Score table in the Second Normal Form above. Suppose we have to store
some extra information in it:
• exam_type
• total_marks
These store the type of exam and the total marks in the exam, so that we can later calculate the
percentage of marks scored by each student.
In the table above, the column exam_type depends on both student_id and subject_id, because students
belong to different branches, and based on that they may have different exam types for different subjects.
For example, CSE students may have both Practical and Theory exams for Compiler Design,
whereas Mechanical branch students may only have Theory exams for Compiler Design.
But the column total_marks depends only on the exam_type column, and exam_type is not
part of the primary key, which is student_id + subject_id. Hence we have a
transitive dependency here.
To remove it, we can create a separate table for ExamType and use it in the Score table.
We have created a new table ExamType and we have added more related information in it like
duration (duration of the exam in minutes), and now we can use the exam_type_id in the Score table.
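The ExamType split can be sketched as two tables. This assumes SQLite; the column names follow the text (exam_type, total_marks, duration, exam_type_id), while marks and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- total_marks and duration depend only on the exam type, so they move here
CREATE TABLE exam_type (
    exam_type_id INTEGER PRIMARY KEY,
    exam_type    TEXT NOT NULL,
    total_marks  INTEGER NOT NULL,
    duration     INTEGER NOT NULL   -- minutes
);
CREATE TABLE score (
    student_id   INTEGER NOT NULL,
    subject_id   INTEGER NOT NULL,
    marks        INTEGER NOT NULL,
    exam_type_id INTEGER NOT NULL REFERENCES exam_type(exam_type_id),
    PRIMARY KEY (student_id, subject_id)
);
INSERT INTO exam_type VALUES (1, 'Practical', 50, 45), (2, 'Theory', 100, 180);
INSERT INTO score VALUES (1, 1, 40, 1), (1, 2, 75, 2);
""")

# Percentage of marks per student and subject, using total_marks from exam_type
percentages = conn.execute("""
    SELECT s.student_id, s.subject_id, 100.0 * s.marks / e.total_marks
    FROM score s JOIN exam_type e ON e.exam_type_id = s.exam_type_id
    ORDER BY s.subject_id
""").fetchall()
```

Changing the total marks of an exam type is now a single-row update in exam_type rather than an update to every matching score row.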
Denormalization
In a fully normalized database, each piece of data is stored only once, generally in separate tables,
with a relation to one another. To become usable, the information must be queried and read out from
the individual tables, and then joined together to provide the query response. If this process involves
large amounts of data or needs to be done many times a second, it can quickly overwhelm the
database hardware, reduce its performance, and even cause it to crash.
Denormalization pros and cons
Denormalization on databases has both pros and cons:
Pros
• Faster reads for denormalized data.
• Simpler queries for application developers.
• Less compute on read operations.
Cons
• Slower write operations.
• Increases database complexity.
• Potential for data inconsistency.
• Additional storage required for redundant tables.
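The trade-off in the list above can be seen in a small sketch, assuming SQLite and hypothetical tables. The orders table keeps a redundant copy of the customer's name so that reads need no join, and every rename must then update both places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Denormalized: customer_name is a redundant copy kept for join-free reads
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
    customer_name TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Acme Ltd');
INSERT INTO orders VALUES (100, 1, 'Acme Ltd'), (101, 1, 'Acme Ltd');
""")


def rename_customer(customer_id: int, new_name: str) -> None:
    """Update the source row and every redundant copy in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE customer SET name = ? WHERE customer_id = ?",
                     (new_name, customer_id))
        conn.execute("UPDATE orders SET customer_name = ? WHERE customer_id = ?",
                     (new_name, customer_id))


rename_customer(1, "Acme Inc")
# Fast read: no join needed to show the customer's name on each order
names = conn.execute("SELECT DISTINCT customer_name FROM orders").fetchall()
```

If the second UPDATE were forgotten, the orders would still show the old name, which is the data inconsistency risk listed under the cons.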
CRUD matrix
A CRUD matrix is a tool used to map out and visualize the relationships between data entities and the
operations that can be performed on them (Create, Read, Update, Delete). It helps understand and
define user permissions and data access within a system, especially in the context of business
processes and software development.
Purpose:
The CRUD matrix helps determine which data entities are affected by various business activities
and how those entities are manipulated (CRUD operations).
Structure:
It's typically represented as a table or matrix where columns represent the CRUD operations
(Create, Read, Update, Delete) and rows represent data entities or use cases.
Usage:
It's a valuable tool for business analysts, software developers, and database administrators to:
• Map out data operations and their relationships.
• Define user permissions and access control.
• Identify potential data conflicts or inconsistencies.
• Document the system's data flow and interactions.
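A CRUD matrix can also be kept as plain data and checked automatically. The sketch below uses one common layout, with business activities as rows, entities as columns, and the letters C, R, U, D in the cells; the activities, entities, and the helper uncovered_operations are hypothetical.

```python
# Hypothetical activities (rows) and entities (columns)
crud_matrix = {
    "Place order":    {"Customer": "R", "Order": "C",  "Product": "R"},
    "Cancel order":   {"Customer": "R", "Order": "UD", "Product": ""},
    "Update catalog": {"Customer": "",  "Order": "",   "Product": "CRUD"},
}


def uncovered_operations(matrix, entity):
    """Return the CRUD operations that no activity performs on the entity."""
    used = set()
    for operations in matrix.values():
        used.update(operations.get(entity, ""))
    return sorted(set("CRUD") - used)


customer_gaps = uncovered_operations(crud_matrix, "Customer")
order_gaps = uncovered_operations(crud_matrix, "Order")
product_gaps = uncovered_operations(crud_matrix, "Product")
```

A gap such as "no activity creates a Customer" often points to a missing process or an entity that is maintained by another system.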
Identifying & Non-Identifying Relationship
Identifying Relationship:
Dependency: The child entity cannot exist without the parent entity.
Primary Key: The child's primary key includes the parent's primary key.
Example: A book cannot exist without an author, so a "Book" entity's primary key would include the
"Author" entity's primary key.
Non-Identifying Relationship:
Independence: The child entity can exist independently of the parent entity.
Primary Key: The child entity has its own primary key, and the parent's primary key is included as a
foreign key in the child's table, but not as part of its primary key.
Example: A city can exist independently of a country, so a "City" entity can have its own primary key,
with the "Country" entity's primary key included as a foreign key.
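The difference shows up in where the parent's key lands in the child table. The sketch below assumes SQLite and hypothetical column names; it follows the Book/Author and City/Country examples above, and uses SQLite's table_info pragma to list each table's primary key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Identifying: the parent's key is part of the child's primary key
CREATE TABLE book (
    author_id INTEGER NOT NULL REFERENCES author(author_id),
    book_no   INTEGER NOT NULL,
    title     TEXT NOT NULL,
    PRIMARY KEY (author_id, book_no)
);

CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Non-identifying: the child has its own key; the parent's key is a plain foreign key
CREATE TABLE city (
    city_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    country_id INTEGER REFERENCES country(country_id)
);
""")


def primary_key_columns(table: str) -> list[str]:
    """List a table's primary key columns in key order.

    Each table_info row is (cid, name, type, notnull, dflt_value, pk),
    where pk is the 1-based position in the primary key, or 0 if not a key column.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]
```

For book the helper returns both author_id and book_no, while for city it returns only city_id.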
Workshop
Demonstrate the data modeling tool Erwin/PowerDesigner
Translate the OLTP Model to OLAP
QA Session