Database Concepts
1. Database
A database is an integrated collection of logically related records or files.
A database consolidates records previously stored in separate files into a common pool of data records that provides
data for many applications. The data is managed by systems software called database management systems (DBMS).
The data stored in a database is independent of the application programs using it and of the types of secondary
storage devices on which it is stored.
A database management system (DBMS) is a software package designed to store, retrieve, query and manage data.
Database management systems are important because they provide programmers, database administrators and end
users with a centralized view of data, and free applications and end users from having to understand where data is
physically located.
Well-known DBMSes include:
• Access – a lightweight relational database management system (RDBMS) included in Microsoft Office
and Office 365.
• Amazon RDS – a native cloud DBMS that offers engines for managing MySQL, Oracle, SQL
Server, PostgreSQL and Amazon Aurora databases.
• Apache Cassandra - an open-source distributed database management system known for being able to handle
massive amounts of data.
• Filemaker - a low-code/no-code (LCNC) relational DBMS.
• MySQL – an open-source relational database management system (RDBMS) owned by Oracle.
• MariaDB - an open-source fork of MySQL.
• Oracle - a proprietary relational database management system optimized for hybrid cloud architectures.
• SQL Server – an enterprise-level relational database management system from Microsoft that is capable of
handling extremely large volumes of data and database queries.
A relational database management system (RDBMS) is a DBMS that uses the relational model to store data in
tables. Each table comprises rows and columns: each column holds the values of one category of data (an attribute),
and each row holds one instance of that data (a record), with one value for each column.
8.1 Database Concepts
▪ Show understanding of the limitations of using a file-based approach for the storage and retrieval of data
• In a computer, a file system -- sometimes written filesystem -- is the way in which files are named and where
they are placed logically for storage and retrieval.
• Without a file system, stored information wouldn't be isolated into individual files and would be difficult to
identify and retrieve.
• When there is only a single table in the database, this is called a 'flat file database'.
• It is an application program designed to manipulate data files.
• A "flat file" database allows the user to specify data attributes (columns, data types, etc.) for one table at a time,
storing those attributes independently of an application. dBase III and Paradox were good examples of this
kind of database.
• As an example, in the file-oriented approach to data processing, each department has its own files, which are
specifically designed for that department's applications. The figure above shows three departments (academics,
accounts, and library). Each of these departments maintains a file containing students' personal details in addition
to the file that is necessary for its own application. Therefore, each application has a separate master file and its
own set of personal files. The following are the various types of files maintained for each application.
The patient database is an example of a flat file, as all of the information is stored in one single table:
Disadvantages of a file-oriented system:
1. Data Redundancy: The same information may be duplicated in different files. This data redundancy wastes
storage space.
2. Data Inconsistency: Because of data redundancy, copies of the same data may not agree with each other, so the
data may not be in a consistent state.
3. Difficulty in Accessing Data: Accessing data is not convenient or efficient in a file processing system.
4. Limited Data Sharing: Data are scattered across various files. Different files may have different formats, and they
may be stored in different folders belonging to different departments. Because of this data isolation, it is difficult
to share data among different applications.
5. Integrity Problems: Data integrity means that the data contained in the database is both correct and consistent.
For this purpose, the data stored in the database must satisfy certain constraints.
6. Atomicity Problems: Any operation on a database must be atomic. This means it must happen in its entirety or not
at all.
7. Concurrent Access Anomalies: Multiple users are allowed to access data simultaneously for the sake of
better performance and faster response. In a file-based system, uncontrolled simultaneous updates can leave the
data in an inconsistent state.
8. Security Problems: A database should be accessible to users only in a limited way. Each user should be allowed to
access only the data that concerns his or her requirements.
9. Incompatible File Formats: As the structure of the files is embedded in application programs, the structure is fully
dependent on application programming languages.
10. Fixed Queries: File-based systems are very much dependent on application programs. Any report or query needed
by the organisation has to be developed by the application programmer. As the types and number of queries or
reports increase, so does the time needed to develop them. Producing new kinds of queries or reports on demand is
not possible in file-based systems. As a result, in some organisations the kinds of queries or reports to be produced
are fixed, and no new query or report on the data can be produced.
▪ Describe the features of a relational database that address the limitations of a file-based approach
A relational database takes this "flat file" approach several logical steps further, allowing the user to specify
information about multiple tables and the relationships between those tables, and often allowing much more
declarative control over what rules the data in those tables must obey. The benefits of the database approach are
as follows:
1. Ease of application development: The programmer is no longer burdened with designing, building and
maintaining master files.
2. Minimal data redundancy: All data files are integrated into a composite data structure. In practice, not all
redundancy is eliminated, but at least the redundancy is controlled. Thus, inconsistency is reduced.
3. Enforcement of standards: The database administrator can define standards for names, etc.
4. Data sharing and physical data independence: Data can be shared. Data descriptions are independent of the
application programs, which makes program development and maintenance an easier task. Data is stored
independently of the program that uses it.
5. Logical data independence: Data can be viewed in different ways by different users.
6. Better modelling of real-world data: Databases are based on semantically rich data models that allow the accurate
representation of real-world information.
7. Uniform security and integrity controls: Security control ensures that applications can only access the data they
are required to access. Integrity control ensures that the database represents what it purports to represent.
8. Economy of scale: Concentration of processing, control, personnel and technical expertise.
File Size Estimation – Calculations
▪ Show understanding of and use the terminology associated with a relational database model
1. Entity
An entity is considered strong if it can exist apart from all of its related entities. When you build a database, you are
organising data about entities. An entity is any item that has its attributes stored as data. An entity could be
anything, e.g. a book, a person, a film, a country or a football team. An entity can be a real-world object, either
animate or inanimate, that can be easily identifiable. For example, in a school database, students, teachers, classes,
and courses offered can be considered as entities. All these entities have some attributes or properties that give them
their identity. An entity set is a collection of similar types of entities. An entity set may contain entities whose
attributes share similar values. For example, a Students set may contain all the students of a school; likewise, a Teachers set
may contain all the teachers of a school from all faculties. Entity sets need not be disjoint.
There is a standard notation for describing entities:
EntityName (EntityIdentifier, attribute1, attribute2, attribute3, ...)
Following from the previous example, the entity description for the Teacher entity is:
Teacher (TeacherId, FirstName, MiddleName, LastName, DateofBirth, HireDate, Email)
2. Attributes:
Each entity is described by a set of attributes, e.g. Student = (Name, Address, Date of Birth, Form, Set). Each
attribute has a name, and is associated with an entity and a domain of legal values.
Entities are represented by means of their properties, called attributes. All attributes have values. For example, a
student entity may have name, class, and age as attributes. There exists a domain or range of values that can be
assigned to each attribute. For example, a student's name cannot be a numeric value; it has to be alphabetic. A
student's age cannot be negative, etc. A person is an entity with attributes including age, height and nationality,
among many others. When you design a database, you need to think about which attributes you want to store. For
example, the attributes of a film could include title, duration, certificate, rating, genre, cast, director and year of
creation. In other words, attributes are the details about the entity.
3. Bit (Character)
A bit is the smallest unit of data representation (the value of a bit may be 0 or 1). Eight bits make a byte, which can
represent a character or a special symbol in a character code.
4. Field
A field consists of a grouping of characters. A data field represents an attribute (a characteristic or quality) of some
entity (object, person, place, or event). A column in a table is also referred to as an attribute.
5. Record
A record represents a collection of attributes that describe a real-world entity. A record consists of fields, with each
field describing an attribute of the entity. A tuple is one record (one row).
You can also think of it this way: an attribute is used to define the record and a record contains a set of attributes.
6. Table
A set of fields and records. The table contains all of the fields and the records for one type of entity. A database may
contain more than one table.
7. File
A group of related records. Files are frequently classified by the application for which they are primarily used, e.g.
an employee file.
8. Data Types
• Date / Time – specifies data related to date and time, which must be entered in a set format
• Text – often referred to as "string", means any combination of characters: usually letters, but it can also
include numbers or other symbols
• Currency – financial numeric data
• Boolean – TRUE or FALSE data, often displayed as YES or NO text, or as the numbers 1 and 0. It is, in simple
terms, binary data
9. Primary Key
A primary key in a file is the field (or fields) whose value identifies a record among others in a data file.
It contains a unique identifier for each record. To make each record in a database unique we normally assign it a
primary key. Even if a record is deleted from a database, its primary key value will not be used again. The primary
key can be automatically generated and will normally just be a unique number or a mix of numbers and letters.
Because it is unique, the primary key field can be used to find a record quickly. It also allows you to create
relationships between tables. The primary key is selected from the candidate keys and becomes the identifying key
of a table. It can uniquely identify any data row of the table. Some 'natural' primary keys are:
Metadata, in simple terms, refers to data about data. The term metadata is used for different types of data in different
contexts. In a DBMS, metadata describes the context of and information about data, the way that data is stored and
the various relations among data. Metadata in a relational DBMS stores data about constraints, table relationships,
data types, columns, tables, and so on.
18. Referential Integrity
• Referential integrity ensures that records in related tables are linked correctly
• It is a database constraint that ensures that references between data are valid and intact.
• It’s a database management safeguard that ensures every foreign key matches a primary key. For example,
customer numbers in a customer file are the primary keys, and customer numbers in the order file are the foreign
keys. If a customer record is deleted, the order records must also be deleted; otherwise they are left without a primary
reference.
• Referential integrity is a fundamental principle of database theory and arises from the notion that a database
should not only store data, but should actively seek to ensure its quality.
• It is the logical dependency of a foreign key on a primary key.
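The customer/order example above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a definitive implementation: the table and column names (Customer, CustomerOrder, CustomerNo, OrderNo) are hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Hypothetical tables mirroring the customer file and the order file.
conn.execute("CREATE TABLE Customer (CustomerNo INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE CustomerOrder (
        OrderNo    INTEGER PRIMARY KEY,
        CustomerNo INTEGER NOT NULL
                   REFERENCES Customer(CustomerNo) ON DELETE CASCADE
    )
""")

conn.execute("INSERT INTO Customer VALUES (1, 'Amina')")
conn.execute("INSERT INTO CustomerOrder VALUES (100, 1)")

# An order that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO CustomerOrder VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the customer also deletes the dependent orders (ON DELETE CASCADE).
conn.execute("DELETE FROM Customer WHERE CustomerNo = 1")
orders_left = conn.execute("SELECT COUNT(*) FROM CustomerOrder").fetchone()[0]

print(orphan_rejected, orders_left)
```

Cascading the delete is only one possible design choice; a database can instead be set up to refuse the deletion of a customer who still has orders.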
• A data dictionary is a software module and database containing descriptions and definitions concerning the
structure, data elements, interrelationships, and other characteristics of an organization's database. A data dictionary:
1. Contains all the data definitions, and the information necessary to identify data ownership
2. Ensures security and privacy of the data, as well as the information used during the development and
maintenance of applications which rely on the database
• A data dictionary is a collection of metadata such as object name, data type, size, classification, and relationships
with other data assets. A data dictionary acts as a reference guide on a dataset.
Data dictionary - (Idea behind it)
A data dictionary is a crucial part of a relational database as it provides additional information about the relationships
between multiple tables in a database. It describes the structure and attributes of the data used within the
database.
It includes:
• The names and descriptions of the tables and the fields contained in each table.
• Data types.
• Field sizes.
• Format of fields.
• Validation rules.
• Primary, compound and foreign keys.
Think of it as a list of tables and fields (columns), each with a description. The primary goal of a data dictionary
is to help data teams understand data assets.
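Many DBMSs let you query this metadata directly. As a minimal sketch, assuming SQLite (through Python's sqlite3 module) and a hypothetical Product table, PRAGMA table_info lists the name, declared type, required flag and primary-key status of each field, which is the core of a data dictionary entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only to have something to describe.
conn.execute("""
    CREATE TABLE Product (
        ProductID   INTEGER PRIMARY KEY,
        Description VARCHAR(20) NOT NULL,
        Price       REAL
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull flag, default value, primary-key position)
dictionary = [
    {"field": name, "type": col_type, "required": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk
    in conn.execute("PRAGMA table_info(Product)")
]

for entry in dictionary:
    print(entry)
```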
The following are the entity relationships. They are categorized as One-To-Many, One-To-One, and Many-To-Many.
• One to many (1:M) relationship
A one to many (1:M) relationship should be the norm in any relational database design and is found in all
relational database environments. For example, one department has many employees.
Under a one-to-many (1:M) relationship, an instance of entity P can be related to more than one instance of entity Q,
but an instance of entity Q is related to at most one instance of entity P. For example, a person can have more than
one bank account, but a bank account can have at most one person as account holder.
Data models are designed to fulfil specific requirements. These are specified in the analysis phase of the
development of a database system. This process includes defining the type of data that the database will
store, the rules and restrictions that will apply to the data, the database applications that will be coupled with
the database and the needs of the users that will interact with the database applications.
One of the most common ways to present a data model is an entity relationship diagram.
An entity is used to represent an object in the real world that can be distinguished from other objects.
An entity can be a physical object (such as a person or a place) or a concept (such as an activity or a
task) for which we need to record data in the database. For example, a physical object could be an
employee, a customer or a product, and a concept could be an online order, a school course or a
booking.
An attribute is used to represent a property, a quality or a characteristic that describes an entity. For
example, the name of an employee, or the date and time that a booking was submitted.
An entity has a value for each of its attributes. These values make up the main body of data that is
stored in the database. For example suppose that you are designing a model for a school where
students are able to book appointments with teachers for parents' evening. A Teacher entity could have
the following attributes and values:
Sometimes an attribute can accept a null value. For example, an attribute that is used to record the
middle name of a teacher will be empty for teachers who have no middle name.
Each set of values that corresponds to a specific teacher is called an instance of the Teacher entity. In
order to distinguish between the different instances of an entity we need to establish an entity
identifier (also known as a key attribute). This is an attribute (or set of attributes) that can be used to
uniquely identify each instance of the entity. For example, the attribute TeacherId is a unique number
that is assigned to each teacher when they are hired. Therefore, it is unique for each instance of the
entity and can be used as an entity identifier. An entity identifier can't be null.
Sometimes one attribute on its own is not enough to uniquely identify each instance of an entity.
Instead a set of the minimum number of attributes that can achieve this goal are combined together
into a composite entity identifier. For example, suppose that the database you are designing has a
TimeOff entity that captures which teachers are absent on specific dates. The model uses the attributes
TeacherId and StartDate as a composite entity identifier. It is not possible to use only one of the entity
attributes as their values are not unique.
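A minimal sketch of a composite identifier, using Python's sqlite3 module. TeacherId and StartDate come from the text; the other columns, the column types and the sample rows are assumptions made for illustration. The database accepts repeated values in either attribute on its own but rejects a repeated pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TimeOff (
        TeacherId INTEGER NOT NULL,
        StartDate TEXT    NOT NULL,
        EndDate   TEXT,
        Reason    TEXT,
        PRIMARY KEY (TeacherId, StartDate)   -- composite entity identifier
    )
""")

conn.execute("INSERT INTO TimeOff VALUES (7, '2024-03-01', '2024-03-02', 'Training')")
# Same teacher, different start date: allowed.
conn.execute("INSERT INTO TimeOff VALUES (7, '2024-04-10', '2024-04-10', 'Illness')")
# Different teacher, same start date: allowed.
conn.execute("INSERT INTO TimeOff VALUES (8, '2024-03-01', '2024-03-01', 'Illness')")

# Same teacher AND same start date: rejected, because the pair must be unique.
try:
    conn.execute("INSERT INTO TimeOff VALUES (7, '2024-03-01', '2024-03-05', 'Other')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM TimeOff").fetchone()[0]
print(duplicate_rejected, row_count)
```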
Entity descriptions
There is a standard notation for describing entities:
EntityName (EntityIdentifier, attribute1, attribute2, attribute3, ...)
Following from the previous example, the entity description for the Teacher entity is:
Teacher (TeacherId, FirstName, MiddleName, LastName, DateofBirth, HireDate, Email)
The definition starts with the name of the entity, followed by a list of attributes in parentheses
(brackets). The entity identifier is underlined, and it is conventional to specify it as the first attribute (or
set of attributes) in the list.
If the entity identifier is composite, all of the attributes that make up the identifier are underlined. In this
form the entity description for the TimeOff entity is:
TimeOff (TeacherId, StartDate, EndDate, Reason)
Notice that it is conventional to write the entity names in the singular, e.g. Teacher not Teachers.
Entity relationship diagram
A relationship between two or more entities is used to represent an interaction or association that exists
between those entities. For example, an employee works in a company, therefore a relationship exists
between the entities Employee and Company.
In an entity relationship (ER) diagram, each entity is represented by a rectangle. A relationship between
entities is shown as a line and can be one of three types:
One-to-one relationship
One-to-many relationship
Many-to-many relationship
The line between the entities is used to illustrate the cardinality of the relationship. The cardinality
refers to the number of times an instance in one entity can be associated with instances in the related
entity. The cardinality can be one or many:
Relationship categories
The different types of cardinality result in three main categories of relationships:
A one-to-one relationship is when one instance of an entity is associated with only one instance
of another entity.
A one-to-many relationship is when one instance of an entity is associated with more than one
instance of another entity.
A many-to-many relationship is when more than one instance of an entity is associated with
more than one instance of another entity.
Verbalising the nature of the relationship between two entities can help you find out the category of
relationship between them. To do this, form a sentence that describes the relationship from the point of
view of a single instance of each entity. It helps to start your sentences with "Each ...".
For example, think about the ER diagram for a school where students are able to book appointments
with teachers for parents' evening:
An example of a one-to-one relationship is that which exists between head teacher and school:
Each head teacher runs one school
Each school is run by one head teacher
An example of a many-to-many relationship is that which exists between student and teacher:
In a relational database you can't implement many-to-many relationships directly; each one is normally resolved
into two one-to-many relationships through a linking table. However, in the early stages of database design, you
will identify many examples of this type of relationship.
Once you have identified all of the entities and relationships, you can put them together into an ER
diagram.
ER diagram
For this topic you will work through the following scenario:
A sports club requires a relational database to store the data that it needs to manage its courses. The
members can gain certificates to recognise their achievement in a particular course, such as
badminton, swimming or climbing. Most courses have a fee to cover the cost of materials or
equipment. Prior to gaining a certificate, the performance of a member is assessed by an instructor
whose contact details are also stored in the database.
The members of the sports club are young people. On enrolment, each young person is issued a
membership card with a unique membership number. The member’s first name, last name, and
home phone number are recorded.
Each course is identified by a unique six-digit code and has a longer description that will appear
on the certificate. The fee for the course is also recorded.
Each instructor is identified by a unique number. Their first name, last name, and email address
are also recorded.
Members that successfully complete a course to an agreed standard receive a certificate for
their achievement. The date that the certificate was gained is recorded, as well as the identifier of
the instructor.
Firstly you need to identify the main entities for this scenario. Here are the entity descriptions in
standard notation:
Now think about the categories of relationships that exist between the entities.
Many-to-many
One-to-many
The relationship between Course and Certificate is also one-to-many:
Each course can have many certificates (one for each member that completes it).
Each certificate is for just one course.
One-to-many
The relationship between Instructor and Certificate is also one-to-many:
Each certificate has one instructor (the instructor that assessed the course).
Each instructor can carry out many assessments.
One-to-many
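One way the sports club scenario could be turned into tables is sketched below with Python's sqlite3 module. The table names follow the entities discussed above, but every column name, data type and sample row is an assumption made for illustration. Certificate acts as the linking table: it holds the foreign keys that implement the one-to-many relationships from Member, Course and Instructor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Member (
        MemberNo  INTEGER PRIMARY KEY,
        FirstName TEXT, LastName TEXT, HomePhone TEXT
    );
    CREATE TABLE Course (
        CourseCode  TEXT PRIMARY KEY,      -- six-digit code
        Description TEXT, Fee REAL
    );
    CREATE TABLE Instructor (
        InstructorNo INTEGER PRIMARY KEY,
        FirstName TEXT, LastName TEXT, Email TEXT
    );
    CREATE TABLE Certificate (
        MemberNo     INTEGER NOT NULL REFERENCES Member(MemberNo),
        CourseCode   TEXT    NOT NULL REFERENCES Course(CourseCode),
        DateGained   TEXT,
        InstructorNo INTEGER NOT NULL REFERENCES Instructor(InstructorNo),
        PRIMARY KEY (MemberNo, CourseCode)
    );

    INSERT INTO Member VALUES (1, 'Sam', 'Okoro', '01234 567890'),
                              (2, 'Lee', 'Chan',  '01234 098765');
    INSERT INTO Course VALUES ('100001', 'Swimming level 1', 12.50),
                              ('100002', 'Climbing basics',  20.00);
    INSERT INTO Instructor VALUES (10, 'Priya', 'Nair', 'priya@example.org');
    INSERT INTO Certificate VALUES (1, '100001', '2024-05-01', 10),
                                   (2, '100001', '2024-05-01', 10),
                                   (1, '100002', '2024-06-15', 10);
""")

# Each course can have many certificates; each certificate is for just one course.
rows = conn.execute("""
    SELECT Course.Description, COUNT(*)
    FROM Certificate
    JOIN Course ON Certificate.CourseCode = Course.CourseCode
    GROUP BY Course.CourseCode
    ORDER BY Course.CourseCode
""").fetchall()
print(rows)
```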
• Rule 1 - Be in 1NF
• Rule 2 - Single-column primary key that is not functionally dependent on any subset of the candidate key relation
3NF
Third Normal Form (3NF) Rules
A table is said to be in the Third Normal Form when:
1. It is in the Second Normal Form.
2. And, it doesn't have Transitive Dependency.
Comments on the table above
• We have now separated our database into different tables.
• We have introduced a primary key to each table (marked with *).
• This means that the data entered into that field must be different for every record. The data in each table is
dependent on the primary key of that table:
i. ArtistDetails table contains details about each artist
ii. VenueDetails table contains details about each venue
iii. ArtistsBookings table contains details about each concert booking for each artist.
• An additional field Concert ID has been added to this table to create a primary key.
• The fields that are in each of these tables are directly related to the primary key of that table.
Relationships
We have taken our database through the three different stages of normalization and can now create links between
the tables to create a relational database.
Example 2
1st Normal Form (1NF)
• In this Normal Form, we tackle the problem of atomicity. Here
atomicity means values in the table should not be further divided. In
simple terms, a single cell cannot hold multiple values. If a table
contains a composite or multi-valued attribute, it violates the First
Normal Form.
• In the above table, we can clearly see that the Phone Number column has
two values.
• Thus it violated the 1st NF. Now if we apply the 1st NF to the above table
we get the below table as the result.
This table has a composite primary key (Employee ID, Department ID). The non-key attribute is Office Location.
In this case, Office Location depends only on Department ID, which is only part of the primary key.
Therefore, this table does not satisfy the Second Normal Form.
• To bring this table to Second Normal Form, we need to break the table into
two parts, which gives us the tables below:
As you can see, we have removed the partial functional dependency that we initially had. Now
the column Office Location is fully dependent on the primary key
of its table, which is Department ID.
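The decomposition described above can be sketched in plain Python. The sample rows are invented for illustration; only the three attribute names come from the example. The original table is split into one table that keeps the composite key and another that stores each Office Location once, keyed by Department ID alone.

```python
# Rows of the original table: (Employee ID, Department ID, Office Location).
# The values are made-up sample data.
rows = [
    ("E1", "D1", "London"),
    ("E1", "D2", "Leeds"),
    ("E2", "D1", "London"),
    ("E3", "D2", "Leeds"),
]

# Table 1 keeps only the composite key (Employee ID, Department ID).
employee_department = sorted({(emp, dept) for emp, dept, _ in rows})

# Table 2 stores each office location once, keyed by Department ID alone.
department_office = {}
for _, dept, office in rows:
    # setdefault keeps the first value seen; a mismatch would mean the
    # redundant copies in the original table disagreed with each other.
    if department_office.setdefault(dept, office) != office:
        raise ValueError(f"conflicting office locations for {dept}")

print(employee_department)
print(department_office)
```

After the split, changing a department's office means updating one row instead of every row for every employee in that department.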
Example 3
(Tables omitted: the same data shown in 0NF, 1NF, 2NF and 3NF.)
23. Datasheet view
Datasheet view refers to the row-wise and column-wise viewing of data in a table in database applications such as
Access. The information pertaining to individual records is shown in individual rows, and the attributes related to
each record are given in the corresponding columns.
26. Forms
A database form shows all or selected fields for one record. Forms show field names and data in an attractive and
easy-to-read format.
27. Filters
A filter displays records in a database according to criteria you select
28. Reports
A report presents data in an attractive format and is especially suitable for printing. Reports can display data from
tables or queries. All or selected fields can be included in a report. Data can be grouped or sorted and arranged in a
variety of ways
29. Queries:
A query finds records in a database according to criteria you specify.
Query-by-example (QBE) allows the user to create queries based on a template, usually a set of filters presented in a
graphical form. If you are using database software it might have an option to connect blocks and set the filters you
want. The system presents a blank record and lets you specify the fields and values that define the query. Database
management software such as MySQL, Microsoft Access and Oracle has front-end graphical interfaces which make
it easier to run QBE queries.
Below are some examples of searches (queries) and how they would be created using query-by-example.
Example 1
Field:     Material   Colour   Price        Product ID   Description
Sort:                          Descending
or:
A search to select the price, product ID and description for all brown leather sofas, in descending order of price.
Example 2
Field:     Name        History   Maths
Sort:      Ascending
Show:      Yes         No        No
Criteria:              >50
or:                              >50
A search to select the name of all the students, in ascending order, with more than 50 marks in History or more than
50 marks in Maths.
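For comparison, the grid in Example 2 corresponds to a single SQL query: criteria placed on different rows of the grid (Criteria and or) are combined with OR. The sketch below runs such a query with Python's sqlite3 module; the table name Student and the sample marks are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name TEXT, History INTEGER, Maths INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("Zara", 72, 40), ("Ben", 35, 48), ("Ali", 20, 51), ("Mia", 50, 50)],
)

# Show only Name; History > 50 OR Maths > 50; sort by Name ascending.
names = [
    name
    for (name,) in conn.execute(
        "SELECT Name FROM Student "
        "WHERE History > 50 OR Maths > 50 "
        "ORDER BY Name ASC"
    )
]
print(names)  # ['Ali', 'Zara']
```

Mia is excluded because a mark of exactly 50 does not satisfy >50.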
Example 3
A search to select the names and marks for all the students with less than 30 marks in both English and Maths.
Example 4
A search to select the registration, model and price of all the Ford cars that haven't been sold yet.
Example 5
A search to select the registration, make, model and price of all the cars where the make begins with the letter F.
Example 6
Criteria: <01/01/2000
A search to select the first name and surname of all students born before the year 2000.
Example 7
A search to select the first name and surname of all students born between 1st and 31st January 2000.
Example 8
A database table, BIKETYRES, is used to keep a record of tyres for sale in a cycle shop. Tyres are categorised by
width and diameter in millimetres, whether they have an inner tube and the type of terrain for which they are
designed. The query-by-example grid below displays the tyre code and the stock level of all 28 mm width tyres
suitable for mixed terrain.
Show:      ✓ (tyre code)   ✓ (stock level)
Criteria:  = 28 (width)    = 'Mixed' (terrain)
or:
Alter the query to show the tyre code and stock level in ascending order of stock level for all 24 mm asphalt terrain
tyres. Write the new query in the following query-by-example grid. [4]
Field:
Table:
Sort:
Show:
Criteria:
or:
Answer?
Applying Test Data
It is important to test algorithms to check how they perform under a range of conditions.
This includes testing any validation you have created to ensure it performs as expected.
When creating a testing plan, the test data that you use shouldn’t be random values, but rather values that fulfil the
following test criteria.
1. Normal data: Normal data is test data that is typical (expected) and should be accepted by the system.
2. Extreme data or Boundary data: Extreme data is test data at the upper or lower limits of expectations that should
be accepted by the system. A pair of values at each end of a range:
- The data at the upper or lower limits of expectations that should be accepted
- The immediate values before or beyond the limits of expectations that should be rejected
3. Abnormal data (erroneous data): Abnormal data is test data that falls outside of what is acceptable and should be
rejected by the system.
Example: A system has validation to ensure that only integers between 1 and 10 are entered as an input. The test data
for this could be:
• Normal data: from 2 to 9 although 1 and 10 can be included – however, see below
• Boundary data / Extreme: 1 and 10 (to be accepted); 0, 11 (to be rejected)
• Abnormal data (erroneous data): "Thirteen" (text, not a number), 5.7 (not an integer), 14 (outside the range)
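The test plan above can be written down and run as a table of values with expected results. This is a minimal sketch: the function name is_valid and the layout of the plan are illustrative choices, not a prescribed format.

```python
def is_valid(value):
    """Accept only whole numbers from 1 to 10 inclusive."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 10

test_plan = [
    # (category, test value, expected result)
    ("normal", 5, True),
    ("boundary (accept)", 1, True),
    ("boundary (accept)", 10, True),
    ("boundary (reject)", 0, False),
    ("boundary (reject)", 11, False),
    ("abnormal", "Thirteen", False),
    ("abnormal", 5.7, False),
    ("abnormal", 14, False),
]

# A test passes when the validation gives the expected result.
results = [(category, value, is_valid(value) == expected)
           for category, value, expected in test_plan]

for category, value, passed in results:
    print(f"{category:18} {value!r:12} {'pass' if passed else 'FAIL'}")
```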
Validation and Verification
Verification
Verification Method    Description
Double entry           Data is entered twice and the computer checks that the two entries match.
Visual check           The user manually reads and compares the newly inputted data against the original source
                       to ensure they match.
Verification is performed to ensure that the data entered exactly matches the original source.
Verification is a way of preventing errors when data is copied from one medium to another. Verification does not
check if data makes sense or is within acceptable boundaries; it only checks that the data entered is identical to the
original source. Once we know our data is valid (i.e. it is logical, and in the right format, etc.) we also need to
check that it is correct (it may be a valid date-of-birth, but is it your date-of-birth?).
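Double entry verification amounts to comparing two entries of the same value. A minimal sketch, with a hypothetical function name and made-up sample entries:

```python
def double_entry_matches(first_entry: str, second_entry: str) -> bool:
    """Return True when the two typed entries are identical."""
    return first_entry == second_entry

print(double_entry_matches("sam@example.org", "sam@example.org"))  # True
print(double_entry_matches("sam@example.org", "sam@exmaple.org"))  # False
```

The comparison says nothing about whether the value is sensible; that is the job of validation.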
Validation
Validation is an automatic computer check to ensure that the data entered is sensible, feasible and reasonable.
Validation cannot ensure data is accurate.
When programming, it is important that you include validation for data inputs. This stops unexpected or abnormal
data from crashing your program and prevents you from receiving impossible garbage outputs.
There are several validation methods that can be used to check the input data.
Range Check – this is generally used when working with data which contains numbers, currency, or date and time
values. A range check lets you set appropriate limits:
Boundary      Description                                                                Validation
Upper limit   The maximum price of any item in a shop is $10.                            <=10
Lower limit   In a shop all items have a corresponding cost.                             >=0
A range       Number of hours worked must be less than or equal to 8 but more than 0.    >0 and <=8
Type Check – this is a way to confirm that the correct data type is inputted.
• For example, in an application form age may range from 0 to 100. A number data type would be an
appropriate choice for this data. By defining the data type as number, only numbers are allowed in the
field (e.g. 18, 20, 25) and it would prevent people from inputting verbal data, like ‘eighteen’.
• Some data types are capable of doing an extra type check. For example, a date data type will ensure
that a date inputted existed at some point in the past, or will exist in the future. It would not, for
example, accept the date 30/02/2018.
Check Digit – this is used to find out if a series of numbers has been keyed correctly. There are many ways to
produce check digits.
• For example, the ISBN-10 numbering system for books uses 'Modulo-11' division: each digit is multiplied by
a weight, the products are added together, and the check digit is chosen so that the total leaves a remainder of 0
when divided by 11.
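A minimal sketch of an ISBN-10 check in Python. It uses the standard ISBN-10 rule: the ten characters are weighted 10 down to 1, the letter X in the last position stands for 10, and the weighted total must be divisible by 11. The function name is an illustrative choice.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10 using the modulo-11 weighted sum."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            digit = 10                      # 'X' means 10, allowed only as the check digit
        elif char in "0123456789":
            digit = int(char)
        else:
            return False
        total += (10 - position) * digit    # weights 10, 9, ..., 1
    return total % 11 == 0

print(isbn10_is_valid("0-306-40615-2"))  # True
print(isbn10_is_valid("0-306-46015-2"))  # False: two digits swapped
print(isbn10_is_valid("0-8044-2957-X"))  # True
```

Because every position has a different weight, swapping two different adjacent digits changes the total, so the check catches this common keying error as well as single wrong digits.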
Length Check – this is used to make sure that the correct number of characters are entered into the field. It
confirms that the character string entered is neither too short nor too long.
• For example, consider a password that needs to be 8 characters long. The length check will ensure
that exactly 8 characters are entered into the field.
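A length check is a comparison on the number of characters. The sketch below uses a hypothetical helper that accepts either an exact length or a minimum and maximum.

```python
def has_length(text, minimum, maximum=None):
    """Return True when len(text) is within [minimum, maximum].

    When maximum is omitted, the text must be exactly `minimum` characters long.
    """
    if maximum is None:
        maximum = minimum
    return minimum <= len(text) <= maximum
```

For the 8-character password example, has_length(password, 8) would be the check to apply.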
Lookup – this helps to lessen errors in a field with a limited list of values.
• For example, the fact that there are only 12 possible months in a year ensures that the list of possible
values is limited.
• Advantages of a lookup list are as follows:
o Faster data entry—because it is typically much faster to select from a list than to type each individual
entry.
o Enhanced accuracy—because it lessens the risk of spelling mistakes.
o Greater ease of use—because it limits the options to choose from by only displaying the essential
choices.
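A lookup check can be sketched as membership in a fixed list. The list name and function name below are illustrative only.

```python
# The complete list of permitted values for the field
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def is_valid_month(value):
    """Lookup check: the value must be one of the twelve listed months."""
    return value in MONTHS
```

In a data entry form the same list would normally populate a drop-down, so the user cannot type a misspelt value in the first place.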
Format Check – this checks that the input data is in the right format.
• For example, a National Insurance number is in the form XX 99 99 99 X where X is any letter and
9 is any digit.
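A format check is commonly written as a regular expression. The sketch below assumes the layout of two letters, three pairs of digits and one final letter, separated by single spaces; it checks the shape only and does not apply the additional rules about which letters are officially permitted.

```python
import re

# Two capital letters, three pairs of digits, one capital letter
NI_PATTERN = re.compile(r"[A-Z]{2} [0-9]{2} [0-9]{2} [0-9]{2} [A-Z]")


def has_ni_format(text):
    """Format check: the whole string must match the pattern."""
    return NI_PATTERN.fullmatch(text) is not None
```

Using fullmatch rather than search matters here: it rejects input with extra characters before or after an otherwise correct number.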
Presence Check – this kind of check makes sure that an essential or required field cannot be left blank: it must be
filled in.
• If someone attempts to leave the field blank, then an error message will be displayed, and they won’t be
able to proceed to the next step, nor will they be able to save any other data which they have entered.
• Database fields should have validation rules to make sure the data entered follows the expected format.
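A presence check can be sketched as a scan for required fields that are absent or blank. The record layout and function name here are hypothetical.

```python
def missing_fields(record, required):
    """Return the names of required fields that are absent, None, or only whitespace."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing
```

A form would refuse to save the record, and show an error message, whenever the returned list is not empty.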
8.2 Database Management System (DBMS)
▪ Show understanding of the features provided by a Database Management System (DBMS)
- A schema is a blueprint of the database which specifies what fields will be present and what their types will
be. For example, an employee table will have an employee_ID column represented by a string of 10 digits and
an employee_Name column with a string of 45 characters.
- A data model is a high-level design which decides what can be present in the schema. It provides a conceptual
framework in which we specify the requirements of the database user and the structure of the database needed
to fulfil these requirements.
- A data model can, for example, be a relational model where the data will be organised in tables, whereas the
schema for this model would be the set of attributes and their corresponding domains.
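The employee schema described above can be sketched with Python's built-in sqlite3 module. SQLite is used only so the example can run without a server; it records the declared types but does not enforce the lengths, which other DBMSs generally do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database

# The schema: which fields exist and what their types are
conn.execute(
    """
    CREATE TABLE employee (
        employee_ID   CHAR(10),
        employee_Name VARCHAR(45)
    )
    """
)

# Read the schema back from the DBMS: each row is (cid, name, type, notnull, default, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]
```

The list columns now holds each field name with its declared type, which is the information a schema is meant to capture.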
▪ Data modelling - Why use a data model?
The primary goals of using a data model are:
• Ensures that all data objects required by the database are accurately represented. Omission of data will lead to
creation of faulty reports and produce incorrect results.
• A data model helps design the database at the conceptual, physical and logical levels.
• Data Model structure helps to define the relational tables, primary and foreign keys and stored procedures.
• It provides a clear picture of the base data and can be used by database developers to create a physical database.
• It is also helpful to identify missing and redundant data.
• Though the initial creation of a data model is labour-intensive and time-consuming, in the long run it makes
upgrading and maintaining your IT infrastructure cheaper and faster.
Types of Data Models in DBMS
At this data modelling level, no primary or secondary key is defined, and you need to verify and adjust the
connector details that were set earlier for relationships.
• To develop a data model, one must know the characteristics of the physically stored data.
• A navigational system makes application development and management complex, so it requires detailed
knowledge of how the data is actually stored.
• Even a small change made to the structure requires modification of the entire application.
• There is no set data manipulation language in the DBMS.
▪ Logical Schema
The design of the database is called a schema.
A database schema is a set of rules that define the architecture of our database and data collection needs. Each
company will have its own database needs and will collect different information based on its business goals.
For instance, a store may need to collect transaction information while a free programming education site may
only need to collect user information and settings.
There are mainly three levels of data abstraction:
1. The internal level has an internal schema, which describes the physical storage structure of the database and
access paths. The internal schema uses a physical data model and describes the complete details of data storage
and access paths for the database.
2. The external or view level includes a number of external schemas or user views. It describes various user views.
Each external schema describes the part of the database that a particular user group is interested in and hides the
rest of the database from that user group. A high-level data model or an implementation data model can be used
at this level.
3. Conceptual or Logical Level: Structure and constraints for the entire database. Logical Schema defines the
design of the database at the conceptual level of the data abstraction. At this level, we define the entities,
attributes, constraints, relationships, etc. and how their relationship would be logically implemented. The
programmers and the DBA work at this level and they do all these implementations.
▪ data security, including backup procedures and the use of access rights to individuals / groups of users
Data security is a set of processes and practices designed to protect your critical information technology (IT)
ecosystem. This includes files, databases, accounts, and networks. Effective data security adopts a set of controls,
applications, and techniques that identify the importance of various datasets and apply the most appropriate
security controls.
Effective data security considers the sensitivity of various datasets and corresponding regulatory compliance
requirements. Like other cybersecurity postures — perimeter and file security to name a few — data security isn’t
the end-all-be-all for keeping hackers at bay. Rather, data security is one of many critical methods for evaluating
threats and reducing the risk associated with data storage and handling.
Then there’s the reputational risk of a data breach or hack. If you don’t take data security seriously, your
reputation can be permanently damaged in the event of a publicized, high-profile breach or hack. Not to mention
the financial and logistical consequences if a data breach occurs. You’ll need to spend time and money to assess
and repair the damage, as well as determine which business processes failed and what needs to be improved.
Types of Data Security
• Access Controls: This type of data security measures includes limiting both physical and digital access to critical
systems and data. This includes making sure all computers and devices are protected with mandatory login entry,
and that physical spaces can only be entered by authorized personnel.
• Authentication: Similar to access controls, authentication refers specifically to accurately identifying users before
they have access to data. This usually includes things like passwords, PINs, security tokens, swipe cards,
or biometrics.
• Backups & Recovery: Good data security means you have a plan to securely access data in the event of system
failure, disaster, data corruption, or breach. You’ll need a backup copy of the data, stored on a separate medium
such as a physical disk, local network, or cloud, to recover from if needed.
• Data Erasure: You’ll want to dispose of data properly and on a regular basis. Data erasure employs software to
completely overwrite data on any storage device and is more secure than standard data wiping. Data erasure
verifies that the data is unrecoverable and therefore won’t fall into the wrong hands.
• Data Masking: By using data masking software, information is hidden by obscuring letters and numbers with
proxy characters. This effectively masks key information even if an unauthorized party gains access to it. The data
changes back to its original form only when an authorized user receives it.
• Data Resiliency: Comprehensive data security means that your systems can endure or recover from failures.
Building resiliency into your hardware and software means that events like power outages or natural disasters
won’t compromise security.
• Encryption: A computer algorithm transforms text characters into an unreadable format via encryption keys. Only
authorized users with the proper corresponding keys can unlock and access the information. Everything from files
and a database to email communications can — and should — be encrypted to some extent.
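The backup and recovery idea from the list above can be sketched with the backup method of Python's sqlite3 connections. Both databases are in memory here only so the example is self-contained; in practice the copy would be written to a file on a separate disk or network location. The table and values are hypothetical.

```python
import sqlite3

# The live database with one record
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE customers (customer_id TEXT, store_state TEXT)")
live.execute("INSERT INTO customers VALUES ('1001', 'NY')")
live.commit()

# Take a backup copy of the whole database
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulate data loss in the live database
live.execute("DELETE FROM customers")
live.commit()

# The backup still holds the data and can be used for recovery
lost = live.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
recovered = backup_copy.execute("SELECT customer_id, store_state FROM customers").fetchall()
```

A backup is only useful if it is taken regularly and the restore procedure has been tested, so a real procedure would also schedule the copy and verify it.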
Main Elements of Data Security
There are three core elements to data security that all organizations should adhere to: Confidentiality, Integrity,
and Availability. These concepts are also referred to as the CIA Triad, functioning as a security model and
framework for top-notch data security. Here’s what each core element means in terms of keeping your sensitive
data protected from unauthorized access and data exfiltration.
• Confidentiality. Ensures that data is accessed only by authorized users with the proper credentials.
• Integrity. Ensures that all data stored is reliable, accurate, and not subject to unwarranted changes.
• Availability. Ensures that data is readily — and safely — accessible and available for ongoing business needs.
Data Security Technologies
Using the right data security technologies can help your organization prevent breaches, reduce risk, and sustain
protective security measures.
• Data Auditing: Security breaches are often inevitable, so you’ll need to have a process in place that gets to the
root cause. Data auditing software solutions capture and report on things like control changes to data,
records of who accessed sensitive information, and the file path utilized. These audit procedures are all vital
to the breach investigation process. Proper data auditing solutions also provide IT administrators with
visibility in preventing unauthorized changes and potential breaches.
• Data Real-Time Alerts: Typically, it takes companies several months before they discover that a data breach
has actually taken place. All too often, companies discover breaches via their customers or third-party
vendors and contractors rather than their own IT departments. By using real-time systems and data
monitoring technology, you’ll be able to discover breaches more quickly. This helps you mitigate data
destruction, loss, alteration, or unauthorized access to personal data.
• Data Risk Assessment: A data risk assessment will help your organization identify its most overexposed,
sensitive data. A complete risk assessment will also offer reliable and repeatable steps towards prioritizing and
remediating serious security risks. The process begins by identifying sensitive data that’s accessed via global
groups, data that’s become stale, or data with inconsistent permissions. An accurate risk assessment will
summarize important findings, expose vulnerabilities, and include prioritized remediation recommendations.
• Data Minimization: Traditionally, organizations viewed having as much data as possible as a benefit. There was
always the potential that it might come in handy in the future. Today, large amounts of data are seen as a
liability from a security standpoint. The more data you have, the greater the number of targets for hackers.
That’s why data minimization is now a key security tactic. Never hold more data than necessary and follow
all data minimization best practices.
• Purge Stale Data: If data doesn’t exist within your network, it can’t be compromised. That’s why you’ll want
to purge old or unnecessary data. Use systems that can track file access and automatically archive unused
files. In the modern age of yearly acquisitions, reorganizations, and “synergistic relocations,” it’s quite likely
that networks of any significant size have multiple forgotten servers that are kept around for no good reason.
▪ Show understanding of how software tools found within a DBMS are used in practice
The use and purpose of developer interface
Database Queries
• Databases allow us to store and filter data to find specific information. A database can be queried using a
variety of methods, although this depends on the software you are using.
• A major benefit of storing information in a database is the ability to perform queries.
• A query is the tool that allows us to ask the database a question and get back any matching records (a search).
• We create queries by choosing at least one criterion upon which we wish to search. Complex searches on
multiple criteria are possible, and the results can be sorted in ascending or descending order.
• Queries are performed using a special language called SQL; however, most Database Management Systems also
provide an easier visual method of creating a query.
• These visual tools are sometimes referred to as query-by-example (QBE).
Query language
• Query language is a written language used only to write specific queries. This is a powerful tool as the user can
define precisely what is required in a database. SQL is a popular query language used with many databases.
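A query with two criteria and a sort order can be sketched as follows, using Python's sqlite3 module so that it runs without a database server. The customers table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id TEXT, sale_date TEXT, order_id TEXT, store_state TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("1001", "2020-01-05", "A-1", "NY"),
        ("1002", "2020-03-12", "A-2", "MH"),
        ("1003", "2020-02-20", "A-3", "NY"),
        ("1004", "2019-11-30", "A-4", "NY"),
    ],
)

# Two criteria (state AND date) with the results sorted in descending date order
rows = conn.execute(
    """
    SELECT customer_id, sale_date
    FROM customers
    WHERE store_state = ? AND sale_date >= ?
    ORDER BY sale_date DESC
    """,
    ("NY", "2020-01-01"),
).fetchall()
```

Only customers 1003 and 1001 satisfy both criteria, and the most recent sale comes first. The ? placeholders keep the search values separate from the SQL text.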
DDL (Data Definition Language):
consists of the SQL commands that can be used to define the database schema. It deals only with descriptions
of the database schema and is used to create, modify, and delete the structure of database objects, but not the
data. These commands are normally not used by a general user, who should be accessing the database via an
application.
List of DDL commands:
• CREATE: This command is used to create the database or its objects (like table, index, function, views,
stored procedure, and triggers).
• DROP: This command is used to delete objects from the database.
• TRUNCATE: This is used to remove all records from a table, including all the space allocated for the records.
Commands of DML
1. SELECT
SELECT command or statement in SQL is used to fetch data records from the database
table and present them in the form of a result set. It is usually considered a DQL command
but it can also be considered DML.
For example, a SELECT statement can fetch specific fields such as customer_id, sale_date,
order_id and store_state from the customers table, or it can fetch all the records from
the customers table with a simple query.
2. INSERT
INSERT command or statement is used to add new rows to a database table. If we have to
insert values into all the fields of the database table, then we need not specify the
column names; the values are simply listed in the same order as the columns of the table.
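The two forms of INSERT can be sketched as follows, using Python's sqlite3 module. The column names follow the customers table mentioned earlier; the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id TEXT, sale_date TEXT, order_id TEXT, store_state TEXT)"
)

# Form 1: column names are listed, so only those fields receive values
conn.execute("INSERT INTO customers (customer_id, store_state) VALUES ('1001', 'NY')")

# Form 2: no column names, so a value is given for every field, in column order
conn.execute("INSERT INTO customers VALUES ('1002', '2020-01-15', 'A-17', 'MH')")

rows = conn.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
```

In the first row the fields that were not named are left as NULL (None in Python). The second form is shorter but breaks if the column order of the table ever changes.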
3. UPDATE
UPDATE command or statement is used to modify the value of an existing column in a
database table.
UPDATE table_name
SET column_name_1 = value1, column_name_2 = value2, ...
WHERE condition;
Having learnt the syntax, let us now try an example based on the UPDATE statement in
SQL.
UPDATE customers
SET store_state = 'DL'
WHERE store_state = 'NY';
In this example, we have changed the value of store_state to ‘DL’ for every record where
store_state was ‘NY’.
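The UPDATE example can be run against a small table to see that every matching row changes. This sketch uses Python's sqlite3 module; the three sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, store_state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("1001", "NY"), ("1002", "NY"), ("1003", "MH")],
)

# The UPDATE statement from the text
cursor = conn.execute("UPDATE customers SET store_state = 'DL' WHERE store_state = 'NY'")
changed = cursor.rowcount  # number of rows the WHERE clause matched

states = conn.execute(
    "SELECT customer_id, store_state FROM customers ORDER BY customer_id"
).fetchall()
```

Both ‘NY’ rows are changed and the ‘MH’ row is left alone. Without the WHERE clause, every row in the table would be updated.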
4. DELETE
DELETE statement in SQL is used to remove one or more rows from the database table.
Within a transaction, the rows are not removed permanently until the transaction is
committed: before the commit, we can perform a rollback operation to undo a DELETE
command. With DELETE statements we can use the WHERE clause for filtering specific rows.
Having learnt the syntax, we are all set to try an example based on the DELETE command
in SQL.
DELETE FROM customers
WHERE store_state = 'MH'
AND customer_id = '1001';
In this example, we have removed a row from the customers table where store_state was
‘MH’ and customer_id was ‘1001’.
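The DELETE example and its undo by rollback can be sketched as follows, using Python's sqlite3 module. The connection is opened with isolation_level=None so that the BEGIN and ROLLBACK statements are written out explicitly; the sample row is hypothetical.

```python
import sqlite3

# isolation_level=None: the module does not open transactions on its own
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customers (customer_id TEXT, store_state TEXT)")
conn.execute("INSERT INTO customers VALUES ('1001', 'MH')")

conn.execute("BEGIN")  # start a transaction
conn.execute("DELETE FROM customers WHERE store_state = 'MH' AND customer_id = '1001'")
count_after_delete = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

conn.execute("ROLLBACK")  # undo everything since BEGIN
count_after_rollback = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

The row disappears inside the transaction and reappears after the rollback. Had COMMIT been issued instead of ROLLBACK, the deletion would have been permanent.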
Conclusion
DML commands are used to modify or manipulate data records present in the database
tables. Some of the basic DML operations are data insertion (INSERT), data update
(UPDATE), data removal (DELETE) and data querying (SELECT).
DDL (Data Definition Language) Command in SQL
DDL or Data Definition Language defines or describes the database structure or schema; it does not
change the data inside the database. DDL commands create, modify, and delete the database
structures, but not the data. These commands are not used by all users: general users normally access
the database via an application.
The SQL CREATE command creates the database or its objects (i.e. table, index, view, function, etc.).
The DROP command deletes an object from the database (i.e. table, index, view, function, etc.).
The ALTER command changes the structure of an existing object, such as the data type of a column.
Example
ALTER TABLE Student
MODIFY Total VARCHAR(255);
TRUNCATE Command in SQL
The TRUNCATE command removes all records from a table while keeping the table structure.
The SQL COMMENT command adds comments to the data dictionary. "--" is used to write a comment on a
single line.
Syntax
--(notes,examples)
How a database is renamed depends on the DBMS (PostgreSQL, MySQL or SQL Server). In SQL Server we can
rename the database through the server application, by right-clicking the existing database and
renaming it.
DQL (Data Query Language) Command in SQL
DQL or data query language is used to query the data inside the schema or object (i.e. table,
index, view, function, etc.). With the help of a DQL query, we can get the data from the database to
perform actions or operations like analyzing the data.
The SQL SELECT command runs a query on a table or tables and returns the output from the database as
a temporary table.
2) In an INSERT statement, the column names do not need to be mentioned in the query; the values must
then be given in the same order as the columns of the table.
UPDATE Student
SET FirstName = 'Navin', LastName = 'Kumar'
WHERE StudentId = 12345;
The SQL DELETE command deletes records from a database table.
The CALL statement invokes a stored procedure. Example (embedded SQL):
EXEC SQL
CALL GETEMPSVR (2, NULL)
END-EXEC
EXPLAIN PLAN
The EXPLAIN PLAN statement stores the explanation of how a query will be executed in the PLAN_TABLE
table. We can then select the execution plan from that table to review the query.