Database Concepts

A database is an integrated collection of related records managed by a database management system (DBMS), which allows for efficient data storage, retrieval, and management. The document discusses the limitations of file-based systems, such as data redundancy and inconsistency, and highlights the advantages of relational databases, including minimal redundancy and better data integrity. Key concepts like entities, attributes, primary keys, and referential integrity are also explained, emphasizing their roles in organizing and maintaining data quality within a database.


8. Database
1. Database
A database is an integrated collection of logically related records or files.
A database consolidates records previously stored in separate files into a common pool of data records that provides
data for many applications. The data is managed by systems software called database management systems (DBMS).
The data stored in a database is independent of the application programs using it and of the types of secondary
storage devices on which it is stored.
A database management system (DBMS) is a software package designed to store, retrieve, query and manage data.
Database management systems are important because they provide programmers, database administrators and end
users with a centralized view of data and free applications and end users from having to understand where data is
physically located.
Well-known DBMSes include:
• Access – a lightweight relational database management system (RDBMS) included in Microsoft Office
and Office 365.
• Amazon RDS – a native cloud DBMS that offers engines for managing MySQL, Oracle, SQL
Server, PostgreSQL and Amazon Aurora databases.
• Apache Cassandra - an open-source distributed database management system known for being able to handle
massive amounts of data.
• Filemaker - a low-code/no-code (LCNC) relational DBMS.
• MySQL – an open-source relational database management system (RDBMS) owned by Oracle.
• MariaDB - an open-source fork of MySQL.
• Oracle - a proprietary relational database management system optimized for hybrid cloud architectures.
• SQL Server – an enterprise-level relational database management system from Microsoft that is capable of
handling extremely large volumes of data and database queries.
The Relational Database Management System (RDBMS) is a DBMS that uses the relational model to store data in
tables. Each table comprises rows and columns: each column holds the data for one category (an attribute), and
each row holds one instance of that data (a record).
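To make the idea of tables, rows and columns concrete, here is a minimal sketch using Python's built-in sqlite3 module (the Student table and its columns are illustrative assumptions, not taken from this document):

import sqlite3

# In-memory database; each table stores one entity type as rows and columns.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One column per attribute; each row is one record (tuple).
cur.execute("""
    CREATE TABLE Student (
        StudentId   INTEGER PRIMARY KEY,   -- unique identifier for each row
        FirstName   TEXT NOT NULL,
        LastName    TEXT NOT NULL,
        DateOfBirth TEXT
    )
""")
cur.execute("INSERT INTO Student VALUES (1, 'Li', 'Wang', '1980-12-05')")
conn.commit()

# The DBMS handles storage and retrieval; applications just issue queries.
for row in cur.execute("SELECT StudentId, FirstName, LastName FROM Student"):
    print(row)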
8.1 Database Concepts
▪ Show understanding of the limitations of using a file-based approach for the storage and retrieval of data
• In a computer, a file system -- sometimes written filesystem -- is the way in which files are named and where
they are placed logically for storage and retrieval.
• Without a file system, stored information wouldn't be isolated into individual files and would be difficult to
identify and retrieve.
• A file-based system is an application program designed to manipulate data files directly.
• When all of the data is held in a single table, this is called a 'flat file database'.
• A flat-file database allows the user to specify data attributes (columns, data types, etc.) for one table at a time,
storing those attributes independently of an application. dBase III and Paradox were good examples of this
kind of database.
• As an example, in the file-oriented approach to data processing each department has its own files, which are
specifically designed for that department's applications. Consider three departments — academics, accounts, and
library. Each of these departments maintains a file containing students' personal details in addition to the file that
is necessary for its own application. Therefore, each application has a separate master file and its own set of
personal files. A patient database in which all of the information is stored in one single table is another example
of a flat file.
Disadvantage of File-oriented system:
Disadvantage of File-oriented system:

1. Data Redundancy: It is possible that the same information may be duplicated in different files. This duplication
leads to data redundancy, which results in wasted storage.
2. Data Inconsistency: Because of data redundancy, it is possible that data may not be in a consistent state.
3. Difficulty in Accessing Data: Accessing data is not convenient or efficient in a file processing system.
4. Limited Data Sharing: Data are scattered in various files. Different files may have different formats, and these
files may be stored in different folders belonging to different departments. Due to this data isolation, it is difficult
to share data among different applications.
5. Integrity Problems: Data integrity means that the data contained in the database is both correct and consistent.
For this purpose, the data stored in the database must satisfy integrity constraints, which are hard to enforce when
they are embedded in many separate application programs.
6. Atomicity Problems: Any operation on the database must be atomic; this means it must happen in its entirety or
not at all, which is difficult to guarantee in a file-based system.
7. Concurrent Access Anomalies: Multiple users are allowed to access data simultaneously for the sake of better
performance and faster response, but without central control concurrent updates to the same file can conflict and
leave the data in an inconsistent state.
8. Security Problems: Data should be accessible to users in a limited way, with each user allowed to access only
the data concerning their requirements; this is difficult to enforce when data is held in separate application files.
9. Incompatible File Formats: As the structure of the files is embedded in application programs, the structure is fully
dependent on the application programming languages.
10. Fixed Queries: File-based systems are very much dependent on application programs. Any report or query needed
by the organisation has to be developed by the application programmer. With time, the type and number of
queries or reports increases. Producing dissimilar types of queries or reports is not possible in file-based systems.
As a result, in some organisations the kind of queries or reports to be produced is fixed; no new query or report
on the data can be produced.
▪ Describe the features of a relational database that address the limitations of a file-based approach
A relational database takes this "flat file" approach several logical steps further, allowing the user to specify
information about multiple tables and the relationships between those tables, and often allowing much more
declarative control over what rules the data in those tables must obey. The benefits of the database approach are
as follows:

1. Ease of application development: The programmer is no longer burdened with designing, building and
maintaining master files.
2. Minimal data redundancy: All data files are integrated into a composite data structure. In practice, not all
redundancy is eliminated, but at least the redundancy is controlled. Thus, inconsistency is reduced.
3. Enforcement of standards: The database administrator can define standards for names, etc.
4. Data sharing and physical data independence: Data can be shared, and data descriptions are independent of the
application programs. This makes program development and maintenance an easier task. Data is stored
independently of the programs that use it.
5. Logical data independence: Data can be viewed in different ways by different users.
6. Better modelling of real-world data: Databases are based on semantically rich data models that allow the accurate
representation of real-world information.
7. Uniform security and integrity controls: Security control ensures that applications can only access the data they
are required to access. Integrity control ensures that the database represents what it purports to represent.
8. Economy of scale: Concentration of processing, control personnel and technical expertise.
File Size Estimation – Calculations
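The heading above refers to estimating file sizes. As a minimal illustrative calculation (the record layout and field sizes below are assumed, not taken from the text), the usual method is to sum the field sizes to get the size of one record and multiply by the expected number of records:

# Assumed record layout for a Member file: sum the field sizes to get the
# size of one record, then multiply by the expected number of records.
field_sizes_bytes = {
    "MemberId": 4,      # integer
    "FirstName": 20,    # fixed-length text
    "LastName": 25,     # fixed-length text
    "Phone": 12,        # text
}
record_size = sum(field_sizes_bytes.values())      # 61 bytes per record
number_of_records = 10_000
file_size = record_size * number_of_records        # 610,000 bytes
print(record_size, file_size, round(file_size / 1024, 1))  # about 595.7 KiB

In practice a percentage overhead (for example 10%) is often added for indexes and metadata.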
▪ Show understanding of and use the terminology associated with a relational database model
1. Entity
An entity is considered strong if it can exist apart from all of its related entities. When you build a database, you are
organising data about entities. An entity is any item that has its attributes stored as data. An entity could be
anything, e.g. a book, a person, a film, a country or a football team. An entity can be a real-world object, either
animate or inanimate, that can be easily identifiable. For example, in a school database, students, teachers, classes,
and courses offered can be considered as entities. All these entities have some attributes or properties that give them
their identity. An entity set is a collection of similar types of entities. An entity set may contain entities with attributes
sharing similar values. For example, a Students set may contain all the students of a school; likewise, a Teachers set
may contain all the teachers of a school from all faculties. Entity sets need not be disjoint.
There is a standard notation for describing entities:
EntityName (EntityIdentifier, attribute1, attribute2, attribute3, ...)
Following from the previous example, the entity description for the Teacher entity is:
Teacher (TeacherId, FirstName, MiddleName, LastName, DateofBirth, HireDate, Email)
2. Attributes:
Each entity is described by a set of attributes (e.g., Student = (Name, Address, Date of Birth, Form, Set)), and each
attribute has a name and is associated with an entity and a domain of legal values.
Entities are represented by means of their properties, called attributes. All attributes have values. For example, a
student entity may have name, class, and age as attributes. There exists a domain or range of values that can be
assigned to attributes. For example, a student's name cannot be a numeric value. It has to be alphabetic. A student's
age cannot be negative, etc. The details about entities are called attributes. A person is an entity with attributes
including age, height and nationality, among many others. When you design a database, you need to think about
which attributes you want to store. For example, the attributes of a film could include title, duration, certificate,
rating, genre, cast, director and year of creation. In other words - details about the entity
3. Bit (Character)
A bit is the smallest unit of data representation (the value of a bit may be 0 or 1). Eight bits make a byte, which can
represent a character or a special symbol in a character code.
4. Field
A field consists of a grouping of characters. A data field represents an attribute (a characteristic or quality) of some
entity (object, person, place, or event). The columns in a table are also referred to as attributes.
5. Record
A record represents a collection of attributes that describe a real-world entity. A record consists of fields, with each
field describing an attribute of the entity. A tuple is one record (one row).
You can also think of it this way: an attribute is used to define the record and a record contains a set of attributes.
6. Table
A table is a set of fields and records. The table contains all of the fields and the records for one type of entity. A
database may contain more than one table.

7. File
A file is a group of related records, frequently classified by the application for which they are primarily used, e.g.
an employee file.

✓ Caragh’s database record is highlighted.

✓ There are three records in this database table.

✓ There are four fields in this database table.


8. Data types
There are a lot of data types. However, the most common are as follows:
• Number e.g. Integer – any number that doesn’t have a decimal point, Real, Decimal

• Date / Time – specifies data related to Date and time and must be formatted

• Text – often referred to as a "string"; any combination of characters, which may include letters, digits and
other symbols

• Currency – financial numeric data

• Boolean – TRUE or FALSE data, often represented as YES/NO text or the numbers 1 and 0. It is, in simple
terms, binary data
9. Primary Key
A primary key in a file is the field (or fields) whose value identifies a record among others in a data file.
It contains a unique identifier for each record. To make each record in a database unique we normally assign it a
primary key. Even if a record is deleted from a database, its primary key value will not be used again. The primary key
can be automatically generated and will normally just be a unique number or a mix of numbers and letters. It is a
unique field that can be searched quickly, and it allows you to create relationships between tables. The primary key is
selected from one of the candidate keys and becomes the identifying key of a table. It can uniquely identify any data
row of the table. Some 'natural' primary keys are:

✓ Car Registration Number


✓ ISBN – a 10-digit code that uniquely identifies a book
✓ MAC address – a 6-part number that uniquely identifies a network card
10. Candidate Key
It is a simple or composite key that is unique (no two rows in a table may have the same value) and minimal (every
column is necessary). The candidate keys in a table are defined as the set of keys that is minimal and can uniquely
identify any data row in the table.
11. Secondary Key
Only one of the candidate keys is selected as the primary key. The rest of them are known as secondary keys.
12. Alternate key
All candidate keys not chosen as the primary key.
13. Super Key
A super key is a superset of a key: a set of attributes, including the primary key, that can uniquely identify any data
row in the table.

14. Composite Key


If any single attribute of a table is not capable of being the key i.e. it cannot identify a row uniquely, then we
combine two or more attributes to form a key. This is known as a composite key.
15. Foreign Key
A foreign key is an attribute in one table whose values match the primary key of another table. Hence, the foreign key
is useful in linking together two tables. Data should be entered in the foreign key column with great care, as wrongly
entered data can invalidate the relationship between the two tables.
16. Composite Key
Composed of two or more attributes, but it must be minimal
17. Metadata

in simple terms, refers to data about data. The term metadata is used for different types of data in different
contexts. Metadata in DBMS is characterized as data about data. It describes the context and information about data,
the way that data is stored and the various relations among data. Metadata in a relational DBMS stores data about
Constraints, Table Relationships, Data Types, Columns, Tables, and so on.
18. Referential Integrity
• Referential integrity ensures that records in related tables are linked correctly
• is a database constraint that ensures that references between data are indeed valid and intact.
• It’s a database management safeguard that ensures every foreign key matches a primary key. For example,
customer numbers in a customer file are the primary keys, and customer numbers in the order file are the foreign
keys. If a customer record is deleted, the order records must also be deleted; otherwise they are left without a primary
reference.
• Referential integrity is a fundamental principle of database theory and arises from the notion that a database
should not only store data, but should actively seek to ensure its quality.
• It is the logical dependency of a foreign key on a primary key.

There are many benefits of defining referential integrity in a database.


• Improved data quality. An obvious benefit is the boost to the quality of data that is stored in a database. There can
still be errors, but at least data references are genuine and intact.
• Fewer bugs. The declarations of referential integrity are more concise than the equivalent programming code. In
essence, such declarations reuse the tried and tested general-purpose code in a database engine, rather than
redeveloping the same logic on a case-by-case basis.
• Consistency across applications. Referential integrity ensures the quality of data references across the multiple
application programs that may access a database
• Faster development. Referential integrity is declared. This is much more productive (one or two orders of
magnitude) than writing custom programming code.
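As a minimal sketch of how a DBMS enforces referential integrity between the customer and order tables described above, using Python's sqlite3 (the exact column names and the ON DELETE CASCADE policy are assumptions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this pragma switched on

conn.execute("""
    CREATE TABLE Customer (
        CustomerNumber INTEGER PRIMARY KEY,
        Name           TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE "Order" (
        OrderId        INTEGER PRIMARY KEY,
        CustomerNumber INTEGER NOT NULL,
        -- the foreign key must match an existing primary key in Customer
        FOREIGN KEY (CustomerNumber)
            REFERENCES Customer(CustomerNumber)
            ON DELETE CASCADE          -- deleting a customer deletes their orders
    )
""")
conn.execute("INSERT INTO Customer VALUES (1, 'Alice')")
conn.execute('INSERT INTO "Order" VALUES (100, 1)')

# An order for a non-existent customer is rejected by the DBMS itself:
try:
    conn.execute('INSERT INTO "Order" VALUES (101, 99)')
except sqlite3.IntegrityError as e:
    print("Rejected:", e)

The constraint is declared once in the schema, so every application that uses the database gets the same protection without extra code.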

19. Data integrity


• Refers to the maintenance and assurance that the data in a database are correct and consistent.
• At its most basic level, data integrity is the accuracy and consistency of data across its entire life cycle, from when it
is captured and stored to when it is processed, analysed and used.
• Data integrity management means ensuring data is complete and accurate, free from errors or anomalies that could
compromise data quality.
• Data that has been accurately and consistently recorded and stored will retain its integrity, while data that has been
distorted or corrupted cannot be trusted or relied upon for business use.
20. Indexing
is a data structure technique which allows you to quickly retrieve records from a database file.
An Index is a small table having only two columns.
• The first column comprises a copy of the primary or candidate key of a table.
• Its second column contains a set of pointers holding the address of the disk block where that specific key
value is stored.
Advantages of Indexing
Important advantages of indexing are:
• It reduces the total number of I/O operations needed to retrieve data, because the index structure allows the
DBMS to locate the required row directly.
• It offers faster search and retrieval of data to users.
• Indexing can also help you to reduce tablespace, as there is no need to store the ROWID in the index.
• Data in the leaf nodes is kept sorted by the value of the primary key, so it does not need to be sorted separately.
Disadvantages of Indexing
Important drawbacks of indexing are:
• To perform indexing, the database management system needs a primary key on the table with unique values.
• You cannot build any other indexes on the indexed data.
• You are not allowed to partition an index-organized table.
• Indexing decreases the performance of INSERT, DELETE, and UPDATE queries.
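A minimal sqlite3 sketch of creating an index and checking that the query planner uses it (the Product table and index name are illustrative assumptions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Name TEXT, Price REAL)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(i, f"Item{i}", i * 1.5) for i in range(1, 1001)])

# A secondary index on Name lets the DBMS locate matching rows
# without scanning the whole table.
conn.execute("CREATE INDEX idx_product_name ON Product(Name)")

# EXPLAIN QUERY PLAN reports whether the index is used for this search.
for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT Price FROM Product WHERE Name = 'Item500'"):
    print(row)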
21. Data dictionary

• is a software module and database containing descriptions and definitions concerning the structure, data elements,
interrelationships, and other characteristics of an organization's database. A data dictionary:
1. Contains all the data definitions, and the information necessary to identify data ownership
2. Ensures security and privacy of the data, as well as the information used during the development and
maintenance of applications which rely on the database
• A data dictionary is a collection of metadata such as object name, data type, size, classification, and relationships
with other data assets. A data dictionary acts as a reference guide on a dataset.
Data dictionary - (Idea behind it)
A data dictionary is a crucial part of a relational database as it provides additional information about the relationships
between multiple tables in a database. It describes the structure and attributes of data to be used or within the
database.
It includes:
•The names and descriptions of the tables and fields contained in each table.
•Data types. •Field sizes. •Format of fields. •Validation rules. •Primary, compound and foreign keys.
• Think of it as a list along with a description of tables, fields, and columns. The primary goal of a data dictionary
is to help data teams understand data assets.

What is in a data dictionary?


An elementary example of a data dictionary

Data dictionary – Example 1

Field name / Attribute | Data type | Field size | Format
Product Code           | Text      | 6          | XX99XX
Description            | Text      | 20         |
Category Code          | Integer   | 4          | 9999
Price                  | Currency  | 3.2        | $999.99
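A hedged sketch of how the dictionary entries in Example 1 might be turned into a table definition using sqlite3 (SQLite has no fixed-length text or currency types, so the CHECK constraints below are assumed approximations of the field sizes and formats):

import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraints approximate the field-size rules recorded in the data dictionary.
conn.execute("""
    CREATE TABLE Product (
        ProductCode  TEXT PRIMARY KEY
                     CHECK (length(ProductCode) = 6),        -- 6 characters (format XX99XX not fully enforced here)
        Description  TEXT CHECK (length(Description) <= 20),
        CategoryCode INTEGER CHECK (CategoryCode BETWEEN 0 AND 9999),
        Price        NUMERIC CHECK (Price >= 0)              -- currency, displayed as $999.99
    )
""")
conn.execute("INSERT INTO Product VALUES ('AB12CD', 'Office chair', 1200, 99.99)")
print(conn.execute("SELECT * FROM Product").fetchone())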

Data dictionary – Example 2


22. Relationships
A relationship between two or more entities is used to represent an interaction or association that exists between those
entities. For example, an employee works in a company, therefore a relationship exists between the entities Employee
and Company. Relationships are the associations or interactions between entities, used to connect related information
between tables.

The entity relationships are categorized as One-To-Many, One-To-One, and Many-To-Many.
• One to many (1:M) relationship
A one to many (1:M) relationship should be the norm in any relational database design and is found in all
relational database environments. For example, one department has many employees.
Under a One-to-Many (1:N) relationship, an instance of entity P is related to more than one instance of entity Q,
but an instance of entity Q is related to at most one instance of entity P. For example, a person can have more
than one bank account, but a bank account can have at most one person as account holder.

• One to one (1:1) relationship


A one to one (1:1) relationship is the relationship of one entity to only one other entity, and vice versa. It should be
rare in any relational database design. In fact, it could indicate that two entities actually belong in the same table.
An example from the COMPANY database is that one employee is associated with one spouse, and one spouse is
associated with one employee. Under a One-to-One (1:1) relationship, an instance of entity P is related to at most one
instance of entity Q, and an instance of entity Q is related to at most one instance of entity P. For example, a person
can have only one passport, and a passport is assigned to a single person.
• Many to many (M:N) relationships
Under a Many-to-Many (M:N) relationship, more than one instance of entity P is related to more than one instance
of entity Q, and more than one instance of entity Q is related to more than one instance of entity P. For example,
a person can have more than one skill, and more than one person can attain a skill.
For a many to many relationship, consider the following points:
✓ It cannot be implemented as such in the relational
model.
✓ It can be changed into two 1:M relationships.
✓ It can be implemented by breaking up to produce a
set of 1:M relationships.
✓ It involves the implementation of a composite entity.
✓ Creates two or more 1:M relationships.
✓ The composite entity table must contain at least the primary keys of the original tables.
✓ The linking table contains multiple occurrences of the foreign key values.
✓ Additional attributes may be assigned as needed.
✓ It can avoid the problems inherent in an M:N relationship by creating a composite entity or bridge entity.
For example, an employee can work on many projects OR a project can have many employees working
on it, depending on the business rules. Or, a student can have many classes and a class can hold many
students, as in the sketch below.
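A minimal sketch of resolving the student/class example above into two 1:M relationships through a linking (composite-entity) table, using sqlite3 (table and column names are assumptions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Class   (ClassId   INTEGER PRIMARY KEY, Title TEXT)")

# The linking table turns the M:N relationship into two 1:M relationships.
# Its composite primary key contains the primary keys of both original tables.
conn.execute("""
    CREATE TABLE Enrolment (
        StudentId INTEGER REFERENCES Student(StudentId),
        ClassId   INTEGER REFERENCES Class(ClassId),
        PRIMARY KEY (StudentId, ClassId)
    )
""")
conn.execute("INSERT INTO Student VALUES (1, 'Amina')")
conn.execute("INSERT INTO Class VALUES (10, 'History'), (11, 'Maths')")
conn.executemany("INSERT INTO Enrolment VALUES (?, ?)", [(1, 10), (1, 11)])

# One student now appears in many classes, and each class can list many students.
print(conn.execute("""
    SELECT s.Name, c.Title FROM Enrolment e
    JOIN Student s ON s.StudentId = e.StudentId
    JOIN Class   c ON c.ClassId   = e.ClassId
""").fetchall())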
▪ Use an entity-relationship (E-R) diagram to document a database design
Entity relationship modelling
One of the first steps of database design is to produce a data model. A data model is used to represent and
visualise how data is structured for a given scenario, the relationships between data elements, and
constraints or limitations that may exist.

Data models are designed to fulfil specific requirements. These are specified during the analysis phase of the
development of a database system. This process includes defining the type of data that the database will
store, the rules and restrictions that will apply to the data, the database applications that will be coupled with
the database and the needs of the users that will interact with the database applications.

One of the most common ways to present a data model is an entity relationship diagram.

Entities and attributes

An entity is used to represent an object in the real world that can be distinguished from other objects.
An entity can be a physical object (such as a person or a place) or a concept (such as an activity or a
task) for which we need to record data in the database. For example, a physical object could be an
employee, a customer or a product, and a concept could be an online order, a school course or a
booking.

An attribute is used to represent a property, a quality or a characteristic that describes an entity. For
example, the name of an employee, or the date and time that a booking was submitted.

An entity has a value for each of its attributes. These values make up the main body of data that is
stored in the database. For example suppose that you are designing a model for a school where
students are able to book appointments with teachers for parents' evening. A Teacher entity could have
the following attributes and values:

TeacherId | FirstName | MiddleName | LastName | DateofBirth | HireDate   | Email
208       | Li        | Bella      | Wang     | 05/12/1980  | 03/07/2000 | li.wang@e…
209       | Simon     |            | Bennett  | 19/11/1990  | 05/09/2000 | simon.bennet…

Sometimes an attribute can accept a null value, for example an attribute that is used to record the
middle name for a teacher will be empty for the teachers that only have a first name.
Each set of values that corresponds to a specific teacher is called an instance of the Teacher entity. In
order to distinguish between the different instances of an entity we need to establish an entity
identifier (also known as a key attribute). This is an attribute (or set of attributes) that can be used to
uniquely identify each instance of the entity. For example, the attribute TeacherId is a unique number
that is assigned to each teacher when they are hired. Therefore, it is unique for each instance of the
entity and can be used as an entity identifier. An entity identifier can't be null.

Sometimes one attribute on its own is not enough to uniquely identify each instance of an entity.
Instead a set of the minimum number of attributes that can achieve this goal are combined together
into a composite entity identifier. For example, suppose that the database you are designing has a
TimeOff entity that captures which teachers are absent on specific dates. The model uses the attributes
TeacherId and StartDate as a composite entity identifier. It is not possible to use only one of the entity
attributes as their values are not unique.

TeacherId | StartDate  | EndDate    | Reason
208       | 14/09/2022 | 15/09/2022 | Sick Leave
208       | 25/10/2022 | 01/11/2022 | Holiday
209       | 25/10/2022 | 25/10/2022 | Jury Duty

Entity descriptions
There is a standard notation for describing entities:

EntityName (EntityIdentifier, attribute1, attribute2, attribute3, ...)

Following from the previous example, the entity description for the Teacher entity is:
Teacher (TeacherId, FirstName, MiddleName, LastName, DateofBirth, HireDate, Email)

The definition starts with the name of the entity, followed by a list of attributes in parentheses
(brackets). The entity identifier is underlined, and it is conventional to specify it as the first attribute (or
set of attributes) in the list.

If the entity identifier is composite, all of the attributes that make up the identifier are underlined. In this
form the entity description for the TimeOff entity is:
TimeOff (TeacherId, StartDate, EndDate, Reason)

Notice that it is conventional to write the entity names in the singular, e.g. Teacher not Teachers.
Entity relationship diagram

A relationship of two or more entities is used to represent an interaction or association that exists
between those entities. For example, an employee works in a company, therefore a relationship exists
between the entities Employee and Company.

In an entity relationship (ER) diagram, each entity is represented by a rectangle. A relationship between
entities is shown as a line and can be one of three types:

One-to-one relationship

One-to-many relationship

Many-to-many relationship
The line between the entities is used to illustrate the cardinality of the relationship. The cardinality
refers to the number of times an instance in one entity can be associated with instances in the related
entity. The cardinality can be one or many:

For a cardinality of one, the end of the relationship line is straight.


For a cardinality of many, the end of the relationship line is a splayed line (commonly referred to
as a crow's foot).

Relationship categories
The different types of cardinality result in three main categories of relationships:

A one-to-one relationship is when one instance of an entity is associated with only one instance
of another entity.
A one-to-many relationship is when one instance of an entity is associated with more than one
instance of another entity.
A many-to-many relationship is when more than one instance of an entity is associated with
more than one instance of another entity.

Verbalising the nature of the relationship between two entities can help you find out the category of
relationship between them. To do this, form a sentence that describes the relationship from the point of
view of a single instance of each entity. It helps to start your sentences with "Each ...".

For example, think about the ER diagram for a school where students are able to book appointments
with teachers for parents' evening:

An example of a one-to-one relationship is that which exists between head teacher and school:
Each head teacher runs one school
Each school is run by one head teacher

An example of a one-to-many relationship is between student and appointment:


Each student books many appointments
Each appointment is for one student

An example of a many-to-many relationship is that which exists between student and teacher:

Each student is taught by many teachers


Each teacher has many students

In a relational database you can't implement many-to-many relationships. However, in the early stages
of database design, you will identify many examples of this type of relationship.

Once you have identified all of the entities and relationships, you can put them together into an ER
diagram.
ER diagram

A scenario for an ER diagram

For this topic you will work through the following scenario:
A sports club requires a relational database to store the data that it needs to manage its courses. The
members can gain certificates to recognise their achievement in a particular course, such as
badminton, swimming or climbing. Most courses have a fee to cover the cost of materials or
equipment. Prior to gaining a certificate, the performance of a member is assessed by an instructor
whose contact details are also stored in the database.

The members of the sports club are young people. On enrolment, each young person is issued a
membership card with a unique membership number. The member’s first name, last name, and
home phone number are recorded.
Each course is identified by a unique six-digit code and has a longer description that will appear
on the certificate. The fee for the course is also recorded.
Each instructor is identified by a unique number. Their first name, last name, and email address
are also recorded.
Members that successfully complete a course to an agreed standard receive a certificate for
their achievement. The date that the certificate was gained is recorded, as well as the identifier of
the instructor.

Firstly you need to identify the main entities for this scenario. Here are the entity descriptions in
standard notation:

Member (MemberId, FirstName, LastName, Phone)


Course (CourseCode, Description, Fee)
Instructor (InstructorId, FirstName, LastName, Email)
Certificate (MemberId, CourseCode, AssessmentDate, InstructorId)
Remember that it is conventional to write the entity names in the singular, e.g. Member not Members.
Notice the composite entity identifier for the Certificate entity. The combination of the two attributes (
MemberId and CourseCode) will ensure that each instance will be unique and that no entries can be
duplicated.

Now think about the categories of relationships that exist between the entities.

The relationship between Member and Course is many-to-many:

Each member can be assessed for many courses


Each course is assessed for many members.

Many-to-many

The relationship between Member and Certificate is one-to-many:

Each member can gain many certificates.


Each certificate is awarded to just one member.

One-to-many
The relationship between Course and Certificate is also one-to-many:

Each course can have many certificates (one for each member that completes it).
Each certificate is for just one course.

One-to-many
The relationship between Instructor and Certificate is also one-to-many:

Each certificate has one instructor (the instructor that assessed the course).
Each instructor can carry out many assessments.

One-to-many

The completed entity relationship diagram looks like this:

ER diagram: Sports club entities
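A hedged sqlite3 sketch of the four entity descriptions for the sports club scenario (the data types are assumptions; the composite primary key and the foreign keys implement the relationships shown in the diagram):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE Member (
        MemberId  INTEGER PRIMARY KEY,
        FirstName TEXT,
        LastName  TEXT,
        Phone     TEXT
    );
    CREATE TABLE Course (
        CourseCode  TEXT PRIMARY KEY,   -- unique six-digit code
        Description TEXT,
        Fee         NUMERIC
    );
    CREATE TABLE Instructor (
        InstructorId INTEGER PRIMARY KEY,
        FirstName    TEXT,
        LastName     TEXT,
        Email        TEXT
    );
    -- Certificate resolves the many-to-many relationship between Member and
    -- Course; its composite primary key prevents duplicate awards.
    CREATE TABLE Certificate (
        MemberId       INTEGER REFERENCES Member(MemberId),
        CourseCode     TEXT    REFERENCES Course(CourseCode),
        AssessmentDate TEXT,
        InstructorId   INTEGER REFERENCES Instructor(InstructorId),
        PRIMARY KEY (MemberId, CourseCode)
    );
""")

# List the tables that were created.
print([row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")])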


WHAT IS NORMALIZATION

• Normalization is the branch of relational theory that provides design insights. It is the process of determining
how much redundancy exists in a table.
• The goals of normalization are to:
✓ Eliminate redundant (useless) data
✓ Ensure data dependencies make sense, i.e. data is logically stored.
ADVANTAGES AND DISADVANTAGES OF NORMALISATION
PARTIAL DEPENDENCY
• Partial dependency is a form of functional dependency that holds on a set of attributes.
• In a functional dependency PQ → R, if either P alone or Q alone can uniquely identify R, then this is said to be
a partial functional dependency.
• In this example, if we know the value of Employee Number, we can obtain Employee Name, City, Salary, etc.
• By this, we can say that City, Employee Name and Salary are functionally dependent on Employee Number.
WHAT ARE TRANSITIVE FUNCTIONAL DEPENDENCIES?

A transitive functional dependency is when changing a non-key column might cause any of the other non-key
columns to change.
Consider table 1: changing the non-key column Full Name may change Salutation.
0NF
Comments on the table above
• We can see from looking at this database that there are repeated entries for each artist. This means that there is
more than one entry in the database for an artist.
• The fields 'Concert Venue' and 'Agent Name' are not atomic – that is, they have more than one item of data in
the field. This means that the data in those fields could be broken down further. For example, the agent's first
name and last name are both in the same field 'Agent Name'. This could be broken down into two fields.
• Another feature of this database is that each record in the database does not have a unique identifier.
1NF
First Normal Form (1NF) – a table should follow the following 4 rules:
1. It should only have single (atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. And the order in which data is stored does not matter.
Comments on the tables above
• We have now made all the fields in the database atomic. We have done this by breaking down the venue name
and location into two separate fields, and by breaking down the agent first and last name into two separate fields.
• Each table cell should contain a single value.
• Each record needs to be unique.
2NF
2NF (Second Normal Form) Rules:
• Rule 1 – Be in 1NF.
• Rule 2 – Have a single-column primary key that does not functionally depend on any subset of the candidate
key relation (i.e. no partial dependency).
Comments on the table above
• Now that we have taken our database to 1NF, we can take it to 2NF.
• In 2NF our document looks like the above.
3NF
Third Normal Form (3NF) Rules – a table is said to be in the Third Normal Form when:
1. It is in the Second Normal Form.
2. And it doesn't have a transitive dependency.
Comments on the table above
• We have now separated our database into different tables.
• We have introduced a primary key to each table (marked with *). This means that the data entered into that
field for each record must be different for every record.
• The data in each table is dependent on the primary key of that table:
  i. ArtistDetails table contains details about each artist
  ii. VenueDetails table contains details about each venue
  iii. ArtistsBookings table contains details about each concert booking for each artist.
• An additional field, Concert ID, has been added to this table to create a primary key.
• The fields that are in each of these tables are directly related to the primary key of that table.
Relationships
We have taken our database through the three different stages of normalization and can now create links between
the tables to create a relational database.
Example 2
1st Normal Form (1NF)
• In this Normal Form, we tackle the problem of atomicity. Here
atomicity means values in the table should not be further divided. In
simple terms, a single cell cannot hold multiple values. If a table
contains a composite or multi-valued attribute, it violates the First
Normal Form.
• In the above table, we can clearly see that the Phone Number column has
two values.
• Thus it violated the 1st NF. Now if we apply the 1st NF to the above table
we get the below table as the result.

By this, we have achieved atomicity, and each and every column has unique values.
2nd Normal Form (2NF)
• The first condition in the 2nd NF is that the table has to be in 1st NF. The table also should not
contain partial dependency.
• Here partial dependency means the proper subset of candidate key determines a non-prime
attribute.
• To understand in a better way lets look at the below example.
• Consider the table

This table has a composite primary key (Employee ID, Department ID). The non-key attribute is Office Location.
In this case, Office Location only depends on Department ID, which is only part of the primary key.
Therefore, this table does not satisfy the Second Normal Form.
• To bring this table to Second Normal Form, we need to break the table into
two parts. Which will give us the below tables:

As you can see we have removed the partial functional dependency that we initially had. Now,
in the table, the column Office Location is fully dependent on the primary key
of that table, which is Department ID.
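A sketch of the 2NF decomposition described above, written as sqlite3 table definitions (column data types are assumptions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Office Location depends only on DepartmentId, so it moves to its own table.
conn.execute("""
    CREATE TABLE Department (
        DepartmentId   INTEGER PRIMARY KEY,
        OfficeLocation TEXT
    )
""")
# The remaining table keeps just the composite key relating employees to
# departments; every non-key fact about a department now lives in Department.
conn.execute("""
    CREATE TABLE EmployeeDepartment (
        EmployeeId   INTEGER,
        DepartmentId INTEGER REFERENCES Department(DepartmentId),
        PRIMARY KEY (EmployeeId, DepartmentId)
    )
""")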
Example 3

0NF
1NF
2NF
3NF
23. Datasheet view
Refers to the row-wise and column-wise viewing of data in a table in database applications such as Access. The
information pertaining to individual records is provided in individual rows and the attributes related to each record
are given in the corresponding columns.

24. Design View


Most Access objects are displayed in Design view, which allows you to work with the underlying structure of your
tables, queries, forms, and reports. To create a new table in Design view, you define the fields that will comprise the
table before you enter any data. In Design view for tables, each row corresponds to a field. You can edit, insert, and
delete fields in your database tables in Design view. You insert a field by adding a row, while you delete a field by
removing a row. You can also change field order by dragging a row selector to a new position.
25. Query Design
When you query, the results are presented to you in a table, but when you design one you use a different view. This
is called Query Design view, and it lets you see how your query is put together

26. Forms
A database form shows all or selected fields for one record. Forms show field names and data in an attractive and
easy-to-read format.
27. Filters
A filter displays records in a database according to criteria you select
28. Reports
A report presents data in an attractive format and is especially suitable for printing. Reports can display data from
tables or queries. All or selected fields can be included in a report. Data can be grouped or sorted and arranged in a
variety of ways
29. Queries:
A query finds records in a database according to criteria you specify.

Query by example (QBE)

allows the user to create queries based on a template, usually a set of filters presented in a graphical form. If you are
using database software it might have an option to connect blocks and set the filters you want. The system presents a
blank record and lets you specify the fields and values that define the query. Database management software like
MySQL, Microsoft Access and Oracle have front-end graphical interfaces which make it easier to run QBE

queries.

Below are some examples of searches (queries) and how they would be created using query-by-example.
Example 1
Field:    | Material   | Colour   | Price      | Product ID | Description
Table:    | SOFA       | SOFA     | SOFA       | SOFA       | SOFA
Sort:     |            |          | Descending |            |
Show:     | No         | No       | Yes        | Yes        | Yes
Criteria: | ='Leather' | ='Brown' |            |            |
or:       |            |          |            |            |

A search to select the price, product ID and description for all brown leather sofas, in descending order of price
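For comparison, the same search expressed in SQL and run through Python's sqlite3 module (the SOFA table and sample rows below are assumptions made so the example is self-contained):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SOFA (
        ProductID   TEXT PRIMARY KEY,
        Description TEXT,
        Material    TEXT,
        Colour      TEXT,
        Price       REAL
    )
""")
conn.executemany("INSERT INTO SOFA VALUES (?, ?, ?, ?, ?)", [
    ("S01", "Two-seater",  "Leather", "Brown", 499.0),
    ("S02", "Corner sofa", "Fabric",  "Grey",  799.0),
    ("S03", "Recliner",    "Leather", "Brown", 650.0),
])

# Equivalent of the query-by-example grid in Example 1: show Price, ProductID
# and Description for brown leather sofas, in descending order of price.
query = """
    SELECT Price, ProductID, Description
    FROM SOFA
    WHERE Material = 'Leather' AND Colour = 'Brown'
    ORDER BY Price DESC
"""
for row in conn.execute(query):
    print(row)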
Example 2

Field:    | Name      | History | Maths
Table:    | MARKS     | MARKS   | MARKS
Sort:     | Ascending |         |
Show:     | Yes       | No      | No
Criteria: |           | >50     |
or:       |           |         | >50

A search to select the name of all the students, in ascending order, with more than 50 marks in History or more than
50 marks in Maths.

Example 3

Field:    | Name  | English | Maths
Table:    | MARKS | MARKS   | MARKS
Sort:     |       |         |
Show:     | Yes   | Yes     | Yes
Criteria: |       | <30     | <30
or:       |       |         |

A search to select the names and marks for all the students with less than 30 marks in both English and Maths.
Example 4

Field:    | Registration | Make    | Model | Price | Sold
Table:    | CAR          | CAR     | CAR   | CAR   | CAR
Sort:     |              |         |       |       |
Show:     | Yes          | No      | Yes   | Yes   | No
Criteria: |              | ='Ford' |       |       | False
or:       |              |         |       |       |

A search to select the registration, model and price of all the Ford cars that haven’t been sold yet.

Example 5

Field:    | Registration | Make      | Model | Price | Sold
Table:    | CAR          | CAR       | CAR   | CAR   | CAR
Sort:     |              |           |       |       |
Show:     | Yes          | Yes       | Yes   | Yes   | No
Criteria: |              | Like 'F*' |       |       |
or:       |              |           |       |       |

A search to select the registration, make, model and price of all the cars where the make begins with the letter F.
Example 6

Field:    | DOB         | FName   | SName
Table:    | STUDENT     | STUDENT | STUDENT
Sort:     |             |         |
Show:     | No          | Yes     | Yes
Criteria: | <01/01/2000 |         |
or:       |             |         |

A search to select the first name and surname of all students born before the year 2000.

Example 7

Field:    | DOB                               | FName   | SName
Table:    | STUDENT                           | STUDENT | STUDENT
Sort:     |                                   |         |
Show:     | No                                | Yes     | Yes
Criteria: | Between 01/01/2000 and 31/01/2000 |         |
or:       |                                   |         |

A search to select the first name and surname of all students born between 1st and 31st January 2000.
Example 8
A database table, BIKETYRES, is used to keep a record of tyres for sale in a cycle shop. Tyres are categorised by
width and diameter in millimetres, whether they have an inner tube and the type of terrain for which they are
designed. The query-by-example grid below displays the tyre code and the stock level of all 28 mm width tyres
suitable for mixed terrain.

Field:    | Tyre Code | Stock Level | Width     | Terrain
Table:    | BIKETYRES | BIKETYRES   | BIKETYRES | BIKETYRES
Sort:     |           |             |           |
Show:     | ✓         | ✓           |           |
Criteria: |           |             | = 28      | = 'Mixed'
or:       |           |             |           |

Alter the query to show the tyre code and stock level in ascending order of stock level for all 24 mm asphalt terrain
tyres. Write the new query in the following query-by-example grid. [4]

Field:    |           |             |           |
Table:    |           |             |           |
Sort:     |           |             |           |
Show:     |           |             |           |
Criteria: |           |             |           |
or:       |           |             |           |
Answer?
Applying Test Data

It is important to test algorithms to check how they perform under a range of conditions.

This includes testing any validation you have created to ensure it performs as expected.

When creating a testing plan, the test data that you use shouldn’t be random values, but rather values that fulfil the
following test criteria.

1. Normal data: Normal data is test data that is typical (expected) and should be accepted by the system.
2. Extreme data or Boundary data: Extreme data is test data at the upper or lower limits of expectations that should
be accepted by the system. A pair of values at each end of a range:
- The data at the upper or lower limits of expectations that should be accepted
- The immediate values before or beyond the limits of expectations that should be rejected
3. Abnormal data (erroneous data): Abnormal data is test data that falls outside of what is acceptable and should be
rejected by the system.
Example: A system has validation to ensure that only integers between 1 and 10 are entered as an input. The test data
for this could be:
• Normal data: from 2 to 9 although 1 and 10 can be included – however, see below
• Boundary data / Extreme: 1 and 10 (to be accepted); 0, 11 (to be rejected)
• Abnormal data (erroneous data): 'Thirteen', 5.7, 14 – values of the wrong type or outside the range
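A minimal sketch of applying these three categories of test data to the 1-to-10 example above (the accept() function is an assumed implementation of the validation rule):

def accept(value):
    """Return True if value is an integer between 1 and 10 inclusive."""
    return isinstance(value, int) and 1 <= value <= 10

# Normal data: typical values that should be accepted.
assert accept(2) and accept(9)
# Boundary / extreme data: the limits are accepted, values just outside are rejected.
assert accept(1) and accept(10)
assert not accept(0) and not accept(11)
# Abnormal (erroneous) data: wrong type or outside the range, rejected.
assert not accept("Thirteen") and not accept(5.7) and not accept(14)
print("All test cases behaved as expected")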
Validation and Verification
Verification
Verification Method | Description
Double entry        | Data is entered twice and the computer checks that the two copies match
Visual check        | The user manually reads and compares the newly inputted data against the original source to ensure they match

Verification is performed to ensure that the data entered exactly matches the original source.
Verification is a way of preventing errors when data is copied from one medium to another. Verification does not
check if data makes sense or is within acceptable boundaries, it only checks that the data entered is identical to the
original source. Once we know our data is valid (i.e. it is logical, and in the right format, etc.) then we also need to
check if it is correct (it may be a valid date of birth, but is it your date of birth?!)

Validation
Validation is an automatic computer check to ensure that the data entered is sensible, feasible and reasonable.
Validation cannot ensure data is accurate.
When programming, it is important that you include validation for data inputs. This stops unexpected or abnormal
data from crashing your program and prevents you from receiving impossible garbage outputs.
There are several validation methods that can be used to check the input data.
Range Check – this is generally used when working with data which contains numbers, currency, or date and time
values. A range check lets you set appropriate limits:
Boundary    | Description                                                              | Validation
Upper limit | The maximum price of any item in a shop is $10.                          | <=10
Lower limit | In a shop all items have a corresponding cost.                           | >=0
A range     | Number of hours worked must be less than or equal to 8 but more than 0. | >0 AND <=8
Type Check – this is a way to confirm that the correct data type is inputted.
• For example, in an application form age may range from 0 to 100. A number data type would be an
appropriate choice for this data. By defining the data type as number, only numbers are allowed in the
field (e.g. 18, 20, 25) and it would prevent people from inputting verbal data, like ‘eighteen’.
• Some data types are capable of doing an extra type check. For example, a date data type will ensure
that a date inputted existed at some point in the past, or will exist in the future. It would not, for
example, accept the date 30/02/2018.
Check Digit – this is used to find out if a series of numbers has been keyed correctly. There are many ways to
produce check digits.
• For example, the ISBN-10 numbering system for books uses ‘Modulo-11’ division, where it outputs
the remainder of the division as the result of the operation.
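A sketch of the ISBN-10 modulo-11 check digit calculation described above (the standard weighting of 10 down to 2, with 'X' representing a remainder of 10, is assumed):

def isbn10_check_digit(first_nine):
    """Compute the ISBN-10 check digit for a string of 9 digits."""
    # Weight the digits 10, 9, ..., 2 and sum the products.
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    # The check digit is chosen so that the full weighted sum is divisible by 11.
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)

print(isbn10_check_digit("030640615"))   # prints 2 -> full ISBN 0-306-40615-2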
Length Check – this is used to make sure that the correct number of characters are entered into the field. It
confirms that the character string entered is neither too short nor too long.
• For example, consider a password that needs to be 8 characters long. The length check will ensure
that exactly 8 characters are entered into the field.
Lookup – this helps to lessen errors in a field with a limited list of values.
• For example, the fact that there are only 12 possible months in a year ensures that the list of possible
values is limited.
• Advantages of a lookup list are as follows:
o Faster data entry—because it is typically much faster to select from a list than to type each individual
entry.
o Enhanced accuracy—because it lessens the risk of spelling mistakes.
o Greater ease of use—because it limits the options to choose from by only displaying the essential
choices.
Format Check – this checks that the input data is in the right format.
• For example, a National Insurance number is in the form XX 99 99 99 XX where X is any letter and
9 is any number.
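A sketch of a format check for the XX 99 99 99 XX layout given above, using a regular expression (the pattern follows the layout exactly as stated in the text and is not the full set of official National Insurance rules):

import re

# Pattern for the XX 99 99 99 XX layout given in the text:
# two letters, then three pairs of digits, then two letters, space-separated.
NI_PATTERN = re.compile(r"[A-Z]{2} \d{2} \d{2} \d{2} [A-Z]{2}")

def format_check(value):
    """Return True only if the whole string matches the required layout."""
    return NI_PATTERN.fullmatch(value) is not None

print(format_check("AB 12 34 56 CD"))   # True  - matches the layout
print(format_check("AB123456CD"))       # False - missing the spaces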
Presence Check – this kind of check makes sure that an essential or required field cannot be left blank: it must be
filled in.
• If someone attempts to leave the field blank, then an error message will be displayed, and they won’t be
able to proceed to the next step, nor will they be able to save any other data which they have entered.
• Database fields should have validation rules to make sure the data entered follows the expected format.
Validation is an automatic check to ensure that the data entered is sensible and feasible; it cannot ensure that the
data is actually accurate.
8.2 Database Management System (DBMS)
▪ Show understanding of the features provided by a Database Management System (DBMS)
- A schema is a blueprint of the database which specifies what fields will be present and what would be their
types. For example, an employee table will have an employee_ID column represented by a string of 10 digits and
an employee_Name column with a string of 45 characters.
- Data model is a high-level design which decides what can be present in the schema. It provides a database user
with a conceptual framework in which we specify the database requirements of the database user and the structure
of the database to fulfil these requirements.
- A data model can, for example, be a relational model where the data will be organised in tables whereas the
schema for this model would be the set of attributes and their corresponding domains.
▪ data modelling - Why use Data Model?
The primary goals of using a data model are:
• Ensures that all data objects required by the database are accurately represented. Omission of data will lead to
creation of faulty reports and produce incorrect results.
• A data model helps design the database at the conceptual, physical and logical levels.
• Data Model structure helps to define the relational tables, primary and foreign keys and stored procedures.
• It provides a clear picture of the base data and can be used by database developers to create a physical database.
• It is also helpful to identify missing and redundant data.
• Though the initial creation of a data model is labour-intensive and time-consuming, in the long run it makes
upgrading and maintaining your IT infrastructure cheaper and faster.
Types of Data Models in DBMS

Types of Data Models:


There are mainly three different types of data models: conceptual data models, logical data models, and physical data
models, and each one has a specific purpose. The data models are used to represent the data and how it is stored in
the database and to set the relationship between data items.

1. Conceptual Data Model:


This Data Model defines WHAT the system contains. This model is typically created by business stakeholders
and data architects. The purpose is to organize, scope and define business concepts and rules. The conceptual level
has a conceptual schema, which describes the structure of the whole database for a community of users. The
conceptual schema hides the details of physical storage structures and concentrates on describing entities, data
types, relationships, user operations, and constraints. A high-level data model or an implementation data model
can be used at this level. A Conceptual Data Model is an organized view of database concepts and their
relationships. The purpose of creating a conceptual data model is to establish entities, their attributes, and
relationships. At this data modelling level, there is hardly any detail available about the actual database structure.
The three basic elements of a Conceptual Data Model are
1. Entity: A real-world thing
2. Attribute: Characteristics or properties of an entity
3. Relationship: Dependency or association between two entities
Data model example:
• Customer and Product are two entities. Customer number and name are attributes of the Customer entity
• Product name and price are attributes of product entity
• Sale is the relationship between the customer and product

Conceptual Data Model


Characteristics of a conceptual data model
• Offers organisation-wide coverage of the business concepts.
• This type of data model is designed and developed for a business audience.
• The conceptual model is developed independently of hardware specifications like data storage capacity or
location, and of software specifications like the DBMS vendor and technology. The focus is to represent data as
a user will see it in the "real world."
Conceptual data models, also known as domain models, create a common vocabulary for all stakeholders by
establishing basic concepts and scope.
2. Logical Data Model
Logical Data Model: defines HOW the system should be implemented regardless of the DBMS. This model is
typically created by data architects and business analysts. The purpose is to develop a technical map of rules
and data structures. The Logical Data Model is used to define the structure of data elements and to set
relationships between them. The logical data model adds further information to the conceptual data model
elements. The advantage of using a logical data model is that it provides a foundation to form the base for the
physical model. However, the modelling structure remains generic.

At this Data Modelling level, no primary or secondary key is defined. At this Data modelling level, you need to
verify and adjust the connector details that were set earlier for relationships.

Characteristics of a Logical data model


• Describes data needs for a single project but could integrate with other logical data models based on the
scope of the project.
• Designed and developed independently from the DBMS.
• Data attributes will have datatypes with exact precisions and length.
• Normalization is typically applied to the model up to 3NF.
3. Physical Data Model
This Data Model describes HOW the system will be implemented using a specific DBMS system. This model is
typically created by DBA and developers. The purpose is actual implementation of the database.
A Physical Data Model describes a database-specific implementation of the data model. It offers database abstraction
and helps generate the schema. This is because of the richness of meta-data offered by a Physical Data Model. The
physical data model also helps in visualizing database structure by replicating database column keys, constraints,
indexes, triggers, and other RDBMS features.

Characteristics of a physical data model:


• The physical data model describes data needs for a single project or application, though it may be integrated
with other physical data models based on project scope.
• The data model contains relationships between tables that address the cardinality and nullability of the
relationships.
• Developed for a specific version of a DBMS, location, data storage or technology to be used in the project.
• Columns should have exact datatypes, lengths assigned and default values.
• Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are defined.
Advantages of Data model:
• The main goal of a designing data model is to make certain that data objects offered by the functional team
are represented accurately.
• The data model should be detailed enough to be used for building the physical database.
• The information in the data model can be used for defining the relationship between tables, primary and
foreign keys, and stored procedures.
• A data model helps the business communicate within and across organizations.
• A data model helps to document data mappings in the ETL process.
• It helps to recognize correct sources of data to populate the model.

Disadvantages of Data model:

• To develop a data model, one should know the characteristics of how the data is physically stored.
• A navigational data model makes application development and management complex and requires detailed
knowledge of how the data is organised.
• Even a small change made to the structure may require modification of the entire application.
• There is no set data manipulation language in a DBMS.
▪ Logical Schema
The design of the database is called a schema.
A database schema is a set of rules that define the architecture of our database and data collection needs. Each
company will have its own database needs and will collect different information based on its business goals.
For instance, a store may need to collect transaction information while a free programming education site may
only need to collect user information and settings.
There are mainly three levels of data abstraction:
1. Internal level: has an internal schema, which describes the physical storage structure of the database and its
access paths. The internal schema uses a physical data model and describes the complete details of data storage
and access paths for the database.
2. External or view level: includes a number of external schemas or user views. Each external schema describes
the part of the database that a particular user group is interested in and hides the rest of the database from that
user group. A high-level data model or an implementation data model can be used at this level.
3. Conceptual or logical level: describes the structure and constraints for the entire database. The logical schema
defines the design of the database at the conceptual level of data abstraction. At this level, we define the entities,
attributes, constraints, relationships, etc., and how their relationships will be logically implemented. The
programmers and the DBA work at this level and carry out these implementations.
▪ data security, including backup procedures and the use of access rights to individuals / groups of users
Data security is a set of processes and practices designed to protect your critical information technology (IT)
ecosystem. This includes files, databases, accounts, and networks. Effective data security adopts a set of controls,
applications, and techniques that identify the importance of various datasets and apply the most appropriate
security controls.

Effective data security considers the sensitivity of various datasets and corresponding regulatory compliance
requirements. Like other cybersecurity postures — perimeter and file security to name a few — data security isn’t
the end-all-be-all for keeping hackers at bay. Rather, data security is one of many critical methods for evaluating
threats and reducing the risk associated with data storage and handling.

▪ Why is Data Security Important?


Data security is critical to public and private sector organizations for a variety of reasons. First, there’s the legal
and moral obligation that companies have to protect their user and customer data from falling into the wrong
hands. Financial firms, for example, may be subject to the Payment Card Industry Data Security Standard (PCI
DSS) that forces companies to take all reasonable measures to protect user data.

Then there’s the reputational risk of a data breach or hack. If you don’t take data security seriously, your
reputation can be permanently damaged in the event of a publicized, high-profile breach or hack. Not to mention
the financial and logistical consequences if a data breach occurs. You’ll need to spend time and money to assess
and repair the damage, as well as determine which business processes failed and what needs to be improved.
Types of Data Security

• Access Controls: This type of data security measure includes limiting both physical and digital access to critical
systems and data. This includes making sure all computers and devices are protected with mandatory login entry,
and that physical spaces can only be entered by authorized personnel (a SQL sketch of granting access rights to
users and groups appears after this list).
• Authentication: Similar to access controls, authentication refers specifically to accurately identifying users before
they have access to data. This usually includes things like passwords, PIN numbers, security tokens, swipe cards,
or biometrics.
• Backups & Recovery: Good data security means you have a plan to securely access data in the event of system
failure, disaster, data corruption, or breach. You’ll need a backup copy of the data, stored on a separate medium
such as a physical disk, local network, or the cloud, so it can be recovered if needed.
• Data Erasure: You’ll want to dispose of data properly and on a regular basis. Data erasure employs software to
completely overwrite data on any storage device and is more secure than standard data wiping. Data erasure
verifies that the data is unrecoverable and therefore won’t fall into the wrong hands.
• Data Masking: By using data masking software, information is hidden by obscuring letters and numbers with
proxy characters. This effectively masks key information even if an unauthorized party gains access to it. The data
changes back to its original form only when an authorized user receives it.
• Data Resiliency: Comprehensive data security means that your systems can endure or recover from failures.
Building resiliency into your hardware and software means that events like power outages or natural disasters
won’t compromise security.
• Encryption: A computer algorithm transforms text characters into an unreadable format via encryption keys. Only
authorized users with the proper corresponding keys can unlock and access the information. Everything from files
and a database to email communications can — and should — be encrypted to some extent.
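As a sketch of how access rights can be granted to individuals or groups of users in SQL (assuming a DBMS that
supports roles and the GRANT/REVOKE statements, such as PostgreSQL or Oracle; the role, user and table names
here are illustrative, not from this document):

-- Create a role (group) and give it read-only access to one table.
CREATE ROLE sales_readonly;
GRANT SELECT ON customers TO sales_readonly;

-- Make an existing individual user a member of the group.
GRANT sales_readonly TO alice;

-- A more privileged user may read, insert and update, but not delete.
GRANT SELECT, INSERT, UPDATE ON customers TO bob;

-- Access can be withdrawn again when it is no longer needed.
REVOKE UPDATE ON customers FROM bob;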
Main Elements of Data Security
There are three core elements to data security that all organizations should adhere to: Confidentiality, Integrity,
and Availability. These concepts are also referred to as the CIA Triad, functioning as a security model and
framework for top-notch data security. Here’s what each core element means in terms of keeping your sensitive
data protected from unauthorized access and data exfiltration.
• Confidentiality. Ensures that data is accessed only by authorized users with the proper credentials.
• Integrity. Ensures that all data stored is reliable, accurate, and not subject to unwarranted changes.
• Availability. Ensures that data is readily — and safely — accessible and available for ongoing business needs.
Data Security Technologies
Using the right data security technologies can help your organization prevent breaches, reduce risk, and sustain
protective security measures.
• Data Auditing: Security breaches are often inevitable, so you’ll need to have a process in place that gets to the
root cause. Data auditing software solutions capture and report on things like control changes to data,
records of who accessed sensitive information, and the file path utilized. These audit procedures are all vital
to the breach investigation process. Proper data auditing solutions also provide IT administrators with
visibility in preventing unauthorized changes and potential breaches.
• Data Real-Time Alerts: Typically, it takes companies several months before they discover that a data breach
has actually taken place. All too often, companies discover breaches via their customers or third-party
vendors and contractors rather than their own IT departments. By using real-time systems and data
monitoring technology, you’ll be able to discover breaches more quickly. This helps you mitigate data
destruction, loss, alteration, or unauthorized access to personal data.
• Data Risk Assessment: A data risk assessment will help your organization identify its most overexposed,
sensitive data. A complete risk assessment will also offer reliable and repeatable steps towards prioritizing and
remediating serious security risks. The process begins by identifying sensitive data that’s accessed via global
groups, data that’s become stale, or data with inconsistent permissions. An accurate risk assessment will
summarize important findings, expose vulnerabilities, and include prioritized remediation recommendations.
• Data Minimization: Traditionally, organizations viewed having as much data as possible as a benefit. There was
always the potential that it might come in handy in the future. Today, large amounts of data are seen as a
liability from a security standpoint. The more data you have, the greater the number of targets for hackers.
That’s why data minimization is now a key security tactic. Never hold more data than necessary and follow
all data minimization best practices.
• Purge Stale Data: If data doesn’t exist within your network, it can’t be compromised. That’s why you’ll want
to purge old or unnecessary data. Use systems that can track file access and automatically archive unused
files. In the modern age of yearly acquisitions, reorganizations, and “synergistic relocations,” it’s quite likely
that networks of any significant size have multiple forgotten servers that are kept around for no good reason.

Best Practices for Ensuring Data Security


There is no silver bullet that will guarantee 100 percent security of your data. However, there are several steps,
tactics, and best practices that can help minimize the chances of a data breach, loss, and exposure.
• Quarantine Sensitive Files: One common data management mistake is placing sensitive files on a shared or
open drive accessible to the entire company. You’ll want to eliminate this practice, placing sensitive data into
safely quarantined areas. Gain control of your data by using data security software that continually classifies
sensitive data and moves it to a secure location.
• Behaviour-Based Permissions: Overly permissive behaviour is another common misstep, where more people
have access to data than is necessary. A convoluted web of temporary access and permissions quickly arises,
with individuals having access to data that they shouldn’t. Limit over-permissioning by using software that
profiles user behaviour and automatically places appropriate behaviour-based permissions via an entitlement
review.
• Prepare for Cyber Threats: Good data security is all about thinking ahead. You’ll want to have a solid
cybersecurity policy that encompasses current and potential future threats to your data. This includes both
external hackers and insider threats. Aside from your policy, employ software that provides real-time
monitoring and alerts of suspicious activities.
• Delete Unused Data: Storing stale data for longer than necessary presents a significant liability in terms of
data security. You’ll want to have processes and technologies in place to eliminate sensitive data that’s no
longer necessary for ongoing business activities. The last thing you want is a mountain of data that you’re
unaware of as a sitting duck for hackers.
Capabilities and Solutions
Aside from the right technologies and cyber hygiene best practices, your company should also have the following
business process capabilities and solutions to ensure ongoing data security:
• Knowing Where Data Lives: It’s critical to know where all of your data resides at any given time. This
includes data you’re currently using as well as data that should be deleted or retired. Make sure you have
both technologies and processes in place that will give you visibility into your data at all times.
• Tracking User Access: One of the biggest dangers to data security is internal personnel gaining access to data
that they shouldn’t. Therefore, you’ll need to track user access to ensure only the right people are accessing
the most sensitive data.
• Blocking High-Risk Activities: Not all data handling actions are created equal. Individuals can engage in high-
risk activities and data movements, such as sending sensitive information in a non-encrypted format via
email. You want to have systems and software in place that block all high-risk activities.

▪ Show understanding of how software tools found within a DBMS are used in practice
The use and purpose of a developer interface

A developer interface allows developers to enter and run SQL (DDL and DML) commands directly and to write and
test program code that accesses the database, rather than working only through end-user forms or
query-by-example tools.
The use and purpose of a query processor

The query processor takes a query submitted by a user or application, checks and translates it, and produces an
efficient execution plan for the DBMS to run.
• Typically, a query processor consists of four sub-components; each of them corresponds to a different stage
in the lifecycle of a query.
• The sub-components are the query parser, the query rewriter, the query optimizer and the query executor
▪ 8.3 Data Definition Language (DDL) and Data Manipulation Language (DML)

Database Queries
• Databases allow us to store and filter data to find specific information. A database can be queried using a
variety of methods, although this depends on the software you are using
• A major benefit of storing information in a database is the ability to perform queries.
• A query is the tool that allows us to ask the database a question and get back any matching records (a search).
• We create queries by choosing at least one set of criteria upon which we wish to search. Complex multiple
criteria searches are possible and the results can be sorted in ascending or descending order.
• Queries are performed using a special language called SQL; however, most database management systems also
provide an easier visual method of creating a query.
• These visual tools are sometimes referred to as query-by-example (QBE).
Query language
• Query language is a written language used only to write specific queries. This is a powerful tool as the user can
define precisely what is required in a database. SQL is a popular query language used with many databases.
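For example, a multi-criteria search with sorted results might be written as follows (the customers table and its
columns are the ones used in the SQL examples later in this section):

-- Find sales from one store worth more than 1000, largest first (descending order).
SELECT customer_id, sale_date, sale_amount
FROM customers
WHERE store_state = 'MH'
  AND sale_amount > 1000
ORDER BY sale_amount DESC;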
DDL (Data Definition Language):
consists of the SQL commands that can be used to define the database schema. It simply deals with descriptions
of the database schema and is used to create and modify the structure of database objects in the database. DDL is
a set of SQL commands used to create, modify, and delete database structures but not data. These commands are
normally not used by a general user, who should be accessing the database via an application.
List of DDL commands:
• CREATE: This command is used to create the database or its objects (like tables, indexes, functions, views,
stored procedures, and triggers).
• DROP: This command is used to delete objects from the database.

• ALTER: This is used to alter the structure of the database.

• TRUNCATE: This is used to remove all records from a table, including all space allocated for the records.

• COMMENT: This is used to add comments to the data dictionary.

• RENAME: This is used to rename an object existing in the database.

DML (Data Manipulation Language):


The SQL commands that deal with the manipulation of data present in the database belong to DML, or Data
Manipulation Language, and this includes most SQL statements. DML controls access to the data held in the
database, and DCL statements are often grouped together with DML statements.
List of DML commands:
• INSERT: It is used to insert data into a table.

• UPDATE: It is used to update existing data within a table.

• DELETE: It is used to delete records from a database table.

• LOCK: Used to lock a table in order to control concurrency.

• CALL: Call a PL/SQL or JAVA subprogram.

• EXPLAIN PLAN: It describes the access path to data.


Introduction to SQL DML Commands
Data Manipulation Language (DML) commands in SQL deal with the manipulation of the data records stored
within database tables. They do not deal with changes to database objects or their structure. The commonly
known DML commands are INSERT, UPDATE and DELETE. Loosely speaking, we can even consider the SELECT
statement as part of the DML commands, although strictly it forms part of the Data Query Language (DQL).
We will be learning about all of the above mentioned DML commands in detail in the subsequent sections, but
let us first have a look at this summary table for a brief overview of each of them.

Commands of DML

Command    Description
SELECT     Used to query or fetch selected fields or columns from a database table
INSERT     Used to insert new data records or rows in the database table
UPDATE     Used to set the value of a field or column for a particular record to a new value
DELETE     Used to remove one or more rows from the database table
Now let us try to understand each of the above mentioned DML commands in detail one
by one.

1. SELECT
SELECT command or statement in SQL is used to fetch data records from the database
table and present it in the form of a result set. It is usually considered as a DQL command
but it can also be considered as DML.

The basic syntax for writing a SELECT query in SQL is as follows:

SELECT column_name1, column_name2, …
FROM table_name
WHERE condition_expression;

The parameters used in the above syntax are as follows:

column_name1, column_name2, … : the columns which have to be fetched or selected for the final result set.
table_name: the name of the database table from which these results have to be fetched.
condition_expression: the condition expression for filtering records for the final result set.

Here are a few examples to illustrate the use of the SELECT command.

SELECT customer_id,
       sale_date,
       order_id,
       store_state
FROM customers;

In this example, we have fetched the customer_id, sale_date, order_id and store_state fields from the customers
table. Next, suppose we want to fetch all the records from the customers table. This can be achieved by a simple
query as shown below.

SELECT * FROM customers;

2. INSERT
INSERT commands in SQL are used to insert data records or rows in a database table. In
an INSERT statement, we specify both the column_names for which the entry has to be
made along with the data value that has to be inserted.

The basic syntax for writing INSERT statements in SQL is as follows:

INSERT INTO table_name (column_name_1, column_name_2, column_name_3, ...)
VALUES (value1, value2, value3, ...);

By VALUES, we mean the value of the corresponding columns.

Here are a few examples to further illustrate the INSERT statement.

INSERT INTO public.customers
    (customer_id, sale_date, sale_amount, salesperson, store_state, order_id)
VALUES (1005, '2019-12-12', 4200, 'R K Rakesh', 'MH', '1007');
Here we have tried to insert a new row in the Customers table using the INSERT
command. The query accepts two sets of arguments, namely field names or column
names and their corresponding values.

Suppose we have to insert values into all the fields of the database table; then we need not specify the column
names, unlike the previous query. The following query illustrates this.

INSERT INTO customers
-- A value must be supplied for every column, in the order the columns were defined
-- (the salesperson value shown here is illustrative).
VALUES ('1006', '2020-03-04', 3200, 'A N Other', 'DL', '1008');

In this example, we have inserted a complete row without having to specify the field names.

3. UPDATE
UPDATE command or statement is used to modify the value of an existing column in a
database table.

The syntax for writing an UPDATE statement is as follows :

UPDATE table_name
SET column_name_1 = value1, column_name_2 = value2, ...
WHERE condition;
Having learnt the syntax, let us now try an example based on the UPDATE statement in
SQL.

UPDATE customers
SET store_state = 'DL'
WHERE store_state = 'NY';

In this example, we have modified the value of store_state for a record where store_state
was ‘NY’ and set it to a new value ‘DL’.
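Note that it is the WHERE clause that limits the change to matching rows; if it is omitted, every row in the table is
updated. A quick sketch of the difference:

-- Changes only the rows whose store_state is currently 'NY'.
UPDATE customers
SET store_state = 'DL'
WHERE store_state = 'NY';

-- With no WHERE clause, every row in the table is changed.
UPDATE customers
SET store_state = 'DL';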

4. DELETE
The DELETE statement in SQL is used to remove one or more rows from a database table. Until the change is
committed, the removal is not permanent: a rollback operation can be performed to undo a DELETE. With
DELETE statements we can use the WHERE clause to filter specific rows.
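A minimal sketch of undoing a DELETE with a rollback, assuming the DBMS supports explicit transactions
(START TRANSACTION is MySQL/standard-style syntax; other systems use BEGIN TRANSACTION):

START TRANSACTION;

DELETE FROM customers
WHERE store_state = 'MH';

-- The deleted rows can still be restored because nothing has been committed yet.
ROLLBACK;

-- To keep the change instead, COMMIT would be issued in place of ROLLBACK.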

The syntax for writing a DELETE statement is as follows:

DELETE FROM table_name WHERE condition;

Having learnt the syntax, we are all set to try an example based on the DELETE command in SQL.
DELETE FROM customers
WHERE store_state = 'MH'
AND customer_id = '1001';

In this example, we have removed a row from the customers table where store_state was ‘MH’ and customer_id
was ‘1001’.

Conclusion
DML commands are used to modify or manipulate the data records present in database tables. The basic DML
operations are data insertion (INSERT), data update (UPDATE), data removal (DELETE) and data querying
(SELECT).
DDL (Data Definition Language) Command in SQL
DDL, or Data Definition Language, deals with the definition or description of the database structure or schema; it
does not change the data inside the database. DDL commands create, modify, and delete database structures, but
not the data. These commands are not normally used by general users, who access the database via an
application.

CREATE Command in SQL

The CREATE command creates a database or one of its objects (i.e. a table, index, view, function, etc.).

Syntax

CREATE DATABASE databasename;

Example

CREATE DATABASE Student_data;

Syntax

CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
    ....
);

Example

CREATE TABLE Student (
    StudentId int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    Mark int
);

DROP Command in SQL

The DROP command deletes an object from the database (i.e. a table, index, view, function, etc.).

Syntax

DROP object object_name;

Example

DROP TABLE Student;

Syntax

DROP DATABASE database_name

Example

DROP DATABASE Student_data;


ALTER Command in SQL

The ALTER command changes the structure of an existing database object, for example by adding, dropping or
modifying columns of a table.

Syntax

ALTER TABLE table_name
ADD column_name datatype;

Example

ALTER TABLE Student
ADD Total int;

Syntax

ALTER TABLE table_name
DROP COLUMN column_name;

Example

ALTER TABLE Student
DROP COLUMN Mark;

1) SQL Server / MS Access

Syntax

ALTER TABLE table_name
ALTER COLUMN column_name datatype;

Example

ALTER TABLE Student
ALTER COLUMN Total varchar(255);

2) MySQL / Oracle (prior to version 10g)

Syntax

ALTER TABLE table_name
MODIFY COLUMN column_name datatype;

Example

ALTER TABLE Student
MODIFY COLUMN Total varchar(255);

3) Oracle 10G and later

Syntax

ALTER TABLE table_name
MODIFY column_name datatype;

Example
ALTER TABLE Student
MODIFY Total varchar(255);
TRUNCATE Command in SQL

SQL Truncate command helps to remove all records from a table

Syntax

TRUNCATE TABLE table_name

Example

TRUNCATE TABLE Student;
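To contrast this with DELETE: TRUNCATE removes every row in one operation (and in many systems cannot be
filtered or rolled back in the same way), whereas DELETE can target specific rows with a WHERE clause. A short
comparison using the Student table:

-- Removes every row and releases the space used by the table's data.
TRUNCATE TABLE Student;

-- Removes only the rows that match the condition; the table structure is untouched.
DELETE FROM Student WHERE Mark < 40;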

COMMENT Command in SQL

The COMMENT command is used to add comments to the data dictionary. A double hyphen (--) is used to add an
inline comment to the SQL text itself.

Syntax

-- (notes, examples)

Example

-- select the student data
SELECT * FROM Student;
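Where the DDL COMMENT statement itself is supported (for example in Oracle and PostgreSQL), descriptions are
attached to objects in the data dictionary rather than to the SQL text; a short sketch:

-- Attach a description to a table in the data dictionary.
COMMENT ON TABLE Student IS 'Holds one row per enrolled student';

-- Attach a description to a single column.
COMMENT ON COLUMN Student.Mark IS 'Final examination mark';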

RENAME Command in SQL

The RENAME command is used to rename an object existing in the database.

1) PostgreSQL

Syntax

ALTER DATABASE "Old_DatabaseName" RENAME TO "New_DatabaseName";

Example

ALTER DATABASE "Student_data" RENAME TO "Employee_data";

2) MySQL

MySQL does not provide a direct command to rename a database. The usual approach is to dump the existing
database, create a new one with the new name, and import the dump:

Command for the dump copy

mysqldump -u username -p"password" -R testDb > testDb.sql;

Command for creating the new database

mysqladmin -u username -p"password" create testDb1;

Command for the import

mysql -u username -p"password" testDb1 < testDb.sql;

3) SQL Server

In SQL Server, a database can be renamed through SQL Server Management Studio by right-clicking the existing
database and choosing Rename.
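The same rename can also be done in T-SQL, provided the database is not in use at the time; a minimal sketch
using the example database names from above:

-- Rename the database without using the graphical tool.
ALTER DATABASE Student_data MODIFY NAME = Employee_data;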
DQL (Data Query Language) Command in SQL
DQL, or Data Query Language, is used to perform queries on the data inside the schema or object (i.e. a table,
index, view, function, etc.). With the help of a DQL query we can get data from the database in order to perform
actions or operations such as analysing the data.

SELECT Command in SQL

The SELECT command queries a table or tables and returns the result as a temporary table (result set) from the
database.

Syntax

SELECT * FROM table_name;

Example

SELECT * FROM Student;

DML (Data Manipulation Language) Command in SQL

DML, or Data Manipulation Language, is used to manipulate the data inside the database. With the help of DML
commands we can insert, delete, and change the data inside the database.

INSERT Command in SQL

The INSERT command is used to insert data into a table.

1) All the column names are mentioned in the INSERT statement.

Syntax

INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

Example

INSERT INTO Student (StudentId, FirstName, LastName)
VALUES (12345, 'Sri', 'Durga');

2) Column names do not need to be mentioned in the query; values must then be supplied for every column, in
the order the columns were defined.

Syntax

INSERT INTO table_name
VALUES (value1, value2, value3, ...);

Example

INSERT INTO Student
-- Values are given for all five Student columns (StudentId, LastName, FirstName, Address, Mark);
-- the address and mark shown here are illustrative.
VALUES (12345, 'Durga', 'Sri', '12 Main Street', 75);
UPDATE Command in SQL

The UPDATE command is used to update existing data in a table.

Syntax

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Example

UPDATE Student
SET FirstName = 'Navin', LastName = 'Kumar'
WHERE StudentId = 12345;

DELETE Command in SQL

The DELETE command is used to delete records from a database table.

Syntax

DELETE FROM table_name WHERE condition;

Example

DELETE FROM Student WHERE StudentId = 12345;

LOCK Command in SQL

SQL Lock command is helpful to lock the table to control concurrency.

Syntax

LOCK TABLE table-Name IN { SHARE | EXCLUSIVE } MODE


Example

LOCK TABLE Student IN SHARE MODE;

CALL Command in SQL

The CALL command is used to call a PL/SQL or Java subprogram.

Syntax (embedded SQL)

EXEC SQL
  CALL GETEMPSVR (2, NULL)
END-EXEC
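The syntax above uses embedded SQL (EXEC SQL ... END-EXEC) from within a host program. Invoked directly
from an SQL interface, a stored-procedure call typically looks like the following sketch (the procedure name and
arguments are illustrative, not from this document):

-- Invoke a stored procedure named get_employee_server with two arguments.
CALL get_employee_server(2, NULL);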
EXPLAIN PLAN Command in SQL

Syntax

EXPLAIN PLAN FOR
SELECT column_name FROM table_name;

Example

EXPLAIN PLAN FOR
SELECT LastName FROM Student;

The explanation of this query is stored in the PLAN_TABLE table. We can then query the stored execution plan to
review how the statement will be executed.
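In Oracle, for example, the stored plan can then be displayed by reading PLAN_TABLE through the DBMS_XPLAN
package; a minimal sketch:

-- Show the most recently explained statement held in PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);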
