Normalization
Structured Query Language (SQL) is a domain-specific language designed for managing data held in a relational
database management system.
SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce after learning about
the relational model from Edgar F. Codd in the early 1970s.
SQL commands are instructions sent to the database. SQL is used to interact with the database, to perform
specific tasks and functions, and to query data. It can perform various tasks such as creating a table, adding
data to tables, dropping a table, modifying a table, and setting permissions for users.
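The tasks listed above can be tried with Python's built-in sqlite3 module. This is a minimal sketch, assuming a hypothetical students table; SQLite is used only because it ships with Python. SQLite has no user accounts, so setting permissions is not shown here.

```python
import sqlite3

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Creating a table (hypothetical students table)
conn.execute("CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT)")

# Adding data to the table
conn.execute("INSERT INTO students (roll_no, name) VALUES (1, 'Asha')")

# Modifying the table: add a column, then fill it in
conn.execute("ALTER TABLE students ADD COLUMN branch TEXT")
conn.execute("UPDATE students SET branch = 'CSE' WHERE roll_no = 1")

# Querying the data
rows = conn.execute("SELECT roll_no, name, branch FROM students").fetchall()
print(rows)  # [(1, 'Asha', 'CSE')]

# Dropping the table, then listing the tables that remain
conn.execute("DROP TABLE students")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```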
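DCL, or Data Control Language, provides rights, permissions, and other controls on the database system. Its two commands are GRANT and REVOKE. The SQL GRANT command gives users access privileges on database objects.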
Syntax
GRANT privileges ON object TO user;
Example: GRANT INSERT, SELECT on accounts TO Alex;
The SQL REVOKE command withdraws access privileges that were given to a user with the GRANT command.
Syntax
REVOKE privileges ON object FROM user;
Example: REVOKE INSERT, SELECT on accounts FROM John;
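SQLite, the database bundled with Python, does not implement GRANT or REVOKE, so the sketch below is only a toy in-memory model of their effect, using the Alex/John examples above. The PrivilegeTable class and its methods are hypothetical; a real DBMS keeps privileges in its system catalog and checks them on the server.

```python
class PrivilegeTable:
    """Toy model of GRANT/REVOKE semantics (hypothetical, not a real DBMS API)."""

    def __init__(self):
        # user -> set of (privilege, object) pairs currently granted
        self._grants = {}

    def grant(self, privileges, obj, user):
        # Mirrors: GRANT privileges ON object TO user;
        for p in privileges:
            self._grants.setdefault(user, set()).add((p.upper(), obj))

    def revoke(self, privileges, obj, user):
        # Mirrors: REVOKE privileges ON object FROM user;
        for p in privileges:
            self._grants.get(user, set()).discard((p.upper(), obj))

    def can(self, user, privilege, obj):
        # True only if this exact privilege on this object is still granted
        return (privilege.upper(), obj) in self._grants.get(user, set())


acl = PrivilegeTable()
acl.grant(["INSERT", "SELECT"], "accounts", "Alex")
acl.grant(["INSERT", "SELECT"], "accounts", "John")
acl.revoke(["INSERT", "SELECT"], "accounts", "John")

print(acl.can("Alex", "SELECT", "accounts"))  # True
print(acl.can("John", "SELECT", "accounts"))  # False
print(acl.can("Alex", "DELETE", "accounts"))  # False (never granted)
```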
TCL commands in SQL
Transaction Control Language (TCL) commands manage transactions within the database.
COMMIT
The COMMIT command permanently saves all changes made in the current transaction to the database.
Syntax:
COMMIT;
For example:
ROLLBACK
The ROLLBACK command allows you to undo transactions that have not yet been saved (committed) to the database.
Syntax:
ROLLBACK;
Example (rolling back only to a named savepoint):
ROLLBACK TO SavepointName;
SAVEPOINT
The SAVEPOINT command sets a savepoint within a transaction, that is, a point to which you can later roll back without undoing the whole transaction.
Syntax:
SAVEPOINT SAVEPOINT_NAME;
Example:
SAVEPOINT RollNo;
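The three TCL commands can be seen working together in SQLite through Python's sqlite3 module. This is a minimal sketch with a hypothetical students table. The connection is opened with isolation_level=None so that the module does not start or end transactions on its own, and BEGIN, SAVEPOINT, ROLLBACK TO, COMMIT, and ROLLBACK run exactly as written.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
# so the transaction statements below take effect exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO students VALUES (1, 'Asha')")
conn.execute("SAVEPOINT RollNo")          # mark a point inside the transaction
conn.execute("INSERT INTO students VALUES (2, 'Ravi')")
conn.execute("ROLLBACK TO RollNo")        # undo only the work done after the savepoint
conn.execute("COMMIT")                    # make the remaining work permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM students")
conn.execute("ROLLBACK")                  # undo the uncommitted DELETE

rows = conn.execute("SELECT roll_no, name FROM students ORDER BY roll_no").fetchall()
print(rows)  # [(1, 'Asha')]
```

Only the first insert survives: the second is undone by ROLLBACK TO, and the later DELETE is undone by ROLLBACK because it was never committed.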
Database security in a DBMS refers to the techniques used to protect and secure a database from intentional or
accidental threats. Accordingly, database security covers hardware, software, human resources, and the data
itself.
1) SQL INJECTION
SQL injection is a type of attack in which malicious SQL code is injected through the input of a frontend (web)
application and then passed on to the backend database. A successful SQL injection can give attackers
unrestricted access to the data stored in the database.
Any database system is vulnerable to these attacks if developers do not follow secure coding practices and the
organization does not conduct regular vulnerability testing.
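A common secure coding practice against SQL injection is the parameterized query. The sketch below, using Python's sqlite3 module and a hypothetical users table, contrasts a query built by string formatting, where crafted input changes the query's logic, with one that passes the input through a ? placeholder so that it is treated strictly as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# Attacker-controlled input typed into a web form.
user_input = "nobody' OR '1'='1"

# UNSAFE: the input is pasted into the SQL text, so its quote ends the string
# literal and the OR clause becomes part of the query, matching every row.
unsafe_sql = f"SELECT username FROM users WHERE username = '{user_input}' ORDER BY username"
leaked = conn.execute(unsafe_sql).fetchall()
print(leaked)  # [('alice',), ('bob',)]

# SAFER: the ? placeholder sends the input as a value, never as SQL code.
safe = conn.execute(
    "SELECT username FROM users WHERE username = ?", (user_input,)
).fetchall()
print(safe)  # []
```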
2) Malware
Malware is software designed to corrupt data or harm a database. Malware could enter your system via any
endpoint device connected to the database's network and exploit vulnerabilities in your system. Malware
protection is important on any endpoint, but it is particularly necessary on database servers because of their
high value and sensitivity. Examples of common malware include spyware, Trojans, viruses, worms,
adware, and ransomware.
Databases are also breached and leaked because of an insufficient level of IT security expertise and of security
education among non-technical employees, who may violate basic database security standards and endanger
databases. IT security employees may also lack the expertise needed to create security controls, enforce rules,
or carry out incident response processes.
The following are the key control measures used to ensure data security in databases:
Authentication
Authentication is the process of confirming a user's identity when they log in, so that they can undertake
database operations only with the rights granted to them. A user can work only up to their privilege level and
cannot access any other sensitive data.
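Password-based authentication is commonly implemented by storing a salted, slow hash of each password rather than the password itself. The sketch below uses only Python's standard library (hashlib.pbkdf2_hmac, hmac.compare_digest); the users dictionary, the function names, and the iteration count are assumptions for illustration. It covers only the identity check: what the user may do afterwards is still decided by the privileges granted to them.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative work factor only, not a recommendation


def make_record(password):
    """Return (salt, digest) to store instead of the plain password (hypothetical helper)."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def authenticate(password, salt, digest):
    """Recompute the digest for the supplied password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


# Hypothetical stored credentials for one database user.
users = {"alex": make_record("correct horse")}

salt, digest = users["alex"]
print(authenticate("correct horse", salt, digest))  # True
print(authenticate("wrong guess", salt, digest))    # False
```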
Access Control
Database access control is a means of restricting access to sensitive company data to only those
people (database users) who are authorized to access such data, and denying access to unauthorized
persons. It is a key security concept that reduces risk to the business or organization.
Encryption
Data encryption protects data confidentiality by converting it to encoded information known as cipher
text, which can only be decoded with a unique decryption key generated either during or before
encryption.
NORMALIZATION (V.IMP)
Normalization is the process of organizing data in a database. It includes creating tables and
establishing relationships between those tables according to rules designed both to protect the
data and to make the database more flexible by eliminating data redundancy and inconsistent
dependency.
Redundant data wastes disk space and creates maintenance problems. If data that exists in more
than one place must be changed, the data must be changed in exactly the same way in all
locations. A customer address change is easier to implement if that data is stored only in the
Customers table and nowhere else in the database.
What is an "inconsistent dependency"? While it's intuitive for a user to look in the Customers table
for the address of a particular customer, it may not make sense to look there for the salary of the
employee who calls on that customer. The employee's salary is related to, or dependent on, the
employee and thus should be moved to the Employees table. Inconsistent dependencies can make
data difficult to access because the path to find the data may be missing or broken.
There are a few rules for database normalization. Each rule is called a "normal form."
If a table is not properly normalized and has data redundancy (repetition), it will not only eat
up extra storage space but will also make it difficult to handle and update the data in the
database without losing data.
Insertion, Updation, and Deletion Anomalies are very frequent if the database is not normalized.
In the table above, we have data for four Computer Sci. students.
As we can see, the data for the fields branch, hod (Head of Department), and office_tel is repeated
for every student in the same branch of the college. This is Data Redundancy.
Suppose for a new admission, until and unless a student opts for a branch, data of the
student cannot be inserted, or else we will have to set the branch information as NULL.
Also, if we have to insert data for 100 students of the same branch, then the branch
information will be repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.
If you have to repeat the same data in every row of data, it's better to keep the data
separately and reference that data in each row.
So in the above table, we can keep the branch information separately, and just use
the branch_id in the student table, where branch_id can be used to get the branch
information.
What if Mr. X leaves the college? Or Mr. X is no longer the HOD of the computer science
department? In that case, all the student records will have to be updated, and if by mistake
we miss any record, it will lead to data inconsistency.
This is an Updation anomaly because you need to update all the records in your table just
because one piece of information got changed.
In our Student table, two different pieces of information are kept together, the Student
information and the Branch information.
So if only a single student is enrolled in a branch and that student leaves the college, or the
student's entry is deleted for some other reason, we will lose the branch information too.
This is a Deletion anomaly.
So in a DBMS we should never keep two different entities together in one table; in the example
above, these are Student and Branch.
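The updation and deletion anomalies can be reproduced with Python's sqlite3 module. The rows below are hypothetical, since the original table's data is not reproduced here; they keep the same columns (branch, hod, office_tel) and four students in one branch, plus a single student in a second, hypothetical ECE branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table holding two entities, Student and Branch (hypothetical rows).
conn.execute(
    "CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, "
    "branch TEXT, hod TEXT, office_tel TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (1, "S1", "CSE", "Mr. X", "111"),
        (2, "S2", "CSE", "Mr. X", "111"),
        (3, "S3", "CSE", "Mr. X", "111"),
        (4, "S4", "CSE", "Mr. X", "111"),
        (5, "S5", "ECE", "Mr. Z", "222"),
    ],
)

# Updation anomaly: one fact (the CSE HOD) is stored in every CSE row.
cur = conn.execute("UPDATE student SET hod = 'Mr. Y' WHERE branch = 'CSE'")
print(cur.rowcount)  # 4 rows rewritten for a single change

# Deletion anomaly: deleting the only ECE student also erases the ECE branch data.
conn.execute("DELETE FROM student WHERE rollno = 5")
ece = conn.execute("SELECT hod, office_tel FROM student WHERE branch = 'ECE'").fetchall()
print(ece)  # [] (the ECE HOD and phone number are gone)
```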
The solution for all three anomalies described above is to keep the student information and
the branch information in two different tables, and to use the branch_id in the student table to reference the branch.
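A minimal sketch of that fix in SQLite, reusing the same kind of hypothetical rows: the branch details move to their own table and each student keeps only a branch_id. The table layout is an assumption; note that SQLite enforces REFERENCES constraints only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, branch TEXT, "
    "hod TEXT, office_tel TEXT)"
)
conn.execute(
    "CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, "
    "branch_id INTEGER REFERENCES branch(branch_id))"
)
conn.executemany("INSERT INTO branch VALUES (?, ?, ?, ?)",
                 [(1, "CSE", "Mr. X", "111"), (2, "ECE", "Mr. Z", "222")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "S1", 1), (2, "S2", 1), (3, "S3", 1), (4, "S4", 1), (5, "S5", 2)])

# The HOD is stored once, so changing it touches exactly one row.
cur = conn.execute("UPDATE branch SET hod = 'Mr. Y' WHERE branch_id = 1")
print(cur.rowcount)  # 1

# Deleting the only ECE student leaves the ECE branch row intact.
conn.execute("DELETE FROM student WHERE rollno = 5")
ece = conn.execute("SELECT hod FROM branch WHERE branch_id = 2").fetchone()
print(ece)  # ('Mr. Z',)

# A join rebuilds the combined student/branch view whenever it is needed.
rows = conn.execute(
    "SELECT s.name, b.branch, b.hod FROM student s "
    "JOIN branch b ON s.branch_id = b.branch_id ORDER BY s.rollno"
).fetchall()
print(rows[0])  # ('S1', 'CSE', 'Mr. Y')
```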
For a table to be in the First Normal Form, it should follow the following 4 rules:
So how do you fix the above table? There are two ways to do this:
1. Remove the emp_skills column from the Employee table and keep it in some other table.
2. Or add multiple rows for each employee, with each row linked to one skill.
emp_id  emp_skill
1       Python
1       JavaScript
2       HTML
2       CSS
2       JavaScript
3       Java
3       Linux
3       C++
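The conversion to rows like those above can be written as a small function. This sketch assumes the original emp_skills column held comma-separated strings; the input dictionary is rebuilt from the rows shown above, and the function name to_first_normal_form is hypothetical.

```python
# Assumed non-1NF input: each employee's skills packed into one
# comma-separated emp_skills value (rebuilt from the rows above).
employee = {
    1: "Python, JavaScript",
    2: "HTML, CSS, JavaScript",
    3: "Java, Linux, C++",
}


def to_first_normal_form(rows):
    """Split multi-valued cells so each (emp_id, emp_skill) row holds one atomic value."""
    return [
        (emp_id, skill.strip())
        for emp_id, skills in rows.items()
        for skill in skills.split(",")
        if skill.strip()  # ignore empty pieces such as a trailing comma
    ]


employee_skill = to_first_normal_form(employee)
for emp_id, skill in employee_skill:
    print(emp_id, skill)
```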
Let us take an example of the following <EmployeeProjectDetail> table to understand what partial
dependency is and how to normalize the table to the second normal form:
<EmployeeProjectDetail>
The prime attributes in DBMS are those that are part of one or more candidate keys. To remove
partial dependencies from this table and normalize it into second normal form, we can decompose
the <EmployeeProjectDetail> table into the following three tables:
<EmployeeDetail>
<EmployeeProject>
Employee_ID Project_ID
101 P03
101 P01
102 P04
103 P02
<ProjectDetail>
Project_ID Project_Name
P01 Project101
P02 Project102
P03 Project103
P04 Project104
Thus, we’ve converted the <EmployeeProjectDetail> table into 2NF by decomposing it into the
<EmployeeDetail>, <ProjectDetail>, and <EmployeeProject> tables. These tables satisfy both
rules of 2NF: they are in 1NF, and every non-prime attribute is fully dependent on the primary key.
Relations in 2NF are therefore generally less redundant than relations that are only in 1NF.
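Loading the <EmployeeProject> and <ProjectDetail> data shown above into SQLite shows that a join recovers each employee's project names, so the decomposition loses nothing. <EmployeeDetail> is left out because its contents are not shown here, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProjectDetail (Project_ID TEXT PRIMARY KEY, Project_Name TEXT)")
conn.execute(
    "CREATE TABLE EmployeeProject (Employee_ID INTEGER, "
    "Project_ID TEXT REFERENCES ProjectDetail(Project_ID), "
    "PRIMARY KEY (Employee_ID, Project_ID))"
)
conn.executemany(
    "INSERT INTO ProjectDetail VALUES (?, ?)",
    [("P01", "Project101"), ("P02", "Project102"), ("P03", "Project103"), ("P04", "Project104")],
)
conn.executemany(
    "INSERT INTO EmployeeProject VALUES (?, ?)",
    [(101, "P03"), (101, "P01"), (102, "P04"), (103, "P02")],
)

# Project_Name depends only on Project_ID, so it is stored once per project;
# a join brings it back for every employee assignment.
rows = conn.execute(
    "SELECT ep.Employee_ID, ep.Project_ID, pd.Project_Name "
    "FROM EmployeeProject ep JOIN ProjectDetail pd ON ep.Project_ID = pd.Project_ID "
    "ORDER BY ep.Employee_ID, ep.Project_ID"
).fetchall()
for row in rows:
    print(row)
```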
We assume the Score table is already in the Second Normal Form. If we have to store some extra
information in it, like,
1. exam_type
2. total_marks
The Score table will look like this:
student_id  subject_id  marks  exam_type  total_marks
1           1           70     Theory     100
1           2           82     Theory     100
2           1           42     Practical  50
In the table above, the column exam_type depends on both student_id and subject_id,
but the column total_marks depends only on the exam_type column, which is not part of
the primary key (student_id + subject_id). Since total_marks depends on the primary key
only through exam_type, we have a Transitive dependency here.
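Whether a functional dependency holds in a set of rows can be checked mechanically: any two rows that agree on the left-hand side must also agree on the right-hand side. The helper below (holds is a hypothetical name) runs that check on the Score rows above. Sample data can only disprove a dependency; a True result means the rows are consistent with it, not that the dependency is guaranteed.

```python
def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is consistent with rows:
    rows that agree on the lhs columns must agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault returns the value first recorded for this key
        if seen.setdefault(key, value) != value:
            return False
    return True


cols = ("student_id", "subject_id", "marks", "exam_type", "total_marks")
score = [dict(zip(cols, r)) for r in [
    (1, 1, 70, "Theory", 100),
    (1, 2, 82, "Theory", 100),
    (2, 1, 42, "Practical", 50),
]]

print(holds(score, ["student_id", "subject_id"], ["exam_type"]))  # True
print(holds(score, ["exam_type"], ["total_marks"]))               # True: the transitive step
print(holds(score, ["student_id"], ["marks"]))                    # False: student 1 has 70 and 82
```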
Exam Type table:
exam_type_id  exam_type  total_marks  duration
1             Practical  50           45
We have created a new table, Exam Type, and added more related information to it, such as
duration (the duration of the exam in minutes). Now we can use exam_type_id in the Score table
instead of storing exam_type and total_marks there.
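The decomposition can be sketched with relational projection on the three Score rows above. For simplicity this sketch keys the new table on exam_type itself instead of adding the exam_type_id and duration columns described in the text; project is a hypothetical helper. Joining the two projections back on exam_type reproduces the original rows, so the split is lossless.

```python
cols = ("student_id", "subject_id", "marks", "exam_type", "total_marks")
score = [dict(zip(cols, r)) for r in [
    (1, 1, 70, "Theory", 100),
    (1, 2, 82, "Theory", 100),
    (2, 1, 42, "Practical", 50),
]]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    return {tuple(r[a] for a in attrs) for r in rows}


# Move the transitive dependency exam_type -> total_marks into its own table.
score_3nf = project(score, ("student_id", "subject_id", "marks", "exam_type"))
exam_type = project(score, ("exam_type", "total_marks"))
print(sorted(exam_type))  # [('Practical', 50), ('Theory', 100)]

# Natural join on exam_type (position 3 in score_3nf) rebuilds the original rows.
total_for = dict(exam_type)
rejoined = {row + (total_for[row[3]],) for row in score_3nf}
print(rejoined == project(score, cols))  # True
```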
Example:
EMPLOYEE_DETAIL table:
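Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime
attributes EMP_STATE and EMP_CITY are therefore transitively dependent on the key EMP_ID, which
violates third normal form.
That's why we need to move EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP>
table, with EMP_ZIP as its primary key.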
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
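The EMPLOYEE_ZIP table above can be created in SQLite with EMP_ZIP as its primary key, which lets the database itself enforce that one ZIP code maps to one city and state. EMP_ZIP is declared TEXT here (an assumption) so that the leading zeros in 02228 and 06389 survive; the EMPLOYEE table is omitted because its contents are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMP_ZIP is TEXT so that leading zeros such as 02228 are preserved.
conn.execute(
    "CREATE TABLE EMPLOYEE_ZIP (EMP_ZIP TEXT PRIMARY KEY, EMP_STATE TEXT, EMP_CITY TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE_ZIP VALUES (?, ?, ?)",
    [
        ("201010", "UP", "Noida"),
        ("02228", "US", "Boston"),
        ("60007", "US", "Chicago"),
        ("06389", "UK", "Norwich"),
        ("462007", "MP", "Bhopal"),
    ],
)

city = conn.execute(
    "SELECT EMP_CITY FROM EMPLOYEE_ZIP WHERE EMP_ZIP = ?", ("02228",)
).fetchone()
print(city)  # ('Boston',)

# The primary key enforces EMP_ZIP -> EMP_STATE, EMP_CITY:
# a second, conflicting row for the same ZIP code is rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE_ZIP VALUES ('02228', 'US', 'Chicago')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```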