DATABASE CONCEPTS NOTES
DATABASE CONCEPTS
DATABASE
A Database is a collection of logically related data
organized in a way that data can be easily accessed,
managed and updated.
DATA
INFORMATION
• Information is data that has been processed, stored, or transmitted by a computer.
APPLICATIONS OF DATABASE.
• Data Integrity: Data Integrity refers to the correctness of the data in the
database. In other words, the data available in the database is reliable
data.
• Data Sharing: In DBMS, data is stored in the centralized database and all the
permitted users can access the same piece of information required at the
same time.
No.  Manual data processing                     Electronic data processing
1    Volume of data processed is limited        Volume of data processed is large
2    Requires a large quantity of paper         Requires less paper
3    Speed and accuracy are limited             Faster and more accurate
4    Labour cost is high                        Labour cost is low
5    Storage medium is paper                    Storage medium is hard disk, etc.
DATA PROCESSING CYCLE.
Data Collection: It is the systematic gathering of data from various sources, where the
data has been observed, recorded and organized.
Data Input: The raw data is put into the computer using a keyboard, mouse or other
devices such as the scanner, microphone and the digital camera.
Data Processing: Processing is the series of actions or operations on the input data to
generate outputs.
Data storage: Data and information should be stored in memory so
that it can be accessed later.
FIELD
RECORD/TUPLE
A single entry in a table is called a record or row. A
record in a table represents a set of related data.
A record is also called a tuple.
DATABASE TERMS
ENTITY
An Entity can be any object, place, person or class.
In E-R Diagram, an entity is represented using rectangles.
INSTANCE
The collection of information stored in the
database at a particular moment is called an
instance of the database.
ATTRIBUTE/FIELD
RELATION
A relation is defined as a table with columns and
rows. Data can be stored in the form of a two-
dimensional table.
DOMAIN
It is defined as a set of allowed values for one or
more attributes.
TABLE
A table is a collection of data elements organized in
terms of rows and columns. Table is the simplest form
of data storage.
KEY
It is a column, or a set of columns, that identifies each
row or tuple.
DATA TYPES OF DBMS
• Integer
• Characters
• Strings
• Date fields
• Text fields
System Analysts
System analysts determine the requirements of end users (especially naïve users) to
create a solution for their business needs, focusing on both non-technical and technical
aspects.
Application programmers
These are the computer professionals who implement the specifications given by the
system analysts and develop the application programs.
• Enforcing Data Integrity: Data Integrity refers to the correctness of the data in the database. In other
words, the data available in the database is reliable data.
• Data Sharing: In DBMS, data is stored in the centralized database and all the permitted users can
access the same piece of information required at the same time.
• Database Security: DBMS provides a variety of security mechanisms for the user to protect his or her
data stored in the database.
• Supports Concurrent access: DBMS supports concurrent access to the same data stored in the
database by applying locking and time stamp mechanisms.
• Multiple user interfaces: In order to meet the needs of various users having different technical
knowledge, a DBMS provides different types of interfaces such as query languages, application
program interfaces, and graphical user interfaces.
• Backup and Recovery: A DBMS provides a backup and recovery subsystem that is responsible for
recovery from hardware and software failures.
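The locking idea behind concurrent access can be illustrated with a small Python sketch. This is only an analogy using threads and a shared counter, not how any particular DBMS implements its lock manager; the `balance` item and the `deposit` function are hypothetical names.

```python
import threading

# Shared data item (hypothetical), updated by several "users" at once.
balance = {"value": 0}
lock = threading.Lock()  # every update must hold this lock


def deposit(amount, times):
    """Add amount to the shared balance, one locked update at a time."""
    for _ in range(times):
        with lock:  # only one thread may read-modify-write at a time
            balance["value"] += amount


threads = [threading.Thread(target=deposit, args=(1, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance["value"])
```

Because each update holds the lock, no update from one thread can overwrite an update from another.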
DATA ABSTRACTION.
A major purpose of a database system is to provide users with an abstract view of the
data. That is the system hides certain details of how the data are stored and
maintained.
The physical level describes complex low-level data structures in detail.
It also contains the method of deriving the objects in the conceptual view from the objects
in the internal view.
The capacity to change data at one layer without affecting the data at another layer is called data independence.
Physical data independence is the capacity to change the internal level without having to change the schemas at
the conceptual or external level.
Changes to the internal schema may be needed because some physical files have to be reorganized.
Physical data independence insulates an application from the physical storage structure only, so it is
easier to achieve than logical data independence.
The physical data independence are:
o File Organization
o Database Architecture
o Database Models
DIFFERENCE BETWEEN SERIAL AND DIRECT ACCESS FILE ORGANIZATION.
Advantages
o Search time is less.
o There are fewer index entries than there are records in the data file.
o Quick access to the records even when the volume of records is high.
Disadvantages
o Additional file (index file) has to be created.
o Wastage of storage space by creating and maintaining the index file.
o Always indirect retrieval of data because first search begins in the index files
then moves to the data file (No direct retrieval).
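The index-then-data lookup described in these points can be sketched in Python. This is a minimal illustration that assumes a sorted data file split into fixed-size blocks with one index entry per block; the record values, `BLOCK_SIZE`, and the `lookup` function are hypothetical.

```python
import bisect

# Data file: records sorted by key (hypothetical data), stored in blocks.
records = [(k, f"record-{k}") for k in range(10, 210, 10)]  # keys 10, 20, ..., 200
BLOCK_SIZE = 5
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Index file: one entry per block, holding the first key stored in that block.
index_keys = [block[0][0] for block in blocks]


def lookup(key):
    """Search the index first, then scan only the block it points to."""
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None


print(len(index_keys), len(records))
print(lookup(70))
```

The index holds far fewer entries than the data file, which is why the search is quick, but every retrieval goes through the index first and the index itself takes extra storage.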
DBMS ARCHITECTURE.
In single-tier architecture, the DBMS is the only entity: the user sits directly on the DBMS and uses it.
It does not provide handy tools for end users, so it is mainly database
designers and programmers who use single-tier architecture.
TWO-TIER CLIENT / SERVER ARCHITECTURE:
Most DBMS vendors provide ODBC drivers. A client program may connect to
several DBMSs.
Some variations of this architecture are also possible; for example, in some
DBMSs more functionality is transferred to the client, including the data dictionary,
optimization, etc.
THREE-TIER CLIENT / SERVER ARCHITECTURE:
Advantages:
Simplicity: The relationship between the various layers is logically simple.
Data Security: The data security is provided by the DBMS.
Data Integrity: There is always a link between the parent segment and the child
segment under it.
Efficiency: It is very efficient when the database contains a large number
of one-to-many relationships and the user requires a large number of
transactions.
Disadvantages:
Implementation complexity
Database management problem
Lack of structural Independence.
Operational Anomalies
Network data model.
In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network
model.
In this model, data is represented by a collection of records and the relationships are
represented by links.
Each record is a collection of fields, each of which contains only one data value. A link is
an association between two records.
In the network model, entities are organized in a graph, in which some entities can be
accessed through several paths.
Advantages:
It is simple and easy to implement.
It can handle many relationships within the organization.
It has better data independence compared to the hierarchical model.
Disadvantages:
More complex system of database structure.
Lack of structural independence.
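The records-and-links structure of the network model can be sketched as a small graph in Python. This is an illustrative sketch only; the supplier, part and order records and the `paths` helper are hypothetical and are not CODASYL syntax.

```python
from collections import defaultdict

# Records: each one is a collection of fields (hypothetical data).
records = {
    "S1": {"type": "Supplier", "name": "Acme"},
    "S2": {"type": "Supplier", "name": "Globex"},
    "P1": {"type": "Part", "name": "Bolt"},
    "O1": {"type": "Order", "qty": 100},
}

# Links: each one is an association between two records.
links = [("S1", "P1"), ("S2", "P1"), ("S1", "O1"), ("O1", "P1")]

neighbours = defaultdict(list)
for owner, member in links:
    neighbours[owner].append(member)


def paths(start, goal, seen=()):
    """Enumerate every path from start to goal by following links."""
    if start == goal:
        return [[goal]]
    found = []
    for nxt in neighbours[start]:
        if nxt not in seen:
            for tail in paths(nxt, goal, seen + (start,)):
                found.append([start] + tail)
    return found


print(paths("S1", "P1"))
```

Here the part record P1 can be reached from supplier S1 either directly or through the order O1, which is what "accessed through several paths" means in a graph-shaped model.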
Relational Data Model.
The relational data model was developed by E. F. Codd in 1970.
Unlike the hierarchical and network models, there are no physical links.
All data is maintained in the form of tables consisting of rows and columns.
Each row (record) represents an entity and a column (field) represents an attribute of the entity.
In this model, data is organized in two-dimensional tables called relations. The tables or relations are
related to each other.
Relational Model Concepts
1. Attribute: Each column in a table. Attributes are the properties which define a
relation, e.g., Student_Rollno, NAME, etc.
2. Tables: In the relational model, relations are saved in table format. A table is
stored along with its entities. A table has two properties: rows and columns. Rows
represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation schema: A relation schema represents the name of the relation with its
attributes.
5. Degree: The total number of attributes in the relation is called the degree of
the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance: A relation instance is a finite set of tuples in the RDBMS.
Relation instances never have duplicate tuples.
9. Relation key: Every row has one or more attributes that identify it, which are
called the relation key.
10. Attribute domain: Every attribute has a pre-defined set of values and scope, which is
known as the attribute domain.
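Several of these terms can be made concrete with a short Python sketch. It assumes a hypothetical STUDENT relation; the AGE attribute, the sample rows and the domain rules are illustrative assumptions, not taken from these notes.

```python
# Relation schema: the relation name together with its attributes (hypothetical).
schema = ("STUDENT", ("Student_Rollno", "NAME", "AGE"))

# Attribute domains: the values each attribute is allowed to take (assumed rules).
domains = {
    "Student_Rollno": lambda v: isinstance(v, int) and v > 0,
    "NAME": lambda v: isinstance(v, str) and v != "",
    "AGE": lambda v: isinstance(v, int) and 0 < v < 150,
}

# Relation instance: a set of tuples, so duplicate tuples cannot occur.
instance = {
    (1, "Asha", 19),
    (2, "Ravi", 20),
    (2, "Ravi", 20),  # duplicate tuple, collapsed by the set
    (3, "Meena", 19),
}

attributes = schema[1]
degree = len(attributes)       # number of attributes (columns)
cardinality = len(instance)    # number of tuples (rows)


def in_domain(row):
    """Check that every value in the row lies in its attribute's domain."""
    return all(domains[a](v) for a, v in zip(attributes, row))


print(degree, cardinality)
print(all(in_domain(row) for row in instance))
```

Using a Python set for the instance mirrors the rule that a relation instance never holds duplicate tuples.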
E-R diagram.
Entity: An entity is represented using rectangles.
Attribute: Attributes are represented by means of ellipses.
Relationship: A relationship is represented using a diamond-shaped box.
Three components of E-R model.
ER-Diagram is a visual representation of data that describes how data is related to each other.
Entity:
An Entity can be any object, place, person or class.
In E-R Diagram, an entity is represented using rectangles.
Rectangles are named with the entity set they represent.
Attribute:
An Attribute describes a property or characteristic of an entity.
Attributes are represented by means of ellipses.
Every ellipse represents one attribute and is directly connected to its entity (rectangle).
For example, Roll_No, Name and Birth date can be attributes of a student.
Relationship:
A relationship type is a meaningful association between entity types.
Relationship is represented using diamond shaped box.
There are three types of relationship that exist between entities.
Binary Relationship
Recursive Relationship
Ternary Relationship
Binary Relationship: It means relation between two entities. This is further
divided into three types.
1. One to One:
Specification :
Types of Keys
A key is an attribute, or a set of attributes, of a table used to identify one or more
tuples/records of the table.
Primary key- A primary key uniquely identifies a tuple/record in a table. A
primary key cannot be duplicated for different records in a table.
Ex: Student_id, Bank_accno are examples of primary keys.
Candidate key- There may be more than one unique field in a table that
can be selected as primary key. All such fields that are unique for every
row of the table are known as candidate keys.
Alternate keys- Those candidate keys that are not selected as primary key
are known as alternate keys.
Foreign key- A field in a table that refers to the primary key of
another table is known as a foreign key.
For ex: a student table may have student_id as its primary key and
bank_accno as a foreign key, where bank_accno is the primary key of another table.
Composite key- A key that consists of two or more attributes to identify a
record in a table is known as a composite key.
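The key types above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions: the `account`, `student` and `enrolment` tables, the `email` column and the `try_insert` helper are hypothetical, chosen only to show a primary key, an alternate (unique) key, a foreign key and a composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE account (
    bank_accno TEXT PRIMARY KEY              -- primary key of account
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,          -- primary key of student
    email      TEXT UNIQUE,                  -- candidate key kept as an alternate key
    bank_accno TEXT REFERENCES account(bank_accno)  -- foreign key
);
CREATE TABLE enrolment (
    student_id INTEGER REFERENCES student(student_id),
    course     TEXT,
    PRIMARY KEY (student_id, course)         -- composite key
);
""")

conn.execute("INSERT INTO account VALUES ('ACC-1')")
conn.execute("INSERT INTO student VALUES (1, 'a@example.com', 'ACC-1')")
conn.execute("INSERT INTO enrolment VALUES (1, 'Math')")


def try_insert(sql):
    """Run an INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


print(try_insert("INSERT INTO student VALUES (1, 'b@example.com', 'ACC-1')"))
print(try_insert("INSERT INTO student VALUES (2, 'c@example.com', 'ACC-9')"))
print(try_insert("INSERT INTO enrolment VALUES (1, 'Physics')"))
```

The first insert repeats a primary key value, the second points at an account that does not exist, and the third reuses student 1 with a different course, which the composite key allows.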
Data warehouse
A data warehouse is a repository of an organization's electronically stored data.
Data warehouses are designed to facilitate reporting and support data analysis.
The concept of data warehouses was introduced in the late 1980s.
Data mining is concerned with the analysis and picking out of relevant information.
E. F. Codd was a computer scientist who invented the relational model
for database management.
Rule Zero:
This rule states that for a system to qualify as an RDBMS, it must be
able to manage databases entirely through its relational
capabilities.
CODD’s Rule AND Normalization
Dr Edgar F. Codd, after his extensive research on the Relational Model of database
systems, came up with twelve rules of his own, which according to him, a database must
obey in order to be regarded as a true relational database.
These rules can be applied on any database system that manages stored data using only
its relational capabilities. This is a foundation rule, which acts as a base for all the other
rules.
14   John    7272826385, 9064738238   UP
20   Harry   8574783832               Bihar
12   Sam     7390372389, 8589830302   Punjab

14   John    7272826385   UP
14   John    9064738238   UP
20   Harry   8574783832   Bihar
12   Sam     7390372389   Punjab
12   Sam     8589830302   Punjab
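The two tables above appear to show the same rows before and after a multi-valued phone column is split so that each cell holds a single value, which is the usual step toward first normal form. A minimal Python sketch of that step follows; the variable names (`emp_id`, `name`, `phones`, `state`) and the `to_1nf` function are assumptions, since the tables carry no column headers.

```python
# Rows with a multi-valued phone column, as in the first table above.
unnormalized = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]


def to_1nf(rows):
    """Emit one row per phone number so every column holds a single value."""
    return [
        (emp_id, name, phone, state)
        for emp_id, name, phones, state in rows
        for phone in phones
    ]


for row in to_1nf(unnormalized):
    print(row)
```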
Second Normal Form (2NF)
•To be in 2NF, the relation must be in 1NF.
•In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a
teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
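The 2NF decomposition above can be reproduced in a few lines of Python using the TEACHER rows from the example. The variable names are illustrative; the final check rejoins the two tables on TEACHER_ID to confirm that no information was lost.

```python
# TEACHER table rows: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# TEACHER_DETAIL: TEACHER_AGE depends only on TEACHER_ID, so store it once per teacher.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})

# TEACHER_SUBJECT: the remaining columns, keyed by (TEACHER_ID, SUBJECT).
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Rejoin on TEACHER_ID to check that the decomposition keeps every original row.
age_of = dict(teacher_detail)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]

print(teacher_detail)
print(rejoined == teacher)
```

Each teacher's age is now stored once, so it can no longer disagree between two rows for the same teacher.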
Third Normal Form (3NF)
•A relation is in 3NF if it is in 2NF and does not contain any
transitive dependency.
•3NF is used to reduce data duplication. It is also used
to achieve data integrity.
•If there is no transitive dependency for non-prime
attributes, then the relation is in third normal form.
A relation is in third normal form if it holds at least one of
the following conditions for every non-trivial functional
dependency X → Y.
1.X is a super key.
2.Y is a prime attribute, i.e., each element of Y is part of
some candidate key.
Example:
EMPLOYEE_DETAIL table:
Super key in the table above:
1.{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP} and so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on
EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super
key (EMP_ID). This violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
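The transitive dependency and its removal can be sketched in Python. The column names follow the example, but the sample rows are hypothetical and the `holds` helper is an illustrative function for checking a functional dependency against the rows.

```python
# Hypothetical rows: (EMP_ID, EMP_NAME, EMP_ZIP, EMP_STATE, EMP_CITY)
employee_detail = [
    (1, "Ana", "10001", "NY", "New York"),
    (2, "Ben", "60601", "IL", "Chicago"),
    (3, "Cara", "10001", "NY", "New York"),
]


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds (column indexes)."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# EMP_ID -> EMP_ZIP and EMP_ZIP -> (EMP_STATE, EMP_CITY): a transitive dependency.
print(holds(employee_detail, [0], [2]), holds(employee_detail, [2], [3, 4]))

# 3NF decomposition: move the ZIP-dependent columns to their own table.
employee = [(r[0], r[1], r[2]) for r in employee_detail]              # key: EMP_ID
employee_zip = sorted({(r[2], r[3], r[4]) for r in employee_detail})  # key: EMP_ZIP

print(employee)
print(employee_zip)
```

After the split, each ZIP code's state and city are stored once in EMPLOYEE_ZIP rather than once per employee.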