DATABASE CONCEPTS NOTES

The document provides an overview of database concepts, including definitions of databases, data, and information, as well as their applications in various fields such as banking and finance. It discusses the advantages of databases, the evolution from manual to electronic data processing, and the data processing cycle. Additionally, it covers database architecture, models, and users, emphasizing the importance of data abstraction and independence.

Uploaded by

Shamith Rai
Chapter-13

DATABASE CONCEPTS
DATABASE
• A database is a collection of logically related data organized in a way that the data can be easily accessed, managed and updated.

DATA
• Data is a collection of facts, numbers, letters or symbols that the computer processes into meaningful information.

INFORMATION
• Information is data that has been processed, stored, or transmitted by a computer.
APPLICATIONS OF DATABASE.

Banking: For customer information, accounts and loans, and banking transactions.

Colleges: For student information, course registrations and grades.

Credit card transactions: For purchases on credit cards and generation of monthly statements.

Finance: For storing information about holdings, sales and purchases of financial instruments such as stocks and bonds.

Sales: For customer, product, and purchase information.

Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks.

Aadhaar database: This is one of the biggest databases in the world, storing data about more than a billion people residing in India.
ADVANTAGES OF DATABASE.

• Redundancy can be minimized or controlled: In a DBMS environment, if redundancy is present, it can be controlled by propagating updates to all the places where the redundant data is present.

• Data Integrity: Data integrity refers to the correctness of the data in the database. In other words, the data available in the database is reliable data.

• Data Sharing: In a DBMS, data is stored in a centralized database and all the permitted users can access the same piece of information at the same time.

• Database Security: A DBMS provides a variety of security mechanisms for users to protect the data they store in the database.

• Supports Concurrent Access: A DBMS supports concurrent access to the same data stored in the database by applying locking and timestamp mechanisms.
EVOLUTION OF DATABASE
MANUAL DATA PROCESSING AND ELECTRONIC DATA PROCESSING

No.  Manual Data Processing                                Computerized Data Processing
1    The volume of data that can be processed is limited.  The volume of data that can be processed is large.
2    Requires a large quantity of paper.                   Requires less paper.
3    Speed and accuracy are limited.                       Faster and more accurate.
4    Labour cost is high.                                  Labour cost is low.
5    Storage medium is paper.                              Storage medium is hard disk, etc.
DATA PROCESSING CYCLE.
Data Collection: It is the process of systematically gathering data from various sources; the data is observed, recorded and organized.

Data Input: The raw data is put into the computer using a keyboard, mouse or other devices such as a scanner, microphone or digital camera.

Data Processing: Processing is the series of actions or operations on the input data to generate outputs.

Data Storage: Data and information should be stored in memory so that they can be accessed later.

Output: The result obtained after processing the data must be presented to the user in an understandable form. The output can be generated in the form of a report as hard copy or soft copy.

Communication: Computers nowadays have communication ability, which increases their power. With wired or wireless communication connections, data may be input from a distant place, processed in a remote area, stored in several different places, and then transmitted by modem as an e-mail or posted to a website where online services are rendered.
FILE
A file is the basic unit of storage in a computer system.

DATABASE
A database is a collection of logically related data organized in a way that the data can be easily accessed, managed or updated.

FIELD
Each column is identified by a distinct header called an attribute or field.

RECORD/TUPLE
A single entry in a table is called a record or row. A record in a table represents a set of related data. Records are also called tuples.
DATABASE TERMS

ENTITY
• An entity can be any object, place, person or class.
• In an E-R diagram, an entity is represented using a rectangle.

INSTANCE
The collection of information stored in the database at a particular moment is called an instance of the database.

ATTRIBUTE/FIELD
It is defined as a named column of a relation.
Ex: In the STUDENT table, Regno, Name, Age, Class, Combination and Marks are attributes.

RELATION
• A relation is defined as a table with columns and rows. Data can be stored in the form of a two-dimensional table.

DOMAIN
It is defined as the set of allowed values for one or more attributes.

TABLE
A table is a collection of data elements organized in terms of rows and columns. A table is the simplest form of data storage.

KEY
It is a column or set of columns that identifies each row or tuple.
DATA TYPES OF DBMS

• Integer

• Logical data type/Boolean

• Characters

• Strings

• Date fields

• Text fields

• Memo data type


DATABASE USERS.
Many people are involved in designing, using and maintaining a database. The people who work with the database include:
End Users, System Analysts, Application Programmers, Database Administrators (DBA)

End Users (Database Users)
Database users are those who interact with the database in order to query and update it, and to generate reports.

System Analysts
System analysts determine the requirements of end users (especially naïve users) in order to create a solution for their business needs, focusing on both non-technical and technical aspects.

Application Programmers
These are the computer professionals who implement the specifications given by the system analysts and develop the application programs.

Database Administrators (DBA)
A DBA is a person who has central control over both data and applications.
Some of the responsibilities of a DBA are authorizing access, schema definition and modification, new software installation, and security enforcement and administration.
DBMS – DATA BASE MANAGEMENT SYSTEM

• A DBMS is software that allows the creation, definition and manipulation of a database.

• A DBMS is a tool used to perform any kind of operation on the data in a database.

• A DBMS also provides protection and security for the database.
FEATURES OF DATABASE SYSTEM.

• Controlled Data Redundancy: In a DBMS environment, if redundancy is present, it can be controlled by propagating updates to all the places where the redundant data is present.

• Enforcing Data Integrity: Data integrity refers to the correctness of the data in the database. In other words, the data available in the database is reliable data.

• Data Sharing: In a DBMS, data is stored in a centralized database and all the permitted users can access the same piece of information at the same time.

• Database Security: A DBMS provides a variety of security mechanisms for users to protect the data they store in the database.

• Supports Concurrent Access: A DBMS supports concurrent access to the same data stored in the database by applying locking and timestamp mechanisms.

• Multiple User Interfaces: In order to meet the needs of users with different levels of technical knowledge, a DBMS provides different types of interfaces such as query languages, application program interfaces, and graphical user interfaces.

• Backup and Recovery: A DBMS provides backup and recovery subsystems that are responsible for recovery from hardware and software failures.
DATA ABSTRACTION.
A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained.

There are three levels of data abstraction:
Physical Level (Internal level)
Conceptual Level (Logical level)
View Level (External level)

Physical Level:
It is the lowest level of abstraction and describes how the data are actually stored.
The physical level describes complex low-level data structures in detail.
It contains the definition of the stored records, the method of representing the data fields, and the access aids used.

Conceptual Level:
It is the next higher level of abstraction and describes what data are stored in the database and what relationships exist among those data.
It also contains the method of deriving the objects in the conceptual view from the objects in the internal view.

External / View Level:
It is the highest level of abstraction and describes only part of the entire database.
It also contains the method of deriving the objects in the external view from the objects in the conceptual view.
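
The three levels can be illustrated with a small sketch using Python's built-in sqlite3 module. The table name `student`, the view name `student_names` and the sample rows are assumptions made up for this illustration; the physical level (how rows are laid out in storage) is left entirely to SQLite and never appears in the code.

```python
import sqlite3

# Conceptual level: the full logical table (hypothetical names and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 67)],
)

# View level: one group of users sees only part of the database.
# The view definition is the "method of deriving" the external objects
# from the conceptual ones.
conn.execute("CREATE VIEW student_names AS SELECT regno, name FROM student")

view_rows = conn.execute("SELECT * FROM student_names ORDER BY regno").fetchall()
print(view_rows)  # [(1, 'Asha'), (2, 'Ravi')]
```

A user of `student_names` never sees the `marks` column, which is the sense in which the view level describes only part of the database.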
Data Independence

The capacity to change the schema at one level without affecting the schema at the next higher level is called data independence.

• Two types of data independence are:
o Physical Data Independence
o Logical Data Independence

PHYSICAL DATA INDEPENDENCE.

• It is the capacity to change the internal level without having to change the schemas at the conceptual or external level.
• Changes to the internal schema may be needed because some physical files have to be reorganized.
• Physical data independence refers to the insulation of an application from the physical storage structure only, so it is easier to achieve than logical data independence.
• Topics related to physical data independence are:
o File Organization
o Database Architecture
o Database Models
DIFFERENCE BETWEEN SERIAL AND DIRECT ACCESS FILE ORGANIZATION.

• Serial File Organization:
o Organization is continuous and simple.
o Data processing that requires the use of all records is best suited to this method.

• Direct Access File Organization:
o The type of storage device used is comparatively expensive.
o It is less efficient in the usage of storage space compared to the sequential organization.
ISAM with example.
• The indexed sequential file organization is a combination of sequential file organization and an index file.
• It is also referred to as ISAM (indexed sequential access method).
• Data is stored physically in adjacent storage locations, and a logical relationship exists among the stored data through an ordering field.
• An additional file, called the index file, is created; it contains n records.
• Each record of the index file has two fields:
o The first field is of the same data type as the ordering key field, and
o The second field is a pointer to a disk block (a block address).
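
A minimal in-memory sketch of this idea follows. It assumes a block size of three records and made-up sample data; real ISAM files live on disk and the "pointer" would be a block address rather than a list position. The names `blocks`, `index` and `lookup` are hypothetical.

```python
from bisect import bisect_right

# Data file: records sorted on the ordering key, grouped into fixed-size blocks.
BLOCK_SIZE = 3  # assumed block capacity for this sketch
records = [(5, "Anu"), (12, "Bala"), (19, "Chitra"), (23, "Dev"),
           (31, "Esha"), (42, "Farid"), (57, "Gita")]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Index file: one entry per block = (first key in the block, block number).
index = [(block[0][0], block_no) for block_no, block in enumerate(blocks)]
index_keys = [key for key, _ in index]

def lookup(key):
    """Search the index first, then scan only the block it points to."""
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    block_no = index[pos][1]
    for rec_key, value in blocks[block_no]:
        if rec_key == key:
            return value
    return None

print(lookup(31))  # Esha
print(lookup(6))   # None
```

Here seven records need only three index entries, and every lookup goes through the index before touching the data blocks.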
ADVANTAGES AND DISADVANTAGES OF ISAM.

• Advantages
o Search time is less.
o There are fewer index entries than there are records in the data file.
o Quick access to the records even when the volume of records is high.

• Disadvantages
o An additional file (the index file) has to be created.
o Storage space is used up in creating and maintaining the index file.
o Retrieval of data is always indirect, because the search begins in the index file and then moves to the data file (no direct retrieval).
DBMS ARCHITECTURE.

The design of a Database Management System depends highly on its architecture.
It can be centralized, decentralized or hierarchical.

Database architecture is logically divided into three types:
Logical one-tier (1-tier) Architecture
Logical two-tier Client/Server Architecture
Logical three-tier Client/Server Architecture
LOGICAL ONE-TIER IN 1-TIER ARCHITECTURE:

The DBMS is the only entity; the user works directly on the DBMS and uses it.

Any changes made here are applied directly to the DBMS itself.

It does not provide handy tools for end users, so single-tier architecture is used mainly by database designers and programmers.
TWO-TIER CLIENT / SERVER ARCHITECTURE:

Two-tier Client / Server architecture is used for user interface programs and application programs that run on the client side.

An interface called ODBC (Open Database Connectivity) provides an API that allows client-side programs to call the DBMS.

Most DBMS vendors provide ODBC drivers. A client program may connect to several DBMSs.

Some variation of the client is also possible in this architecture. For example, in some DBMSs more functionality is transferred to the client, including the data dictionary, optimization, etc.
THREE-TIER CLIENT / SERVER ARCHITECTURE:

Three-tier Client / Server database architecture is a commonly used architecture for web applications. An intermediate layer called the Application Server or Web Server stores the web connectivity software and the business logic (constraints) part of the application, which is used to access the right amount of data from the database server.

This layer acts as a medium for sending partially processed data between the database server and the client.
Database Model.

A data model is a collection of conceptual tools for describing data, data relationships, data semantics and constraints.

A data model generally consists of:
Data model theory, which is a formal description of how data may be structured and used.
Data model instance, which is a practical data model designed for a particular application.

The process of applying model theory to create a data model instance is known as data modelling.

In the history of database design, three models have been in use:
Hierarchical Model
Network Model
Relational Model
Hierarchical data model.
The hierarchical data model organizes data in a tree structure.
In this data model, data is represented by a collection of records and the relationships are represented by links.
In this model each entity has only one parent but can have several children. At the top of the hierarchy there is only one entity, which is called the root node.

Advantages:
Simplicity: The relationship between the various layers is logically simple.
Data Security: The data security is provided by the DBMS.
Data Integrity: There is always a link between the parent segment and the child segments under it.
Efficiency: It is very efficient when the database contains a large number of one-to-many relationships and when the user requires a large number of transactions.

Disadvantages:
Implementation complexity
Database management problems
Lack of structural independence
Operational anomalies
Network data model.
In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model.
In this model, data is represented by a collection of records and the relationships are represented by links.
Each record is a collection of fields, each of which contains only one data value. A link is an association between two records.
In the network model, entities are organized in a graph, in which some entities can be accessed through several paths.

Advantages:
It is simple and easy to implement.
It can handle many relationships within the organization.
It has better data independence compared to the hierarchical model.

Disadvantages:
More complex system of database structure.
Lack of structural independence.
Relational Data Model.
• The relational data model was developed by E. F. Codd in 1970.
• Unlike the hierarchical and network models, there are no physical links.
• All data is maintained in the form of tables consisting of rows and columns.
• Each row (record) represents an entity and each column (field) represents an attribute of the entity.
• In this model, data is organized in two-dimensional tables called relations. The tables or relations are related to each other.
Relational Model Concepts
1. Attribute: Each column in a table. Attributes are the properties which define a relation, e.g., Student_Rollno, NAME, etc.
2. Table: In the relational model, relations are saved in table format, along with their entities. A table has two properties: rows and columns. Rows represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation schema: A relation schema represents the name of the relation with its attributes.
5. Degree: The total number of attributes in the relation is called the degree of the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance: A relation instance is a finite set of tuples in the RDBMS. Relation instances never have duplicate tuples.
9. Relation key: Every row has one or more attributes that identify it; these are called the relation key.
10. Attribute domain: Every attribute has a pre-defined set of values and scope, which is known as the attribute domain.
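
As a rough illustration of attribute, tuple, degree and cardinality, the sketch below builds a small table with Python's built-in sqlite3 module. The table name `student` and its rows are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 17), (2, "Ravi", 18), (3, "Meena", 17)],
)

cursor = conn.execute("SELECT * FROM student")
attributes = [col[0] for col in cursor.description]  # column names
tuples = cursor.fetchall()                           # one tuple per row

degree = len(attributes)    # number of attributes in the relation
cardinality = len(tuples)   # number of rows in the table
print(attributes, degree, cardinality)  # ['rollno', 'name', 'age'] 3 3
```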
E-R diagram.
Entity: An entity is represented using a rectangle.
Attribute: Attributes are represented by means of ellipses.
Relationship: A relationship is represented using a diamond-shaped box.
Three components of E-R model.

An E-R diagram is a visual representation of data that describes how data items are related to each other.
Entity:
An entity can be any object, place, person or class.
In an E-R diagram, an entity is represented using a rectangle.
Rectangles are named with the entity set they represent.
Attribute:
An attribute describes a property or characteristic of an entity.
Attributes are represented by means of ellipses.
Every ellipse represents one attribute and is directly connected to its entity (rectangle).
For example, Roll_No, Name and Birth date can be attributes of a student.
Relationship:
A relationship type is a meaningful association between entity types.
A relationship is represented using a diamond-shaped box.
There are three types of relationship that exist between entities:
Binary Relationship
Recursive Relationship
Ternary Relationship
Binary Relationship: It means a relation between two entities. This is further divided into three types.
1. One to One:
This type of relationship is rarely seen in the real world.

Generalization:
In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics.
For example, pigeon, sparrow and crow can all be generalized as Birds.

Specialization:
Specialization is the opposite of generalization.
In specialization, a group of entities is divided into subgroups based on their characteristics.
Take a group of persons, for example. A person has a name, date of birth, gender, etc.
Similarly, in a school database, persons can be specialized as teacher, student or staff, based on the role they play in the school as entities.
DATABASE KEYS

Types of Keys
A key is an attribute, or a set of attributes, of a table that is used to identify tuples/records of the table.
Primary key: A primary key uniquely identifies a tuple/record in a table. A primary key value cannot be duplicated across different records in a table.
Ex: Student_id and Bank_accno are examples of primary keys.
Candidate key: There may be more than one unique field in a table that can be selected as the primary key. All such fields that are unique for every row of the table are known as candidate keys.
Alternate key: Those candidate keys that are not selected as the primary key are known as alternate keys.
Foreign key: A field in a table that refers to the primary key of another table is known as a foreign key.
For ex: a student table may have student_id as its primary key and bank_accno as a foreign key.
Composite key: A key that consists of two or more attributes to identify a record in a table is known as a composite key.
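
The key types can be sketched with Python's built-in sqlite3 module. The `student` and `bank_account` tables follow the example in the text; the `email` column, the `marks` table and all sample values are assumptions added for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE bank_account (bank_accno INTEGER PRIMARY KEY, balance REAL)")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,                          -- primary key
        email      TEXT UNIQUE,                                  -- alternate key (hypothetical column)
        bank_accno INTEGER REFERENCES bank_account(bank_accno)   -- foreign key
    )""")
conn.execute("""
    CREATE TABLE marks (
        student_id INTEGER REFERENCES student(student_id),
        subject    TEXT,
        score      INTEGER,
        PRIMARY KEY (student_id, subject)                        -- composite key
    )""")

conn.execute("INSERT INTO bank_account VALUES (5001, 250.0)")
conn.execute("INSERT INTO student VALUES (1, 'asha@example.com', 5001)")

def try_insert(sql):
    """Run an INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

dup_pk = try_insert("INSERT INTO student VALUES (1, 'ravi@example.com', 5001)")  # duplicate primary key
bad_fk = try_insert("INSERT INTO student VALUES (2, 'ravi@example.com', 9999)")  # no such account
good = try_insert("INSERT INTO student VALUES (2, 'ravi@example.com', 5001)")
print(dup_pk, bad_fk, good)  # rejected rejected ok
```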
Data warehouse
• A data warehouse is a repository of an organization's electronically stored data.
• Data warehouses are designed to facilitate reporting and to support data analysis.
• The concept of data warehouses was introduced in the late 1980s.

Components of Data warehouse.

• The components of a data warehouse are:
o Data Source
o Data Transformation
o Reporting
o Metadata
• Additional components are dependent data marts, logical data marts and the operational data store.
DATA MINING

• Data mining is concerned with analysing data and picking out relevant information.
E. F. Codd was a computer scientist who invented the relational model for database management.

The relational database was created based on the relational model.

Rule Zero:
This rule states that for a system to qualify as an RDBMS, it must be able to manage the database entirely through its relational capabilities.
CODD'S RULES AND NORMALIZATION

Dr Edgar F. Codd, after extensive research on the relational model of database systems, proposed twelve rules which, according to him, a database must obey in order to be regarded as a true relational database.
These rules can be applied to any database system that manages stored data using only its relational capabilities. This is the foundation rule (Rule Zero), which acts as a base for all the other rules.

Rule 1: Information Rule
The data stored in a database, whether user data or metadata, must be a value of some table cell. Everything in a database must be stored in a table format.

Rule 2: Guaranteed Access Rule
Every single data element (value) is guaranteed to be accessible logically with a combination of table-name, primary-key (row value), and attribute-name (column value). No other means, such as pointers, can be used to access data.
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule because a NULL can be interpreted as one of the following: data is missing, data is not known, or data is not applicable.
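
A small sketch of how one SQL engine treats NULL uniformly, using Python's built-in sqlite3 module. The `student` table and its rows are invented for the example; the behaviour shown (comparison with NULL yields NULL, `IS NULL` tests for it, aggregates skip it) is standard SQL behaviour as implemented by SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", None), (3, "Meena", 0)],  # Ravi's marks are unknown
)

# NULL is not equal to anything, including itself: the comparison yields NULL.
equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the uniform way to test for a missing value.
missing = conn.execute("SELECT name FROM student WHERE marks IS NULL").fetchall()

# Aggregates skip NULLs, so a missing mark is not treated as zero.
average = conn.execute("SELECT AVG(marks) FROM student").fetchone()[0]

print(equals_null, missing, average)  # None [('Ravi',)] 41.0
```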

Rule 4: Active Online Catalog


The structure description of the entire database must be stored in an online catalog, known
as data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog which they use to access the database itself.
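
SQLite offers one concrete instance of such a catalog: its schema table `sqlite_master` can be read with ordinary SELECT statements. The sketch below uses Python's built-in sqlite3 module; the `student` table and `student_names` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")

# The catalog is queried with the same language used for ordinary data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
print(catalog)  # [('table', 'student'), ('view', 'student_names')]
```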

Rule 5: Comprehensive Data Sub-Language Rule


A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can be
used directly or by means of some application. If the database allows access to data without any
help of this language, then it is considered as a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the
system.

Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insertion, update, and deletion. This must not be limited to a single row; that is, it must also support union, intersection and minus operations to yield sets of data records.
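
The set-level operations named in this rule can be tried out with Python's built-in sqlite3 module. The `science` and `commerce` tables and their rows are assumptions for the example; SQLite spells the minus operation `EXCEPT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE science (name TEXT)")
conn.execute("CREATE TABLE commerce (name TEXT)")
conn.executemany("INSERT INTO science VALUES (?)", [("Asha",), ("Ravi",), ("Meena",)])
conn.executemany("INSERT INTO commerce VALUES (?)", [("Ravi",), ("Kiran",)])

def names(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

union = names("SELECT name FROM science UNION SELECT name FROM commerce")
intersection = names("SELECT name FROM science INTERSECT SELECT name FROM commerce")
minus = names("SELECT name FROM science EXCEPT SELECT name FROM commerce")

# One statement updates a whole set of rows, not a single record.
updated = conn.execute(
    "UPDATE science SET name = UPPER(name) WHERE name LIKE '%a'"
).rowcount

print(union)         # ['Asha', 'Kiran', 'Meena', 'Ravi']
print(intersection)  # ['Ravi']
print(minus)         # ['Asha', 'Meena']
print(updated)       # 2
```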

Rule 8: Physical Data Independence


The data stored in a database must be independent of the applications that access the database.
Any change in the physical structure of a database must not have any impact on how the data is
being accessed by external applications.

Rule 9: Logical Data Independence


The logical data in a database must be independent of its user's view (application). Any change in logical data must not affect the applications using it. For example, if two tables are merged or one is split into two different tables, there should be no impact or change on the user application. This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity
constraints can be independently modified without the need of any change in
the application. This rule makes a database independent of the front-end
application and its interface.

Rule 11: Distribution Independence


The end-user must not be able to see that the data is distributed over various
locations. Users should always get the impression that the data is located at
one site only. This rule has been regarded as the foundation of distributed
database systems.

Rule 12: Non-Subversion Rule


If a system has an interface that provides access to low-level records, then the
interface must not be able to subvert the system and bypass security and
integrity constraints.
Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics like insertion, update and deletion anomalies.
• Normalization divides a larger table into smaller tables and links them using relationships.
• The normal form is used to reduce redundancy in the database table.

Types of Normal Forms
There are four types of normal forms. The first, second and third normal forms are described below.
First Normal Form (1NF)
• A relation is in 1NF if every attribute contains only atomic values.
• It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attributes.
• First normal form disallows multi-valued attributes, composite attributes, and their combinations.
Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab
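
The same decomposition can be sketched in a few lines of Python. The rows are taken from the EMPLOYEE table in the text; the function name `to_1nf` is a hypothetical helper, and the sketch assumes phone numbers in one field are separated by commas.

```python
# Un-normalized rows: EMP_PHONE holds several values in one field.
employee = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]

def to_1nf(rows):
    """Emit one row per phone number so that every field is atomic."""
    result = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            result.append((emp_id, name, phone.strip(), state))
    return result

employee_1nf = to_1nf(employee)
for row in employee_1nf:
    print(row)
```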
Second Normal Form (2NF)
• For 2NF, the relation must be in 1NF.
• In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
Example: Let's assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table:

TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:

TEACHER_ID  TEACHER_AGE
25          30
47          35
83          38

TEACHER_SUBJECT table:

TEACHER_ID  SUBJECT
25          Chemistry
25          Biology
47          English
83          Math
83          Computer
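
The partial dependency and the decomposition can be checked with a short Python sketch. The rows come from the TEACHER table in the text; `determines` is a hypothetical helper that only tests whether a dependency holds in this sample data, which is weaker than proving it holds for the schema in general.

```python
teacher = [
    (25, "Chemistry", 30), (25, "Biology", 30), (47, "English", 35),
    (83, "Math", 38), (83, "Computer", 38),
]

def determines(rows, lhs, rhs):
    """True if, in these rows, the columns at indexes lhs determine those at rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# TEACHER_AGE (column 2) depends on TEACHER_ID (column 0) alone: a partial dependency.
partial = determines(teacher, [0], [2])
# SUBJECT (column 1) is not determined by TEACHER_ID alone.
id_gives_subject = determines(teacher, [0], [1])

# Decompose into the two 2NF tables, dropping duplicate rows.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})
teacher_subject = sorted({(tid, subject) for tid, subject, _ in teacher})

print(partial, id_gives_subject)  # True False
print(teacher_detail)             # [(25, 30), (47, 35), (83, 38)]
```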
Third Normal Form (3NF)
• A relation is in 3NF if it is in 2NF and does not contain any transitive dependency.
• 3NF is used to reduce data duplication. It is also used to achieve data integrity.
• If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal

Super keys in the table above: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.

EMPLOYEE table:

EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

EMPLOYEE_ZIP table:

EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
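
A short Python sketch of this decomposition follows, using the rows of the EMPLOYEE_DETAIL table from the text. ZIP codes are kept as strings so that leading zeros survive; the variable names are hypothetical. Joining the two smaller tables on EMP_ZIP gives back the original rows, which suggests that no information is lost for this sample data.

```python
employee_detail = [
    (222, "Harry", "201010", "UP", "Noida"),
    (333, "Stephan", "02228", "US", "Boston"),
    (444, "Lan", "60007", "US", "Chicago"),
    (555, "Katharine", "06389", "UK", "Norwich"),
    (666, "John", "462007", "MP", "Bhopal"),
]

# Project onto the two 3NF tables, dropping duplicate rows.
employee = sorted({(emp_id, name, zip_code)
                   for emp_id, name, zip_code, _, _ in employee_detail})
employee_zip = sorted({(zip_code, state, city)
                       for _, _, zip_code, state, city in employee_detail})

# Join the two tables back on EMP_ZIP and compare with the original.
zip_lookup = {zip_code: (state, city) for zip_code, state, city in employee_zip}
rejoined = [(emp_id, name, zip_code, *zip_lookup[zip_code])
            for emp_id, name, zip_code in employee]
lossless = sorted(rejoined) == sorted(employee_detail)

print(lossless)  # True
```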
