Unit 3 Notes
SUBMITTED BY:-
PROF. LUCKY GUPTA
Management department
rkgit
What is Data Management?
Data management is the practice of collecting, organizing, protecting, and storing an organization’s data so it
can be analyzed for business decisions. As organizations create and consume data at unprecedented rates, data
management solutions become essential for making sense of the vast quantities of data. Today’s leading data
management software ensures that reliable, up-to-date data is always used to drive decisions. The software helps
with everything from data preparation to cataloging, search, and governance, allowing people to quickly find the
information they need for analysis.
• Data management programs aim to complete these tasks in a cost-effective way and maintain the
security of the data they manage.
• As technology progresses, organizations and companies generate and consume data at unprecedented
rates, so data management processes are essential for turning large quantities of data into useable
information. Successful data management systems ensure data is reliable, up to date, accessible to
those who use it and secure from attacks and leaks.
Types of Data Management
• Data preparation is used to clean and transform raw data into the right shape and format for
analysis, including making corrections and combining data sets.
• Data pipelines enable the automated transfer of data from one system to another.
• ETLs (Extract, Transform, Load) are built to take the data from one system, transform it, and
load it into the organization’s data warehouse.
• Data catalogs help manage metadata to create a complete picture of the data, providing a summary
of its changes, locations, and quality while also making the data easy to find.
• Data warehouses are places to consolidate various data sources, contend with the many data types
businesses store, and provide a clear route for data analysis.
• Data governance defines standards, processes, and policies to maintain data security and integrity.
• Data architecture provides a formal approach for creating and managing data flow.
• Data security protects data from unauthorized access and corruption.
• Data modeling documents the flow of data through an application or organization.
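The data preparation and ETL steps listed above can be sketched as a small program. This is only an illustration under stated assumptions: the raw records, the `orders` table, and the in-memory SQLite database standing in for a data warehouse are all hypothetical.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    {"id": "1", "customer": " alice ", "amount": "120.50"},
    {"id": "2", "customer": "BOB", "amount": "80"},
    {"id": "3", "customer": "", "amount": "not-a-number"},  # unusable row
]


def extract():
    """Extract: return the raw rows from the (hypothetical) source."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: clean names, convert types, drop rows that cannot be repaired."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows with an invalid amount
        if not name:
            continue  # skip rows with no customer name
        cleaned.append((int(row["id"]), name, amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


# In-memory SQLite database used as a stand-in for a data warehouse.
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT id, customer, amount FROM orders ORDER BY id"
).fetchall()
print(loaded)
```

Only the two rows that survive cleaning reach the warehouse table; the third row is dropped during the transform step.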
Why data management is important
Data management is a crucial first step to employing effective data analysis at scale, which leads to
important insights that add value to your customers and improve your bottom line. With effective data
management, people across an organization can find and access trusted data for their queries. Some
benefits of an effective data management solution include:
• Visibility
Data management can increase the visibility of your organization’s data assets, making it easier for
people to quickly and confidently find the right data for their analysis. Data visibility allows your
company to be more organized and productive, allowing employees to find the data they need to better
do their jobs.
• Reliability
Data management helps minimize potential errors by establishing processes and policies for usage and
building trust in the data being used to make decisions across your organization. With reliable, up-to-
date data, companies can respond more efficiently to market changes and customer needs.
• Security
Data management protects your organization and its employees from data losses, thefts, and breaches
with authentication and encryption tools. Strong data security ensures that vital company information
is backed up and retrievable should the primary source become unavailable. Additionally, security
becomes more and more important if your data contains any personally identifiable information that
needs to be carefully managed to comply with consumer protection laws.
• Scalability
Data management allows organizations to effectively scale data and usage occasions with repeatable
processes to keep data and metadata up to date. When processes are easy to repeat, your organization
can avoid the unnecessary costs of duplication, such as employees conducting the same research over
and over again or re-running costly queries unnecessarily.
Challenges of data management
• Data variation
• Incorrect data
• Governance and storage
• Data security
• Skill shortage
Challenge 1: Increased data volumes
Every department within your organization has access to diverse types of data and specific needs to
maximize its value. Traditional models require IT to prepare the data for each use case and then
maintain the databases or files. As more data accumulates, it’s easy for an organization to become
unaware of what data it has, where the data is, and how to use it.
Challenge 2: New roles for analytics
As your organization increasingly relies on data-driven decision-making, more of your people are asked
to access and analyze data. When analytics falls outside a person’s skill set, understanding naming
conventions, complex data structures, and databases can be a challenge. If it takes too much time or
effort to convert the data, analysis won’t happen and the potential value of that data is diminished or
lost.
Application consistency is nothing more than transaction consistency between programs. For
example, if the banking program communicates with a tax program on the computer, application
consistency means that the information moving between the programs will remain in its original state.
Without application consistency, the same problems arise here as do under flawed transaction
consistency: there will be no way to tell whether a value entered into the system remains correct over
time.
Data Redundancy
Data redundancy occurs when the same piece of data exists in multiple places, whereas data
inconsistency is when the same data exists in different formats in multiple tables. Unfortunately, data
redundancy can cause data inconsistency, which can provide a company with unreliable and/or
meaningless information.
Data redundancy refers to the practice of keeping data in two or more places within a database or data
storage system. Data redundancy ensures an organization can provide continued operations or services
in the event something happens to its data -- for example, in the case of data corruption or data loss. The
concept applies to areas such as databases, computer memory, and file storage systems.
Data redundancy can occur within an organization intentionally or accidentally. If done intentionally,
the same data is kept in different locations with the organization making a conscious effort to protect it
and ensure its consistency. This data is often used for backups or disaster recovery.
Benefits of Data Redundancy
Data redundancy has benefits or risks depending on the implementation. Potential benefits include the
following:
• Helps protect data. When data cannot be accessed, redundant data can help replace or rebuild
missing data.
• Data accuracy. Keeping the same data in multiple locations lets a data management system compare
the copies and flag any differences, which helps verify that the data is accurate.
• Access speed. Some locations for data may be easier to access than others for an organization that
spans different physical areas. A person within an organization may access data from redundant
sources to have faster access to the same data.
Drawbacks of Data Redundancy
• Increase in database sizes. More storage space is needed for a redundant copy of a large amount
of data. A larger database may also cause longer load times or create confusion if employees do not
know where certain data is stored.
• Cost. More need for storage also means an increased cost in addition to any extra overhead or
resources needed to maintain and update redundant data.
• Data discrepancies. Storing data in multiple locations can cause discrepancies such as missing
records or incorrect values if the data is not continually updated.
• Corruption. Storing multiple copies of the same data increases the chance of data corruption.
Damaged data could result from errors in writing, reading, storage or processing of data.
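The "data discrepancies" drawback can be illustrated with a short sketch that compares two copies of the same data. The record IDs, e-mail addresses, and the function name `find_discrepancies` are hypothetical and used only for illustration.

```python
def find_discrepancies(primary, replica):
    """Compare two copies of the same data, keyed by record ID.

    Returns the IDs missing from either copy and the IDs whose values differ.
    """
    primary_ids, replica_ids = set(primary), set(replica)
    return {
        "missing_in_replica": sorted(primary_ids - replica_ids),
        "missing_in_primary": sorted(replica_ids - primary_ids),
        "mismatched": sorted(
            key for key in primary_ids & replica_ids
            if primary[key] != replica[key]
        ),
    }


# Hypothetical data: the replica was not updated after the primary changed.
primary_copy = {1: "alex@example.com", 2: "rita@example.com", 3: "sam@example.com"}
replica_copy = {1: "alex@example.com", 2: "rita@old-mail.example"}

report = find_discrepancies(primary_copy, replica_copy)
print(report)
```

Here record 3 is missing from the replica and record 2 holds a stale value, which are the two kinds of discrepancy named in the list above.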
Basic database Administration
What is DBMS?
Database Management System (DBMS) is software for storing and retrieving users’ data while considering
appropriate security measures. It consists of a group of programs that manipulate the database. The DBMS accepts the
request for data from an application and instructs the operating system to provide the specific data. In large systems, a
DBMS helps users and other third-party software store and retrieve data.
DBMS allows users to create their own databases as per their requirements. The term “DBMS” includes the user of
the database and other application programs. It provides an interface between the data and the software application.
Consider a simple example of a university database. This database maintains information about students, courses, and grades in a
university environment. The database is organized as five files:
1. The STUDENT file stores the data of each student.
2. The COURSE file stores data on each course.
3. The SECTION file stores information about the sections of a particular course.
4. The GRADE file stores the grades that students receive in the various sections.
5. The TUTOR file contains information about each professor.
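As a rough sketch, the five files above could be declared as tables in a relational DBMS. The notes do not list the fields of each file, so every column name below is an assumption made for illustration, as is the use of Python's built-in sqlite3 module.

```python
import sqlite3

# Column names are hypothetical; only the five table names come from the notes.
SCHEMA = """
CREATE TABLE STUDENT (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE TUTOR   (tutor_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE COURSE  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE SECTION (
    section_id INTEGER PRIMARY KEY,
    course_id  INTEGER NOT NULL REFERENCES COURSE(course_id),
    tutor_id   INTEGER REFERENCES TUTOR(tutor_id)
);
CREATE TABLE GRADE (
    student_id INTEGER NOT NULL REFERENCES STUDENT(student_id),
    section_id INTEGER NOT NULL REFERENCES SECTION(section_id),
    grade      TEXT,
    PRIMARY KEY (student_id, section_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that now exist in the database.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```

The SECTION and GRADE tables refer to the other tables by ID, which is one way to record that a section belongs to a course and that a grade belongs to a student in a section.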
History of DBMS
Here are the important landmarks in the history of DBMS:
1960 – Charles Bachman designed the first DBMS system.
1970 – E. F. Codd of IBM proposed the relational model; IBM’s Information Management System (IMS)
was an earlier, hierarchical DBMS.
1976 – Peter Chen coined and defined the Entity-relationship model, also known as the ER
model.
1980 – The relational model becomes a widely accepted database model.
1985 – Object-oriented DBMS develops.
1990s – Incorporation of object orientation in relational DBMS.
1992 – Microsoft ships MS Access, a personal DBMS that goes on to displace many other personal
DBMS products.
1995 – First Internet database applications.
1997 – XML applied to database processing. Many vendors begin to integrate XML into DBMS
products.
Characteristics of DBMS
The characteristics and properties of a Database Management System are as follows:
Provides security and removes redundancy
Self-describing nature of a database system
Insulation between programs and data abstraction
Support of multiple views of the data
Sharing of data and multiuser transaction processing
Database Management Software allows entities and relations among them to form tables.
It follows the ACID concept ( Atomicity, Consistency, Isolation, and Durability).
DBMS supports a multi-user environment that allows users to access and manipulate data in parallel.
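The atomicity part of ACID can be sketched with a money transfer that either completes fully or not at all. The `account` table, the account names, and the balances are hypothetical, and SQLite (through Python's sqlite3 module) is used only as a convenient stand-in for a DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts as one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


first_ok = transfer(conn, "A", "B", 30)    # valid transfer
second_ok = transfer(conn, "A", "B", 500)  # would overdraw A, so it is undone
print(first_ok, second_ok, balances(conn))
```

In the failed transfer the credit to B has already been applied when the debit from A is rejected; the rollback undoes the credit as well, so the balances stay as they were after the first transfer.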
DBMS vs. Flat File
DBMS | Flat File Management System
Designed to fulfill the needs of small and large businesses. | Limited to smaller systems.
Expensive, but the long-term Total Cost of Ownership is low. | Cheaper.
End-Users: The end users are the people who interact with the database management system. They
conduct various operations on databases, such as retrieving, updating, and deleting data.
Popular DBMS Software
• MySQL
• Microsoft Access
• Oracle
• PostgreSQL
• dBASE
• FoxPro
• SQLite
• IBM DB2
• LibreOffice Base
• MariaDB
Disadvantages of DBMS
• The cost of the hardware and software for a DBMS is quite high, which increases the budget of your organization.
• Most database management systems are complex, so users must be trained to use the DBMS.
• In some organizations, all data is integrated into a single database that can be damaged by an electrical failure
or by corruption in the storage media.
• When multiple users use the same program at the same time, data can sometimes be lost.
A Database Architecture is a representation of DBMS design. It helps to design, develop, implement, and
maintain the database management system. A DBMS architecture divides the database system into
individual components that can be modified or replaced independently. It also helps to
understand the components of a database.
A Database stores critical information and helps access data quickly and securely. Therefore, selecting the correct
Architecture of DBMS helps in easy and efficient data management.
DATABASE SCHEMA AND INSTANCE
It is important to distinguish between these two terms. A database schema is the skeleton of the database.
It is designed before the database exists. Once the database is operational, it is very difficult to
make any changes to the schema. A database schema does not contain any data.
A database instance is the state of an operational database, with its data, at a given time. It contains a snapshot of the
database. Database instances tend to change with time. A DBMS ensures that every instance (state) is
valid by enforcing all the validations, constraints, and conditions that the database designers
have imposed.
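The difference between schema and instance can be sketched as follows. The `employee` table and its age rule are hypothetical; the point is only that the schema stays fixed while the instance changes, and that the DBMS rejects data that would make the instance invalid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema: structure and constraints, no data.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "age INTEGER CHECK (age >= 18))"
)


def schema(conn):
    """Return the stored definition of the employee table."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'employee'"
    ).fetchone()
    return row[0]


def instance(conn):
    """Return the current contents (state) of the employee table."""
    return conn.execute("SELECT id, name, age FROM employee ORDER BY id").fetchall()


schema_before = schema(conn)
instance_empty = instance(conn)            # no rows yet

conn.execute("INSERT INTO employee VALUES (1, 'Alex', 25)")
instance_after = instance(conn)            # the instance has changed

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Sam', 12)")  # violates CHECK
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

schema_after = schema(conn)                # the schema has not changed
print(instance_empty, instance_after, rejected, schema_before == schema_after)
```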
Fields, Records, Table, View, Reports and Queries
• Table: In the relational database model, a table is a collection of data elements organized in rows and
columns. A table is also considered a convenient representation of a relation. However, a table can have duplicate
rows, while a true relation cannot. A table is the simplest form of data storage. Below is
an example of an Employee table.
• Fields: A field consists of a grouping of characters. A data field represents an attribute (a characteristic or quality)
of some entity (object, person, place, or event).
• Record: A single entry in a table is called a tuple, record, or row. A tuple in a table represents a set of related
data. For example,
1 Alex 25 15000
• Attributes: A table consists of several records (rows), and each record can be broken down into several smaller parts of
data known as attributes. The above Employee table consists of four attributes: ID, Name, Age, and Salary.
• File: A group of related records is a file. Files are frequently classified by the application for which they are
primarily used.
• Primary key: A primary key in a file is the field whose value identifies a record among others in a
data file.
• Database: It is an integrated collection of logically related records or files.
• View: A database view is a subset of a database and is based on a query that runs on one or more database tables.
Database views are saved in the database as named queries and can be used to save frequently used, complex
queries. Views are a kind of virtual table. A view has rows and columns just as a real table in the
database does. We can create a view by selecting fields from one or more tables present in the database. A view can
have either all the rows of a table or specific rows based on a certain condition.
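The terms above (table, record, attribute, primary key) can be tied together in a short sketch. It uses the Employee row from the notes (1, Alex, 25, 15000); the second row and the choice of SQLite through Python's sqlite3 module are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read attributes by name

# Table: four attributes (fields); ID is the primary key.
conn.execute(
    "CREATE TABLE Employee ("
    "ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)"
)

# Record: one row of related data.
conn.execute("INSERT INTO Employee VALUES (1, 'Alex', 25, 15000)")

record = conn.execute("SELECT * FROM Employee WHERE ID = 1").fetchone()
attributes = record.keys()  # attribute (column) names of the record

# Primary key: a second record with the same ID is rejected.
try:
    conn.execute("INSERT INTO Employee VALUES (1, 'Other', 30, 20000)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(attributes, tuple(record), duplicate_allowed)
```

Because the primary key value identifies exactly one record, looking up ID 1 always returns the same row.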
Sample tables: StudentDetails (shown below) and StudentMarks.
NAME      ADDRESS
Stephan   Delhi
Rohan     Noida
Vidisha   Ghaziabad
Alina     Gurugram
We will create a view named StudentNames from the
table StudentDetails. Query:
CREATE VIEW StudentNames
AS SELECT S_ID, NAME
FROM StudentDetails ORDER BY NAME;
Output (rows ordered by NAME, as the query requests):
S_ID NAME
4 Alina
2 Rohan
1 Stephan
3 Vidisha
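The StudentNames view can be tried out with a small runnable sketch. It assumes SQLite through Python's sqlite3 module, and it assumes the S_ID values 1 to 4 for the four students named in the notes. The ORDER BY is applied when querying the view, since not every DBMS accepts ORDER BY inside a view definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StudentDetails (S_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT)"
)
conn.executemany(
    "INSERT INTO StudentDetails VALUES (?, ?, ?)",
    [
        (1, "Stephan", "Delhi"),
        (2, "Rohan", "Noida"),
        (3, "Vidisha", "Ghaziabad"),
        (4, "Alina", "Gurugram"),
    ],
)

# The view stores the query, not a copy of the data.
conn.execute(
    "CREATE VIEW StudentNames AS SELECT S_ID, NAME FROM StudentDetails"
)

names = conn.execute(
    "SELECT S_ID, NAME FROM StudentNames ORDER BY NAME"
).fetchall()

# A change to the base table is visible through the view immediately.
conn.execute("UPDATE StudentDetails SET NAME = 'Rohan K' WHERE S_ID = 2")
updated_name = conn.execute(
    "SELECT NAME FROM StudentNames WHERE S_ID = 2"
).fetchone()[0]

print(names, updated_name)
```

The view exposes only S_ID and NAME, so a user querying StudentNames never sees the ADDRESS column.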
Creating View from multiple tables:
CREATE VIEW MarksView
AS SELECT StudentDetails.NAME, StudentDetails.ADDRESS,
StudentMarks.MARKS
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;
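A runnable sketch of the multi-table view follows. The contents of StudentMarks are not given in the notes, so the marks below are hypothetical values, and SQLite through Python's sqlite3 module is an assumed choice of DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StudentDetails (NAME TEXT, ADDRESS TEXT)")
conn.execute("CREATE TABLE StudentMarks (NAME TEXT, MARKS INTEGER)")

conn.executemany(
    "INSERT INTO StudentDetails VALUES (?, ?)",
    [
        ("Stephan", "Delhi"),
        ("Rohan", "Noida"),
        ("Vidisha", "Ghaziabad"),
        ("Alina", "Gurugram"),
    ],
)
# Hypothetical marks; Alina has no entry, so she will not appear in the view.
conn.executemany(
    "INSERT INTO StudentMarks VALUES (?, ?)",
    [("Stephan", 90), ("Rohan", 85), ("Vidisha", 78)],
)

# Same query as in the notes: rows are matched on the student's name.
conn.execute(
    """
    CREATE VIEW MarksView AS
    SELECT StudentDetails.NAME, StudentDetails.ADDRESS, StudentMarks.MARKS
    FROM StudentDetails, StudentMarks
    WHERE StudentDetails.NAME = StudentMarks.NAME
    """
)

marks_rows = conn.execute(
    "SELECT NAME, ADDRESS, MARKS FROM MarksView ORDER BY NAME"
).fetchall()
print(marks_rows)
```

Only students present in both tables appear in MarksView, because the WHERE clause keeps just the rows whose names match.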
DROP VIEW view_name;
view_name: Name of the view that we want to delete.
For example, if we want to delete the View MarksView, we can do this as: