
Unit 3 Notes

This document discusses data management. It defines data management as collecting, organizing, protecting and storing an organization's data so it can be analyzed for business decisions. As organizations create data at unprecedented rates, data management solutions are essential to make sense of vast quantities of data. The document then discusses types of data management including data preparation, data pipelines, ETLs, data catalogs, data warehouses, data governance, data architecture, data security, and data modeling. It also discusses challenges of data management and best practices for effective data management.

Uploaded by Shivam Pal
Copyright © All Rights Reserved

UNIT 3

SUBMITTED BY:
PROF. LUCKY GUPTA
Management Department
RKGIT
What is Data Management?
Data management is the practice of collecting, organizing, protecting, and storing an organization’s data so it
can be analyzed for business decisions. As organizations create and consume data at unprecedented rates, data
management solutions become essential for making sense of the vast quantities of data. Modern data
management software helps ensure that reliable, up-to-date data is used to drive decisions. Such software helps
with everything from data preparation to cataloging, search, and governance, allowing people to quickly find the
information they need for analysis.
• Data management programs aim to complete these tasks in a cost-effective way and maintain the
security of the data they manage.
• Because organizations generate and consume data at ever-increasing rates, data management processes
are essential for turning large quantities of data into usable information. Successful data management
systems ensure that data is reliable, up to date, accessible to the people who use it, and secure from
attacks and leaks.
Types of Data Management
• Data preparation is used to clean and transform raw data into the right shape and format for
analysis, including making corrections and combining data sets.
• Data pipelines enable the automated transfer of data from one system to another.
• ETL (Extract, Transform, Load) processes extract data from a source system, transform it into the
required format, and load it into the organization’s data warehouse.
• Data catalogs help manage metadata to create a complete picture of the data, providing a summary
of its changes, locations, and quality while also making the data easy to find.
• Data warehouses are places to consolidate various data sources, contend with the many data types
businesses store, and provide a clear route for data analysis.
• Data governance defines standards, processes, and policies to maintain data security and integrity.
• Data architecture provides a formal approach for creating and managing data flow.
• Data security protects data from unauthorized access and corruption.
• Data modeling documents the structure of data and how it flows through an application or organization.
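The data preparation and ETL items in the list above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the CSV text, the `sales` table and the cleaning rules are hypothetical, and an in-memory SQLite database (through Python's standard `sqlite3` module) stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (note the messy values).
RAW_CSV = """region,amount
 north ,100
SOUTH,250
north,
east,75
"""

def extract(text):
    """Extract: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and lower-case region names, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record: skip it
        cleaned.append((row["region"].strip().lower(), int(amount)))
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the (hypothetical) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(warehouse, transform(extract(RAW_CSV)))
totals = dict(warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"))
print(totals)
```

Keeping extract, transform and load as separate functions mirrors the three ETL stages, so each stage can be tested or replaced on its own.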
Why data management is important
Data management is a crucial first step to employing effective data analysis at scale, which leads to
important insights that add value to your customers and improve your bottom line. With effective data
management, people across an organization can find and access trusted data for their queries. Some
benefits of an effective data management solution include:
• Visibility
Data management can increase the visibility of your organization’s data assets, making it easier for
people to quickly and confidently find the right data for their analysis. Data visibility allows your
company to be more organized and productive, allowing employees to find the data they need to better
do their jobs.
• Reliability
Data management helps minimize potential errors by establishing processes and policies for usage and
building trust in the data being used to make decisions across your organization. With reliable, up-to-
date data, companies can respond more efficiently to market changes and customer needs.
• Security
Data management protects your organization and its employees from data losses, thefts, and breaches
with authentication and encryption tools. Strong data security ensures that vital company information
is backed up and retrievable should the primary source become unavailable. Additionally, security
becomes even more important when your data contains personally identifiable information (PII), which
must be carefully managed to comply with consumer protection laws.
• Scalability
Data management allows organizations to scale their data and its uses effectively, with repeatable
processes that keep data and metadata up to date. When processes are easy to repeat, your organization
can avoid the unnecessary costs of duplication, such as employees conducting the same research over
and over again or re-running costly queries unnecessarily.
Challenges of data management
• Data variation
• Incorrect data
• Governance and storage
• Data security
• Skill shortage
Challenge 1: Increased data volumes
Every department within your organization has access to diverse types of data and specific needs to
maximize its value. Traditional models require IT to prepare the data for each use case and then maintain
the databases or files. As more data accumulates, it’s easy for an organization to become unaware of what
data it has, where the data is, and how to use it.
Challenge 2: New roles for analytics
As your organization increasingly relies on data-driven decision-making, more of your people are asked
to access and analyze data. When analytics falls outside a person’s skill set, understanding naming
conventions, complex data structures, and databases can be a challenge. If it takes too much time or
effort to convert the data, analysis won’t happen and the potential value of that data is diminished or
lost.

Challenge 3: Compliance requirements


Constantly changing compliance requirements make it a challenge to ensure people are using the right
data. An organization needs its people to quickly understand what data they should or should not be
using—including how and what personally identifiable information (PII) is ingested, tracked, and
monitored for compliance and privacy regulations.
Data Management Practices
1. Clearly identify your business goals
Just like in every business practice, the first step is identifying your organization’s goals. Setting goals
will help determine the process for collecting, storing, managing, cleaning, and analyzing data. Clearly
defined business objectives ensure you’re only keeping and organizing data relevant for decision-making
and prevent your data management software from becoming overcrowded and unmanageable.
2. Focus on the quality of data
You set up a data management system to provide your organization with reliable data, so put the
processes in place to improve the quality of that data. First set goals for streamlining data collection
and storage, then run regular accuracy checks so that data does not become outdated or stale in ways
that can negatively impact analytics. These processes should also identify
incorrect or inconsistent formatting, spelling errors, and other errors that will impact results. Training
team members on the proper process for inputting data and setting up data prep automation is another
way to ensure data is correct from the beginning.
3. Allow the right people to access the data
Having quality data is half the battle. You also need to make sure the right people can access that data
when and where they need it. Instead of issuing blanket rules for everyone in the company, it is often
smart to set up different levels of permissions so each person can access the relevant data to do their
jobs. It can be difficult to find the right balance between convenience and security, but if your team
cannot access the data they need efficiently, it can lead to a loss of time and money.
4. Prioritize data security
Data should be appropriately accessible inside your organization, but you must put protections in place
to keep your data secure from outsiders. Train your team members on how to handle data properly, and
ensure your processes meet compliance requirements. Be prepared for the worst-case scenario and have
a strategy in place for handling a potential breach. Finding the right data management software can
help keep your data secure and safe.
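The regular accuracy checks described under practice 2 (Focus on the quality of data) can be partly automated. The sketch below is hypothetical: the record fields, the e-mail pattern and the list of allowed cities are assumptions chosen only to show checks for missing values, inconsistent formatting and likely misspellings.

```python
import re

# Hypothetical reference data and validation rules for this sketch.
ALLOWED_CITIES = {"Delhi", "Noida", "Ghaziabad", "Gurugram"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$")

def check_record(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    for field in ("name", "email", "city"):
        if not record.get(field, "").strip():
            problems.append(f"missing {field}")
    email = record.get("email", "").strip()
    if email and not EMAIL_PATTERN.match(email):
        problems.append("badly formatted email")
    city = record.get("city", "").strip()
    if city and city not in ALLOWED_CITIES:
        problems.append(f"unknown city {city!r} (possible misspelling)")
    return problems

records = [
    {"name": "Rohan", "email": "rohan@example.com", "city": "Noida"},
    {"name": "Alina", "email": "alina@example", "city": "Gurgram"},
    {"name": "", "email": "x@example.com", "city": "Delhi"},
]
for rec in records:
    print(rec.get("name") or "<blank>", check_record(rec))
```

Running such checks when data is entered, as well as periodically, catches errors before they reach reports.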
Data Independence
Physical Data Independence
Physical data independence is the ability to change the physical (internal) level without affecting the
logical (conceptual) level. Using this property, we can, for example, change the storage device of the
database without affecting the logical schema.
Changes at the physical level may include:
• Using a new storage device, such as a hard disk or magnetic tape.
• Using a new data structure for storage.
• Using a different data access method or an alternative file organization technique.
• Changing the location of the database.
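Adding an index is one example of a physical-level change: it adds a new access path to the data without touching the logical schema. The sketch below assumes SQLite (through Python's `sqlite3` module) and a hypothetical `employee` table; the application's query returns the same rows before and after the index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data for this sketch.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee (name, salary) VALUES (?, ?)",
                 [("Adam", 35000), ("Alex", 15000), ("Stuart", 19080), ("Ross", 13000)])

# The application's query refers only to the logical schema.
QUERY = "SELECT name FROM employee WHERE salary > 15000 ORDER BY name"
before = conn.execute(QUERY).fetchall()

# Physical-level change: a new access method (index) on the salary column.
conn.execute("CREATE INDEX idx_employee_salary ON employee (salary)")
after = conn.execute(QUERY).fetchall()

print(before == after, after)
```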
Logical Data Independence
Logical data independence is the ability to change the logical (conceptual) schema without having to change
the external schemas (user views) or the application programs that use them.
The logical level describes what data is stored in the database and the relationships among the data; it
provides the global view of the data. The database administrator decides what information is kept in the
database at this level, while end users access the data through views built on top of it.
Users cannot manipulate the logical structure of the database.
Changes at the logical level may include:
• Changing a data definition.
• Adding, deleting, or modifying an attribute, entity, or relationship in the database.
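Logical data independence is commonly achieved through views: applications read an external view, so the conceptual schema can change underneath it. In this hypothetical SQLite sketch (Python's `sqlite3` module; the `student` table and `student_names` view are illustrative names), a new attribute is added to the table while the view, and therefore the application's result, stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical conceptual schema and data.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student (name) VALUES (?)", [("Stephan",), ("Rohan",)])

# External (user) view that the application depends on.
conn.execute("CREATE VIEW student_names AS SELECT id, name FROM student")
before = conn.execute("SELECT * FROM student_names ORDER BY id").fetchall()

# Logical-level change: add a new attribute to the table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after = conn.execute("SELECT * FROM student_names ORDER BY id").fetchall()

print(before == after, after)
```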
Data Consistency
Data consistency is the process of keeping information uniform as it moves across a network and
between various applications on a computer. There are typically three types of data consistency: point in
time consistency, transaction consistency, and application consistency.
Point in time
Point in time consistency deals with ensuring that all elements of a system are uniform at a specific
moment in time. This prevents loss of data during system crashes, improper shutdowns, and other
problems on the network. It functions by referencing pieces of data on the system via timestamps and
other markers of consistency, allowing the system to be restored to a specific moment in time with each
piece of data in its original place. Without point in time consistency, there would be no guarantee that all
information on a crashing computer could be restored to its pre-crash state.
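Transaction consistency
Transaction consistency is the consistency of a piece of data throughout a transaction that is in progress
on the computer. For example, a banking program might first read an end user's starting account balance;
transaction consistency ensures that this value stays correct for the rest of the transaction, so the final
balance is computed from the right starting point. Without transaction consistency, nothing entered into a
program remains reliable.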
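The banking example can be made concrete with a database transaction. In this sketch (SQLite through Python's `sqlite3` module; the `account` table and `transfer` helper are hypothetical), a transfer reads the starting balance and applies the debit and the credit inside one transaction. If the balance check fails, the whole transaction is rolled back, so a half-finished transfer is never kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical accounts for this sketch.
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
            # Debit first, then validate: a failed check must undo this update.
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            if balance < amount:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

print(transfer(conn, "A", "B", 30), balances(conn))   # succeeds
print(transfer(conn, "A", "B", 500), balances(conn))  # rolled back
```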

Application consistency
Application consistency is, in effect, transaction consistency between programs. For example, if the
banking program communicates with a tax program on the computer, application consistency means that
the information moving between the programs remains in its original state. Without application
consistency, the same problems arise as with flawed transaction consistency: there is no way to tell
whether a value entered into the system remains correct over time.
Data Redundancy
Data redundancy occurs when the same piece of data exists in multiple places, whereas data
inconsistency occurs when copies of the same data disagree, for example by holding different values or
using different formats in different tables. Unfortunately, data redundancy can cause data inconsistency,
which can leave a company with unreliable or meaningless information.
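A small sketch shows how redundancy turns into inconsistency. The tables and data are hypothetical: a customer's city is stored in a `customers` table and, redundantly, in every row of an `orders` table. Updating only one copy leaves the copies disagreeing, which a join query can detect (SQLite via Python's `sqlite3` module).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables for this sketch.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    -- Redundant design: the customer's city is copied into every order.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, city TEXT);
    INSERT INTO customers VALUES (1, 'Rohan', 'Noida');
    INSERT INTO orders VALUES (101, 1, 'Noida'), (102, 1, 'Noida');
""")

# The customer moves, but only the customers table is updated.
conn.execute("UPDATE customers SET city = 'Delhi' WHERE id = 1")

# Find orders whose copy of the city no longer matches the customer record.
mismatches = conn.execute("""
    SELECT o.id, o.city, c.city
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    WHERE o.city <> c.city
    ORDER BY o.id
""").fetchall()
print(mismatches)
```

Dropping the `city` column from `orders` and looking it up through `customer_id` instead removes this source of inconsistency.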

Data redundancy refers to the practice of keeping data in two or more places within a database or data
storage system. Intentional redundancy helps an organization continue operations or services if something
happens to its data, for example in the case of data corruption or data loss. The concept applies to areas
such as databases, computer memory and file storage systems.

Data redundancy can occur within an organization intentionally or accidentally. If done intentionally,
the same data is kept in different locations with the organization making a conscious effort to protect it
and ensure its consistency. This data is often used for backups or disaster recovery.
Benefits of Data Redundancy
Data redundancy has benefits or risks depending on the implementation. Potential benefits include the
following:
• Helps protect data. When data cannot be accessed, redundant data can help replace or rebuild
missing data.
• Data accuracy. Keeping multiple copies of the same data lets a data management system compare them
and detect differences, which helps keep the data accurate.
• Access speed. Some locations for data may be easier to access than others for an organization that
spans different physical areas. A person within an organization may access data from redundant
sources to have faster access to the same data.
Drawbacks of Data Redundancy
• Increase in database sizes. More storage space is needed for a redundant copy of a large amount
of data. A larger database may also cause longer load times or create confusion if employees do not
know where certain data is stored.

• Cost. More need for storage also means an increased cost in addition to any extra overhead or
resources needed to maintain and update redundant data.

• Data discrepancies. Storing data in multiple locations can cause discrepancies such as missing
records or incorrect values if the data is not continually updated.

• Corruption. Storing multiple copies of the same data increases the chance of data corruption.
Damaged data could result from errors in writing, reading, storage or processing of data.
Basic Database Administration
What is DBMS?
Database Management System (DBMS) is software for storing and retrieving users’ data while considering
appropriate security measures. It consists of a group of programs that manipulate the database. The DBMS accepts the
request for data from an application and instructs the operating system to provide the specific data. In large systems, a
DBMS helps users and other third-party software store and retrieve data.

A DBMS allows users to create their own databases as per their requirements. In a broader sense, a database
system includes the DBMS together with its users and the application programs that use the database. The DBMS
provides an interface between the data and the software application.
A simple example is a university database that maintains information about students, courses, and grades in a
university environment. The database is organized as five files:
1. The STUDENT file stores the data of each student.
2. The COURSE file stores data on each course.
3. The SECTION file stores information about the sections of a particular course.
4. The GRADE file stores the grades that students receive in the various sections.
5. The TUTOR file contains information about each professor.
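The five files of the university example can be expressed as relational tables. The column names and types below are assumptions for illustration only, since the notes list the files but not their fields. The sketch uses SQLite through Python's `sqlite3` module and shows the DBMS rejecting a grade for a student and section that do not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Hypothetical columns for the five files of the example.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tutor   (tutor_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT NOT NULL);
    -- A section is one offering of a course, taught by a tutor.
    CREATE TABLE section (
        section_id INTEGER PRIMARY KEY,
        course_id  TEXT NOT NULL REFERENCES course(course_id),
        tutor_id   INTEGER REFERENCES tutor(tutor_id)
    );
    -- A grade links a student to a section.
    CREATE TABLE grade (
        student_id INTEGER REFERENCES student(student_id),
        section_id INTEGER REFERENCES section(section_id),
        grade      TEXT,
        PRIMARY KEY (student_id, section_id)
    );
""")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

try:
    conn.execute("INSERT INTO grade VALUES (999, 1, 'A')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True  # no student 999 or section 1 exists, so the row is rejected

print(tables, fk_enforced)
```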
History of DBMS
Here are the important landmarks in the history of DBMS:
1960 – Charles Bachman designed the first DBMS system.
1970 – Edgar F. Codd proposed the relational model of data. (IBM’s hierarchical Information Management
System, IMS, dates from the late 1960s.)
1976 – Peter Chen coined and defined the Entity-Relationship model, also known as the ER model.
1980 – The relational model became a widely accepted database component.
1985 – Object-oriented DBMS developed.
1990s – Object orientation was incorporated into relational DBMS.
1992 – Microsoft shipped MS Access, a personal DBMS that went on to displace most other personal
DBMS products.
1995 – First Internet database applications.
1997 – XML applied to database processing; many vendors began to integrate XML into DBMS
products.
Characteristics of DBMS
The characteristics and properties of a Database Management System are as follows:
• Provides security and reduces redundancy
• Self-describing nature of a database system
• Insulation between programs and data (data abstraction)
• Support for multiple views of the data
• Sharing of data and multiuser transaction processing
• Allows entities and the relations among them to be represented as tables
• Follows the ACID properties (Atomicity, Consistency, Isolation, and Durability)
• Supports a multi-user environment that allows users to access and manipulate data in parallel
DBMS vs. Flat File
DBMS | Flat File Management System
Supports multi-user access | Does not support multi-user access
Designed to fulfill the needs of small and large businesses | Limited to small systems
Removes redundancy and maintains integrity | Has redundancy and integrity issues
Expensive up front, but the total cost of ownership is low in the long term | Cheaper
Easy to implement complicated transactions | No support for complicated transactions

Users of DBMS
The following are the various categories of DBMS users:
Category | Task
Application Programmers | Write programs in various programming languages to interact with databases.
Database Administrators | Responsible for managing the entire DBMS; also called the database admin or DBA.
End-Users | The people who interact with the database management system. They perform various operations on databases, such as retrieving, updating, and deleting data.
Popular DBMS Software
• MySQL
• Microsoft Access
• Oracle
• PostgreSQL
• dBASE
• FoxPro
• SQLite
• IBM DB2
• LibreOffice Base
• MariaDB
• Microsoft SQL Server

Application of DBMS
• Railway Reservation System − The railway reservation database plays a very important role by keeping records
of ticket bookings, train departure times and arrival status, and it informs passengers when trains are late.
• Library Management System − Libraries hold thousands of books, and it is very difficult to keep a record of all
of them in a copy or register. A DBMS makes it easy to track each book and maintain information such as book
issue dates, the name of the book, its author, and its availability.
• Banking − Banking is one of the main applications of databases. Banks handle thousands of transactions daily,
many of them without the customer going to the bank. This is possible because a DBMS manages all the bank
transactions.
• Universities and colleges − Examinations are now often conducted online, so universities and colleges use a DBMS
to store students' registration details, results, courses, and grades.
• Telecommunications − Telecommunication companies depend on a DBMS to store call details and monthly postpaid
bills.
• Credit card transactions − Purchases and other credit card transactions rely on a DBMS, which stores card
holders' information and keeps it secure.
• Social Media Sites − Users access social media platforms by filling in the required details. Many users sign up
daily on social websites such as Facebook, Pinterest and Instagram, and all the information related to these
users is stored and maintained with the help of a DBMS.
• Finance − Finance involves many tasks, such as storing sales data, holdings information, and financial
statements; all of these can be managed with database systems.
• Military − The DBMS plays a vital role in the military, which keeps records of soldiers and many other files
that must be kept secure and safe. A DBMS provides a high level of security for military information.
• Online Shopping − Online shopping saves the time of going to a shop. Products are added and sold, and purchase
information, invoices, and payments are recorded, with the help of a DBMS.
• Human Resource Management − Management keeps records of each employee's salary, tax and work through a DBMS.
• Manufacturing − Manufacturing companies make products and sell them daily, and a DBMS is used to keep records
of all those details.
• Airline Reservation System − Just like the railway reservation system, airlines need a DBMS to keep records of
flight arrivals, departures and delays.
Types of DBMS

The four main types of Database Management Systems are:
• Hierarchical database
• Network database
• Relational database
• Object-Oriented database
Hierarchical DBMS
• In a hierarchical database model, data is organized in a tree-like structure and stored hierarchically
(top-down or bottom-up). Data is represented using parent-child relationships: in a hierarchical DBMS, a
parent may have many children, but each child has only one parent.
Network Model
• The network database model allows each child to have multiple parents. It helps you to address the need to model
more complex relationships like the orders/parts many-to-many relationship. In this model, entities are organized
in a graph which can be accessed through several paths.
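The difference between the hierarchical and network models can be sketched with plain Python data structures. This illustrates the shape of the data only, not any particular hierarchical or network DBMS, and all names are hypothetical. In the hierarchy each record stores a single parent, so there is exactly one path up from any record; in the network, a part can be linked to several orders, so it can be reached through more than one path.

```python
# Hierarchical model: every child has exactly one parent (a tree).
# Hypothetical records for this sketch.
hierarchy_parent = {
    "Engineering": "University",
    "Management": "University",
    "CSE": "Engineering",
    "MBA": "Management",
}

def path_to_root(node):
    """Follow single parent links up to the root; there is only one path."""
    path = [node]
    while path[-1] in hierarchy_parent:
        path.append(hierarchy_parent[path[-1]])
    return path

# Network model: a child record may have several parents (a graph).
# Here each part belongs to one or more orders (many-to-many).
order_parts = {
    "Order1": ["Bolt", "Nut"],
    "Order2": ["Bolt", "Washer"],
}

def parents_of(part):
    """All orders that own this part, i.e. every access path to it."""
    return sorted(order for order, parts in order_parts.items() if part in parts)

print(path_to_root("CSE"))   # a single path to the root
print(parents_of("Bolt"))    # more than one parent
```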
Relational Model
• Relational DBMS is the most widely used DBMS model because it is one of the easiest to use. This model is based
on normalizing data into the rows and columns of tables. Data in the relational model is stored in fixed
structures (tables) and manipulated using SQL.
Object-Oriented Model
• In the object-oriented model, data is stored in the form of objects. The structure of each object is defined
by its class, which specifies the data it holds. This model defines a database as a collection of objects, each
storing both data member values and the operations on them.
Advantages of DBMS
• DBMS offers a variety of techniques to store & retrieve data
• DBMS serves as an efficient handler to balance the needs of multiple applications using the same data
• Uniform administration procedures for data
• Application programmers are never exposed to details of data representation and storage.
• A DBMS uses various powerful functions to store and retrieve data efficiently.
• Offers Data Integrity and Security
• The DBMS enforces integrity constraints to protect the data against invalid changes.
• A DBMS schedules concurrent access to the data so that users do not interfere with one another, for example
by locking data so that only one user can modify the same item at a time
• Reduced Application Development Time
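The integrity point can be illustrated with declared constraints: the DBMS itself rejects invalid rows, so every application gets the same protection. The `employee` table, its rules and the test rows below are hypothetical; the sketch uses SQLite through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with integrity constraints declared in the schema.
conn.execute("""
    CREATE TABLE employee (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        age    INTEGER CHECK (age BETWEEN 18 AND 65),
        salary INTEGER CHECK (salary > 0)
    )
""")

attempts = [
    (1, "Adam", 37, 35000),    # valid
    (2, None, 25, 15000),      # violates NOT NULL on name
    (3, "Kiran", 12, 13000),   # violates CHECK on age
    (1, "Meera", 25, 15000),   # duplicate primary key
]
accepted, rejected = [], []
for row in attempts:
    try:
        with conn:  # commit each valid insert; roll back a failed one
            conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        accepted.append(row[0])
    except sqlite3.IntegrityError as exc:
        rejected.append((row[0], str(exc)))

print("accepted:", accepted)
for row_id, reason in rejected:
    print("rejected", row_id, "-", reason)
```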
Disadvantages of DBMS
DBMS may offer plenty of advantages, but it has certain flaws:

• The cost of the hardware and software of a DBMS is quite high, which increases the budget of your organization.

• Most database management systems are complex, so users need training to use the DBMS.

• In some organizations, all data is integrated into a single database, which can be damaged by an electrical
failure or by corruption of the storage media.

• Multiple users using the same program at the same time can sometimes lead to data loss.

• A DBMS is not designed for sophisticated calculations.


Database Architecture

A database architecture is a representation of DBMS design. It helps to design, develop, implement, and
maintain the database management system. A DBMS architecture divides the database system into
individual components that can be independently modified or replaced. It also helps to
understand the components of a database.
A database stores critical information and helps access data quickly and securely. Therefore, selecting the
correct DBMS architecture helps in easy and efficient data management.

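TYPES OF DBMS ARCHITECTURE

There are mainly three types of DBMS architecture:
• One Tier Architecture (Single Tier Architecture): the database, the DBMS and the user's application all run
on the same machine.
• Two Tier Architecture: a client application communicates directly with a separate database server.
• Three Tier Architecture: an application server sits between the client and the database server, so the
client never accesses the database directly.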
Database Schema
A database schema is the skeleton structure that represents the logical view of the entire database. It defines
how the data is organized and how the relations among them are associated. It formulates all the constraints
that are to be applied on the data.
A database schema defines its entities and the relationship among them. It contains a descriptive detail of the
database, which can be depicted by means of schema diagrams. It’s the database designers who design the
schema to help programmers understand the database and make it useful.
A database schema can be divided broadly into two categories −
Physical Database Schema − This schema pertains to the actual storage of data and its form of storage like
files, indices, etc. It defines how the data will be stored in a secondary storage.
Logical Database Schema − This schema defines all the logical constraints that need to be applied on the
data stored. It defines tables, views, and integrity constraints.
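Both kinds of schema can be seen in ordinary SQL DDL. In this hypothetical SQLite sketch (Python's `sqlite3` module; all table, view and index names are illustrative), the CREATE TABLE and CREATE VIEW statements with their constraints belong to the logical schema, while CREATE INDEX is a physical-level decision about how rows are found. SQLite records all of them in its built-in `sqlite_master` catalog even though no data rows exist yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical schema: tables, integrity constraints and views (hypothetical names).
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    CREATE VIEW employee_directory AS
        SELECT e.name AS employee, d.name AS department
        FROM employee AS e JOIN department AS d ON d.dept_id = e.dept_id;
    -- Physical schema decision: an index that changes how rows are located.
    CREATE INDEX idx_employee_dept ON employee (dept_id);
""")

# List user-defined schema objects (skipping SQLite's internal auto-indexes).
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(catalog, row_count)
```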

DATABASE INSTANCE
It is important to distinguish between a database schema and a database instance. A database schema is the
skeleton of the database. It is designed when the database doesn't exist at all. Once the database is
operational, it is very difficult to make any changes to it. A database schema does not contain any data or
information.
A database instance is a state of operational database with data at any given time. It contains a snapshot of the
database. Database instances tend to change with time. A DBMS ensures that its every instance (state) is in a
valid state, by diligently following all the validations, constraints, and conditions that the database designers
have imposed.
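The schema/instance distinction can be shown by taking snapshots of one table at different times. In this hypothetical SQLite sketch (Python's `sqlite3` module; the table and rows are illustrative), the schema, a single table with a CHECK constraint, never changes, each snapshot is a different instance, and an insert that would make the instance invalid is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: defined once, before any data exists (hypothetical table).
conn.execute("CREATE TABLE employee (name TEXT NOT NULL, age INTEGER CHECK (age >= 18))")

def snapshot():
    """The current instance: the set of rows stored at this moment."""
    return conn.execute("SELECT name, age FROM employee ORDER BY name").fetchall()

instance_1 = snapshot()                                      # empty database
conn.execute("INSERT INTO employee VALUES ('Adam', 37)")
instance_2 = snapshot()                                      # after one insert
try:
    conn.execute("INSERT INTO employee VALUES ('Kid', 10)")  # violates the CHECK
except sqlite3.IntegrityError:
    pass
instance_3 = snapshot()                                      # unchanged, still valid

print(instance_1, instance_2, instance_3)
```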
Fields, Records, Table, View, Reports and Queries
• Table: In the relational database model, a table is a collection of data elements organized in terms of rows
and columns. A table is also considered a convenient representation of a relation, but a table can have
duplicate rows of data while a true relation cannot. A table is the simplest form of data storage. Below is
an example of an Employee table.

S. No Name Age Salary ($)


1 Adam 37 35000
2 Alex 25 15000
3 Stuart 28 19080
4 Ross 20 13000

• Fields: A field consists of a grouping of characters. A data field represents an attribute (a characteristic or
quality) of some entity (object, person, place, or event).
• Record: A single entry in a table is called a tuple, record, or row. A tuple in a table represents a set of
related data. For example,

2 Alex 25 15000
• Attributes: A table consists of several records (rows), and each record can be broken down into several smaller
parts of data known as attributes. The above Employee table consists of four attributes: S. No, Name, Age and Salary.
• File: A group of related records is a file. Files are frequently classified by the application for which they
are primarily used.
• Primary key: A primary key in a file is the field whose value uniquely identifies a record among the others in
a data file.
• Database: An integrated collection of logically related records or files.
• View: A database view is a subset of a database and is based on a query that runs on one or more database tables.
Database views are saved in the database as named queries and can be used to save frequently used, complex
queries. Views are virtual tables: a view has rows and columns just like a real table in the
database. We can create a view by selecting fields from one or more tables present in the database. A view can
contain either all the rows of a table or only the rows that satisfy a certain condition.
StudentDetails
STU_ID Name Address
1 Stephan Delhi
2 Rohan Noida
3 Vidisha Ghaziabad
4 Alina Gurugram

StudentMarks
STU_ID Name Marks Age
1 Stephan 97 19
2 Rohan 86 21
3 Vidisha 90 18
4 Alina 75 18
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

view_name: Name for the view
table_name: Name of the table
condition: Condition to select rows

Creating a view from a single table

CREATE VIEW DetailsView
AS SELECT NAME, ADDRESS
FROM StudentDetails
WHERE STU_ID < 5;
To see the data in the View, we can query the view in the same manner as we query a table.

SELECT * FROM DetailsView;

Name Address
Stephan Delhi
Rohan Noida
Vidisha Ghaziabad
Alina Gurugram
In this example, we create a view named StudentNames from the table StudentDetails, with the rows ordered
by name. Query:
CREATE VIEW StudentNames
AS SELECT STU_ID, NAME
FROM StudentDetails ORDER BY NAME;

SELECT * FROM StudentNames;

STU_ID Name
4 Alina
2 Rohan
1 Stephan
3 Vidisha
Creating View from multiple tables:
CREATE VIEW MarksView
AS SELECT StudentDetails.NAME, StudentDetails.ADDRESS,
StudentMarks.MARKS
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;

To display data of View MarksView:

SELECT * FROM MarksView;


Name Address Marks
Stephan Delhi 97
Rohan Noida 86
Vidisha Ghaziabad 90
Alina Gurugram 75
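The multiple-table view above can be tried without a database server. This sketch loads the StudentDetails and StudentMarks sample data into an in-memory SQLite database (through Python's `sqlite3` module) and creates MarksView with the same definition. SQLite is an assumption here; other DBMS products may differ in small details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentDetails (STU_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT);
    CREATE TABLE StudentMarks (STU_ID INTEGER PRIMARY KEY, NAME TEXT, MARKS INTEGER, AGE INTEGER);
    INSERT INTO StudentDetails VALUES
        (1, 'Stephan', 'Delhi'), (2, 'Rohan', 'Noida'),
        (3, 'Vidisha', 'Ghaziabad'), (4, 'Alina', 'Gurugram');
    INSERT INTO StudentMarks VALUES
        (1, 'Stephan', 97, 19), (2, 'Rohan', 86, 21),
        (3, 'Vidisha', 90, 18), (4, 'Alina', 75, 18);

    -- Same definition as MarksView in the notes.
    CREATE VIEW MarksView AS
    SELECT StudentDetails.NAME, StudentDetails.ADDRESS, StudentMarks.MARKS
    FROM StudentDetails, StudentMarks
    WHERE StudentDetails.NAME = StudentMarks.NAME;
""")

# A view is queried like a table; here the rows are sorted by marks.
rows = conn.execute("SELECT * FROM MarksView ORDER BY MARKS DESC").fetchall()
for name, address, marks in rows:
    print(name, address, marks)
```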
DELETING VIEWS

DROP VIEW view_name;

view_name: Name of the view which we want to delete.

For example, if we want to delete the view MarksView, we can do this as:

DROP VIEW MarksView;


UPDATING VIEWS
A view can be updated (that is, rows can be inserted, updated, or deleted through it) only if certain conditions
are satisfied; if any of them is not met, the view cannot be updated. The exact rules vary between DBMS
products, but commonly:
1. The SELECT statement used to create the view should not include a GROUP BY clause or an
ORDER BY clause.
2. The SELECT statement should not have the DISTINCT keyword.
3. The view should include all NOT NULL columns of the underlying table.
4. The view should not be created using nested queries or complex queries.
5. The view should be created from a single table. If the view is created using multiple tables, then we
will not be allowed to update the view.
To change the definition of an existing view, use CREATE OR REPLACE VIEW:
CREATE OR REPLACE VIEW view_name
AS SELECT column1, column2, ...
FROM table_name
WHERE condition;

For example, to add the AGE column to MarksView:

CREATE OR REPLACE VIEW MarksView
AS SELECT StudentDetails.NAME, StudentDetails.ADDRESS, StudentMarks.MARKS,
StudentMarks.AGE
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;

SELECT * FROM MarksView;


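Inserting a row through a view:

INSERT INTO view_name(column1, column2, column3, ...)
VALUES(value1, value2, value3, ...);

view_name: Name of the view

For example:

INSERT INTO DetailsView(NAME, ADDRESS)
VALUES('Suresh', 'Gurgaon');

SELECT * FROM DetailsView;

The new row is stored in StudentDetails; it appears in DetailsView only if it also satisfies the view's
WHERE condition (STU_ID < 5).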


DELETE FROM view_name
WHERE condition;

view_name: Name of the view from which we want to delete rows
condition: Condition to select rows

DELETE FROM DetailsView WHERE NAME = 'Suresh';

SELECT * FROM DetailsView;
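Support for changing data through a view differs between products. In SQLite, for instance, views are read-only, and an INSERT into a view works only if an INSTEAD OF trigger says what to do with the row. The sketch below (Python's `sqlite3` module; the trigger name `details_view_insert` is hypothetical) shows this for a view similar to DetailsView.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentDetails (STU_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT);
    INSERT INTO StudentDetails VALUES (1, 'Stephan', 'Delhi'), (2, 'Rohan', 'Noida');
    CREATE VIEW DetailsView AS SELECT NAME, ADDRESS FROM StudentDetails;
""")

# Without a trigger, SQLite refuses to modify a view.
try:
    conn.execute("INSERT INTO DetailsView (NAME, ADDRESS) VALUES ('Suresh', 'Gurgaon')")
    plain_insert_ok = True
except sqlite3.DatabaseError:
    plain_insert_ok = False

# Hypothetical trigger: route inserts on the view to the underlying table.
conn.execute("""
    CREATE TRIGGER details_view_insert INSTEAD OF INSERT ON DetailsView
    BEGIN
        INSERT INTO StudentDetails (NAME, ADDRESS) VALUES (NEW.NAME, NEW.ADDRESS);
    END
""")
conn.execute("INSERT INTO DetailsView (NAME, ADDRESS) VALUES ('Suresh', 'Gurgaon')")
stored = conn.execute("SELECT STU_ID, NAME FROM StudentDetails ORDER BY STU_ID").fetchall()
print(plain_insert_ok, stored)
```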
