Unit 1

The document discusses database modeling and design, focusing on the distinctions between data, information, and metadata, as well as the limitations of file-based systems compared to database management systems (DBMS). It outlines the ANSI-SPARC three-level architecture, emphasizing data independence and the importance of mapping between different schema levels. Additionally, it covers the database design life cycle, including conceptual, logical, and physical data modeling, and highlights the significance of understanding user requirements in the design process.

Uploaded by

kashyapmehak2008

Database Modelling and Design
Unit 1
Data and Information and Metadata
• Data
Data can be viewed as raw material consisting of unorganized facts
about things, events, activities, and transactions.
• Information
Information is data in context, that is, data that has been organized
into a specific context such that it has value to its recipient.
Information is not necessarily the “Truth”, since the same data yields
different information depending on the context; information is an inference.
• Database: The database is a place where you put your data; data that
you wish to convert to information at some future time.
• Database Management System: A DBMS is the software that converts
the data in your database to information. It is the DBMS that provides
you the capability for cross-referencing, correlating, sorting,
summarizing, etc.
• Metadata
Metadata, in a database environment, is data that describes the
properties of data. It contains a complete definition or description of
database structure (i.e., the file structure, data type, and storage
format of each data item), and other constraints on the stored data.
Metadata may be characterized as follows:
• The lens to view data and infer information
• A precise definition of the context for framing the data
File Processing Systems
• A file system organizes files and folders on a disk drive. Files and folders are collections of data
that are stored together.
• You can think of a folder as a container for files; if you want to find a particular file, you look
inside the folder.
• You can use a file system to store almost anything. You can save documents, spreadsheets, images,
music, videos, and even games.
• File systems are not designed to manage structured, interrelated data. They are designed to make it
easier to work with files and folders, and they know nothing about the meaning of a file's contents.
• File systems do not keep related data consistent. When the same fact is stored in several files,
each copy must be updated separately by the application program or by the user.
Limitations of File-based Approach
• Separation and isolation of data
• Each program maintains its own set of data. Users of one program may be
unaware of potentially useful data held by other programs.

• Duplication of data
• Same data is held by different programs. Wasted space and potentially
different values and/or different formats for the same item.
Limitations of File-based Approach
• Data dependence
• File structure is defined in the program code.
• Incompatible file formats
• Programs are written in different languages, and so cannot easily access each
other's files.
• Fixed Queries/Proliferation of application programs
• Programs are written to satisfy particular functions. Any new requirement
needs a new program.
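Limitations of File-based Approach
• Data access: each new kind of retrieval needs its own program, because there is no general query facility.
• Concurrency: simultaneous updates by several users are not coordinated.
• Security: access control is limited to what each application program enforces.
• Redundancy/Duplication: the same data is stored in several files.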
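The data dependence limitation can be illustrated with a minimal Python sketch. The fixed-width record layout, the field names, and the function `parse_customer_v1` are hypothetical and chosen only for illustration. Because the layout is hard-coded in the program, a change to the file structure makes the same program misread the data.

```python
# Hypothetical fixed-width customer record: 4-char id, 10-char name, 8-char city.
RECORD = f"{42:04d}{'Alice':<10}{'Pune':<8}"

def parse_customer_v1(line: str) -> dict:
    """Parse a record using offsets hard-coded in the program (data dependence)."""
    return {
        "id": int(line[0:4]),
        "name": line[4:14].strip(),
        "city": line[14:22].strip(),
    }

customer = parse_customer_v1(RECORD)

# The file layout changes: the name field grows from 10 to 12 characters.
# The unchanged program still runs, but it now splits the fields in the wrong place.
WIDER_RECORD = f"{42:04d}{'Christopher':<12}{'Pune':<8}"
misread = parse_customer_v1(WIDER_RECORD)
```

Every program that reads this file would need the same correction, which is the maintenance burden the database approach tries to remove by storing the data definition separately.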
Database Approach
• Arose because:
• Definition of data was embedded in application programs, rather than being
stored separately and independently.
• No control over access and manipulation of data beyond that imposed by
application programs.

• Result - the database and Database Management System (DBMS).


Database
• A shared collection of logically related data (and a description of this
data), designed to meet the information needs of an organization.

• System catalog (data dictionary or metadata) provides the description
of the data to enable program–data independence.
• Logically related data comprises entities, attributes, and relationships
of an organization's information.
Database Management System (DBMS)
• A software system that enables users to define, create, and maintain
the database and which provides controlled access to this database.
2 Tier and 3 Tier Architecture
• In 2-Tier there are only 2 layers
• Client layer (Application layer).
• Database server.
• The client layer holds the application and communicates with the database. For example,
at a service counter the counter person operates the client application, and the customer has no
communication with the database.
Advantages
• Limited users. Only authorized persons communicate with the database.
• Maintenance is low.
Disadvantages
• Low scalability.
• Low security. The client machine interacts directly with the database.
3 Tier
The client Layer is broken down into 2 layers
• 1st is the GUI or mobile application and 2nd is for the Business
process.
• The GUI Layer provides a graphical user interface for the End-user to
interact with the Database server.
• For the end-user, the GUI layer is the database system; the end-user is
unaware of the business process layer and the database server.
Advantages
• Scalability.
• Improved security.
Disadvantages
• High maintenance.
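A minimal Python sketch of the three tiers, assuming an in-memory SQLite database stands in for the database server. The `ticket` table and the functions `book_ticket` and `handle_form_submission` are hypothetical names used only for illustration.

```python
import sqlite3

# Data tier: an in-memory SQLite database stands in for the database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (id INTEGER PRIMARY KEY, passenger TEXT, seats INTEGER)"
)

def book_ticket(passenger: str, seats: int) -> int:
    """Business tier: validates the request, then talks to the database."""
    if seats < 1:
        raise ValueError("seats must be at least 1")
    cur = conn.execute(
        "INSERT INTO ticket (passenger, seats) VALUES (?, ?)", (passenger, seats)
    )
    conn.commit()
    return cur.lastrowid

def handle_form_submission(form: dict) -> str:
    """Presentation tier: turns user input into a business-tier call."""
    ticket_id = book_ticket(form["passenger"], int(form["seats"]))
    return f"Booked ticket #{ticket_id}"

message = handle_form_submission({"passenger": "Asha", "seats": "2"})
```

The presentation function never issues SQL itself, so the end user sees only the GUI layer, as described above.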
ANSI-SPARC Three-level Architecture
ANSI-SPARC (American National Standards Institute, Standards Planning and Requirements Committee)
• The three schema architecture is also called ANSI/SPARC architecture
or three-level architecture.
• This framework is used to describe the structure of a specific
database system.
• The three schema architecture is also used to separate the user
applications and physical database.
• The three schema architecture contains three levels: internal, conceptual,
and external. Each level describes the database at a different degree of abstraction.
Internal Schema/Level/View
• Schema: a description of the structure of the data, that is, metadata.
• View: a term that describes the information of interest to a user or a
group of users, where a user can be either an end user or a programmer.
• The internal level has an internal schema, which describes the physical
storage structure of the database.
• The internal schema is also known as a physical schema.
• It uses the physical data model. It defines how the data will
be stored in blocks.
• The physical level is used to describe complex low-level data structures in
detail.
Internal Schema
• The internal level is generally concerned with the following activities:
• Storage space allocations.
For example: B-trees, hashing, etc.
• Access paths.
For example: specification of primary and secondary keys, indexes, pointers, and sequencing.
• Data compression and encryption techniques.
• Optimization of internal structures.
• Representation of stored fields.
Conceptual Level/Schema

• The conceptual schema describes the design of a database at the
conceptual level. Conceptual level is also known as logical level.
• The conceptual schema describes the structure of the whole
database.
• The conceptual level describes what data are to be stored in the
database and also describes what relationship exists among those
data.
• In the conceptual level, internal details such as an implementation of
the data structure are hidden.
• Programmers and database administrators work at this level.
External Level/Schema

• At the external level, a database contains several schemas, sometimes called
subschemas. Each subschema describes a different view of the
database.
• An external schema is also known as a view schema.
• Each view schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group.
• The view schema describes the end user's interaction with the database system.
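As a rough illustration of an external schema, the sketch below uses Python's `sqlite3` module and an SQL view. The `employee` table, its columns, and the view name `employee_directory` are hypothetical. The view exposes only the part of the database one user group needs and hides the salary column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one table describing every employee attribute.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000.0), ("Ravi", "HR", 48000.0)],
)

# External level: a view for a user group that must not see salaries.
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
directory_columns = [col[0] for col in cursor.description]
directory_rows = cursor.fetchall()
```

A second user group, such as payroll, could be given a different view over the same table without either group seeing the other's subschema.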
Objectives of Three-Level Architecture Proposed by ANSI-SPARC
• DBA should be able to change database storage structures without affecting the
users' views.
• Internal structure of database should be unaffected by changes to physical aspects
of storage.
• DBA should be able to change conceptual structure of database without affecting
all users.
Data Independence/Data Abstraction
The very purpose of a three-schema architecture is to enable data
independence.
The concept of data independence is that when a schema at a lower level is
changed, the higher-level schemas themselves are unaffected by such
changes.
In other words, when a change is made to the storage structure or access
strategy in the internal schema, there is no need to make any changes
in the conceptual or external schemas.
Only the mapping information (i.e., the rules for transforming requests and results
between levels of schema) between the changed schema and the higher-level schemas
needs to be changed. Only then can it be said that data independence is fully
supported.
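The following sketch shows this kind of independence at the internal level, using Python's `sqlite3` module. The `product` table and the index name `idx_product_price` are hypothetical. An index (an access path) is added, and the same query, written against the conceptual schema, returns the same result before and after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("pen", 10.0), ("book", 120.0), ("bag", 450.0)],
)

QUERY = "SELECT name FROM product WHERE price > 50 ORDER BY name"
before = conn.execute(QUERY).fetchall()

# Change at the internal level: add an access path (index) on price.
conn.execute("CREATE INDEX idx_product_price ON product (price)")

# The query text is unchanged; only the way the DBMS locates rows may differ.
after = conn.execute(QUERY).fetchall()
```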
Mapping between Views

• The three levels of DBMS architecture don't exist independently of each other.
There must be correspondence between the three levels i.e. how they actually
correspond with each other. DBMS is responsible for correspondence between the
three types of schema. This correspondence is called Mapping.
• There are basically two types of mapping in the database architecture:
• Conceptual/ Internal Mapping
• External / Conceptual Mapping
• Conceptual/ Internal Mapping
• The Conceptual/ Internal Mapping lies between the conceptual level and the
internal level. Its role is to define the correspondence between the records and
fields of the conceptual level and files and data structures of the internal level.
• External/ Conceptual Mapping
• The External/Conceptual Mapping lies between the external level and the
conceptual level. Its role is to define the correspondence between a particular
external view and the conceptual view.
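One way to picture the external/conceptual mapping is as a view definition, sketched here with Python's `sqlite3` module. The tables `customer` and `customer_v2` and the view `customer_names` are hypothetical. When the conceptual schema changes, only the view definition (the mapping) is rewritten, and the external query stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO customer (full_name) VALUES ('Asha Rao')")
conn.execute("CREATE VIEW customer_names AS SELECT id, full_name FROM customer")

EXTERNAL_QUERY = "SELECT full_name FROM customer_names"
before = conn.execute(EXTERNAL_QUERY).fetchall()

# Conceptual schema change: the name is now stored as two columns.
conn.execute(
    "CREATE TABLE customer_v2 (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO customer_v2 (first_name, last_name) VALUES ('Asha', 'Rao')")

# Only the external/conceptual mapping (the view definition) is rewritten.
conn.execute("DROP VIEW customer_names")
conn.execute(
    "CREATE VIEW customer_names AS "
    "SELECT id, first_name || ' ' || last_name AS full_name FROM customer_v2"
)
after = conn.execute(EXTERNAL_QUERY).fetchall()
```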
Database System
• A system is generally defined as a set of interrelated components working together for
some purpose.
• A database system is a self-describing collection of interrelated data. A database
system includes data and metadata. Here are the properties of a database system:
• Data consists of recorded facts that have implicit meaning.
• Viewed through the lens of metadata, the meaning of recorded data becomes explicit.
• A database is self-describing in that the metadata is recorded within the database, not
in application programs.
• A database is a collection of files whose records are logically related to one another. In
contrast with that of a file-processing system, integration of data as needed is the
responsibility of the DBMS software instead of the programmer.
• Embedded pointers and various forms of indexes exist in the database system to
facilitate access to the data.
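The self-describing property can be observed in SQLite, used here only as a convenient example through Python's `sqlite3` module. The `student` table is hypothetical. The metadata is read back from the database itself, through the `sqlite_master` catalog table and `PRAGMA table_info`, rather than from the application program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The catalog table sqlite_master records the definition of each object.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
```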
Types of Database Systems
• A single-user database system supports only one user at a time. In other words, if
user A is using the database, users B and C must wait until user A has completed
his or her database work.
• A multi-user database system supports multiple users concurrently. If the multi-
user database supports a relatively small number of users (usually fewer than 50)
or a specific workgroup within an organization, it is called a workgroup database
system.
• If the database is used by the entire organization and supports many users (more
than 50, usually hundreds) across many locations, the database is known as an
enterprise database system.
• A distributed database (DDB) is a collection of multiple logically interrelated
databases that may be geographically dispersed over a computer network.
Components of DBMS
• A database management system (DBMS) is a collection of general-purpose
software that facilitates the processes of defining, constructing, and manipulating
a database.
• The major components of a DBMS include:
• One or more query languages
• Tools for generating reports
• Facilities for providing security, integrity, backup, and recovery
• A data manipulation language for accessing the database
• A data definition language used to define the structure of data
• Structured Query Language (SQL) plays an integral role in each of these
components.
• SQL is used in the data definition language (DDL) for creating the structure of
database objects such as tables, views, and synonyms.
• SQL statements are also generated by programming languages used to build
reports in order to access data from the database.
• People involved in the data administration function use data control languages
(DCLs) that make use of SQL statements to:
(a) Control the resource locking required in a multi-user environment
(b) Facilitate backup and recovery from failures
(c) Provide the security required to ensure that users access only the data that they
are authorized to use
• Data manipulation languages (DMLs) facilitate the retrieval, insertion, deletion,
and modification of data in a database.
• SQL is the most well-known nonprocedural DML and can be used to specify
many complex database operations in a concise manner.
• The access routines handle database access at run time by passing requests to the
file manager of the operating system to retrieve data from the physical files of the
database.
• Just as a dictionary is a reference book that provides information about the form, origin,
function, meaning, and syntax of words, a data dictionary in a DBMS
environment stores metadata that provides such information as the definitions of
the data items and their relationships, authorizations, and usage statistics.
• The DBMS makes use of the data dictionary to look up the required data
component structures and relationships, thus relieving application developers (end
users and programmers) from having to incorporate data structures and
relationships in their applications.
• Any changes made to the physical structure of the database are automatically
recorded in the data dictionary.
• The data repository is a collection of metadata about data models and application
program interfaces.
• CASE (computer-aided software engineering) tools such as Oracle Designer and
ERWIN that are used for developing a conceptual/logical schema interact with the
data repository and are independent of the database and the DBMS.
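The DDL and DML roles of SQL described above can be sketched with Python's `sqlite3` module. The `account` table and its rows are hypothetical. DCL statements are not shown, because SQLite does not implement `GRANT` and `REVOKE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a table.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

# DML: insertion, modification, deletion, and retrieval.
conn.execute("INSERT INTO account (owner, balance) VALUES ('Asha', 100.0)")
conn.execute("INSERT INTO account (owner, balance) VALUES ('Ravi', 250.0)")
conn.execute("UPDATE account SET balance = balance + 50 WHERE owner = 'Asha'")
conn.execute("DELETE FROM account WHERE owner = 'Ravi'")
rows = conn.execute("SELECT owner, balance FROM account").fetchall()
```

Each DML statement says what result is wanted and leaves the choice of access routine to the DBMS, which is what makes SQL nonprocedural.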
Advantages of Database System
Data Models and Database Design Life Cycle
• A data model is used to represent real-world phenomena.
• As simplified abstractions of reality, data models enable better understanding of
data specifications, such as data types, relationships, and constraints gathered
from user requirements.
• A data model should be a complete and accurate reflection of the data requirements of a
database system.
• A database represents some aspect of the real world called the universe of
interest. For a small organization, the universe of interest may be all
functionalities of the company (marketing, finance, accounting, production,
human resources, and so on).
Requirements specification
• During this step, systems analysts review existing documents and systems and
interview prospective users in an effort to identify the objectives to be supported
by the database system. The output of the requirements specification activity is a
set of data and process specifications.
• In order to define the data requirements, one needs to know the process
requirements— that is, what is going to be done with the data.
• For example, suppose a company is going to sell a product.
• What processes are involved?
• When a company sells a product, it bills the customers who purchase the product.
Then, shipping has to be notified to dispatch the product to the customer.
• Shipping also has to check the inventory and make sure that inventory levels are
adjusted as a result of sales.
• The inventory system must make sure that inventory levels are optimal and,
accordingly, replenish inventory periodically.
• Data is required in order to accomplish processes such as those just mentioned.
• Customers’ names, addresses, and telephone numbers are needed for billing
purposes.
• For shipping, a shipping address is required.
• In the inventory system one needs to know in which warehouse and in which bin a
particular product is located.
• One also needs to know quantity on hand, quantity on order, and the lead time
required by the supplier to fill an order.
• In short, data and process requirements go hand-in-hand.
• Data modeling and database design follow a life cycle that includes
three tiers:
• Conceptual data modeling
• Logical data modeling
• Physical data modeling.
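The billing, shipping, and inventory requirements described above can be written down as a small Python sketch in which data requirements appear as record types and process requirements appear as functions. All class, field, and function names, and the reorder rule in `needs_replenishment`, are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    """Data needed for billing and shipping."""
    name: str
    billing_address: str
    shipping_address: str
    telephone: str

@dataclass
class InventoryItem:
    """Data needed by shipping and by the inventory system."""
    product: str
    warehouse: str
    bin_location: str
    quantity_on_hand: int
    quantity_on_order: int
    supplier_lead_time_days: int

def record_sale(item: InventoryItem, quantity: int) -> None:
    """Process requirement: adjust inventory levels as a result of a sale."""
    if quantity > item.quantity_on_hand:
        raise ValueError("not enough stock on hand")
    item.quantity_on_hand -= quantity

def needs_replenishment(item: InventoryItem, reorder_point: int) -> bool:
    """Process requirement: decide whether stock should be replenished."""
    return item.quantity_on_hand + item.quantity_on_order < reorder_point

widget = InventoryItem("widget", "W1", "B7", 20, 0, 5)
record_sale(widget, 15)
```

Each function needs specific fields to do its work, which is why the data requirements cannot be settled without knowing the processes.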
Conceptual Data Modeling
• The conceptual data model describes the structure of the data to be stored in the database
without specifying how and where it will be physically stored or the methods used to retrieve it.
• The conceptual design activity is technology independent.
• During the conceptual design, the focus should be on capturing the user-specified business
rules in all their richness, unconstrained by the boundaries of the anticipated technology or
DBMS product that will be used for implementation.
• The product of the conceptual design activity is the conceptual schema.
• Several conceptual data modeling methods exist (for example, ER modeling and NIAM
modeling), each with its own specific grammar.
• Entity relationship (ER) modeling is a “design by analysis” modeling approach and is top-down
in nature, while NIAM (Nijssen Information Analysis Methodology) modeling is a
“design by synthesis” approach and is bottom-up in nature.
• The ER modeling technique is used here because of its popularity in database design.
• The conceptual model is portrayed in two progressive layers: the Presentation Layer ER model
and the Design-Specific ER model.
Logical Data Modeling
• Need:
Because of the technology-independent orientation of the conceptual design activity, it is
possible that the conceptual schema contains constructs not directly compatible
with the technology intended for implementation.
It is also possible that some of the design requires refinement to eliminate data
redundancy problems. Transforming the conceptual schema to a schema more
compatible with the implementation technology of choice therefore becomes necessary.
• The second tier of the data modeling activity is called logical design. The product
of the logical design activity is the logical schema.
• It is typically modeled using the hierarchical, network, or relational architecture.
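A common example of a conceptual construct that the relational architecture cannot hold directly is a many-to-many relationship. The sketch below uses Python's `sqlite3` module and hypothetical `student`, `course`, and `enrollment` tables, which are not taken from the slides. It shows one usual transformation, in which the relationship becomes a table of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- The many-to-many relationship becomes a table of its own.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# Retrieve the students linked to course 10 through the relationship table.
enrolled_in_databases = conn.execute(
    "SELECT s.name FROM student s "
    "JOIN enrollment e ON e.student_id = s.student_id "
    "WHERE e.course_id = 10 ORDER BY s.name"
).fetchall()
```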
Physical Data Modeling
• Here, the major task is to determine the internal storage structure and access
strategies for the database.
• While innovations in storage technology abound in the marketplace, the most robust
conceptual strategy for storage structures/architectures remains the idea of
parallelizing disk access using what is popularly known as RAID
(Redundant Array of Independent Disks).
• The concept is centered on using a large array of independent disk storage devices
to act as a single high-performance “logical disk.” Parallelism across the array of
disks is achieved using a technique called “data striping.”
• The transition from a logical schema to a physical design entails an intermediate
step of transforming the logical schema to a database language. In the relational
database architecture, this language is called SQL.
• The physical design activity is fully technology dependent.
• Physical design involves using the tools of a particular DBMS product to create
the database and to design and develop applications that address the high-level
requirements of the universe of interest.
• The objectives here are:
(a) Developing an appropriate structure for the database
(b) Keeping the focus on performance while determining the physical structure for the
database.
A good physical database design is impossible without the database designer
understanding the “job mix” for the particular application environment—that is, the
mix of transactions, queries, applications, etc.
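The data striping technique mentioned above for RAID can be sketched in a few lines of Python. This is a simplified model, assuming plain round-robin placement with no parity or mirroring. The functions `stripe` and `unstripe` are hypothetical, and the "disks" are ordinary Python lists.

```python
def stripe(data: bytes, n_disks: int, stripe_size: int) -> list[list[bytes]]:
    """Distribute consecutive stripe units round-robin across n_disks."""
    disks: list[list[bytes]] = [[] for _ in range(n_disks)]
    for i in range(0, len(data), stripe_size):
        unit = data[i:i + stripe_size]
        disks[(i // stripe_size) % n_disks].append(unit)
    return disks

def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading the disks in round-robin order."""
    out = bytearray()
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out += disk[row]
    return bytes(out)

layout = stripe(b"ABCDEFGHIJ", n_disks=3, stripe_size=2)
restored = unstripe(layout)
```

Because consecutive units sit on different disks, a real array can read or write them in parallel, which is where the performance gain comes from.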
Conceptual Data Modelling
The ER modeling grammar for conceptual modeling serves two major
purposes:
• As a communication/presentation device used by an analyst to
interact with the end-user community (the Presentation Layer ER
model/schema)
• As a design tool at the highest level of abstraction to convey a deeper-
level understanding to the database designer (the Design-Specific ER
model/ schema)
• An ER model includes
(1) An ER diagram (ERD) portraying entity types, attributes for each
entity type, and relationships among entity types
(2) Semantic integrity constraints that reflect the business rules about
data not captured in the ERD
The Presentation Layer ER
Model
• This layer of the ER model serves the principal purpose of communicating with
the end-user community.
• The Presentation Layer ERD is a surface-level expression of the application
domain, and the semantic integrity constraints are a reiteration of the business
rules that are not captured in the Presentation Layer ERD.
• As a high-level diagrammatic portrayal of the application domain, an ERD is not
capable of capturing some of the finer business rules that are part of the data
requirements.
• The specification of constraints is the mechanism to record the business rules not
captured in the ERD.
• Together, the ERD and the semantic integrity constraints must preserve all the
information conveyed in the data requirements of the application.
• The ERD coupled with the semantic integrity constraints represents the
conceptual schema for the application, and because the ER modeling grammar in
this case expresses the conceptual schema, the resulting script is referred to as the
ER model/schema.
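The pairing of an ERD with semantic integrity constraints can be sketched as a small Python data structure. The classes `EntityType`, `RelationshipType`, and `ERModel`, the sample entity types, and the `salary_positive` rule are all hypothetical and stand in for whatever the data requirements of an application actually state.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EntityType:
    name: str
    attributes: list[str]

@dataclass
class RelationshipType:
    name: str
    participants: tuple[str, str]

@dataclass
class ERModel:
    # The diagram part: entity types, their attributes, and relationships.
    entities: list[EntityType]
    relationships: list[RelationshipType]
    # Business rules the diagram cannot show, as named predicates on a record.
    constraints: dict[str, Callable[[dict], bool]] = field(default_factory=dict)

    def violations(self, record: dict) -> list[str]:
        """Return the names of the constraints that the record breaks."""
        return [name for name, rule in self.constraints.items() if not rule(record)]

model = ERModel(
    entities=[
        EntityType("EMPLOYEE", ["emp_id", "name", "salary"]),
        EntityType("DEPARTMENT", ["dept_id", "name"]),
    ],
    relationships=[RelationshipType("WORKS_IN", ("EMPLOYEE", "DEPARTMENT"))],
    constraints={"salary_positive": lambda r: r["salary"] > 0},
)
broken = model.violations({"emp_id": 1, "name": "Asha", "salary": -5})
```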
