
Chapter 5: File Organization

This document discusses different types of file organization, including heap, sorted, indexed, and hashed. It provides details on sequential, direct access, and indexed sequential file organization. Sequential organization stores records in order but searching is slow. Direct access allows random searching but is more expensive. Indexed sequential combines the two for faster sequential and random access. The document also covers topics like record storage, clustering for related data, and the goals of optimal access, efficient updates and storage.

Uploaded by

Javed Khan

DSM College Parbhani Chapter 5 : File Organization

Sr. No.   Chapter 5 : File organization
1.        Introduction
2.        Organization of records in file
3.        Types of file organization (Heap, Sorted, Indexed & Hashed)

Introduction
 File organization refers to the relationship of the key of the record to the
physical location of that record in the computer file.
 A file may be either a physical file or a logical file. A physical file
is a physical unit, such as a magnetic tape or a disk.
 A logical file on the other hand is a complete set of records for a specific
application or purpose.
 A logical file may occupy a part of physical file or may extend over more
than one physical file.
 There are three ways of organizing a file:

1. Sequential access file organization
2. Direct access file organization
3. Indexed sequential access file organization

1. Sequential access file organization:


 Storing records in sorted order in contiguous blocks within a file on tape or
disk is called sequential access file organization.
 In sequential access file organization, all records are stored in a sequential
order. The records are arranged in the ascending or descending order of a
key field.
 A search of a sequential file starts from the beginning of the file, and new
records can be added only at the end of the file.
 In a sequential file, it is not possible to add a record in the middle of the file
without rewriting the file.
Advantages of sequential file

 It is simple to program and easy to design.
 A sequential file makes the best use of storage space.
Shaikh Ejaz Ahmed M.Sc (CS)Page 1

Disadvantages of sequential file

 Processing a sequential file is time consuming.
 It has high data redundancy.
 Random searching is not possible.
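
The behaviour described for a sequential file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (roll number, name), the sample data and the function names sequential_search and append_record are hypothetical and do not come from the chapter.

```python
from typing import Optional

# Hypothetical records, kept in ascending order of the key field (roll number).
records = [(101, "Asha"), (104, "Bilal"), (109, "Chetan"), (115, "Divya")]


def sequential_search(file_records, key) -> Optional[tuple]:
    """Read records from the start of the file until the key is found."""
    for record in file_records:
        if record[0] == key:
            return record
        if record[0] > key:
            # Records are in ascending key order, so the key cannot appear later.
            return None
    return None


def append_record(file_records, record) -> None:
    """New records go at the end, so the key must be larger than the last key."""
    if file_records and record[0] <= file_records[-1][0]:
        raise ValueError("inserting in the middle requires rewriting the file")
    file_records.append(record)
```

Every search starts at the first record, which is why the cost grows with the position of the record in the file.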

2. Direct access file organization


 Direct access file organization is also known as random access or relative file
organization.
 In a direct access file, all records are stored on a direct access storage device
(DASD), such as a hard disk. The records are placed randomly throughout the
file.
 The records do not need to be in sequence because they are updated
directly and rewritten back in the same location.
 This file organization is useful for immediate access to large amounts of
information. It is used in accessing large databases.
 It is also called hashing.
Advantages of direct access file organization
 A direct access file supports online transaction processing (OLTP) systems, such as
an online railway reservation system.
 In a direct access file, sorting of the records is not required.
 It accesses the desired records immediately.
 It updates several files quickly.
 It has better control over record allocation.
Disadvantages of direct access file organization
 A direct access file does not provide a backup facility.
 It is expensive.
 It uses storage space less efficiently than a sequential file.
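
As a rough sketch of direct access, the Python code below reads and rewrites fixed-length records by jumping straight to their position. The record layout (a 4-byte id and a 20-byte name), the slot numbers and the helper names are assumptions made for illustration, and an in-memory byte stream stands in for a file on a DASD.

```python
import io
import struct

# Assumed fixed-length layout: 4-byte integer id followed by a 20-byte name.
RECORD = struct.Struct("<i20s")


def write_record(f, slot: int, rec_id: int, name: str) -> None:
    """Write a record directly at its slot without touching other records."""
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(rec_id, name.encode("ascii")))


def read_record(f, slot: int):
    """Jump straight to the slot and read one record."""
    f.seek(slot * RECORD.size)
    rec_id, raw_name = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\x00").decode("ascii")


# An in-memory byte stream with room for 10 records stands in for the disk file.
disk = io.BytesIO(bytes(RECORD.size * 10))
write_record(disk, 7, 507, "Meera")
write_record(disk, 2, 502, "Imran")
write_record(disk, 7, 507, "Meera K")  # update rewritten in the same location
```

Because every record has the same length, the position of slot n is simply n times the record size, so no other record has to be read on the way.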

3. Indexed sequential access file organization


 Indexed sequential access file organization combines sequential and direct access
file organization.
 In an indexed sequential access file, records are stored on a direct
access device, such as a magnetic disk, and are located by a primary key.
 This file can have multiple keys, and the keys can be alphanumeric. The key by
which the records are ordered is called the primary key.


 The data can be accessed either sequentially or randomly using the index. The
index is stored in a file and read into memory when the file is opened.
Advantages of Indexed sequential access file organization
 In an indexed sequential access file, both sequential and random access are
possible.
 It accesses the records very fast if the index table is properly organized.
 The records can be inserted in the middle of the file.
 It provides quick access for sequential and direct processing.
 It reduces the degree of the sequential search.
Disadvantages of Indexed sequential access file organization
 An indexed sequential access file requires unique keys and periodic
reorganization.
 Each data access or retrieval takes extra time because the index must be
searched first.
 It requires more storage space, since the index is stored along with the data.
 It is expensive because it requires special software.
 It is less efficient in the use of storage space as compared to other file
organizations.
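
One way to picture the combination of an index with sequentially ordered records is the sketch below. It assumes a sparse index that holds the lowest key of each block. The block size, the sample keys and the names lookup and scan are illustrative choices, not details given in the chapter.

```python
import bisect

BLOCK_SIZE = 3  # records per block; an arbitrary value for the sketch

# Records sorted by primary key and split into blocks.
sorted_records = [(k, f"rec{k}") for k in (5, 12, 18, 23, 31, 40, 47, 52)]
blocks = [sorted_records[i:i + BLOCK_SIZE]
          for i in range(0, len(sorted_records), BLOCK_SIZE)]

# Sparse index: the lowest key in each block, kept in memory.
index = [block[0][0] for block in blocks]


def lookup(key):
    """Random access: use the index to pick one block, then scan only that block."""
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None
    for record in blocks[pos]:
        if record[0] == key:
            return record
    return None


def scan():
    """Sequential access: read the blocks one after another in key order."""
    for block in blocks:
        yield from block
```

The index narrows a random lookup to a single block, while a full scan still returns the records in primary key order.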

Organization of records in file

 There are several ways of organizing records in files.


1. Heap file organization. Any record can be placed anywhere in the file
where there is space for the record. There is no ordering of records.
2. Sequential file organization. Records are stored in sequential order, based
on the value of the search key of each record.
3. Hashing file organization. A hash function is computed on some attribute of
each record. The result of the function specifies in which block of the file the
record should be placed. This is to be discussed in chapter 11, since it is closely
related to the indexing structure.
4. Clustering file organization. Records of several different relations can be
stored in the same file. Related records of the different relations are stored
on the same block so that one I/O operation fetches related records from all
the relations.

Sequential File Organization


1. A sequential file is designed for efficient processing of records in sorted
order on some search key.
o Records are chained together by pointers to permit fast retrieval in
search key order.
o Each pointer points to the next record in order.
o Records are stored physically in search key order (or as close to this
as possible).
o This minimizes the number of block accesses.
o Figure 10.15 shows an example, with bname as the search key.
2. It is difficult to maintain physical sequential order as records are inserted
and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses problems if there is no space where the new record should go.
o If there is space, use it; otherwise put the new record in an overflow block.
o Adjust the pointers accordingly.
o Figure 10.16 shows the previous example after an insertion.
o Problem: we now have some records out of physical sequential order.
o If very few records are in overflow blocks, this will work well.
o If order is lost, reorganize the file.
o Reorganizations are expensive and are done when system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order
and reorganize it whenever an insertion occurs. In this case, the pointer fields are no
longer required.
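
The pointer chain and the overflow area can be sketched as follows. This is a simplified model, not the structure shown in the figures: it assumes that every new record goes to the overflow area (the case where a free slot exists in the right block is left out), and the class name SequentialFile, its methods and the branch names are hypothetical.

```python
class SequentialFile:
    """Records kept in key order by a pointer chain, with an overflow area."""

    def __init__(self, sorted_keys):
        self._load(sorted_keys)

    def _load(self, sorted_keys):
        # Each slot is [key, next_slot]; None marks the end of the chain.
        self.slots = [[k, i + 1] for i, k in enumerate(sorted_keys)]
        if self.slots:
            self.slots[-1][1] = None
        self.head = 0 if self.slots else None

    def insert(self, key):
        """Append the record to the overflow area, then adjust the pointers."""
        self.slots.append([key, None])
        new = len(self.slots) - 1
        prev, cur = None, self.head
        while cur is not None and self.slots[cur][0] < key:
            prev, cur = cur, self.slots[cur][1]
        self.slots[new][1] = cur
        if prev is None:
            self.head = new
        else:
            self.slots[prev][1] = new

    def in_key_order(self):
        """Follow the pointers to read the records in search key order."""
        cur = self.head
        while cur is not None:
            yield self.slots[cur][0]
            cur = self.slots[cur][1]

    def physical_order(self):
        """Keys in the order in which they are physically stored."""
        return [slot[0] for slot in self.slots]

    def reorganize(self):
        """Rewrite the file so that physical order matches key order again."""
        self._load(list(self.in_key_order()))
```

After an insertion the pointer chain still yields the records in key order, although the new record sits physically at the end until the file is reorganized.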

Clustering File Organization

1. One relation per file, with fixed-length records, is good for small databases,
and it also reduces the code size.
2. Many large-scale DB systems do not rely directly on the underlying
operating system for file management. One large OS file is allocated to the DB
system and all relations are stored in that one file.
3. To efficiently execute queries involving a join of the depositor and customer
relations, one may store the depositor tuple for each cname near the customer
tuple for the corresponding cname, as shown in Figure 10.19.
4. This structure mixes together tuples from two relations, but allows for
efficient processing of the join.
5. If the customer has many accounts which cannot fit in one block, the
remaining records appear on nearby blocks. This file structure,
called clustering, allows us to read many of the required records using one
block read.


6. Our use of clustering enhances the processing of a particular join but may
result in slow processing of other types of queries, such as selection on
customer.
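
A minimal sketch of the clustering idea is given below. The sample tuples, the tags "customer" and "depositor", and the function names are assumptions made for illustration; a real system works with disk blocks rather than a Python list.

```python
from collections import defaultdict

# Hypothetical tuples for the two relations.
customer = [("Hayes", "Main", "Brooklyn"), ("Turner", "Putnam", "Stamford")]
depositor = [("Hayes", "A-102"), ("Hayes", "A-220"), ("Turner", "A-305")]


def build_clustered_file(customers, depositors):
    """Store each customer tuple followed by its depositor tuples."""
    accounts = defaultdict(list)
    for cname, account in depositors:
        accounts[cname].append(account)
    clustered = []
    for cust in customers:
        clustered.append(("customer", cust))
        for account in accounts[cust[0]]:
            clustered.append(("depositor", (cust[0], account)))
    return clustered


def join_for(clustered, cname):
    """Answer the join for one customer from neighbouring records."""
    result, current = [], None
    for kind, tup in clustered:
        if kind == "customer":
            current = tup
        elif current is not None and current[0] == cname:
            result.append(current + (tup[1],))
    return result


def select_customers(clustered):
    """A selection on customer alone has to skip the depositor records."""
    return [tup for kind, tup in clustered if kind == "customer"]


clustered_file = build_clustered_file(customer, depositor)
```

The join finds matching tuples next to each other, while a query on customer alone must pass over the interleaved depositor tuples, which is the trade-off noted in point 6.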

Types of file organization (Heap, Sorted, Indexed & Hashed)


 A database holds a large amount of data. The data is grouped into related sets
called tables, and each table holds many related records.
 A user sees these records in the form of tables on the screen.
 But these records are stored as files on the storage device. Usually one file
contains all the records of a table.
 Accessing the contents of these files in physical storage is not
that easy.
 The records are not stored as tables there, and SQL queries cannot work on them
directly.
 We need some access methods. To access these files, we need to store
them in a certain order so that it is easy to fetch the records.
 This is the same as an index in a book or a catalogue in a library, which helps
us to find the required topic or book respectively.
 Storing the files in a certain order is called file organization. The main
objectives of file organization are:
1. Optimal selection of records, i.e. records should be accessed as fast as
possible.
2. Any insert, update or delete transaction on records should be easy and
quick, and should not harm other records.
3. No duplicate records should be introduced as a result of an insert, update or
delete.
4. Records should be stored efficiently so that the cost of storage is minimal.
 There are various methods of file organization.
 A method may be efficient for certain types of access or selection
while being inefficient for others.
 Hence it is up to the programmer to decide the best-suited file organization
method depending on the requirements.

Heap file organization


 This is the simplest form of file organization.
 Here records are inserted at the end of the file as and when they arrive.
There is no sorting or ordering of the records.


 Once the data block is full, the next record is stored in a new block.
 This new block need not be the very next block.
 This method can select any block in storage to hold the new records.
 It is similar to the pile file in the sequential method, but here data blocks are not
selected sequentially. They can be any data blocks in storage.
 It is the responsibility of the DBMS to store the records and manage them.

 If a new record is inserted, then in the above case it will be inserted into data
block 1.

 When a record has to be retrieved from the database, in this method we
need to traverse from the beginning of the file until we get the requested
record.


 Hence fetching records from very large tables is time consuming.
 This is because there is no sorting or ordering of the records. We need to
check all the data.
 Similarly, if we want to delete or update a record, we first need to search for
the record.
 Again, searching for a record is similar to retrieving it: start from the beginning
of the file and continue until the record is fetched. If it is a small file, the record
can be fetched quickly.
 But the larger the file, the more time needs to be spent in fetching.
 In addition, when a record is deleted, it is removed from its data
block.
 But the space is not freed and cannot be reused. Hence as the number of
records increases, the file size also increases and the efficiency drops.
 For the database to perform better, the DBA has to free this unused space
periodically.
Advantages of Heap File Organization
1. It is a very good method of file organization for bulk insertion, i.e. when
a large amount of data needs to be loaded into the database at one time, this
method of file organization is best suited.
2. Records are simply inserted one after the other in the data blocks.
3. It is suited for very small files, as fetching records from them is fast. As
the file size grows, linear search for a record becomes time consuming.

Disadvantages of Heap File Organization


1. This method is inefficient for larger databases, as it takes time to
search for or modify a record.
2. Proper space management is required to boost the performance.
Otherwise there will be many unused data blocks lying around, and the file
size will simply keep growing.
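
The heap file behaviour described above (insert at the end, linear search, space that is not reused after a delete) can be sketched as follows. The class HeapFile, the block capacity of two records and the compact method are assumptions for illustration only. Unlike a real DBMS, this sketch always appends new blocks at the end of a list.

```python
BLOCK_CAPACITY = 2  # records per data block; an arbitrary value for the sketch


class HeapFile:
    def __init__(self):
        self.blocks = [[]]

    def insert(self, record):
        """Put the record in the last block, or start a new block if it is full."""
        if len(self.blocks[-1]) >= BLOCK_CAPACITY:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def search(self, key):
        """Linear search from the beginning; returns (record, slots examined)."""
        examined = 0
        for block in self.blocks:
            for record in block:
                examined += 1
                if record is not None and record[0] == key:
                    return record, examined
        return None, examined

    def delete(self, key):
        """Blank the slot; the space is not reused until compact() runs."""
        for block in self.blocks:
            for i, record in enumerate(block):
                if record is not None and record[0] == key:
                    block[i] = None
                    return True
        return False

    def compact(self):
        """Periodic clean-up: rewrite the file without the deleted slots."""
        live = [r for block in self.blocks for r in block if r is not None]
        self.blocks = [[]]
        for record in live:
            self.insert(record)
```

The search count grows with the number of slots in the file, including slots left empty by deletes, which is why the periodic clean-up matters.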

Sorted file Organization


 It is one of the simple methods of file organization. Here the records
are stored one after the other in a sequential manner. This can be achieved in
two ways:
 In the first method, records are stored one after the other as they are inserted
into the tables. This method is called the pile file method.
 When a new record is inserted, it is placed at the end of the file.

 In the case of any modification or deletion of a record, the record is first
searched for in the data blocks.
 Once it is found, it is marked for deletion and the new version of the record is
entered.

Inserting a new record:

 In the diagram above, R1, R2, R3 etc. are the records.
 They contain all the attributes of a row, i.e. a student record
will have the student's id, name, address, course, DOB etc. Similarly, R1, R2, R3 etc.
can each be considered as one full set of attributes.

 In the second method, records are sorted (in either ascending or descending order)
each time they are inserted into the system.
 This method is called the sorted file method. Sorting of records may be based
on the primary key or on any other column.
 Whenever a new record is inserted, it is first placed at the end of the file,
and then the file is sorted in ascending or descending order of the key value so
that the record is placed at the correct position.
 In the case of an update, the record is updated and then the file is sorted to place
the updated record in the right place. The same is the case with delete.


Inserting a new record:

Advantages:

 The design is very simple compared to other file organizations. Not
much effort is involved in storing the data.
 When there are large volumes of data, this method is very fast and efficient.
This method is helpful when most of the records have to be accessed, as in
calculating the grades of students or generating salary slips, where we
use all the records for our calculations.
 This method is good for report generation or statistical calculations.
 These files can be stored on magnetic tapes, which are comparatively cheap.

Disadvantages:

 The sorted file method always involves the effort of sorting the records. Each
time an insert, update or delete transaction is performed, the file is sorted.
 Hence identifying the record, inserting, updating or deleting it, and
then sorting the file always takes some time and may make the system slow.
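
The difference between the pile file method and the sorted file method can be sketched as follows. The record format and the function names are hypothetical. Python's bisect.insort is used here as a stand-in for "append, then re-sort": it finds the position by binary search and shifts the later records, so the end result is the same sorted file.

```python
import bisect

pile_file = []     # pile method: records kept in arrival order
sorted_file = []   # sorted method: records kept in key order


def pile_insert(record):
    """Pile method: the new record simply goes at the end of the file."""
    pile_file.append(record)


def sorted_insert(record):
    """Sorted method: the new record ends up at its position in key order."""
    bisect.insort(sorted_file, record)


def sorted_update_key(old_key, new_key):
    """Changing the key value means the record has to move to a new position."""
    pos = bisect.bisect_left(sorted_file, (old_key,))
    if pos == len(sorted_file) or sorted_file[pos][0] != old_key:
        raise KeyError(old_key)
    _, payload = sorted_file.pop(pos)
    bisect.insort(sorted_file, (new_key, payload))


for rec in [(3, "R3"), (1, "R1"), (2, "R2")]:
    pile_insert(rec)
    sorted_insert(rec)
```

Each insert or key update in the sorted file moves other records around, which is the extra effort listed under the disadvantages.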
Indexed file Organization
 In this file organization, the records of the file are stored one after another in
the order they are added to the file.


 In the unit "Introduction to direct access files", we showed how an Indexed
file is organized, and we noted:
 that the records in the Indexed file are held in sequence on ascending primary
key, and that this allows us to access the file sequentially on that key.
 that over these records the file system builds an index which allows direct
access to a record using the primary key.
 that an Indexed file may be read sequentially on any of its alternate keys.
 While we explained how the primary key index of an Indexed file is
organized, and how sequential access on the primary key is achieved, we did
not explain how the alternate indexes are organized or how the file can be
accessed sequentially on any of its alternate keys.
 In this section we revisit, and expand on, the explanation of how the primary
key index is organized and how it is used to read a record directly.
 In addition, we show how an alternate key index is arranged and how
this index is used to read the file directly.
 We also describe how sequential access on the alternate key is achieved.
 An indexed file contains records ordered by a record key.
 A record key uniquely identifies a record and determines the sequence in
which it is accessed with respect to other records.
 Each record contains a field that contains the record key.
 A record key for a record might be, for example, an employee number or an
invoice number.
 An indexed file can also use alternate indexes, that is, record keys that let
you access the file using a different logical arrangement of the records.
 For example, you could access a file through employee department rather
than through employee number.
 The possible record transmission (access) modes for indexed files are
sequential, random, or dynamic.
 When indexed files are read or written sequentially, the sequence is that of
the key values.
 EBCDIC consideration: As with any change in the collating sequence, if
your indexed file is a local EBCDIC file, the EBCDIC keys will not be
recognized as such outside of your COBOL program.
 For example, an external sort program, unless it also has support for
EBCDIC, will not sort records in the order that you might expect.
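
The roles of the record key and of an alternate index can be sketched as below, using the employee number and department example from the text. The sample records, the use of Python dictionaries as indexes and the function names are assumptions for illustration. They do not show how a COBOL runtime actually lays out an indexed file.

```python
from collections import defaultdict

# Hypothetical employee records: (employee_number, name, department).
records = [
    (1004, "Farah", "Sales"),
    (1001, "Gopal", "Accounts"),
    (1007, "Heena", "Sales"),
]

# Primary index: unique record key -> position of the record in the file.
primary_index = {rec[0]: pos for pos, rec in enumerate(records)}

# Alternate index: department -> positions of every record with that value.
alternate_index = defaultdict(list)
for pos, rec in enumerate(records):
    alternate_index[rec[2]].append(pos)


def read_random(emp_no):
    """Random access through the record key."""
    return records[primary_index[emp_no]]


def read_sequential():
    """Sequential access: records in ascending order of the record key."""
    return [records[primary_index[k]] for k in sorted(primary_index)]


def read_by_department(dept):
    """Access through the alternate key, a different logical arrangement."""
    return [records[pos] for pos in alternate_index.get(dept, [])]
```

The record key identifies exactly one record, whereas one alternate key value such as a department may lead to several records.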

Hashed file Organization


 In this method of file organization, a hash function is used to calculate the
address of the block in which a record is stored.


 The hash function can be any simple or complex mathematical function.
 The hash function is applied to some columns/attributes, either key or non-key
columns, to get the block address.
 Hence each record is stored at an address unrelated to the order in which it arrives.
This method is therefore also known as Direct or Random file organization.
 If the hash function is applied to a key column, then that column is called the
hash key, and if the hash function is applied to a non-key column, then the
column is called the hash column.

 When a record has to be retrieved, the address is generated from the hash key
column and the whole record is retrieved directly from that address.
 There is no need to traverse the whole file. Similarly, when a new record
has to be inserted, the address is generated from the hash key and the record is
inserted directly.
 The same is the case with update and delete. There is no need to search the
entire file or to sort it. Each record is stored at a random location in
storage.


 These types of file organization are useful in online transaction systems,
where retrieval, insertion and update should be fast.
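
A minimal sketch of a hashed file is shown below. It assumes an integer hash key and a simple remainder hash function, and it lets several records share one block. The number of blocks, the sample records and the function names are illustrative choices rather than details from the chapter.

```python
NUM_BLOCKS = 5  # arbitrary number of data blocks for the sketch

blocks = [[] for _ in range(NUM_BLOCKS)]


def block_address(hash_key: int) -> int:
    """A simple hash function: remainder of the key divided by the block count."""
    return hash_key % NUM_BLOCKS


def insert(hash_key: int, data: str) -> None:
    """Compute the address and store the record directly in that block."""
    blocks[block_address(hash_key)].append((hash_key, data))


def fetch(hash_key: int):
    """Compute the address and look only inside that one block."""
    for record in blocks[block_address(hash_key)]:
        if record[0] == hash_key:
            return record
    return None


def delete(hash_key: int) -> bool:
    """Compute the address and remove the record from that block."""
    block = blocks[block_address(hash_key)]
    for i, record in enumerate(block):
        if record[0] == hash_key:
            del block[i]
            return True
    return False


for key, name in [(104, "Jaya"), (101, "Kiran"), (110, "Lata"), (106, "Mohan")]:
    insert(key, name)
```

Insert, fetch and delete all touch a single block, whatever the size of the file, and no sorting step is needed.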

Advantages of Hash File Organization

 Records need not be sorted after any transaction. Hence the effort of
sorting is saved in this method.
 Since the block address is given by the hash function, accessing any record is very
fast. Similarly, updating or deleting a record is also very quick.
 This method can handle multiple transactions, as each record is independent
of the others, i.e. since the storage location of one record does not depend on
another record, multiple records can be accessed at the same time.
 It is suitable for online transaction systems like online banking, ticket
booking systems etc.
Disadvantages of Hash File Organization
 This method may accidentally delete data. For example, in a Student table,
when the hash field is the STD_NAME column and there are two students with the same
name, ‘Antony’, the same address is generated. In such a case, the older
record will be overwritten by the newer one.
 So there will be data loss. Thus hash columns need to be selected with
utmost care. Also, a correct backup and recovery mechanism has to be
established.
 Since all the records are stored randomly, they are scattered across storage.
Hence storage is not used efficiently.
 If we are searching for a range of data, then this method is not suitable,
because each record is stored at a random address.
 Hence a range search will not give a contiguous address range, and searching
will be inefficient. For example, searching for the employees with salaries from
20K to 30K will be inefficient.
 Searching for records with an exact name or value will be efficient. Searching for
students whose names start with ‘B’ will not be efficient, as it does not give the
exact name of the student.
 If there is a search on a column which is not a hash column, then the
search will not be efficient. This method is efficient only when the search
is done on the hash column. Otherwise, it will not be able to find the correct
address of the data.
 If there are multiple hash columns used to generate the address, say the name and
phone number of a person, then searching for a record using the phone number or
the name alone will not give correct results.
 If these hash columns are frequently updated, then the data block address
also changes accordingly. Each update generates a new address. This is
also not acceptable.
 The hardware and software required for storage management are costlier in
this case. Complex programs need to be written to make this method
efficient.
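
The range search problem can be illustrated with a small sketch. It assumes employee records hashed on a hypothetical emp_id column with a remainder hash function. The sample data and function names are made up for the example, and the second value returned by each function is the number of blocks that had to be read.

```python
NUM_BLOCKS = 4  # arbitrary number of data blocks for the sketch

# Hypothetical employee records hashed on emp_id: (emp_id, name, salary).
employees = [(11, "Nisha", 18000), (22, "Omar", 25000),
             (33, "Pooja", 29000), (44, "Ravi", 41000)]

blocks = [[] for _ in range(NUM_BLOCKS)]
for emp in employees:
    blocks[emp[0] % NUM_BLOCKS].append(emp)


def find_by_id(emp_id):
    """Exact match on the hash column: only one block is read."""
    block = blocks[emp_id % NUM_BLOCKS]
    return [e for e in block if e[0] == emp_id], 1


def find_by_salary_range(low, high):
    """Range search on a non-hash column: every block must be read."""
    matches, blocks_read = [], 0
    for block in blocks:
        blocks_read += 1
        matches.extend(e for e in block if low <= e[2] <= high)
    return sorted(matches), blocks_read
```

The hash function says nothing about where salaries between 20K and 30K are stored, so the range search degrades to reading the whole file.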
