UNIT VI- File Organization
A file is a collection of related records.
File organization refers to the way data is stored in a file. It is important
because it determines the access methods available, the efficiency of access, and
the storage devices that can be used.
Factors affecting file organizations:
• Fast data retrieval
• High throughput for processing data input and maintenance transactions,
• Efficient use of storage space,
• Protection from failures or data loss,
• Minimizing need for reorganization,
• Accommodating growth
• Security from unauthorized use.
Types of File Organizations:
1. Sequential access file organization
2. Direct access file organization
3. Indexed sequential access file organization
1. Sequential File Organization:
This is the simplest method of file organization. In this method, records are
stored one after another in sequence. In a sequential file, it is not possible to add a
record in the middle of the file without rewriting the file. This method can be
implemented in two ways:
1. Pile File Method:
This is a simple method in which records are stored in sequence, one after another,
in the order in which they arrive.
A new record is therefore always placed at the end of the file.
2. Sorted File Method:
In this method, the new record is first inserted at the end of the file, and the file is
then sorted in ascending or descending order. Records are sorted on the primary key
or on any other key.
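The two methods can be contrasted in a minimal sketch. It assumes an in-memory std::vector standing in for the file; the names Record, insertPile and insertSorted are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record layout; `id` plays the role of the ordering key.
struct Record {
    int id;
    std::string name;
};

// Pile file: the new record simply goes at the end, in arrival order.
void insertPile(std::vector<Record>& file, const Record& r) {
    file.push_back(r);
}

// Sorted file: append at the end, then re-sort the file on the key (ascending).
void insertSorted(std::vector<Record>& file, const Record& r) {
    file.push_back(r);
    std::sort(file.begin(), file.end(),
              [](const Record& a, const Record& b) { return a.id < b.id; });
}
```

Re-sorting after every insertion mirrors the description above; it also shows why insertion into a sorted file costs more than insertion into a pile file.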
Advantages of sequential file organization
• It is simple in design and requires little effort to store the data.
• With the pile file method, insertion is quick.
• With the sorted file method, searching can be done quickly using binary
search on the ordering field.
Disadvantages of sequential file organization
• Access is slow: we cannot jump directly to a required record but have to
move through the file sequentially.
• The sorted file method takes more time and space for sorting the records.
• Insertion and deletion are expensive, as a large amount of data has to be moved.
• When a record in an ordered file is modified, its position may change.
2. Direct access file organization
A direct access file is also known as a random access or relative file.
In a direct access file, all records are stored on a direct access storage device such as
a hard disk.
It can be implemented using hashing: a hash function computes the position at which
each record is stored.
The records are therefore scattered throughout the file at calculated positions.
The records do not need to be in sequence because each record is updated directly
and rewritten to the same location.
This file organization is useful for immediate access to large amounts of
information. It is used for accessing large databases.
Advantages of direct access file organization
• In a direct access file, sorting of the records is not required.
• It accesses the desired records immediately.
• It updates several files quickly.
• Offers flexibility as the memory can be accessed in any order, not just sequentially.
Disadvantages of direct access file organization
• It is complex and difficult to implement.
• It requires a complex hardware architecture to support direct access to the
memory.
3. Indexed Sequential File Organization:
An indexed sequential file consists of records that can be accessed sequentially.
Direct access is also possible.
We need to maintain two files:
1. The data file contains the records in sequential scheme.
2. The index file contains the primary key and its address in the data file.
• Each record in the index file consists of two fields: a key field and a pointer field
that points to the location of the record in the main file.
• To find a particular record, we search the index file for the desired key, which
can be done using binary search. Once the key is found, the actual record is
accessed from the data file with the help of the pointer.
• The index file is ordered on the primary key, but the data file is not in sorted order.
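The lookup through the index file can be sketched as below, assuming both files are held in memory as vectors. IndexEntry, DataRecord, buildIndex and lookup are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct DataRecord {
    int key;           // primary key
    std::string name;
};

struct IndexEntry {
    int key;           // key field
    std::size_t pos;   // pointer field: position of the record in the data file
};

// Builds an index ordered on the primary key; the data file order is untouched.
std::vector<IndexEntry> buildIndex(const std::vector<DataRecord>& data) {
    std::vector<IndexEntry> index;
    for (std::size_t i = 0; i < data.size(); ++i)
        index.push_back({data[i].key, i});
    std::sort(index.begin(), index.end(),
              [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; });
    return index;
}

// Binary search on the index, then one direct access into the data file.
// Returns nullptr when the key is not present.
const DataRecord* lookup(const std::vector<DataRecord>& data,
                         const std::vector<IndexEntry>& index, int key) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
                               [](const IndexEntry& e, int k) { return e.key < k; });
    if (it == index.end() || it->key != key) return nullptr;
    return &data[it->pos];
}
```

Walking the index from first entry to last visits the records in key order, while lookup gives direct access to a single record.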
Advantages of index sequential access file organization :
• Efficient for both sequential and random access: Indexed sequential files allow
for efficient sequential access as well as direct access to specific records through
the use of an index.
• In this method, each record in the index file holds the address of its data block, so
searching for a record in a huge database is quick and easy.
Disadvantages of index sequential access file organization :
• This method requires extra space on the disk to store the index.
• When new records are inserted, the files have to be reconstructed to
maintain the sequence.
• When a record is deleted, the space used by it needs to be released.
Otherwise, the performance of the database will degrade.
Difference between Sequential, Indexed Sequential and Direct Access File
organizations:

| Sequential files | Indexed files | Direct Access/Relative files |
|---|---|---|
| These files can be accessed only sequentially. | These files can be accessed sequentially as well as randomly with the help of the record key. | These files can be accessed randomly with the help of their relative record number. |
| The records are stored sequentially. | The records are stored based on the value of the RECORD-KEY, which is part of the data. | The records are stored by their relative address. |
| Records cannot be deleted and can only be stored at the end of the file. | It is possible to store the records in the middle of the file. | The records can be inserted at any given position. |
| It occupies less space, as the records are stored in continuous order. | It occupies more space. | It occupies more space. |
| It provides slow access, as in order to access any record all the previous records have to be accessed first. | It also provides slow access (but is fast compared to sequential access), as it takes time to search the index. | It provides fast access compared to the other two, as the record key identifies the record directly. |
| In sequential file organization, the records are read and written in sequential order. | In indexed file organization, the records are written in sequential order but can be read in sequential as well as random order. | In relative file organization, the records are written and read in sequential as well as random order. |
| There is no need to declare any KEY for storing and accessing the records. | One or more KEYS can be created for storing and accessing the records. | Only one unique KEY is declared for storing and accessing the records. |
File Opening modes in C++:
In C++, every file operation has a specific file mode. These file modes allow us
to create, read, write, append to, or modify a file. The file modes are defined in the
class ios. The following are the different modes in which a file on disk can be opened.
| File mode | Description |
|---|---|
| ios::in | Searches for the file and opens it in read mode (only if the file is found). |
| ios::out | Searches for the file and opens it in write mode. If the file is found, its content is overwritten. If the file is not found, a new file is created. Allows you to write to the file. |
| ios::app | Searches for the file and opens it in append mode, i.e. this mode allows you to append new data to the end of a file. If the file is not found, a new file is created. |
| ios::ate | Searches for the file, opens it, and positions the pointer at the end of the file by default. After opening, it allows you to modify the content of the file by placing the file pointer at the required position. |
| ios::trunc | Searches for the file and opens it, truncating (deleting) all of its content if the file is found. |
C++ offers us the following operations in file handling:
open(): Creating a file
read(): Reading data
write(): Writing new data
close(): Closing a file
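A short sketch of how these modes and operations fit together, using the standard <fstream> classes. The helper names writeFresh, appendLine and readLines are hypothetical, and the file is treated as plain text with one item per line.

```cpp
#include <fstream>
#include <string>
#include <vector>

// ios::out: creates the file, or overwrites its content if it already exists.
bool writeFresh(const std::string& path, const std::string& line) {
    std::ofstream out;
    out.open(path, std::ios::out);
    if (!out) return false;          // the file could not be opened
    out << line << '\n';
    out.close();
    return true;
}

// ios::app: every write goes to the end of the existing content.
bool appendLine(const std::string& path, const std::string& line) {
    std::ofstream out;
    out.open(path, std::ios::app);
    if (!out) return false;
    out << line << '\n';
    out.close();
    return true;
}

// ios::in: opens an existing file for reading; returns its lines.
std::vector<std::string> readLines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in;
    in.open(path, std::ios::in);
    std::string line;
    while (std::getline(in, line)) lines.push_back(line);
    in.close();
    return lines;
}
```

Calling writeFresh a second time on the same path replaces the earlier content, whereas appendLine keeps it and adds to the end.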
LINKED ORGANISATION:
Linked organization differs from sequential organization essentially
in that the logical sequence of records is generally different from the physical sequence.
In a sequential organization, if the i-th record of the file is at location Li, then the
(i+1)-th record is at Li + c, where c is the size of the i-th record. In linked
organization, the next record is obtained by following the
link value from the present record. Linking records together by the primary key
facilitates the deletion and insertion of records once the place at which the insertion or
deletion is to be made is known. An index with ranges of emp numbers can be
maintained to facilitate searching based on emp numbers. For example, ranges for emp
numbers 501-700, 701-900, 901-1100 can be created for the EMPLOYEE table given
in Table 1. All records having emp no in the same range can be linked together as
shown in the following figure:
[Figure: index of upper values 700, 900 and 1100; each index entry points to a linked
list of the records (Records A to E of Table 1) whose emp numbers fall in that range.]
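A range index over linked records can be sketched as below. The structure follows the description above, but the names EmpRecord, RangeEntry, linkRecord and findRecord are hypothetical, the file is an in-memory vector, and the index is assumed to be kept in ascending order of upper value.

```cpp
#include <cstddef>
#include <vector>

struct EmpRecord {
    int empNo;
    int next;    // link value: slot of the next record in the same range, -1 = end
};

struct RangeEntry {
    int upper;   // upper value of the range (inclusive)
    int head;    // slot of the first record in this range, -1 = empty list
};

// Adds a record and links it at the head of the list for its range.
// Returns the slot used, or -1 if no range covers empNo.
int linkRecord(std::vector<EmpRecord>& file, std::vector<RangeEntry>& index, int empNo) {
    for (RangeEntry& r : index) {
        if (empNo <= r.upper) {
            file.push_back({empNo, r.head});
            r.head = static_cast<int>(file.size()) - 1;
            return r.head;
        }
    }
    return -1;
}

// Picks the range from the index, then follows link values within that list only.
int findRecord(const std::vector<EmpRecord>& file,
               const std::vector<RangeEntry>& index, int empNo) {
    for (const RangeEntry& r : index) {
        if (empNo <= r.upper) {
            for (int s = r.head; s != -1; s = file[static_cast<std::size_t>(s)].next)
                if (file[static_cast<std::size_t>(s)].empNo == empNo) return s;
            return -1;
        }
    }
    return -1;
}
```

The physical order of records in the vector is the order of insertion, while the logical order is given by the links, which is the distinction the section draws.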
Multi-Key File Organization:
When the records of a file are accessed on more than one key, the organization is called
multi-key file organization.
E.g. in a banking system we keep records of accounts in a file.
An account holder needs account information, which can be accessed through the account
no., while a loan officer needs the account records with a given value of overdue limit.
Basically there are two approaches for implementing multi-key organization.
1. Inverted File Organization:
In this file organization, a key's inversion index contains all of the values that the key
presently has in the records of the data file.
Each key-value entry in the inversion index points to all of the data records that have
the corresponding value. The data file is said to be inverted on that key.
Inverted files are sorted on the inversion index so that binary search can be applied to
find the index entry of a record.
Whenever a record is added to the data file, its corresponding entry has to be made in
the inverted file.
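A minimal sketch of an inversion index on the overdue limit from the banking example. It assumes an in-memory data file and uses std::map, which keeps its keys sorted and offers logarithmic lookup, in place of a sorted index searched by binary search. BankAccount, InversionIndex, addBankAccount and recordsWithLimit are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct BankAccount {
    int accountNo;
    int overdueLimit;   // the key the file is inverted on
};

// Key value -> positions of ALL data records that have that value.
using InversionIndex = std::map<int, std::vector<std::size_t>>;

// Adds the record to the data file and makes the matching index entry.
void addBankAccount(std::vector<BankAccount>& data, InversionIndex& byLimit,
                    const BankAccount& a) {
    data.push_back(a);
    byLimit[a.overdueLimit].push_back(data.size() - 1);
}

// Returns the positions of every record with the given overdue limit.
std::vector<std::size_t> recordsWithLimit(const InversionIndex& byLimit, int limit) {
    auto it = byLimit.find(limit);
    if (it == byLimit.end()) return {};
    return it->second;
}
```

The query is answered from the index alone; the data file is touched only to fetch the records at the returned positions.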
2. Multi List File Organization:
In multi-list file organization, the index contains all the values that the secondary key
has in the data file, as in an inverted file. The difference is that the entry in the
multi-list index for a secondary key value is a pointer to the first data record with that
key value. That data record contains a pointer to the second record having the same key
value, and so on. Thus there is a linked list of data records for each value of the
secondary key. Multi-list chains are usually bidirectional and occasionally circular, to
improve update operations.
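For comparison with the inverted file, a multi-list on the same secondary key can be sketched as follows. The sketch assumes an in-memory data file and a singly linked (not bidirectional) chain, with new records linked in at the head; LoanAccount, MultiListIndex, addLoanAccount and accountsWithLimit are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct LoanAccount {
    int accountNo;
    int overdueLimit;    // secondary key
    int nextSameLimit;   // position of the next record with the same key value, -1 = end
};

// Key value -> position of the FIRST data record with that value.
using MultiListIndex = std::map<int, int>;

// Adds a record and makes it the new first record of its chain.
void addLoanAccount(std::vector<LoanAccount>& data, MultiListIndex& index,
                    int accountNo, int overdueLimit) {
    auto it = index.find(overdueLimit);
    int oldHead = (it == index.end()) ? -1 : it->second;
    data.push_back({accountNo, overdueLimit, oldHead});
    index[overdueLimit] = static_cast<int>(data.size()) - 1;
}

// Follows the chain through the data records and collects the account numbers.
std::vector<int> accountsWithLimit(const std::vector<LoanAccount>& data,
                                   const MultiListIndex& index, int overdueLimit) {
    std::vector<int> result;
    auto it = index.find(overdueLimit);
    if (it == index.end()) return result;
    for (int p = it->second; p != -1; p = data[static_cast<std::size_t>(p)].nextSameLimit)
        result.push_back(data[static_cast<std::size_t>(p)].accountNo);
    return result;
}
```

Here the index stores one pointer per key value and the remaining pointers live inside the data records, whereas an inverted file keeps all of them in the index.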
• Examples of inverted File and Multi-list file are as shown below:
Cellular Partitions:
In order to reduce file search time, storage media may be divided into cells.
A cell may be an entire disk or it could be a cylinder.
Lists are localized to lie within a cylinder.
If we have a multilist organization in which the list for a particular key contains records
from several different cylinders, then the list is divided into several smaller lists, where
each list contains all the records from the same cylinder.
This reduces the search time significantly.
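Splitting one list into per-cylinder lists can be sketched as below. The sketch assumes each list entry already knows the cylinder its record lies on; ListEntry and splitByCylinder are hypothetical names.

```cpp
#include <map>
#include <vector>

struct ListEntry {
    int recordNo;   // identifies the data record
    int cylinder;   // cell (cylinder) on which the record is stored
};

// Divides one list into smaller lists, one per cylinder.
// Map key: cylinder number; value: the records of the list on that cylinder.
std::map<int, std::vector<int>> splitByCylinder(const std::vector<ListEntry>& list) {
    std::map<int, std::vector<int>> cells;
    for (const ListEntry& e : list)
        cells[e.cylinder].push_back(e.recordNo);
    return cells;
}
```

Each resulting list can be traversed without moving between cylinders.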
Primitive operations on sequential File:
1. Create() 2. Display() 3. Add() 4. Delete() 5. Search()
1. Algorithm for create function
S1: Open the file in write mode. If the specified file is not found or cannot be opened,
display an error message and go to step 5; else go to step 2.
S2: Read the number of records N to be inserted into the file.
S3: Repeat step 4 N times.
S4: Read the details of a record from the keyboard and write them to the file.
S5: Close the file.
2. Algorithm for displaying all records
S1: Open the specified file in read mode.
S2: If the file is not found or cannot be opened, display an error message and go to
step 4; else go to step 3.
S3: Read the records one by one from the file and display them on the console
until the end of file is reached.
S4: Close the file.
S5: Return to the main function.
3. Algorithm for add a record
S1: Open the file in append mode. If the specified file is not found or cannot be opened,
display an error message and go to step 4; else go to step 2.
S2: Move to the end of the file (in append mode, every write is placed at the end of
the file).
S3: Read the details of the record from the keyboard and write
them to the file.
S4: Close the file.
4. Algorithm for deleting a record
S1: Open the file in read mode and a temporary file in write mode. If either file cannot
be opened, display an error message and stop; else go to step 2.
S2: Accept from the user the key of the record to be deleted.
S3: Search for the target key in the file. If the target key exists, copy all the records in
the file except the one to be deleted to the temporary file.
S4: Close both files.
S5: Remove the old file and rename the temporary file to the name of the old
file.
5. Algorithm for displaying particular record(search)
S1: Open the file in read mode. If the specified file is not found or cannot be
opened, display an error message and go to step 6; else go to step 2.
S2: Read the target key whose details need to be displayed.
S3: Read the next record from the file. If the end of file is reached, display a "required
record not found" message and go to step 6.
S4: Compare the key field of the record read from the file with the target key specified
by the user.
S5: If they are equal, display the details of that record and go to step 6; else go to
step 3.
S6: Close the file.
External Sort (Merge Sort):
External sorting is a technique in which the data is stored in secondary
memory; the data is loaded part by part into main memory, where each part is
sorted.
The sorted parts are then stored in intermediate files.
External sorting is required when the data to be sorted does not fit into main
memory.
Basically, external sort consists of two phases:
1. Sorting phase: The large data set is split into parts that fit in main memory;
each part is sorted and written to an intermediate file.
2. Merge phase: In this phase, the sorted intermediate files are combined into a
single larger file.
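The two phases can be sketched as follows. To keep the example short and self-contained, the intermediate files are modelled as in-memory vectors ("runs") and memoryLimit is an assumed number of values that fit in main memory at once; makeSortedRuns and mergeRuns are hypothetical names. The merge uses a min-heap to combine all runs in one pass, which is one common way to do the merge phase.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Sorting phase: cut the input into parts of at most memoryLimit values,
// sort each part, and keep it as one run (stand-in for an intermediate file).
std::vector<std::vector<int>> makeSortedRuns(const std::vector<int>& input,
                                             std::size_t memoryLimit) {
    std::vector<std::vector<int>> runs;
    if (memoryLimit == 0) return runs;   // nothing can be loaded
    for (std::size_t start = 0; start < input.size(); start += memoryLimit) {
        std::size_t end = std::min(start + memoryLimit, input.size());
        std::vector<int> run(input.begin() + static_cast<std::ptrdiff_t>(start),
                             input.begin() + static_cast<std::ptrdiff_t>(end));
        std::sort(run.begin(), run.end());
        runs.push_back(run);
    }
    return runs;
}

// Merge phase: repeatedly take the smallest front value among all runs.
std::vector<int> mergeRuns(const std::vector<std::vector<int>>& runs) {
    using Item = std::pair<int, std::size_t>;   // (value, run it came from)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    std::vector<std::size_t> next(runs.size(), 0);  // next unread position per run
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) heap.push({runs[r][0], r});
    std::vector<int> out;
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        out.push_back(top.first);
        std::size_t r = top.second;
        if (++next[r] < runs[r].size()) heap.push({runs[r][next[r]], r});
    }
    return out;
}
```

During the merge only one value per run needs to be held in memory, which is why the merge phase can handle files much larger than main memory.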
Merge sort is defined as a sorting algorithm that works by dividing an array into
smaller subarrays, sorting each subarray, and then merging the sorted subarrays
back together to form the final sorted array.
Suppose we have an array A and we want to sort the
subsection that starts at index l and ends at index h, represented by A[l..h].
Divide: If m is the midpoint between l and h, we split the subarray A[l..h] into
two subarrays A[l..m] and A[m+1..h].
Conquer: After splitting the array into two halves, the next step is to conquer.
In this step, we sort the subarrays A[l..m] and A[m+1..h] individually. If the
base case (a subarray of one element) has not been reached, we follow the same
procedure again, i.e., we further split these subarrays and sort them
separately.
Combine: Once the conquer step has reached the base case, we have the sorted
subarrays A[l..m] and A[m+1..h], which we merge back
to form the sorted array A[l..h].
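The divide, conquer and combine steps can be written as a short recursive sketch on a std::vector<int>, using the same names l, m and h as the text. The function names mergeHalves and mergeSort are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Combine: merges the sorted halves A[l..m] and A[m+1..h] into sorted A[l..h].
void mergeHalves(std::vector<int>& A, int l, int m, int h) {
    std::vector<int> tmp;
    tmp.reserve(static_cast<std::size_t>(h - l + 1));
    int i = l, j = m + 1;
    while (i <= m && j <= h) {
        if (A[i] <= A[j]) tmp.push_back(A[i++]);   // <= keeps equal elements in order
        else tmp.push_back(A[j++]);
    }
    while (i <= m) tmp.push_back(A[i++]);          // leftovers of the left half
    while (j <= h) tmp.push_back(A[j++]);          // leftovers of the right half
    for (std::size_t k = 0; k < tmp.size(); ++k)
        A[static_cast<std::size_t>(l) + k] = tmp[k];
}

// Sorts A[l..h] (both bounds inclusive).
void mergeSort(std::vector<int>& A, int l, int h) {
    if (l >= h) return;             // base case: zero or one element
    int m = l + (h - l) / 2;        // Divide
    mergeSort(A, l, m);             // Conquer the left half
    mergeSort(A, m + 1, h);         // Conquer the right half
    mergeHalves(A, l, m, h);        // Combine
}
```

To sort a whole vector v, the call would be mergeSort(v, 0, static_cast<int>(v.size()) - 1).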
Working of merge sort is as shown below: