COM 214 File Organization and Management Lecture Note 3
File updating: This is the act of changing values in one or more records of a file without
changing the organization of the file. That is, the file is brought up to date by adding the
most recent data to it.
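As a minimal sketch of file updating, the Python fragment below changes one value in one
record and leaves the number and order of the records untouched. The record layout
(account number, name, balance) and the function name update_balance are assumptions made
only for illustration.

```python
# Hypothetical in-memory file: each record is (account_no, name, balance).
records = [
    (1001, "Ada", 250.0),
    (1002, "Bello", 90.0),
    (1003, "Chika", 410.0),
]

def update_balance(file_records, account_no, new_balance):
    """Return a copy of the file with one record's balance changed."""
    updated = []
    for acc, name, balance in file_records:
        if acc == account_no:
            balance = new_balance  # change the value only
        updated.append((acc, name, balance))
    return updated  # same number and order of records as before

updated_records = update_balance(records, 1002, 120.0)
print(updated_records[1])
```

Only the balance of account 1002 changes; the organization of the file stays the same.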
Sorting: Sorting means the rearrangement of data in either ascending or descending order.
It involves the arrangement of grouped data elements into a predetermined sequence to
facilitate file processing.
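A short sketch of sorting in Python, assuming hypothetical student records of the form
(matric_no, name, score). The built-in sorted function arranges the records on whichever
field is chosen as the key, in ascending or descending order.

```python
# Hypothetical student records: (matric_no, name, score).
students = [
    (3, "Musa", 62),
    (1, "Ngozi", 75),
    (2, "Tunde", 48),
]

# Ascending order on the key field (matric_no).
by_matric = sorted(students, key=lambda rec: rec[0])

# Descending order on a different field (score).
by_score_desc = sorted(students, key=lambda rec: rec[2], reverse=True)

print([rec[0] for rec in by_matric])      # [1, 2, 3]
print([rec[1] for rec in by_score_desc])  # ['Ngozi', 'Musa', 'Tunde']
```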
Calculating: This is the arithmetic or logical manipulation of data in a file.
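For illustration only, the fragment below applies an arithmetic and a logical manipulation
to a hypothetical list of scores read from a file. The pass mark of 50 is an assumption.

```python
scores = [62, 75, 48]  # hypothetical values taken from a file

total = sum(scores)                  # arithmetic manipulation
average = total / len(scores)        # arithmetic manipulation
passed = [s >= 50 for s in scores]   # logical manipulation (assumed pass mark: 50)

print(total, passed)  # 185 [True, True, False]
```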
File querying/interrogating: This involves retrieving specific data from a file according
to a set of retrieval criteria.
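A possible sketch of a file query in Python. The stock records and the helper name query
are hypothetical; the retrieval criterion is passed in as a function, so the same helper
can serve different queries.

```python
# Hypothetical stock file: each record maps a field name to a value.
stock = [
    {"item": "pen", "qty": 120, "price": 50},
    {"item": "ruler", "qty": 8, "price": 150},
    {"item": "eraser", "qty": 3, "price": 30},
]

def query(file_records, criterion):
    """Return only the records for which criterion(record) is true."""
    return [rec for rec in file_records if criterion(rec)]

# Retrieval criterion: quantity in stock below 10.
low_stock = query(stock, lambda rec: rec["qty"] < 10)
print([rec["item"] for rec in low_stock])  # ['ruler', 'eraser']
```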
File merging: This is the combination of two or more sets of data files or records to produce
a single set, usually in an ordered sequence.
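One way to sketch merging in Python is with heapq.merge from the standard library, which
combines inputs that are already sorted into one ordered sequence. The two branch files
below are hypothetical, and both are assumed to be sorted on the account number.

```python
import heapq

# Two hypothetical files, each already sorted on the key field (account_no).
branch_a = [(1001, "Ada"), (1004, "Dayo"), (1007, "Gbenga")]
branch_b = [(1002, "Bello"), (1003, "Chika"), (1009, "Ife")]

# Merge into one set of records, still ordered on the key field.
merged = list(heapq.merge(branch_a, branch_b, key=lambda rec: rec[0]))

print([rec[0] for rec in merged])  # [1001, 1002, 1003, 1004, 1007, 1009]
```

If the inputs were not sorted beforehand, they would have to be sorted first, or simply
concatenated and then sorted.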
Reporting: Reporting is a file processing operation that deals with the production (printing)
of a report from the file in a specified format.
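As a rough illustration, the fragment below produces a report in a fixed column format
from hypothetical sales records of the form (item, quantity, unit price). The column
widths and the function name build_report are arbitrary choices.

```python
# Hypothetical sales records: (item, quantity, unit_price).
sales = [("pen", 120, 50), ("ruler", 8, 150)]

def build_report(rows):
    """Format the records as fixed-width columns with a heading line."""
    lines = [f"{'ITEM':<10}{'QTY':>6}{'VALUE':>10}"]
    for item, qty, price in rows:
        lines.append(f"{item:<10}{qty:>6}{qty * price:>10}")
    return "\n".join(lines)

report = build_report(sales)
print(report)
```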
File display: The contents of a data file can be displayed either on the computer screen as
soft copy or printed on paper as hard copy.
File storage: When a file is created, it is stored on an appropriate storage medium such as
disk, flash disk, tape or drum.
Data Processing
Data Processing is the analysis and organization of data by the repeated use of one or more
computer programs. It is used extensively in business, engineering, science and virtually
every other field.
In engineering and the sciences, data processing is used for a wide variety of applications
such as:
The processing of seismic data for oil and mineral exploration
The analysis of new product designs
The processing of satellite imagery
The analysis of data from scientific experiments.
Database
A database is a collection of related records that can be searched, accessed, and modified,
e.g., bank account records, school transcripts, and income tax data. In database processing, a
computerized database is used as the central source of reference data for the
computations.
Transaction processing refers to interaction between two computers in which one computer
initiates a transaction and another computer provides the first with the data or computation
required for that function.
Most modern data processing uses one or more databases at one or more central sites.
Transaction processing is used to access and update the databases when users need to
immediately view or add information; other data processing programs are used at regular
intervals to provide summary reports of activity and database status. Examples of systems
that involve all of these functions are:
Automated teller machines
Credit sales terminals
Airline reservation systems.
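The interplay between transaction processing and periodic reporting can be sketched as
follows. This is a simplified, single-program illustration, not a real banking protocol:
the account numbers, the request format and the function names handle_transaction and
summary_report are all assumptions.

```python
# Hypothetical central database: account number -> balance.
database = {1001: 250.0, 1002: 90.0}
activity_log = []  # read later by the periodic summary report

def handle_transaction(request):
    """Server side: apply one request immediately and return a reply."""
    acc, amount = request["account"], request["amount"]
    if acc not in database:
        return {"ok": False, "reason": "unknown account"}
    if database[acc] + amount < 0:
        return {"ok": False, "reason": "insufficient funds"}
    database[acc] += amount
    activity_log.append((acc, amount))
    return {"ok": True, "balance": database[acc]}

def summary_report():
    """Batch side: summarise activity, as run at regular intervals."""
    return {
        "transactions": len(activity_log),
        "net_change": sum(amount for _, amount in activity_log),
    }

# Client side (for example a teller machine) initiates the transactions.
reply1 = handle_transaction({"account": 1001, "amount": -100.0})
reply2 = handle_transaction({"account": 1002, "amount": -500.0})

print(reply1)
print(reply2)
print(summary_report())
```

The first withdrawal updates the database at once; the second is refused and leaves the
database unchanged. The summary report only reads what the transactions have logged.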
The original data are first recorded in a form that the computer can read. This can be
accomplished in several ways:
By manually entering information into some form of computer memory using a
keyboard
By using a sensor to transfer data onto a magnetic tape or floppy disk
By filling in ovals on a computer-readable paper form
By swiping a credit card through a reader.
The data are then transmitted to a computer that performs the data processing functions.
This step may involve physically moving the recorded data to the computer or transmitting it
electronically over telephone lines or the Internet.
Once the data reach the computer’s memory, they are processed with operations such as the
following:
Accessing and updating a database
Creating or modifying statistical information.
After processing the data, the computer reports summary results to the program’s
operator.
As the computer processes the data, it stores both the modifications and the original data.
This storage can be both in the original data-entry form and in carefully controlled computer
data forms such as magnetic tape. Data are often stored in more than one place for both
legal and practical reasons. Computer systems can malfunction and lose all stored data, and
the original data may be needed to recreate the database as it existed before the crash.
The final step in the data-processing cycle is the retrieval of stored information at a later
time. This is usually done to access records contained in a database, to apply new
data-processing functions to the data or, in the event that some part of the data has been
lost, to recreate portions of a database. Examples of data retrieval in the data-processing cycle
include the analysis of store sales receipts to reveal new customer spending patterns and
the application of new processing techniques to seismic data to locate oil or mineral fields
that were previously overlooked.
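The steps of the data-processing cycle described above (recording, processing, reporting,
storage and retrieval) can be sketched in a few lines of Python. The input format
"item,quantity,price" and all function and variable names are assumptions chosen for
illustration; a real system would store the data on a durable medium rather than in a
dictionary.

```python
# Recording: hypothetical data entries in the form "item,quantity,price".
raw_input = ["pen,2,50", "ruler,1,150", "pen,3,50"]

def parse(line):
    """Turn one recorded line into a record the program can process."""
    item, qty, price = line.split(",")
    return {"item": item, "qty": int(qty), "price": int(price)}

def process(entries):
    """Processing: update a small database of total sales value per item."""
    totals = {}
    for e in entries:
        totals[e["item"]] = totals.get(e["item"], 0) + e["qty"] * e["price"]
    return totals

entries = [parse(line) for line in raw_input]
totals = process(entries)

# Reporting: summary result for the operator.
summary = f"{len(entries)} entries, total value {sum(totals.values())}"

# Storage: keep both the original data and the processed result.
archive = {"original": list(raw_input), "processed": dict(totals)}

# Retrieval: rebuild the database from the stored originals after a loss.
rebuilt = process(parse(line) for line in archive["original"])

print(summary)
print(rebuilt == totals)
```

Keeping the original entries alongside the processed totals is what makes the rebuild in
the last step possible.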
and analyze census data became such an overwhelming task for the United States
government in 1890, the U.S. Census Bureau contracted American engineer and inventor
Herman Hollerith to build a special purpose data-processing system. With this system,
census takers recorded data by punching holes in a paper card the size of a dollar bill. These
cards were then forwarded to a census office, where mechanical card readers were used to
read the holes in each card and mechanical adding machines were used to tabulate the
results. In 1896 Hollerith founded the Tabulating Machine Company, which later merged
with several other companies and eventually became International Business Machines
Corporation (IBM).
During World War II (1939-1945) scientists developed a variety of computers designed for
specific data-processing functions. The Harvard Mark I computer was built from a
combination of mechanical and electrical devices and was used to perform calculations for
the U.S. Navy. Another computer, the British-built Colossus, was an all-electronic
computing machine designed to break German coded messages. It enabled the British to
crack German codes quickly and efficiently.
The role of the electronic computer in data processing began in 1946 with the introduction
of the Electronic Numerical Integrator and Computer (ENIAC), the first general-purpose
all-electronic computer. The U.S. armed services used the ENIAC to tabulate the paths of
artillery shells and missiles. In 1950 Remington Rand Corporation introduced the first
non-military electronic programmable computer for data processing. This computer was called
the Universal Automatic Computer (UNIVAC); the first was sold to the U.S. Census Bureau in
1951, and several others were eventually sold to other government agencies.
With the purchase of a UNIVAC computer in 1954, General Electric Company became the
first private firm to own a computer and was followed by Du Pont Company, Metropolitan
Life, and United States Steel Corporation. All of these companies used the UNIVAC for
commercial data-processing applications. The primary advantages of this machine were:
Its programmability
Its high-speed arithmetic capabilities
Its ability to store and process large business files on multiple magnetic tapes.
The UNIVAC gained national attention in 1952, when the Columbia Broadcasting System
(CBS) used a UNIVAC during a live television broadcast to predict the outcome of the
presidential election. Based upon less than 10 percent of the election returns, the computer
correctly predicted a landslide victory for Dwight D. Eisenhower over his challenger, Adlai E.
Stevenson.
In 1953, IBM produced the first of its computers, the IBM 701, a machine designed to be
mass-produced and easily installed in a customer’s building. The success of the 701 led IBM
to manufacture many other machines for commercial data processing. The sales of IBM’s
650 computer were a particularly good indicator of how rapidly the business world accepted
electronic data processing. Initial sales forecasts were extremely low because the machine
was too expensive, but over 1800 were eventually made and sold.
In the 1950s and early 1960s, data processing was essentially split into two distinct areas,
business data processing and scientific data processing, with different computers designed
for each. In an attempt to keep data processing as similar to standard accounting as
possible, business computers had arithmetic circuits that did computations on strings of
decimal digits (digits that range from 0 to 9).
Computers used for scientific data processing sacrificed the easy-to-use decimal number
system for the more efficient binary number system in their arithmetic circuitry.
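The practical difference between decimal and binary arithmetic can still be seen today.
The sketch below uses Python's standard decimal module as a software stand-in for decimal
arithmetic; it does not model the hardware circuits of the period. It shows one reason why
accounting-style work favours decimal digits: a value such as 0.1 has no exact binary
representation.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 are stored as binary approximations.
binary_sum = 0.1 + 0.2

# Decimal arithmetic: the digits are kept exactly as written.
decimal_sum = Decimal("0.10") + Decimal("0.20")

print(binary_sum == 0.3)               # False
print(decimal_sum == Decimal("0.30"))  # True
```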
The need for separate business and scientific computing systems changed with the
introduction of the IBM System/360 family of machines in 1964. These machines could all
run the same data-processing programs, but at different speeds. They could also perform
either the digit-by-digit math favoured by business or the binary notation favoured for
scientific applications. Several models had special modes in which they could execute
programs from earlier IBM computers, especially the popular IBM 1401. From that time on,
almost all commercial computers were general-purpose.
The division between business and scientific data processing also influenced the
development of programming languages in which application programs were written. Two
such languages that are still popular today are COBOL (Common Business-Oriented
Language) and FORTRAN (FORmula TRANslation). Both of these programming languages were
developed in the late 1950s and early 1960s, with COBOL becoming the programming
language of choice for business data processing and FORTRAN for scientific processing. In
the 1970s other languages, such as C, were developed. C reflected the general-purpose nature
of modern computers and allowed extremely efficient programs to be developed for almost
any data-processing application. One of the most popular languages currently used in
data-processing applications is an extension of C called C++. C++ was developed in the 1980s
and is an object-oriented language, a type of language that gives programmers more
flexibility in developing sophisticated applications than other types of programming languages.