Information Systems Vol. 18, No. 2, pp. 99-109, 1993    0306-4379/93 $6.00 + 0.00
Printed in Great Britain    Pergamon Press Ltd

THE EFFECT OF DATABASE FILTERS ON THE
PERFORMANCE OF BUFFERED RELATIONAL
DATABASE SYSTEMS

JANG-JONG FAN and KEH-YIH SU

Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 30043, R.O.C.

(Received 29 May 1991; in revised form 22 May 1992)

Abstract-To reduce the CPU load, many database machines install database filters that screen out
irrelevant data as it is read from the mass storage devices. In addition, because CPUs are usually much
faster than I/O devices, many database systems install database buffers to avoid repeated physical
I/O operations. However, once the data pages have been filtered by the database filter, the buffer hit rate
may decrease and the performance of the database system degrades accordingly. In this paper, a queueing
model is proposed to study the effectiveness of the database filter. The result shows that the database
filter can improve the system performance only slightly. A simulation model is then proposed
to compare the performance of database systems with and without the database filter under the execution
of various Selection operations. The simulation shows that the database filter improves the system
performance by a factor of at most 1.24, and that the system performance actually degrades in most cases.
Based on this performance analysis, the use of the database filter in buffered database systems is not
recommended.

Key words: Database filter, database buffer, database machine

1. INTRODUCTION

In conventional database management systems, data in the secondary storage devices must be
loaded into the main memory before the database operations can be performed. During loading,
a large amount of irrelevant data may also be transferred into the main memory, and the system
becomes inefficient when it has to deal with this redundant data. The database filter, which has been
used in many database machines [1-17], is intended to eliminate this inefficiency. An abstract
model of a system with a database filter is shown in Fig. 1. The database filter, which is installed
between the secondary storage device and the host computer, filters out irrelevant data from the
secondary storage device on the fly. The advantage of using the database filter is therefore
that it saves CPU power and I/O channel capacity.
In the database filter, an output buffer is usually installed to store the temporary result data
[8-10]. The input to the database filter is all the tuples of a relation. The output tuples that satisfy
the filtering criterion are then stored in the output buffer. The filtering criterion usually consists
of the selection predicates in the Selection operation and a list of projected attributes in the
Projection operation. When the output buffer is full, the tuples stored in it are either dumped into
the secondary storage device as a new relation or directly transferred to the host computer for
further processing.
Despite the popular use of database filters in many database machines, very few papers address
their performance [8, 9]. Reference [9] reports the performance of the database filter itself but not the
overall system performance. In [8], the database filter is implemented as an I/O device that communicates
with a PC/AT through the PC/AT internal bus. The execution time of performing the Selection
operation on a relation is evaluated for both dBASE III and the database filter. The relations on
the hard disk are stored as sequential files. The results show that the
database filter performs five times better than dBASE III on average.
The performance evaluation of the database filter conducted in [8] is based on a single-user
environment (i.e. only one database operation is executed in the system at a time), which is
different from the multi-user environment used in many computers. In the multi-user environment,
the database filter cannot be monopolized by one Selection operation. It must be shared among
different Selection operations. Thus the effectiveness of the database filter might be different from
that in the single-user environment.

Fig. 1. The abstract model for the system with a database filter.

On the other hand, to reduce the number of accesses to physical disk pages, a database
buffer is usually installed in the database management system. As the disk I/O frequently becomes
the bottleneck in the database management system, the database buffer improves the system
performance greatly. To make the database buffer effective, it is important to keep the buffer hit
rate, defined as the probability of finding a desired disk page in the database buffer, as high as
possible. However, using the database filter to filter the data might decrease the buffer hit rate. The
reason is that the data of a disk page in the database buffer may be incomplete after filtering.
Such a filtered page therefore cannot be reused by other queries unless their filtering criterion is the
same. The question is thus raised: considering the possible additional I/O accesses caused by the
database filter, is it still worthwhile to use the database filter? A thorough analysis is made in this
paper to answer this question.
In this paper, a queueing model is first given to study the performance effectiveness of the
database filter under the multi-user environment. The result shows that the system performance
can be improved slightly by the database filter in the best case. A simulation is then conducted
for more detailed performance measurement. The simulation uses the synthetic workload composed
of various Selection operations. The simulation shows that the database filter can improve the
system performance by a factor of 1.24 at most, and the performance actually degrades in most
cases.
This paper is organized as follows. Section 2 describes the queueing model for studying the
performance effectiveness of the database filter. A strategy of using the database filter is presented
in Section 3. The simulation and the result analysis are given in Section 4. Finally, Section 5 presents
the conclusions.

2. THE EFFECT OF THE DATABASE FILTER ON THE SYSTEM PERFORMANCE

In the database system with the database filter, only the tuples satisfying the filtering criterion
are transferred to the database buffer. Therefore, a disk page in the database buffer may
differ from its copy on the disk. A subsequent reference to the same disk page then requires an
additional physical disk access if the current filtering criterion is different from the previous one.
For example, the higher-level index nodes (especially the root node) are frequently
reaccessed with different filtering criteria, which causes many additional I/O operations if those index
nodes are filtered each time. For this reason, the database filter may have a negative effect on
the system performance. This section presents a queueing model to study the effect of the database
filter on the system performance under the multi-user environment.
The database filter filters out the irrelevant information on the fly while the disk pages are
transferred from the disk to the database buffer. Thus, query execution in the database system
with the database filter reduces the data transmission time and the CPU processing time because
less data is transferred and processed. Since the function of the database filter is
to filter the tuples according to the filtering criterion, only the Selection operation among
the different database operations is affected. The performance comparison between the database
systems with and without the database filter is thus conducted on the execution of the Selection
operation.
The behavior of the Selection operation can be represented by a sequence of page
references. For each page reference, only 1 page is obtained from either the database buffer or the
disk, and this page is then processed by the CPU. In a single-user environment, the time for 1 query
execution is equal to the total page processing time, which consists of the disk access
time and the CPU processing time. In the multi-user environment, the resources (i.e. the disk and
CPU in this case) are shared among all the concurrently executed queries. The query execution time
estimated by the queueing model must therefore also include the disk waiting time and the CPU waiting
time. The advantage of the queueing model is that it can be evaluated at little expense.
Although the model assumes exponentially distributed service times and a First-Come-First-Served
queueing discipline, which may deviate from the real situation, the queueing model is capable of
giving satisfactory results. The estimated utilization
and throughput are reported to have only 5-10% error, and the response time 10-30% error
[11].
To take the number of concurrently executed Selection operations as a parameter, a closed
queueing model is adopted to model the execution of the Selection operation in the multi-user
environment, as shown in Fig. 2. This model consists of a CPU server and a disk server. The time
spent in the I/O channel is embedded in the time spent in the disk server. Each terminal server
submits one Selection operation at a time. Each Selection operation consists of a sequence of page
references. However, for each Selection operation under execution, only 1 page reference is
outstanding at any time. After the current page has been processed, the next page reference is
generated. For each page reference, the access of the physical disk page occurs with
probability (1 - H), where H denotes the buffer hit rate. The referenced page is then processed by
the CPU and the next page reference is immediately generated without any delay (i.e. terminal
servers have zero think time).
In the following discussion, we assume that all the Selection operations consist of the same
number of referenced pages. Then the task of comparing the execution time of the Selection
operations in both database systems, with and without the database filter, is simplified to just
comparing the page processing time, which includes the time to access and process the page. With
the same page processing procedure in both database systems, the page processing time can be
calculated with different input parameters in the closed queueing model as shown in Fig. 2. In this
closed queueing model, the page processing time is equal to the degree of multiprogramming times
the inverse of the page processing rate. Suppose that successive CPU processing times and disk page
access times are independently exponentially distributed with means $1/\mu$ and $1/\lambda$ respectively, and
let $n$ be the total number of the Selection operations executed concurrently. Then the CPU utilization is
calculated by [12]

$$U = \begin{cases} \dfrac{\rho - \rho^{n+1}}{1 - \rho^{n+1}}, & \rho \neq 1 \\[2ex] \dfrac{n}{n+1}, & \rho = 1 \end{cases} \qquad (1)$$

where $\rho$ is $\lambda/(\mu(1 - H))$ and $H$ is the buffer hit rate of the database system without the database
filter. When the CPU is kept busy, the completion rate of the page processing is $\mu$. The page processing
rate $T$ of the database system without the database filter is hence formulated as

$$T = \mu U. \qquad (2)$$

Similarly, the page processing rate of the database system with the database filter, $T_{df}$, is calculated
by replacing $1/\mu$, $1/\lambda$ and $H$ with $1/\mu_{df}$, $1/\lambda_{df}$ and $H_{df}$, respectively, in equations (1) and (2). Here,
$\mu_{df}$ and $\lambda_{df}$ denote the mean CPU processing and disk page access rates of the database system with
the database filter respectively, and $H_{df}$ is the buffer hit rate of the database system with the
database filter.

Fig. 2. The closed queueing model for the execution of the Selection operation in the multi-user
environment.
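
As a rough numerical check of equations (1) and (2), the following Python sketch evaluates the CPU utilization and the page processing rate. The function names are illustrative and not taken from the paper; rates are expressed per msec, and the sample values (30 msec page access time, 1 msec CPU time, H = 0.9, n = 4) are chosen only for illustration.

```python
def cpu_utilization(lam, mu, hit_rate, n):
    """Equation (1): CPU utilization of the closed CPU/disk model.

    lam, mu  : disk page access rate and CPU processing rate (same time unit)
    hit_rate : buffer hit rate H, must be < 1
    n        : number of concurrently executed Selection operations
    """
    rho = lam / (mu * (1.0 - hit_rate))
    if abs(rho - 1.0) < 1e-12:          # special case rho = 1
        return n / (n + 1.0)
    return (rho - rho ** (n + 1)) / (1.0 - rho ** (n + 1))


def page_rate(lam, mu, hit_rate, n):
    """Equation (2): page processing rate T = mu * U."""
    return mu * cpu_utilization(lam, mu, hit_rate, n)


# Illustrative values: 1/lambda = 30 msec, 1/mu = 1 msec, H = 0.9, n = 4.
lam, mu = 1.0 / 30.0, 1.0
u = cpu_utilization(lam, mu, 0.9, 4)
t = page_rate(lam, mu, 0.9, 4)          # pages per msec
```

With these values the disk is the bottleneck, so the utilization stays close to the value of rho (about one third).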
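The speed-up ratio, defined as $T_{df}/T$, is used to measure the effect of the database filter on the
system performance. For the general case $\rho \neq 1$, the speed-up ratio can be rewritten as
follows according to equations (1) and (2),

$$\frac{T_{df}}{T} = \frac{\mu_{df}\left[\dfrac{\rho_{df} - \rho_{df}^{\,n+1}}{1 - \rho_{df}^{\,n+1}}\right]}{\mu\left[\dfrac{\rho - \rho^{n+1}}{1 - \rho^{n+1}}\right]}, \qquad \rho = \frac{\lambda}{\mu(1 - H)}, \quad \rho_{df} = \frac{\lambda_{df}}{\mu_{df}(1 - H_{df})}. \qquad (3)$$
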
As presented in [13], the Selection operation on a 4 kbytes page requires 1-2.5 msec CPU processing
time and 30 msec page access time. These CPU processing and disk page access times were measured
on a very old and slow VAX 11/750 minicomputer equipped with a Fujitsu Eagle disk (the average
seek and latency times are 20 and 8 msec, respectively). The tuple length is 182 bytes. The selection
predicate includes 2 attributes, i.e. a 2-byte integer and a 52-byte string. With the current
technology, the capability of a CPU can reach a few hundred MIPS (Million Instructions per
Second). However, the current disk speed (the average seek and latency times are 14 and 8 msec,
respectively) is only slightly faster than that of the Fujitsu Eagle disk. Therefore, it is reasonable to assume
that the CPU processing rate $\mu$ is much faster than the disk page access rate $\lambda$ in the model. In
addition, the buffer hit rate is rarely greater than 0.9 in actual applications. Thus, the terms
$[\lambda/(\mu(1 - H))]^{n+1}$ and $[\lambda_{df}/(\mu_{df}(1 - H_{df}))]^{n+1}$ in equation (3) can be neglected in the multi-user
environment. For example, given $1/\lambda = 30$ msec, $1/\mu = 1$ msec, $H = 0.9$ and $n = 4$, then $\lambda/(\mu(1 - H)) = 0.33$
and $[\lambda/(\mu(1 - H))]^{n+1} = 0.004$. A similar argument can be made for the term $[\lambda_{df}/(\mu_{df}(1 - H_{df}))]^{n+1}$.
Equation (3) is then reduced to

$$\frac{T_{df}}{T} \approx \frac{\lambda_{df}(1 - H)}{\lambda(1 - H_{df})}. \qquad (4)$$
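
To see how little is lost by dropping the high-order terms, the sketch below compares the full speed-up ratio of equation (3) with the reduced form of equation (4). It is only an illustration: the function names are hypothetical, and the filtered disk rate (1.017 times the unfiltered one, with equal CPU rates and equal hit rates) is an assumed input, not a measured one.

```python
def utilization(rho, n):
    """CPU utilization of equation (1) for a given rho and n."""
    if abs(rho - 1.0) < 1e-12:
        return n / (n + 1.0)
    return (rho - rho ** (n + 1)) / (1.0 - rho ** (n + 1))


def exact_speedup(lam, mu, h, lam_df, mu_df, h_df, n):
    """Equation (3): ratio of page processing rates with and without the filter."""
    rho = lam / (mu * (1.0 - h))
    rho_df = lam_df / (mu_df * (1.0 - h_df))
    return (mu_df * utilization(rho_df, n)) / (mu * utilization(rho, n))


def reduced_speedup(lam, h, lam_df, h_df):
    """Equation (4): speed-up ratio after neglecting the (n+1)-th power terms."""
    return lam_df * (1.0 - h) / (lam * (1.0 - h_df))


# Values from the example in the text: 1/lambda = 30 msec, 1/mu = 1 msec, H = 0.9, n = 4.
lam, mu, h, n = 1.0 / 30.0, 1.0, 0.9, 4
lam_df = 1.017 * lam                                  # assumed filtered disk rate
tail = (lam / (mu * (1.0 - h))) ** (n + 1)            # the neglected term, about 0.004
exact = exact_speedup(lam, mu, h, lam_df, mu, h, n)
reduced = reduced_speedup(lam, h, lam_df, h)
```

Under these assumptions the two ratios differ by well under one percent, which is consistent with the simplification made in the text.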
From equation (4), the maximum speed-up ratio is $\lambda_{df}/\lambda$, attained when both database systems have the
same buffer hit rate (i.e. $H_{df} = H$). In order to release the I/O channel for the use of other devices
during the period of moving the disk head, a local buffer is usually installed in the disk controller
to buffer the disk page. The value of $\lambda$ is then equal to $1/[(T_{seek} + T_{latency} + T_{page}) + T_{I/O}]$, where $T_{seek}$,
$T_{latency}$, $T_{page}$ and $T_{I/O}$ denote the seek time, the latency time, the page reading time and the
I/O channel transmission time, respectively. Since the database filter only reduces the I/O channel
transmission time from the local buffer of the disk controller to the system main memory, the value
of $\lambda_{df}$ is equal to $1/[(T_{seek} + T_{latency} + T_{page}) + T_{I/O} \cdot P_f]$, where $P_f$ is the filtering factor, which is
defined as [(the portion of a disk page to be transferred)/(the total page size)]. When $P_f = 0$, the
maximum speed-up ratio is obtained with the value of

$$\frac{\lambda_{df}}{\lambda} = \frac{(T_{seek} + T_{latency} + T_{page}) + T_{I/O}}{(T_{seek} + T_{latency} + T_{page})}. \qquad (5)$$

If the disk access time [i.e. $(T_{seek} + T_{latency} + T_{page})$] dominates, the maximum speed-up ratio
approaches 1. If the I/O channel transmission time (i.e. $T_{I/O}$) dominates, the maximum
speed-up ratio increases as the I/O channel transmission time increases.
To get a practical range of the maximum speed-up ratio, the following parameters are plugged
into equation (5). The size of the disk page is assumed to range from 512 to 4096 bytes [14].
Considering the current disk technology, the average seek and latency times can be 14 and 8 msec,
respectively, and the transfer rate can reach 3 Mbytes/sec (e.g. FH-3000 x series with the SCSI
interface [15]). The current I/O channel usually has a 10 Mbytes/sec transmission rate. Based on these
assumptions, the maximum speed-up ratios are estimated as 1.002 and 1.017 when the sizes of the
disk page are 512 and 4096 bytes, respectively.
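
The two estimates can be reproduced with a few lines of Python. This is a sketch under one explicit assumption that the paper does not state: 1 Mbyte is taken as 2**20 bytes, which is the reading that yields 1.017 for the 4096-byte page. The function name is illustrative.

```python
MBYTE = 2 ** 20  # assumption: 1 Mbyte = 2**20 bytes


def max_speedup(page_bytes, seek_ms=14.0, latency_ms=8.0,
                disk_mbytes_per_s=3.0, channel_mbytes_per_s=10.0):
    """Equation (5): maximum speed-up ratio, reached when the filtering factor is 0."""
    t_page = page_bytes / (disk_mbytes_per_s * MBYTE) * 1000.0    # page reading time, msec
    t_io = page_bytes / (channel_mbytes_per_s * MBYTE) * 1000.0   # channel transmission time, msec
    t_disk = seek_ms + latency_ms + t_page
    return (t_disk + t_io) / t_disk


small = max_speedup(512)     # close to 1.002
large = max_speedup(4096)    # close to 1.017
```

Because the seek and latency times dominate the denominator, even removing the channel transmission time entirely changes the ratio by less than two percent.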
Finally, we conclude this section with some remarks.
1. In the past, as the CPU speed was not much faster than the disk speed, using the database
filter to reduce the CPU load greatly improved the system performance. However, as the
current CPU speed is much faster than the disk speed and the database filter only reduces
the I/O channel transmission time, the database filter no longer plays an important role for
improving the system performance.
2. From equation (4), the buffer hit rate of the database system with the database filter must
be greater than

$$1 - \frac{\lambda_{df}}{\lambda}(1 - H)$$

to make the speed-up ratio greater than 1. This lower bound of $H_{df}$
decreases as the page size increases (as the page size increases, $T_{I/O}$ increases), as shown in
equation (5). However, even with a page size of 4096 bytes, the lower bound is
$(1.017 H - 0.017)$. Therefore, the buffer hit rate of the database system with the database
filter must decrease by no more than $(0.017 - 0.017 H)$ to keep the speed-up ratio greater
than 1. When $H = 0$, the maximum allowable decrease of the buffer hit rate is only 0.017. The
observation is that a small decrease of $H_{df}$ makes the speed-up ratio less than 1.
Since the advance of the disk speed is far behind the advance of the CPU speed, the maximum
allowable decrease of the buffer hit rate will become even smaller as semiconductor technology
advances.
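
The bound in remark 2 follows from equation (4) by a short rearrangement. The steps below are a reconstruction using only the symbols defined in this section; the value 0.9 in the last line is an illustrative hit rate, not a measured one.

```latex
\begin{align*}
\frac{\lambda_{df}(1 - H)}{\lambda(1 - H_{df})} > 1
  &\iff 1 - H_{df} < \frac{\lambda_{df}}{\lambda}(1 - H) \\
  &\iff H_{df} > 1 - \frac{\lambda_{df}}{\lambda}(1 - H). \\
\intertext{With $\lambda_{df}/\lambda = 1.017$ (4096-byte pages):}
H_{df} &> 1.017H - 0.017,
  \qquad\text{i.e.}\qquad H - H_{df} < 0.017\,(1 - H). \\
\intertext{For example, at $H = 0.9$ the allowable decrease is only}
0.017 \times (1 - 0.9) &= 0.0017.
\end{align*}
```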

3. THE STRATEGY OF USING THE DATABASE FILTER WITH THE DATABASE BUFFER

As discussed in the previous section, a little decrease of the buffer hit rate will cause the speed-up
ratio to be less than 1. A strategy of using the database filter is thus proposed in this section to
make the decrease of the buffer hit rate as little as possible.
Basically, the database filter can be used in a more effective way if the following principle is
obeyed: if a disk page will be reused in the near future, the disk page should not be filtered.
Otherwise, it can be filtered.
Since the reference patterns of relational database operations are usually regular and predictable,
they can be examined in advance to determine the usage of the database filter. Depending on the
file structure in which the relation is stored, two access methods, namely sequential search and index
search, are commonly used to perform the Selection operation. Once the access method is
determined, the use of the database filter can be set as follows.
1. If a relation is stored as a sequential file, each data page of the file is sequentially
scanned to perform the Selection operation. Since each page will not be reused after it is
released, it can be filtered without decreasing the buffer hit rate.
2. If an index tree (e.g. B+-tree [16]) is available, the Selection operation can be performed
by a few index scans. In each index scan, the index pages (i.e. the disk pages that contain the
index entries) in the index tree are first searched down from the root level, followed
by a search of the data pages (i.e. the disk pages that contain the tuples) on the leaf level. Since
each data page is not likely to be reused, it can be filtered. However, the index pages
are more likely to be reused (especially the root index page, which is reused for each index
scan), so they should not be filtered.
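
The two rules above amount to a decision on the kind of page being read. A minimal Python sketch of that decision follows; the enum, the function name and the sample page sequence are hypothetical and serve only to illustrate the principle.

```python
from enum import Enum


class PageKind(Enum):
    DATA = "data"      # disk page that contains tuples
    INDEX = "index"    # disk page that contains index entries


def should_filter(kind: PageKind) -> bool:
    """Filter only pages that are unlikely to be reused in the near future."""
    return kind is PageKind.DATA


# One index scan: root and inner index pages first, then a data page on the leaf level.
index_scan = [PageKind.INDEX, PageKind.INDEX, PageKind.DATA]
decisions = [should_filter(kind) for kind in index_scan]
```

For a sequential scan every referenced page is a data page, so every page is filtered.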

4. THE SIMULATION

The queueing model given in Section 2 concludes that the maximum speed-up ratio is only
slightly greater than 1. It also shows that a small decrease of the buffer hit rate will make the
speed-up ratio less than 1. However, it does not show how much the buffer hit rate will
decrease (or how much the system performance will degrade) when the database filter is installed
in the database system. Besides, it is not easy to accurately model the behavior of the database
buffer. Although directly measuring the performance of an existing system is feasible, it is too
expensive. Therefore, computer simulation is a suitable choice to answer those questions, given
the trade-off between cost and accuracy. This section first describes the simulation model
and then discusses the performance effectiveness of the database filter based on the simulation
results. Finally, the simulation result is used to verify the correctness of the queueing model
proposed in Section 2.

4.1. The design of the simulation model


The simulation model which simulates the query execution in both database systems with and
without the database filter is shown in Fig. 3. This is the simulator for the database system with
a CPU, a disk, an I/O channel and a database buffer. Each terminal server submits a query at a
time. During the execution of a query, only 1 page reference is generated at the same time. For
each page reference, the buffer manager first checks if the referenced page is in the buffer pool.
If not, the referenced page is read from the disk and then transferred to the buffer pool through
the I/O channel. After the referenced page of a query is processed by the CPU, the next page
reference of that query is sent to the buffer manager.
To simulate the query execution in the database system without the database filter, each query
is represented by a sequence of events. Each event consists of a triplet of (page index, I/O time,
CPU time). The simulator takes the first event from each query and sends them to the buffer
manager. The buffer manager checks, one event at a time, whether the referenced page, indicated
by the field “page index” in the triplet, is in the database buffer. If the referenced page is
missed, the corresponding event is sent to the disk server. Otherwise, the event is directly
sent to the CPU server. The disk service time is the sum of the seek, latency and page transfer
times. The service times for the I/O channel and CPU servers are determined by the fields “I/O
time” and “CPU time” in the triplet, respectively. When the first event of a query is finished
(i.e. the event leaves the CPU server), it is deleted from the event list of that query. The same
procedure is repeated for each subsequent event until the query execution is finished. When a query
is finished, a new query is generated immediately so that the degree of multiprogramming is
fixed.
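
The path of a single event through the servers can be sketched as follows. This is an illustrative Python fragment, not the authors' simulator: the `Event` type and the `route` function are hypothetical names, and queueing delays are left out.

```python
from typing import List, NamedTuple, Set


class Event(NamedTuple):
    """One page reference of a query: the triplet (page index, I/O time, CPU time)."""
    page_index: int
    io_time: float     # I/O channel service time
    cpu_time: float    # CPU service time


def route(event: Event, buffered_pages: Set[int]) -> List[str]:
    """Return the servers an event visits, in order."""
    if event.page_index in buffered_pages:
        return ["cpu"]                        # buffer hit: processed directly
    return ["disk", "io_channel", "cpu"]      # miss: read from disk, transfer, then process


hit_path = route(Event(7, 0.4, 1.0), buffered_pages={7})
miss_path = route(Event(9, 0.4, 1.0), buffered_pages={7})
```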
To simulate the query execution in the database system with the database filter, each query is
also represented by a sequence of events. Each event consists of a quadruplet of (page index,
filtering criterion, I/O time, CPU time). The field “filtering criterion” indicates which filtering
criterion will be used to filter the referenced page. The major difference between the database
systems with and without the database filter is the operation of the buffer manager. The buffer
manager with the database filter first checks if the referenced page is in the database buffer. If yes,
it then checks if the current filtering criterion on the referenced page is the same as the previous
one on the same page in the database buffer. Only if both checks succeed is the reference a buffer
hit; otherwise, as discussed in Section 2, the page must be read from the disk again.

Fig. 3. The simulation model for the query execution in the multi-user environment.
[Figure: terminal servers, buffer manager, disk server, I/O channel server and CPU server, with
flows labeled “next page reference” and “currently finished page”.]

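
The two-step check of the buffer manager can be sketched in Python as an LRU buffer that remembers the filtering criterion of each buffered page. The class below is a hypothetical illustration rather than the simulator used in the paper; it assumes that a page counts as a hit only when the stored criterion equals the requested one, with `None` standing for an unfiltered page.

```python
from collections import OrderedDict


class FilteredLRUBuffer:
    """LRU buffer pool in which every frame records how its page was filtered."""

    def __init__(self, frames):
        self.frames = frames
        self.pages = OrderedDict()   # page index -> filtering criterion (None = unfiltered)

    def reference(self, page, criterion=None):
        """Return True on a buffer hit, False when the page must be read from disk."""
        hit = page in self.pages and self.pages[page] == criterion
        if page in self.pages:
            del self.pages[page]                 # re-inserted below as most recently used
        elif len(self.pages) >= self.frames:
            self.pages.popitem(last=False)       # evict the least recently used page
        self.pages[page] = criterion             # a reloaded copy carries the new criterion
        return hit


buf = FilteredLRUBuffer(frames=2)
results = [
    buf.reference(1, "a"),   # first load: miss
    buf.reference(1, "a"),   # same page, same criterion: hit
    buf.reference(1, "b"),   # same page, different criterion: miss
]
```

Passing `None` for index pages and a criterion only for data pages corresponds to the first strategy of Section 3.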
In order to generate the sequence of events for each query, a C program is implemented on a SUN
3/160 to execute the query. During the query execution, the triplet or quadruplet, depending on
whether the database filter is used, is recorded for each page reference. This C program is designed
to count the total number of MC68020 assembly instructions executed during the processing of
the referenced page. The CPU time to process the referenced page is then determined by summing
the execution times of these assembly instructions according to the instruction timing table of
the MC68020 [17]. The I/O channel transmission time is determined by the size of the data to be
transferred divided by the I/O transmission rate.
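
The two timing fields of an event can be computed as sketched below. The Python fragment is illustrative only: the cycle counts are made up, the 20 MHz clock and the 10 Mbytes/sec channel rate are the simulation parameters listed in this section, and 1 Mbyte is assumed to be 2**20 bytes.

```python
CLOCK_HZ = 20e6                         # CPU clock rate used in the simulation
CHANNEL_BYTES_PER_SEC = 10 * 2 ** 20    # assumption: 1 Mbyte = 2**20 bytes


def cpu_time_ms(cycle_counts):
    """Sum the clock cycles of the executed instructions and convert to msec."""
    return sum(cycle_counts) / CLOCK_HZ * 1000.0


def io_time_ms(nbytes):
    """Channel transmission time: data size divided by the transmission rate."""
    return nbytes / CHANNEL_BYTES_PER_SEC * 1000.0


full_page = io_time_ms(4096)                     # unfiltered 4 Kbyte page
filtered_page = io_time_ms(212)                  # only one 212-byte tuple passes the filter
cpu = cpu_time_ms([6, 4, 12, 8] * 1000)          # hypothetical per-instruction cycle counts
```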
Other assumptions and parameters used in the simulation model are outlined as follows. These
parameters are set to be as realistic as possible.
1. Since the global LRU replacement policy is widely used in many commercial database
management systems (e.g. INGRES and System R), the buffer management policy is
assumed to be the global LRU replacement policy [14]. Shared read and exclusive write are
permitted for each buffer frame in the buffer pool. The size of a buffer frame is equal to
that of a disk page, which is assumed to be 4 Kbytes. The size of the database buffer is set
to 1 Mbyte, which is about half the size of the accessed relation. This buffer size is adopted
so that the buffer cannot accommodate an entire relation.
2. The clock rate of the CPU is assumed to be 20 MHz. The CPU scheduling algorithm
is assumed to be round-robin.
3. The disk scheduling algorithm is LSTF (Least Seek Time First). The timing specifications
of the disk drive are taken from the FH-3000 disk drive [15]. The seek time is determined by
the distance between the cylinder number of the referenced page and that of the current
disk head position. The latency time is assumed to be uniformly distributed between zero
and one rotation time, which is 16 msec.
4. The transmission rate of the I/O channel is assumed to be 10 Mbytes/sec.
5. The setup time of the database filter is ignored, because the setup time is usually quite small
compared to the disk access time.
6. The overheads for buffer management, disk scheduling, processor scheduling and I/O setup
are ignored because they are usually much smaller than the execution
time of a query.
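
Assumption 3 can be illustrated with a short Python sketch of the disk server. The LSTF choice and the uniform latency follow the text; the linear seek-time model and its constants are purely hypothetical placeholders, since the FH-3000 timing specifications are not reproduced here.

```python
import random

ROTATION_MS = 16.0   # one rotation time, from assumption 3


def pick_next_lstf(pending_cylinders, head):
    """Least Seek Time First: serve the request closest to the current head position."""
    return min(pending_cylinders, key=lambda cyl: abs(cyl - head))


def latency_ms(rng):
    """Rotational latency, uniformly distributed between zero and one rotation."""
    return rng.uniform(0.0, ROTATION_MS)


def seek_ms(distance, settle_ms=2.0, per_cylinder_ms=0.02):
    """Hypothetical linear seek model; the constants are NOT FH-3000 figures."""
    return 0.0 if distance == 0 else settle_ms + per_cylinder_ms * distance


rng = random.Random(0)
head = 40
target = pick_next_lstf([10, 50, 42], head)
service = seek_ms(abs(target - head)) + latency_ms(rng)   # page transfer time not included
```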

4.2. Work load synthesis

As addressed by Boral [18], the multiprogramming level, the query mix and the degree of data
sharing are three major factors that affect the database performance in a multi-user environment.
The number of concurrent queries, i.e. the multiprogramming level, in the following simulation
varies from 1 to 8. Three levels of data sharing are defined in our simulation:

1. Level 1: each query accesses a different copy of the relation.
2. Level 2: every 2 queries share a copy of the relation.
3. Level 3: all concurrent queries share only 1 copy of the relation.
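
The three sharing levels can be expressed as a mapping from a query to the relation copy it accesses. The Python sketch below is illustrative; the function name and the zero-based numbering of queries and copies are assumptions.

```python
def relation_copy(query_id, level):
    """Return the index of the relation copy accessed by a query (both zero-based)."""
    if level == 1:
        return query_id          # every query has its own copy
    if level == 2:
        return query_id // 2     # every 2 queries share a copy
    if level == 3:
        return 0                 # all queries share a single copy
    raise ValueError("data sharing level must be 1, 2 or 3")


copies_level2 = [relation_copy(q, 2) for q in range(4)]
```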

The synthetic relations used in our simulation are based on the one proposed in [19]. The
advantage of adopting these synthetic relations is that the selectivity factor, defined as the ratio
of the number of tuples that satisfy the selection predicates to the number of tuples in the
relation, can be easily determined by selecting different attributes. Two basic relations, namely
RelationA and RelationB, are designed as follows for our simulation. The size of each relation is
set to 10,000 tuples. Each tuple is 212 bytes long and consists of a number of integer and string
attributes as shown in Table 1. The domains of the attributes are designed in such a way that various
selectivity factors can be obtained. For example, each attribute value of the attribute “unique1”
is unique, and the domain of the attribute “two” contains only 2 different attribute values. For
the detailed description of the tuple format, the reader may refer to [20].

Table 1. The schema of the synthetic relations

unique1   unique2   2         5         10        20        50        100       200
4 bytes   4 bytes   4 bytes   4 bytes   4 bytes   4 bytes   4 bytes   4 bytes   4 bytes

500       1000      2000      5000      10,000    stringu1  stringu2  string4
4 bytes   4 bytes   4 bytes   4 bytes   4 bytes   52 bytes  52 bytes  52 bytes

To compare the performance of the database systems with and without the database filter, the
following two queries, each consisting of one Selection operation, are designed. The first query
represents a sequential search and the second one represents an index search. To favor the
database filter, the selectivity factor is set as low as possible. In the multi-user
environment, many users may simultaneously access the same relation with different selection
predicates. To simulate this situation, the search key keyval in each query is generated by a
random variable which is uniformly distributed over (0, 9999). In Query II, the B+-tree [12] is
constructed on the attribute “unique1” of RelationB.

Query I (search sequential file):
SELECT all
FROM relationA
WHERE “unique1” = keyval;

Query II (search index file):
SELECT all
FROM relationB
WHERE “unique1” = keyval;
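
Generating the query stream can be sketched in Python as follows. The helper names are hypothetical, the search key is assumed to be an integer drawn uniformly from 0 to 9999, and the selectivity computation assumes that exactly one of the 10,000 tuples matches a given value of “unique1”.

```python
import random


def make_query(relation, rng):
    """Build one Selection query with a uniformly distributed search key."""
    keyval = rng.randint(0, 9999)            # inclusive on both ends
    text = f'SELECT all FROM {relation} WHERE "unique1" = {keyval};'
    return keyval, text


def selectivity(matching_tuples, total_tuples):
    """Selectivity factor: matching tuples divided by all tuples in the relation."""
    return matching_tuples / total_tuples


rng = random.Random(42)
keyval, query_text = make_query("relationA", rng)
factor = selectivity(1, 10_000)              # unique attribute: one matching tuple
```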

4.3. Performance measure


The buffer hit rate and the speed-up ratio defined in Section 2 are used as the 2 performance
measures. It is important to use performance measures that are independent of the given timing
parameters. Since the buffer hit rate is independent of the timing parameters, it is a reliable
performance measure. When the CPU speed is much faster than the disk speed, the speed-up
ratio is dominated by the buffer hit rate. Therefore, the speed-up ratio varies only slightly when different
timing parameters are set. In order to make the measurements statistically significant,
Monte Carlo simulations are executed 1000 times so that the coefficient of variation [21] is less
than 0.1.
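
The stopping criterion can be checked with a few lines of Python. This sketch assumes the usual definition of the coefficient of variation, the sample standard deviation divided by the sample mean; the paper refers to [21] for the definition it actually uses, and the sample values below are invented.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


# Hypothetical speed-up ratios from repeated simulation runs.
runs = [0.98, 1.01, 1.00, 0.99, 1.02]
cv = coefficient_of_variation(runs)
accepted = cv < 0.1
```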

With the simulation model described above, the performance of the database systems with and
without the database filter are compared. Two strategies of using the database filter are adopted
in the simulation. The first strategy follows the principle described in Section 3. The second strategy
uses the database filter all the time regardless of whether it is an index page or a data page.
When Query I is executed, the speed-up ratio and the buffer hit rate are shown in Fig. 4 and
Table 2 respectively. Since RelationA is not indexed with any atribute, the Selection operation is
performed by sequential scan on all data pages. In this case, the same results are obtained using
either the first or the second strategy. As expected by the queueing model depicted in Section 2,
the speed-up ratio is slightly greater than 1 when the buffer hit rate does not decrease (i.e. in the
case of Level 1 data sharing). When data sharing happens (i.e. in the cases of Level 2 and Level
3 data sharing), the speed-up ratio is less than 1 because of the decrease of the buffer hit rate as
shown in Table 2. The database filter can only improve the system performance by a factor of 1.24

Table 2. The buffer hit rate in execution of Query I (search sequential file)

Number of     Level 1 data sharing    Level 2 data sharing    Level 3 data sharing
concurrent
queries       H          Hdf          H          Hdf          H          Hdf
n=1           0.000000   0.000000
n=2           0.000000   0.000000     0.495288   0.000000     0.495288   0.000000
n=4           0.000000   0.000000     0.271123   0.000000     0.695287   0.000000
n=6           0.000000   0.000000     0.191099   0.000000     0.805711   0.000179
n=8           0.000000   0.000000     0.174714   0.000000     0.866513   0.000000

[Figure 4: plot of the speed-up ratio against the number of concurrent queries (2 to 8), with one curve each for Level 1, Level 2, and Level 3 data sharing; data pages are filtered.]
Fig. 4. The speed-up ratio in execution of Query I (search sequential file).

in the best case (i.e. in the case of the single-user environment without data sharing). In most cases,
using the database filter degrades the system performance.
As shown in Fig. 4, when the number of concurrent queries increases, the speed-up ratio increases
in the case of Level 2 data sharing and decreases in the case of Level 3 data sharing. This
follows from the buffer hit rates shown in Table 2: as the number of concurrent queries
increases, the difference between H and Hdf decreases in the case of Level 2 data sharing but
increases in the case of Level 3 data sharing. In Level 2 data sharing, every two queries
simultaneously access the same relation, so increasing the number of concurrent queries
increases the number of different relations held in the database buffer. Thus, the probability
that a query finds a desired disk page in the database buffer decreases. In the case of
Level 3 data sharing, all the queries simultaneously access the same relation. Because each page
will be reaccessed by many queries in the near future and the global LRU replacement policy is
adopted, the buffer hit rate increases as the number of concurrent queries increases.
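
The Level 3 behaviour can be illustrated with a toy global-LRU buffer. The sketch is not the simulation model of this paper. It assumes queries scan in strict lockstep, and it assumes a filtered page kept in the buffer can only serve the query whose criterion produced it. The function `hit_rate` and its parameters (`group_size`, `pages`, `buffer_pages`, `filtered`) are hypothetical. Timing skew and competition between relations are not modelled, so the sketch does not reproduce the Level 2 trend.

```python
from collections import OrderedDict

def hit_rate(n_queries, group_size, pages=200, buffer_pages=32, filtered=False):
    """Buffer hit rate for n_queries lockstep sequential scans under global LRU.

    group_size queries share one relation: 1 models Level 1 (no sharing),
    n_queries models Level 3 (all queries scan the same relation).
    """
    buffer = OrderedDict()  # global LRU: least recently used entry first
    hits = requests = 0
    for page in range(pages):
        for q in range(n_queries):
            relation = q // group_size
            # Assumption: a filtered page holds only the tuples selected for
            # one query, so it cannot serve a query with another criterion.
            key = (relation, page, q) if filtered else (relation, page)
            requests += 1
            if key in buffer:
                hits += 1
                buffer.move_to_end(key)  # mark as most recently used
            else:
                buffer[key] = True
                if len(buffer) > buffer_pages:
                    buffer.popitem(last=False)  # evict least recently used
    return hits / requests

for n in (2, 4, 8):
    print(n, hit_rate(n, group_size=n), hit_rate(n, group_size=n, filtered=True))
```

In this idealized model the unfiltered Level 3 hit rate is (n-1)/n, which rises with n and lies slightly above the H values reported in Table 2, while the filtered hit rate drops to zero.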
Table 3 compares the speed-up ratio measured from the simulation with that estimated by the
queueing model. The estimated speed-up ratio is calculated from equation (4) in Section 2 with
the values of H and Hdf obtained from Table 2. The estimation error of the speed-up ratio from
the queueing model is less than 10% in the multi-user environment. This again verifies the results
predicted by the queueing model.
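
The 10% bound can be checked directly against the entries of Table 3. The short script below transcribes the measured and estimated values and computes the error relative to the measured value. That definition of the error is an assumption, since the text does not state which one is used.

```python
# (measured, estimated) speed-up ratios transcribed from Table 3.
# Columns per row: 0%, 50%, and 100% data sharing.
TABLE_3 = {
    2: [(1.027, 1.017), (0.449, 0.493), (0.449, 0.493)],
    4: [(1.001, 1.017), (0.716, 0.741), (0.341, 0.309)],
    6: [(1.002, 1.017), (0.811, 0.823), (0.203, 0.197)],
    8: [(0.994, 1.017), (0.831, 0.839), (0.148, 0.135)],
}
SINGLE_USER = (1.236, 1.017)  # n=1, no data sharing

def relative_error(measured, estimated):
    """Estimation error relative to the measured value (an assumed definition)."""
    return abs(estimated - measured) / measured

worst_multi_user = max(
    relative_error(m, e) for row in TABLE_3.values() for m, e in row
)
print(f"worst multi-user error: {worst_multi_user:.1%}")
print(f"single-user error:      {relative_error(*SINGLE_USER):.1%}")
```

Under this definition the largest multi-user error is about 9.8% (n=2 with data sharing), whereas the single-user case is off by about 17.7%, which is why the bound is stated only for the multi-user environment.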
When Query II is executed, the speed-up ratio and the buffer hit rate are as shown in Fig. 5 and
Table 4, respectively. Results similar to those for Query I are found (i.e. when

Table 3. Comparison of the speed-up ratios measured from the simulation and estimated by the queueing model in execution of Query I
(Measured: from the simulation. Estimated: by the queueing model.)

Number of     0% data sharing         50% data sharing        100% data sharing
concurrent
queries       Measured   Estimated    Measured   Estimated    Measured   Estimated
n=1           1.236      1.017
n=2           1.027      1.017        0.449      0.493        0.449      0.493
n=4           1.001      1.017        0.716      0.741        0.341      0.309
n=6           1.002      1.017        0.811      0.823        0.203      0.197
n=8           0.994      1.017        0.831      0.839        0.148      0.135

[Figure 5: plot of the speed-up ratio against the number of concurrent queries (2 to 8), with six curves: Level 1, Level 2, and Level 3 data sharing, each with only data pages filtered and with index and data pages filtered.]
Fig. 5. The speed-up ratio in execution of Query II (search the index file).

the buffer hit rate decreases by no more than 0.017, the speed-up ratio is approximately 1; otherwise,
the speed-up ratio is less than 1). In addition, each entry in the column labeled "Hdf" in Table 4
contains two values. The upper value is the buffer hit rate using the first strategy, and the lower
one is the buffer hit rate using the second strategy. Since locality exists in accesses to
the index pages, the second strategy, which filters the index pages, has a lower buffer hit
rate, and its performance is worse than that of the first strategy.
Based on the simulation results presented above, the same conclusion as predicted by the
queueing model is obtained. In addition, the simulation results show that the buffer hit rate
decreases greatly and the speed-up ratio degrades considerably in the case of data sharing.
Although the first strategy prevents the decrease in the buffer hit rate when index pages are
accessed, it does not prevent the decrease when data pages are accessed in the case of data
sharing: a reaccess to a filtered page stored in the database buffer usually causes an
additional physical disk access when the filtering criterion is different. Although
the database filter can improve the system performance by a factor of 1.24 in the single-user
environment without data sharing, it degrades the system performance in the case of data
sharing. The database filter is therefore not recommended for a database system with a database
buffer.

Table 4. The buffer hit rate in execution of Query II (search the index file)
(? marks a digit that is illegible in the source.)

Number of     Level 1 data sharing    Level 2 data sharing    Level 3 data sharing
concurrent
queries       H          Hdf          H          Hdf          H          Hdf
n=1           0.43??36   0.422045
                         0.000000
n=2           0.413135   0.402796     0.445061   0.427462     0.445061   0.427462
                         0.000888                0.000888                0.000888
n=4           0.373280   0.370346     0.407506   0.398759     0.441386   0.421986
                         0.000000                0.000000                0.000000
n=6           0.323431   0.322530     0.393610   0.382993     0.444765   0.422316
                         0.000886                0.000000                0.000887
n=8           0.250944   0.244400     0.361759   0.357190     0.433895   0.422641
                         0.000222                0.00????                0.000887

5. CONCLUSIONS

This paper presents a queueing model and a simulation model for studying the effect of the
database filter on a buffered relational database system. Because the CPU is much faster than
the disk, both the queueing and simulation models show that using the database filter does
not improve the system performance in the multi-user environment. The simulation results also
show that the database filter improves the system performance by a factor of at most 1.24, in the
best case of a single-user environment without data sharing. When data sharing occurs during
query execution, the buffer hit rate may decrease because a filtered page needs to
be reaccessed in the near future. The database filter therefore degrades the system
performance, as shown by both the queueing model and the simulation model. It is concluded that
the database filter should not be used in a database system that has a database buffer.

REFERENCES

[1] E. Babb. Implementing a relational database by means of specialized hardware. ACM Transact. Database Syst. 4(1),
1-29 (1979).
[2] F. Bancilhon, P. Richard et al. VERSO: The relational database machine. In Advanced Database Machine Architecture
(Edited by D. K. Hsiao), Chap. 1, pp. 1-18. Prentice-Hall, Englewood Cliffs, NJ (1983).
[3] J. Banerjee, D. K. Hsiao and K. Kannan. DBC: a database computer for very large databases. IEEE Transact.
Comput. C-28(6), 414-429 (1979).
[4] G. Gardarin, P. Bernadat, N. Temmerman, P. Valduriez and Y. Viemont. SABRE: The relational database system
for a multimicroprocessor machine. In Advanced Database Machine Architecture (Edited by D. K. Hsiao), Chap. 2,
pp. 19-35. Prentice-Hall, Englewood Cliffs, NJ (1983).
[5] H. Leilich, G. Stiege and H. A. Zeidler. Search processor for data base management systems. In Proc. 4th Int. Conf.
on Very Large Data Bases, Berlin, Germany, pp. 280-287 (1978).
[6] Y.-D. Liu and K.-Y. Su. MRDBM: a multi-user relational database machine. J. Chin. Inst. Engr 14(1), 79-95 (1991).
[7] S. B. Yao, F. Tong and Y. Z. Sheng. The system architecture of a data base machine (DBM). IEEE Database Engng
Bull. 4(2), 53-62 (1981).
[8] J.-J. Fan and K.-Y. Su. The design of a DFSA based pattern matcher. J. Chin. Inst. Engr 14(3), 325-331 (1991).
[9] S. Gamerman and M. Scholl. Hardware versus software data filtering: the VERSO experience. In 4th Int. Workshop
on Database Machines, Grand Bahama Island, pp. 110-136 (1985).
[10] R. Gonzalez-Rubio, J. Rohmer and D. Terral. The SCHUSS filter: A processor for non-numerical data
processing. In Proc. 11th Int. Symp. on Computer Architecture, Ann Arbor, MI, pp. 64-73 (1984).
[11] E. D. Lazowska, J. Zahorjan, G. S. Graham and K. C. Sevcik. Quantitative System Performance: Computer System
Analysis Using Queueing Network Models. Prentice-Hall, Englewood Cliffs, NJ (1984).
[12] K. S. Trivedi. Probability and Statistics with Reliability, Queueing, and Computer Science Applications. Prentice-Hall,
Englewood Cliffs, NJ (1981).
[13] H. Boral and D. J. DeWitt. Database machines: An idea whose time has passed? A critique of the future of database
machines. In Int. Workshop on Database Machines, Munich, pp. 166-187 (1983).
[14] W. Effelsberg and T. Haerder. Principles of database buffer management. ACM Transact. Database Syst. 9(4), 560-595
(1984).
[15] Microscience. FH-3000 x Series Product Manual. Microscience International Corporation, Taipei, Taiwan (1991).
[16] B. Salzberg. File Structures: an Analytic Approach. Prentice-Hall, Englewood Cliffs, NJ (1977).
[17] Motorola. MC68020 32-bits Microprocessor User's Manual. Prentice-Hall, Englewood Cliffs, NJ (1984).
[18] H. Boral and D. J. DeWitt. A methodology for database system performance evaluation. In Proc. Int. Conf. of
Management of Data, Boston, MA, pp. 176-185 (1984).
[19] D. Bitton and D. J. DeWitt. Benchmarking database systems: a systematic approach. In 9th Int. Conf. on Very Large
Data Bases, Florence, Italy, pp. 8-19 (1983).
[20] Y.-S. Hwang. The Simulation and Performance Analysis of A Multi-user Relational Database Machine. MS
dissertation, Department of E. E., National Tsing Hua University, Hsinchu, Taiwan, R.O.C. (1989).
[21] P. J. Bickel and K. A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day, San Francisco
(1977).
