QoS Management of Real-Time Data Stream Queries in Distributed Environments
Abstract

Many emerging applications operate on continuous unbounded data streams and need real-time data services. Providing deadline guarantees for queries over dynamic data streams is a challenging problem due to bursty stream rates and time-varying contents. This paper presents a prediction-based QoS management scheme for real-time data stream query processing in distributed environments. The scheme features query workload estimators, which predict the query workload using execution time profiling and input data sampling. In this paper, we apply the prediction-based technique to select the proper propagation schemes for data streams and intermediate query results in distributed environments. The performance study demonstrates that the proposed solution tolerates dramatic workload fluctuations and saves significant amounts of CPU time and network bandwidth with little overhead.

1. Introduction

Many applications need to operate on continuous unbounded data streams. The streaming data may come from sensor readings, Internet router traffic traces, telephone call records, or financial tickers. Many of these new applications have inherent timing constraints on their tasks. However, due to the dynamic nature of data streams, queries on data streams may have unpredictable execution costs. First, the arrival rate of the data streams can be unpredictable, which leads to variable input volumes to the queries. Second, the content of the data streams may vary with time, which causes the selectivity (Sel) of the query operators to change over time. The selectivity of an operator is defined as:

Sel = size(output) / size(input)

It measures the fraction of the data input that passes the current operator to the next. As operator selectivity varies, the size of the intermediate results and final query results changes, even when the input volume remains static. In distributed environments, the changing intermediate result size also affects the communication cost (CPU time and network bandwidth) associated with intermediate result propagation.

To address these issues, we proposed a prediction-based QoS management algorithm [19], which uses online profiling and sampling to estimate the cost of queries on dynamic streams. The profiling process calculates the average cost (CPU time) for each operator to process one data tuple. The sampling process estimates the selectivity of the operators in a query. In this paper, we use this prediction-based QoS management in a distributed environment to select the proper propagation schemes for data streams and intermediate query results. To the best of our knowledge, this is the first work that predicts query workload and uses workload predictions to control query QoS in a distributed environment.

The rest of the paper is organized as follows: Section 2 gives an overview of the system model and our assumptions. Section 3 describes the prediction-based QoS management scheme. Section 4 describes how this scheme can be used in a distributed environment. Section 5 presents the performance evaluation and experimental results. Section 6 discusses related work, and Section 7 concludes the paper.

2. System Model and Assumptions

A data stream is defined as a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamps) sequence of data items [8]. A Data Stream Management System (DSMS) is a system especially constructed to process persistent queries on dynamic data streams. DSMSs differ from traditional database management systems (DBMSs) in that DBMSs expect the data to be persistent in the system and the queries to be dynamic, whereas DSMSs expect dynamic unbounded data streams and persistent queries. Due to the high volume [...] market analysis, have brought research related to data streams into focus. These applications inherently generate data streams, and DSMSs are well suited for such applications.
[Table 1. Operator Cost and Dependency on Selectivity and Synopsis Size (table body not recovered in this copy).]

[...] the traffic simulator, as the system still gets some information about the traffic.

3. Prediction-based QoS Management

Three parameters are needed for each query to perform prediction-based QoS management, namely, the input data stream volume, the operator selectivity, and the execution time per data tuple for each operator. Since our approach only considers queries that are ready to execute, the input data volumes are known.

3.1. Query Execution Time Estimation

Table 1 shows the average execution time per tuple of the operators and their dependency on the current operator selectivity and synopsis (e.g., indices) size. The prediction-based QoS management scheme assumes that the incoming data streams, the intermediate results, and the accessory data structures are all stored in memory. Hence the time taken for one operator to process a data tuple can be estimated effectively without considering any additional overhead of accessing data from the disk. Here, we briefly discuss the analysis for the selection and join operators. The reader is referred to [19] for a detailed discussion.

3.1.1 Selection Operation Cost Analysis

The following notations are used for a selection operator Osel:

• the input tuple volume, n
• the selectivity of the operator, s
• the execution time to evaluate the predicates, Cp
• the execution time to insert an output tuple into the buffer, Ci

The costs Cp and Ci are expected to be fairly constant for a particular set of predicates.

3.1.2 Join Operation Cost Analysis

For join operations, Symmetric Hash Join (SHJ) [21, 10] is used. SHJ works by keeping a hash table for each input in memory. When a tuple arrives, it is inserted into the hash table for its input and is used to probe the hash table for the other input. This probing may generate join results, which are then inserted into the output buffer. The following notations are used for a join operator Ojoin:

• the left and right input volumes, nL and nR
• the selectivity of the operator, s
• the execution times to probe the left and right hash indices, CLProbe and CRProbe
• the execution times to hash the left and right input, CLHash and CRHash
• the execution time to insert an output tuple into the buffer, Ci

For the join operator Ojoin:

The number of output tuples = nL × nR × s
The time for processing left input tuples = nL × CRProbe + nL × CLHash
The time for processing right input tuples = nR × CLProbe + nR × CRHash
The time for inserting output tuples = nL × nR × s × Ci
The total time = nL × (CRProbe + CLHash) + nR × (CLProbe + CRHash) + nL × nR × s × Ci

Of the three types of cost factors, the hashing costs (CLHash and CRHash) and the insertion cost (Ci) are much smaller than the probing costs (CLProbe and CRProbe) and generally remain constant for a particular system. The probing cost, however, depends on the contention rate of the hash index, which in turn depends on the input data volume and the allocated index size.
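To make the two cost models above concrete, here is a minimal Python sketch of the per-batch time estimates. The function and variable names are ours, and the selection-operator formula is our reconstruction from the notation in Section 3.1.1 (n × Cp to evaluate the predicates plus n × s × Ci to buffer the surviving tuples), not a quotation from the paper:

```python
# Sketch of the per-tuple cost models of Sections 3.1.1 and 3.1.2.
# The per-tuple cost constants (c_p, c_i, probe and hash costs) are the
# values maintained by the profiling step described in Section 3.1.3.

def selection_cost(n, s, c_p, c_i):
    """Estimated time for a selection operator on n input tuples:
    evaluate the predicates on every tuple, then insert the expected
    n * s qualifying tuples into the output buffer."""
    return n * c_p + n * s * c_i

def join_cost(n_l, n_r, s, c_lprobe, c_rprobe, c_lhash, c_rhash, c_i):
    """Estimated time for a symmetric hash join on one input batch."""
    left = n_l * (c_rprobe + c_lhash)   # left tuples probe the right index and build the left index
    right = n_r * (c_lprobe + c_rhash)  # right tuples probe the left index and build the right index
    output = n_l * n_r * s * c_i        # expected n_l * n_r * s output tuples are buffered
    return left + right + output
```

For example, with nL = 10, nR = 20, s = 0.1, probe costs of 1.0, hash costs of 0.1, and Ci = 0.05, the join estimate is 11 + 22 + 1 = 34 time units.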
3.1.3 Cost Constant Calculation Using Profiling

The exponential smoothing algorithm is chosen to give relatively higher weights to recent observations in forecasting than to older observations. Suppose that after a query instance is executed, the value for the cost parameter C is computed to be Cnew; then C is updated using the single exponential smoothing formula:

C = C × (1 − α) + Cnew × α,  0 < α < 1

3.2. Selectivity Estimation Using Sampling

In the prediction-based approach, a sampler query plan is constructed for every query in the system. The query plans for the sampler queries are exactly the same as their corresponding real queries' plans. When a query instance is released to the scheduler, the sampler is executed first with sampled data tuples from the input. The data tuples are sampled from the real input according to a preset sample ratio Sr. The sampling process selects a simple random sample without replacement. The execution of this sampler query plan is used to estimate the selectivity, and hence the execution time, of the operators in the query plan.

3.3. Inter-Query QoS Management

Inter-query refinement is performed to ensure that all query instances get a fair chance to meet their deadlines. A pseudo-deadline is assigned to the queries that are ready to execute. This pseudo-deadline is based on the estimated execution time and the input/quality table for each query. The input/quality table is a user-defined table which maps the fraction of input tuples used to the quality of the query results.

[Figure 2. Inter-Query QoS Management with Query Quality Curves (graphic not recovered). The input/quality tables shown in the figure are reproduced below:]

    Input %   Q0 Quality   Q1 Quality   Q2 Quality
    0.9       0.98         0.9          0.45
    0.8       0.96         0.8          0.30
    0.7       0.95         0.7          0.20
    0.6       0.92         0.6          0.14
    0.5       0.87         0.5          0.09
    0.4       0.82         0.4          0.05
    0.3       0.75         0.3          0.03
    0.2       0.65         0.2          0.02
    0.1       0.50         0.1          0.01

As illustrated in Figure 2, the three queries Q0, Q1, and Q2 have different requirements in terms of maintaining query quality when the system is overloaded. As reflected by the input/quality tables and the curves, the query quality of Q0 (an example of a convex QoS curve) drops slowly with the input dropping ratio. A query with a convex QoS curve can still calculate the average value reasonably well when a small percentage of input data tuples is dropped. The quality of Q1 (an example of a linear QoS curve) drops linearly with the input data dropping ratio. On the other hand, Q2 (an example of a concave QoS curve) is the opposite of Q0, as it cannot tolerate dropping any input data tuples. As an example, if the QoS of the target system is set to 70%, it translates into dropping 75%, 30%, and 3% of the input data for Q0, Q1, and Q2, respectively. These ratios are called drop ratios and denote the fraction of input tuples dropped.

The algorithm for inter-query QoS negotiation is an iterative process which keeps lowering the query QoS from 100% and calculates the cost of all active queries [19]. The query QoS for which all the queries finish their execution before their deadlines is chosen, and the corresponding pseudo-deadlines are calculated for the queries. The pseudo-deadlines are assigned proportionally to the estimated times of the queries.

3.4. Intra-Query QoS Refinement

In inter-query QoS management, every query instance is assigned a pseudo-deadline based on the estimated execution time for the query. The query instances then perform intra-query refinement to fulfill their pseudo-deadlines instead of their actual deadlines. Before a query starts executing, it drops a fraction of the input data if the estimated execution time of the query exceeds the pseudo-deadline. The data tuples are dropped early in the query plan, as doing so yields the best system utility [1]. The scheme used for determining the drop amounts is discussed in [19]. Furthermore, the progress of the query is monitored periodically to ensure that the query meets its pseudo-deadline. If the query is running late, data tuples are dropped during execution to ensure that the query meets its deadline. Finally, for each query, the total number of tuples dropped during its execution is calculated. Then, the input/quality table for the query is used to find the quality of the query result. This query quality is used as a measure of system performance. The ultimate objective is to maximize the average quality of the query results.

4. QoS Management in a Distributed Environment

The prediction-based approach performs very well in maintaining query QoS in a centralized environment [19]. In this section, we discuss our dynamic data stream propagation algorithm in distributed DSMSs. In distributed DSMSs, one of the key problems is to reduce the data stream propagation cost. Data stream propagation not only consumes network resources, but also consumes a fair amount of CPU time.
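To make this propagation cost concrete, the following sketch estimates the combined CPU time a sender/receiver pair spends shipping one stream segment. The packet overheads and packet size follow the simulation settings of Table 2; the helper itself is our illustration, not code from the paper:

```python
import math

SEND_US = 5             # microseconds to send one packet (Table 2)
RECV_US = 10            # microseconds to receive one packet (Table 2)
TUPLES_PER_PACKET = 10  # data tuples carried per packet (Table 2)

def propagation_cpu_us(n_tuples):
    """CPU microseconds spent by the sender and the receiver together
    to ship n_tuples unprocessed tuples as whole packets."""
    packets = math.ceil(n_tuples / TUPLES_PER_PACKET)
    return packets * (SEND_US + RECV_US)
```

At the simulated rate of 1000 tuples per second, shipping one one-second segment costs propagation_cpu_us(1000) = 1500 microseconds of combined send/receive CPU time, which is the kind of cost the propagation strategies discussed in this section try to reduce.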
[Figures 3 and 4 (graphics not recovered): the query plan at node B (Op0: projection; Op1: selection or join; Op2: join over the local stream S2 and relation R), with stream S1 transmitted unprocessed from node A and replicated at node B (Figure 3), and with the filter operator Op0 moved to node A (Figure 4).]

From a previous study [11], network operations cost as much as 35% of CPU time. However, this cost can be reduced if the remote node has the ability to choose whether to transmit unprocessed data streams, or to process these data streams and transmit the intermediate results (which are generally smaller in size).

Suppose node B contains a data stream query that operates on a local stream S2, a local relation R, and a remote stream S1 from node A. As shown in Figure 3, new data tuples in S1 are sent to node B when they arrive at node A, and the data stream S1 is replicated at node B. One simple optimization is to move the obviously selective operators, such as Op0, to the remote node A and execute these operators at the remote node. As shown in Figure 4, the operator Op0 can be moved to node A to reduce the network workloads. We call these obviously selective operators filters, as [...]

[...] operator at the remote node for operators like the join operator Op1. The sampler operator samples a small number of data tuples from the incoming data streams and processes the join operation. If the output size is significantly smaller than the input size, the data stream source node should process the join operator locally and transmit the intermediate results to the remote node.

[Figure 6 (graphic not recovered): the rejection region for the sampled selectivity; a sampled selectivity above 0.64 implies that the operator selectivity is greater than 0.5 with 97.5% confidence (axis marks at 0.36, 0.5, and 0.64).]
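The source-node decision just described can be sketched as follows. This is an illustrative reconstruction rather than the paper's code: the names and defaults are ours; the break-even selectivity of 0.5 corresponds to the assumption that output tuples are twice the size of input tuples, and the two-sigma margin mirrors the hypothesis-testing example worked out later in this section:

```python
import math

def choose_propagation(sampled_sel, n_samples, output_to_input_ratio=2.0, z=2.0):
    """Decide whether the source node should ship raw tuples, or run the
    join locally and ship the intermediate results.

    Intermediate results are smaller than the raw input only when the
    join selectivity is below 1 / output_to_input_ratio (0.5 when output
    tuples are twice the input tuple size). We ship intermediate results
    only if the sampled selectivity is below that break-even point by z
    standard deviations of the sample mean (normal approximation to the
    binomial, worst case p = 0.5).
    """
    break_even = 1.0 / output_to_input_ratio
    sigma = 0.5 / math.sqrt(n_samples)  # std. dev. of the sampled selectivity mean
    if sampled_sel < break_even - z * sigma:
        return "send_intermediate_results"
    return "send_unprocessed_stream"
```

With 50 samples, sigma is about 0.0707, so intermediate results are shipped only when the sampled selectivity falls below roughly 0.36, matching the 0.36 / 0.5 / 0.64 boundaries of Figure 6.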
[...] selectivity values from the random sampling follow the binomial distribution. A binomial distribution with sample size n and success probability p approximates a normal distribution for large n and for p not too close to 1 or 0 (statistics texts [14] recommend using this approximation only if np and n(1 − p) are both at least 5; otherwise, a continuity correction should be applied). Since we sample 50 times out of the pool, the distribution of the sample selectivity mean approximates the normal distribution with mean 0.5 and standard deviation δ = 0.5/√50 = 0.0707. As shown in Figure 6, according to the normal distribution property, if the sampled selectivity is higher than 0.5 + 0.0707 × 2 = 0.64, the current operator selectivity is higher than 0.5 with 97.5% probability. In terms of hypothesis testing, the hypothesis that the selectivity value is less than or equal to 0.5 is called the null hypothesis, and the region where the selectivity value is higher than 0.64 is called the rejection region. This is because, if the sampled selectivity falls in that region, the null hypothesis is very unlikely and so it can be rejected. As the confidence value increases, the rejection region shrinks, and vice versa.

To be able to switch between transmitting unprocessed data streams and transmitting processed intermediate results, the receiving node (in our example, node B) has to be able to accept both types of data and use them to process the stream query together. This is implemented by dividing the data stream into segments by either the timestamps or the sequence numbers of the data tuples. Each segment is sampled separately, and the proper transmission strategy is determined accordingly. As shown in Figure 7, at the receiving node, unprocessed data stream segments are processed first, and then the results are assembled together with the received intermediate results. The data transmitted needs to be marked up properly so that it can be assembled correctly.

[Figure 7. Assembly Data Stream Segments (graphic not recovered): at node B, unprocessed stream segments from node A pass through Op0 and Op1, and the results are assembled with the intermediate results received from node A before the join Op2 with stream S2 and relation R.]

5. Performance Evaluations

We conducted a set of experiments to test the performance of the prediction-based algorithm in reducing the network workloads. In order to evaluate the performance of our approach, we developed a Java simulator based on hypothesis testing. The simulation settings are shown in Table 2. The overheads for sending and receiving a packet are set to 5 and 10 microseconds, respectively. According to the Linux kernel research report [7], in Linux kernel 2.6.9 it takes 6 microseconds to send a packet and 17 microseconds to receive a packet using TCP/IP. The overheads are reduced in the simulation settings to reflect technology advances. Each packet is set to contain 10 data tuples. Based on the experimental results shown in Table 1, the sampler operator overhead is set at 3 microseconds per tuple for the join operator. The data stream arrival rate is set at 1000 tuples per second, and each data stream sampling segment contains the tuples within one second. The significance level for the hypothesis test used in the algorithm is set at 15%. A significance level of 15% means that the algorithm chooses to transmit intermediate results instead of unprocessed data streams only when the sampled results indicate that the intermediate results are smaller than the input with 85% or higher confidence. To evaluate the effectiveness of the proposed algorithm, we also show the performance of the ideal algorithm. The ideal algorithm is marked as the Oracle algorithm, which always uses the best propagation strategy but incurs no overhead. In this section, the results shown in the graphs are based on at least 10 simulation runs, and the 95% confidence interval is within 5% of the value shown in the graph. The confidence intervals have been omitted in the figures to improve readability.

Table 2. Distributed DSMS Simulation Settings

    Parameter                   Value
    Packet Sending Overhead     5 microseconds
    Packet Receiving Overhead   10 microseconds
    Tuples per Packet           10
    Sampler Operator Overhead   3 microseconds per tuple
    Data Stream Rate            1000 tuples/sec
    Segment Size                1 second
    Significance Level          15%

[Figure 8 (graphic not recovered): network traffic ratio (prediction-based algorithm / unprocessed data stream replication) versus join selectivity, for sampling ratios of 1% and 5% and for the Oracle algorithm.]

The network traffic results are shown in Figure 8. The ratio shown in the figure denotes the ratio between the network traffic volume generated by the prediction-based algorithm and that generated by unprocessed data stream replication. We assume that the output tuples of the join operator are twice the size of the data stream input tuples. As a result, the network traffic volume can only be reduced when the join selectivity is less than 0.5. As shown in Figure 8, with sampling ratios as low as 1% and 5%, the prediction-based algorithm saves significantly when the join selectivity is less than 0.5. In fact, in both cases, the algorithm performs very close to the ideal case shown by the Oracle algorithm. As expected, the algorithm performs better in terms of network traffic volume when the sampling ratio is higher, at 5%. As the selectivity moves higher than 0.5, the algorithm generates slightly more network traffic. With the sampling ratio at 1%, the algorithm produces 5% more network traffic when the selectivity is 0.6. This is caused by occasional mispredictions resulting from the low sampling ratio. The problem disappears when the sampling ratio is set at 5%.

[Figure 9. CPU Overhead Reduction (graphic not recovered).]

The total CPU time used by the algorithm is shown in Figure 9. The total CPU time shown in the figure contains both the network packet sending/receiving overhead and the sampler operator execution overhead. The ratio shown is the CPU time used by the prediction-based algorithm to the CPU time used to transmit unprocessed data streams. As we can see in Figure 9, when the join operator selectivity is less than 0.5, the prediction-based algorithm saves a significant amount of CPU time. When the selectivity is higher than 0.5, the prediction-based algorithm costs more CPU time due to the sampling overhead and rare mispredictions. When sampling at 1%, the prediction-based algorithm works pretty well [...]

6. Related Work

[...] while still guaranteeing adequate answer precision. The D-CAPE paper [11] discusses the challenges for distributed data stream management systems and proposes dynamic adaptation techniques to alleviate uneven workloads in distributed environments. In this paper, we propose the idea of just-in-time sampling to estimate the output size of query operators and use the estimation results to control the intermediate query result propagation strategy. Compared to the algorithm for filter operators discussed in [15], our technique handles join operators, which may have a bigger output volume than their input volume.

Selectivity estimation has been one of the most studied problems in the database community, as query optimizers use the estimation results to determine the most efficient query plans. Sampling [9], histograms [16, 17], index trees [3, 6], and the discrete wavelet transform [18] are the most widely used selectivity estimation methods. Sampling has been used extensively in traditional database systems [9, 2, 4]. Sampling gives a more accurate estimation than the parametric and curve-fitting methods used in traditional DBMSs and provides a good estimation for a wide range of data types [2]. Furthermore, since no data structure is maintained in sampling-based approaches, as opposed to histogram-based approaches, we do not need to worry about the overhead of constantly updating and maintaining such a structure. This is a very important point in the context of data streams, as the input rate of the streams is constantly changing. Prediction-based QoS management [19] is the first work to consider a sampling-based approach to estimate data stream query workload and use those results to manage the query QoS. To the best of our knowledge, this is the first work which uses this approach for prediction-based QoS management in distributed environments.
7. Conclusions and Future Work

In this paper, we describe the prediction-based QoS management algorithm for distributed environments, where we apply dynamic sampling to determine the proper data stream propagation strategy. Our simulation results show that adjusting the data transmission strategy using prediction results significantly reduces the communication cost with a minimal amount of CPU overhead. There are a couple of ways to extend this work. First, more research is needed to provide solutions for the scenario where the selectivity of join operators is very small. It is a known problem that sampling yields a high relative error rate when dealing with query operators with small selectivities [4]. The high relative estimation errors cause the sampling algorithm to select less optimal propagation strategies, thus compromising the performance of the proposed algorithm. Another way to extend the current work is to combine the operator history with the estimation. The selectivity history of the query operators provides valuable clues about future operator selectivity. One idea is to utilize the volatility of operator selectivity, i.e., to use sampling only on those operators with volatile selectivities. In this way, the sampling overhead on the less volatile operators can be saved. Lastly, evaluating this approach on a prototype system would be very valuable for the performance evaluation of the algorithm. As the performance study in this paper is via simulations, it is desirable to develop a prototype for a distributed DSMS and carry out detailed experiments to study the overhead associated with the prediction-based algorithm proposed in this paper.

Acknowledgments

The work was supported in part by NSF IIS-0208758, CCR-0329609, and CNS-0614886.

References

[1] B. Babcock, M. Datar, and R. Motwani. Load shedding for aggregation queries over data streams. In Intl. Conference on Data Engineering (ICDE), 2004.

[2] D. Barbara, W. DuMouchel, C. Faloutsos, P. Haas, J. Hellerstein, Y. Ioannidis, H. Jagadish, T. Johnson, R. Ng, V. Poosala, K. Ross, and K. Sevcik. The New Jersey data reduction report. Bulletin of the Technical Committee on Data Engineering, 1997.

[3] R. Bayer and E. McCreight. Organization and maintenance of large ordered indices. Acta Informatica, 1972.

[4] S. Chaudhuri, G. Das, M. Datar, R. Motwani, and V. Narasayya. Overcoming limitations of sampling for aggregation queries. In ICDE, pages 534–542, 2001.

[5] M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, and S. Zdonik. Scalable distributed stream processing. In the First Biennial Conference on Innovative Database Systems (CIDR), 2003.

[6] D. Comer. The ubiquitous B-tree. Computing Surveys, 1979.

[7] J. Demter, C. Dickmann, H. Peters, N. Steinleitner, and X. Fu. Performance analysis of the TCP/IP stack of Linux kernel 2.6.9. Technical report, University of Goettingen, 2005.

[8] L. Golab and M. Ozsu. Issues in data stream management. SIGMOD Record, 32(2), 2003.

[9] P. Haas, J. Naughton, and A. Swami. On the relative cost of sampling for join selectivity estimation. In PODS '94: Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 14–24, New York, NY, USA, 1994. ACM Press.

[10] W. Hong and M. Stonebraker. Optimization of parallel query execution plans in XPRS. Distributed and Parallel Databases, 1993.

[11] B. Liu, Y. Zhu, M. Jbantova, B. Momberger, and E. Rundensteiner. A dynamically adaptive distributed system for processing complex continuous queries. In Very Large Data Bases (VLDB), 2005.

[12] Y. Liu and B. Plale. Query optimization for distributed data streams. In 15th International Conference on Software Engineering and Data Engineering (SEDE'06), 2006.

[13] M. Mehta. Design and implementation of an interface for the integration of DynaMIT with the traffic management center. Master's thesis, MIT, 2001.

[14] S. Milton and J. Arnold. Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences. McGraw-Hill, 4th edition, 2003.

[15] C. Olston, J. Jiang, and J. Widom. Adaptive filters for continuous queries over distributed data streams. In the ACM Intl. Conf. on Management of Data (SIGMOD), 2003.

[16] V. Poosala. Histogram-based estimation techniques in databases. PhD thesis, Univ. of Wisconsin-Madison, 1997.

[17] V. Poosala and Y. Ioannidis. Selectivity estimation without the attribute value independence assumption. In 23rd Int. Conf. on Very Large Databases, Aug. 1997.

[18] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1996.

[19] Y. Wei, V. Prasad, S. H. Son, and J. A. Stankovic. Prediction-based QoS management for real-time data streams. In Proc. 27th IEEE Real-Time Systems Symposium (RTSS 06), Dec. 2006.

[20] Y. Wei, S. H. Son, and J. A. Stankovic. RTSTREAM: Real-time query for data streams. In 9th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC), Apr. 2006.

[21] A. Wilschut and P. Apers. Dataflow query execution in a parallel main-memory environment. In PDIS, 1991.