
Optimizing Error-Bounded Lossy Compression for Scientific Data by Dynamic Spline Interpolation


Kai Zhao,∗ Sheng Di,† Maxim Dmitriev,‡ Thierry-Laurent D. Tonellot,‡ Zizhong Chen,∗ and Franck Cappello†§

∗ University of California, Riverside, CA, USA
† Argonne National Laboratory, Lemont, IL, USA
‡ EXPEC Advanced Research Center, Saudi Aramco, Saudi Arabia
§ University of Illinois at Urbana-Champaign, Champaign, IL, USA

[email protected], [email protected], [email protected],


[email protected], [email protected], [email protected]

Abstract—Today's scientific simulations are producing vast volumes of data that cannot be stored and transferred efficiently because of limited storage capacity, parallel I/O bandwidth, and network bandwidth. The situation is getting worse over time because of the ever-increasing gap between relatively slow data transfer speed and fast-growing computation power in modern supercomputers. Error-bounded lossy compression is becoming one of the most critical techniques for resolving the big scientific data issue, in that it can significantly reduce the scientific data volume while guaranteeing that the reconstructed data is valid for users because of its compression-error-bounding feature. In this paper, we present a novel error-bounded lossy compressor based on a state-of-the-art prediction-based compression framework. Our solution exhibits substantially better compression quality than do all the existing error-bounded lossy compressors, with comparable compression speed. Specifically, our contribution is threefold. (1) We provide an in-depth analysis of why the best existing prediction-based lossy compressor can only minimally improve the compression quality. (2) We propose a dynamic spline interpolation approach with a series of optimization strategies that can significantly improve the data prediction accuracy, substantially improving the compression quality in turn. (3) We perform a thorough evaluation using six real-world scientific simulation datasets across different science domains to evaluate our solution vs. all other related works. Experiments show that the compression ratio of our solution is higher than that of the second-best lossy compressor by 20%∼460% with the same error bound in most of the cases.

I. INTRODUCTION

With the ever-increasing execution scale of today's scientific simulations, vast scientific data are produced at every simulation run. Climate simulation [1], for example, can generate hundreds of terabytes of data in tens of seconds [2]. A cosmology simulation, such as the Hardware/Hybrid Accelerated Cosmology Code (HACC) [3], can produce dozens of petabytes of data when it performs an N-body simulation with up to several trillion particles. Such a vast amount of scientific data needs to be stored in a parallel file system for postanalysis, creating a huge challenge to the storage space. Many scientists also need to share the large amounts of data across different sites (i.e., endpoints) through a data-sharing web service (such as the Globus toolkit [4]) on the Internet. Thus, the ability to significantly compress extremely large scientific data with controlled data distortion is critical to today's science work.

To this end, error-bounded lossy compression techniques [5]–[8] have been developed for several years, and they have been widely recognized as an optimal solution to tackle the scientific big data problem. For example, many researchers [2], [9], [10] have verified that the data reconstructed by error-bounded lossy compressors is totally acceptable for users' postanalysis. Many success stories also show that error-bounded lossy compressors not only can significantly reduce the storage space but also may substantially improve the data-moving performance or reduce the memory footprint, with user-acceptable data distortions. For instance, Wu et al. [11] showed that a customized error-bounded lossy compressor can add 2∼16 qubits to general quantum-circuit simulations while maintaining 0.985 output fidelity on average. In particular, the memory requirement of simulating a 61-qubit quantum computing simulation (Grover's search algorithm) can be lowered from 32 exabytes to 768 terabytes on Argonne's Theta supercomputer using 4,096 nodes. Liang et al. [12] showed that an error-bounded lossy compressor can improve the overall I/O performance by 60X, with no degradation of visual quality in the reconstructed data. Kukreja et al. [13] showed that error-bounded lossy compression can reach high compression ratios without significantly impacting the convergence or final solution of a full waveform inversion solver.

The SZ compression library has been recognized by independent assessments [7], [10], [14] as the best-in-class error-bounded lossy compressor for scientific datasets, especially because it has gone through many careful optimizations. The first releases of SZ (SZ 0.1∼1.0) [6] proposed the prediction-based lossy compression framework combining a one-dimensional curve-fitting method and a customized Huffman encoding algorithm. SZ 1.4 [7] improved the compression quality significantly by extending the curve-fitting method to multidimensional prediction and adopting a linear-scale quantization. SZ 2.0 [8] further improved the prediction accuracy by adopting a linear regression method, especially for high-compression cases. The compression quality was also improved by integrating ZFP [5] as a candidate predictor [15] in the SZ framework and by adopting parameter autotuning and second-order prediction algorithms [16].
One important question is whether we still have room to further improve the error-bounded lossy compression quality significantly for scientific datasets. In fact, the critical challenge in the current design of SZ comes from its linear-regression prediction method, which has two significant drawbacks. On the one hand, it suffers from limited accuracy in predicting nonlinearly varying datasets. Many scientific simulations (such as seismic wave simulation [17] and quantum chemistry research [18]), however, may produce a vast amount of data with nonlinear features, such that SZ cannot work very effectively on them. On the other hand, the linear-regression-based prediction needs to store several coefficients (e.g., four coefficients per block for 3D compression) for each block of data, introducing significant overhead especially when a relatively high compression ratio is required by users.

In this paper, we propose a novel, efficient lossy compression method based on the state-of-the-art SZ compression model. Our contribution is threefold.
• We provide an in-depth analysis of the latest version of SZ and identify a significant drawback of its prediction method; the analysis also sheds light on the solution.
• We propose a novel, efficient approach based on multidimensional spline interpolation that can significantly improve the prediction accuracy, especially for datasets with nonlinear data variation features. We further propose dynamic optimization strategies to improve the overall compression quality while reducing the inevitable overhead as much as possible.
• We perform a comprehensive assessment of our solution versus five state-of-the-art error-controlled lossy compressors, using multiple real-world simulation datasets across different scientific domains. Experiments show that our solution improves the compression ratio by 20%∼460% over the second-best compressor with the same error bound and experiences no degradation in the postanalysis accuracy.

The rest of the paper is organized as follows. In Section II we discuss the related work. In Section III we formulate the research problem. In Section IV we offer an in-depth analysis of the pros and cons of the current SZ design, information that is fundamental to the subsequent optimization efforts. In Section V we describe our solution and detailed optimization strategies. In Section VI we present the evaluation results compared with five other state-of-the-art lossy compressors using real-world applications. In Section VII we conclude with a brief discussion of future work.

II. RELATED WORK

Data compression is becoming a critical technique for storing the vast volumes of data produced by high-performance computing scientific simulations. Lossless compressors such as Zlib [19] and Zstd [20] suffer from very limited compression ratios (generally ∼2 or even less) since lossless compression techniques rely on repeated byte-stream patterns, whereas scientific data is often composed of diverse floating-point numbers. Thus, lossy compression for scientific data has been studied for years.

Unlike traditional lossy compression techniques (such as JPEG2000 [21]) that were designed mainly for image data, error-bounded lossy compression not only can reach a fairly high compression ratio (several dozens, hundreds, or even higher) but also can guarantee that the reconstructed data is valid for scientific postanalysis in terms of the user-predefined compression error bound. Error-bounded lossy compressors can be categorized into higher-order singular value decomposition (HOSVD)-based models, transform-based models, and prediction-based models.
• The HOSVD-based model (such as TTHRESH [22] and TuckerMPI [23]) leverages Tucker decomposition [24] to reduce the dimensions in order to lower the data volume significantly. Such a method may get fairly high compression ratios, especially for high-dimensional datasets, but may suffer from fairly low compression speed [22], which is not practical for the applications considered in this paper.
• The transform-based model involves several steps: data preprocessing, lossless data transform, and lossy encoding. ZFP [5], for example, splits the data into multiple small blocks (4^k, where k is the number of dimensions of the dataset) and then performs in each block an exponent alignment, an orthogonal transform, and embedded encoding. Other transform-based compressors include VAPOR [25] and SSEM [9].
• Compared with transform-based compressors, the prediction-based model first predicts the value of each data point based on its local region and then quantizes the prediction errors to a set of integer numbers that will be further compressed by lossless compression techniques. SZ [8], for example, splits the whole dataset into many small equal-sized blocks (e.g., 6×6×6 for 3D datasets) and predicts the data points in each block by a hybrid method (either Lorenzo [26] or linear regression), followed by a customized Huffman encoding and other lossless compression techniques such as Zstd [20].

In our work, we choose the prediction-based model because SZ has been recognized as the leading compressor in the scientific data compression community. In fact, how to leverage SZ to improve compression quality has been studied for two years. Tao et al. [27] developed a strategy that can combine SZ and ZFP to optimize the compression ratios based on a more significant metric, peak signal-to-noise ratio (PSNR). Liang et al. [15] further analyzed the principles of SZ and ZFP and developed a method integrating ZFP into the SZ compression model, which can further improve the compression quality. Zhao et al. [16] proposed to adopt second-order Lorenzo+regression in the prediction method and developed an autotuning method to optimize the parameters of SZ. Liang et al. [28] accelerated the performance of MultiGrid Adaptive Reduction of Data (MGARD) [29] and used SZ to compress the nodal points generated by the MGARD framework [29], which can improve the compression ratios significantly.

All these existing SZ-related solutions have to rely on the linear regression prediction to a certain extent, a method whose compression quality is restricted significantly in quite a few cases. In this paper, we first provide an in-depth analysis of the cause of that reduced quality and then propose an efficient strategy to solve it. We perform a thorough evaluation to compare our solution with five state-of-the-art lossy compressors using real-world scientific applications. Experiments show that our solution is superior to all existing error-bounded compressors, with much higher compression ratios than that of the second-best method.
III. PROBLEM FORMULATION

In this section we describe the research problem formulation. Given a scientific dataset (denoted by D) composed of N floating-point values (either single precision or double precision) and a user-specified absolute error bound (e), the objective is to develop an error-bounded lossy compressor that can always meet the error-bounding constraint at each data point, with optimized compression quality and comparable performance (i.e., speed).

Rate distortion is arguably the metric most commonly used by the lossy compression community to assess compression quality. It can be converted to the commonly used statistical data distortion metric known as normalized root mean squared error, and it is a good indicator of visual quality. Rate distortion involves two critical metrics: peak signal-to-noise ratio (PSNR) and bit rate. PSNR can be written as follows:

PSNR = 20 log_{10}(vrange(D)) − 10 log_{10}(mse(D, D')),        (1)

where D' is the reconstructed dataset after decompression (i.e., the decompressed dataset) and vrange(D) represents the value range of the original dataset D (i.e., the difference between its highest value and lowest value). Obviously, the higher the PSNR value is, the smaller the mean squared error, which means higher precision of the decompressed data.

Bit rate is used to evaluate the compression ratio (the ratio of the original data size to the compressed size). Specifically, bit rate is defined as the average number of bits used per data point in the compressed data. For example, suppose a single-precision original dataset has 100 million data points; its original data size is 100,000,000×4 bytes (i.e., about 400 MB). If the compressed data size is 4,000,000 bytes (i.e., a compression ratio of 100:1), then the bit rate can be calculated as 32/100 = 0.32 (one single-precision number takes 32 bits). Obviously, a smaller bit rate means a higher compression ratio.

Two other important compression assessment metrics are compression speed (denoted by s_c) and decompression speed (denoted by s_d). They are defined as the amount of data processed per time unit (MB/s).

In our research, we focus on the optimization of compression quality (i.e., rate distortion) with high performance, which can be formulated as follows:

Optimize rate-distortion
subject to  |d_i − d'_i| ≤ e
            s_c(newsol.) ≈ s_c(sz)
            s_d(newsol.) ≈ s_d(sz),        (2)

where d_i and d'_i refer to the value of the ith data point in the original dataset D and in the decompressed dataset D' produced by the new compression solution, respectively. The notations s_c(newsol.) and s_d(newsol.) represent the compression speed and decompression speed of the new solution, respectively, and s_c(sz) and s_d(sz) represent the compression speed and decompression speed of the original SZ compressor, respectively. That is, we are trying to increase the compression ratio with the same level of data distortion and comparable compression/decompression performance, compared with SZ as a baseline (because SZ has been confirmed as a fairly fast lossy compressor in many existing studies [12], [28], [30]). In our evaluation in Section VI, not only do we present the rate distortion results for many different datasets at different bit-rate ranges, but we also assess the impact of our lossy compressor on the results of decompressed-data-based postanalysis for one production-level seismic simulation study.
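The two rate-distortion metrics above are straightforward to compute. The following minimal sketch (not taken from the paper's code base; the array and variable names are illustrative) evaluates Formula (1) and the bit rate with NumPy:

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """PSNR = 20*log10(vrange(D)) - 10*log10(mse(D, D')), per Formula (1)."""
    vrange = float(original.max() - original.min())
    mse = float(np.mean((original - reconstructed) ** 2))
    return 20 * np.log10(vrange) - 10 * np.log10(mse)

def bit_rate(num_points: int, compressed_bytes: int) -> float:
    """Average number of bits used per data point in the compressed stream."""
    return compressed_bytes * 8 / num_points

# The example from the text: 100 million single-precision values (~400 MB)
# compressed to 4,000,000 bytes gives CR = 100 and a bit rate of 0.32 bits/value.
print(bit_rate(100_000_000, 4_000_000))   # 0.32
```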
IV. DEEPLY UNDERSTANDING THE PROS AND CONS OF SZ

In this section, we first give a review of the current SZ design and then provide an in-depth analysis of a serious problem in the latest version of the SZ compressor (SZ 2.1) [8]. Understanding this problem is fundamental to understanding why our new solution can significantly improve the compression ratio.

A. Review of SZ Lossy Compression Framework

SZ 2.1 [8], the latest version of SZ, has been recognized as an excellent error-bounded lossy compressor based on numerous experiments with different scientific applications by different researchers [15], [16], [27].

SZ 2.1 involves four stages during the compression: (1) data prediction, (2) linear-scale quantization, (3) variable-length encoding, and (4) lossless compression such as Zstd [20]. We briefly describe the four steps below and refer readers to our prior papers [7], [8] for technical details.
• Step 1: data prediction. In this step, SZ predicts each data point by its nearby data values. The prediction methods differ across versions (from 0.1 through 2.1). For example, SZ 0.1∼1.0 [6] adopted a simple one-dimensional adaptive curve-fitting method, which selects the best predictor for each data point from among three candidates: previous-value fitting, linear-curve fitting, and quadratic-curve fitting. SZ 1.4 [7] completely replaced the curve-fitting method by a multidimensional first-order Lorenzo predictor, significantly improving compression ratios by over 200% over SZ 1.0. SZ 2.0∼SZ 2.1 further improved the prediction method by proposing a blockwise linear regression predictor that can significantly enhance compression ratios by 150%∼800% over SZ 1.4, especially for high-compression-ratio cases (i.e., when the error bound is relatively large).
• Step 2: linear-scale quantization. In this step, SZ computes the difference (denoted ∆) between the predicted value (calculated in Step 1) and the original data value for each data point and then quantizes the difference ∆ based on the user-predefined error bound (e). The quantization bins are equal-sized and are twice as large as the error bound, such that the maximum compression errors must be controlled within the specified error bound (see the sketch after this list). After this step, all floating-point values are converted to integer numbers (i.e., quantization numbers), most of which are expected to be close to zero, especially when the data are fairly smooth locally or the predefined error bound is relatively large.
• Step 3: customized Huffman encoding. SZ uses a customized integer-based Huffman encoding algorithm to encode the quantization numbers generated by Step 2.
• Step 4: lossless compression. The last step in SZ is adopting a lossless compressor with a pattern-recognition algorithm such as LZ77 [31] to further improve the compression ratios significantly. SZ initially chose Zlib [19] but switched to Zstd [20] thereafter because Zstd is much faster than Zlib.
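The following is a simplified sketch of the linear-scale quantization in Step 2, written from the description above rather than from the SZ source code; it omits the handling of "unpredictable" points whose quantization codes fall outside the code range.

```python
import numpy as np

def quantize(original: float, predicted: float, e: float):
    """Map the prediction difference to an integer code using bins of width 2e.

    Returns (code, reconstructed). By construction |original - reconstructed| <= e,
    since rounding to the nearest multiple of 2e loses at most e.
    """
    diff = original - predicted
    code = int(np.round(diff / (2 * e)))        # quantization number (near zero if prediction is good)
    reconstructed = predicted + code * 2 * e    # the value the decompressor will reproduce
    return code, reconstructed

code, rec = quantize(original=1.37, predicted=1.21, e=0.05)
print(code, rec, abs(1.37 - rec) <= 0.05)       # -> 2, ~1.41, True
```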

B. Critical Features of SZ Compression Framework

First, SZ is a very flexible compression framework, in which the data prediction is the most critical step. More accurate data prediction will result in a more uneven distribution of quantization numbers, with a large majority being close to zero. Thus we have explored other, more efficient predictors in the past two years (from version 0.1 through the latest released version 2.1, as well as a few recent prototypes [15], [16]). Accordingly, we still focus only on the data prediction stage in this paper.

Second, SZ has to follow a necessary condition in order to guarantee that the compression errors are always within the user-predefined error bound: for the same data point, the value predicted during the compression stage has to be exactly the same as the one predicted in the decompression stage. Otherwise, the compression errors could accumulate easily during the decompression, causing totally uncontrolled compression errors. Thus, in the compression stage, SZ has to predict each data point by its nearby lossy decompressed values instead of the original values, which will in turn degrade the prediction accuracy (as exemplified in our prior work [7]). This problem is fairly serious, especially when the error bound is relatively high. We proposed the linear-regression predictor in SZ 2.1 [8], which can mitigate this issue to a certain extent. Such a predictor, however, has a significant drawback and may substantially inhibit the compressor from obtaining a high compression ratio in many cases. We analyze this drawback in detail in the following text.

C. Review of Linear Regression Predictor in SZ 2.1

In what follows, we first describe the linear regression predictor used in SZ 2.1 and then elaborate on its serious drawback.

In SZ 2.1, the whole dataset is split into equal-sized blocks (e.g., 6×6×6 for a 3D dataset), and a linear-regression-based predictor is used when the data inside a block is relatively smooth or the error bound is relatively high. The basic idea is to use linear regression to construct a hyperplane in each block, such that the data inside the block can be approximated by the hyperplane with minimized mean squared error (MSE), as illustrated in Fig. 1. In a 3D space, for example, any hyperplane function can be represented as f(x, y, z) = β_0 + β_1 x + β_2 y + β_3 z. Applying the hyperplane function to each data point inside the block produces a system of equations. Letting the partial derivatives of these equations be 0, we get Formula (3), where n_i refers to the block size along dimension i and f_{ijk} refers to the data value at the relative position (i, j, k) in the block. Based on this, we can construct the hyperplane with minimized MSE. The details can be found in our prior work [8].

β_1 = 6/(n_1 n_2 n_3 (n_1+1)) · (2V_x/(n_1−1) − V_0)
β_2 = 6/(n_1 n_2 n_3 (n_2+1)) · (2V_y/(n_2−1) − V_0)
β_3 = 6/(n_1 n_2 n_3 (n_3+1)) · (2V_z/(n_3−1) − V_0)
β_0 = V_0/(n_1 n_2 n_3) − ((n_1−1)/2 · β_1 + (n_2−1)/2 · β_2 + (n_3−1)/2 · β_3)        (3)

where V_0 = Σ_{i=0}^{n_1−1} Σ_{j=0}^{n_2−1} Σ_{k=0}^{n_3−1} f_{ijk}, V_x = Σ Σ Σ i·f_{ijk}, V_y = Σ Σ Σ j·f_{ijk}, and V_z = Σ Σ Σ k·f_{ijk}.

[Figure: a 2D block of data points approximated by the fitted plane f(x, y) = β_0 + β_1 x + β_2 y.]
Fig. 1. Illustration of Linear-Regression-Based Prediction (2D dataset)
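As a concreteness check, the closed-form coefficients of Formula (3) can be compared against an ordinary least-squares fit over one block. The sketch below is illustrative only (it is not the SZ implementation, and the block contents are synthetic):

```python
import numpy as np

def block_regression_coeffs(block: np.ndarray) -> np.ndarray:
    """Fit f(i,j,k) = b0 + b1*i + b2*j + b3*k over one block using Formula (3)."""
    n1, n2, n3 = block.shape
    i, j, k = np.meshgrid(np.arange(n1), np.arange(n2), np.arange(n3), indexing="ij")
    N = n1 * n2 * n3
    V0 = block.sum()
    Vx, Vy, Vz = (i * block).sum(), (j * block).sum(), (k * block).sum()
    b1 = 6.0 / (N * (n1 + 1)) * (2 * Vx / (n1 - 1) - V0)
    b2 = 6.0 / (N * (n2 + 1)) * (2 * Vy / (n2 - 1) - V0)
    b3 = 6.0 / (N * (n3 + 1)) * (2 * Vz / (n3 - 1) - V0)
    b0 = V0 / N - ((n1 - 1) / 2 * b1 + (n2 - 1) / 2 * b2 + (n3 - 1) / 2 * b3)
    return np.array([b0, b1, b2, b3])

rng = np.random.default_rng(0)
blk = rng.standard_normal((6, 6, 6))                 # one 6x6x6 block of synthetic data
i, j, k = np.meshgrid(np.arange(6), np.arange(6), np.arange(6), indexing="ij")
A = np.stack([np.ones(blk.size), i.ravel(), j.ravel(), k.ravel()], axis=1)
ls_coeffs, *_ = np.linalg.lstsq(A, blk.ravel(), rcond=None)
print(np.allclose(block_regression_coeffs(blk), ls_coeffs))   # True
```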
D. Serious Dilemma of the Linear-Regression Predictor in SZ 2.1

In order to get a high compression quality (i.e., a very good rate-distortion result), the four coefficients need to be compressed based on a certain error bound, which may introduce a serious dilemma: a higher error bound used for coefficient compression will decrease the overhead of storing the coefficients (to be demonstrated in Fig. 2) but will also decrease the regression accuracy of the constructed hyperplane (to be demonstrated in Fig. 3). We confirm this issue with four real-world scientific simulations (QMCPack [32], RTM [17], Hurricane [33], and NYX [34]), which are commonly used by scientists in quantum structure research, seismic imaging for oil and gas exploration, climate research, and cosmology research, respectively. More details about these applications are given in Section VI. We exemplify the results using specific fields (e.g., time step 1500 of the RTM data, the W field of Hurricane, and velocity z in NYX) because of space limits and similar results in other fields.

Figure 2 shows that the overhead always increases with decreasing error bounds used for the compression of coefficients. Specifically, we observe that when the error bound decreases from 0.1 to 0.01, the coefficient overhead in the compressed data increases from 55% to 68%, from 25% to 37%, from 40% to 53%, and from 60% to 70% for the four test cases, respectively. The compression ratios (the red curve) thus degrade from 179 to 128, from 102 to 86, from 114 to 90, and from 152 to 118, respectively.

[Figure: bars show the regression coefficients' share of the compressed data and the red curve shows the compression ratio, versus the coefficient error bound, for (a) QMCPack, (b) RTM (time step 1500), (c) Hurricane (W), and (d) NYX (velocityz).]
Fig. 2. Overhead of Linear Regression Coefficients

We present a slice segment of the four application datasets in Fig. 3 to illustrate that the error bounds of the coefficient compression significantly affect the prediction accuracy of the constructed linear-regression hyperplane. For instance, when the coefficients' compression error bound is set to 0.001 for QMCPack and 0.01 for RTM (time step 1500), the constructed hyperplane (the yellow curve) can fit the real data (the red curve) well, but the fitting becomes much worse with increasing error bounds. In the case with a relatively large error bound (e.g., 0.1 in QMCPack), the hyperplane degrades to a simple horizontal line (see the blue lines in the figures), because simply using the neighbor data value is "accurate" enough for the large error-bounded compression of the coefficients. This will definitely result in large prediction errors,¹ significantly degrading the final compression ratios.

[Figure: original data versus the regression hyperplanes constructed under different coefficient error bounds, for (a) QMCPack, (b) RTM (time step 1500), (c) Hurricane (W), and (d) NYX (velocityz).]
Fig. 3. Linear Regression Prediction Hyperplane with Different Error Bound Settings of Coefficients

¹ Prediction error is the difference between the predicted value and the raw value.
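To see why the coefficient overhead grows so quickly at high compression ratios, consider the following back-of-the-envelope calculation. It is an illustration only: the 6×6×6 block size matches the example block size above, while the ~12 bits per stored coefficient is an assumed average rather than a measured value.

```python
block_values = 6 * 6 * 6                 # data points per 3D block (example block size)
input_bits = block_values * 32           # single-precision input
coeff_bits = 4 * 12                      # 4 coefficients x ~12 bits each (assumed average)
for target_cr in (20, 50, 100):
    budget = input_bits / target_cr      # compressed bits available per block at this ratio
    print(target_cr, f"{coeff_bits / budget:.0%} of the per-block budget")
# -> 20: 14%, 50: 35%, 100: 69% -- the higher the target ratio, the larger the share
#    of the output taken by coefficients rather than by the quantized data itself.
```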
In Fig. 4, we demonstrate that the latest version of SZ (v2.1) may cause significant loss in the data visualization, especially when the compression ratio (CR) is relatively high (e.g., 196 and 568 for the two test cases). We observe that SZ 2.1 suffers from a significant, undesired block texture artifact, resulting from its blockwise linear-regression design.

[Figure: (a) original data and (b) SZ-decompressed data for QMCPack; (c) original data and (d) SZ-decompressed data for RTM.]
Fig. 4. Visualization of SZ Decompressed Data Based on Two Applications: (1) QMCPack – PSNR=56.2, CR=196, and (2) RTM – PSNR=50.7, CR=316

To address the serious issue of the linear regression predictor, we developed a novel, efficient predictor based on dynamic spline interpolation, such that the compression quality (rate distortion) can be significantly improved for almost all application datasets, with little performance overhead.

V. ERROR-BOUNDED LOSSY COMPRESSION WITH A DYNAMIC MULTIDIMENSIONAL SPLINE INTERPOLATION

We present the design overview in Fig. 5, with yellow rectangles indicating the differences between our design and the classic SZ compressor and with highlighted rectangles
indicating the critical steps. The fundamental idea is to develop a dynamic multidimensional spline-interpolation-based predictor (i.e., solution B shown in Fig. 5) to replace the linear-regression-based predictor, such that the coefficient overhead can be completely eliminated while still keeping a fairly high prediction accuracy. Our newly designed interpolation-based predictor starts with one data point and performs interpolation and linear-scale quantization alternately along each dimension, recursively, until all data points are processed. Two alternative approaches can be used to perform the interpolation in the multidimensional space: we can build a multidimensional curve to fit all the already-processed data points, or we can build multiple 1D curves to do the interpolation. We choose the latter because the former is much more expensive.

In what follows, we introduce the background of spline interpolation (Section V-A), followed by our design of the dynamic multidimensional spline-interpolation-based predictor (Sections V-B, V-C, and V-D).

[Figure: flowchart of the design. Input raw data (e.g., 1.21, 1.23, 2.1, 2.3, 2.2, …) → light-weight data block sampling (sampling rate = 3%) → Trial run A (adaptive Lorenzo-based compression with optimized setting) and Trial run B (adaptive multidimensional spline interpolation with optimized setting) → select the better predictor based on compression ratio (CR): if CR(A) > CR(B), run the optimized adaptive Lorenzo predictor; otherwise, run the optimized multidimensional spline interpolation → linear-scale quantization & entropy encoding → output compressed data stream.]
Fig. 5. Design Overview

A. Introduction to Spline Interpolation

Interpolation is widely used in the fields of engineering and science to construct new data points from a set of known data points. Interpolation techniques attempt to build a curve that goes through all the known data points. This differs from regression analysis, which usually seeks a curve that most closely fits the known data points according to a specific mathematical criterion such as mean squared error; the curve generated by regression may not go through all known points. The most popular interpolation methods can be categorized into three types: piecewise constant interpolation, polynomial interpolation, and spline interpolation. Piecewise constant interpolation always uses the nearest known data point to estimate the new data point, so it has a simple implementation and fast speed. However, its ability to estimate complex curves is limited because it does not consider the surrounding data points. Polynomial interpolation is designed to find a polynomial with the lowest possible degree that passes through all the known data points. If the number of known data points is large, the polynomial may suffer from highly inaccurate oscillation between the data points. This issue is well known as Runge's phenomenon, and it can be mitigated by spline interpolation. Spline interpolation uses piecewise polynomials to define the estimation curve. If the degree of the polynomials is 1, the spline interpolation turns into linear interpolation. If the degree of the polynomials is 3, it is known as cubic spline interpolation. Cubic spline polynomials can be constructed under different end restrictions; in this paper, we use the not-a-knot restriction for cubic spline interpolation.

B. Spline Interpolation Designed for Scientific Data

In this part, we describe our cubic spline interpolation for scientific data prediction. Scientific data usually has locality features in small regions. Piecewise constant interpolation cannot utilize such features for estimation. Polynomial interpolation will be affected significantly by Runge's phenomenon when interpolating across multiple regions with different locality features. Cubic spline interpolation can prevent large oscillation, but its accuracy may still be affected by the diverse regions if we apply it to the whole dataset, which would also be very expensive because it needs to solve a huge linear system with all the known data points. To overcome those drawbacks, we propose an efficient cubic spline method that uses only four surrounding data values to predict any unknown data point. To avoid high computation cost, we use fixed indexes for both known and unknown data points and precompute a closed-form interpolation formula to predict the unknown data points. In what follows, we mainly use a 1D example to illustrate how we derive the interpolation formula, but the formula can be extended to multidimensional cases easily.

Lemma 1: Denote the dataset as d = (d_1, d_2, ..., d_n), with n as the total number of elements. The prediction values are denoted as p = (p_1, p_2, ..., p_n). We consider all elements in odd-index positions as preknown and use them to predict the elements in even-index positions. The prediction formulas of linear and cubic spline interpolation are shown in Table I.

TABLE I
SPLINE ESTIMATIONS
Spline method | Prediction value p_i
Linear spline | p_i = (1/2) d_{i−1} + (1/2) d_{i+1}
Cubic spline  | p_i = −(1/16) d_{i−3} + (9/16) d_{i−1} + (9/16) d_{i+1} − (1/16) d_{i+3}

Proof: The linear formula is easy to derive, so we prove only the cubic spline formula, as follows. In our designed cubic spline interpolation, the known data points d_{i−3}, d_{i−1}, d_{i+1}, and d_{i+3} are used to predict the data point p_i. Three spline curves correspond to the known data points:
f_1(x) = a_1 (x−(i−3))^3 + b_1 (x−(i−3))^2 + c_1 (x−(i−3)) + δ_1
f_2(x) = a_2 (x−(i−1))^3 + b_2 (x−(i−1))^2 + c_2 (x−(i−1)) + δ_2        (4)
f_3(x) = a_3 (x−(i+1))^3 + b_3 (x−(i+1))^2 + c_3 (x−(i+1)) + δ_3

The scopes of f_1, f_2, and f_3 are [i−3, i−1], [i−1, i+1], and [i+1, i+3], respectively (as shown in Fig. 6). The spline curves should pass through the known data points, so we have

f_1(i−3) = d_{i−3};  f_1(i−1) = d_{i−1}
f_2(i−1) = d_{i−1};  f_2(i+1) = d_{i+1}        (5)
f_3(i+1) = d_{i+1};  f_3(i+3) = d_{i+3}

[Figure: the three cubic pieces f_1(x), f_2(x), f_3(x) over the known points d_{i−3}, d_{i−1}, d_{i+1}, d_{i+3} and the unknown point p_i at index i.]
Fig. 6. Illustration of Cubic Spline Interpolation

The first derivative of f_1(x) is f_1'(x) = 3a_1 (x−(i−3))^2 + 2b_1 (x−(i−3)) + c_1. The second derivative is f_1''(x) = 6a_1 (x−(i−3)) + 2b_1. The third derivative is f_1'''(x) = 6a_1. The derivatives of f_2 and f_3 are similar to those of f_1.

To have a smooth curve, we let the adjacent spline functions have the same first derivatives and the same second derivatives at the joint data points:

f_1'(i−1) = f_2'(i−1);  f_2'(i+1) = f_3'(i+1)
f_1''(i−1) = f_2''(i−1);  f_2''(i+1) = f_3''(i+1)        (6)

The not-a-knot restriction requires the third derivative of f to be equal at locations i−1 and i+1:

f_1'''(i−1) = f_2'''(i−1);  f_2'''(i+1) = f_3'''(i+1)        (7)

Using the system of Equations (5), (6), and (7), we can derive

a_2 = −(1/48) d_{i−3} + (1/16) d_{i−1} − (1/16) d_{i+1} + (1/48) d_{i+3}
b_2 = (1/8) d_{i−3} − (1/4) d_{i−1} + (1/8) d_{i+1}
c_2 = −(1/6) d_{i−3} − (1/4) d_{i−1} + (1/2) d_{i+1} − (1/12) d_{i+3}        (8)
δ_2 = d_{i−1}.

Thus the prediction value p_i will be

p_i = f_2(i) = −(1/16) d_{i−3} + (9/16) d_{i−1} + (9/16) d_{i+1} − (1/16) d_{i+3}.        (9)

Equation (9) is the cubic formula in Table I.

We next discuss why we adopt only four known data points in our interpolation instead of six or more. If we use six data points—d_{i−5}, d_{i−3}, d_{i−1}, d_{i+1}, d_{i+3}, and d_{i+5}—to predict p_i, the formula of p_i by the not-a-knot spline turns out to be

p_i = (1/80) d_{i−5} − (1/10) d_{i−3} + (47/80) d_{i−1} + (47/80) d_{i+1} − (1/10) d_{i+3} + (1/80) d_{i+5}.        (10)

Compared with Equation (9), Equation (10) involves two additional data points, d_{i−5} and d_{i+5}, but the weight of these two data points is only 1/80, which means a very limited effect on the prediction. Moreover, it has 50% higher computation cost than Equation (9). Hence, we choose to use four data points for prediction, as shown in Table I.
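Because the three pieces with not-a-knot conditions at both interior knots collapse to a single interpolating cubic, Equation (9) can be verified numerically with a direct polynomial fit. The short sketch below is illustrative only (the data values are synthetic) and checks the closed-form weights, then compares cubic versus linear prediction on one point:

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(1)
d = rng.standard_normal(9)                  # d[0..8]; even positions are "unknown"
i = 4                                       # predict d[4] from d[1], d[3], d[5], d[7]
xs = np.array([i - 3, i - 1, i + 1, i + 3])
ys = d[xs]

coeffs = P.polyfit(xs, ys, 3)               # the unique cubic through the four known points
direct = P.polyval(i, coeffs)
cubic_pred = -ys[0] / 16 + 9 * ys[1] / 16 + 9 * ys[2] / 16 - ys[3] / 16   # Eq. (9)
linear_pred = 0.5 * d[i - 1] + 0.5 * d[i + 1]                             # Table I, linear

print(np.isclose(direct, cubic_pred))                     # True: closed form matches the fit
print(abs(d[i] - cubic_pred), abs(d[i] - linear_pred))    # prediction errors on this sample
```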
In addition, we note that linear spline interpolation may exhibit better prediction accuracy than the cubic spline does when a relatively large error bound is set (as shown in Table II). The reason is that our interpolation method relies on the reconstructed data values generated after the linear-scale quantization step, so the reconstructed data is lossy to a certain extent. When the error bound is relatively large, the loss in these reconstructed data degrades the prediction accuracy, and the more data points used in the interpolation, the higher the impact on the accuracy. Since the linear spline adopts fewer data points, it can be superior to the cubic spline, especially when the error bound is relatively large. This possibility motivated us to design a dynamic method that selects the better interpolation type (linear or cubic) in practice (to be detailed later).

TABLE II
COMPARISON OF SPLINE METHODS' PREDICTION ERROR
Dataset | ε=1E-2 Linear Spline | ε=1E-2 Cubic Spline | ε=1E-4 Linear Spline | ε=1E-4 Cubic Spline
RTM (time step 1500) | 1.20E-4 | 1.27E-4 | 2.0E-5 | 8.3E-6
Miranda (velocityz) | 0.0026 | 0.0025 | 0.0061 | 0.0020
QMCPACK | 0.05 | 0.06 | 0.008 | 0.004
SCALE (QS) | 0.076 | 0.078 | 0.040 | 0.041
NYX (velocityz) | 123486 | 134820 | 22453 | 19978
Hurricane (W) | 0.04 | 0.05 | 0.023 | 0.022

C. Multilevel Multidimensional Spline Interpolation

The previous derivation works in the 1D case with 50% of the data points preknown, based on which we predict the other 50%. In this section, we extend this interpolation method to support data prediction on the entire multidimensional dataset.

[Figure: multilevel scheme for a 1D array d_1, ..., d_9, where d_i denotes original raw data and d'_i the reconstructed data (known points predict unknown points). Level 0: use 0 to predict d_1. Level 1: use d'_1 to predict d_9. Level 2: use d'_1 and d'_9 to predict d_5. Level 3: use d'_1, d'_5, and d'_9 to predict d_3 and d_7. Level 4: use d'_1, d'_3, d'_5, d'_7, and d'_9 to predict d_2, d_4, d_6, d_8. The number of levels is ⌈log_2(n)⌉ + 1.]
Fig. 7. Illustration of Multilevel Linear Spline Interpolation

We use Fig. 7 to demonstrate the multilevel solution with linear interpolation; cubic interpolation has the same multilevel design. Suppose the dataset has n elements in one dimension. The number of levels required to cover all elements in this dimension is l = 1 + ⌈log_2 n⌉. At level 0, we use 0 to predict d_1, followed by the error-bounded linear-scale quantization. We then perform a series of interpolations from level 1 to level l−1 recursively, as shown in Fig. 7. At each level, we use error-bounded linear-scale quantization to process the predicted values such that the corresponding reconstructed data must be within the error bound. We denote the reconstructed data as d'_1, d'_2, ..., d'_n, as shown in the figure.
[Figure: interpolation order on a 2D dataset across levels; at each level, interpolation is performed along dim0 and then along dim1 (or in the reverse order), with known points predicting unknown points.]
Fig. 8. Illustration of Multidimensional Linear Spline Interpolation

Such a multilevel interpolation is applied to a multidimensional dataset, as illustrated in Fig. 8 with a 2D dataset as an example. We perform interpolation separately along all dimensions at each level, with a fixed sequence of dimensions. A 2D dataset, for example, has two possible sequences: dim0→dim1 and dim1→dim0. A 3D dataset has 6 possible sequences. In our solution, we propose to check only two sequences, dim0→dim1→dim2 (sequence 1) and dim2→dim1→dim0 (sequence 2), instead of all 6 possible combinations. On the one hand, the last interpolation dimension involves about 50% of the data points (much more than the other dimensions), so the dimension placed at the end of the sequence determines the overall prediction accuracy. On the other hand, we note that either the highest or the lowest dimension in scientific datasets tends to be smoother than the other dimensions, without loss of generality, as confirmed by the first three columns of Table III (with all 6 applications), which presents the autocorrelation (AC) of each dimension (higher AC means smoother data; a sketch of one way to compute it follows the table). Accordingly, putting either dim0 or dim2 at the end of the sequence at each level will yield lower overall prediction errors, as validated in Table III. Hence, we also develop a dynamic strategy to select the best-fit sequence of dimensions from among the two candidates, as detailed in the next subsection.

TABLE III
AUTOCORRELATION AND PREDICTION ERROR OF CUBIC SPLINE INTERPOLATION WITH DIFFERENT SEQUENCES OF DIMENSION SETTINGS, ε=1E-3
Dataset | AC (Lag=4) dim2 | AC (Lag=4) dim1 | AC (Lag=4) dim0 | Pred. Error 0→1→2 | Pred. Error 0→2→1 | Pred. Error 2→1→0
RTM (time step 1500) | 0.88 | 0.58 | 0.45 | 2.17E-5 | 2.32E-5 | 2.51E-5
Miranda (velocityz) | 0.84 | 0.82 | 0.96 | 0.004 | 0.004 | 0.003
QMCPACK | 0.83 | 0.83 | 0.75 | 0.010 | 0.010 | 0.013
SCALE (QS) | 0.987 | 0.986 | 0.872 | 0.0447 | 0.0448 | 0.10
NYX (velocityz) | 0.9818 | 0.99 | 0.99 | 31668 | 29903 | 28975
Hurricane (W) | 0.19 | 0.027 | 0.86 | 0.024 | 0.025 | 0.016
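The paper reports the lag-4 autocorrelation per dimension but does not spell out the exact computation; the following sketch is our assumption of a reasonable implementation for measuring how smooth a 3D field is along each axis:

```python
import numpy as np

def lag_autocorrelation(volume: np.ndarray, axis: int, lag: int = 4) -> float:
    """Correlation between the field and a copy of itself shifted by `lag` along `axis`."""
    n = volume.shape[axis]
    a = np.take(volume, np.arange(0, n - lag), axis=axis).astype(np.float64)
    b = np.take(volume, np.arange(lag, n), axis=axis).astype(np.float64)
    a -= a.mean()
    b -= b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

# Higher autocorrelation means a smoother dimension, which is the one we would
# prefer to place last in the sequence (it carries ~50% of the interpolated points).
rng = np.random.default_rng(4)
field = np.cumsum(rng.standard_normal((64, 64, 64)), axis=2)   # smooth along axis 2
print([round(lag_autocorrelation(field, ax), 3) for ax in range(3)])
```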

D. Dynamic Optimization Strategies

In this section we propose a dynamic design with two adaptive strategies: (1) automatically optimizing the spline interpolation predictor (Trial run B in Fig. 5) by selecting the best-fit interpolation type (either linear or cubic) and optimizing the sequence of interpolation dimensions, and (2) automatically selecting the better predictor between the Lorenzo-based predictor (Trial run A in Fig. 5) and the interpolation predictor.

We use a uniform sampling method to determine the best interpolation settings for the input dataset. There are two settings to optimize for the multidimensional interpolation predictor: the interpolation type and the dimension sequence. We adopt a uniform sampling method with only 3% of the total data points to select the interpolation setting that yields the higher compression ratio.

We also note that the spline interpolation predictor does not work as effectively as the multilayer Lorenzo predictor [7], [16] on relatively nonsmooth datasets, especially when the user's error bound is relatively small (as shown in Table IV). As a result, our final solution selects the better predictor between our spline interpolation method and the Lorenzo method (a sketch of this sampling-based selection is given after Table IV).

TABLE IV
PREDICTION ERROR OF MULTIDIMENSIONAL SPLINE INTERPOLATION PREDICTOR (S), REGRESSION PREDICTOR (R), AND LORENZO PREDICTOR (L)
Dataset | ε=1E-2: S | R | L | ε=1E-7: S | R | L
RTM (time step 1500) | 1.2E-4 | 1.3E-4 | 2.0E-4 | 6.9E-6 | 1.0E-4 | 1.8E-7
Miranda (velocityz) | 0.02 | 0.03 | 0.05 | 0.001 | 0.02 | 6E-5
QMCPACK | 0.05 | 0.06 | 0.13 | 0.004 | 0.03 | 6E-4
SCALE (QS) | 0.07 | 0.16 | 0.11 | 0.04 | 0.15 | 0.01
NYX (velocityz) | 121436 | 132071 | 410083 | 15237 | 51963 | 16965
Hurricane (W) | 0.04 | 0.05 | 0.06 | 0.01 | 0.04 | 0.004
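The selection logic itself can be sketched as follows. This is an illustrative stand-in rather than the paper's implementation: the toy trial_compress below estimates the entropy-coded size of the quantization codes produced by a 1D Lorenzo-like predictor versus a 1D neighbor-interpolation predictor on a uniform ~3% sample, and the configuration with the smaller estimate (i.e., the higher expected compression ratio) is kept.

```python
import numpy as np

def uniform_sample(data: np.ndarray, fraction: float = 0.03) -> np.ndarray:
    """Take a uniform sample (about `fraction` of the points) from the flattened data."""
    stride = max(1, int(round(1.0 / fraction)))
    return data.ravel()[::stride]

def trial_compress(sample: np.ndarray, label: str, e: float) -> float:
    """Toy trial run: estimated entropy-coded size (bits) of one candidate's quantization codes."""
    if label == "lorenzo":                       # 1D Lorenzo-like: previous value
        pred = np.concatenate(([0.0], sample[:-1]))
    else:                                        # "interp-linear": mean of the two neighbors
        pred = np.concatenate(([0.0], 0.5 * (sample[:-2] + sample[2:]), [sample[-2]]))
    codes = np.round((sample - pred) / (2 * e)).astype(np.int64)
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum()) * sample.size

def select_configuration(data: np.ndarray, candidates, e: float) -> str:
    """Keep the candidate whose trial run yields the smallest estimated output size."""
    sample = uniform_sample(data)
    sizes = {label: trial_compress(sample, label, e) for label in candidates}
    return min(sizes, key=sizes.get)

rng = np.random.default_rng(3)
field = np.cumsum(rng.standard_normal(100_000)) * 0.01      # smooth synthetic 1D field
print(select_configuration(field, ["lorenzo", "interp-linear"], e=1e-3))
```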
VI. EXPERIMENTAL EVALUATION

In this section we present the experimental setup and then discuss the evaluation results and our analysis.

A. Experimental Setup

1) Execution Environment: We perform the experiments on one execution node of the Argonne Bebop supercomputer. Each node in Bebop is driven by two Intel Xeon E5-2695 v4 processors with 128 GB of DRAM.

2) Applications: We perform the evaluation using six real-world scientific applications from different domains:
• QMCPack: An open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules, and solids [32].
• RTM: Reverse time migration code for seismic imaging in areas with complex geological structures [17].
• NYX: An adaptive mesh, cosmological hydrodynamics simulation code.
• Hurricane: A simulation of a hurricane from the National Center for Atmospheric Research in the United States.
• Scale-LETKF: Local Ensemble Transform Kalman Filter (LETKF) data assimilation package for the SCALE-RM weather model.
• Miranda: A radiation hydrodynamics code designed for large-eddy simulation of multicomponent flows with turbulent mixing.

Detailed information about the datasets (all using single precision) is presented in Table V. Some data fields are transformed to their logarithmic domain before compression for better visualization, as suggested by domain scientists.
TABLE V
BASIC INFORMATION ABOUT APPLICATION DATASETS
App. | # files | Dimensions | Total Size | Domain
RTM | 3600 | 449×449×235 | 635 GB | Seismic Wave
Miranda | 7 | 256×384×384 | 1 GB | Turbulence
QMCPACK | 1 | 288×115×69×69 | 612 MB | Quantum Structure
Scale-LETKF | 13 | 98×1200×1200 | 6.4 GB | Climate
NYX | 6 | 512×512×512 | 3.1 GB | Cosmology
Hurricane | 48×13 | 100×500×500 | 58 GB | Weather

3) State-of-the-Art Lossy Compressors in Our Evaluation: In our experiments we compare our new compressor with five other state-of-the-art error-bounded lossy compressors (SZ 2.1 [8], ZFP 0.5.5 [5], SZ(Hybrid) [15], SZ(SP+PO)¹ [16], and MGARDx [28]), which have been recognized as the best in class (validated by different researchers [8], [10], [14], [35]).

¹ SZ(SP+PO) represents the SZ compression model with second-order prediction (SP) and parameter optimization (PO), which suffers from 1X slower compression.

4) Evaluation Metrics: We evaluate the six compressors based on five critical metrics, as described below.
• Compression ratio (CR) based on the same error bound: The definitions of CR and the absolute error bound are given in Section III. Without loss of generality, we adopt the value-range-based error bound (denoted ε), which has the same effect as the absolute error bound (denoted e) because e = ε(max(D) − min(D)).
• Compression speed and decompression speed: original size / compression time (MB/s) and reconstructed size / decompression time (MB/s).
• Rate distortion: The detailed description is in Section III.
• Visualization with the same CR: We compare the visual quality of the reconstructed data based on the same CR.
• Precision of the final execution results of the RTM data with lossy compression.

B. Evaluation Results and Analysis

First, we verified the maximum compression errors of all six compressors on all the application datasets with different error bounds. Experiments confirm that they all respect the error bound constraint very well. Figure 9 shows the distribution of compression errors of our solution for two error bounds (ε=1E-3 and ε=1E-4; in other words, e=0.033 and 0.0033 for QMCPACK, and e=8.2E-5 and 8.2E-6 for RTM). We can clearly see that the compression errors are 100% within the absolute error bound (e) for all data points.

[Figure: histograms of the compression errors for (a) QMCPack and (b) RTM (time step 1500) under the two error bounds.]
Fig. 9. Compression Error Distribution of Our Solution
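The error-bound check described above amounts to the following. This is a minimal sketch with illustrative array names, assuming the value-range-based bound ε is converted to an absolute bound e as defined in Section VI-A4:

```python
import numpy as np

def check_error_bound(original: np.ndarray, reconstructed: np.ndarray, epsilon: float) -> bool:
    """True iff every point satisfies |d_i - d'_i| <= e, where e = epsilon * vrange(D)."""
    e = epsilon * float(original.max() - original.min())
    return bool(np.max(np.abs(original - reconstructed)) <= e)

# Toy check: data perturbed by noise safely inside +/- e must pass the verification.
rng = np.random.default_rng(5)
data = rng.standard_normal(1_000_000).astype(np.float32)
e = 1e-3 * float(data.max() - data.min())
noisy = data + rng.uniform(-0.9 * e, 0.9 * e, data.shape).astype(np.float32)
print(check_error_bound(data, noisy, 1e-3))   # True
```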
Table VI presents the compression ratios of the six compressors on the six real-world applications with the same error bounds. We can clearly observe that our solution always exhibits the highest compression ratio in all cases. In particular, the compression ratio of our solution is higher than those of the other compressors by 20%∼460% in most cases. For example, when setting the error bound to 1E-3 for compressing the RTM data, the second-best compressor (SZ(SP+PO)) gets a compression ratio of 114.4, while our compression ratio reaches up to 397.6 (a 247.5% improvement). The key reason our solution can get a significantly higher compression ratio is twofold: (1) we significantly improve the prediction accuracy by a dynamic spline interpolation, and (2) some other compressors such as ZFP and MGARDx suffer from the precision-overpreservation issue (i.e., the actual maximum errors are smaller than the required error bound), as verified by prior works [6], [7], [28].

TABLE VI
COMPRESSION RATIO COMPARISON BASED ON THE SAME ERROR BOUND
Dataset | ε | SZ2.1 | SZ(Hybrid) | SZ(SP+PO) | ZFP | MGARDx | OurSol | OurSol Improve %
RTM | 1E-2 | 271.7 | 195.7 | 358.1 | 111.0 | 229.7 | 1997.5 | 457%
RTM | 1E-3 | 109.8 | 101.4 | 114.4 | 59.3 | 78.1 | 397.6 | 247%
RTM | 1E-4 | 57.3 | 44.4 | 63.0 | 34.9 | 38.3 | 116.3 | 84%
Miranda | 1E-2 | 125.6 | 130.4 | 188.4 | 46.6 | 113.7 | 582.1 | 209%
Miranda | 1E-3 | 59.9 | 55.4 | 58.4 | 25.6 | 38.0 | 160.7 | 168%
Miranda | 1E-4 | 30.6 | 23.4 | 33.9 | 14.5 | 20.0 | 47.1 | 39%
QMCPack | 1E-2 | 196.2 | 144.8 | 174.5 | 39.4 | 159.8 | 675.5 | 244%
QMCPack | 1E-3 | 51.1 | 53.4 | 68.0 | 21.2 | 47.1 | 204.3 | 200%
QMCPack | 1E-4 | 18.9 | 24.9 | 23.6 | 10.4 | 14.9 | 63.7 | 155%
SCALE | 1E-2 | 84.3 | 94.2 | 108.2 | 14.5 | 52.8 | 157.0 | 45%
SCALE | 1E-3 | 26.6 | 27.1 | 31.8 | 7.8 | 20.2 | 40.5 | 27%
SCALE | 1E-4 | 14.0 | 13.2 | 14.1 | 4.6 | 10.4 | 14.9 | 5%
NYX | 1E-2 | 43.6 | 33.2 | 48.7 | 12.0 | 24.7 | 59.4 | 22%
NYX | 1E-3 | 16.8 | 16.3 | 17.4 | 6.0 | 11.2 | 21.1 | 21%
NYX | 1E-4 | 7.6 | 8.0 | 8.1 | 3.7 | 5.5 | 9.1 | 12%
Hurricane | 1E-2 | 49.4 | 44.6 | 65.4 | 11.3 | 28.1 | 69.3 | 6%
Hurricane | 1E-3 | 17.6 | 17.9 | 19.8 | 6.7 | 12.7 | 22.5 | 14%
Hurricane | 1E-4 | 9.8 | 10.1 | 10.5 | 4.3 | 7.4 | 10.8 | 3%

Table VII compares the compression/decompression speeds of all six lossy compressors for all six applications. It clearly shows that our solution exhibits compression performance similar to that of SZ 2.1 and MGARDx, and its decompression performance is also comparable to that of SZ 2.1 and is about 30% higher than that of MGARDx.

TABLE VII
COMPRESSION/DECOMPRESSION SPEEDS (MB/S) WITH ε=1E-3
Type | Dataset | SZ2.1 | SZ(Hybrid) | SZ(SP+PO) | ZFP | MGARDx | OurSol
Compression | RTM | 207 | 76 | 97 | 549 | 128 | 149
Compression | Miranda | 125 | 73 | 91 | 201 | 140 | 128
Compression | QMCPack | 146 | 63 | 78 | 158 | 136 | 133
Compression | SCALE | 145 | 59 | 75 | 101 | 122 | 128
Compression | NYX | 123 | 81 | 86 | 131 | 117 | 110
Compression | Hurricane | 115 | 63 | 78 | 115 | 122 | 131
Decompression | RTM | 385 | 299 | 298 | 984 | 173 | 276
Decompression | Miranda | 285 | 221 | 206 | 531 | 177 | 232
Decompression | QMCPack | 327 | 232 | 282 | 367 | 168 | 241
Decompression | SCALE | 271 | 184 | 192 | 295 | 164 | 215
Decompression | NYX | 222 | 172 | 215 | 244 | 145 | 136
Decompression | Hurricane | 222 | 186 | 200 | 257 | 163 | 193

As discussed in Section V-D, we designed a dynamic strategy to optimize the compression quality throughout the entire bit-rate range. Figure 10 demonstrates that the dynamic strategy has a critical effect on the compression quality improvement. For instance, as shown in Fig. 10 (a), our solution always exhibits the best compression quality when the bit rate is lower than 2.5, because it adopts a dynamic interpolation method with optimized dimension sequences in the multilevel interpolation, whereas both the linear interpolation and the tricubic interpolation shown in the figure use a fixed sequence (z→y→x). On the other hand, Fig. 10 (a) shows that our solution also keeps the best rate-distortion level when the bit rate is higher than 2.5, a result that is attributed to our accurate predictor-selection algorithm (selecting the better predictor between interpolation and Lorenzo at runtime).
[Figure: rate-distortion curves (PSNR versus bit rate) comparing Interp(Linear), Interp(Cubic), Lorenzo, and OurSol for (a) RTM (time step 1500), (b) NYX (velocityz), (c) Miranda (velocityz), (d) Scale-LETKF (QS), (e) QMCPack, and (f) Hurricane (W).]
Fig. 10. Our Solution Compared with Interpolation and Lorenzo

[Figure: rate-distortion curves (PSNR versus bit rate) comparing SZ2.1, SZ(Hybrid), SZ(SP+PO), ZFP, MGARDx, and OurSol for (a) RTM, (b) NYX, (c) Miranda, (d) Scale-LETKF, (e) QMCPack, and (f) Hurricane. In-figure annotations report the compression-ratio improvement over the second-best compressor at the same PSNR: 133% for RTM, 85% for Miranda, and 91% for QMCPack.]
Fig. 11. Overall Evaluation (Lower Bit Rate and Higher PSNR → Better Quality)

Figure 11 presents the overall compression quality (i.e., rate distortion). One can see that our solution is the best in class among all the related works for all six applications. In particular, with the same data-distortion level (PSNR), the compressed data size under our solution is about 50% of the compressed data size under the second-best compressor in most of the cases for RTM, Miranda, and QMCPack.

We demonstrate the visual quality of the decompressed data of four error-bounded lossy compressors in Fig. 12 and Fig. 13 for QMCPack and RTM, respectively, using one slice image (one orbit for QMCPack and slice 340 for RTM) of the 3D datasets. The original visualization is shown in Fig. 4. The two figures clearly show that our solution keeps an excellent visual quality in the decompressed data with a compression ratio even up to 64 for QMCPack and 315 for RTM. In contrast, the other compressors suffer from prominent degradation in visual quality to different extents with the same compression ratios. In particular, SZ and ZFP suffer from undesired blockwise texture artifacts.

[Figure: decompressed QMCPack slices for (a) OurSol (PSNR: 89.4, CR: 64), (b) SZ (PSNR: 66.7, CR: 62), (c) ZFP (PSNR: 67.6, CR: 62), and (d) MGARDx (PSNR: 79.9, CR: 64).]
Fig. 12. Visualization of Decompressed Data (QMCPack)

[Figure: decompressed RTM snapshot slices for (a) OurSol (PSNR: 69.3, CR: 315), (b) SZ (PSNR: 50.7, CR: 315), (c) ZFP (PSNR: 51.7, CR: 258), and (d) MGARDx (PSNR: 62.5, CR: 310).]
Fig. 13. Visualization of Decompressed Snapshot Data (RTM)

We show in Fig. 14 and Fig. 15 that the final RTM image for a single shot is not degraded at all when using our lossy compressor with very high compression ratios (about 2∼4× higher than those of the other compressors). We use a value-range-based error bound of 1.25E-3 in our solution for each time step. The RTM application requires propagating waves generated by a source signal in a given subsurface model. At the beginning of the propagation the compression ratios are very high (10k+) because the waves are close to the source locations. Over time, the waves propagate farther in the model, resulting in more complex images and compression ratios dropping to about 70. The overall compression ratio is 274 because the compression ratio at most time steps can reach 300+ (e.g., CR=315 at time step 1500, as shown in Fig. 13 (a)). In this simulation we used one shot to generate the final image in Fig. 15. One can see a very good preservation of the amplitudes and main structures in the lossy-compression-based final result, which is acceptable for postanalysis, as confirmed by the seismic researchers. Our lossy compressor dramatically decreases the size of the RTM snapshots while not increasing the computation time compared with SZ 2.1. This decrease can significantly lower the I/O throughput requirements and enable either faster turnaround or higher-fidelity simulations for production-level seismic imaging.

[Figure: compression ratio of our solution and the five other compressors over the 3600 RTM time steps, with an inset zooming in on time steps 3400∼3600.]
Fig. 14. Compression Ratio of RTM Data for Different Time Steps (with Value-Range-Based Error Bound 1.25E-3)

[Figure: (a) the original final RTM result and (b) the compression-based final result.]
Fig. 15. Visualization of RTM Image for One Shot
VII. CONCLUSION AND FUTURE WORK

In this paper we present a novel error-bounded lossy compressor based on the SZ framework. We identify a significant linear-regression coefficient overhead issue in the currently released version of SZ (v2.1). To address this issue, we develop a dynamic spline interpolation approach with adaptive optimization strategies. We thoroughly evaluate the compression quality and performance of our solution compared with those of five other lossy compressors on six real-world scientific simulations. The key findings are summarized below.
• Our analysis shows that the linear regression predictor has a significant problem because its coefficient overhead is non-negligible (25%∼70% of the compressed data).
• Our dynamic spline interpolation solution can improve the compression ratio by 457%, 244%, and 209% compared with the second-best compressor on the RTM, QMCPACK, and Miranda datasets, respectively.
• Our solution has high compression/decompression performance comparable to that of SZ 2.1. Its compression speed is 28%∼100% faster than other SZ-based methods such as SZ(Hybrid) and SZ(SP+PO).
• Our solution keeps an extremely high visual quality in the decompressed data, whereas other lossy compressors suffer from prominent degradation in visualization with the same compression ratios.

We plan to improve the compression quality by exploring more effective prediction models. We will also study the impact of our compressor on multishot RTM simulations with hundreds of thousands of shot images.