Optimizing Error-Bounded Lossy Compression for Scientific Data by Dynamic Spline Interpolation
Abstract—Today's scientific simulations are producing vast volumes of data that cannot be stored and transferred efficiently because of limited storage capacity, parallel I/O bandwidth, and network bandwidth. The situation is getting worse over time because of the ever-increasing gap between relatively slow data transfer speed and fast-growing computation power in modern supercomputers. Error-bounded lossy compression is becoming one of the most critical techniques for resolving the big scientific data issue, in that it can significantly reduce the scientific data volume while guaranteeing that the reconstructed data is valid for users because of its compression-error-bounding feature. In this paper, we present a novel error-bounded lossy compressor based on a state-of-the-art prediction-based compression framework. Our solution exhibits substantially better compression quality than do all the existing error-bounded lossy compressors, with comparable compression speed. Specifically, our contribution is threefold. (1) We provide an in-depth analysis of why the best existing prediction-based lossy compressor can only minimally improve the compression quality. (2) We propose a dynamic spline interpolation approach with a series of optimization strategies that can significantly improve the data prediction accuracy, substantially improving the compression quality in turn. (3) We perform a thorough evaluation using six real-world scientific simulation datasets across different science domains to compare our solution with all other related works. Experiments show that the compression ratio of our solution is higher than that of the second-best lossy compressor by 20%∼460% with the same error bound in most of the cases.
I. INTRODUCTION

With the ever-increasing execution scale of today's scientific simulations, vast scientific data are produced at every simulation run. Climate simulation [1], for example, can generate hundreds of terabytes of data in tens of seconds [2]. A cosmology simulation, such as Hardware/Hybrid Accelerated Cosmology Code (HACC) [3], can produce dozens of petabytes of data when it performs an N-body simulation with up to several trillion particles. Such a vast amount of scientific data needs to be stored in a parallel file system for postanalysis, creating a huge challenge to the storage space. Many scientists also need to share the large amounts of data across different sites (i.e., endpoints) through a data-sharing web service (such as the Globus toolkit [4]) on the Internet. Thus, the ability to significantly compress extremely large scientific data with controlled data distortion is critical to today's science work.

To this end, error-bounded lossy compression techniques [5]–[8] have been developed for several years, and they have been widely recognized as an optimal solution to tackle the scientific big data problem. For example, many researchers [2], [9], [10] have verified that the data reconstructed by error-bounded lossy compressors is totally acceptable for users' postanalysis. Many successful stories also show that error-bounded lossy compressors not only can significantly reduce the storage space but also may substantially improve the data-moving performance or reduce the memory footprint, with user-acceptable data distortions. For instance, Wu et al. [11] showed that a customized error-bounded lossy compressor can increase the simulation scale by 2∼16 qubits for general quantum circuits while maintaining 0.985 output fidelity on average. In particular, the memory requirement of simulating a 61-qubit quantum computing simulation (Grover's search algorithm) can be lowered from 32 exabytes to 768 terabytes on Argonne's Theta supercomputer using 4,096 nodes. Liang et al. [12] showed that an error-bounded lossy compressor can improve the overall I/O performance by 60X, with no degradation of visual quality on the reconstructed data. Kukreja et al. [13] showed that using error-bounded lossy compression can achieve high compression ratios without significantly impacting the convergence or final solution of a full waveform inversion solver.

The SZ compression library has been recognized by independent assessments [7], [10], [14] as the best-in-class error-bounded lossy compressor for scientific datasets, especially because it has gone through many careful optimizations. The first release of SZ (SZ0.1∼1.0) [6] proposed the prediction-based lossy compression framework combining a one-dimensional curve-fitting method and a customized Huffman encoding algorithm. SZ1.4 [7] improved the compression quality significantly by extending its curve-fitting method to multidimensional prediction and adopting a linear-scale quantization. SZ2.0 [8] further improved the prediction accuracy by adopting the linear regression method, especially for high-compression cases. The compression quality was also improved by integrating ZFP [5] as a candidate predictor [15] in the SZ framework and by adopting parameter autotuning and second-order prediction algorithms [16].
One important question is whether we still have room to further improve the error-bounded lossy compression quality significantly for scientific datasets. In fact, the critical challenge in the current design of SZ comes from its linear-regression prediction method, which has two significant drawbacks. On the one hand, it suffers from limited accuracy in predicting datasets with nonlinear variation. Many scientific simulations (such as seismic wave simulation [17] and quantum chemistry research [18]), however, may produce a vast amount of data with nonlinear features, such that SZ cannot work very effectively on them. On the other hand, the linear-regression-based prediction needs to store several coefficients (e.g., four coefficients per block for 3D compression) in each block of data, introducing significant overhead especially when a relatively high compression ratio is required by users.

In this paper, we propose a novel, efficient lossy compression method based on the state-of-the-art SZ compression model. Our contribution is threefold.
• We provide an in-depth analysis of the latest version of SZ and identify a significant drawback of its prediction method; the analysis also sheds light on the solution.
• We propose a novel, efficient approach based on multidimensional spline interpolation that can significantly improve the prediction accuracy, especially for datasets with nonlinear data variation features. We further propose dynamic optimization strategies to improve the overall compression quality while reducing the inevitable overhead as much as possible.
• We perform a comprehensive assessment of our solution versus five state-of-the-art error-controlled lossy compressors, using multiple real-world simulation datasets across different scientific domains. Experiments show that our solution improves the compression ratio by 20%∼460% over the second-best compressor with the same error bound and experiences no degradation in the postanalysis accuracy.

The rest of the paper is organized as follows. In Section II we discuss the related work. In Section III we formulate the research problem. In Section IV we offer an in-depth analysis of the pros and cons of the current SZ design, information that is fundamental to the subsequent optimization efforts. In Section V we describe our solution and detailed optimization strategies. In Section VI we present the evaluation results compared with five other state-of-the-art lossy compressors using real-world applications. In Section VII we conclude with a brief discussion of future work.
II. RELATED WORK

Data compression is becoming a critical technique for storing the vast volumes of data produced by high-performance computing scientific simulations. Lossless compressors such as Zlib [19] and Zstd [20] suffer from very limited compression ratios (generally ∼2 or even less) since lossless compression techniques rely on repeated byte-stream patterns, whereas scientific data is often composed of diverse floating-point numbers. Thus, lossy compression for scientific data has been studied for years.

Unlike traditional lossy compression techniques (such as JPEG2000 [21]) that were designed mainly for image data, error-bounded lossy compression not only can reach a fairly high compression ratio (several dozens, hundreds, or even higher) but also can guarantee that the reconstructed data is valid for scientific postanalysis in terms of the user-predefined compression error bound. Error-bounded lossy compressors can be categorized into higher-order singular value decomposition (HOSVD)-based models, transform-based models, and prediction-based models.
• The HOSVD-based model (such as TTHRESH [22] and TuckerMPI [23]) leverages Tucker decomposition [24] to reduce the dimensions in order to lower the data volume significantly. Such a method may get fairly high compression ratios, especially for high-dimensional datasets, but may suffer from fairly low compression speed [22], which is not practical for the applications considered in this paper.
• The transform-based model involves several steps: data preprocessing, lossless data transform, and lossy encoding. ZFP [5], for example, splits the data into multiple small blocks (4^k, where k is the number of dimensions of the dataset) and then performs in each block the exponent alignment, orthogonal transform, and embedded encoding. Other transform-based compressors include VAPOR [25] and SSEM [9].
• Compared with transform-based compressors, the prediction-based model first predicts data values for each data point based on its local region and then quantizes the predicted values to a set of integers that will be further compressed by lossless compression techniques. SZ [8], for example, splits the whole dataset into many small equal-sized blocks (e.g., 6×6×6 for 3D datasets) and predicts the data points in each block by a hybrid method (either Lorenzo [26] or linear regression), followed by a customized Huffman encoding and other lossless compression techniques such as Zstd [20].

In our work, we choose the prediction-based model because SZ has been recognized as the leading compressor in the scientific data compression community. In fact, how to leverage SZ to improve compression quality has been studied for two years. Tao et al. [27] developed a strategy that can combine SZ and ZFP to optimize the compression ratios based on a more significant metric, peak signal-to-noise ratio (PSNR). Liang et al. [15] further analyzed the principles of SZ and ZFP and developed a method integrating ZFP into the SZ compression model, which can further improve the compression quality. Zhao et al. [16] proposed to adopt second-order Lorenzo+regression in the prediction method and developed an autotuning method to optimize the parameters of SZ. Liang et al. [28] accelerated the performance of MultiGrid Adaptive Reduction of Data (MGARD) [29] and used SZ to compress the nodal points generated by the MGARD framework [29], which can improve the compression ratios significantly.

All these existing SZ-related solutions have to rely on the linear regression prediction to a certain extent, a method in which the compression quality is restricted significantly in quite a few cases. In this paper, we first provide an in-depth analysis of the cause of that reduced quality and then propose an efficient strategy to solve it. We perform a thorough evaluation to compare our solution with five state-of-the-art lossy compressors using real-world scientific applications. Experiments show that our solution is superior to all existing error-bounded compressors, with much higher compression ratios than that of the second-best method.
III. PROBLEM FORMULATION

In this section we describe the research problem formulation. Given a scientific dataset (denoted by D) composed of N floating-point values (either single precision or double precision) and a user-specified absolute error bound (e), the objective is to develop an error-bounded lossy compressor that can always meet the error-bounding constraint at each data point with optimized compression quality and comparable performance (i.e., speed).

Rate distortion is arguably the metric most commonly used by the lossy compression community to assess compression quality. It can be converted to the commonly used statistical data distortion metric known as normalized root mean squared error, and it is a good indicator of visual quality. Rate distortion involves two critical metrics: peak signal-to-noise ratio (PSNR) and bit rate. PSNR can be written as follows:

PSNR = 20 log_10(vrange(D)) − 10 log_10(mse(D, D′)),   (1)

where D′ is the reconstructed dataset after decompression (i.e., the decompressed dataset), mse(D, D′) is the mean squared error between D and D′, and vrange(D) represents the value range of the original dataset D (i.e., the difference between its highest value and lowest value). Obviously, the higher the PSNR value is, the smaller the mean squared error, which means higher precision of the decompressed data.

Bit rate is used to evaluate the compression ratio (the ratio of the original data size to the compressed size). Specifically, bit rate is defined as the average number of bits used per data point in the compressed data. For example, suppose a single-precision original dataset has 100 million data points; its original data size is 100,000,000×4 bytes (i.e., about 400 MB). If the compressed data size is 4,000,000 bytes (i.e., a compression ratio of 100:1), then the bit rate can be calculated as 32/100 = 0.32 (one single-precision number takes 32 bits). Obviously, a smaller bit rate means a higher compression ratio.

Two other important compression assessment metrics are compression speed (denoted by s_c) and decompression speed (denoted by s_d). They are defined as the amount of data processed per time unit (MB/s).

In our research, we focus on the optimization of compression quality (i.e., rate distortion) with high performance, which can be formulated as follows:

Optimize rate distortion
subject to |d_i − d′_i| ≤ e
           s_c(newsol.) ≈ s_c(sz)
           s_d(newsol.) ≈ s_d(sz),   (2)

where d_i and d′_i refer to the value of the ith data point in the original dataset D and in the decompressed dataset D′ produced by the new compression solution, respectively. The notations s_c(newsol.) and s_d(newsol.) represent the compression speed and decompression speed of the new solution, respectively, and s_c(sz) and s_d(sz) represent the compression speed and decompression speed of the original SZ compressor, respectively. That is, we are trying to increase the compression ratio with the same level of data distortion and comparable compression/decompression performance compared with SZ as a baseline (because SZ has been confirmed as a fairly fast lossy compressor in many existing studies [12], [28], [30]).

In our evaluation in Section VI, not only do we present the rate distortion results for many different datasets at different bit-rate ranges, but we also assess the impact of our lossy compressor on the results of decompressed-data-based postanalysis for one production-level seismic simulation research.
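For concreteness, the sketch below shows how these metrics can be computed for a single-precision field. The struct and function names are ours (illustrative only, not part of SZ or any other compressor), and the last helper applies the value-range-based error bound ε used later in the evaluation, i.e., e = ε·(max(D) − min(D)).

```cpp
// Illustrative computation of the metrics defined above (PSNR, bit rate,
// compression ratio) and of an absolute bound derived from a
// value-range-based bound. Names are ours, not part of any compressor.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Metrics {
    double psnr;               // 20*log10(vrange) - 10*log10(mse), Formula (1)
    double bit_rate;           // average bits per value in the compressed data
    double compression_ratio;  // original size / compressed size
};

Metrics evaluate(const std::vector<float>& original,
                 const std::vector<float>& decompressed,
                 std::size_t compressed_bytes) {
    double vmin = original[0], vmax = original[0], mse = 0.0;
    for (std::size_t i = 0; i < original.size(); ++i) {
        vmin = std::min(vmin, static_cast<double>(original[i]));
        vmax = std::max(vmax, static_cast<double>(original[i]));
        double diff = static_cast<double>(original[i]) - decompressed[i];
        mse += diff * diff;
    }
    mse /= static_cast<double>(original.size());

    std::size_t original_bytes = original.size() * sizeof(float);
    Metrics m;
    m.psnr = 20.0 * std::log10(vmax - vmin) - 10.0 * std::log10(mse);
    m.compression_ratio = static_cast<double>(original_bytes) / compressed_bytes;
    m.bit_rate = 8.0 * compressed_bytes / static_cast<double>(original.size());
    return m;
}

// A value-range-based error bound eps is converted to an absolute bound by
// e = eps * (max(D) - min(D)), as used in the evaluation section.
double absolute_bound(double eps, double vmin, double vmax) {
    return eps * (vmax - vmin);
}
```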
IV. DEEPLY UNDERSTANDING THE PROS AND CONS OF SZ

In this section, we first give a review of the current SZ design and then provide an in-depth analysis of a serious problem in the latest version of the SZ compressor (SZ2.1) [8]. Understanding this problem is fundamental to understanding why our new solution can significantly improve the compression ratio.

A. Review of SZ Lossy Compression Framework

SZ2.1 [8], the latest version of SZ, has been recognized as an excellent error-bounded lossy compressor based on numerous experiments with different scientific applications by different researchers [15], [16], [27].

SZ2.1 involves four stages during the compression: (1) data prediction, (2) linear-scale quantization, (3) variable-length encoding, and (4) lossless compression such as Zstd [20]. We briefly describe the four steps below (a small sketch of Steps 1–2 follows the list), and we refer readers to our prior papers [7], [8] for technical details.
• Step 1: data prediction. In this step, SZ predicts each data point by its nearby data values. The prediction methods differ across versions (from 0.1 through 2.1). For example, SZ 0.1∼1.0 [6] adopted a simple one-dimensional adaptive curve-fitting method, which selects the best predictor for each data point from among three candidates: previous-value fitting, linear-curve fitting, and quadratic-curve fitting. SZ1.4 [7] completely replaced the curve-fitting method by a multidimensional first-order Lorenzo predictor, significantly improving compression ratios by over 200% over SZ1.0. SZ2.0∼SZ2.1 further improved the prediction method by proposing a blockwise linear regression predictor that can significantly enhance compression ratios by 150%∼800% over SZ1.4, especially for cases with a high compression ratio (i.e., when the error bound is relatively large).
• Step 2: linear-scale quantization. In this step, SZ computes the difference (denoted ∆) between the predicted value (calculated in Step 1) and the original data value for each data point and then quantizes the difference ∆ based on the user-predefined error bound (e). The quantization bins are equal-sized and are twice as large as the error bound, such that the maximum compression errors must be controlled within the specified error bound. After this step, all floating-point values are converted to integer numbers (i.e., quantization numbers), most of which are expected to be close to zero, especially when the data are fairly smooth locally or the predefined error bound is relatively large.
• Step 3: customized Huffman encoding. SZ uses a tailored integer-based Huffman encoding algorithm to encode the quantization numbers generated by Step 2.
• Step 4: lossless compression. The last step in SZ is adopting a lossless compressor with a pattern-matching algorithm such as LZ77 [31] to further improve the compression ratios significantly. SZ initially chose Zlib [19] but switched to Zstd [20] thereafter because Zstd is much faster than Zlib.
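The following minimal sketch illustrates the error-bounded prediction and linear-scale quantization of Steps 1–2, using a trivial previous-value predictor for brevity. It is a simplified illustration under our own naming, not the production SZ implementation (which also handles unpredictable values, bin offsets and capacity, and multidimensional predictors).

```cpp
// Minimal sketch of error-bounded prediction + linear-scale quantization
// (Steps 1-2). A previous-value predictor stands in for the real predictors.
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<int> quantize(const std::vector<float>& data, double e,
                          std::vector<float>& reconstructed) {
    std::vector<int> quant(data.size());
    reconstructed.resize(data.size());
    float prev = 0.0f;                       // prediction for the first value
    for (std::size_t i = 0; i < data.size(); ++i) {
        double pred  = prev;                 // Step 1: predict from nearby (reconstructed) data
        double delta = data[i] - pred;       // Step 2: difference between real and predicted value
        // Quantization bins are 2*e wide, so |data[i] - reconstructed[i]| <= e.
        int q = static_cast<int>(std::floor(delta / (2.0 * e) + 0.5));
        reconstructed[i] = static_cast<float>(pred + 2.0 * e * q);
        quant[i] = q;                        // later Huffman-encoded and losslessly compressed
        prev = reconstructed[i];             // predict from reconstructed data, as the decompressor will
    }
    return quant;
}
```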
B. Critical Features of SZ Compression Framework

First, SZ is a very flexible compression framework, in which

The basic idea is to use linear regression to construct a hyperplane in each block, such that the data inside the block can be approximated by the hyperplane with minimized mean squared error (MSE), as illustrated in Fig. 1. In a 3D space, for example, any hyperplane function can be represented as f(x, y, z) = β_0 + β_1 x + β_2 y + β_3 z. Applying the hyperplane function to each data point inside the block produces a system of equations. Letting the partial derivatives of these equations be 0, we get Formula (3), where n_i refers to the block size along dimension i and f_{ijk} refers to the data value at the relative position (i, j, k) in the block. Based on this, we can construct the hyperplane with minimized MSE. The details can be found in our prior work [8].

β_1 = 6/(n_1 n_2 n_3 (n_1+1)) · (2V_x/(n_1−1) − V_0)
β_2 = 6/(n_1 n_2 n_3 (n_2+1)) · (2V_y/(n_2−1) − V_0)
β_3 = 6/(n_1 n_2 n_3 (n_3+1)) · (2V_z/(n_3−1) − V_0)
β_0 = V_0/(n_1 n_2 n_3) − ((n_1−1)/2 · β_1 + (n_2−1)/2 · β_2 + (n_3−1)/2 · β_3)   (3)

where V_0 = Σ_{i=0}^{n_1−1} Σ_{j=0}^{n_2−1} Σ_{k=0}^{n_3−1} f_{ijk} and V_x = Σ_{i=0}^{n_1−1} Σ_{j=0}^{n_2−1} Σ_{k=0}^{n_3−1} i·f_{ijk}; V_y and V_z are defined analogously with j and k.
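A direct way to read Formula (3) is as a single pass over a block that accumulates the four sums and then evaluates the closed-form coefficients. The sketch below is illustrative (our own function names and a row-major layout assumption); V_y and V_z are taken as the j- and k-weighted sums by symmetry, since only V_0 and V_x are spelled out in the excerpt above.

```cpp
// Illustrative evaluation of Formula (3) for one n1 x n2 x n3 block
// (block stored in row-major order, f(i,j,k) at index (i*n2 + j)*n3 + k).
// Assumes n1, n2, n3 >= 2.
#include <cstddef>
#include <vector>

struct Plane { double b0, b1, b2, b3; };   // f(x,y,z) ~= b0 + b1*x + b2*y + b3*z

Plane fit_block(const std::vector<double>& f, int n1, int n2, int n3) {
    double V0 = 0, Vx = 0, Vy = 0, Vz = 0;
    for (int i = 0; i < n1; ++i)
        for (int j = 0; j < n2; ++j)
            for (int k = 0; k < n3; ++k) {
                double v = f[(static_cast<std::size_t>(i) * n2 + j) * n3 + k];
                V0 += v; Vx += i * v; Vy += j * v; Vz += k * v;
            }
    double N = static_cast<double>(n1) * n2 * n3;
    Plane p;
    p.b1 = 6.0 / (N * (n1 + 1)) * (2.0 * Vx / (n1 - 1) - V0);
    p.b2 = 6.0 / (N * (n2 + 1)) * (2.0 * Vy / (n2 - 1) - V0);
    p.b3 = 6.0 / (N * (n3 + 1)) * (2.0 * Vz / (n3 - 1) - V0);
    p.b0 = V0 / N - ((n1 - 1) / 2.0 * p.b1 + (n2 - 1) / 2.0 * p.b2 + (n3 - 1) / 2.0 * p.b3);
    return p;
}
```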
[Figure: percentage of regression coefficients in the compressed data and the resulting compression ratio (red curve) under different error bounds for four test cases; axis labels: Regression Coeff % in Compressed Data, Compression Ratio, Error Bound.]

[Figure: sample data slices (value versus index) for (a) QMCPack and (b) RTM (time step 1500).]

…37%, from 40% to 53%, and from 60% to 70%, for the four test cases, respectively. The compression ratios (the red curve) thus degrade from 179 to 128, from 102 to 86, from 114 to 90, and from 152 to 118, respectively.
Using the system of Equations (5), (6), and (7), we can derive

a_2 = −(1/48)d_{i−3} + (1/16)d_{i−1} − (1/16)d_{i+1} + (1/48)d_{i+3}
b_2 = (1/8)d_{i−3} − (1/4)d_{i−1} + (1/8)d_{i+1}
c_2 = −(1/6)d_{i−3} − (1/4)d_{i−1} + (1/2)d_{i+1} − (1/12)d_{i+3}
δ_2 = d_{i−1}.   (8)

Thus the prediction value of p_i will be

p_i = f_2(i) = −(1/16)d_{i−3} + (9/16)d_{i−1} + (9/16)d_{i+1} − (1/16)d_{i+3}.   (9)

Equation (9) is the cubic formula in Table I.
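Written as code, the interpolation predictors are fixed-weight stencils over already-reconstructed neighbors. The cubic weights below implement Equation (9); the linear formula (the midpoint of the two nearest known neighbors) is our reading of the linear entry of Table I, which is not reproduced in this excerpt.

```cpp
// The two interpolation predictors, written out as plain functions.
// dm3, dm1, dp1, dp3 are the already-reconstructed neighbors
// d[i-3], d[i-1], d[i+1], d[i+3] of the point being predicted.
inline double predict_linear(double dm1, double dp1) {
    return 0.5 * dm1 + 0.5 * dp1;                          // linear spline (midpoint)
}

inline double predict_cubic(double dm3, double dm1, double dp1, double dp3) {
    // Equation (9): weights -1/16, 9/16, 9/16, -1/16
    return (-dm3 + 9.0 * dm1 + 9.0 * dp1 - dp3) / 16.0;    // not-a-knot cubic spline
}
```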
We next discuss why we adopt only four known data points in our interpolation instead of six or more data points. If we use six data points (d_{i−5}, d_{i−3}, d_{i−1}, d_{i+1}, d_{i+3}, and d_{i+5}) to predict p_i, the formula of p_i by the not-a-knot spline turns out to be

p_i = (1/80)d_{i−5} − (1/10)d_{i−3} + (47/80)d_{i−1} + (47/80)d_{i+1} − (1/10)d_{i+3} + (1/80)d_{i+5}.   (10)

Compared with Equation (9), Equation (10) involves two additional data points, d_{i−5} and d_{i+5}, but the weight of these two data points is only 1/80, which means a very limited effect on the prediction. Moreover, it has 50% higher computation cost compared with Equation (9). Hence, we choose to use four data points for prediction, as shown in Table I.

In addition, we note that the linear spline interpolation may exhibit better prediction accuracy than the cubic spline does when setting a relatively large error bound (as shown in

[Fig. 7. Illustration of Multilevel Linear Spline Interpolation: Level 2 uses d′_1 and d′_9 to predict d_5; Level 3 uses d′_1, d′_5, and d′_9 to predict d_3 and d_7; Level 4 uses d′_1, d′_3, d′_5, d′_7, and d′_9 to predict d_2, d_4, d_6, d_8; # of levels = ⌈log_2(n)⌉ + 1.]

We use Fig. 7 to demonstrate the multilevel solution with linear interpolation; cubic interpolation has the same multilevel design. Suppose the dataset has n elements in one dimension. The number of levels required to cover all elements in this dimension is l = 1 + ⌈log_2 n⌉. At level 0, we use 0 to predict d_1, followed by the error-bounded linear-scale quantization. We perform a series of interpolations from level 1 to level l−1 recursively, as shown in Fig. 7. At each level, we use error-bounded linear-scale quantization to process the predicted value such that the corresponding reconstructed data must be within the error bound. We denote the reconstructed data as d′_1, d′_2, …, d′_n, as shown in the figure.
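The level-by-level traversal can be sketched for one dimension as follows, reusing the quantization idea from Section IV. Helper names are ours; border handling is simplified to clamping, and the cubic variant (which uses four neighbors where available) is omitted.

```cpp
// Schematic multilevel traversal over one dimension (cf. Fig. 7): process the
// grid from the coarsest stride down to stride 1, predicting each new point
// from already-reconstructed points and quantizing the residual within the
// error bound e. d is overwritten with the reconstructed values.
#include <cmath>
#include <cstddef>
#include <vector>

void multilevel_interpolation(std::vector<double>& d, double e,
                              std::vector<int>& quant) {
    std::size_t n = d.size();
    auto quantize_point = [&](std::size_t idx, double pred) {
        int q = static_cast<int>(std::floor((d[idx] - pred) / (2.0 * e) + 0.5));
        d[idx] = pred + 2.0 * e * q;        // overwrite with the reconstructed value
        quant.push_back(q);
    };
    quantize_point(0, 0.0);                 // level 0: the first point is predicted by 0

    std::size_t levels =
        static_cast<std::size_t>(std::ceil(std::log2(static_cast<double>(n)))) + 1;
    std::size_t stride = static_cast<std::size_t>(1) << (levels - 1);
    for (; stride >= 1; stride /= 2) {
        // Predict the midpoints between points that are already reconstructed.
        for (std::size_t i = stride; i < n; i += 2 * stride) {
            double left  = d[i - stride];
            double right = (i + stride < n) ? d[i + stride] : left;  // clamp at the border
            quantize_point(i, 0.5 * (left + right));                 // linear interpolation
        }
    }
}
```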
Such a multilevel interpolation is applied on a multidimensional dataset, illustrated in Fig. 8 with a 2D dataset as an example. We perform interpolation separately along all dimensions at each level, with a fixed sequence of dimensions. A 2D dataset, for example, has two possible sequences: dim_0→dim_1 and dim_1→dim_0. A 3D dataset has 6 possible sequences. In our solution, we propose to check only two sequences, dim_0→dim_1→dim_2 (sequence 1) and dim_2→dim_1→dim_0 (sequence 2), instead of all 6 possible combinations. On the one hand, the last interpolation dimension involves about 50% of the data points (much more than the other dimensions), so the dimension placed at the end of the sequence determines the overall prediction accuracy. On the other hand, we note that either the highest or the lowest dimension in scientific datasets tends to be smoother than the other dimensions without loss of generality, as confirmed by the first three columns of Table III (with all 6 applications), which presents the autocorrelation (AC) of each dimension (higher AC means smoother data). Accordingly, putting either dim_0 or dim_2 at the end of the sequence at each level yields lower overall prediction errors, as validated in Table III. Hence, we also develop a dynamic strategy to select the best-fit sequence of dimensions from among the two candidates, as detailed in the next subsection.

[Fig. 8. Illustration of Multidimensional Linear Spline Interpolation: at each level, interpolation is performed along dim_0 and dim_1 in turn; known data points are used to predict the unknown data points to be interpolated.]

TABLE III
AUTOCORRELATION AND PREDICTION ERROR OF CUBIC SPLINE INTERPOLATION WITH DIFFERENT SEQUENCES OF DIMENSION SETTINGS, ε = 1E−3

Dataset               | Autocorrelation (Lag=4)     | Prediction Error
                      | dim_2   dim_1   dim_0       | 0→1→2     0→2→1     2→1→0
RTM (time step 1500)  | 0.88    0.58    0.45        | 2.17E-5   2.32E-5   2.51E-5
Miranda (velocityz)   | 0.84    0.82    0.96        | 0.004     0.004     0.003
QMCPACK               | 0.83    0.83    0.75        | 0.010     0.010     0.013
SCALE (QS)            | 0.987   0.986   0.872       | 0.0447    0.0448    0.10
NYX (velocityz)       | 0.9818  0.99    0.99        | 31668     29903     28975
Hurricane (W)         | 0.19    0.027   0.86        | 0.024     0.025     0.016

D. Dynamic Optimization Strategies

In this section we propose a dynamic design with two adaptive strategies: (1) automatically optimizing the spline interpolation predictor (Trial run B in Fig. 5) by selecting the best-fit interpolation type (either linear or cubic) and optimizing the sequence of interpolation dimensions, and (2) automatically selecting the better predictor between the Lorenzo-based predictor (Trial run A in Fig. 5) and the interpolation predictor.

We use a uniform sampling method to determine the best interpolation settings for the input dataset. There are two settings to optimize for the multidimensional interpolation predictor: the interpolation type and the dimension sequence. We adopt a uniform sampling method with only 3% of the total data points to select the better interpolation type with the higher compression ratio.

We note that the spline interpolation predictor does not work as effectively as the multilayer Lorenzo predictor [7], [16] on relatively nonsmooth datasets, especially when the user's error bound is relatively small (as shown in Table IV). As a result, our final solution selects the better predictor from our spline interpolation method and the Lorenzo method.

TABLE IV
PREDICTION ERROR OF MULTIDIMENSIONAL SPLINE INTERPOLATION PREDICTOR (S), REGRESSION PREDICTOR (R), AND LORENZO PREDICTOR (L)

Dataset               | ε = 1E−2                 | ε = 1E−7
                      | S        R        L      | S        R        L
RTM (time step 1500)  | 1.2E-4   1.3E-4   2.0E-4 | 6.9E-6   1.0E-4   1.8E-7
Miranda (velocityz)   | 0.02     0.03     0.05   | 0.001    0.02     6E-5
QMCPACK               | 0.05     0.06     0.13   | 0.004    0.03     6E-4
SCALE (QS)            | 0.07     0.16     0.11   | 0.04     0.15     0.01
NYX (velocityz)       | 121436   132071   410083 | 15237    51963    16965
Hurricane (W)         | 0.04     0.05     0.06   | 0.01     0.04     0.004
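A sketch of this dynamic selection is shown below: sample a small fraction of the data uniformly, estimate the compression ratio of each candidate configuration (interpolation type × dimension sequence, plus the Lorenzo-based predictor) on the sample, and keep the best. All names are illustrative, and estimate_ratio() is only a placeholder for a trial compression of the sample.

```cpp
// Sketch of the dynamic selection in Section V-D: try each candidate
// configuration on a ~3% uniform sample and keep the best one.
#include <cstddef>
#include <vector>

struct Config {
    bool cubic;          // cubic vs. linear spline interpolation
    bool reversed_dims;  // dim2->dim1->dim0 instead of dim0->dim1->dim2
    bool use_lorenzo;    // fall back to the Lorenzo predictor entirely
};

// Placeholder: a real implementation would run a trial compression of the
// sampled points with the given configuration and return the resulting ratio.
double estimate_ratio(const std::vector<float>& sample, const Config& cfg,
                      double error_bound) {
    (void)sample; (void)cfg; (void)error_bound;
    return 1.0;
}

Config select_config(const std::vector<float>& data, double error_bound) {
    // Uniform sampling: keep roughly 3% of the data points.
    std::vector<float> sample;
    for (std::size_t i = 0; i < data.size(); i += 33) sample.push_back(data[i]);

    const Config candidates[] = {
        {false, false, false}, {false, true, false},
        {true,  false, false}, {true,  true, false},
        {false, false, true}   // Lorenzo-based predictor (Trial run A)
    };
    Config best = candidates[0];
    double best_ratio = -1.0;
    for (const Config& c : candidates) {
        double r = estimate_ratio(sample, c, error_bound);
        if (r > best_ratio) { best_ratio = r; best = c; }
    }
    return best;
}
```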
VI. EXPERIMENTAL EVALUATION

In this section we present the experimental setup and discuss the evaluation results and our analysis.

A. Experimental Setup

1) Execution Environment: We perform the experiments on one execution node of the Argonne Bebop supercomputer. Each node in Bebop is driven by two Intel Xeon E5-2695 v4 processors with 128 GB of DRAM.

2) Applications: We perform the evaluation using six real-world scientific applications from different domains:
• QMCPack: An open-source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules, and solids [32].
• RTM: Reverse time migration code for seismic imaging in areas with complex geological structures [17].
• NYX: An adaptive-mesh, cosmological hydrodynamics simulation code.
• Hurricane: A simulation of a hurricane from the National Center for Atmospheric Research in the United States.
• Scale-LETKF: Local Ensemble Transform Kalman Filter (LETKF) data assimilation package for the SCALE-RM weather model.
• Miranda: A radiation hydrodynamics code designed for large-eddy simulation of multicomponent flows with turbulent mixing.
Detailed information about the datasets (all using single precision) is presented in Table V. Some data fields are transformed to their logarithmic domain before compression for better visualization, as suggested by domain scientists.
TABLE V
BASIC INFORMATION ABOUT APPLICATION DATASETS

App.         | # files | Dimensions     | Total Size | Domain
RTM          | 3600    | 449×449×235    | 635 GB     | Seismic Wave
Miranda      | 7       | 256×384×384    | 1 GB       | Turbulence
QMCPACK      | 1       | 288×115×69×69  | 612 MB     | Quantum Structure
Scale-LETKF  | 13      | 98×1200×1200   | 6.4 GB     | Climate
NYX          | 6       | 512×512×512    | 3.1 GB     | Cosmology
Hurricane    | 48×13   | 100×500×500    | 58 GB      | Weather

3) State-of-the-Art Lossy Compressors in Our Evaluation: In our experiment we compare our new compressor with five other state-of-the-art error-bounded lossy compressors (SZ2.1 [8], ZFP0.5.5 [5], SZ(Hybrid) [15], SZ(SP+PO) [16], and MGARDx [28]), which have been recognized as the best in class (validated by different researchers [8], [10], [14], [35]).

4) Evaluation Metrics: We evaluate the six compressors based on five critical metrics, as described below.
• Compression ratio (CR) based on the same error bound: The descriptions of CR and absolute error bound are defined in Section III. Without loss of generality, we adopt the value-range-based error bound (denoted ε), which has the same effect as the absolute error bound (denoted e) because e = ε(max(D) − min(D)).
• Compression speed and decompression speed: original size / compression time (MB/s) and reconstructed size / decompression time (MB/s).
• Rate distortion: The detailed description is in Section III.
• Visualization with the same CR: Compare the visual quality of the reconstructed data based on the same CR.
• Precision of final execution results of RTM data with lossy compression.

B. Evaluation Results and Analysis

First, we verified the maximum compression errors for all six compressors based on all the application datasets with different error bounds. Experiments confirm that they all respect the error bound constraint very well. Figure 9 shows the distribution of compression errors of our solution on two error bounds (ε=1E-3 and ε=1E-4, in other words,

…than other compressors by 20%∼460% in most cases. For example, when setting the error bound to 1E-3 for compressing RTM data, the second-best compressor (SZ(SP+PO)) gets a compression ratio of 114.4, while our compression ratio reaches up to 397.6 (a 247.5% improvement). The key reason our solution can get a significantly higher compression ratio is twofold: (1) we significantly improve the prediction accuracy by a dynamic spline interpolation, and (2) some other compressors such as ZFP and MGARDx suffer from the precision-overpreservation issue (i.e., the actual maximum errors are smaller than the required error bound), as verified by prior works [6], [7], [28].

TABLE VI
COMPRESSION RATIO COMPARISON BASED ON THE SAME ERROR BOUND

Dataset    | ε    | SZ2.1 | SZ(Hybrid) | SZ(SP+PO) | ZFP   | MGARDx | OurSol | OurSol Improve %
RTM        | 1E-2 | 271.7 | 195.7      | 358.1     | 111.0 | 229.7  | 1997.5 | 457%
           | 1E-3 | 109.8 | 101.4      | 114.4     | 59.3  | 78.1   | 397.6  | 247%
           | 1E-4 | 57.3  | 44.4       | 63.0      | 34.9  | 38.3   | 116.3  | 84%
Miranda    | 1E-2 | 125.6 | 130.4      | 188.4     | 46.6  | 113.7  | 582.1  | 209%
           | 1E-3 | 59.9  | 55.4       | 58.4      | 25.6  | 38.0   | 160.7  | 168%
           | 1E-4 | 30.6  | 23.4       | 33.9      | 14.5  | 20.0   | 47.1   | 39%
QMCPack    | 1E-2 | 196.2 | 144.8      | 174.5     | 39.4  | 159.8  | 675.5  | 244%
           | 1E-3 | 51.1  | 53.4       | 68.0      | 21.2  | 47.1   | 204.3  | 200%
           | 1E-4 | 18.9  | 24.9       | 23.6      | 10.4  | 14.9   | 63.7   | 155%
SCALE      | 1E-2 | 84.3  | 94.2       | 108.2     | 14.5  | 52.8   | 157.0  | 45%
           | 1E-3 | 26.6  | 27.1       | 31.8      | 7.8   | 20.2   | 40.5   | 27%
           | 1E-4 | 14.0  | 13.2       | 14.1      | 4.6   | 10.4   | 14.9   | 5%
NYX        | 1E-2 | 43.6  | 33.2       | 48.7      | 12.0  | 24.7   | 59.4   | 22%
           | 1E-3 | 16.8  | 16.3       | 17.4      | 6.0   | 11.2   | 21.1   | 21%
           | 1E-4 | 7.6   | 8.0        | 8.1       | 3.7   | 5.5    | 9.1    | 12%
Hurricane  | 1E-2 | 49.4  | 44.6       | 65.4      | 11.3  | 28.1   | 69.3   | 6%
           | 1E-3 | 17.6  | 17.9       | 19.8      | 6.7   | 12.7   | 22.5   | 14%
           | 1E-4 | 9.8   | 10.1       | 10.5      | 4.3   | 7.4    | 10.8   | 3%

Table VII compares the compression/decompression speeds among all six lossy compressors for all six applications. It clearly shows that our solution exhibits compression performance similar to that of SZ2.1 and MGARDx, and its decompression performance is also comparable to that of SZ2.1 and is about 30% higher than that of MGARDx.

TABLE VII
COMPRESSION/DECOMPRESSION SPEEDS (MB/S) WITH ε=1E-3

Type         | Dataset | SZ2.1 | SZ(Hybrid) | SZ(SP+PO) | ZFP | MGARDx | OurSol
Compression  | RTM     | 207   | 76         | 97        | 549 | 128    | 149
…
[Fig. 10. Our Solution Compared with Interpolation and Lorenzo: rate distortion (PSNR vs. bit rate) of Interp(Linear), Interp(Cubic), Lorenzo, and OurSol on (a) RTM (time step 1500), (b) NYX (field velocityz), (c) Miranda (field velocityz), (d) Scale-LETKF (field QS), (e) QMCPack, and (f) Hurricane (field W).]

[Fig. 11. Overall Evaluation (Lower Bit Rate and Higher PSNR → Better Quality): rate distortion of SZ2.1, SZ(Hybrid), SZ(SP+PO), ZFP, MGARDx, and OurSol on (a) RTM, (b) NYX, (c) Miranda, (d) Scale-LETKF, (e) QMCPack, and (f) Hurricane; annotations in several panels report compression-ratio improvements (e.g., 85% and 133%) over the second-best compressor under the same PSNR.]
[Fig. 14. Compression Ratio of RTM Data for Different Time Steps (with Value-Range-Based Error Bound 1.25E-3): compression ratios of ZFP, SZ(Hybrid), MGARDx, SZ(SP+PO), and OurSol across time steps 0∼3600.]
(a) Original Final Result    (b) Compression-based Final Result
Fig. 15. Visualization of RTM Image for One Shot

REFERENCES

[8] X. Liang, S. Di, D. Tao, S. Li, S. Li, H. Guo, Z. Chen, and F. Cappello, “Error-controlled lossy compression optimized for high compression ratios of scientific datasets,” in 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 438–447.
[9] N. Sasaki, K. Sato, T. Endo, and S. Matsuoka, “Exploration of lossy compression for application-level checkpoint/restart,” in Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, ser. IPDPS ’15. Washington, DC, USA: IEEE Computer Society, 2015, pp. 914–922.
[10] A. H. Baker, D. M. Hammerling, and T. L. Turton, “Evaluating image quality measures to assess the impact of lossy data compression applied to climate simulation data,” Computer Graphics Forum, vol. 38, no. 3, pp. 517–528, 2019.
[11] X.-C. Wu, S. Di, E. M. Dasgupta, F. Cappello, H. Finkel, Y. Alexeev, and F. T. Chong, “Full-state quantum circuit simulation by using data compression,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’19. New York, NY, USA: Association for Computing Machinery, 2019.
[12] X. Liang, S. Di, D. Tao, S. Li, B. Nicolae, Z. Chen, and F. Cappello, “Improving performance of data dumping with lossy compression for scientific simulation,” in 2019 IEEE International Conference on Cluster Computing (CLUSTER), 2019, pp. 1–11.
[13] N. Kukreja, J. Hückelheim, M. Louboutin, J. Washbourne, P. H. Kelly, and G. J. Gorman, “Lossy checkpoint compression in full waveform inversion,” https://arxiv.org/pdf/2009.12623.pdf, 2020, online.
[14] T. Lu, Q. Liu, X. He, H. Luo, E. Suchyta, J. Choi, N. Podhorszki, S. Klasky, M. Wolf, T. Liu et al., “Understanding and modeling lossy compression schemes on HPC scientific data,” in 2018 IEEE International Parallel and Distributed Processing Symposium. IEEE, 2018, pp. 348–357.
[15] X. Liang, S. Di, S. Li, D. Tao, B. Nicolae, Z. Chen, and F. Cappello, “Significantly improving lossy compression quality based on an optimized hybrid prediction model,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019, pp. 1–26.
[16] K. Zhao, S. Di, X. Liang, S. Li, D. Tao, Z. Chen, and F. Cappello, “Significantly improving lossy compression for HPC datasets with second-order prediction and parameter optimization,” in Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, ser. HPDC ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 89–100.
[17] S. Kayum, T. Tonellot, V. Etienne, A. Momin, G. Sindi, M. Dmitriev, and H. Salim, “GeoDRIVE - a high performance computing flexible platform for seismic applications,” First Break, vol. 38, no. 2, pp. 97–100, 2020.
[18] A. M. Gok, S. Di, A. Yuri, D. Tao, V. Mironov, X. Liang, and F. Cappello, “PaSTRI: A novel data compression algorithm for two-electron integrals in quantum chemistry,” in IEEE International Conference on Cluster Computing (CLUSTER). New York, NY, USA: IEEE, 2018, pp. 1–11.
[19] Zlib, http://www.zlib.net/, online.
[20] Zstd, https://github.com/facebook/zstd/releases, online.
[21] D. Taubman and M. Marcellin, JPEG2000 Image Compression Fundamentals, Standards and Practice. New York, NY, USA: Springer Publishing Company, Incorporated, 2013.
[22] R. Ballester-Ripoll, P. Lindstrom, and R. Pajarola, “TTHRESH: Tensor compression for multidimensional visual data,” IEEE Transactions on Visualization & Computer Graphics, vol. 26, no. 9, pp. 2891–2903, Sep. 2020.
[23] G. Ballard, A. Klinvex, and T. G. Kolda, “TuckerMPI: A parallel C++/MPI software package for large-scale data compression via the Tucker tensor decomposition,” ACM Trans. Math. Softw., vol. 46, no. 2, Jun. 2020.
[24] W. Austin, G. Ballard, and T. G. Kolda, “Parallel tensor compression for large-scale scientific data,” in 2016 IEEE International Parallel and Distributed Processing Symposium, 2016, pp. 912–922.
[25] S. Li, S. Jaroszynski, S. Pearse, L. Orf, and J. Clyne, “VAPOR: A visualization package tailored to analyze simulation data in Earth system science,” Atmosphere, vol. 10, p. 488, Aug. 2019.
[26] L. Ibarria, P. Lindstrom, J. Rossignac, and A. Szymczak, “Out-of-core compression and decompression of large n-dimensional scalar fields,” in Computer Graphics Forum, vol. 22, no. 3. Wiley Online Library, 2003, pp. 343–348.
[27] D. Tao, S. Di, X. Liang, Z. Chen, and F. Cappello, “Optimizing lossy compression rate-distortion from automatic online selection between SZ and ZFP,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 8, pp. 1857–1871, 2019.
[28] X. Liang et al., “Optimizing multi-grid based reduction for efficient scientific data management,” https://arxiv.org/abs/2010.05872, 2020, online.
[29] M. Ainsworth, O. Tugluk, B. Whitney, and S. Klasky, “Multilevel techniques for compression and reduction of scientific data—the univariate case,” Computing and Visualization in Science, vol. 19, no. 5, pp. 65–76, Dec. 2018.
[30] X. Zou, T. Lu, W. Xia, X. Wang, W. Zhang, H. Zhang, S. Di, D. Tao, and F. Cappello, “Performance optimization for relative-error-bounded lossy compression on scientific data,” IEEE Transactions on Parallel and Distributed Systems, vol. PP, p. 1, Feb. 2020.
[31] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337–343, 1977.
[32] J. Kim et al., “QMCPACK: An open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules and solids,” Journal of Physics: Condensed Matter, vol. 30, no. 19, p. 195901, Apr. 2018.
[33] Hurricane ISABEL simulation data, http://vis.computer.org/vis2004contest/data.html, 2019, online.
[34] Scientific Data Reduction Benchmark, https://sdrbench.github.io/, online.
[35] Y. Shapira, Matrix-Based Multigrid: Theory and Applications, 2nd ed., ser. Numerical Methods and Algorithms, 2. New York, NY: Springer US, 2008.