A Compression-Boosting Transform for Two-Dimensional Data

* This project was supported in part by NSF CAREER IIS-0447773 and NSF DBI-0321756. AM was supported in part by the Paul Ivanier Center for Robotics Research and Production Management. A one-page abstract about this work appeared in the Proceedings of the Data Compression Conference, Snowbird, Utah, 2005.
1 Introduction
Every day massive quantities of two-dimensional data are produced, stored and
transmitted. Digital images are the most prominent type of data in this cat-
egory. However, matrices over finite alphabets are used to represent all sorts
of information, like graphs, database tables, geometric objects, etc. From the
compression standpoint, two-dimensional data has to be treated differently than
one-dimensional data. In order to obtain good compression of 2D data, one has
to exploit the dependencies (or equivalently, expose the redundancies) both be-
tween the rows and the columns of the matrix.
Lossless compression algorithms are typically composed of a pipeline of independent stages, usually ending with a statistical encoder. For example, the celebrated bzip2 employs a pipeline composed of the Burrows-Wheeler transform (BWT), a move-to-front encoder, and finally a Huffman compressor. Each step reorders the data so that redundancies get exposed and removed by the subsequent stages. The objective of the BWT is precisely to elucidate the dependencies between adjacent symbols in the original text string.
In this paper we propose an invertible transform for two-dimensional data over an alphabet Σ. For simplicity, we assume Σ = {0, 1}. The extension to larger alphabets is immediate and is not pursued here. The goal of the transform is to boost the compression achieved by later stages. The transform is described by a simple recursive algorithm, which can be outlined as follows.
Given the matrix to be transformed, first search for the largest columnwise-constant (resp., rowwise-constant) submatrix, that is, a submatrix identified by a subset of the rows and a subset of the columns (neither necessarily contiguous) whose columns (resp., rows) are constant (i.e., either all 0s or all 1s). Reorder the rows and the columns so that the columnwise-constant (or rowwise-constant) submatrix is moved to the upper-left corner of the matrix. Recursively apply the transform to the rest of the matrix. Stop the recursion when the partition produces a matrix smaller than a predetermined threshold (see Figure 3 for an illustration of this process).
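To make the outline concrete, here is a minimal Python sketch of the recursion; the helper name find_block, the threshold handling, and the particular split used below are illustrative placeholders, not the authors' implementation.

import numpy as np

def transform(X, r_min, c_min, find_block):
    """Recursively decompose a 0/1 matrix X into constant blocks.

    find_block(X) is any routine returning (rows, cols) of a large
    columnwise- or rowwise-constant submatrix, or None if no block of
    size at least r_min x c_min exists; the randomized search outlined
    later plays this role.  Index bookkeeping relative to the original
    matrix is omitted for brevity.
    """
    blocks, leftovers = [], []
    if X.shape[0] < r_min or X.shape[1] < c_min:
        return blocks, [X]              # too small: non-decomposable
    found = find_block(X)
    if found is None:
        return blocks, [X]              # no block above the threshold
    rows, cols = found
    blocks.append((rows, cols))
    rest_rows = np.setdiff1d(np.arange(X.shape[0]), rows)
    rest_cols = np.setdiff1d(np.arange(X.shape[1]), cols)
    # One possible (a+c, b)-style split of what remains after moving the
    # block to the upper-left corner.
    for piece in (X[np.ix_(rest_rows, np.arange(X.shape[1]))],
                  X[np.ix_(rows, rest_cols)]):
        b, l = transform(piece, r_min, c_min, find_block)
        blocks.extend(b)
        leftovers.extend(l)
    return blocks, leftovers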
The intriguing question is whether this somewhat simple matrix transformation helps compression. Arguments can be made both for and against. On the one
hand, each columnwise-constant (or rowwise-constant) submatrix can be rep-
resented compactly in a canonical form (first all the 0-columns, then all the
1-columns) by the list of its rows and columns. If a matrix can be decomposed
into a small number of large constant submatrices, one would expect an im-
provement in compressibility. On the other hand, while the transform groups
together portions of the matrix which are similar, the reordering can also break
the local dependencies that exist in the original matrix between adjacent rows
and columns. Breaking these dependencies could increase the entropy and have
a negative impact on the compressibility.
The contribution of this paper is twofold. First, we present a novel invertible transform for 2D data. The design of the transform went through a series of refinements, and here we present the result of that process (Section 5). We also studied the computational cost of the transform, which turns out to be unbalanced: the inverse transform is extremely fast and simple, whereas the forward transform is very expensive. Our compression-boosting phase is therefore suited to applications in which the data is compressed once and decompressed many times, such as large repositories of digital images in which images are stored and rarely modified.
The computational cost of the forward transform depends on the complexity
of finding the largest columnwise-constant/rowwise-constant submatrix. In [1] we
studied theoretically the general version of this problem. Although the problem
turns out to be NP-hard, a relatively simple randomized algorithm that has good
performance in practice was introduced in [1]. For completeness of presentation,
we will briefly outline the algorithm in Section 4. The interested reader can refer
to the original paper for more details.
Second, we study empirically the effects of the transform on the compressibil-
ity of two-dimensional data by comparing the performance of gzip and bzip2
before and after the application of the transform on synthetic data and digital
images. The preliminary results in Section 6 show that the transform boosts
compression.
In closing, we want to point out that since our transform simply changes the representation of the data and does not deal with the encoding problem (i.e., assigning bits to symbols), we are not proposing a complete data compression tool here. Also, since our transform is not optimized for digital images, it is not an image compression tool either. As said above, the primary use of our transform is as a preprocessing step that reorders the data so that the downstream compression with a standard lossless encoder is more efficient.
2 Related works
Since we are not proposing a new compression method, we will not delve into the vast literature on lossless image compression. There are, however, a few related works on the idea of reordering the columns and/or the rows of a matrix with the objective of reducing the storage space or the access time to the elements of the matrix.
In [3-5], the main concern is to compress database tables by exploiting the dependencies between the columns. In [3], Buchsbaum et al. observe that partitioning the table into blocks of columns and compressing each block of columns individually can improve compression. The key problem is to find the optimal partition of the columns. In the follow-up paper [4], the authors add the possibility of rearranging the columns. Their tool, called pzip, outperforms gzip by a factor of two on database tables. Along the same line of research, the authors of [5] introduce the k-transform, which captures the dependencies between k + 1 columns of a matrix. Although the problem of computing the k-transform for k >= 2 is NP-hard, the proposed polynomial-time heuristic for the 2-transform performs remarkably well compared to pzip, bzip2 and gzip.
The task of compressing boolean (sparse) matrices is also addressed in [6-10]. For example, in [9] the objective is to reorder the columns of a matrix so that the 1s in each row appear consecutively. Again, the problem of finding a reordering that minimizes the number of runs of 1s is NP-hard. The problem reduces to a traveling salesman problem, for which the authors propose a heuristic algorithm. In [10] the objective is to find a reordering of both the rows and the columns of a boolean matrix so that the matrix can be broken into homogeneous rectangles and the description complexity of those rectangles (called cross-association) is minimized. The problem is defined in an information-theoretic framework and a two-stage heuristic algorithm is proposed.
Example. Given the 6 x 6 matrix over the alphabet Σ = {0, 1}

X =
0 0 1 0 1 1
1 0 1 1 0 1
1 0 0 1 1 0
1 0 1 0 0 1
1 1 1 1 0 1
1 1 0 1 1 0

the selection (R, C) = ({2, 4, 5}, {1, 3, 5, 6}) results in the columnwise-constant submatrix

X[R,C] =
1 1 0 1
1 1 0 1
1 1 0 1

X[R,C] is the largest-area columnwise-constant submatrix in X.
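The example is easy to check; a small numpy snippet (the indices in the text are 1-based, the arrays below are 0-based):

import numpy as np

X = np.array([[0, 0, 1, 0, 1, 1],
              [1, 0, 1, 1, 0, 1],
              [1, 0, 0, 1, 1, 0],
              [1, 0, 1, 0, 0, 1],
              [1, 1, 1, 1, 0, 1],
              [1, 1, 0, 1, 1, 0]])

R = [1, 3, 4]        # rows {2, 4, 5} of the example, 0-based
C = [0, 2, 4, 5]     # columns {1, 3, 5, 6} of the example, 0-based

sub = X[np.ix_(R, C)]
print(sub)                              # every row reads 1 1 0 1
print(bool((sub == sub[0]).all()))      # True: all columns are constant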
The main computational problem is the following. Given a matrix X in {0, 1}^(n x m), find a constant submatrix with the largest area. This problem is strongly related to the Maximum Edge Biclique problem, since an n x m binary matrix can also be interpreted as the adjacency matrix of a bipartite graph. The biclique problem asks for the biclique with the maximum number of edges, which corresponds to the largest constant submatrix composed only of 1s. This problem was proved to be NP-hard in [11] by a reduction from 3Sat. The weighted version of the problem was shown to be NP-hard by Dawande et al. [12]. A 2-approximation algorithm based on LP-relaxation was given in [13].
Given that the problem of finding the largest constant submatrix of 1s is NP-hard, it is unlikely that a polynomial-time algorithm can be found. In [1] we introduced a randomized algorithm that finds the optimal solution with probability 1 - ε, where 0 < ε < 1.

Fig. 1. An example run of the randomized search: a random selection S of columns, a subset U of S, the strings induced by U with their counts, and the resulting rows R and constant columns C*.

For completeness of presentation, next we give a brief outline of the algorithm for the largest columnwise-constant submatrix. Rowwise-constant submatrices can be found along the same lines.
Recall that we are given a matrix X in {0, 1}^(n x m), and the objective of the algorithm is to discover a columnwise-constant submatrix X[R*,C*]. Let us assume that the submatrix X[R*,C*] is maximal. To simplify the notation, let r* = |R*| and c* = |C*|.
The key idea is the following. Observe that if we knew R*, then C* could be determined by selecting the constant columns with respect to R*. If instead we knew C*, then R* could be obtained by taking the maximal set of rows which read the same symbols on the columns C*. Unfortunately, neither R* nor C* is known. Our approach is to sample the matrix by randomly selecting subsets of columns (or rows), expecting that eventually one of the subsets will overlap with the solution (R*, C*).
In the following we describe how to retrieve the solution by sampling columns (one also has the choice to sample rows). First, select a subset S of size k uniformly at random from the set of columns {1, 2, . . . , m}. Assume for the time being that S ∩ C* is non-empty. If we knew S ∩ C*, then (R*, C*) could be determined by the following three steps: (1) select the string(s) w that appear exactly r* times in the rows of X[1:n, S ∩ C*], (2) set R* to be the set of rows in which w appears, and (3) set C* to be the set of constant columns corresponding to R*. An example is illustrated in Figure 1.
The algorithm would work, but there are a few problems that need to be solved. First, the set S ∩ C* could be empty. The solution is to try several different sets S, relying on the argument that the probability that S ∩ C* is non-empty at least once approaches one as more and more selections are made. The second problem is that we do not really know S ∩ C*. But certainly S ∩ C* is a subset of S, so our approach is to check all possible subsets of S. The final problem is that we assumed that we knew r*, but we do not. The solution is to introduce a row threshold parameter, called r, that replaces r*.
As it turns out, we need another parameter to avoid producing submatrices with a small area, which could potentially degrade the compressibility at later stages; this is the role of the column threshold c used in the pseudocode below.
Largest Columnwise Constant Submatrix(X, t, k, r, c)
Input: X is an n x m matrix over {0, 1}
       t is the number of iterations
       k is the selection size
       r, c are the thresholds on the number of rows and columns, resp.
1  repeat t times
2      select randomly a subset S of columns such that |S| = k
3      for all subsets U of S do
4          D <- all strings composed of 0s and 1s induced by X[1:n, U] that appear at least r times
5          for each string w in D
6              V <- the rows corresponding to w
7              Z <- all constant columns corresponding to V
8              if |Z| >= c then save (V, Z)
9  return the (V, Z) that maximizes |V| x |Z|
A lower bound on the number of iterations t that guarantees finding the largest columnwise-constant submatrix with probability at least 1 - ε is derived in [1]; the bound is a function of n, m, r, c, the selection size k, and ε.
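A direct, unoptimized Python rendering of the pseudocode above (a sketch for illustration; the authors' implementation and the exact analysis of t are in [1]):

import random
from itertools import combinations
from collections import defaultdict
import numpy as np

def largest_columnwise_constant_submatrix(X, t, k, r, c):
    """Randomized search on a 0/1 numpy matrix X.
    t: iterations, k: selection size, r, c: row/column thresholds.
    Returns (rows, cols) of the best block found, or None."""
    n, m = X.shape
    best, best_area = None, 0
    for _ in range(t):
        S = random.sample(range(m), k)
        for size in range(1, k + 1):
            for U in combinations(S, size):
                # Group the rows by the string they induce on the columns U.
                groups = defaultdict(list)
                for i in range(n):
                    groups[tuple(X[i, list(U)])].append(i)
                for w, V in groups.items():
                    if len(V) < r:
                        continue
                    # Columns that are constant with respect to the rows V.
                    rows_V = X[V, :]
                    Z = np.flatnonzero((rows_V == rows_V[0]).all(axis=0))
                    if len(Z) >= c and len(V) * len(Z) > best_area:
                        best, best_area = (V, Z.tolist()), len(V) * len(Z)
    return best

On the 6 x 6 example of the previous section, a call such as largest_columnwise_constant_submatrix(X, t=100, k=3, r=3, c=4) typically recovers the 3 x 4 block, i.e., rows [1, 3, 4] and columns [0, 2, 4, 5] in 0-based indexing.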
5 Forward transform
As mentioned in the introduction, our strategy to boost the compressibility of
two-dimensional data is to recursively decompose the input matrix based on the
presence of large columnwise-constant or rowwise-constant submatrices found by
the randomized search described above. The input to the recursive decomposition
algorithm is the original matrix X along with user-defined thresholds (r and c)
and the number of iterations t. If one fixes ε, then t can be chosen according to the bound derived in [1].
The recursive decomposition is carried out as follows. First, the procedure
Largest Columnwise Constant Submatrix (and possibly also the proce-
dure Largest Rowwise Constant Submatrix) is run on X. If a constant
submatrix is found, the rest of the matrix is partitioned into two submatrices
depending on the size of the constant submatrices discovered at the next recur-
sion level, as illustrated in Figure 3.
The decision whether to choose the partition (a+c, b) or the partition (a, b+c) depends on the sizes of the constant submatrices found in the resulting matrices a+c, b, a, and b+c. Let A1, A2, B1, B2 denote the areas of the constant submatrices found in a+c, b, a, and b+c, respectively. Based on the values of A1, A2, B1, and B2 we studied three distinct criteria to determine the partition. The first is based on the condition A1 + A2 > B1 + B2 (hereafter called sum). The second and the third tests are max{A1, A2} > max{B1, B2} (called max) and A1 > B2 (called indiv; note that in this latter case we do not need to search in b and a). In all cases, if the test is true the algorithm chooses the partition (a+c, b); otherwise, it chooses the partition (a, b+c). A discussion of how the test type affects the final compressed size is reported in Section 6.
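As a sketch, the three tests can be written as a small decision function (the function and argument names are illustrative):

def choose_partition(A1, A2, B1, B2, test="indiv"):
    """Decide between the (a+c, b) and (a, b+c) decompositions.

    A1, A2 are the areas of the constant submatrices found in a+c and b;
    B1, B2 are those found in a and b+c (see Figure 3)."""
    if test == "sum":
        keep_first = A1 + A2 > B1 + B2
    elif test == "max":
        keep_first = max(A1, A2) > max(B1, B2)
    else:                      # "indiv": only a+c and b+c are searched
        keep_first = A1 > B2
    return "(a+c, b)" if keep_first else "(a, b+c)"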
Once the partition is determined, the randomized search is performed re-
cursively on the newly formed matrices in the same manner. The recursion
stops when the matrix becomes non-decomposable. We say that a matrix is
non-decomposable if it has fewer than r rows or fewer than c columns, or if the largest constant submatrix contained in it is smaller than r x c.
The reason behind our choice of splitting into (a+c, b) or (a, b+c), instead of (a, b, c), is the following. Each time the algorithm partitions the matrix, we risk splitting large constant submatrices that could have been found later. The smaller the number of pieces we split the matrix into, the higher the chances of finding large constant submatrices. Experimental results (not shown) confirmed our choice.
It should be noted that the user-defined thresholds (r and c) play an im-
portant role in the transform. If the thresholds are too low, there is a danger
of having a deep recursion tree and potentially finding a large number of tiny
constant submatrices. If the thresholds are too high, there will be just a few con-
stant submatrices. Both cases will have a negative impact on the compression.
An experimental study regarding the choice of these thresholds is reported in
Section 6.
Fig. 3. Illustration of one step of the forward transform. Depending on the sizes of the constant submatrices in a+c, b, a, and b+c, either the decomposition (a+c, b) or (a, b+c) is chosen.

We now describe how the transformed data is represented. Clearly, each constant submatrix can be represented very succinctly. The column indices of columnwise-constant submatrices are reordered so that each row reads 00...0011...11. Thus,
each constant submatrix can be represented by its lists of row and column indices, together with the position where the transition from 0 to 1 takes place. Non-decomposable
submatrices are saved contiguously in row-major order. The content of non-
decomposable matrices is saved in a file called string.
Row and column indices of constant and non-decomposable submatrices are
saved in another file called index. For each set of row and column indices, the
first index is saved as it is, while the rest is saved as differences between adjacent
indices. The length file is used to record the number of rows and the number of
columns for constant and non-decomposable submatrices, along with a binary
flag to indicate whether the submatrix is constant or non-decomposable.
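For instance, the delta encoding of an index list and its inverse are one-liners; a minimal sketch (the actual on-disk format of index is not specified beyond this description):

def delta_encode(indices):
    """First index as-is, then differences between adjacent indices."""
    return [indices[0]] + [b - a for a, b in zip(indices, indices[1:])]

def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

print(delta_encode([2, 4, 5]))      # [2, 2, 1]
print(delta_encode([1, 3, 5, 6]))   # [1, 2, 2, 1]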
The information contained in the files string, index and length allows one
to invert the transform and reconstruct the original matrix. The inverse trans-
form is simple and extremely fast. Basically, the matrix is reconstructed element
by element in the order of the indices stored in index. The inverse transform was
implemented and tested to make sure that we had all the information necessary
to recover the original matrix. The time complexity of the inverse transform is
linear in the size of the input.
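To illustrate why the inverse is so simple, the following hypothetical snippet rebuilds one columnwise-constant block from its stored description (row indices, column indices in canonical order, and the 0-to-1 transition position); it is a sketch, not the authors' decoder.

import numpy as np

def place_constant_block(X, rows, cols, transition):
    """Write a columnwise-constant block back into X: in canonical order,
    the first `transition` columns are all 0 and the rest are all 1."""
    X[np.ix_(rows, cols[:transition])] = 0
    X[np.ix_(rows, cols[transition:])] = 1

X = np.zeros((6, 6), dtype=int)
place_constant_block(X, rows=[1, 3, 4], cols=[4, 0, 2, 5], transition=1)
print(X[np.ix_([1, 3, 4], [0, 2, 4, 5])])   # each row reads 1 1 0 1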
6 Experimental results

In order to determine whether the transform improves compression, we compared the size of the file obtained by compressing the original matrix against the
overall size of the files string, index and length compressed with the same pro-
gram. We employed two popular lossless compression algorithms, namely gzip
and bzip2.
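The comparison itself amounts to a few lines; here is a sketch using Python's standard gzip and bz2 bindings (the file names are illustrative):

import bz2
import gzip

def compressed_size(path, codec):
    """Size in bytes of the file at `path` after compression with `codec`."""
    with open(path, "rb") as f:
        return len(codec.compress(f.read()))

for codec in (gzip, bz2):
    before = compressed_size("original_matrix", codec)
    after = sum(compressed_size(name, codec)
                for name in ("string", "index", "length"))
    print(codec.__name__, "before:", before, "after:", after)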
We tested the three criteria (sum, max, indiv) discussed in Section 5 on sev-
eral images and simulated data. The result on the image bird is reported in
Figure 4 for different choices of the thresholds.

Fig. 4. The performance of the algorithm on the image bird for different strategies (sum, max, indiv) in choosing how to partition the matrix.

Table 1. Results on 256 x 256 synthetic data. File matrix_i contains i embedded columnwise-constant submatrices of size 64 x 64. Parameters: r = 10, c = 10, t = 10,000.

In the majority of our experiments, the strategy indiv appeared to be the best. Therefore, all experimental
tests that follow employ the indiv test.
Table 2. Comparing the compressibility of 256 x 256 binary images before and after the transform. Threshold parameters r = 60, c = 60. Iterations t = 10,000.
Fig. 5. Total number and average area of the columnwise-constant submatrices found, as a function of the threshold r = c.

Fig. 6. Proportion of the matrix covered by columnwise-constant submatrices and final compressed size, as a function of the threshold r = c.
We ran the transform on the image bird with several different parameter choices. We computed the total number and the average area of the columnwise-constant submatrices found (Figure 5), for several choices of r = c and for two values of t. We also recorded the total proportion of the matrix covered by columnwise-constant submatrices (the rest is non-decomposable), and the final size of the files after compression (Figure 6). Observe that when the thresholds are low, the proportion of the matrix covered by columnwise-constant submatrices is quite high. However, with low thresholds the transform finds a large number of columnwise-constant submatrices whose average area is low (Figure 5), which in turn results in large file sizes for index and length. Compared to the file string, the files index and length are considerably harder to compress. Therefore, the consequence of choosing the thresholds too low is poor compression boosting. Good compression relies on finding a balance between the gain of representing a portion of the matrix with a single bit and the cost of adding the extra information necessary to reconstruct the original matrix.
The optimal value of the thresholds r = c for the image bird is around 40, but other values in the range 40 to 70 achieve very similar results. We carried out the same analysis on other 256 x 256 images, and the same general considerations apply. With respect to the final compression, in most cases the larger the number of iterations t, the better the compression.
References
1. Lonardi, S., Szpankowski, W., Yang, Q.: Finding biclusters by random projections. In: Proceedings of the Symposium on Combinatorial Pattern Matching (CPM'04). Volume 3109 of LNCS, Istanbul, Turkey, Springer (2004) 74-88
2. Storer, J.A., Helfgott, H.: Lossless image compression by block matching. Comput. J. 40(2/3) (1997) 137-145
3. Buchsbaum, A.L., Caldwell, D.F., Church, K.W., Fowler, G.S., Muthukrishnan, S.: Engineering the compression of massive tables: an experimental approach. In: Proceedings of the ACM-SIAM Annual Symposium on Discrete Algorithms, San Francisco, CA (2000) 213-222
4. Buchsbaum, A.L., Fowler, G.S., Giancarlo, R.: Improving table compression with combinatorial optimization. In: Proceedings of the ACM-SIAM Annual Symposium on Discrete Algorithms, San Francisco, CA (2002) 175-184
5. Vo, B.D., Vo, K.P.: Using column dependency to compress tables. In Storer, J.A., Cohn, M., eds.: Data Compression Conference, Snowbird, Utah, IEEE Computer Society Press (2004) 92-101
6. Galli, N., Seybold, B., Simon, K.: Compression of sparse matrices: Achieving almost minimal table sizes. In: Proceedings of the Conference on Algorithms and Experiments (ALEX'98), Trento, Italy (1998) 27-33
7. Bell, T., McKenzie, B.: Compression of sparse matrices by arithmetic coding. In Storer, J.A., Cohn, M., eds.: Data Compression Conference, Snowbird, Utah, IEEE Computer Society Press (1998) 23-32
8. McKenzie, B., Bell, T.: Compression of sparse matrices by blocked Rice coding. IEEE Trans. Inf. Theory 47(3) (2001) 1223-1230
9. Johnson, D.S., Krishnan, S., Chhugani, J., Kumar, S., Venkatasubramanian, S.: Compressing large boolean matrices using reordering techniques. In: Proceedings of the International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada (2004)
10. Chakrabarti, D., Papadimitriou, S., Modha, D., Faloutsos, C.: Fully automatic cross-associations. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-04), ACM Press (2004) 89-98
11. Peeters, R.: The maximum-edge biclique problem is NP-complete. Technical Report 789, Tilburg University, Faculty of Economics and Business Administration (2000)
12. Dawande, M., Keskinocak, P., Swaminathan, J.M., Tayur, S.: On bipartite and multipartite clique problems. Journal of Algorithms 41 (2001) 388-403
13. Hochbaum, D.S.: Approximating clique and biclique problems. Journal of Algorithms 29(1) (1998) 174-200