Integrated Digital Architecture For JPEG Image Compression: Luciano Agostini and Sergio Bampi
The Joint Photographic Experts Group proposed the JPEG compression standard, and the complete standard documentation can be found in [1]. This paper focuses only on the hardware implementation of a subset of the JPEG standard called baseline [2,3]. The baseline is the mode most widely used in both software and hardware versions of JPEG compression. The JPEG baseline can be divided into five main steps, as shown in Fig. 1: color space conversion, downsampling, 2-D DCT, quantization and entropy coding. This paper presents the architectures for these five modules. The first two operations are integrated in a single architecture.

Figure 1 Steps of a JPEG baseline compression

The color space conversion transforms the RGB coding to the YCbCr color coding. The downsampling operation reduces the sampling rate of the color information (Cb and Cr). The 2-D DCT transforms the pixel data from the spatial domain to the frequency domain. The quantization operation eliminates the high frequency components and the small amplitude coefficients of the cosine expansion. Finally, the entropy coding uses run-length encoding (RLE), Huffman coding, variable length coding (VLC) and differential coding to decrease the number of bits used to represent the image [1]. JPEG compression is lossy, since the downsampling and quantization operations are irreversible [3]. However, the losses can be controlled in order to keep the necessary image quality. The following sections present the architectures used in each of the four parts mentioned above, their VHDL descriptions and their synthesis results. The compressor architecture operates as a pipeline, whose design is also addressed.
Universidade Federal do Rio Grande do Sul, Microelectronics Group, P.O.Box 15 064, Porto Alegre, Brazil. E-mail: {agostini, bampi}@inf.ufrgs.br Tel: +55 (51) 316-6812, Fax: +55 (51) 319-1576.
Figure 2 Integrated architecture for color space conversion and downsampling

Table 1 presents the VHDL description results. The mapping to Altera [4] FPGAs was made using Flex10KE devices.
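As an illustration of the two operations integrated in this module, the following Python sketch converts RGB pixels to YCbCr and then halves the chrominance resolution. The coefficients are the usual JFIF ones. The 2x2 averaging scheme, the floating-point arithmetic and the function names are assumptions made for this example only, since the downsampling ratio and the word lengths of the hardware are not stated here.

```python
def _clamp(v):
    """Round to the nearest integer and saturate to the 8-bit range."""
    return max(0, min(255, int(round(v))))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return _clamp(y), _clamp(cb), _clamp(cr)


def downsample_2x2(plane):
    """Average each 2x2 neighbourhood of a chroma plane.

    Assumption: the plane has even width and height.
    """
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1] + 2) // 4
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]
```

Only the Cb and Cr planes would be passed to `downsample_2x2`; the Y plane keeps its full resolution.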
Module                                   Logic Cells   Period (ns)   Code Lines
Color Space Converter and Downsampler    441           49.7          869

Table 1 VHDL mapping results for the integrated color space conversion and downsampling module.

f0 = e0        f1 = e1        f2 = e5 + e6   f3 = e5 - e6
f4 = e3 + e8   f5 = e8 - e3   f6 = e2 + e7   f7 = e4 + e7

Step 6
S0 = f0        S1 = f4 + f7   S2 = f2        S3 = f5 - f6
S4 = f1        S5 = f5 + f6   S6 = f3        S7 = f4 - f7
Table 2 Scaled 1-D DCT algorithm

The input data of each step of the scaled algorithm are stored in ping-pong buffers, which makes it possible to use just one operator per step. This architecture operates as a 48-stage pipeline. With the pipeline full, one 8x8 input matrix is processed every 64 clock cycles, with a pipeline latency of 48 cycles. A transpose buffer connects the two 1-D DCT architectures. This buffer was designed with two small 64-word RAMs: while the first 1-D DCT architecture writes its results line by line into one memory, the second 1-D DCT architecture reads its input values column by column from the other memory. The 2-D DCT VHDL results are presented in Table 3. An Altera FPGA of the Flex 10KE family was used.
Module        Logic Cells   Period (ns)   Memory Bits   Lines of Code
1-D DCT 1     2051          73.2          0             2446
1-D DCT 2     2473          80.8          0             2468
Trans. Buf.   274           36.5          1408          439
2-D DCT       4792          78.1          1408          5353
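The row-column organization described above (a first 1-D DCT, a transpose buffer, a second 1-D DCT) can be sketched in software as follows. This is only a behavioural model under stated assumptions: `dct_1d` evaluates the orthonormal DCT-II definition directly instead of the scaled fast algorithm of Table 2, `transpose` stands in for the two-RAM transpose buffer, and all function names are hypothetical.

```python
import math

N = 8  # block size used by JPEG


def dct_1d(v):
    """Direct 8-point DCT-II with orthonormal scaling (not the scaled fast algorithm)."""
    out = []
    for k in range(N):
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                  for n in range(N))
        out.append(c * acc)
    return out


def transpose(m):
    """Software stand-in for the transpose buffer: rows become columns."""
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """2-D DCT of an 8x8 block as two 1-D passes with a transposition between them."""
    rows = [dct_1d(r) for r in block]            # first pass, line by line
    cols = [dct_1d(c) for c in transpose(rows)]  # second pass, column by column
    return transpose(cols)                       # back to (vertical, horizontal) indexing
```

For a constant block, all the energy should end up in the DC coefficient at index (0, 0), which is a quick sanity check for a model of this kind.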
4 Quantization
The quantization operation is an integer division of the 2-D DCT coefficients by pre-defined values. These pre-defined values are stored in tables called quantization tables. In JPEG baseline mode there are two quantization tables: one for the luminance component (Y) and another for the chrominance components (Cb and Cr). The optimum values for the quantization tables depend on the application, but the JPEG standard suggests typical tables that perform well across applications. This operation eliminates the 2-D DCT coefficients that are less perceptible to the human eye. The result of this operation on an 8x8 matrix of 2-D DCT coefficients is a sparse matrix.

The quantization architecture designed in this paper is presented in Fig. 4 and uses two ROMs and one multiplier to calculate the quantized coefficients. The values in the standard quantization tables, which are used as divisors, were transformed into multiplier values. The multiplier in the quantization module has an architecture similar to that used in the color space conversion and 2-D DCT modules. The barrel shifter control words for each value in the quantization table are stored in ROM.
The mapping of the quantization architecture was done in an Altera Flex 10KE FPGA device. The results of this mapping are presented in Table 4.
Module         Logic Cells   Period (ns)   Memory Bits   Lines of Code
Quantization   293           36.9          1536          676
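The idea of replacing each division by a multiplication can be illustrated with a fixed-point reciprocal. The sketch below is one possible way to do it, not a description of the ROM contents or of the barrel shifter control words: the 16-bit precision `SHIFT`, the round-to-nearest behaviour and the function names are assumptions made for this example.

```python
SHIFT = 16  # assumed fixed-point precision; the actual word length is not stated


def reciprocal(q):
    """Multiplier value standing in for division by q: round(2**SHIFT / q)."""
    return ((1 << SHIFT) + q // 2) // q


def quantize(coef, q):
    """Approximate round(coef / q) with one multiplication and one shift."""
    mag = (abs(coef) * reciprocal(q) + (1 << (SHIFT - 1))) >> SHIFT
    return mag if coef >= 0 else -mag
```

Because the reciprocal is itself rounded, the result can differ by one from an exact rounded division when `coef / q` falls exactly halfway between two integers; in a table-driven design the values returned by `reciprocal` would be precomputed rather than evaluated at run time.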
5 Entropy Coder
The last stage of the JPEG compression is the entropy coding. After the quantization process, the resulting matrix has a large number of zeros. This matrix is read in zigzag order to lengthen the runs of zeros that are compressed by RLE. In entropy coding the DC and AC coefficients are handled separately, as shown in Fig. 5. The DC component is the first component in an 8x8 matrix (index 0,0) and the AC components are the remaining 63 elements.
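The zigzag read order mentioned above can be generated by walking the anti-diagonals of the block in alternating directions. The following is a minimal Python sketch; the function names are hypothetical, and a hardware implementation would more likely store the resulting addresses in a small table.

```python
def zigzag_order(n=8):
    """Return the (row, col) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag_scan(block):
    """Read a square block in zigzag order; element 0 is the DC coefficient."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

The first element of the scan is the DC component and the remaining 63 elements are the AC components, ordered roughly from low to high frequency.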
Figure 4 Quantization architecture

Figure 5 Entropy coder

The DC components of successive 8x8 windows in an image have a high degree of correlation. Therefore, the first step in DC entropy coding is a differential coding between the current DC component and the DC component of the previous matrix. This difference is coded by the VLC coder, where all non-significant bits are discarded (including the sign bit). The differential value is also used to calculate the number of significant bits that are generated by the VLC coder. This operation is performed by a size calculation and generates the size field (Fig. 5). The size field is Huffman-coded. There are four Huffman tables in the JPEG baseline operation: one for DC luminance components, one for DC chrominance components, one for AC luminance components and one for AC chrominance components. The values generated by the Huffman coder and by the VLC coder are concatenated to generate the JPEG DC code (Fig. 5) [7].

The first step in coding the AC components is the RLE, which is simplified to be only a zero counter. The RLE coder generates the run field, which represents the number of zeros that precede a non-zero value. The non-zero values pass through a VLC coder and a size calculation. The concatenation of the run and size fields is Huffman-coded. Finally, the Huffman-coded run/size field is concatenated with the amplitude field, generated by the VLC coder, to generate the JPEG AC code [7].

The entropy coder architecture is still being designed. The architecture implementation has been finished for the differential coder, the size calculation and the RLE encoder. The differential coder is a single subtractor associated with two registers. The size calculation is implemented in combinational logic and does not use memory to store the size table [1]. The RLE coder is shown in Fig. 6 and was designed using a zero comparator and one counter. The Huffman coder will use the pre-determined Huffman tables suggested by the JPEG standard [1]. These tables will be stored in the FPGA internal memory. The first VHDL description results for the entropy coder are presented in Table 5.
Module                Logic Cells   Period (ns)   Lines of Code
RLE                   26            5.8           97
Size calculation      18            14.3          70
Differential coding   46            4             108
Total (preliminary)   90            14.3          275
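The three blocks already implemented (differential coder, size calculation and zero counter) can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions: the function names are hypothetical, negative values use the one's-complement amplitude form of baseline JPEG, and the Huffman coding of the size and run/size fields, as well as the end-of-block and long-run (more than 15 zeros) symbols of the standard, are left out.

```python
def size_of(value):
    """Size field: number of significant bits of |value| (0 for value 0)."""
    return abs(value).bit_length()


def amplitude_bits(value):
    """Amplitude field: only the significant bits are kept.

    Negative values are sent as value + 2**size - 1, so no sign bit is needed.
    """
    size = size_of(value)
    if size == 0:
        return ""
    code = value if value > 0 else value + (1 << size) - 1
    return format(code, "0{}b".format(size))


def dc_symbol(dc, previous_dc):
    """Differential coding of a DC coefficient: returns (size, amplitude bits)."""
    diff = dc - previous_dc
    return size_of(diff), amplitude_bits(diff)


def ac_symbols(ac):
    """Zero counting over zigzag-ordered AC values: list of (run, size, bits)."""
    symbols, run = [], 0
    for v in ac:
        if v == 0:
            run += 1          # count zeros preceding the next non-zero value
            continue
        symbols.append((run, size_of(v), amplitude_bits(v)))
        run = 0
    return symbols            # trailing zeros produce no symbol in this sketch
```

For example, a DC value of 12 following a previous DC value of 15 gives a difference of -3, which needs two bits and is sent as `00`.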
6 Conclusions
This paper presented the architectures of the five main modules of JPEG compression: color space conversion, downsampling, 2-D DCT, quantization and entropy coding. The synthesis results of the modules were also presented. Future work calls for the reuse of the IP modules presented here in a full compressor chip (with I/O and memory control functions also embedded in the system). The overall bus and interconnection architecture is still to be proposed, while the JPEG modules addressed in this paper will compose a single, dedicated functional unit for this compressor chip. We expect to use a Flex10KE200 device and up to 7,000 logic cells for this functional unit. The target clock period for the complete architecture is 80 ns, which is a reasonable target given the results already presented.
References
[1] The International Telegraph and Telephone Consultative Committee (CCITT). Information Technology - Digital Compression and Coding of Continuous-Tone Still Images - Requirements and Guidelines. Rec. T.81, 1992.
[2] W. Pennebaker and J. Mitchell. JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, USA, 1992.
[3] J. Miano. Compressed Image File Formats: JPEG, PNG, GIF, XBM, BMP, Addison Wesley Longman Inc, USA, 1999.
[4] Altera Data Book, Altera Corporation, 1995.
[5] Y. Arai, T. Agui and M. Nakajima. A Fast DCT-SQ Scheme for Images. Transactions of IEICE, vol. E71, n. 11, 1988, pp. 1095-1097.
[6] M. Kovac and N. Ranganathan. JAGUAR: A Fully Pipelined VLSI Architecture for JPEG Image Compression Standard. Proceedings of the IEEE, vol. 83, n. 2, 1995, pp. 247-258.
[7] V. Bhaskaran and K. Konstantinides. Image and Video Compression Standards: Algorithms and Architectures, Second Edition, Kluwer Academic Publishers, USA, 1999.