(C) (2004) (SIPS) (Yung-Chi Chang)

This document describes the design of a platform-based MPEG-4 video encoder system-on-chip (SOC). The design uses a RISC core with hardware accelerators and efficient memory organization. A motion estimator supporting predictive diamond search and spiral full search is implemented to balance compression performance and design cost. Key modules are integrated into an efficient platform using hardware/software co-design. The SOC consumes 256.8mW at 40MHz while encoding 30 CIF frames per second.

PLATFORM-BASED MPEG-4 VIDEO ENCODER SOC DESIGN

Yung-Chi Chang, Wei-Min Chao and Liang-Gee Chen

DSP/IC Design Lab

Department of Electrical Engineering and Graduate Institute of Electronics Engineering
National Taiwan University, Taipei, Taiwan, R.O.C.

0-7803-8504-7/04/$20.00 ©2004 IEEE 251 SIPS 2004

ABSTRACT

An MPEG-4 video coding SOC design is presented in this paper. We adopt a platform-based architecture with an embedded RISC core and efficient memory organization. A motion estimator supporting predictive diamond search and spiral full search is implemented as a compromise between compression performance and design cost. The proposed data reuse scheme reduces the required memory access bandwidth. Several key modules are integrated into an efficient platform in hardware/software co-design fashion. The cost-efficient video encoder SOC consumes 256.8mW at 40MHz and achieves real-time encoding of 30 CIF (352x288) frames per second.

1. INTRODUCTION

The MPEG-4 standard is becoming the main technique for mobile devices and streaming video applications such as smart phones and handheld PDA devices. The improved coding efficiency and advanced functionalities of MPEG-4 come with much higher computational complexity compared with previous standards. Several MPEG-4 video chips have been reported. To satisfy the rich functionality of future multimedia, some are implemented in software [3] on a low-power DSP platform. They have the highest flexibility but degraded quality due to the fast algorithms used for ME and DCT. Some [4] use a dedicated hardware methodology to achieve low power and low area cost. Their disadvantages are the lack of potential for future modification toward advanced algorithms and the higher design effort. Hence, some [5] [6] adopt hybrid software/hardware co-design as a compromise between performance and flexibility.

According to the computational complexity analysis reported in [1] and [2], the dominant computation-intensive tasks in MPEG-4 core profile coding are motion estimation (ME) and shape encoding, which together contribute more than 90% of the overall complexity. For the simple profile, which has no shape coding tools, ME becomes the most significant one. It is a highly regular low-level task, and it also requires a huge amount of data access through the frame buffer. Dedicated architectures and local buffers are therefore heavily relied on for efficient implementation and data access reduction. The other coding tasks, including DCT/IDCT, Q/IQ, and MC, are also highly regular, so dedicated architectures can be adopted for them. Programmable architectures are suitable for the remaining less demanding but high-level tasks, such as system control.

In this paper, a RISC-based platform with hardware accelerators is presented to implement MPEG-4 video encoding algorithms. Optimization is applied at both the algorithm and the architecture level. Not only the key components but also the connection optimization and memory organization are discussed. The whole system is divided into three main subsystems. In the motion subsystem, a hybrid motion estimator supporting both predictive diamond search and spiral full search with halfway termination, for real-time or high compression quality applications, is proposed to reduce the dominant cost in the typical coding system. In the texture subsystem, an efficient interleaving schedule and a substructure sharing technique among quantization and DC/AC prediction are proposed [7] to reduce the cost further. In the bitstream subsystem, a hardware/software co-operation scheme is applied to bitstream generation in order to handle the complex bitstream syntax and avoid inefficient bit-level storage. By applying these optimization approaches, a low-cost, high-performance MPEG-4 video encoder SOC is implemented.

This paper is organized as follows. Sec. 2 explores the whole system architecture. Sec. 3 discusses the system memory organization. Sec. 4 presents the algorithm, architecture, and performance of the motion estimator. Sec. 5 shows the implementation results. Sec. 6 gives a brief conclusion.

2. SYSTEM ARCHITECTURE

Fig. 1. System Architecture (figure labels: RISC, Cache, DMA, MEM IF, Motion Estimator, Motion Compensator, Share Memory, Texture Block Engine, Bitstream Generator; PROGRAM, DATA, SHARE, and BITSTREAM buses; RISC Bus, Share Bus, Data Bus)

Fig. 2. Heterogeneous memory organizations (figure labels: frame-based map, word-based map, block-based map; source frame, reconstructed frame, bitstream, other information; physical address of external memory)

Fig. 1 depicts the proposed platform-based MPEG-4 video coding system. The RISC takes responsibility for MB-level hardware scheduling, coding mode decision, motion vector coding, and other high-level procedures. The other hardware accelerators improve system performance through parallel processing that follows the parallelism of the algorithms. Motion estimator (ME) carries out motion estimation with the
search range -16.0 to +15.5 in pixel units. The motion compensator (MC) interpolates pixels in reference frames into compensated blocks according to the specified motion vectors. The texture block engine (TBE) carries out discrete cosine transform (DCT), inverse discrete cosine transform (IDCT), quantization (Q), inverse quantization (IQ), and AC/DC prediction on texture pixels in block units. The bitstream generator (BTS) produces headers, motion information, and texture information in the form of variable length codes. In addition, the share memory builds direct channels from MC to TBE and from TBE to BTS to decrease the traffic on the data bus. The DMA, driven by dedicated commands issued by the RISC, efficiently generates the proper addresses. Four global bus channels are used in this system. First, the RISC bus broadcasts control information to each hardware module. After carrying out the operations issued by the RISC, the hardware modules return processed side information for the MB coding mode decision at the RISC. At the same time, the source, reference, and reconstructed frames required by the hardware modules are passed through the DMA and provided on the DATA bus. The hardware modules access the data automatically according to a pre-determined schedule. These parts are integrated into a single chip, with the firmware stored outside and loaded through the PROGRAM bus so that the chip remains programmable after tape-out. The SHARE bus can transfer DCT coefficients, quantized coefficients, or other intermediate information in the testing mode, which reduces development time and effort.

The RISC core has a four-stage pipeline with separate program and data memories. Its instructions are 21 bits wide. Special 2-operand MAX and MIN instructions are included for the median operations in the MV predictor decision. A hardwired datapath for multiplication and division is also provided. We also propose an immediate store instruction (SWI) that sends a specified data value to memory. Compared to the traditional approach, which requires one instruction to move the data into a register and another to store it from the register to memory, this significantly reduces the code size. To achieve cycle-accurate control, an inner timer and a polling technique are introduced. A special instruction, WAIT, supports this functionality: when the RISC encounters a WAIT instruction, it waits until the next trigger event.

3. MEMORY ORGANIZATION

The system has off-chip memory and several on-chip memory blocks. Off-chip memory contains source frames, reconstructed frames, and AC/DC information. On-chip memory is used as local buffers to reduce the bus bandwidth. Because irregular accesses to and from off-chip RAM carry a penalty, we use the random-access on-chip RAM so that off-chip RAM can be accessed more consecutively. For MPEG-4 video coding, a block-based memory organization is efficient for burst reading a block of data for video processing. However, common video input/output devices usually adopt the raster scan direction, so addressing is more regular if frame data is arranged in a frame-based scheme. Therefore, we use a heterogeneous memory organization for off-chip RAM, as shown in Fig. 2. The source frames are stored in the frame-based way, while the reconstructed frames are stored in the block-based way for later processing. The bitstream data and AC/DC information are arranged with traditional 1-D addressing. With this arrangement, data accesses to and from off-chip memory become more consecutive.

Fig. 3. Memory Access Scheme (a) SW for ME (b) Block for MC (c) Block to TBE (d) Block from TBE

Fig. 4, steps (1) and (2): (1) determine the motion vector predictor, where, for neighboring macroblocks arranged as A B over C X, the predictor of macroblock X is the median of the motion vectors of macroblocks A, B, and C; (2) integer-pel motion estimation (16x16 block size) from the predictor by FFS or PDS, and then half-pixel refinement.

Fig. 3 shows four types of off-chip memory access. Each storage unit (word) consists of four pixels. In case (a), the search window (SW) data of 48x48 pixels for ME are loaded from the previous reconstructed frame. In case (b), the 9x9 reference blocks required for half-pixel MC are read; reading in the vertical direction reduces the frequency of crossing into neighboring blocks. In case (c), data are read out from source frames in the frame-based organization. In case (d), the 8x8 reconstructed block is burst written to the block-based region of external memory without any segmentation.

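To make the difference between the two off-chip layouts concrete, the following sketch maps a pixel coordinate to a word address under a frame-based (raster) map and under a block-based map. It is only an illustration under stated assumptions: the paper gives the word size (four pixels), the 8x8 block size, and the CIF width, but the ordering of blocks in memory, the ordering of words inside a block, and all function names here are hypothetical.

```python
# Illustrative word-address maps for a CIF luma frame; layout details are assumptions.
WIDTH = 352            # CIF width in pixels (352x288 in the paper)
PIXELS_PER_WORD = 4    # the paper: each word stores four pixels
BLOCK = 8              # the paper: reconstructed data is written as 8x8 blocks

WORDS_PER_ROW = WIDTH // PIXELS_PER_WORD        # 88 words per raster row
WORDS_PER_BLOCK_ROW = BLOCK // PIXELS_PER_WORD  # 2 words per row of a block
WORDS_PER_BLOCK = BLOCK * WORDS_PER_BLOCK_ROW   # 16 words per 8x8 block
BLOCKS_PER_ROW = WIDTH // BLOCK                 # 44 blocks across the frame


def frame_based_word(x, y):
    """Raster-order word address: frame rows are stored one after another."""
    return y * WORDS_PER_ROW + x // PIXELS_PER_WORD


def block_based_word(x, y):
    """Block-order word address: each 8x8 block occupies 16 consecutive words.

    Assumed ordering: blocks in raster order, words inside a block row by row.
    """
    block_index = (y // BLOCK) * BLOCKS_PER_ROW + (x // BLOCK)
    inner = (y % BLOCK) * WORDS_PER_BLOCK_ROW + (x % BLOCK) // PIXELS_PER_WORD
    return block_index * WORDS_PER_BLOCK + inner


def block_addresses(bx, by, mapper):
    """Sorted word addresses touched by the 8x8 block at block coordinates (bx, by)."""
    return sorted({mapper(bx * BLOCK + dx, by * BLOCK + dy)
                   for dy in range(BLOCK) for dx in range(BLOCK)})


def is_single_burst(addresses):
    """True when the addresses form one run of consecutive words."""
    return all(b - a == 1 for a, b in zip(addresses, addresses[1:]))
```

Under these assumptions, `block_addresses(3, 2, block_based_word)` is one run of 16 consecutive words, which is the property that lets a reconstructed block be burst written without segmentation, while the same block under `frame_based_word` is split into eight two-word pieces that are 88 words apart.
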
The input video source, reconstructed frames, and transformed coefficients for AC/DC prediction are stored in the external memory. The Direct Memory Access (DMA) unit controls the memory interface (MIF) to read data from or write data to the external memory in a specified sequence after being initialized by the RISC. In this kind of data-intensive application, the DMA always carries a heavy load in handling the traffic through the data bus. Therefore, three special functions are built into the DMA to reduce addressing overhead and to provide pixel data more efficiently. They not only improve data access but also decrease the complexity of address generation in the other hardware modules. First, address generation includes the conversion from 2-D to 1-D addresses. Second, the advanced prediction mode allows motion vectors to point outside the VOP, in which case the data is padded from the boundary pixels. The DMA handles this boundary data so that the ME and MC units can focus on the MB currently being processed. Third, special addressing for half-pixel precision compensation is supported. Because motion compensation has half-pixel precision, the compensated block is read out as 9 by 9 pixels and may occupy four blocks in the block-based memory organization. This fixed addressing pattern is built into the control unit of the DMA to improve performance.

4. MOTION ESTIMATOR DESIGN

4.1. Algorithm

To meet the requirements of various applications at an acceptable cost, we adopt two algorithms for motion estimation of the 16x16 block size at integer-pixel precision. One is the spiral full search with halfway termination (called fast full search, FFS), which achieves the same compression efficiency as the full search algorithm. The other is the diamond search starting from the predictor derived from neighboring MBs (called predictive diamond search, PDS), which meets the real-time specification at the cost of some visual quality degradation. Afterwards, a hierarchical scheme performs motion estimation for the four 8x8 pixel blocks in an MB within -2 to +2 positions of the previous best motion vector. Half-pixel refinement is also applied to all integer-pixel motion vectors found.

Fig. 4. Algorithms of motion estimation (figure labels: FFS, spiral to all search area; PDS, initial phase, refinement phase (large diamond), last phase (small diamond); (3) local motion estimation (8x8 block size) and then half-pel refinement)

Fig. 4 depicts the stages of motion estimation, which are as follows. The predictor is first determined from neighboring MBs. The PDS mode or FFS mode is then employed to find the integer-pixel motion vector. Half-pixel refinement is applied around the motion vector found in phase 2. For the four 8x8 pixel blocks in an MB, a spiral search within -2 to +2 is applied to obtain four optimal motion vectors. Finally, half-pixel refinement is applied four times, around the motion vectors found in the previous phase.
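
The first two stages can be sketched in software as follows. This is a minimal illustration, not the chip's implementation: the median is built from two-operand max/min operations in the spirit of the MAX and MIN instructions of the RISC core, the SAD loop exits early once a candidate can no longer win (a row-granular stand-in for halfway termination), and the search repeats a large diamond until the center is best before one small-diamond pass. The diamond offsets, the function names, and the rule that candidates must stay inside the reference frame (instead of the boundary padding done by the DMA) are assumptions.

```python
# Sketch only: names, patterns, and frame handling are illustrative assumptions.
LARGE_DIAMOND = [(0, -2), (1, -1), (2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SMALL_DIAMOND = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def median3(a, b, c):
    """Median of three numbers built from two-operand max/min operations only."""
    return max(min(a, b), min(max(a, b), c))


def mv_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of the motion vectors of neighbors A, B, and C."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def sad(cur, ref, x, y, mv, size, best):
    """SAD of the block at (x, y) against ref displaced by mv, with early exit."""
    total = 0
    for r in range(size):
        cur_row = cur[y + r]
        ref_row = ref[y + mv[1] + r]
        for c in range(size):
            total += abs(cur_row[x + c] - ref_row[x + mv[0] + c])
        if total >= best:      # this candidate can no longer beat the best one
            break
    return total


def predictive_diamond_search(cur, ref, x, y, pred, size=16, search=16):
    """Return (best_mv, best_sad) for the block at (x, y), starting from pred."""
    height, width = len(ref), len(ref[0])

    def valid(mv):
        in_range = -search <= mv[0] < search and -search <= mv[1] < search
        inside = 0 <= x + mv[0] <= width - size and 0 <= y + mv[1] <= height - size
        return in_range and inside

    best_mv = pred if valid(pred) else (0, 0)
    best = sad(cur, ref, x, y, best_mv, size, float("inf"))

    moved = True
    while moved:               # refinement phase: large diamond
        moved = False
        center = best_mv
        for dx, dy in LARGE_DIAMOND:
            mv = (center[0] + dx, center[1] + dy)
            if valid(mv):
                cost = sad(cur, ref, x, y, mv, size, best)
                if cost < best:
                    best, best_mv, moved = cost, mv, True

    center = best_mv           # last phase: small diamond around the final center
    for dx, dy in SMALL_DIAMOND:
        mv = (center[0] + dx, center[1] + dy)
        if valid(mv):
            cost = sad(cur, ref, x, y, mv, size, best)
            if cost < best:
                best, best_mv = cost, mv
    return best_mv, best
```

A caller would pass `mv_predictor(mv_a, mv_b, mv_c)` as `pred`; the same `sad` function could serve a spiral full search, where the early exit matters most because every position in the range is visited.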

4.2. Architecture

Fig. 5. Architecture of motion estimator (figure labels: controller with data-in and mode inputs; adjustable ROM-based spiral pattern and diamond pattern; range checker; MUX; pattern generation (id,u,v); FIFO with valid, fetch, and terminate signals; AG; MB RAM; SW MEM; 64-bit data path; adder tree; rate-biased distortion calculation; accumulator; termination detector; output: minimum motion vector with SAD)

Fig. 5 depicts the hardware architecture of the motion estimator supporting PDS and FFS. The architecture mainly consists of three processing stages and two buffers that store the current MB and the search window. Before motion estimation is performed, the video coding system transfers data from external memory into these buffers, so that the subsequent sum of absolute differences (SAD) calculations need no bus bandwidth. Meanwhile, the adder tree accumulates the sum of the pixels in the current MB and saves it in a register for the later mode decision. To speed up data loading and reduce bus traffic, the search window buffer can be loaded using a column-by-column data reuse scheme. After motion estimation starts, the pattern generation (PG) stage generates the valid candidate positions. These positions are passed through the FIFO stage and fetched by the distortion calculation (DC) stage. The DC stage calculates the SAD of the candidate positions and finds the minimum one. The accumulation comparison elimination (ACE) unit performs the PDE algorithm to reduce the computational complexity.

4.3. Data Reuse Scheme

Fig. 6. Memory organization of motion estimator (figure labels: data loading path into SW MEM; 48x48 search window in the reference frame split into strips 0, 1, and 2 for MBs A, B, C, D; "#A" denotes the pixel in RAM # at address A; example: half-row for address (3,2))

The eight-way interleaved memory organization is used to dynamically fetch eight pixels in one cycle without read collisions in the same memory bank. Fig. 6 depicts this organization for the search range of -16 to +15 pixels. Before motion estimation is performed for each MB (labeled A, B, or C), the co-located 48 by 48 pixel search window in the reference frame must be loaded into the search window memory (SWMEM), which is divided into three strips, each holding its own sixteen-pixel-wide part of the window. For the leftmost MB of the frame, all the data in these three strips must be loaded. However, the remaining MBs in the same row can reuse two-thirds of the search window of the immediately previous MB.

With rotation and modulo operations for addressing, this column-by-column data reuse scheme is applied to the motion estimation architecture. The bus traffic for loading the search window is thereby reduced from 26.10 to 9.49 Mbytes per second for CIF format with the search range of -16 to +15 pixels. In each strip, eight horizontally neighboring pixels (a half row) are stored in eight separate memories with linear addressing. When a half-row of pixels is read at an arbitrary position, two consecutive addresses are first calculated from the two-dimensional coordinates. Then the proper circular rotation is applied to the data read out from the memory banks.

4.4. Performance

The PDS mode can satisfy the real-time specification, while the FFS mode achieves the same compression quality as the MPEG-4 software verification model (VM) [8]. To explore the degradation in the PDS mode, four sequences with different features are used as test patterns. The average PSNR difference between PDS and VM is only 0.136 dB, and the maximum PSNR drop over the test sequences is only 0.618 dB. Even in the frames whose difference in PSNR
is largest, the two are still indistinguishable in subjective viewing. When encoding in the FFS mode, the PSNR and bit-rate of the reconstructed frames are almost the same as those encoded by VM; the average PSNR is even better by 0.00625 dB. The general R-D curves for the test sequences are simulated and shown in Fig. 7.

Fig. 7. RD curves with PDS and FFS modes (axes: PSNR Y (dB) versus bit rate (bps); curves: weather, foreman, stefan, and coastguard, each in PDS and FFS mode)

5. IMPLEMENTATION

Fig. 8. Configurable platform (figure labels: host computer, PCI connector, PCI controller, arbiter; Xilinx FPGA holding the MPEG-4 video encoder; four SRAMs, each with MEMIF and DMA, on the PROGRAM, SM, FRAME, and BITSTREAM buses)

A configurable platform, shown in Fig. 8, is used to verify the functionality of our architecture design. This prototyping board is connected to the host computer through the PCI interface. Four separate memories with DMA modules handle the PROGRAM, DATA, SHARE, and BITSTREAM buses from our design. An arbiter is responsible for memory access through the PCI interface and the memories. The MPEG-4 video encoder design is synthesized and placed on the FPGA chip. The RISC program is compiled to machine code by the host computer and then sent to the program memory. Raw image data is transferred from the host computer to the frame memory on the prototyping board. Video encoding is processed concurrently. Afterwards, bitstream data are stored in the bitstream memory and then read by the host computer. In addition, the share memory can record intermediate information for debugging in the testing mode.

Fig. 9 shows a micrograph of the encoder and Table 1 lists its characteristics. It contains 828K transistors and is fabricated on a 5.02 x 5.13 mm² die in a 0.35 µm single-poly quadruple-metal CMOS process. The chip has been tested and works successfully. It operates from a 3.3V supply and consumes 256.8mW at a 40MHz working frequency. Table 2 shows the number of transistors, the area, and the size ratio to the chip of each unit.

Table 1. Characteristics of the encoder chip
Technology             TSMC 0.35 µm 1P4M CMOS
Die size               5.02 x 5.13 mm²
Transistor count       828,692 trans.
On-chip memory         39,080 bits
Off-chip memory        2,027,527 bits
Clock frequency        40 MHz
Voltage                3.3V
Power consumption      256.8mW
Package                208 CQFP
ME algorithm           PDS/FFS, 4MV mode
Search range           -16.0 to +15.5
Encoding complexity    352 x 288 at 30 fps

Fig. 9. Micrograph of this encoder

Table 2. Cost distribution
Unit                   Trans. (k)    Area (mm²)    Size ratio (%)
ME                     288           5.8           22.6
MC                     53            0.3           1.2
DCT/IDCT in TBE        126           1.6           6.2
Q/IQ in TBE            64            0.7           2.9
ACDCP in TBE           22            0.8           3.0
RISC                   112           1.8           7.0
DMA                    19            0.3           1.2
VLC                    95            0.7           2.7
Share MEM              68            2.8           10.9
Others (PAD etc.)      49            10.9          42.3
Total                  829           25.8          100.0

Table 3 compares this work with some previously proposed MPEG-4 video codecs. The design in [4] is a fully dedicated hardware video codec. It uses MVFAST for ME with search range -16 to +15.5. The design in [5] is a platform-based video/speech codec. It uses 3-step hierarchical search for ME with search range -32 to +31.5. The design in [6] is a platform-based video codec with ARM/AMBA. It uses a coarse ME with search range -8 to +7.5. All of these chip designs adopt fast algorithms for motion estimation. Considering the video encoder part, our work has the highest encoding complexity together with the lowest cost.

Table 3. Architectures Comparison
Designer               [4]           [5]              [6]           Proposed
Encoding complexity    CIF, 15fps    QCIF, 15fps      CIF, 15fps    CIF, 30fps
Frequency (MHz)        13.5          60               27            40
Power (mW)             29            240              500           256.8
Transistor (K)         3,150         20,500 (DRAM)    1,700         829
Process (µm)           0.18          0.25             0.35          0.35
Chip area (mm²)        28.048        117.506          110.25        25.801

6. CONCLUSION

An efficient platform architecture design with hardware accelerators for an MPEG-4 Simple Profile@Level 3 video encoder SOC is proposed in this paper. With the proposed hybrid motion estimation and efficient memory organization, the system is implemented in 0.35 µm CMOS technology. It works at 40MHz and consumes 256.8mW with a 5.03x5.13 mm² die size, meeting the real-time encoding specification. The proposed design achieves high performance with low design cost, which shows that a cost-effective MPEG-4 coding system implementation is realized.

7. REFERENCES

[1] P. M. Kuhn and W. Stechele, “Complexity analysis of the emerging MPEG-4 standard as a basis for VLSI implementation,” in International Conference on Visual Communications and Image Processing, 1998.

[2] H. C. Chang, L. G. Chen, M. Y. Hsu, and Y. C. Chang, “Performance analysis and architecture evaluation of MPEG-4 video codec system,” in IEEE International Symposium on Circuits and Systems (ISCAS), 2000, vol. 2, pp. 449–452.

[3] A. Hatabu, T. Miyazaki, and I. Kuroda, “QVGA/CIF resolution MPEG-4 video codec based on a low-power and general-purpose DSP,” in IEEE Workshop on Signal Processing Systems (SiPS), 2002, pp. 15–20.

[4] H. Nakayama, T. Yoshitake, H. Komazaki, Y. Watanabe, H. Araki, K. Morioka, J. Li, L. Peilin, S. Lee, H. Kubosawa, and Y. Otobe, “An MPEG-4 Video LSI with an Error-Resilient Codec Core Based on a Fast Motion Estimation Algorithm,” in IEEE International Solid-State Circuits Conference (ISSCC), 2002, vol. 1, pp. 368–474.

[5] M. Takahashi, T. Nishikawa, M. Hamada, T. Takayanagi, H. Arakida, N. Machida, H. Yamamoto, T. Fujiyoshi, Y. Ohashi, O. Yamagishi, T. Samata, A. Asano, T. Terazawa, K. Ohmori, Y. Watanabe, H. Nakamura, S. Minami, T. Kuroda, and T. Furuyama, “A 60-MHz 240-mW MPEG-4 Videophone LSI with 16-Mb Embedded DRAM,” IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1713–1721, Nov 2000.

[6] J. H. Park, I. K. Kim, S. M. Kim, S. M. Park, B. T. Koo, K. S. Shin, K. B. Seo, and J. J. Cha, “MPEG-4 Video Codec on an ARM core and AMBA,” in Workshop and Exhibition on MPEG-4, 2001, pp. 95–98.

[7] C. W. Hsu, W. M. Chao, Y. C. Chang, and L. G. Chen, “Cost-Effective Scheduling Of Texture Coding For MPEG-4 Video,” IEEE International Conference on Multimedia and Expo (ICME’02), Aug 2002.

[8] T. Sikora, “The MPEG-4 Video Standard Verification Model,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 19–31, Feb 1997.
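
As a cross-check on the search window traffic quoted in Sec. 4.3, the short calculation below reproduces both figures. It rests on assumptions that the paper does not state but that happen to match its numbers: one byte per luma pixel, a full 48x48 window loaded for the first MB of every MB row, one new 16-pixel-wide strip loaded for each following MB, and 1 Mbyte counted as 2^20 bytes. The function names are hypothetical.

```python
# Worked check of the search-window traffic in Sec. 4.3; see the assumptions above.
MB_SIZE = 16                        # macroblock width and height in pixels
WINDOW = 48                         # 48x48 search window for a -16..+15 range
WIDTH, HEIGHT, FPS = 352, 288, 30   # CIF at 30 frames per second


def search_window_bytes_per_frame(reuse):
    """Bytes loaded into the search window buffer for one frame."""
    mbs_per_row = WIDTH // MB_SIZE      # 22 macroblocks per row
    mb_rows = HEIGHT // MB_SIZE         # 18 macroblock rows
    full_window = WINDOW * WINDOW       # 2304 bytes, three strips
    new_strip = MB_SIZE * WINDOW        # 768 bytes, one strip
    if reuse:
        # The first MB of a row loads everything; each later MB reuses two thirds.
        per_row = full_window + (mbs_per_row - 1) * new_strip
    else:
        per_row = mbs_per_row * full_window
    return mb_rows * per_row


def mbytes_per_second(bytes_per_frame):
    """Convert a per-frame byte count to Mbytes per second (1 Mbyte = 2**20 bytes)."""
    return bytes_per_frame * FPS / 2**20


without_reuse = mbytes_per_second(search_window_bytes_per_frame(reuse=False))
with_reuse = mbytes_per_second(search_window_bytes_per_frame(reuse=True))
print(f"{without_reuse:.2f} -> {with_reuse:.2f} Mbytes per second")
```

Under these assumptions the script prints 26.10 and 9.49, the same reduction as reported for the column-by-column data reuse scheme.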
