Low-Power and Area-Optimized VLSI Implementation of AES Coprocessor For Zigbee System

This document describes a proposed low-power and area-optimized AES coprocessor implementation for Zigbee systems. The implementation optimizes the SubBytes/InvSubBytes and MixColumns/InvMixColumns transformations. It also integrates encryption and decryption using resource sharing and employs hierarchical power management with finite state machines and clock gating to reduce power consumption. Based on a 0.18um CMOS technology, the AES coprocessor requires only 10.5k gates, consumes 69.1uW/MHz, and achieves a throughput of 32Mbps. This implementation consumes less power and fewer resources than other designs, making it suitable for low-power applications like Zigbee systems.


The Journal of China Universities of Posts and Telecommunications
June 2009, 16(3): 89–94
www.sciencedirect.com/science/journal/10058885 www.buptjournal.cn/xben

Low-power and area-optimized VLSI implementation of AES coprocessor for Zigbee system

LI Zhen-rong, ZHUANG Yi-qi, ZHANG Chao, JIN Gang

Key Laboratory of the Ministry of Education for Wide Band-Gap Semiconductor Materials and Devices, Xidian University, Xi’an 710071, China

Abstract

A low-power and low-cost advanced encryption standard (AES) coprocessor is proposed for Zigbee system-on-a-chip (SoC)
design. The cost and power consumption of the proposed AES coprocessor are reduced considerably by optimizing the
architectures of SubBytes/InvSubBytes and MixColumns/InvMixColumns, integrating the encryption and decryption procedures
together by the method of resource sharing, and using the hierarchical power management strategy based on finite state machine
(FSM) and clock gating (CG) technologies. Based on SMIC 0.18 μm complementary metal oxide semiconductor (CMOS)
technology, the scale of the AES coprocessor is only about 10.5 kgate, the corresponding power consumption is 69.1 μW/MHz,
and the throughput is 32 Mb/s, which is reasonable and sufficient for Zigbee system. Compared with other designs, the proposed
architecture consumes less power and fewer hardware resources, which is conducive to the Zigbee system and other portable
devices.

Keywords Zigbee, AES, architecture, encryption, decryption, application specific integrated circuit (ASIC)

1 Introduction

In recent years, there has been an exponential growth in wireless technology and an ever increasing need for throughput. However, there are certain segments that do not require high data transfer capacities, such as Zigbee, a new low-rate wireless network standard designed for automation and control networks. Zigbee aims to be a low-cost and low-power solution for systems consisting of unsupervised groups of devices in houses, factories and offices.

The Zigbee standard is specified by the Zigbee Alliance [1] and utilizes the IEEE 802.15.4 [2] standard as radio layer (media access control (MAC) and physical layer). At the MAC sublayer of the IEEE 802.15.4 standard, the AES is adopted for guaranteeing basic security. The AES algorithm is issued by the National Institute of Standards and Technology (NIST) as a successor to the data encryption standard (DES) algorithm [3]. It is a computationally complex algorithm that requires a large amount of hardware resources and power in applications. In recent literature, implementations of the AES algorithm in ASIC [4–7] and field programmable gate array (FPGA) [8–11] are reported. However, most of them focus on high throughput, which makes them area-inefficient and results in high power consumption. In low-cost and low-rate environments such as the Zigbee system, power consumption and hardware cost are more important factors than data transfer capacity.

In this article, the basic operations used in the AES algorithm are analyzed and a low-power and area-optimized very large scale integration (VLSI) implementation of the AES coprocessor is proposed. As a solution, optimized architectures of SubBytes/InvSubBytes and MixColumns/InvMixColumns are applied, encryption and decryption are integrated into one design, and the low-power techniques of resource sharing and power management are employed. Other methods for resource reduction and power saving are also adopted.

The remainder of this article is organized as follows. The AES algorithm is introduced in Sect. 2. Sect. 3 describes the basic structure and the proposed AES coprocessor implementation. The implementation results are presented in comparison to previous implementations in Sect. 4. The article is concluded in Sect. 5.

Received date: 12-06-2008


Corresponding author: LI Zhen-rong, E-mail: [email protected]
DOI: 10.1016/S1005-8885(08)60232-0

2 Overview of AES algorithm

The AES algorithm [3] is a symmetric block cipher that processes data blocks of 128 bit using a cipher key of length 128 bit, 192 bit, or 256 bit, which results in 10, 12 and 14 rounds of operation, respectively. Each data block consists of a 4 × 4 array of bytes called the state, on which the basic operations of the AES algorithm are performed. The AES encryption and decryption procedures are shown in Fig. 1, in which the AES algorithm uses a round function composed of four different byte-oriented transformations: SubBytes, ShiftRows, MixColumns and AddRoundKey. The individual encryption transformations are described in the following subsections; each can be inverted and then applied in reverse order to produce decryption for the AES algorithm.

Fig. 1 AES encryption and decryption

2.1 SubBytes transformation

The SubBytes transformation, as shown by Eq. (1), is a non-linear byte substitution that operates independently on each byte of the state using a substitution table (S-box). The S-box is composed of two transformations:
1) Take the multiplicative inverse in the finite field GF($2^8$).
2) Apply the following affine transformation over GF(2).
Fig. 2(a) illustrates the SubBytes transformation.

$$ b_i' = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i \tag{1} $$

Here $b_i$ is the $i$-th bit of the byte and $c_i$ is the $i$-th bit of the constant byte {63} defined in Ref. [3].

2.2 ShiftRows transformation

The bytes in the last three rows of the State are cyclically shifted over different numbers of bytes (offsets). Fig. 2(b) illustrates the ShiftRows transformation, and Eq. (2) shows the transformation process:

$$ S'_{r,c} = S_{r,\,(c + V_{\mathrm{shift}}(r, N_b)) \bmod N_b}; \quad 0 < r < 4,\ 0 \le c < N_b \tag{2} $$

The shift value $V_{\mathrm{shift}}(r, N_b)$ depends on the row number $r$, as shown in Eq. (3):

$$ V_{\mathrm{shift}}(0,4)=0,\quad V_{\mathrm{shift}}(1,4)=1,\quad V_{\mathrm{shift}}(2,4)=2,\quad V_{\mathrm{shift}}(3,4)=3 \tag{3} $$

2.3 MixColumns transformation

The MixColumns transformation operates on the state column-by-column, treating each column as a four-term polynomial. The columns are considered as polynomials over GF($2^8$) and multiplied with a fixed polynomial. This process is displayed in Fig. 2(c). It can be written as a matrix multiplication as follows:

$$ \begin{bmatrix} S'_{0,c} \\ S'_{1,c} \\ S'_{2,c} \\ S'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} S_{0,c} \\ S_{1,c} \\ S_{2,c} \\ S_{3,c} \end{bmatrix} $$

Fig. 2 AES encryption transformations: (a) SubBytes; (b) ShiftRows; (c) MixColumns; (d) AddRoundKey
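As a rough illustration of the ShiftRows rule in Eqs. (2) and (3) and of the MixColumns matrix, the following Python sketch applies both transformations to a state stored as `state[r][c]`. The function names (`xtime`, `shift_rows`, `mix_column`, `mix_columns`) and the list-of-rows layout are assumptions made for this example and do not come from the article; the reduction polynomial x^8 + x^4 + x^3 + x + 1 is the one fixed by FIPS 197 [3].

```python
NB = 4  # number of columns in the state (Nb)

def xtime(b):
    """Multiply a byte by {02} in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def shift_rows(state):
    """Eq. (2): row r is rotated left by Vshift(r, 4) = r byte positions."""
    return [[state[r][(c + r) % NB] for c in range(NB)] for r in range(4)]

def mix_column(col):
    """Multiply one 4-byte column by the fixed MixColumns matrix."""
    out = []
    for r in range(4):
        # Row r of the matrix is (02 03 01 01) rotated right by r positions,
        # so the factor 02 always falls on the diagonal element col[r].
        a0, a1, a2, a3 = (col[(r + k) % 4] for k in range(4))
        out.append(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3)
    return out

def mix_columns(state):
    """Apply mix_column to every column of a state given as state[r][c]."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(NB)]
    return [[cols[c][r] for c in range(NB)] for r in range(4)]

# Example: bytes numbered column by column, as in the AES state layout.
state = [[4 * c + r for c in range(NB)] for r in range(4)]
shifted = shift_rows(state)
mixed = mix_columns(shifted)
```

Multiplication by {03} is written as `xtime(a) ^ a`, which mirrors how a hardware MixColumns unit is usually built from a single {02} multiplier and XOR gates.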

2.4 AddRoundKey transformation

In the AddRoundKey transformation, a Round Key is added to the State by a simple bitwise XOR operation. Each Round Key consists of words from the key schedule. Those words are added into the columns of the State. This process is illustrated in Fig. 2(d).

2.5 Key Expansion

The AES algorithm takes the Cipher Key and performs a Key Expansion routine to generate a key schedule. This process, as shown in Fig. 3, consists of the following sub-functions:
1) RotWord performs a one-byte circular left shift on a word.
2) SubWord performs a byte substitution on each byte of its input word using the S-box.
3) The result of steps 1) and 2) is XOR-ed with a round constant Rcon[j].

Fig. 3 Key Expansion process

3 Low cost and low power implementation

To minimize power consumption and cost of the Zigbee system, the authors propose the SoC architecture for Zigbee nodes. The AES coprocessor is integrated in the SoC and operated when needed by the Zigbee system.

There are considerable studies on the implementation of the AES algorithm [4–11], in which novel methods or architectures are introduced. Implementations based on pipelined architectures [8,11] are not considered because they yield high throughput at the expense of large hardware cost. The performance in FPGAs is also ignored in this article because the memories and random access memories (RAMs) commonly used in FPGA designs [9–11] are not suitable for ASIC designs. In this article, power consumption and hardware cost, rather than high throughput, are the primary concerns.

3.1 Encryption and decryption integration

The AES coprocessor contains encryption and decryption procedures, and the two procedures do not run simultaneously. These two procedures can therefore be integrated into a single architecture and share hardware resources to reduce hardware cost and power consumption. The architecture of the AES coprocessor is shown in Fig. 4, where every sub-module corresponds to one transformation. A 1-bit signal En/De determines the encryption/decryption mode, and the whole process is controlled by the Round signal. One can also integrate SubBytes and MixColumns with InvSubBytes and InvMixColumns, respectively, by analyzing the similarity between each transformation and its inverse, so that either direction can be performed by the same hardware resources.
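One way to picture the SubBytes/InvSubBytes sharing: both directions need the same GF(2^8) inverter, and only the affine stage moves from after the inverter (encryption) to before it (decryption). The Python sketch below models this with a single `sbox_shared` function steered by an `encrypt` flag that plays the role of the En/De signal. All function names are hypothetical, the inverse-affine constants are the standard ones implied by FIPS 197 rather than values quoted in the article, and the exponentiation-based inverter is only a behavioural stand-in for the article's hardware inverter.

```python
def gf256_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def gf256_inv(a):
    """Shared inverter: a^254 equals a^-1 for a != 0 and maps 0 to 0."""
    r = 1
    for _ in range(254):
        r = gf256_mul(r, a)
    return r

def rotl8(x, n):
    """Rotate a byte left by n bit positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def affine(b):
    """Affine transformation of Eq. (1) with the constant byte 0x63."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

def inv_affine(s):
    """Inverse of the affine transformation (constant byte 0x05)."""
    return rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05

def sbox_shared(byte, encrypt):
    """One data path for both directions; `encrypt` models the En/De signal."""
    if encrypt:
        return affine(gf256_inv(byte))      # SubBytes: invert, then affine
    return gf256_inv(inv_affine(byte))      # InvSubBytes: inverse affine, then invert

cipher_byte = sbox_shared(0x53, encrypt=True)
plain_byte = sbox_shared(cipher_byte, encrypt=False)
```

In this model the expensive part (`gf256_inv`) appears once, which is the behavioural analogue of sharing one inverter between the encryption and decryption paths.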

Fig. 4 En/Decryption hardware architecture

3.2 Data unit

3.2.1 Implementation of SubBytes/InvSubBytes

Designing a compact S-box is one of the most critical steps in reducing the hardware resource of the AES. Generally, there are two ways of realizing the S-box, look-up table (LUT) and composite field calculation. Although implementing S-box directly as an LUT by using block RAMs can efficiently be used in FPGAs [9–11], this approach does not result in the optimal solution in terms of area in ASIC designs. In the proposed design, the cost and power are more important than speed, hence the authors employ the composite field calculation method described in Ref. [12] to reduce hardware cost.

The structure of SubBytes/InvSubBytes is described in Fig. 5, in which the effective application of composite field GF($(2^4)^2$) arithmetic is proposed and the element of GF($2^4$) is expressed as symbol B. The computation of the GF($2^8$) inversion is divided into computations in smaller sub-fields GF($2^4$). This approach can accelerate and simplify the GF($2^8$) calculation. Meanwhile, it can reduce hardware implementation complexity. According to Ref. [7], the composite field decomposition can reduce the gate count significantly. Another identical S-box is used for the Key Expansion.
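To make the sub-field idea concrete, the Python sketch below computes the GF(2^8) inverse through GF((2^4)^2): a byte is mapped to a pair of 4-bit values, inverted using only GF(2^4) operations, and mapped back. The article does not state its field polynomials or basis, so everything specific here is an assumption: the sub-field polynomial x^4 + x + 1, the extension polynomial y^2 + y + λ with λ found by search, and the basis change obtained by searching for a root of the AES polynomial. All names are hypothetical, and this is not the circuit of Fig. 5.

```python
AES_POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1 (FIPS 197)
GF16_POLY = 0x13   # x^4 + x + 1, assumed sub-field polynomial

def gf_mul(a, b, poly, bits):
    """Carry-less multiply of a and b, reduced modulo poly (degree = bits)."""
    r = 0
    for _ in range(bits):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> bits:
            a ^= poly
    return r

def gf16_mul(a, b):
    return gf_mul(a, b, GF16_POLY, 4)

def gf16_inv(a):
    """Inverse in GF(2^4) by exhaustive search (0 maps to 0)."""
    return next((b for b in range(1, 16) if gf16_mul(a, b) == 1), 0)

# Choose lambda so that y^2 + y + lambda has no root in GF(2^4) (irreducible).
LAM = next(lam for lam in range(1, 16)
           if all(gf16_mul(t, t) ^ t ^ lam for t in range(16)))

def comp_mul(a, b):
    """Multiply (ah*y + al)(bh*y + bl) using y^2 = y + LAM."""
    ah, al, bh, bl = a >> 4, a & 0xF, b >> 4, b & 0xF
    hh = gf16_mul(ah, bh)
    hi = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    lo = gf16_mul(hh, LAM) ^ gf16_mul(al, bl)
    return (hi << 4) | lo

def comp_inv(a):
    """Invert ah*y + al with a single GF(2^4) inversion of the norm d."""
    ah, al = a >> 4, a & 0xF
    d = gf16_mul(gf16_mul(ah, ah), LAM) ^ gf16_mul(ah, al) ^ gf16_mul(al, al)
    d_inv = gf16_inv(d)
    return (gf16_mul(ah, d_inv) << 4) | gf16_mul(ah ^ al, d_inv)

def find_basis():
    """Powers 1, beta, ..., beta^7 of a composite-field root of the AES polynomial."""
    for beta in range(2, 256):
        pw = [1]
        for _ in range(8):
            pw.append(comp_mul(pw[-1], beta))
        if pw[8] ^ pw[4] ^ pw[3] ^ pw[1] ^ pw[0] == 0:
            return pw[:8]
    raise ValueError("no root found")

BASIS = find_basis()
TO_COMP = [0] * 256            # GF(2^8) byte -> composite-field byte
for x in range(256):
    for i in range(8):
        if (x >> i) & 1:
            TO_COMP[x] ^= BASIS[i]
FROM_COMP = [0] * 256          # inverse basis change
for x, y in enumerate(TO_COMP):
    FROM_COMP[y] = x

def gf256_inv_composite(x):
    """GF(2^8) inverse computed entirely with 4-bit sub-field arithmetic."""
    return FROM_COMP[comp_inv(TO_COMP[x])]
```

The only inversion left is the 4-bit `gf16_inv`, which is small enough to be a handful of gates in hardware; the two table lookups stand in for the fixed XOR networks that a real design would use for the basis change.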

Fig. 5 Structure of SubBytes/InvSubBytes

3.2.2 Implementation of ShiftRows/InvShiftRows

In the proposed design, no optimization has been performed on ShiftRows/InvShiftRows. Several methods that depend on block RAM, which is suitable for FPGA devices [9], do not give the optimal solution in ASIC designs.

3.2.3 Implementation of MixColumns/InvMixColumns

In the AES algorithm, both the MixColumns and InvMixColumns are hardware demanding operations. Various architectures have been proposed for the implementations of the MixColumns and InvMixColumns transformations [6–7]. By analyzing the basic operations employed in MixColumns and InvMixColumns, it is found that the modular multiplier is the vital calculation module and that InvMixColumns is more complicated than MixColumns. To reduce design complexity, the InvMixColumns can be decomposed to share hardware resources with MixColumns. In this article, the method described in Ref. [13] is adopted, which saves area and reduces complexity.

By this approach, as shown in Fig. 6, byte-level resource sharing is realized for both parallel and serial InvMixColumns decomposition. The architecture based on the serial InvMixColumns decomposition with byte-level resource sharing is the most area-efficient solution.

Fig. 6 Structure of MixColumns/InvMixColumns

3.2.4 Implementation of AddRoundKey

The AddRoundKey transformation needs only one step of XOR operation, thus no optimization has been performed for it.

3.2.5 Implementation of Key Expansion

The Key Expansion module, as shown in Fig. 7, is entirely independent of the other four transformations except the S-box.

Fig. 7 Key Expansion module

Key Expansion can only occur in a forward order. For the decryption operation, the key must be generated in a forward order and made available to the decryption process in a reverse order. For the reverse Key Expansion, the last Round Key should be generated first, which corresponds to the first Round Key of the forward Key Expansion. The decryption requires more cycles than encryption because it needs pre-generation to generate the last key value. The process uses shift, XOR, SubBytes, and Rcon operations. In this article, the authors generate the key by a real time method because the power and cost are the primary concerns, although more clock cycles are needed.
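The reverse-order key generation can be sketched in Python as follows, assuming a 128-bit cipher key (10 rounds). `next_round_key` performs one forward step of the Key Expansion, and `prev_round_key` undoes it, so a decryptor that has pre-generated the last Round Key can recover the earlier ones one at a time without storing the whole schedule. The function names and the representation of a Round Key as four 32-bit words are assumptions for this example; the article describes the behaviour but not this code.

```python
def _gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

# Build the S-box from its definition: multiplicative inverse, then Eq. (1).
SBOX = []
for x in range(256):
    inv = next((y for y in range(1, 256) if _gmul(x, y) == 1), 0)
    s = inv
    for n in (1, 2, 3, 4):
        s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
    SBOX.append(s ^ 0x63)

RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def g(word, rnd):
    """RotWord, SubWord, then XOR with Rcon for round rnd (1..10)."""
    w = ((word << 8) | (word >> 24)) & 0xFFFFFFFF
    w = int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")
    return w ^ (RCON[rnd - 1] << 24)

def next_round_key(k, rnd):
    """Forward step: Round Key rnd from Round Key rnd-1 (four 32-bit words)."""
    k0 = k[0] ^ g(k[3], rnd)
    k1 = k[1] ^ k0
    k2 = k[2] ^ k1
    k3 = k[3] ^ k2
    return [k0, k1, k2, k3]

def prev_round_key(k, rnd):
    """Reverse step: Round Key rnd-1 recovered from Round Key rnd."""
    p3 = k[3] ^ k[2]
    p2 = k[2] ^ k[1]
    p1 = k[1] ^ k[0]
    p0 = k[0] ^ g(p3, rnd)
    return [p0, p1, p2, p3]

# Cipher key from the FIPS 197 key expansion example.
cipher_key = [0x2B7E1516, 0x28AED2A6, 0xABF71588, 0x09CF4F3C]

# Pre-generation for decryption: run forward once to reach the last Round Key.
last_key = cipher_key
for rnd in range(1, 11):
    last_key = next_round_key(last_key, rnd)

# Decryption then walks the schedule backwards, one Round Key per round.
recovered = last_key
for rnd in range(10, 0, -1):
    recovered = prev_round_key(recovered, rnd)
```

The forward loop corresponds to the extra pre-generation cycles that decryption needs, and both directions reuse the same `g` function, which is why the module needs only one additional S-box.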

3.3 Power management technique

Power consumption is one of the most critical factors in communication systems, especially for the Zigbee system. The system power consumption consists of different components and can be analyzed from different design levels [14]. In CMOS circuits, the dynamic power consumption caused by propagations and transitions of clock signals consumes a large share of the system energy [15], and clock gating is a simple and effective method to decrease dynamic power consumption.

3.3.1 Intellectual property (IP) level clock gating

An SoC system integrates many IP cores by bus interconnection. The AES coprocessor is a key IP core of the Zigbee SoC, and its power consumption constitutes a majority of the total power. Therefore, to reduce IP power, low-power methods should be employed at IP level. The authors use the feedback state signal from the FSM of the AES coprocessor to control the clock gating of the IP. As shown in Fig. 8, if the AES coprocessor is not in use, the FSM goes to the Idle state. The clock can then be turned off to decrease the power consumption of the Zigbee system.

Fig. 8 IP level clock gating

3.3.2 Sub-module (of IP) level clock gating

There are five functional sub-modules in the AES IP corresponding with the transformations of the AES algorithm, as shown in Fig. 9. Another sub-module called the power management unit (PMU) controls the working state of these five functional sub-modules. Since the authors' design is not based on the pipelined architecture, the other sub-modules do not contribute to the result while one transformation of encryption or decryption is being performed. The clocks of these idle sub-modules can therefore be shut down, and the active and inactive parts are controlled by the Enable signal generated by the PMU.

Fig. 9 Sub-modules level clock gating

4 Implementation results and comparison

The AES coprocessor is the key IP of this Zigbee SoC design. It dominates the performance and cost of the system. The register transfer level (RTL) model of this AES coprocessor is described in Verilog, simulated in ModelSim and synthesized to gate-level by the Design Compiler. The power consumption is estimated by PrimePower and the layout is designed by Astro.

By applying the techniques of integration, power management and resource sharing, the design achieves low hardware cost and low power consumption. Based on SMIC 0.18 μm standard CMOS technology, the scale of the AES coprocessor is only about 10.5 kgate under the typical condition of 1.8 V and 25 °C. The corresponding power consumption is 69.1 μW/MHz, and the maximum frequency can reach 125 MHz. For additional analysis, Table 1 shows the area and power consumption distributions of the AES core. The MixColumns/Inv and Key Expansion units are the largest functional units and together account for about 47% of the total area. The Key Expansion is the most power consuming unit because of its high switching activity. The second highest power consumption unit is the S-box.

Table 1 Area and power consumption of the AES coprocessor
Unit                          A/kgate   Share    P/(μW·MHz^-1)   Share
SubBytes/Inv, ShiftRows/Inv   1.6       15.2%    11.4            16.5%
S-box                         0.5       5.0%     14.9            21.6%
MixColumns/Inv                3.1       29.5%    14.1            20.4%
Key Expansion                 1.8       17.2%    16.5            23.8%
PMU/Control/Other             3.5       33.1%    12.2            17.7%
Total                         10.5      100%     69.1            100%

The operation speed of the AES can be determined by the data throughput, expressed by Eq. (4):

$$ T_{\mathrm{throughput}} = \frac{128\, f_{\max}}{N_{\mathrm{cycle}}} \tag{4} $$

The maximum frequency of this design is 125 MHz and the operation cycle count $N_{\mathrm{cycle}}$ is 500, hence the maximum throughput $T_{\mathrm{throughput}}$ is 128 × 125 MHz / 500 = 32 Mb/s. For comparison, the performances of several ASIC designs are illustrated in Table 2,

which includes the results of area, power, throughput and maximum frequency. The authors did not compare any ASIC implementations based on full pipelining, as they give greater throughput at the expense of larger area. Likewise, no performance comparison is made with FPGA designs, because the memories and RAMs they rely on are not suited to ASIC design.

Table 2 Comparison with other AES ASIC implementations
Parameter              Ref. [6]-1   Ref. [6]-2   Ref. [7]   Ref. [16]   Ours
Tech./μm               0.18         0.18         0.25       0.13        0.18
Scale/kgate            -            -            12         12.78       10.5
A/mm^2                 0.408        0.286        -          -           0.105
P/(mW·MHz^-1)          0.247        0.424        -          0.074       0.069
f/MHz                  48           48           100        10          125
Throughput/(Mb·s^-1)   565          570          256        -           32

The Zigbee SoC is a low-rate communication system controlled by an embedded micro processing unit (MPU), with a working frequency of 16 MHz. At this frequency, the power consumption and throughput of the AES coprocessor are about 1.1 mW (69.1 μW/MHz × 16 MHz) and 4 Mb/s (128 × 16 MHz / 500), respectively. A maximum sized Zigbee packet would be completely processed by the AES coprocessor within 0.51 ms. Compared with others, the proposed design has lower hardware cost, lower power consumption, and reasonable throughput for the Zigbee system. The layout of the AES coprocessor is shown in Fig. 10, and the logic area is about 0.5 mm^2.

Fig. 10 Layout of the AES coprocessor

5 Conclusions

This article presents an optimized ASIC implementation of the AES coprocessor for the Zigbee system. By optimizing the architectures of SubBytes/InvSubBytes and MixColumns/InvMixColumns, integrating the encryption and decryption procedures, and using the hierarchical power management strategy, the hardware cost and power consumption of the proposed AES coprocessor are reduced considerably. Based on SMIC 0.18 μm CMOS technology, the scale of the AES coprocessor is only about 10.5 kgates, and the corresponding power consumption is 69.1 μW/MHz. This IP core has been integrated in the proposed Zigbee SoC with good performance. Moreover, this IP core is also suitable for portable devices, and the power consumption could be further reduced by using a low voltage power supply.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (60676053).

References

1. ZigBee Alliance. ZigBee Specification Version 1.0. San Ramon, CA, USA: ZigBee Standards Organization, 2004
2. IEEE 802.15.4-2006. IEEE standard for information technology – Telecommunications and information exchange between systems – Local and metropolitan area networks specific requirements, Part 15.4: Wireless MAC and physical layer (PHY) specifications for low-rate wireless personal area networks (LR-WPANs). 2006
3. Federal Information Processing Standards (FIPS) 197. Specification for the advanced encryption standard (AES). 2001
4. Kosaraju N M, Varanasi M, Mohanty S P. A high-performance VLSI architecture for advanced encryption standard (AES) algorithm. Proceedings of 19th International Conference on VLSI Design, Held with the 5th International Conference on Embedded Systems and Design, Jan 3–7, 2006, Hyderabad, India. Los Alamitos, CA, USA: IEEE Computer Society, 2006: 4
5. Alam M, Ray S, Mukhopadhayay D, et al. An area optimized reconfigurable encryptor for AES-Rijndael. Proceedings of Design, Automation & Test in Europe Conference & Exhibition, Apr 16–20, 2007, Nice, France. Piscataway, NJ, USA: IEEE, 2007: 1–6
6. Huang Y J, Lin Y S, Hung K Y, et al. Efficient implementation of AES IP. Proceedings of IEEE Asia Pacific Conference on Circuits and Systems, Dec 4–7, 2006, Singapore. Piscataway, NJ, USA: IEEE, 2006: 1418–1421
7. Zhao J, Zeng X Y, Han J, et al. Very low-cost VLSI implementation of AES algorithm. Proceedings of IEEE Asian Solid-State Circuits Conference, Nov 13–15, 2006, Hangzhou, China. Piscataway, NJ, USA: IEEE, 2006: 223–226
8. Iyer N C, Anandmohan P V, Poornaiah D V, et al. High throughput, low cost, fully pipelined architecture for AES crypto chip. Proceedings of 2006 Annual IEEE India Conference, Sep 15–17, 2006, New Delhi, India. Piscataway, NJ, USA: IEEE, 2006: 1–6
9. Huang C W, Chang C J, Lin M Y, et al. Compact FPGA implementation of 32-bits AES algorithm using block RAM. Proceedings of 2007 IEEE Region 10 Conference, Oct 30–Nov 2, 2007, Taipei, China. Piscataway, NJ, USA: IEEE, 2007: 1–4
10. Li H. Efficient and flexible architecture for AES. IEE Proceedings: Circuits, Devices and Systems, 2006, 153(6): 533–538
11. Fan C P, Hwang J K. Implementations of high throughput sequential and fully pipelined AES processors on FPGA. Proceedings of International Symposium on Intelligent Signal Processing and Communication Systems, Nov 28–Dec 1, 2007, Xiamen, China. Piscataway, NJ, USA: IEEE, 2007: 353–356
12. Salomon D. Data privacy and security. Berlin, Germany: Springer, 2003
13. Fischer V, Drutarovsky M, Chodowiec P, et al. InvMixColumn decomposition and multilevel resource sharing in AES implementations. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2005, 13(8): 989–992
14. Nakajima M, Yamamoto T, Yamasaki M, et al. Low power techniques for mobile application SoCs based on integrated platform "UniPhier". Proceedings of Asia and South Design Automation Conference, Jan 23–26, 2007, San Diego, CA, USA. Piscataway, NJ, USA: IEEE, 2007: 649–653
15. Babighian P, Benini L, Macii E. A scalable algorithm for RTL insertion of gated clocks based on ODCs computation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2005, 24(1): 29–42
16. Kothari N B, Sudarshan T S B, Gurunarayanan S, et al. SOC design of a low power wireless sensor network node for Zigbee systems. Proceedings of International Conference on Advanced Computing and Communications, Dec 20–23, 2006, Surathkal, India. Piscataway, NJ, USA: IEEE, 2006: 462–466

(Editor: WANG Xu-ying)
