
Reducing Transitions on Memory Buses Using

Sector-based Encoding Technique


Yazdan Aghaghiri Farzan Fallah Massoud Pedram
University of Southern California Fujitsu Laboratories of America University of Southern California
3740 McClintock Ave 595 Lawrence Expressway 3740 McClintock Ave
Los Angeles, CA 90089 Sunnyvale, CA 94086 Los Angeles, CA 90089
yazdan@sahand.usc.edu          farzan@fla.fujitsu.com          pedram@ceng.usc.edu

Abstract
In this paper, we introduce a class of irredundant low power encoding techniques for memory address buses. The basic idea is to partition the memory space into a number of sectors. These sectors can, for example, represent address spaces for the code, heap, and stack segments of one or more application programs. Each address is first dynamically mapped to the appropriate sector and then is encoded with respect to the sector head. Each sector head is updated based on the last accessed address in that sector. The result of this sector-based encoding technique is a reduction in the number of bus transitions when encoding consecutive addresses that access different sectors. Our proposed techniques have small power and delay overhead when compared with many of the existing methods in the literature. One of our proposed techniques is very suitable for encoding addresses that are sent from an on-chip cache to the main memory when multiple application programs are executing on the processor on a time-sharing basis. For a computer system without an on-chip cache, the proposed techniques decrease the switching activity of data address and multiplexed address buses by an average of 55% and 67%, respectively. For a system with an on-chip cache, up to 55% transition reduction is achieved on a multiplexed address bus between the internal cache and the external memory. Assuming a 10 pF per-line bus capacitance, we show that a power reduction of up to 52% for an external data address bus and 42% for a multiplexed bus between cache and main memory is achieved using our methods.

Categories and Subject Descriptors: B.4.3 [Input/Output and Data Communications]: Interconnections, Interfaces.
General Terms: Algorithms, Design.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISLPED'02, August 12-14, 2002, Monterey, California, USA.
Copyright 2002 ACM 1-58113-475-4/02/0008...$5.00.

1 INTRODUCTION
With the rapid increase in the complexity and speed of integrated circuits and the popularity of portable embedded systems, power consumption has become a critical design criterion. In today's processors, a large number of I/O pins are dedicated to interfacing the processor core to the external memory through high-speed address and data buses. Compared to a general-purpose high-performance processor, an embedded processor has far fewer transistors integrated on the chip. Therefore, the amount of energy dissipated at the I/O pins of an embedded processor is significant when contrasted with the total power consumption of the processor. It is desirable to encode the values sent over these buses to decrease the switching activity and thereby reduce the bus power consumption. An encoder on the sender side does this encoding, whereas a decoder on the receiver side is required to restore the original values. For this approach to be effective, the power consumed by the encoder and the decoder has to be much less than the power saved as a result of activity reduction on the bus. Furthermore, there should be little or no delay penalty. These constraints, which are imposed on the encoder/decoder logic, limit the space of possible encoding solutions. Although numerous encoding techniques for instruction address buses have been reported ([2], [3], [4], [5], [7], [9], [10], [11], etc.), there are not as many encoding methods for data address or multiplexed address buses ([6], [9]).(1) In the case of instruction address bus encoding, high temporal correlation between consecutive addresses is exploited to decrease the number of transitions on the bus. Although sequentiality is interrupted when control flow instructions come to execution, it is still possible to encode the addresses effectively because the offset (arithmetic difference) between consecutive addresses is typically a small integer value [1]. Unfortunately, there is much less correlation between consecutive data addresses, and the offsets are usually much larger. Therefore, reducing the transitions on a data address bus in the course of bus encoding is a much more difficult task. In multiplexed address buses, compared to data address buses, there is more correlation between addresses because of the presence of instruction addresses; thus, more reduction in activity can potentially be obtained when compared to data addresses. However, the presence of two different address streams (i.e., instruction and data addresses) with different characteristics makes the encoding complex.

(1) A multiplexed address bus refers to a bus that is used for sending both instruction and data addresses.

In this paper we introduce low overhead encoding methods targeting data address and multiplexed address buses. Our methods are irredundant, meaning that they do not require any additional line to be added to the bus. This feature makes it possible to adopt our techniques in an existing system without making any changes to the chip pinouts and the designed printed circuit board. It will be seen that the second group of our encoders has very low overhead in terms of power consumption and delay; no time-consuming operation such as addition is used in them.

The rest of this paper is organized as follows. In Section 2 the related works are described. Section 3 provides the insight and a top-level view of the proposed sector-based encoding techniques. Our encoding techniques are presented in Section 4. Section 5 presents experimental results of utilizing our techniques to encode address buses and the number of gates and power consumption of our encoders. Conclusions and some future work are discussed in Section 6.

2 PREVIOUS WORK
Musoll et al. proposed the working zone method in [6]. Their method takes advantage of the fact that data accesses tend to remain in a small set of working zones. For the addresses that lie in each of these zones a relatively high degree of locality is observed. Each working zone requires a dedicated register that is used to keep track of the accesses in that zone.

When a new address arrives, the offset of this address is calculated with respect to all zone registers. The address is, thus, mapped to the working zone with the smallest offset. If the offset is sufficiently small, one-hot encoding is performed and the result is sent on the bus using transition signaling (by transition signaling we mean that instead of sending the code itself we XOR it with the previous value of the bus). Otherwise, the address itself is sent over the bus. The working zone method uses one extra line to show whether encoding has been done or the original value has been sent. It also uses additional lines to identify the working zone that was used to compute the offset. Based on this information, the decoder on the other side of the bus can uniquely decode the address.

The working zone method also has the ability to detect a stride in any of the working zones. A stride is a constant offset that occurs repeatedly between multiple consecutive addresses and can be used to completely eliminate the switching activity for those addresses. For instruction addresses, the stride is the difference between the addresses of consecutive instructions. Stride is very important when instruction address encoding is tackled. In fact, the large number of sequential instructions with constant stride is responsible for the considerable transition savings usually seen in instruction address encoding techniques. For data addresses, a stride can occur when, for example, a program is accessing elements of an array in memory. Apart from some special cases, detecting and utilizing strides has a very small impact on decreasing the switching activity of data addresses.

The working zone method has a large area and power dissipation overhead due to the complexity of the decoder and encoder logic. In addition, it is ineffective for data address buses. This is largely due to the fact that offsets on a data address bus are often not small enough to be mapped to one-hot codes; in such a case the original address is sent over the bus, which usually causes many transitions on the bus. Another encoding method that can be used for data addresses is the bus-invert method [7]. Bus-invert selects between the original and the inverted pattern in a way that minimizes the switching activity on the bus. The resulting patterns together with an extra bit (to notify whether the address or its complement has been sent) are transition signaled over the bus. This technique is quite effective for reducing the number of 1's in addresses with random behavior, but it is ineffective when addresses exhibit some degree of locality. To make the bus-invert method more effective, the bus can be partitioned into a handful of bit-level groups and bus-invert can be applied separately to each of these groups. However, this scheme will increase the number of surplus bits required for the encoding, which is undesirable.

In [8], Mamidipaka et al. proposed an encoding technique based on the notion of self-organizing lists. They use a list to create a one-to-one mapping between addresses and codes. The list is reorganized in every clock cycle to map the most frequently used addresses to codes with fewer ones. For multiplexed address buses, they used a combination of their method and Increment-XOR [9]. In Increment-XOR, which is proven to be quite effective on instruction address buses, each address is XORed with the summation of the previous address and the stride; the result is then transition signaled over the bus. Obviously, when consecutive addresses grow by the stride, no transitions will happen on the bus. The size of the list in this method has a big impact on the performance. To achieve satisfactory results, it is necessary to use a long list. However, the large hardware overhead associated with maintaining long lists makes this technique quite expensive. Furthermore, the encoder and the decoder hardware are practically complex and their power consumption appears to be quite large.

Ramprasad et al. proposed a coding framework for low-power address and data buses in [9]. Although they have introduced remarkable methods for encoding instruction addresses, their framework does not introduce effective techniques for data address and multiplexed address buses.

3 OVERVIEW
In this paper we propose three sector-based encoding techniques. All these techniques partition the address space into disjoint sectors. Each address is encoded based on the sector in which it is located. Usually the addresses in the same sector have a tendency to be close to each other; this means that if we encode each address with respect to the previous address accessed in the same sector, spatial locality enables us to develop an encoding technique that results in only a small number of transitions on the bus.

To better explain this, consider two cases. In the first case, a trace of addresses, which are scattered all over the address space, is sent over a bus without any encoding. Because these addresses are dispersed, it is more likely that they have larger Hamming distances in their binary representations. In the second case, we partition the address space into two sectors so that the original trace is divided into two sub-traces based on this sectorization. In each sector, the addresses are closer to each other. If we sum up the inter-pattern transitions of these two sub-traces, this summation will be less than the total transition count for the original trace. In practice, addresses are not partitioned into two sub-traces; rather, it is the function of the encoding technique to realize this "virtual separation" of addresses in the trace. This last statement reveals the key insight for the proposed sector-based encoding techniques.

Let's consider the data addresses for a memory system without a cache. Each data access generated by the CPU can be either for accessing a data value in a stack, which is used for storing return addresses and local variables, or in a heap, which is used to hold the global data and dynamically allocated variables. The stack may reside in some memory segment, e.g., in the upper half of the memory, whereas the heap may reside in another memory segment, e.g., in the lower half of the memory. Let H and S denote heap and stack accesses, respectively. By an H->S access, we mean address bus transitions that occur when the stack is accessed after a heap access. S->H, S->S and H->H are defined similarly. The number of bit transitions caused by H->S and S->H accesses is often higher than those for the S->S and H->H accesses. This is because the heap and stack sectors are usually placed very far from one another in the memory address space. Per our detailed simulations on benchmark programs, if we apply the Offset-XOR encoding technique [9] to a data address bus, S->H and H->S accesses will be responsible for almost 73% of the overall bit transitions. Now suppose we break the trace into two parts: one includes accesses to the stack, whereas the other includes the accesses to the heap. If we apply the Offset-XOR encoding to each of these two traces separately and add the total transitions of each trace, then up to 61% reduction in the switching activity will be achieved with respect to the undivided trace.

A key advantage of the encoding techniques presented in this work is that they do not require any redundant bits. Obviously, in the codeword some bits are dedicated to conveying information about the sector that has been used as a reference for encoding. The remaining bits are used for encoding the offset, i.e., the difference between the new address and the previous address accessed in that sector. The value of the last access in the sector is kept in a special register called a sector head. Among our proposed techniques, the first two are only suitable when addresses are accessed in two separate sectors. The first method is very general. The second method is not as general as the first one, but its implementation is much simpler and its encoder's delay is smaller. The last method is an extension of the second method, which maintains its logic simplicity and speed, yet it can support an arbitrary number of sectors at the expense of a marginal hardware overhead. The problem of how to partition the address space into disjoint sectors so that addresses are evenly distributed over these sectors is a critical one. As will be explained shortly, in the first method the trace partitioning is dynamically changed so that the encoding method can precisely track addresses in up to two sectors. However, in the second and third methods, the partitioning is done statically. Obviously, a careless partitioning can cause large chunks of addresses to lie in a single sector and a consequent degradation in the performance of the encoding. We will show how this scenario can be prevented by a novel sectorization of the address space.
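The "virtual separation" argument above can be made concrete with a small script. The following is our own toy illustration, not the paper's experiment: the trace, the 16-bit width, and the two sectors are invented values, and Offset-XOR [9] is modeled in its simplest form.

```python
# Toy illustration of the Section 3 insight: per-sector Offset-XOR streams
# flip far fewer bus wires than the raw interleaved trace.
# All addresses and the 16-bit width are made-up demo values.

WIDTH = 16
MASK = (1 << WIDTH) - 1

def transitions(words):
    """Total bit flips (Hamming distance) between consecutive bus values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def offset_xor(trace):
    """Offset-XOR: XOR each address's offset from its predecessor onto the
    previous bus value (transition signaling)."""
    bus, prev, out = 0, 0, []
    for a in trace:
        bus ^= (a - prev) & MASK
        out.append(bus)
        prev = a
    return out

heap = [0x0100, 0x0104, 0x0108, 0x0110]   # one sector of nearby accesses
far  = [0xFF00, 0xFF04, 0xFF08, 0xFF10]   # a distant second sector
trace = [x for pair in zip(heap, far) for x in pair]  # interleaved accesses

raw   = transitions(trace)                # unencoded bus: many flips per access
split = transitions(offset_xor(heap)) + transitions(offset_xor(far))
print(raw, split)                         # the per-sector streams flip far fewer bits
```

Summing the transitions of the two separately encoded sub-traces mirrors the comparison the paper makes for heap/stack splitting; the interleaved raw trace pays the full Hamming distance between the distant regions on every alternation.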

4 ENCODING TECHNIQUES

4.1 Dynamic-Sector Encoder
Our first technique, named DS, stands for Dynamic-Sector encoding. The DS encoder partitions the address space into two sectors; thus, it has two different sector heads. To encode an address, its offset is computed with respect to both sector heads. The closer sector head is chosen for encoding the address. The sector heads are dynamically updated: after the codeword is computed based on the sector head that is closer to the sourceword, that sector head is updated with the value of the sourceword, i.e., one of the sector heads always tracks the addresses. A detailed explanation is provided next.

In the sequel, X and Y are assumed to be N-bit 2's-complement integers. The binary digits of X are represented by X1 to XN, where XN is the MSB.

Definition 1. LSB-Inv(X) is defined as:
  if (X >= 0)
    LSB-Inv(X) = X
  else
    LSB-Inv(X) = X XOR (2^(N-1) - 1)

Definition 2. Given two N-bit integers X and Y, the distance of X from Y is defined as follows:
  dist(X,Y) = {R}N-1
  sign(X,Y) = RN
where R = LSB-Inv(X - Y). Note that dist is an (N-1)-bit integer. The notation {R}N-1 denotes casting R to (N-1) bits by suppressing its MSB.

Definition 3. Given three N-bit integers A, B and X, we say X is closer to A when dist(X,A) is smaller than dist(X,B).

Lemma 1. As X sweeps the N-bit space, half of the time X is closer to A and half of the time it is closer to B. If X is closer to A, X + 2^(N-1) will be closer to B, and vice versa.

Suppose all N-bit integers are put on the periphery of a circle in such a way that 2^N - 1 and 0 are next to each other. For any two integers X and Y, the length of the shortest arc between them is equal to dist(X,Y) as defined above. The direction of this arc, either clockwise or not, is shown by sign(X,Y). Based on this construction, one can easily verify Lemma 1.

Definition 4. Given two arbitrary integers A and B in the N-bit space, we define C(X,A;B) as follows:
  S = Min{dist(X,A), dist(X,B)}   // S is an (N-1)-bit integer.
  if (dist(X,A) < dist(X,B))
    M = sign(X,A)
  else
    M = sign(X,B)                 // M is a single bit.
  if (SN-1 == 1)
    C(X,A;B) = NOT (M || {S}N-2)  // || is the concatenation operator.
  else
    C(X,A;B) = M || {S}N-2        // C(X,A;B) is an (N-1)-bit integer.

Lemma 2. As X sweeps the N-bit space, C(X,A;B) will sweep the (N-1)-bit space. Each integer in this space is covered exactly twice: once when X is closer to A and a second time when X is closer to B.

Using Lemma 2 we explain the way the DS encoder works. We call the two sector heads SH1 and SH2. First, C(X,SH1;SH2) is calculated. This is an (N-1)-bit integer. We use the MSB bit to send the sector information, i.e., the sector whose head was closer to the address and was used for encoding. For example, 0 can be used for SH1 and 1 for SH2. Let's call this bit the Sector-ID. Therefore, the DS encoder is defined as follows:

// DS Encoder
Codeword = (Sector-ID) || C(X,SH1;SH2)
Update the value of the SH that is closer to X with X

This code is transition signaled over the bus (i.e., it is XORed with the previous value of the bus). Lemma 2 guarantees that for any arbitrary values of the sector heads, the N-bit address is mapped to an N-bit integer in a one-to-one manner. As a result, it is possible to uniquely decode the numbers on the receiver side.

The LSB-Inv function used in the DS code is intended to reduce the number of 1's in the generated code, since this code will be transition signaled on the bus and the number of 1's will determine the number of transitions on the bus. Note that this function is applied to 2's-complement numbers to reduce the number of 1's in small negative numbers. When applied to large negative numbers, the number of 1's is increased. In practice and on average, the LSB-Inv function is quite effective since offsets in each sector tend to be small numbers.

To obtain a better understanding of how the DS encoder works, let's ignore the function of the LSB-Inv operator. Subsequently, C(X,A;B) becomes equal to a function that calculates the offset of X with respect to either A or B, whichever is closer, and then deletes its MSB. This bit deletion is necessary because one bit of the codeword is used to send the Sector-ID; therefore, only the (N-1) remaining bits can be used for the offset. Using (N-1) bits, each sector head covers 2^(N-1) numbers (we consider a circular address space, i.e., 2^N = 0). Half of the covered numbers are greater than the sector head and the other half are smaller (see Figure 1). Note that some addresses are covered twice, while some are not covered at all. We call the first set of addresses S1 and the second S2. The size of S1 is equal to the size of S2. Moreover, by adding 2^(N-1) or -2^(N-1) to S1, it can be mapped to S2. The addresses in S1 are covered by both SH1 and SH2, but they are encoded with respect to the closer sector head only. This means that for each address in S1, one code is wasted. These wasted codes can be used to encode the addresses in S2. This is done by mapping S2 to S1 and encoding the numbers with respect to the sector head which is not closer. This makes DS a one-to-one mapping.

Figure 1 - Address space, two sector heads and their coverage sets.
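Definitions 1-4 and the DS mapping can be modeled in a few lines. The sketch below is our own Python illustration, not code from the paper; N = 3 and the sector heads 001 and 011 match the worked example of Table 1, and the script checks Lemma 2's one-to-one property by enumerating the whole space.

```python
# Illustrative model of Definitions 1-4 and the DS mapping (Section 4.1).
# N and the sector-head values are arbitrary demo choices.

N = 3
MASK = (1 << N) - 1

def lsb_inv(x):
    """Definition 1: invert the low N-1 bits of a negative 2's-complement x."""
    if x & (1 << (N - 1)):                 # MSB set: negative number
        return x ^ ((1 << (N - 1)) - 1)
    return x

def dist_sign(x, y):
    """Definition 2: R = LSB-Inv(x - y); dist = low N-1 bits, sign = MSB of R."""
    r = lsb_inv((x - y) & MASK)
    return r & ((1 << (N - 1)) - 1), r >> (N - 1)

def c_code(x, a, b):
    """Definition 4: the (N-1)-bit code of x relative to the closer of a, b."""
    da, sa = dist_sign(x, a)
    db, sb = dist_sign(x, b)
    s = min(da, db)
    m = sa if da < db else sb
    code = (m << (N - 2)) | (s & ((1 << (N - 2)) - 1))   # M || {S}N-2
    if s >> (N - 2):                       # MSB of S set: complement the code
        code ^= (1 << (N - 1)) - 1
    return code

def ds_codeword(x, sh1, sh2):
    """DS encoder: Sector-ID bit || C(X, SH1; SH2)."""
    da, _ = dist_sign(x, sh1)
    db, _ = dist_sign(x, sh2)
    sector_id = 0 if da < db else 1
    return (sector_id << (N - 1)) | c_code(x, sh1, sh2)

sh1, sh2 = 0b001, 0b011                    # the sector heads of Table 1
codes = [ds_codeword(x, sh1, sh2) for x in range(1 << N)]
print(sorted(codes))                       # a permutation of 0..7: one-to-one
```

Enumerating all eight sourcewords reproduces the codeword column of Table 1 (e.g., 000 maps to 010 and 110 maps to 001), confirming that the N-bit space maps one-to-one onto itself as Lemma 2 promises.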

On the receiver side, the sector is directly determined based on the MSB bit. Then, by using the value of the corresponding sector head on the receiver side (the sector heads on the sender and receiver sides are synchronized) and the remaining (N-1) bits of the codeword, the sourceword X is computed. After that, it is determined whether the computed X is actually closer to the sector head that has been used for decoding. If true, the sourceword has been correctly calculated; otherwise, a value of 2^(N-1) should be added to X to produce the correct sourceword.

// DS Decoder
// Received codeword after transition signaling is Z
U = LSB-Inv( 0 || {Z}N-1 )
if (ZN == 0)
  X = SH1 + U
  if (dist(X,SH2) < dist(X,SH1))
    X += 2^(N-1)
else
  X = SH2 + U
  if (dist(X,SH1) < dist(X,SH2))
    X += 2^(N-1)
if (dist(X,SH1) < dist(X,SH2))
  SH1 = X
else
  SH2 = X

Table 1 shows an example of using DS to encode a three-bit address space. The first column denotes the original addresses (sourcewords). The two bold numbers in this column show the sector heads. The second and the third columns provide sign(X,SH) and dist(X,SH) with respect to the two sector heads. The fourth column shows the SH that has been used in the calculation of C(X,SH1;SH2). The fifth column shows C(X,SH1;SH2). The last column shows the codewords. The MSB of the codewords shows the Sector-ID: 0 for the addresses that are encoded with respect to SH1 and 1 for those encoded with respect to SH2.

Table 1 - An example of DS mapping, for a three-bit address space and sector heads equal to 001 and 011.

  X   | sign,dist (SH1) | sign,dist (SH2) | SH used | C(X,SH1;SH2) | Codeword
  000 | 1,00            | 1,10            | 001     | 10           | 010
  001 | 0,00            | 1,01            | 001     | 00           | 000
  010 | 0,01            | 1,00            | 011     | 10           | 110
  011 | 0,10            | 0,00            | 011     | 00           | 100
  100 | 0,11            | 0,01            | 011     | 01           | 101
  101 | 1,11            | 0,10            | 011     | 11           | 111
  110 | 1,10            | 0,11            | 001     | 01           | 001
  111 | 1,01            | 1,11            | 001     | 11           | 011

4.2 Fixed-Sector Encoders
In this section we take a look at another set of sector-based encoding techniques that utilize a fixed partitioning of the address space. In each of the sectors there is a sector head that is used for encoding the addresses that lie in that sector. These techniques, which are referred to as FS, are not as general as DS, in the sense that sometimes even if consecutive addresses are far from one another, they may end up being in the same sector. Subsequently, they are encoded with respect to the same sector head and the value of the encoding totally fades away. However, the FS techniques have two major advantages over DS. The first one is the simplicity of the decoder and encoder and their negligible delay overhead for the memory system, and the second one is the extensibility of these methods. DS cannot be easily extended to support four sectors. If it is somehow extended, the encoder/decoder will be too complex and costly (in terms of area, delay and power overheads) to be used for low power bus encoding schemes. In contrast, as will be seen, FS can be easily extended to support an arbitrary number of sectors. This is attractive when, for example, the target bus is the bus between the cache and the outside memory chip. Over that bus, the addresses of instructions and data blocks of multiple applications are sent to main memory, which will make up a trace of addresses utterly scattered over the address space. A sector-based encoder needs more than two sectors to be of use for such a bus. Therefore, the importance of the FS encoding techniques is realized.

4.2.1 Fixed-Two-Sector Encoder
In the Fixed-Two-Sector (FTS) encoding, the address space is partitioned into two sectors. The sectors are simply the lower half and the upper half of the address space. There is one sector head for each of the sectors. Each sector head consists of (N-1) bits (as the MSB is known by default). The MSB of the address, or sourceword, determines the sector head to be used for encoding. In addition, this MSB will be equal to the MSB of the codeword. The remaining bits are XORed with the sector head to generate the codeword. As long as the address trace is such that distant addresses lie in different sectors and, within the sectors, the addresses show some degree of locality, this technique helps reduce the transitions.

The FTS encoder works as follows:

// FTS encoder
if (XN == 1)
  Codeword = 1 || (SH2 XOR {X}N-1)
  SH2 = {X}N-1
else
  Codeword = 0 || (SH1 XOR {X}N-1)
  SH1 = {X}N-1

The codeword is transition signaled over the bus. SH1 and SH2 are (N-1)-bit numbers and they belong to the lower half and the upper half of the memory map, respectively. Therefore, the MSB of the codeword in the above equation will always be equal to the MSB of X. The simplicity of FTS comes from the fact that, unlike DS, no subtraction and comparison operations are required to determine the sector head to be used. This also simplifies the decoder.

4.2.2 Fixed-Multiple-Sector Encoder
In Fixed-Multiple-Sector (FMS) encoding the address space is partitioned into multiple sectors. The number of allowed sectors is a power of 2.

Consider FTS: if all addresses lie in the lower half of the memory, then FTS encoding degenerates to XORing addresses with the bus, which clearly leads to poor performance. FMS avoids this problem by using two techniques. The first one is increasing the number of sectors. This helps to reduce the probability of having distant addresses in the same sector. The second, and more important, is that FMS uses a segment-based method to partition the address space, which further helps to prevent the above problem. This method is described next.

Suppose the address space is divided into 2^M sectors. If the same approach as FTS is used, the M most significant bits of the sourceword are needed to define the sectors. These bits will be the same for the sourceword and the codeword. The remaining bits in the sourceword are then XORed with the corresponding sector head to compose the codeword. However, the increased number of sectors may not be enough to evenly distribute the addresses over the sectors. Consider the main memory of a system with an internal cache. When compared to the whole address space, the main memory can be so small that it may totally reside in one of the 2^M sectors. For this reason, we propose a new technique for partitioning the address space. Now, instead of using the MSB bits, some of the center bits in the addresses are used as Sector-ID bits. Implicitly, this changes the sectors from a large contiguous section of the address space to smaller disjoint (dispersed) sections. We call each of these subsections a segment of the sector, and this type of partitioning dispersed sectorization.
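The FTS pseudocode is simple enough to model end-to-end. The sketch below is our own illustration (the 8-bit width and the address trace are arbitrary choices, not values from the paper): it transition-signals each codeword onto the bus and shows that a receiver with synchronized sector heads recovers the original addresses.

```python
# Toy model of the FTS scheme of Section 4.2.1. 'bus' holds the
# transition-signaled wire values; tx_sh/rx_sh are the sender's and
# receiver's sector heads, which stay synchronized.

N = 8
LOW = (1 << (N - 1)) - 1                   # mask for the N-1 offset bits

def fts_encode(x, sh):
    """sh = [SH1, SH2]; the MSB of x picks the head, low bits are XORed."""
    s = x >> (N - 1)                       # sector = MSB of the sourceword
    code = (s << (N - 1)) | (sh[s] ^ (x & LOW))
    sh[s] = x & LOW                        # sector head tracks the last access
    return code

def fts_decode(code, sh):
    """Inverse mapping on the receiver side."""
    s = code >> (N - 1)
    x = (s << (N - 1)) | (sh[s] ^ (code & LOW))
    sh[s] = x & LOW
    return x

tx_sh, rx_sh = [0, 0], [0, 0]
bus, prev = [], 0
for addr in [0x10, 0x14, 0xF0, 0x12, 0xF4]:   # low- and high-half accesses
    prev ^= fts_encode(addr, tx_sh)           # transition signaling
    bus.append(prev)

prev, decoded = 0, []
for word in bus:
    decoded.append(fts_decode(word ^ prev, rx_sh))  # undo transition signaling
    prev = word
print([hex(x) for x in decoded])
```

Note that no arithmetic is needed anywhere in the encode path, only XORs and a multiplexer-style head selection, which is the simplicity argument made above for FTS over DS.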

193
the addresses between any two arbitrary numbers in the address space 5 EXPERMINETAL RESULTS
depends on the value of those boundary numbers and number of To evaluate our encoding techniques, we simulated SPEC2000
sectors. However, in dispersed sectorization, the size of the segments benchmark programs [13] using the simplescalar simulator [ 121. The
will also be a decisive factor to determine the number of sectors results are based on averaging over six programs named vpr, parser,
between two different addresses. Even if a subsection of the address equake, vortex, gcc and art. We generated three different kinds of
space is small, as long as that space includes a segment from each of address traces. These traces represent different memory configurations.
the sectors, addresses that lie in that space can fall in any of the 2M The first two traces were generated for a memory system without an
sectors. on-chip cache and are traces of data and multiplexed addresses,
respectively. A data address trace includes all data accesses and
assumes that data and instruction buses are separate. A multiplexed
address trace includes all instruction and data addresses. The third set
of traces was generated for a system with two levels of internal caches
and a memory management unit that translates second level cache
misses into physical addresses. The second level cache is a unified
cache; therefore, addresses that miss this cache are either instruction or
data addresses requesting for data and instruction blocks.
We have compared our proposed techniques with the Working Zone
method with two registers or briefly WZE-2. We first show a detailed
comparison of our techniques and WZE-2 when applied over the data
address traces. After that we present the final results of comparison of
our techniques and WZE-2 for all traces.
Figure 2 - Comparison of contiguous versus dispersed sectorization.

In Table 2, the detailed results are shown for the data address traces (no cache). For each trace, we show the original number of transitions (Base) and the number of transitions suppressed after applying the different techniques. We also show the percentage reduction in the total number of transitions for each set of traces and each encoding technique.

Table 2 - Total suppressed transitions (in millions) and percentage savings for traces of data addresses (without cache).

Suppose there are 2^M sectors in FMS. Each sector head is an address bounded to one of the sectors. Consequently, M bits of each sector head are constant and known; therefore, we only need (N-M) bits to store each sector head. However, to make the following pseudo-code easier to understand, we assume that sector heads are also N-bit numbers in which the bits at the Sector-ID positions are all zeros. We implicitly know the sector to which each sector head belongs. The Sector-ID bits in the sourceword are used to select the correct sector head for codeword calculation, and they have to be copied to the codeword exactly as they are; when these bits are XORed with the corresponding zeros in the sector head, they do not change.
// FMS encoder
// 2^M sectors, 2^M sector heads: SH[1]...SH[2^M]
// Sector-ID bits of address X: X[l+M]...X[l+1] (an M-bit number)
Codeword = X XOR SH[X[l+M]...X[l+1]]
Update SH[X[l+M]...X[l+1]] with X and make its Sector-ID bits zero.
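As a concrete illustration, here is a minimal Python sketch of the FMS encoder/decoder pair described by this pseudo-code. The parameters (32-bit addresses, M = 2 Sector-ID bits starting at bit position l = 20) are illustrative choices, not values taken from the paper:

```python
M, L = 2, 20                             # illustrative: 2^M sectors, ID bits at l+1..l+M
ID_MASK = ((1 << M) - 1) << L            # mask selecting the Sector-ID bits

class FMS:
    """One endpoint of the FMS code; the encoder and decoder each keep
    an identical copy of the sector-head registers."""
    def __init__(self):
        self.sh = [0] * (1 << M)         # sector heads, Sector-ID bits held at zero

    def _sector(self, word):
        return (word & ID_MASK) >> L     # the M-bit Sector-ID

    def encode(self, x):
        s = self._sector(x)
        code = x ^ self.sh[s]            # XOR with the selected head; the Sector-ID
                                         # bits pass through unchanged (head bits are 0)
        self.sh[s] = x & ~ID_MASK        # update the head with x, zeroing Sector-ID bits
        return code

    def decode(self, code):
        s = self._sector(code)           # Sector-ID bits survived encoding intact
        x = code ^ self.sh[s]
        self.sh[s] = x & ~ID_MASK        # mirror the encoder's update
        return x
```

Because the decoder applies the same head update, the two sides stay in lockstep: decode(encode(x)) always returns x, so the code is irredundant and needs no extra bus lines.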

A basic question to ask is, "Which bits do we use for the Sector-ID?" The
number of bits defines the number and size of sectors. The location of
bits defines the number and size of segments. In the sequel, we
consider a bus between an internal cache and an external memory. We
determine a range for the Sector-ID bits. As long as the Sector-ID bits
are within that range, the reduction in switching activity will be almost
the same.
We assume that the Sector-ID bits are M contiguous bits in the address. Shifting the position of the Sector-ID bits to the right makes the segments smaller. A main memory usually occupies a small portion of the address space; the segments should be small enough that one segment of each sector falls into the space occupied by the memory. On the other hand, the Sector-ID bits should be shifted to the left to make each segment at least as large as one physical page in the memory paging system. Although consecutive pages in virtual addressing can be far from each other when they are translated to physical memory addresses, all the addresses that are in the same physical page will be very close. Suppose that multiple programs are executed. All cache misses cause requests to the external memory. Whenever a program is switched out and a new program is executed, many second-level misses happen that read the code and data of the new program from consecutive blocks in physical pages. The dispersed sectorization scheme should work in a fashion that puts all of these addresses in the same sector. As long as the Sector-ID bits satisfy the two aforementioned constraints, good performance will be achieved.

The same procedure has been repeated for the two other sets of traces, and the results are shown in Table 3. The numbers in parentheses in the FMS column show the number of Sector-ID bits that have been used. As one can see, for data addresses and multiplexed addresses our techniques outperform WZE-2. For the multiplexed address bus with a cache, FMS performs significantly better than the other techniques.

Table 3 - Average transition saving for different techniques (columns: WZ-2, DS, FTS, FMS; first row: Data Address).

Figures 3 and 4 depict the encoders for DS and FMS, respectively. The encoder for FTS is not shown because of its similarity to the FMS encoder. The signals have been tagged with the bits they carry.
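The two constraints on the Sector-ID position discussed above (each segment at least one physical page, and one segment of every sector fitting into physical memory) pin down an admissible range for the position l of the lowest Sector-ID bit. A hedged sketch of that range computation; the formula is our reading of the two constraints, and the page and memory sizes are illustrative, not values from the paper:

```python
import math

def sector_id_bit_range(page_bytes, mem_bytes, m):
    """Admissible positions l for the lowest of the m Sector-ID bits.
    Lower bound: segment size 2^l must be at least one physical page.
    Upper bound: one segment per sector must fit in memory, i.e.
    2^(l+m) <= mem_bytes."""
    lo = int(math.log2(page_bytes))
    hi = int(math.log2(mem_bytes)) - m
    return lo, hi

# e.g. 4 KB pages, 256 MB of physical memory, M = 2 Sector-ID bits
lo, hi = sector_id_bit_range(4096, 256 * 2**20, 2)   # l may range over 12..26
```

Any contiguous placement of the Sector-ID bits inside this window should, per the text, yield roughly the same reduction in switching activity.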

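The bus-power figures in the following comparison can be sanity-checked with the standard dynamic-power model: each bus transition charges or discharges one line, dissipating 0.5·C·V² of energy. The line capacitance, I/O voltage, and clock frequency below are the conditions stated in this section; the transition rate is an illustrative free parameter, not a measured value:

```python
def bus_power_watts(c_line_f, vdd_v, freq_hz, transitions_per_cycle):
    """Average dynamic power of a bus: 0.5 * C * V^2 per transition,
    scaled by the transition rate and the clock frequency."""
    return 0.5 * c_line_f * vdd_v**2 * freq_hz * transitions_per_cycle

# Stated conditions (10 pF/line, 3.3 V I/O, 50 MHz), with an assumed
# 8 transitions per cycle on a 32-bit bus:
p = bus_power_watts(10e-12, 3.3, 50e6, 8.0)   # about 21.8 mW
```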
In the encoder figures, the tag 32,30+1, for example, represents bit 32 and bits 30 to 1. To better compare the overhead of these techniques, we compared the power consumption of the encoders. For this purpose, all three encoders and decoders were designed and their netlists were generated in the Berkeley Logic Interchange Format (BLIF). The netlists were optimized using the SIS script.rugged and mapped to a 1.5-volt 0.18µm CMOS library using the SIS technology mapper. The I/O voltage was assumed to be 3.3 volts. The address traces were fed into a gate-level simulation program called sim-power to estimate the power consumption of the encoders and decoders. The clock frequency was 50 MHz. We also calculated the power dissipated on the bus in the absence of any encoding, assuming a 10 pF capacitance per bus line. Different bus configurations were assumed for the evaluation of the different encoding techniques. For DS and FTS we assumed a data address bus without any internal cache. For FMS we experimented on the multiplexed bus between the cache and main memory; FMS was the most efficient technique for this bus. The results are shown in Table 4. The Reduced Bus Power column shows the bus power after encoding, and the last column shows the percentage power saving after accounting for the extra overhead of the encoder and decoder for each technique.

Given that the Working Zone encoder needs several subtractors for calculating offsets with respect to the zone registers, several comparators for choosing the zone register with the smallest offset, a subtractor and several registers and comparators for detecting the stride, and a special table for encoding the offset to a one-hot code, its overhead will be much higher than that of our sector-based encoders.

Table 4 - Percentage power saving for different techniques (columns: Original Bus Power, Encoder Power, Reduced Bus Power, Power Saving).

In terms of delay and area, FMS produces the best results. It consists of only four levels of logic, whereas the encoding techniques that require adding addresses or incrementing them ([2], [3], [6], etc.) need more than ten levels of logic for a 32-bit bus. The following table shows the number of gates and the area required for each of the sector-based encoders.

Table 5 - Comparison of the encoder hardware for the proposed techniques

        Number of gates   Area (×1000)
  DS    505               488.7
  FTS   326               305
  FMS   313               282.7

Figure 3 - DS Encoder

6 CONCLUSION
In this paper, we proposed a new approach to bus encoding based on sectorization of the address space. The sectorization can be either dynamic or fixed. We compared the different approaches in terms of power, speed, and extensibility. For the multiple fixed-sector method, we introduced a technique that partitions the sectors evenly. We also showed that, using our methods, up to 52% power reduction for an external data address bus and 42% reduction for a multiplexed bus between an internal cache and an external memory can be achieved.

Figure 4 - FMS Encoder

7 REFERENCES
[1] D. Patterson, J. Hennessy, "Computer Architecture: A Quantitative Approach", second edition, 1996.
[2] L. Benini, G. De Micheli, E. Macii, D. Sciuto, C. Silvano, "Asymptotic Zero-Transition Activity Encoding for Address Buses in Low-Power Microprocessor-Based Systems," IEEE 7th Great Lakes Symposium on VLSI, Urbana, IL, pp. 77-82, Mar. 1997.
[3] W. Fornaciari, M. Polentarutti, D. Sciuto, and C. Silvano, "Power Optimization of System-Level Address Buses Based on Software Profiling," CODES, pp. 29-33, 2000.
[4] L. Benini, G. De Micheli, E. Macii, M. Poncino, and S. Quer, "System-Level Power Optimization of Special Purpose Applications: The Beach Solution," International Symposium on Low Power Electronics and Design, pp. 24-29, Aug. 1997.
[5] P. Panda, N. Dutt, "Reducing Address Bus Transitions for Low Power Memory Mapping," European Design and Test Conference, pp. 63-67, March 1996.
[6] E. Musoll, T. Lang, and J. Cortadella, "Exploiting the Locality of Memory References to Reduce the Address Bus Energy," International Symposium on Low Power Electronics and Design, pp. 202-207, Monterey, CA, August 1997.
[7] M. R. Stan, W. P. Burleson, "Bus-Invert Coding for Low Power I/O," IEEE Transactions on Very Large Scale Integration Systems, Vol. 3, No. 1, pp. 49-58, March 1995.
[8] M. Mamidipaka, D. Hirschberg, N. Dutt, "Low Power Address Encoding Using Self-Organizing Lists," International Symposium on Low Power Electronics and Design, Aug. 2001.
[9] S. Ramprasad, N. Shanbhag, I. Hajj, "A Coding Framework for Low Power Address and Data Busses," IEEE Transactions on Very Large Scale Integration Systems, Vol. 7, pp. 212-221, 1999.
[10] Y. Aghaghiri, F. Fallah, M. Pedram, "Irredundant Address Bus Encoding for Low Power," International Symposium on Low Power Electronics and Design, pp. 182-187, Aug. 2001.
[11] L. Macchiarulo, E. Macii, M. Poncino, "Low-Energy Encoding for Deep-Submicron Address Buses," International Symposium on Low Power Electronics and Design, pp. 176-181, Aug. 2001.
[12] www.simplescalar.org
[13] www.spec.org