Lecture 02 Hardware
Hardware
Motivation
[Figure: design flow: application knowledge; 2: specification; 3: ES-hardware; 4: system software (RTOS, middleware, …); 5: evaluation & validation (energy, cost, performance, …); 6: application mapping; 7: optimization; 8: test; design repository; design]
TU Dortmund
cyber-physical systems
Heating
Lights
Engine control
Power supply
…
Robots
Heating: www.masonsplumbing.co.uk/images/heating.jpg
Robot: Courtesy and ©: H. Ulbrich, F. Pfeiffer, TU München
Sensors
Based on charge transfer to next pixel cell
Based on standard production process for CMOS chips; allows integration with other components.
© P. Marwedel, 2010
Artificial eyes
© Dobelle Institute
(was at www.dobelle.com)
© Dobelle Institute
Movie
Other sensors
Pressure sensors
Proximity sensors
Signals
Discretization
Discretization of time
$s: D_T \rightarrow D_V$
Sample-and-hold circuits
Sample-and-hold circuits
e(t) is a mapping $\mathbb{R} \rightarrow \mathbb{R}$
h(t) is a sequence of values or a mapping $\mathbb{Z} \rightarrow \mathbb{R}$
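To make the difference between e(t) and h(t) concrete, here is a minimal Python sketch of an idealized sample-and-hold stage. It evaluates a continuous-time signal at integer multiples of a sampling period and returns the resulting sequence. The function name `sample_and_hold`, the parameter `p_s` and the example sine input are hypothetical; settling time and droop of a real circuit are ignored.

```python
import math
from typing import Callable, List

def sample_and_hold(e: Callable[[float], float], p_s: float, n: int) -> List[float]:
    """Idealized sample-and-hold: h[i] = e(i * p_s) for i = 0 .. n-1.

    e maps real time to real values (R -> R); the result is indexed by
    integers, i.e. a mapping Z -> R restricted to 0 .. n-1.
    """
    return [e(i * p_s) for i in range(n)]

# Hypothetical input signal: a sine wave with a period of 8 time units.
def e(t: float) -> float:
    return math.sin(2 * math.pi * t / 8)

h = sample_and_hold(e, p_s=1.0, n=8)
```

Between two sampling instants the held value stays constant, so only the list h is passed on to the A/D converter.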
Approximation of a square wave (1)
Target: square wave with period $p_1 = 4$
$e'_K(t) = \sum_{k=1,3,5,\ldots}^{K} \frac{4}{k\pi} \sin\left(\frac{2\pi t}{p_k}\right)$
[Plots: approximations for K=1 and K=3]
Approximation of a square wave (2)
$e'_K(t) = \sum_{k=1,3,5,\ldots}^{K} \frac{4}{k\pi} \sin\left(\frac{2\pi t}{4/k}\right)$
[Plots: approximations for K=5 and K=7]
Approximation of a square wave (3)
$e'_K(t) = \sum_{k=1,3,5,\ldots}^{K} \frac{4}{k\pi} \sin\left(\frac{2\pi t}{4/k}\right)$
[Plots: approximations for K=9 and K=11]
Applet at © http://www.jhu.edu/~signals/fourier2/index.html
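The partial sums can be evaluated directly. This Python sketch assumes the series $e'_K(t) = \sum_{k=1,3,5,\ldots}^{K} \frac{4}{k\pi}\sin\left(\frac{2\pi t}{p_k}\right)$ with $p_k = p_1/k$ and $p_1 = 4$, i.e. a square wave of amplitude 1; the function name `square_wave_approx` is hypothetical.

```python
import math

def square_wave_approx(t: float, K: int, p1: float = 4.0) -> float:
    """Partial Fourier sum e'_K(t) of a square wave with period p1.

    Sums the odd harmonics k = 1, 3, 5, ..., K with amplitude 4/(k*pi)
    and period p_k = p1 / k.
    """
    total = 0.0
    for k in range(1, K + 1, 2):          # odd harmonics only
        p_k = p1 / k
        total += 4 / (k * math.pi) * math.sin(2 * math.pi * t / p_k)
    return total

# Value at t = 1 (middle of the positive half-period) for growing K
for K in (1, 3, 5, 7, 9, 11):
    print(K, round(square_wave_approx(1.0, K), 3))
```

With growing K the value at t = 1 oscillates around 1 and converges towards it, matching the plots for K = 1 to 11.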
Linear transformations
Aliasing
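$e_3(t) = \sin\left(\frac{2\pi t}{8}\right) + 0.5 \sin\left(\frac{2\pi t}{4}\right)$
$e_4(t) = \sin\left(\frac{2\pi t}{8}\right) + 0.5 \sin\left(\frac{2\pi t}{4}\right) + 0.5 \sin\left(\frac{2\pi t}{1}\right)$
Periods of 8, 4, 1
Indistinguishable if sampled at integer times, $p_s = 1$
MATLAB demo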
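A short Python check of the claim, assuming the two signals $e_3$ and $e_4$ with periods 8, 4 (and 1 for the extra term in $e_4$) and plus signs between the terms: at integer times the period-1 component is $\sin(2\pi n) = 0$, so the sampled sequences coincide, although the signals differ in between.

```python
import math

def e3(t: float) -> float:
    return math.sin(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 4)

def e4(t: float) -> float:
    # e3 plus a component whose period equals the sampling period p_s = 1
    return e3(t) + 0.5 * math.sin(2 * math.pi * t / 1)

# Sampled at integer times the extra component is always sin(2*pi*n) = 0
max_diff = max(abs(e3(n) - e4(n)) for n in range(100))
```

`max_diff` is only floating-point noise, while e.g. at t = 0.25 the two signals differ by 0.5.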
Aliasing (2)
Anti-aliasing filter
[Figure: e(t) is filtered into g(t) before sampling; characteristics of an ideal filter and a realizable filter, frequency axis marked at fs/2 and fs]
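Why the filter must remove everything above fs/2 can be seen from the frequency a sinusoid appears to have after sampling. This Python sketch assumes ideal sampling at rate fs and folds a frequency f into the band [0, fs/2]; the name `alias_frequency` is hypothetical.

```python
def alias_frequency(f: float, fs: float) -> float:
    """Apparent frequency of a sinusoid of frequency f after sampling at fs.

    Frequencies are folded into the band [0, fs/2]; only inputs with
    f <= fs/2 (Nyquist criterion) keep their true frequency.
    """
    return abs(f - fs * round(f / fs))
```

For the $e_4$ example above, the period-1 component (f = 1) sampled with fs = 1 appears at frequency 0, which is why it vanishes from the samples.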
http://en.wikipedia.org/wiki/Image:Moire_pattern_of_bricks_small.jpg
Examples of Aliasing in computer graphics (2)
[Figure panels: original (pdf screen copy); filtered & sub-sampled; sub-sampled, no filtering; impact of rasterization]
http://www.niirs10.com/Resources/Reference Documents/Accuracy in Digital Image Processing.pdf
$s: D_T \rightarrow D_V$
* Encodes the position of the most significant ‘1’ in the input (counting from 1 at the least significant bit) as an unsigned number, e.g.
“1111” -> “100”,
“0111” -> “011”,
“0011” -> “010”,
“0001” -> “001”,
“0000” -> “000”
(Priority encoder).
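The encoder table can be reproduced with a few lines of Python. This is a behavioural sketch only (a hardware priority encoder is combinational logic); the function name `priority_encode` and the string representation of bit vectors are assumptions for illustration.

```python
def priority_encode(bits: str, width: int) -> str:
    """Return the position of the most significant '1' in bits as a
    binary string of `width` bits (position 1 = least significant bit).
    An all-zero input encodes as 0.
    """
    n = len(bits)
    for i, b in enumerate(bits):          # bits[0] is the most significant input
        if b == "1":
            return format(n - i, f"0{width}b")
    return format(0, f"0{width}b")
```

Lower-order ‘1’s are ignored, e.g. “0101” also encodes as “011”; for the thermometer codes produced by a flash converter this never matters.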
[Figure: output codes “00”, “01”, “10”, “11” as a function of the input h(t), with thresholds at Vref/4, Vref/2 and 3Vref/4, up to Vref]
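A behavioural Python sketch of an idealized n-bit flash converter, assuming 2^n − 1 comparators with thresholds at multiples of Vref/2^n (Vref/4, Vref/2, 3Vref/4 for n = 2) and comparators that output 1 when h ≥ threshold. The name `flash_adc` is hypothetical. Because the comparator outputs form a thermometer code, the priority encoder output equals the number of comparators that fire.

```python
def flash_adc(h: float, v_ref: float, n_bits: int) -> str:
    levels = 2 ** n_bits
    # One comparator per threshold v_ref * j / levels, j = 1 .. levels-1;
    # together their outputs form a thermometer code.
    thermometer = [h >= v_ref * j / levels for j in range(1, levels)]
    # For a thermometer code, the priority encoder result equals the
    # number of comparators that fire.
    code = sum(thermometer)
    return format(code, f"0{n_bits}b")
```

All comparators work in parallel, which makes the converter fast; the cost is 2^n − 1 comparators for n bits.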
Resolution: $Q = \frac{V_{FSR}}{n}$ with $V_{FSR}$: full-scale voltage range, $n$: number of voltage intervals
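A worked example in Python, assuming that an M-bit converter provides $n = 2^M$ voltage intervals (as in the 2-bit flash example with four codes). The function name `resolution` is hypothetical.

```python
def resolution(v_fsr: float, n: int) -> float:
    """Resolution Q = V_FSR / n for n voltage intervals."""
    return v_fsr / n

q_2bit = resolution(1.0, 2 ** 2)   # 2-bit converter over 1 V: 4 intervals
q_8bit = resolution(5.0, 2 ** 8)   # 8-bit converter over 5 V: 256 intervals
```

An 8-bit converter over a 5 V range thus resolves steps of 5/256 V, about 19.5 mV.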
Higher resolution:
Successive approximation
[Figure: input h(t) is compared with the D/A output w(t); over time t the trial codes 1000, 1100, 1010, 1011 approach h(t)]
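The bit-by-bit search can be sketched in Python, assuming an ideal D/A converter with output w = Vref · code / 2^n and a comparator that keeps a bit if w ≤ h. The function name `sar_adc` and the example values are hypothetical.

```python
def sar_adc(h: float, v_ref: float, n_bits: int):
    """Successive approximation: returns (final code, list of trial codes)."""
    code = 0
    trials = []
    for bit in reversed(range(n_bits)):      # most significant bit first
        trial = code | (1 << bit)            # tentatively set this bit
        trials.append(format(trial, f"0{n_bits}b"))
        w = v_ref * trial / 2 ** n_bits      # ideal D/A output w(t)
        if h >= w:                           # comparator: keep bit if w <= h
            code = trial
    return code, trials

code, trials = sar_adc(11.5, v_ref=16.0, n_bits=4)
```

With these example values the trial sequence is 1000, 1100 (too high, bit cleared), 1010, 1011, i.e. n comparisons for n bits instead of 2^n − 1 comparators.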
(used in multimeters)
(using single-bit D/A converters; common in high-quality audio equipment)
[http://www.beis.de/Elektronik/DeltaSigma/DeltaSigma.html]
(Pipelined flash converters)
Quantization Noise
[Figure: input h(t), quantized signal w(t) and the difference w(t)-h(t), assuming “rounding” (truncating) towards 0]
Quantization Noise
[Figure: h(t), w(t) and the difference h(t)-w(t), assuming “rounding” (truncating) towards 0]
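A Python sketch of a quantizer that truncates towards 0 to multiples of the resolution Q, together with the resulting noise w − h. The function names are hypothetical. For positive inputs the noise lies in (−Q, 0], for negative inputs in [0, Q).

```python
import math

def quantize_truncate(h: float, q: float) -> float:
    """Quantize h to a multiple of q, truncating ("rounding") towards 0."""
    return math.trunc(h / q) * q

def quantization_noise(h: float, q: float) -> float:
    """Noise w - h introduced by the quantizer."""
    return quantize_truncate(h, q) - h
```

Truncation gives noise with a non-zero mean; rounding to the nearest level would centre it around 0 with magnitude at most Q/2.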
MATLAB demo
Summary
Hardware in a loop
Sensors
Discretization
• Definition of signals
• Sample-and-hold circuits
- Aliasing (and how to avoid it)
- Nyquist criterion
• A/D-converters
- Flash-based
- Successive approximation
- Quantization noise
Hardware
- Processing -
Embedded System
cyber-physical systems
Processing units
Importance of Energy Efficiency
[Figure: “inherent power efficiency of silicon” (© Hugo De Man)]
$E = \int P \, dt$
[Figure: power P over time t; the energies E and E' are the areas under two power curves]
In many cases, faster execution also means less energy,
but the opposite may be true if power has to be increased
to allow faster execution.
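The trade-off can be checked numerically with $E = \int P \, dt$. This Python sketch approximates the integral with the trapezoidal rule over power samples taken every dt seconds; the two runs (1 W for 2 s versus 3 W for 1 s) are hypothetical numbers chosen to show the case where faster execution costs more energy.

```python
def energy(power_samples, dt):
    """Approximate E = integral of P dt with the trapezoidal rule.

    power_samples: power values in W, taken every dt seconds.
    Returns the energy in J.
    """
    return sum((p0 + p1) / 2 * dt
               for p0, p1 in zip(power_samples, power_samples[1:]))

# Hypothetical run A: 1 W for 2 s.  Run B: faster (1 s) but needs 3 W.
e_slow = energy([1.0, 1.0, 1.0], dt=1.0)
e_fast = energy([3.0, 3.0], dt=1.0)
```

Run B finishes in half the time but uses 3 J instead of 2 J.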
[Figure: power density of processors compared with a nuclear reactor; Prescott: 90 W/cm², 90 nm [c't 4/2004]; © Intel, M. Pollack, Micro-32]
http://www.phys.ncku.edu.tw/~htsu/humor/fry_egg.html
[O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell
Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source.
Trend towards implementation in Software
HW synthesis not covered in this course.
[http://www.molecularimprints.com/Technology/tech_articles/MII_COO_NIST_2001.PDF]
1. Energy/power efficiency
[Figure: processor power states with transitions triggered by interrupts and a power fault signal; transition times 10 µs, 90 µs and 160 ms; IDLE: 50 mW, SLEEP: 160 µW]
SLEEP: Shutdown of on-chip activity
OS should schedule distribution of the energy budget.
Basic equations
Power: $P \sim V_{DD}^2$,
Maximum clock frequency: $f \sim V_{DD}$,
Energy to run a program: $E = P \times t$, with $t$ = runtime (fixed)
Time to run a program: $t \sim 1/f$
Changes due to parallel processing, with $\alpha$ operations per clock:
Clock frequency reduced to: $f' = f / \alpha$,
Voltage can be reduced to: $V_{DD}' = V_{DD} / \alpha$,
Power for parallel processing: $P^{\circ} = P / \alpha^2$ per operation,
Power for $\alpha$ operations per clock: $P' = \alpha \times P^{\circ} = P / \alpha$,
Time to run a program is still: $t' = t$,
Energy required to run program: $E' = P' \times t = E / \alpha$
Rough approximations!
Argument in favour of voltage scaling, VLIW processors, and multi-cores
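The chain of rough approximations can be followed step by step in Python. This sketch simply applies the equations above for a given α, relative to a sequential reference with power P and runtime t; the function name `parallel_scaling` is hypothetical.

```python
def parallel_scaling(alpha: float, P: float = 1.0, t: float = 1.0):
    """Apply the rough scaling rules for alpha operations per clock.

    Returns (f_ratio, vdd_ratio, P_new, t_new, E_new) relative to the
    sequential case with power P and runtime t.
    """
    f_ratio = 1 / alpha              # f' = f / alpha
    vdd_ratio = 1 / alpha            # VDD' = VDD / alpha, since f ~ VDD
    p_per_op = P * vdd_ratio ** 2    # P° = P / alpha^2, since P ~ VDD^2
    P_new = alpha * p_per_op         # P' = alpha * P° = P / alpha
    t_new = t                        # t' = t (alpha ops per clock at f / alpha)
    E_new = P_new * t_new            # E' = P' * t = E / alpha
    return f_ratio, vdd_ratio, P_new, t_new, E_new
```

For α = 2 the model predicts half the energy at unchanged runtime; real voltage scaling is limited by threshold voltages, so these numbers are an upper bound on the benefit.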
Code-size efficiency
Compression techniques (continued):
• 2nd instruction set, e.g. ARM Thumb instruction set:
16-bit Thumb instr. for ADD Rd #constant:
001 (major opcode) | 10 (minor opcode) | Rd (destination = source) | Constant (zero extended)
Dynamically decoded at run-time
[Á. Beszédes et al.: Survey of Code-Size Reduction Methods, ACM Computing Surveys, Vol. 35, Sept. 2003, pp. 223-267]
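A Python sketch of the bit layout shown above: bits 15-13 = 001 (major opcode), bits 12-11 = 10 (minor opcode for ADD), bits 10-8 = Rd, bits 7-0 = constant. The decoder mimics the run-time expansion into an equivalent 32-bit ARM instruction with the constant zero extended. The function names are hypothetical, and the flag-setting `ADDS` form reflects the behaviour of this Thumb-1 instruction.

```python
def encode_thumb_add_imm(rd: int, constant: int) -> int:
    """Encode the 16-bit Thumb instruction ADD Rd, #constant."""
    if not (0 <= rd < 8 and 0 <= constant < 256):
        raise ValueError("Rd must be r0-r7 and the constant must fit in 8 bits")
    return (0b001 << 13) | (0b10 << 11) | (rd << 8) | constant

def decode_thumb_add_imm(instr: int):
    """Expand ADD Rd, #constant into its 32-bit ARM equivalent."""
    if (instr >> 11) != 0b00110:              # major opcode 001, minor opcode 10
        raise ValueError("not a Thumb ADD Rd, #constant instruction")
    rd = (instr >> 8) & 0b111                 # destination = source register
    constant = instr & 0xFF                   # zero extended to 32 bits
    return rd, constant, f"ADDS r{rd}, r{rd}, #{constant}"
```

The 16-bit form saves code size because the source register is implied (it equals Rd) and only 8 bits of constant are stored.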
http://www-perso.iro.umontreal.ca/~latendre/codeCompression/codeCompression/node1.html
http://www.iro.umontreal.ca/~latendre/compactBib/
ADSP 2100
Heterogeneous registers
Example (ADSP 210x):
[Figure: data path with program (P) and data (D) buses; ALU (+, -, ..) with registers AX, AY, AF, AR; multiplier (*) with registers MX, MY, MF, MR; address generation unit (AGU, +, -) with address registers A0, A1, A2, ..]
Different functionality of registers An, AX, AY, AF, MX, MY, MF, MR
Example (ADSP 210x):
Data memory can only be fetched with the address contained in A, but this can be done in parallel with an operation in the main data path (takes effectively 0 time).
A := A ± 1 also takes 0 time, same for A := A ± M;
A := <immediate in instruction> requires an extra instruction
Minimize load immediates
Optimization covered in optimization chapter
Modulo addressing
sliding window
Modulo addressing: Am++, i.e. Am := (Am+1) mod n
(implements ring or circular buffer in memory)
[Figure: sliding window over signal w; memory holds the n most recent values. At t = t1 it contains .., w[t1-1], w[t1], w[t1-n+1], w[t1-n+2], ..; at t2 = t1+1 the oldest value w[t1-n+1] has been overwritten by w[t1+1]]
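A Python model of the ring buffer: a fixed array of n cells and an address register that is incremented modulo n, so each new value overwrites the oldest one without shifting data. The class name `ModuloBuffer` is hypothetical, and the hardware performs the modulo update of Am as a side effect of the memory access.

```python
class ModuloBuffer:
    """Ring buffer holding the n most recent values of a signal w."""

    def __init__(self, n: int):
        self.n = n
        self.memory = [0] * n
        self.am = 0                        # address register Am

    def push(self, value):
        self.memory[self.am] = value       # overwrite the oldest value
        self.am = (self.am + 1) % self.n   # Am++ implemented as (Am + 1) mod n

    def window(self):
        """Return the n most recent values, oldest first."""
        return [self.memory[(self.am + i) % self.n] for i in range(self.n)]

buf = ModuloBuffer(4)
for t in range(7):
    buf.push(t)                            # hypothetical signal: w[t] = t
```

After seven values the memory holds [4, 5, 6, 3] and the window is [3, 4, 5, 6], i.e. the sliding window moved without copying any data.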
Saturating arithmetic
Example
MATLAB demo
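A Python comparison of saturating and ordinary (wrap-around) two's complement addition. The function names and the default 16-bit signed word are assumptions for illustration; DSPs implement the clamping in the adder itself.

```python
def sat_add(a: int, b: int, bits: int = 16, signed: bool = True) -> int:
    """Addition that clamps to the largest/smallest representable value."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, a + b))

def wrap_add(a: int, b: int, bits: int = 16, signed: bool = True) -> int:
    """Ordinary two's complement addition that wraps around on overflow."""
    mask = (1 << bits) - 1
    r = (a + b) & mask
    if signed and r >= (1 << (bits - 1)):
        r -= 1 << bits
    return r
```

For 32767 + 1 the wrapping adder returns −32768 (a jump from maximum to minimum), whereas the saturating adder stays at 32767, a much smaller error for signal processing.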
Fixed-point arithmetic
Shifting required after multiplications and divisions in order to maintain the binary point.
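A Python sketch of this shifting in the Q15 format (15 fractional bits, a common but here assumed choice). The product of two Q15 numbers has 30 fractional bits and is shifted right by 15; for division the dividend is shifted left first. All names are hypothetical, and Python's `>>` and `//` round towards minus infinity.

```python
Q = 15  # number of fractional bits (Q15 format), assumed for this sketch

def to_fixed(x: float) -> int:
    return int(round(x * (1 << Q)))

def to_float(a: int) -> float:
    return a / (1 << Q)

def fixed_mul(a: int, b: int) -> int:
    """The raw product has 2*Q fractional bits: shift right by Q."""
    return (a * b) >> Q

def fixed_div(a: int, b: int) -> int:
    """Shift the dividend left by Q so the quotient keeps Q fractional bits."""
    return (a << Q) // b
```

Without the shift, 0.5 × 0.5 would come out as 268435456 instead of 8192, i.e. with the binary point in the wrong place.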
Real-time capability
Multimedia-Instructions/Processors
4 additions per instruction; carry disabled at word boundaries.
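The effect of disabling the carry at word boundaries can be modelled in Python: a 64-bit value is treated as four 16-bit lanes and each lane is added modulo 2^16. The function name `packed_add16` is hypothetical; hardware does this in one adder by cutting the carry chain.

```python
def packed_add16(a: int, b: int) -> int:
    """Add four 16-bit words packed into a 64-bit value.

    Each lane is added modulo 2**16: a carry out of one word is discarded
    instead of propagating into the next word.
    """
    result = 0
    for lane in range(4):
        shift = 16 * lane
        x = (a >> shift) & 0xFFFF
        y = (b >> shift) & 0xFFFF
        result |= ((x + y) & 0xFFFF) << shift
    return result

a = 0x0001_FFFF_0002_0003
b = 0x0001_0001_0002_0003
```

Here the packed result is 0x0002_0000_0004_0006, whereas a plain 64-bit addition would let the carry from 0xFFFF + 0x0001 corrupt the top word (0x0003_0000_0004_0006).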
Application: scaled interpolation between two images
Next word = next pixel, same color.
4 pixels processed at a time.

    pxor      mm7,mm7       ;clear register mm7
    movq      mm3,fade_val  ;load scaling value
    movd      mm0,imageA    ;load 4 red pixels for A
    movd      mm1,imageB    ;load 4 red pixels for B
    punpcklbw mm1,mm7       ;unpack, bytes to words
    punpcklbw mm0,mm7       ;upper bytes from mm7 (zero)
    psubw     mm0,mm1       ;subtract pixel values
    pmulhw    mm0,mm3       ;scale
    paddw     mm0,mm1       ;add to image B
    packuswb  mm0,mm7       ;pack, words to bytes
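A scalar Python emulation of what the MMX sequence computes per pixel: B + ((A − B) · fade_val) >> 16, followed by unsigned saturation to 0..255. The function name `fade` is hypothetical. It assumes fade_val lies in 0..32767 (pmulhw multiplies signed 16-bit words and keeps the upper half of the 32-bit product) and that the 16-bit intermediate results do not wrap, which holds for 8-bit pixels.

```python
def fade(image_a, image_b, fade_val):
    """Emulate the MMX fade for 4 red pixels (bytes 0..255)."""
    result = []
    for a, b in zip(image_a, image_b):
        diff = a - b                              # psubw
        scaled = (diff * fade_val) >> 16          # pmulhw: upper 16 bits of product
        value = scaled + b                        # paddw
        result.append(max(0, min(255, value)))    # packuswb: unsigned saturation
    return result
```

With fade_val = 16384 (a weight of 0.25) the pixels A = [200, 0, 255, 10] and B = [100, 100, 0, 10] give [125, 75, 63, 10]; the MMX code obtains all four results with one instruction per step.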
Summary
Hardware in a loop
Sensors
Discretization
Information processing
• Importance of energy efficiency
• Special purpose HW very expensive
• Energy efficiency of processors
• Code size efficiency
• Run-time efficiency
• MPSoCs
• Reconfigurable Hardware
D/A converters
Actuators
Embedded System
[Figure: instructions A to G, each tagged with one bit (0 1 1 0 1 1 0) that groups the instructions for parallel execution]
[Figure: instruction bundle 1 and bundle 2]
410M transistors
374 mm² die size
6 MB on-die L3 cache
1.5 GHz at 1.3 V
[ftp://download.intel.com/design/itanium2/download/madison_slides_r1.pdf]
© Intel, 2003
Philips TriMedia processor
For multimedia applications, up to 5 instructions/cycle.
http://www.nxp.com/acrobat/datasheets/PNX15XX_SER_N_3.pdf
(incompatible with Firefox?)
© NXP
Predicated execution:
Implementing IF-statements “branch-free”
Predicated execution:
Implementing IF-statements “branch-free”: TI C6x
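Predicated execution itself is a hardware feature (an instruction is executed but only commits its result if its predicate is true). As a software analogue of the same "compute both, then select" idea, this Python sketch replaces `x = a if cond else b` by mask arithmetic without a conditional branch. The function names are hypothetical, and this masking technique is a stand-in, not the TI C6x mechanism.

```python
def select_branch_free(cond: bool, a: int, b: int) -> int:
    """Return a if cond else b without a conditional branch.

    mask is all ones (-1) when cond is true and 0 otherwise, so both
    operands are available and the mask picks one of them.
    """
    mask = -int(cond)
    return (a & mask) | (b & ~mask)

def max_branch_free(x: int, y: int) -> int:
    return select_branch_free(x > y, x, y)
```

As with predication, both alternatives are evaluated, which pays off when branches are expensive (e.g. in deep VLIW pipelines) and the alternatives are short.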
Microcontrollers
- MHS 80C51 as an example -
http://www.mpsoc-forum.org/2007/slides/Hattori.pdf
Multiprocessor systems-on-a-chip
(MPSoCs) (2)
http://www.mpsoc-forum.org/2007/slides/Hattori.pdf
Multiprocessor systems-on-a-chip
(MPSoCs) (3)
http://www.mpsoc-forum.org/2007/slides/Hattori.pdf
[Figure: “inherent power efficiency of silicon” (© Hugo De Man, IMEC, Philips, 2007)]
Reconfigurable Logic
Memories typically used as look-up tables to implement any Boolean function of 6 variables.
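A Python model of a 6-input look-up table: the truth table of the function is stored as 64 bits of memory, and evaluating the function is a single read at the address formed by the six inputs. The function names `make_lut` and `lut_eval` are hypothetical; in the FPGA the table contents come from the configuration bitstream.

```python
from typing import Callable

def make_lut(f: Callable[..., int]) -> int:
    """Tabulate a Boolean function f(x0, ..., x5) into a 64-bit memory word.

    Bit i of the table holds f for the inputs whose binary address is i
    (x0 = least significant address bit).
    """
    table = 0
    for i in range(64):
        bits = [(i >> k) & 1 for k in range(6)]   # x0 .. x5
        if f(*bits):
            table |= 1 << i
    return table

def lut_eval(table: int, x0, x1, x2, x3, x4, x5) -> int:
    """Evaluate the function with one memory read at address x5..x0."""
    address = x0 | x1 << 1 | x2 << 2 | x3 << 3 | x4 << 4 | x5 << 5
    return (table >> address) & 1

parity = make_lut(lambda *x: sum(x) % 2)          # example: 6-input XOR
```

Any other function of 6 variables only needs a different 64-bit table; the evaluation path stays the same, which is what makes LUT-based logic reconfigurable.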
Virtex 5 SliceM
Resources available in Virtex 5 devices
Hierarchical Routing Resources;
no routing plan found for Virtex 5.
Memory
Memory
Memories?
Oops!
Memories!
[Figure: energy of a cache ($)-less monoprocessor, split into processor energy and main memory energy; one share labeled 71%]
[M. Verma, P. Marwedel: Advanced Memory Optimization Techniques for Low-Power Embedded Processors, Springer, 2007]
[Figure: energy breakdown of the StrongARM; D-cache: 16%]
[O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell
Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source.
[Figure: “Memory wall” problem: CPU 2x every 2 years, DRAM 1.07 p.a.; x-axis: years 0 to 5]
[P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]
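The growth rates from the figure can be turned into a gap estimate. This Python sketch assumes both curves start at 1, CPU performance doubling every 2 years and DRAM improving by a factor of 1.07 per year; the function name `performance_gap` is hypothetical.

```python
def performance_gap(years: float, cpu_factor_per_2y: float = 2.0,
                    dram_factor_pa: float = 1.07) -> float:
    """Ratio of CPU to DRAM performance after `years`, both starting at 1."""
    cpu = cpu_factor_per_2y ** (years / 2)
    dram = dram_factor_pa ** years
    return cpu / dram
```

After 5 years the ratio is about 4: the processor waits relatively longer for every memory access, which motivates caches and scratch pad memories.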
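[Figure: set-associative cache with |Set| = 2: the address is split into tag and index; the index selects a set, two comparators (=) check the stored tags, and the matching way delivers the data]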
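A tag-only Python model of a 2-way set-associative cache. The address is split into tag and index (the offset inside a line is dropped), the index selects a set, and the tag is compared against both stored tags. The class name and the least-recently-used replacement policy are assumptions for this sketch.

```python
class TwoWayCache:
    """2-way set-associative cache model (tags only, no data)."""

    def __init__(self, n_sets: int, line_size: int):
        self.n_sets = n_sets
        self.line_size = line_size
        # each set holds up to 2 tags, least recently used first
        self.sets = [[] for _ in range(n_sets)]

    def split(self, address: int):
        """Split an address into (tag, index)."""
        block = address // self.line_size
        return block // self.n_sets, block % self.n_sets

    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss (the line is then loaded)."""
        tag, index = self.split(address)
        ways = self.sets[index]
        if tag in ways:              # compare against both stored tags
            ways.remove(tag)
            ways.append(tag)         # mark as most recently used
            return True
        if len(ways) == 2:
            ways.pop(0)              # evict the least recently used way
        ways.append(tag)
        return False

cache = TwoWayCache(n_sets=4, line_size=16)
hits = [cache.access(a) for a in [0, 4, 64, 0, 128, 0]]
```

Addresses 0, 64 and 128 map to the same set; with two ways, two of them can be cached at the same time, so the second access to 0 still hits after 64 was loaded.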
Hierarchical memories
using scratch pad memories (SPM)
[Figure: current (mA) for four configurations: program and data in main memory; program in main memory, data in SPM; program in SPM, data in main memory; program and data in SPM]
[Figure: energy per access [nJ] vs. memory size (256 to 16384) for a scratch pad and for 2-way caches with 4 GB, 16 MB and 1 MB address space] [R. Banakar, S. Steinke, B.-S. Lee, 2001]
Summary
Processing
• VLIW/EPIC processors
• MPSoCs
FPGAs
Memories
• “Small is beautiful”
(in terms of energy consumption, access times, size)
Communication
cyber-physical systems
Communication
- Requirements -
Real-time behavior
Efficient, economical
(e.g. centralized power supply)
Appropriate bandwidth and communication delay
Robustness
Fault tolerance
Diagnosability
Maintainability
Security
Safety
Basic techniques:
Electrical robustness
Single-ended vs. differential signals
[Figure: single-ended vs. differential signal transmission, relative to ground]
Evaluation
Advantages:
Subtraction removes most of the noise
Changes of voltage levels have no effect
Reduced importance of ground wiring
Higher speed
Disadvantages:
Requires negative voltages
Increased number of wires and connectors
Applications:
USB, FireWire, ISDN
Ethernet (STP/UTP CAT 5/6 cables)
differential SCSI
High-quality analog audio signals (XLR) © wikipedia
Real-time behavior
Carrier-sense multiple-access/collision-detection (CSMA/CD, standard Ethernet): no guaranteed response time.
Alternatives:
token rings, token busses
Carrier-sense multiple-access/collision-avoidance
(CSMA/CA)
• WLAN techniques with request preceding transmission
• Each partner gets an ID (priority). After each bus transfer,
all partners try setting their ID on the bus; partners
detecting higher ID disconnect themselves from the bus.
Highest priority partner gets guaranteed response time;
others only if they are given a chance.
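The ID-based arbitration can be sketched bit by bit in Python. Assumptions: IDs are unique, sent most significant bit first on a wired-OR bus (a 1 overrides a 0), so that, as described above, the partner with the highest ID wins. CAN uses the same principle with a dominant 0, where the lowest ID wins. The function name `arbitrate` is hypothetical.

```python
def arbitrate(ids, width=8):
    """Bitwise arbitration: return the ID that keeps the bus."""
    contenders = set(ids)
    for bit in reversed(range(width)):        # most significant bit first
        sent = {i: (i >> bit) & 1 for i in contenders}
        bus = max(sent.values())              # wired-OR: 1 wins if anyone sends 1
        # partners that sent 0 but see 1 detect a higher ID and disconnect
        contenders = {i for i in contenders if sent[i] == bus}
    (winner,) = contenders
    return winner
```

Arbitration finishes after `width` bit times regardless of the number of partners, which is what gives the highest-priority partner its guaranteed response time.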
http://www.ece.cmu.edu/~koopman/jtdma/jtdma.html#classical
FlexRay
TDMA in FlexRay
http://www.tzm.de/FlexRay/FlexRay_Introduction.html
Bandwidth used only when it is actually needed.
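A Python sketch of the static TDMA principle: time is divided into fixed slots that repeat every cycle, and each slot belongs to one node, so the bus owner at any time is known in advance. The schedule, slot length and function names are hypothetical; FlexRay additionally has a dynamic segment in which bandwidth is only used when needed.

```python
def slot_owner(t: float, schedule, slot_len: float):
    """Return the node that owns the bus at time t in a static TDMA cycle."""
    slot = int(t // slot_len) % len(schedule)
    return schedule[slot]

def worst_case_wait(schedule, slot_len: float) -> float:
    """A node that just missed its slot waits at most one full cycle."""
    return len(schedule) * slot_len
```

For the schedule ["A", "B", "C"] with 2 time units per slot, every node can send at least once every 6 time units, independent of the traffic of the others.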
e.g. so-called “babbling idiots”
http://www.ixxat.de/index.php?seite=introduction_flexray_en&root=5873&system_id=5875&com=formular_suche_treff
Communication:
Hierarchy
Sensor/actuator busses
Other busses
D/A-Converters
cyber-physical systems
$\sum_k i_k = 0$
Count current flowing away from node as negative. [Jewett and Serway, 2007].
Example:
The principle of conservation of energy implies that the sum of the potential differences (voltages) across all elements around any closed circuit must be zero [Jewett and Serway, 2007].
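Loop rule: $x_0 \cdot I_0 \cdot 8R + V_- - V_{ref} = 0$, with $V_- \approx 0$ at the op-amp input
$I_0 = x_0 \cdot \frac{V_{ref}}{8R}$
In general: $I_i = x_i \cdot \frac{V_{ref}}{2^{3-i} R}$
Junction rule: $I = \sum_i I_i$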
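Hence: $I = \sum_{i=0}^{3} I_i = \frac{V_{ref}}{8R} \sum_{i=0}^{3} x_i 2^i = \frac{V_{ref}}{8R} \cdot nat(x)$
$y = -V_{ref} \cdot \frac{R_1}{8R} \sum_{i=0}^{3} x_i 2^i = -V_{ref} \cdot \frac{R_1}{8R} \cdot nat(x)$
Op-amp turns current I ~ nat(x) into a voltage ~ nat(x)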
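A numerical check of the derivation in Python: each branch current is computed from its resistor $2^{3-i} R$ with the op-amp input at virtual ground, the junction rule sums them, and the op-amp converts the sum into a voltage. The function names are hypothetical, and the minus sign assumes an inverting op-amp configuration, $y = -R_1 \cdot I$.

```python
def nat(x):
    """Natural number represented by bits x = [x0, x1, x2, x3] (x0 = LSB)."""
    return sum(xi << i for i, xi in enumerate(x))

def dac_output(x, v_ref: float, R: float, R1: float) -> float:
    # Branch currents I_i = x_i * V_ref / (2**(3-i) * R), assuming V- ~ 0
    currents = [xi * v_ref / (2 ** (3 - i) * R) for i, xi in enumerate(x)]
    I = sum(currents)                  # junction rule
    return -R1 * I                     # inverting op-amp (assumed sign)
```

For x = [1, 0, 1, 1] (nat(x) = 13), V_ref = 8 V and R1 = R the output is −13 V, i.e. proportional to nat(x) as stated.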
* Assuming “zero-order hold”
Possible to reconstruct input signal?
Sampling
Theorem
No influence at ts+n
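For band-limited signals sampled above the Nyquist rate, the sampling theorem promises reconstruction by interpolation with sinc pulses. This Python sketch evaluates $\sum_n h[n] \, \mathrm{sinc}\left(\frac{t - n p_s}{p_s}\right)$ over a finite list of samples; the function names are hypothetical and truncating the ideal infinite sum introduces small errors between the sampling instants.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, p_s: float, t: float) -> float:
    """Evaluate sum_n samples[n] * sinc((t - n * p_s) / p_s) at time t."""
    return sum(h * sinc((t - n * p_s) / p_s) for n, h in enumerate(samples))
```

Each pulse is 1 at its own sampling instant and 0 at all others, so a sample has no influence on the value at the other sampling times and the reconstruction passes exactly through the samples.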
* Assuming 0-order hold
[Figure: filter characteristic, frequency axis marked at fs/2 and fs]
Filter removes high frequencies present in y(t)
Sampling theory:
Limitations
Output
cyber-physical systems
Actuators
Actuators
(© MCNC)
Actuators (2)
Courtesy and ©: E.
Obermeier, MAT, TU Berlin
http://www.piezomotor.se/pages/PWtechnology.html
http://www.elliptec.com/fileadmin/elliptec/User/Produkte/Elliptec_Motor/Elliptecmotor_How_it_works.h
Secure Hardware
Summary
Hardware in a loop
Sensors
Discretization
Information processing
• Importance of energy efficiency, Special purpose HW very
expensive, Energy efficiency of processors, Code size
efficiency, Run-time efficiency
• Reconfigurable Hardware
Communication
D/A converters
Sampling theorem
Actuators