Week 1
Topic
Lecture 1: Introduction to Embedded Systems
Introduction to embedded systems
What are Embedded Systems?
• Computers are embedded within other systems:
• What is “other systems”? – Hard to define.
• Any computing system other than a desktop, laptop, or server.
• Typical examples:
• Washing machine, refrigerator, camera, vehicles, airplane,
missile, printer.
• Processors are often very simple and inexpensive
(depending on application of course).
• Billions of embedded system units produced yearly,
versus millions of desktop units.
Common Features of Embedded Systems
• They are special-purpose or single-functioned.
• Executes a single program, possibly with inputs from the environment.
• Imagine a microwave oven, a washing machine, an AC machine, etc.
• Tight constraints on cost, energy, form factor, etc.
• Low cost, low power, small size, relatively fast.
• They must react to events in real-time.
• Responds to inputs from the system’s environment.
• Must compute certain results in real-time without delay.
• The delay that can be tolerated depends on the application.
Typical Design Constraints
• Low Cost
• A sophisticated processor can increase the cost of the embedded system.
• Low Energy Consumption
• Many embedded systems operate on battery.
• Limited Memory
• Typically constrained to a finite and small amount of memory.
• Real-Time Response
• Most embedded systems are used for controlling some equipment.
• Must generate response within a specified time.
How to define an Embedded System?
• It is a microcontroller-based system that is
designed to control a function or range of
functions, and is not meant to be programmed by
the end user.
• The user may choose among the functions provided
but cannot change those functions.
• The user cannot make modifications to the software.
• Can you “program” your washing machine or
refrigerator or car?
• Not today … but not very sure of the near future.
• What an embedded system is not:
• A microprocessor sitting inside a traditional computing system (such as a desktop, laptop, or server).
• What it actually is:
• A microprocessor used to control another piece of technology (dedicated, and not general-purpose).
• For low cost, the microcontrollers typically used are single-chip devices containing
processor, memory, and I/O interfaces.
Applications of Embedded Systems
• Limited by imagination.
a) Consumer Segment: Refrigerator, washing machine, A/C machine, camera, microwave oven,
TV, security system, etc.
b) Office Automation: Printers, Fax machines, photocopying machines, scanners, biometric
scanner, surveillance camera, etc.
c) Automobiles: Air bags, anti-lock braking system (ABS), engine control, door lock, GPS
system, vehicular ad-hoc network (VANET), etc.
d) Communication: Mobile phones, network switches, WiFi hotspots, telephones,
MODEM, etc.
e) Miscellaneous: Automatic door locks, automatic baggage screening,
surveillance systems, intelligent toilet, etc.
Notable subsystems:
a) Analog-to-digital (ADC) interfaces
b) Digital-to-analog (DAC) interfaces
c) Pulse-width-modulation (PWM) interfaces
d) Timers and counters
e) In addition to … processor, memory, digital
I/O ports, etc.
Course Name: Embedded System Design with ARM
Faculty Name: Prof. Indranil Sen Gupta
Department : Computer Science and Engineering
Topic
Lecture 2: Design Considerations of Embedded Systems
Design challenges for embedded systems
Common Design Metrics
• Non-Recurring Engineering (NRE) Cost: One-time initial cost of designing a system.
• Unit Cost: The cost of manufacturing each copy of the system, without counting
the NRE cost.
• Size: The actual physical space occupied by the system.
• Performance: This is measured in terms of the time taken or throughput.
• Power: The amount of (battery) power consumed by the system.
• Flexibility: The ability to change the functionality of the system.
• Maintainability: How easy or difficult it is to modify the design of the system?
• Time-to-prototype: How much time is required to build a working version of the
system (i.e. a prototype)?
• Time-to-market: How much time is required to develop a system such that it can
be released to the market commercially?
• Safety: Are there any adverse effects on the operating environment?
• Can be many more …
Design Tradeoff
[Figure: design trade-off among Performance, Size, and NRE cost]
• Often requires expertise in both hardware and software to take a proper
decision.
• Expertise in hardware may indicate the types of co-processor or I/O interfaces to use for
specific applications (e.g. analog ports, digital ports, PWM ports, etc.).
• Expertise in software is required to identify parts of the implementation that need to be
implemented in software and run on the microcontroller.
• Hardware / Software Co-design becomes important.
Time-to-market Design Metric
• This is a very crucial design metric.
• Must be strictly followed to make a product
commercially viable.
• Requires exhaustive market study and analysis.
• Starting from the point a product design
starts, we can define a Market Window
within which it is expected to have the
highest sales.
• Any delay can result in drastic reductions in
sales.
Loss due to Delayed Market Entry
[Figure: triangular revenue model. Axes: Revenues ($) versus Time, with D, W, and 2W marked on the time axis. Two curves, on-time entry and delayed entry, each with a market rise, a peak revenue, and a market fall.]
• Maximum sale occurs at time W.
• Area (delayed) = ½ * (2W – D) * (W – D)
• Percentage revenue loss = D(3W – D) / (2W²) * 100
• Examples:
• 2W = 52 weeks, D = 4 weeks: LOSS = 22%
• 2W = 52 weeks, D = 10 weeks: LOSS = 50%
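The percentage-loss figures for delayed market entry can be checked numerically. The following is a minimal sketch, assuming the triangular market model with unit slope (so the on-time triangle has base 2W and height W); the function name `revenue_loss_percent` is illustrative and not part of the lecture.

```python
def revenue_loss_percent(W, D):
    """Percentage revenue loss when market entry is delayed by D.

    Assumption: revenue rises linearly until time W and falls linearly
    to zero at time 2W, with unit slope (heights equal W and W - D).
    """
    if not 0 <= D <= W:
        raise ValueError("the model assumes 0 <= D <= W")
    on_time_area = 0.5 * (2 * W) * W            # base 2W, height W
    delayed_area = 0.5 * (2 * W - D) * (W - D)  # base 2W - D, height W - D
    return (on_time_area - delayed_area) / on_time_area * 100
```

With 2W = 52 weeks (W = 26), `revenue_loss_percent(26, 4)` and `revenue_loss_percent(26, 10)` round to the 22% and 50% quoted for 4-week and 10-week delays.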
NRE and Unit Cost Metrics
• If CNRE denotes the NRE cost and Cunit the unit cost of a product, then the total
cost for manufacturing N units is given by:
Total Cost = CNRE + N * Cunit
• Therefore, per-unit cost is given by: CNRE / N + Cunit
• Example:
• CNRE= Rs. 5,00,000 and Cunit = Rs. 5,000
• Total cost for manufacturing 100 units = 5,00,000 + 5000 * 100 = 10,00,000
• Per unit cost = 5,00,000 / 100 + 5000 = 10,000
Per-unit cost = CNRE / N + Cunit
• We can compare technologies by cost:
• Choice A: CNRE = Rs. 20,000, Cunit = Rs. 8,000
• Choice B: CNRE = Rs. 4,00,000, Cunit = Rs. 3,000
• Choice C: CNRE = Rs. 10,00,000, Cunit = Rs. 8,000
• Of course, time-to-market cost must also be considered.
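The cost formulas lend themselves to a quick comparison across production volumes. The sketch below uses the figures exactly as listed for Choices A, B, and C (converted from Indian digit grouping to plain integers); the helper names `cheapest_choice` and `break_even_volume` are assumptions made for illustration.

```python
def total_cost(c_nre, c_unit, n):
    """Total cost of manufacturing n units: C_NRE + N * C_unit."""
    return c_nre + n * c_unit


def per_unit_cost(c_nre, c_unit, n):
    """Per-unit cost: C_NRE / N + C_unit."""
    return c_nre / n + c_unit


# (C_NRE, C_unit) in rupees, as listed on the slide.
CHOICES = {
    "A": (20_000, 8_000),
    "B": (400_000, 3_000),
    "C": (1_000_000, 8_000),
}


def cheapest_choice(n):
    """Name of the choice with the lowest total cost for n units."""
    return min(CHOICES, key=lambda name: total_cost(*CHOICES[name], n))


def break_even_volume(x, y):
    """Volume at which choices x and y cost the same, or None if the
    unit costs are equal (the totals then never cross)."""
    (nre_x, unit_x), (nre_y, unit_y) = CHOICES[x], CHOICES[y]
    if unit_x == unit_y:
        return None
    return (nre_y - nre_x) / (unit_x - unit_y)
```

For small volumes the low NRE cost of Choice A dominates; past the A/B break-even volume the lower unit cost of Choice B wins.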
Performance Design Metric
Topic
Lecture 3: Microprocessors and Microcontrollers
Classification of computer architecture
Characteristics of a microprocessor
Characteristics of a microcontroller
Basic Operation of a Computing System
• The central processing unit (CPU) carries
out all computations.
• Fetches instructions from the program
memory and executes them; may require access
to data in data memory.
• The input/output block provides interface
with the outside world.
• Allows users to interact with the computing
system, and also observe the output results.
• About the instruction set architecture (ISA) of the CPU.
a) Complex Instruction Set Computer (CISC)
• Typically used in desktops, laptops and servers (courtesy Intel).
b) Reduced Instruction Set Computer (RISC)
• Typically used in microcontrollers, which are used to build embedded systems.
Classification of CPU Architecture
• Broadly two types of architectures:
a) Von Neumann Architecture
• Both instructions and data are stored in the same memory.
• This model is followed in conventional computing systems.
b) Harvard Architecture
• Instructions and data are stored in separate memories.
• Typically followed in microcontrollers, used for building embedded systems.
• Instructions are stored in a ROM (permanent), while temporary data are stored
in RAM.
[Figure: Von Neumann Architecture (left) and Harvard Architecture (right)]
What is a Microprocessor?
Schematic Diagram
of Microprocessor
What is a Microcomputer?
• It is a computer system built using a microprocessor.
• Since a microprocessor does not contain memory and I/O, we have to interface these
to build a microcomputer.
• Too complex and expensive for very small and low-cost embedded systems.
Microcontrollers: The Heart of Embedded Systems
Microcontroller Packaging and Appearance
• When a PC executes a program, the program is first loaded from disk/SSD into an
allocated section of memory.
• Usually the program is loaded part by part to conserve memory space.
• There is a complicated operating system that handles all low-level operations (includes
low-level driver codes for interfacing with various devices).
• In a microcontroller there is no disk to read from.
• On-chip ROM stores the program that is to be executed.
• Size of the ROM limits the maximum size of the application.
• There is no operating system, and the program in ROM is the only program that
is running (must include low-level routines).
Where are Microcontrollers Used?
Evolution of Microcontrollers
• Microcontroller evolved from a microprocessor-based board-level design to a single
chip in the mid-1970's.
• As the process of miniaturization continued, all of the components needed for a controller were
built into a single chip.
• In the mid-1980’s, microcontrollers got embedded into a larger ASIC (Application
Specific Integrated Circuit).
• Microcontrollers are fabricated as a module inside a larger chip.
Advantages of using microcontrollers
• Fast and effective
• The architecture correlates closely with the problem being solved (control systems).
• Low cost / Low power
• High level of system integration within one component.
• Only a handful of components needed to create a working system.
• Compatibility
• Opcodes and binaries are the same across the variants within a given family (80x51, ARM, or PIC).
Topic
Lecture 4: Architecture of ARM Microcontroller (Part 1)
ARM series of microcontrollers
Why do we talk about ARM?
• One of the most widely used processor cores.
• Some application examples:
• ARM7: iPod
• ARM9: BenQ, Sony Ericsson
• ARM11: Apple iPhone, Nokia N93, N100
• 90% of 32-bit embedded RISC processors till 2010.
• Mainly used in battery-operated devices:
• Due to low power consumption and reasonably good
performance.
About ARM Processors
Popular ARM Architectures
• ARM7
• 3 pipeline stages (fetch/decode/execute)
• High code density / low power consumption
• Most widely used for low-end systems
• ARM9
• Compatible with ARM7
• 5 stages (fetch/decode/execute/memory/write)
• Separate instruction and data cache
• ARM10
• 6-stages (fetch/issue/decode/execute/memory/write)
ARM Family Comparison
                                ARM7 (1995)    ARM9 (1997)    ARM10 (1999)   ARM11 (2003)
Pipeline depth                  3-stage        5-stage        6-stage        8-stage
Typical clock frequency (MHz)   80             150            260            335
Power (mW/MHz)                  0.06           0.19           0.50           0.40
Throughput (MIPS/MHz)           0.97           1.1            1.3            1.2
Architecture                    Von Neumann    Harvard        Harvard        Harvard
Multiplier                      8 x 32         8 x 32         16 x 32        16 x 32
ARM is based on RISC Architecture
• RISC supports simple but powerful instructions that execute in a single cycle at
high clock frequency.
• Major design features:
• Instructions: reduced set / single cycle / fixed length
• Pipeline: decode in one stage / no need for microcode
• Registers: large number of general-purpose registers (GPRs)
• Load/Store Architecture: data processing instructions work on registers only;
load/store instructions to transfer data from/to memory.
• Nowadays, CISC machines also implement RISC concepts.
ARM Features
[Figure: Von Neumann organization (ARM7s and older) with a single AHB bus to MEMORY & I/O, versus Harvard organization (ARM9s and newer) with separate instruction and data caches, a bus interface, and an AHB bus to MEMORY & I/O.]
Memory-mapped I/O:
• No specific instructions for I/O
• Use Load/Store instr. for I/O
• Peripheral’s registers at some memory addresses
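Memory-mapped I/O can be illustrated with a toy model in which the same load and store operations reach either RAM or a peripheral register, depending only on the address. This is a sketch under stated assumptions: the class `ToyBus`, the RAM size, and the register address `LED_REG` are all hypothetical and do not describe any real ARM device.

```python
class ToyBus:
    """Toy address space: a small RAM plus one memory-mapped register."""

    RAM_SIZE = 0x1000          # hypothetical RAM size in bytes
    LED_REG = 0x4000_0000      # hypothetical peripheral register address

    def __init__(self):
        self.ram = {}
        self.led_state = 0     # state held inside the "peripheral"

    def store(self, addr, value):
        """Store a 32-bit value; the address selects RAM or the peripheral."""
        value &= 0xFFFFFFFF
        if addr == self.LED_REG:
            self.led_state = value & 0xFF   # peripheral keeps its low 8 bits
        elif 0 <= addr < self.RAM_SIZE:
            self.ram[addr] = value
        else:
            raise ValueError(f"unmapped address {addr:#x}")

    def load(self, addr):
        """Load a 32-bit value using the same mechanism as for RAM."""
        if addr == self.LED_REG:
            return self.led_state
        if 0 <= addr < self.RAM_SIZE:
            return self.ram.get(addr, 0)
        raise ValueError(f"unmapped address {addr:#x}")
```

No I/O-specific operation appears anywhere: writing to `ToyBus.LED_REG` uses the same `store` call as writing to RAM, which mirrors the use of ordinary load/store instructions for I/O.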
[Figure: Typical Architecture (block diagram). Labels shown: address lines A[31:0], PC bus, REGISTER BANK, ALU bus, Control Lines, INSTRUCTION DECODER, Multiplier, A bus, B bus, SHIFT, A.L.U., Instruction Reg., Thumb to ARM translator, Write Data Reg., Read Data Reg., data lines D[31:0].]
Topic
Lecture 5: Architecture of ARM Microcontroller (Part 2)
Basic concepts of pipeline processing
Pipeline speedup
A Real-life Example
• Suppose you have built a machine M that can wash (W), dry (D), and iron (R)
clothes, one cloth at a time.
• Total time required is T.
• For N clothes, time T1 = N.T
• As an alternative, we split the machine into three separate units: W, D, and R.
[Figure: a single machine W+D+R taking time T, versus three units W, D, R in sequence.]
How does the pipeline work?
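[Figure: pipeline timing chart. Unit W handles Cloth-1 to Cloth-5, unit D handles Cloth-1 to Cloth-4, and unit R handles Cloth-1 to Cloth-3, each unit starting one time slot later than the previous one.]
Finishing times:
• Cloth-1 – 3.T/3
• Cloth-2 – 4.T/3
• Cloth-3 – 5.T/3
• …
• Cloth-N – (2 + N).T/3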
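The finishing times of the washing pipeline can be reproduced with a small simulation. This is a minimal sketch, assuming each of the three units takes exactly T/3 and all clothes are available at time 0; times are expressed in units of T, and the function name `finishing_times` is illustrative.

```python
from fractions import Fraction


def finishing_times(n_clothes, stages=3):
    """Finishing time of each cloth, in units of T.

    Assumption: each stage takes T/stages and handles one cloth at a time.
    """
    slot = Fraction(1, stages)
    stage_free_at = [Fraction(0)] * stages   # when each stage becomes idle
    finished = []
    for _ in range(n_clothes):
        t = Fraction(0)                      # cloth is ready at time 0
        for s in range(stages):
            start = max(t, stage_free_at[s]) # wait for the stage to be free
            t = start + slot
            stage_free_at[s] = t
        finished.append(t)
    return finished
```

The simulated values match the closed form (2 + N).T/3 for the N-th cloth, compared with N.T on the single combined machine.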
Extending the Concept to Processor Pipeline
• The same concept can be extended to hardware pipelines.
• Suppose we want to attain k times speedup for some computation.
• Alternative 1: Replicate the hardware k times → cost also goes up k times.
• Alternative 2: Split the computation into k stages → very nominal cost increase.
• Need for buffering:
• In the washing example, we need a tray between machines (W & D, and D & R) to keep the cloth
temporarily before it is accepted by the next machine.
• Similarly in hardware pipeline, we need a latch between successive stages to hold the
intermediate results temporarily.
Model of a Synchronous k-stage Pipeline
[Figure: L → S1 → L → S2 → L → … → L → Sk, where S1, S2, …, Sk are STAGE 1, STAGE 2, …, STAGE k and every latch L is driven by a common Clock.]
• The latches are made with master-slave flip-flops, and serve the purpose of isolating
inputs from outputs.
• The pipeline stages are typically combinational circuits.
• When Clock is applied, all latches transfer data to the next stage simultaneously.
Speedup and Efficiency
Some notations:
τ :: clock period of the pipeline
ti :: time delay of the circuitry in stage Si
dL :: delay of a latch
Maximum stage delay τm = max {ti}
Thus, τ = τm + dL
Pipeline frequency f = 1/τ
• If one result is expected to come out of the pipeline every clock cycle, f will represent
the maximum throughput of the pipeline.
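As a quick numeric illustration of these definitions, the sketch below computes the clock period and pipeline frequency from per-stage delays. The delay values in the usage note are made-up numbers chosen only for the example, and the function name `pipeline_clock` is an assumption.

```python
def pipeline_clock(stage_delays_ns, latch_delay_ns):
    """Return (tau in ns, f in MHz) for a synchronous pipeline.

    tau = tau_m + d_L, where tau_m is the maximum stage delay,
    and f = 1 / tau.
    """
    tau_m = max(stage_delays_ns)      # slowest stage sets the pace
    tau = tau_m + latch_delay_ns      # add the latch delay
    f_mhz = 1000.0 / tau              # 1 / (tau in ns) expressed in MHz
    return tau, f_mhz
```

For hypothetical stage delays of 8, 10, and 9 ns with a 2 ns latch delay, `pipeline_clock([8, 10, 9], 2)` gives a 12 ns clock period: the slowest stage, not the average, determines the frequency.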
• The total time to process N data sets is given by
Tk = [(k – 1) + N].τ
• (k – 1).τ is the time required to fill the pipeline.
• One result comes out every τ after that, i.e. N.τ in total.
• For an equivalent non-pipelined processor (i.e. one stage), the total time is
T1 = N.k.τ (ignoring the latch overheads)
• Speedup of the k-stage pipeline over the equivalent non-pipelined processor:
Sk = T1 / Tk = N.k / [k + (N – 1)]
As N → ∞, Sk → k
• Pipeline efficiency:
• How close is the performance to its ideal value?
Ek = Sk / k = N / [k + (N – 1)]
• Pipeline throughput:
• Number of operations completed per unit time.
Hk = N / Tk = N / ([k + (N – 1)].τ)
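The speedup, efficiency, and throughput expressions can be evaluated directly to see how they behave as N grows. This is a minimal sketch that ignores latch overheads in the non-pipelined time, as the slides do; the function name `pipeline_metrics` is illustrative.

```python
def pipeline_metrics(k, n, tau=1.0):
    """Return (speedup, efficiency, throughput) for a k-stage pipeline
    processing n data sets with clock period tau."""
    t_k = ((k - 1) + n) * tau     # pipelined time: fill, then one result per tau
    t_1 = n * k * tau             # equivalent non-pipelined time
    speedup = t_1 / t_k           # Sk = N.k / [k + (N - 1)]
    efficiency = speedup / k      # Ek = N / [k + (N - 1)]
    throughput = n / t_k          # Hk = N / ([k + (N - 1)].tau)
    return speedup, efficiency, throughput
```

Evaluating `pipeline_metrics(k, n)` for k = 4, 8, 12 and n = 1, 2, 4, …, 256 shows each speedup curve starting at 1 and approaching k, with deeper pipelines needing more tasks to get close to their ideal value.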
[Figure: speedup versus number of tasks N, for N from 1 to 256, with one curve each for k = 4, k = 8, and k = 12.]
ARM Pipelining Examples
[Figure: ARM7TDMI pipeline and ARM9TDMI pipeline diagrams, each with a 1-clock-cycle interval marked.]
Pipelining in ARM7
Simple instructions (like ADD, SUB) can complete at a rate of one instruction per cycle.
[Figure: instruction 1 passing through FETCH, DECODE, and EXECUTE.]
With more complex instructions … stall cycles possible
[Figure: timing diagram in which instruction 1 (ADD) passes through FETCH, DECODE, and EXECUTE.]
ARM7 3-stage Pipeline
ARM7
ARM9
• In execution, the program counter (PC) is always 8 bytes ahead.
Topic
Lecture 6: Architecture of ARM Microcontroller (Part 3)
ARM processor modes and registers
Registers
Registers (contd.)
• The current processor mode governs which of several register sets is accessible.
• Only 16 registers are visible to a specific mode of operation. Each mode can
access:
• A particular set of registers r0-r12
• r13 (SP, stack pointer)
• r14 (LR, link register)
• r15 (PC, program counter)
• Current program status register (CPSR)
• Privileged modes (except System) can also access a particular SPSR.
General-purpose Registers
• 6 data types are supported (signed/unsigned)
• 8-bit byte, 16-bit half-word, 32-bit word
• All ARM operations are 32-bit.
• Shorter data types are only supported by data transfer operations.
[Figure: general-purpose register, bit positions 31, 24, 23, 16, 15, 8, 7, 0 marked, showing the 8-bit byte, 16-bit half-word, and 32-bit word.]
[Figure: status register fields: negative, zero, carry/borrow, and overflow flags; IRQ disable, FIQ disable, Thumb state, and mode bits.]
Special Registers
• PC (r15): Any instruction with PC as its destination register is a program branch.
• LR (r14): Saves a copy of PC when executing the BL instruction (subroutine call) or
when jumping to an exception or an interrupt handler.
• It is copied back to PC on return from those routines.
• SP (r13): There is no dedicated hardware stack in the ARM architecture.
• R13 is reserved as a pointer for the software-managed stack.
• CPSR: Holds the current (visible) program status.
• SPSR: Holds a copy of the previous status register while executing
exception or interrupt handler routines.
• Copied back to CPSR on return from exception or interrupt.
Program Counter
• When the processor is executing in ARM mode:
• All instructions are 32-bits wide, and must be word aligned.
• The last two bits of PC are zero (i.e. not used).
• Due to pipelining, PC points 8 bytes ahead of the current instruction, or 12 bytes ahead if the
current instruction includes a register-specified shift.
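The effect of the pipeline on the value read from PC can be made concrete with a few lines of arithmetic. The sketch below follows the offsets stated above (8 bytes, or 12 bytes with a register-specified shift); the function names and the example addresses are hypothetical.

```python
def visible_pc(instr_addr, register_shift=False):
    """Value of PC seen by the instruction at instr_addr in ARM mode."""
    if instr_addr % 4 != 0:
        raise ValueError("ARM instructions must be word aligned")
    ahead = 12 if register_shift else 8
    return (instr_addr + ahead) & 0xFFFFFFFF


def pc_relative_offset(instr_addr, target_addr):
    """Offset an instruction must add to PC to reach target_addr."""
    return target_addr - visible_pc(instr_addr)
```

For instance, an instruction at the hypothetical address 0x8000 that needs data stored at 0x8010 uses a PC-relative offset of 8, not 16, because PC already reads as 0x8008.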
Register Organization Summary
Mode         Shared with User mode                         Banked (own copy)
User, SYS    r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr    none (no spsr)
FIQ          r0-r7, r15, and cpsr                          r8-r12, r13 (sp), r14 (lr), spsr
IRQ          r0-r12, r15, and cpsr                         r13 (sp), r14 (lr), spsr
SVC          r0-r12, r15, and cpsr                         r13 (sp), r14 (lr), spsr
Undef        r0-r12, r15, and cpsr                         r13 (sp), r14 (lr), spsr
Abort        r0-r12, r15, and cpsr                         r13 (sp), r14 (lr), spsr
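The register banking summarized here can be modeled as a lookup from (mode, register number) to a physical register. This is a sketch for illustration only: the mode abbreviations and the `r13_irq`-style names are assumptions patterned on the `SPSR_<mode>` notation, not an official naming scheme.

```python
# Register numbers that each mode holds as its own banked copy.
BANKED = {
    "usr": set(),
    "sys": set(),
    "fiq": {8, 9, 10, 11, 12, 13, 14},
    "irq": {13, 14},
    "svc": {13, 14},
    "und": {13, 14},
    "abt": {13, 14},
}


def physical_register(mode, n):
    """Name of the physical register behind r<n> in the given mode."""
    if mode not in BANKED:
        raise ValueError(f"unknown mode {mode!r}")
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    return f"r{n}_{mode}" if n in BANKED[mode] else f"r{n}"


def status_registers(mode):
    """Status registers accessible in the given mode."""
    if mode not in BANKED:
        raise ValueError(f"unknown mode {mode!r}")
    if mode in ("usr", "sys"):
        return ["cpsr"]                  # User and System have no spsr
    return ["cpsr", f"spsr_{mode}"]
```

Each mode sees exactly 16 general registers, yet collecting every distinct name across all modes yields 37 physical registers: 31 general-purpose registers plus cpsr and five spsr copies.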
Program Status Register
Bits:    31 30 29 28   27 ... 8     7  6  5   4 ... 0
Field:   N  Z  C  V    undefined    I  F  T   mode
Byte fields: f = bits 31:24, s = bits 23:16, x = bits 15:8, c = bits 7:0
Condition code flags
• N = Negative result from ALU
• Z = Zero result from ALU
• C = ALU operation Carried out
• V = ALU operation oVerflowed
Interrupt Disable bits
• I = 1: Disables the IRQ.
• F = 1: Disables the FIQ.
T Bit (Arch. with Thumb mode only)
• T = 0: Processor in ARM state
• T = 1: Processor in Thumb state
Mode bits
• 10000 User
• 10001 FIQ
• 10010 IRQ
• 10011 Supervisor
• 10111 Abort
• 11011 Undefined
• 11111 System
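The status register layout can be exercised by decoding a 32-bit value into its fields. The sketch below assumes the bit positions N = 31, Z = 30, C = 29, V = 28, I = 7, F = 6, T = 5, and mode = bits 4:0; the function name `decode_cpsr` and the sample value are illustrative.

```python
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(value):
    """Split a 32-bit status register value into flags and mode."""
    def bit(position):
        return bool((value >> position) & 1)

    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),
        "I": bit(7),                 # 1 disables IRQ
        "F": bit(6),                 # 1 disables FIQ
        "T": bit(5),                 # 0 = ARM state, 1 = Thumb state
        "mode": MODES.get(value & 0x1F, "unknown"),
    }
```

As an example, `decode_cpsr(0x600000D3)` reports Z and C set, both interrupts disabled, ARM state, and Supervisor mode.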
Exception Handling
• When an exception occurs, the processor
• Copies CPSR into SPSR_<mode>
• Sets appropriate bits in CPSR
• Changes to ARM state
• Changes to related mode
• Disables IRQ, FIQ
• Stores return address in LR_<mode>
Vector table:
0x1C FIQ
0x18 IRQ
0x14 (Reserved)
0x10 Data Abort
0x0C Prefetch Abort
0x08 Software Interrupt
0x04 Undefined Instruction
0x00 Reset
• Exception handling in ARM is controlled through an area of memory called the
vector table.
• Exists at the bottom of the memory map from 0x00 to 0x1c.
• Within this table, one word is allocated to each of the various exception types.
• This word will contain some form of ARM instruction that should perform a branch.
• Does not contain an address.
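To make the point that a vector entry holds a branch instruction and not an address, the sketch below encodes the 32-bit ARM `B` (branch always) instruction that could be placed in a vector slot. It relies on the standard encoding 0xEA000000 plus a 24-bit signed word offset measured from the vector address plus 8; the handler address 0x100 in the usage note is hypothetical.

```python
def encode_branch(vector_addr, handler_addr):
    """Encode an unconditional ARM branch placed at vector_addr that
    jumps to handler_addr."""
    # The offset is relative to PC, which reads 8 bytes ahead of the branch.
    offset = handler_addr - (vector_addr + 8)
    if offset % 4 != 0:
        raise ValueError("branch target must be word aligned")
    words = offset >> 2
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("handler is out of range of a single branch")
    # 0xEA = condition 'always' (1110) followed by the branch opcode (1010).
    return 0xEA000000 | (words & 0x00FFFFFF)
```

For the IRQ vector at 0x18 and a handler at 0x100, `encode_branch(0x18, 0x100)` yields the instruction word 0xEA000038; the table stores this instruction, and executing it is what transfers control to the handler.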
ARM and Thumb Instruction Set
• Most ARM implementations provide two instruction sets:
a) 32-bit ARM instruction set
b) 16-bit Thumb instruction set
                               ARM (cpsr T = 0)                    Thumb (cpsr T = 1)
Instruction size               32-bit                              16-bit
Core instructions              58                                  30
Conditional execution          Most                                Only branch instructions
Data processing instructions   Access to barrel shifter and ALU    Separate barrel shifter and ALU instructions
Program status register        Read-write in privileged mode       No direct access
Register usage                 15 GPRs + pc                        8 GPRs + 7 high registers + pc
What is Conditional Execution?