ARM Theory
Cortex-M Processors
What am I going to get from this course?
• Programming microcontrollers using 'C'
• Learn about embedded software development and debugging using STM32CubeIDE
• Learn about mixed 'C' and assembly coding
• Demystifying memory, bus interfaces, NVIC and exception handling, with lots of animation
• Low-level register programming for interrupts, system exceptions, setting priorities, preemption, etc.
• Learn about writing IRQ handlers, IRQ numbers, NVIC and much more
• Learn about OS-related features such as SVC, SysTick, PendSV and much more
Cortex-M3 from a Software Developer's Point of View
• Programming model
• How exceptions are handled
• The memory map
• Peripheral interfacing
• How to use software driver libraries from the microcontroller vendor
• CMSIS-Core APIs
• STM32Cube libraries
Agenda
What are the ARM Cortex M Processors?
The Cortex-M3 and M4 Processors
The Cortex-M Processor Family
Advantage of the Cortex-M Processors
Low Power
Performance
Energy Efficiency
Code Density
Interrupts
Ease of use, C friendly
Scalability
Application of the ARM Cortex-M Processor
Background and History
ARM processor evolution
Architecture versions and Thumb ISA
What’s Happening in Microcontrollers?
What are ARM Cortex-M processors?
The Cortex-M3 and Cortex-M4 are processors designed by ARM. The Cortex-M3 processor was the first of the Cortex generation of processors, released by ARM in 2005 (silicon products released in 2006). The Cortex-M4 processor was released in 2010 (silicon products also released in 2010).
The Cortex-M3 and Cortex-M4 processors use a 32-bit architecture: the internal registers in the register bank, the data path, and the bus interfaces are all 32 bits wide. The Instruction Set Architecture (ISA) of the Cortex-M processors is called the Thumb ISA; it is based on Thumb-2 technology, which supports a mixture of 16-bit and 32-bit instructions.
Microcontroller Products Containing ARM IP
ARM Cortex-M3 Microcontroller
• 18 x 32-bit registers
• Excellent compiler target
• Reduced pin count requirements
• Efficient interrupt handling
• Power management
• Efficient debug and development support features: breakpoints, watchpoints, Flash Patch support, instruction trace
• Strong OS support: User/Supervisor model, OS support features
ARM Cortex-M3 Microcontroller
Designed to be fully programmed in C (even reset, interrupts and exceptions)
• ARMv7-M architecture
• No cache, no MMU
• Debug is optimized for microcontroller applications
• Vector table contains addresses, not instructions
• Hardware DIV instruction
• Interrupts automatically save/restore state
• Exceptions programmed in C (no Coprocessor 15; all registers are memory-mapped)
• Interrupt controller is part of the Cortex-M3 macrocell
• Fixed memory map
• Bit-banding
• Non-Maskable Interrupt (NMI)
• Only one processor status register
• Thumb-2 processing core: a mix of 16- and 32-bit instructions for very high code density, with complete Thumb compatibility
The Cortex-M Processor Family
Advantages of the Cortex-M Processors
• Low Power
Currently, many Cortex-M microcontrollers have a power consumption of less than 200 µA/MHz, with some of them well under 100 µA/MHz. In addition, the Cortex-M processors include support for sleep mode features and can be used with various advanced ultra-low-power design technologies.
• Performance
The Cortex-M3 and Cortex-M4 processors can deliver over 3 CoreMark/MHz and 1.25 DMIPS/MHz.
• Energy Efficiency
• Code Density
• Interrupts
• Ease of use, C friendly
Features Cont.
• Scalability
• Debug Friendly
• OS Support
• Versatile system features
• Bit Banding
• MPU
ISA Enhancement and Evolution
Introduction to Embedded Software Development
CDAC ACTS, Pune
Agenda
What is inside a typical ARM microcontroller?
What you need to start
Development Suites
Development Boards
Debug Adaptor
Documentation and other resources
Software Development Flow
Compiling your Application
Software flow
Polling
Interrupt Driven
Multitasking System
Input, output and peripheral accesses
Microcontroller interfaces
Cortex Microcontroller Software Interface Standard (CMSIS)
What Is Inside a Typical ARM Microcontroller?
In many microcontrollers, the processor takes less than 10% of the
silicon area, and the rest of the silicon die is occupied by other
components such as:
• Program memory (e.g., flash memory)
• SRAM
• Peripherals
• Internal bus infrastructure
• Clock generator (including Phase Locked Loop), reset generator, and
distribution network for these signals
• Voltage regulator and power control circuits
• Other analog components (e.g., ADC, DAC, voltage reference circuits)
• I/O pads
What you need to start
• Development suites
• STM32CubeIDE (PG-DESD)
• Development Board
• STM32F4 Discovery Board (PG-DESD)
• Debug Adaptor
• ST-LINK/V2 (PG-DESD)
• Documents and other Resources
• Link to Resources
Software Development Flow
Software Compiling Flow
Polling Flow
Interrupt Driven
Managing the ISR
• The code inside an ISR is generally kept as short as possible,
in order to minimize the amount of time spent in the
interrupt.
• This is important for a few reasons:
• If the interrupt occurs very often and the ISR contains a lot of
instructions, there is a chance that the ISR won't return before
being called again.
• For communication peripherals such as UART or SPI, this will mean
dropped data (which obviously isn't desirable).
• Another reason to keep the code short is that other interrupts also need to be serviced.
• One way of keeping the ISR's instructions and responsibilities minimal is to do the smallest amount of work possible inside the ISR and then set a flag that is checked by code running in the super loop.
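A minimal host-runnable sketch of the flag pattern above, assuming a UART receive interrupt. The names `uart_rx_isr()`, `super_loop_iteration()` and `get_bytes_processed()` are hypothetical: on a real Cortex-M target the ISR would be listed in the vector table and would read the byte from the peripheral's data register, but here it is simply called as a function so the logic can be exercised on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the ISR and the super loop. 'volatile' stops the compiler
   from caching these values in registers across loop iterations. */
static volatile bool    rx_flag = false;
static volatile uint8_t rx_byte = 0;

static unsigned bytes_processed = 0;

/* Hypothetical ISR: do the minimum (store the data, raise a flag) and return. */
void uart_rx_isr(uint8_t received)
{
    rx_byte = received;
    rx_flag = true;
}

/* One pass of the super loop: the slower work happens here, outside the ISR. */
void super_loop_iteration(void)
{
    if (rx_flag) {
        uint8_t b = rx_byte;   /* copy the data out first */
        rx_flag = false;       /* then acknowledge the event */
        (void)b;               /* placeholder for the real processing */
        bytes_processed++;
    }
}

unsigned get_bytes_processed(void)
{
    return bytes_processed;
}
```

A single-byte mailbox like this can still lose data if bytes arrive faster than the loop runs; drivers commonly replace it with a ring buffer filled by the ISR.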
Direct Memory Access Systems
DMA Transfer Use Case
• In the case of receiving a stream of bytes from a UART without DMA,
information from the UART will move into the UART registers, be read by
the CPU, and then pushed out to RAM for storage:
1. The CPU must detect when an individual byte (or word) has been received,
either by polling the UART register flags, or by setting up an interrupt service
routine that will be fired when a byte is ready.
2. After the byte is transferred from the UART, the CPU can then place it into
RAM for further processing.
3. Steps 1 and 2 are repeated until the entire message is received.
• When DMA is used in the same scenario, the following happens:
1. The CPU configures the DMA controller and peripheral for the transfer.
2. The DMA controller takes care of ALL transfers between the UART peripheral
and RAM. This requires no intervention from the CPU.
3. The CPU will be notified when the entire transfer is complete, and it can go
directly to processing the entire byte stream.
Need for an RTOS
• If the system is dealing with a limited number of responsibilities and none of them is especially complicated or time-consuming, then there may be no need for anything more sophisticated than a super loop.
• If the system is also responsible for generating a User Interface (UI), running complex time-consuming algorithms, or dealing with complex communication stacks, it is very likely that these tasks will take a non-trivial amount of time.
• These are the types of systems where an RTOS is needed. Guaranteeing that the most time-critical tasks always run when necessary, and scheduling lower-priority tasks to run whenever spare time is available, is a strong point of preemptive schedulers. In this type of setup, the critical sensor readings could be pushed into their own task and assigned a high priority, effectively interrupting anything else in the system (except ISRs) when it is time to deal with the sensor. The complex communication stack could then be assigned a lower priority than the critical sensor task.
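As a rough illustration of the preemptive policy described above, the sketch below picks the highest-priority ready task, which is the decision a preemptive scheduler makes whenever a tick or event occurs. The `task_t` structure and `pick_next_task()` are hypothetical and far simpler than a real RTOS kernel; in this sketch a larger number means a higher priority.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned    priority;  /* larger value = more urgent (assumption of this sketch) */
    bool        ready;     /* true if the task can run now */
} task_t;

/* Return the index of the highest-priority ready task, or -1 if none is ready.
   Ties go to the task that appears first in the table. */
int pick_next_task(const task_t *tasks, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].ready)
            continue;
        if (best < 0 || tasks[i].priority > tasks[(size_t)best].priority)
            best = (int)i;
    }
    return best;
}
```

With the sensor task at the highest priority, it is selected as soon as it becomes ready, even while the communication stack task is also ready.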
Multitasking System
In these applications, a Real-Time Operating System (RTOS) can be used to handle the task scheduling.
An RTOS allows multiple processes to be executed concurrently by dividing the processor's time into time slots and allocating the time slots to the processes that require service.
RTOS to Handle Multiple Tasks
RTOS vs Superloop
Bare-metal vs RTOS vs GPOS
Technical Overview
Instruction Set: Thumb-2
High Performance
ARM and Thumb Mode Switching
Thumb-2: No Switching Required
Block Diagram
Various Bus Interfaces on Cortex-M3
AMBA System
[Figure: AMBA system. The ARM processor and the external memory interface sit on the high-performance, high-bandwidth AHB; a bridge connects the AHB to the APB, which hosts the UART, timer and keypad peripherals.]
Memory System & Interrupt support
Typically, the microcontroller vendor will need to add the
following items to the memory system:
• Program memory, typically flash
• Data memory, typically SRAM
• Peripherals
The Cortex-M3 and Cortex-M4 processors include an interrupt
controller called the Nested Vectored Interrupt Controller (NVIC).
It is programmable and its registers are memory mapped.
The address location of the NVIC is fixed and the programmer’s
model of the NVIC is consistent across all Cortex-M processors.
The NVIC supports a number of system exceptions, including a Non-Maskable Interrupt (NMI).
Features of Cortex-M3: Performance
• The three-stage pipeline allows most instructions, including multiply, to
execute in a single cycle, and at the same time allows high clock
frequencies for microcontroller devices typically over 100 MHz, and up
to approx. 200 MHz in modern semiconductor manufacturing processes.
• Multiple bus interfaces allow simultaneous instruction and data
accesses to be performed.
• The pipelined bus interface allows a higher clock frequency in the
memory system.
• The highly efficient instruction set allows complex operations to be carried out in a small number of instructions.
• Each instruction fetch is 32 bits wide, and most instructions are 16 bits. Therefore, up to two instructions can be fetched at a time.
Code Density
• Thumb-2 technology allows 16-bit instructions and 32-bit instructions to work together without any state switching overhead. Most simple operations can be carried out with a 16-bit instruction.
• Various memory addressing modes for efficient data accesses
• Multiple memory accesses can be carried out in a single instruction
• Hardware divide instructions and Multiply-and-Accumulate (MAC) instructions are supported in both Cortex-M3 and Cortex-M4
• Instructions for bit field processing in Cortex-M3/M4
• Single Instruction, Multiple Data (SIMD) instruction support exists in Cortex-M4
• Optional single-precision floating point instructions are available in Cortex-M4
Low Power
• The Cortex-M processors provide a number of low power features. These include:
• Multiple sleep modes defined in the architecture
• Integrated architectural clock gating support, which allows clock
circuits for parts of the processor to be deactivated when the
section is not in use.
• The processors also have additional optional hardware
support:
• Wakeup Interrupt Controller (WIC) to enable advanced low power
technologies such as State Retention Power Gating (SRPG).
Memory System and Endianness
• Optional bit-band feature: two bit-addressable regions, one in the SRAM region and one in the peripheral region. Bit value modifications via bit-band alias addresses are converted into atomic read-modify-write operations on the bit-band regions.
• Exclusive accesses for multi-processor system designs. This
is important for semaphore operation in multi-processor
systems.
• Support for little-endian or big-endian memory systems. The Cortex-M3/M4 processors can operate in either little-endian or big-endian mode.
Example of a non-atomic increment of a shared variable i (R0 holds its address, e.g. 0x100):
LDR R1, [R0]
ADD R1, R1, #1   @ if processor 2 accesses i between the load and the store, the result can be incorrect
STR R1, [R0]
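The lost-update problem above is what exclusive accesses solve. The sketch below uses the standard C11 `<stdatomic.h>` interface rather than hand-written assembly; when compiled for an ARMv7-M target such as the Cortex-M3, GCC normally implements `atomic_fetch_add` with an LDREX/ADD/STREX retry loop, though the exact sequence depends on the compiler and options. `shared_counter`, `increment_shared()` and `read_shared()` are hypothetical names.

```c
#include <stdatomic.h>

/* The shared variable 'i' from the example above, declared atomic. */
static atomic_int shared_counter = 0;

/* Atomically performs i = i + 1 and returns the previous value, so an update
   by another processor between the load and the store cannot be lost. */
int increment_shared(void)
{
    return atomic_fetch_add(&shared_counter, 1);
}

int read_shared(void)
{
    return atomic_load(&shared_counter);
}
```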
Memory Protection Unit
• If an MPU is included, applications can divide the memory
space into a number of regions and define the access
permissions for each of them.
• When an access rule is violated, a fault exception is
generated and the fault exception handler will be able to
analyze the problem and, if possible, correct it.
OS Support and system level features
• The Cortex-M processors have a built-in system tick timer called SysTick, which can be set up to generate regular timer interrupts for OS timekeeping.
• The Cortex-M3 processors also have banked stack pointers:
• for OS kernel and interrupts, the Main Stack Pointer (MSP) is used;
• for application tasks, the Process Stack Pointer (PSP) is used.
• For simple applications without an OS, the MSP can be used all the
time.
• To improve system reliability further, the Cortex-M3 and Cortex-M4
processors support the separation of privileged and non-privileged
operation modes.
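As a worked example of setting up SysTick for OS timekeeping: SysTick counts down from a 24-bit reload value to zero, so the reload value is the number of clock cycles per tick minus one. The helper below is a hypothetical sketch that assumes SysTick is clocked from the processor clock; for example, a 168 MHz clock and a 1 kHz tick give a reload value of 167999. In CMSIS-Core projects, `SysTick_Config()` performs this register programming on the target.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* the reload register is 24 bits wide */

/* Compute the SysTick reload value for a desired tick rate.
   Returns 0 on error (tick rate of zero, clock slower than the tick rate,
   or a reload value that does not fit in 24 bits). */
uint32_t systick_reload_value(uint32_t core_clock_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u || core_clock_hz < tick_hz)
        return 0u;
    uint32_t cycles_per_tick = core_clock_hz / tick_hz;
    if (cycles_per_tick - 1u > SYSTICK_MAX_RELOAD)
        return 0u;
    return cycles_per_tick - 1u;   /* the counter runs from RELOAD down to 0 */
}
```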
Instruction and Data Alignment
Example of 8-bit Data Alignment
[Figure: Memory and CPU byte lanes 0 to 7 for 8-bit data accesses, with a data byte at lane 5.]
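An access is aligned when its address is a multiple of the access size. The helper below is an illustrative sketch (the function name is hypothetical) that tests this condition. On the Cortex-M3/M4, unaligned accesses are supported for single load/store instructions such as LDR and LDRH, while multiple-register, doubleword and exclusive accesses must be aligned.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if 'addr' is naturally aligned for an access of 'size' bytes.
   'size' must be a power of two (1, 2, 4 or 8). */
bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;
}
```

An 8-bit access is always aligned, which is why byte data can sit at any address, such as byte 5 in the figure.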
STM32F4x Block Diagram
Architecture
CPU Operating Modes – Cont.
Operation modes
• Handler mode: When executing an exception handler such as an Interrupt Service
Routine (ISR). When in handler mode, the processor always has privileged
access level.
• Thread mode: When executing normal application code, the processor can be
either in privileged access level or unprivileged access level. This is controlled by
a special register called “CONTROL.”
Inline Assembly Code Usage
• The purpose of inline assembly is to include assembly language code in a 'C' program.
• It allows a mixture of 'C' and assembly code.
Example: R0 = R0 + R1
Assembly instruction: ADD R0,R0,R1
Inline asm statement: __asm volatile("ADD R0,R0,R1");
Example: read (LDR), modify (ADD), write (STR)
Assembly:
LDR R0,[R1]
LDR R1,[R2]
ADD R0,R0,R1
STR R0,[R3]
/* Basic asm: the compiler is not told which registers these statements use;
   in real code, prefer extended asm with operands and a clobber list. */
void example1(void)
{
    __asm volatile("LDR R0,[R1]");
    __asm volatile("LDR R1,[R2]");
    __asm volatile("ADD R0,R0,R1");
    __asm volatile("STR R0,[R3]");
}
void example2(void)
{
    __asm volatile(
        "LDR R0,[R1]\n\t"
        "LDR R1,[R2]\n\t"
        "ADD R0,R0,R1\n\t"
        "STR R0,[R3]\n\t"
    );
}
Inline Assembly Syntax
__asm [volatile] (code);
__asm [volatile] (code_template : output_operand_list [: input_operand_list [: clobbered_register_list]]);
Use the volatile qualifier for assembler instructions that have processor side-effects.
code is the assembly instruction, for example "ADD R0, R1, R2".
code_template is a template for an assembly instruction, for example "ADD %[result], %[input_i], %[input_j]".
If you specify a code_template rather than code, then you must specify the output_operand_list before specifying the optional input_operand_list and clobbered_register_list.
output_operand_list is a list of output operands, separated by commas. Each operand consists of a symbolic name
in square brackets, a constraint string, and a C expression in parentheses. In this example, there is a single output
operand: [result] "=r" (res). The list can be empty.
For example:
__asm ("ADD R0, %[input_i], %[input_j]"
       : /* This is an empty output operand list */
       : [input_i] "r" (i), [input_j] "r" (j)
       : "r0" /* R0 is written, so it is listed as clobbered */
);
input_operand_list is an optional list of input operands, separated by commas. Input operands use the same
syntax as output operands. In this example, there are two input operands: [input_i] "r" (i), [input_j] "r" (j). The list
can be empty.
clobbered_register_list is a comma-separated list of strings. Each string is the name of a register that the
assembly code potentially modifies, but for which the final value is not important. To prevent the compiler from
using a register for a template string in an inline assembly string, add the register to the clobber list.
Mixed C and Assembly Code
#include <stdio.h>

int add(int i, int j)
{
    int res = 0;
    __asm volatile("ADD %[result], %[input_i], %[input_j]"
                   : [result] "=r" (res)
                   : [input_i] "r" (i), [input_j] "r" (j)
    );
    return res;
}

int main(void)
{
    int a = 1;
    int b = 2;
    int c = 0;
    c = add(a, b);
    printf("Result of %d + %d = %d\n", a, b, c);
    return 0;
}
Register Set
• R0-R12
• General Purpose registers
• R0-R7 are the low registers: because of the limited encoding space in the instruction set, many 16-bit instructions can only access the low registers.
• R8-R12 are the high registers; they can be used by 32-bit instructions.
• The initial values of R0-R12 are undefined.
• R13 Stack Pointer
• R14 Link Register
• R15 Program Counter
Stack Pointer – R13
Link Register – R14
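• This is used for holding the return address when calling a function or subroutine. (On exception entry, LR is instead loaded with a special EXC_RETURN value that is used to return from the ISR.)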
• At the end of the function or subroutine, the program control can
return to the calling program and resume by loading the value of
LR into the Program Counter (PC).
• When a function or subroutine call is made, the value of LR is
updated automatically.
• If a function needs to call another function or subroutine, it
needs to save the value of LR in the stack first. Otherwise, the
current value in LR will be lost when the function call is made.
Link Register Flow Diagram
Program Counter – R15
• It is readable and writeable:
• a read returns the current instruction address plus 4 (this is
due to the pipeline nature of the design, and compatibility
requirement with the ARM7TDMI processor).
• Writing to PC (e.g., using data transfer/processing
instructions) causes a branch operation.
Special Registers
They are needed for the development of an embedded OS, or when advanced interrupt masking features are needed.
Special registers are not memory mapped, and can be accessed using special register access instructions such as MSR (Move to Special register from general-purpose Register) and MRS (Move to general-purpose Register from Special register).
Program Status Register (xPSR) bit layout
Bit(s):  31  30  29  28  27  26:25   24  15:10   8:0
Field:   N   Z   C   V   Q   ICI/IT  T   ICI/IT  ISR number
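The bit layout above can be decoded with masks. The sketch below is an illustrative host-side helper (the macro and type names are hypothetical, not CMSIS definitions); on the target the raw value would be read with the MRS instruction. The positions follow the ARMv7-M xPSR: N, Z, C, V and Q in bits 31 to 27, T in bit 24, and the exception (ISR) number in bits 8:0.

```c
#include <stdbool.h>
#include <stdint.h>

#define XPSR_N            (1u << 31)
#define XPSR_Z            (1u << 30)
#define XPSR_C            (1u << 29)
#define XPSR_V            (1u << 28)
#define XPSR_Q            (1u << 27)
#define XPSR_T            (1u << 24)
#define XPSR_EXC_NUM_MASK 0x1FFu   /* bits [8:0]: ISR (exception) number */

typedef struct {
    bool     n, z, c, v, q, t;
    uint32_t exception_number;     /* 0 = Thread mode, otherwise the active exception */
} xpsr_fields_t;

xpsr_fields_t decode_xpsr(uint32_t xpsr)
{
    xpsr_fields_t f;
    f.n = (xpsr & XPSR_N) != 0u;
    f.z = (xpsr & XPSR_Z) != 0u;
    f.c = (xpsr & XPSR_C) != 0u;
    f.v = (xpsr & XPSR_V) != 0u;
    f.q = (xpsr & XPSR_Q) != 0u;
    f.t = (xpsr & XPSR_T) != 0u;
    f.exception_number = xpsr & XPSR_EXC_NUM_MASK;
    return f;
}
```

For example, an exception number of 15 indicates that the SysTick handler is currently running.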
APSR
Bits 28 to 31 (N, Z, C and V) are the ALU condition code flags.
The Q bit (bit 27) is a sticky saturation flag, used by saturating instructions.
Q Flag – Signed & Unsigned Saturation
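Saturating instructions clamp a result to the representable range instead of letting it wrap around, and set the sticky Q flag when clamping occurs. The functions below are a host-side model of the SSAT and USAT instructions, which both the Cortex-M3 and Cortex-M4 provide; the static `q_flag` only imitates the APSR Q bit and is not the real register, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-in for the sticky Q flag (not the real APSR bit). */
static bool q_flag = false;

/* Model of SSAT: saturate 'value' to a signed n-bit range (1 <= n <= 32). */
int32_t ssat(int32_t value, unsigned n)
{
    int64_t max = ((int64_t)1 << (n - 1u)) - 1;
    int64_t min = -((int64_t)1 << (n - 1u));
    if (value > max) { q_flag = true; return (int32_t)max; }
    if (value < min) { q_flag = true; return (int32_t)min; }
    return value;   /* in range: Q is left unchanged (sticky) */
}

/* Model of USAT: saturate 'value' to an unsigned n-bit range (0 <= n <= 31). */
uint32_t usat(int32_t value, unsigned n)
{
    int64_t max = ((int64_t)1 << n) - 1;
    if (value > max) { q_flag = true; return (uint32_t)max; }
    if (value < 0)   { q_flag = true; return 0u; }
    return (uint32_t)value;
}

bool q_flag_is_set(void) { return q_flag; }
void q_flag_clear(void)  { q_flag = false; }
```

Because the flag is sticky, software typically clears it, runs a block of saturating arithmetic, and then checks once whether any result was clamped.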
IPSR
EPSR
Note: Since the Cortex-M3 supports only the Thumb-2 (Thumb) state, bit 24, the T bit, is always 1, and you should never attempt to change it.
• The IT (If-Then) instruction contains the IT opcode with up to three additional optional suffixes of "T" (then) and "E" (else), followed by the condition to check, which uses the same condition symbols as conditional branches.
• The "T"/"E" suffixes indicate how many subsequent instructions are inside the IT block (up to four), and whether each one should be executed when the condition is met ("T") or when it is not met ("E").
Mask registers
CONTROL register
The CONTROL register controls the stack used and the
privilege level for software execution when the processor is in
Thread mode.
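The two CONTROL bits that matter here can be modelled as below. The bit positions (bit 0 = nPRIV, bit 1 = SPSEL) are the ARMv7-M definitions; the helper and type names are hypothetical, and on the target the register would be read with MRS (CMSIS-Core provides `__get_CONTROL()` for this). Bit 2 (FPCA, present only on a Cortex-M4 with FPU) is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_NPRIV (1u << 0)   /* 1 = Thread mode is unprivileged */
#define CONTROL_SPSEL (1u << 1)   /* 1 = Thread mode uses PSP, 0 = MSP */

typedef enum { STACK_MSP, STACK_PSP } stack_ptr_t;

/* Stack pointer in use for a given CONTROL value and mode.
   Handler mode always uses the MSP, regardless of CONTROL. */
stack_ptr_t active_stack(uint32_t control, bool handler_mode)
{
    if (handler_mode)
        return STACK_MSP;
    return (control & CONTROL_SPSEL) ? STACK_PSP : STACK_MSP;
}

/* Privilege level: Handler mode is always privileged. */
bool is_privileged(uint32_t control, bool handler_mode)
{
    if (handler_mode)
        return true;
    return (control & CONTROL_NPRIV) == 0u;
}
```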
Stacks
Both Thread and Handler using MSP
Thread uses PSP & Handler uses MSP
Switching Privileged & Unprivileged
Memory Map
Exceptions and Interrupts
• What are exceptions?
• Exceptions are events that cause changes to program flow. When
one happens, the processor suspends the current executing task
and executes a part of the program called the exception handler.
• After the execution of the exception handler is completed, the
processor then resumes normal program execution.
• In the ARM architecture, interrupts are one type of exception.
Interrupts are usually generated from peripheral or external
inputs, and in some cases they can be triggered by software.
• The exception handlers for interrupts are also referred to as
Interrupt Service Routines (ISR).
Exception sources and Zero Wait State
Nested Vectored Interrupt Controller
• Features
• Flexible exception and interrupt management
• Each interrupt (apart from the NMI) can be enabled or disabled and can
have its pending status set or cleared by software.
• Nested exception/interrupt support
• Each exception has a priority level. Some exceptions, such as interrupts,
have programmable priority levels and some others (e.g., NMI) have a fixed
priority level. When an exception occurs, the NVIC will compare the priority
level of this exception to the current level. If the new exception has a
higher priority, the current running task will be suspended.
• Vectored exception/interrupt entry
• The Cortex-M processors automatically locate the starting point of the
exception handler from a vector table in the memory. As a result, the
delays from the start of the exception to the execution of the exception
handlers are reduced.
• Interrupt masking
Starting Address of Exception Handler
• To determine the starting address of the exception handler, a vector
table mechanism is used.
• The vector table is an array of word data inside the system memory,
each representing the starting address of one exception type.
• Example: the vector table is located at address 0x0 after reset.
• The reset is exception type 1, so the address of the reset vector is 1 x 4 (each word is 4 bytes) = 0x00000004, and the NMI vector (type 2) is located at 2 x 4 = 0x00000008.
• The address 0x00000000 is used to store the starting value of the MSP.
• The LSB of each exception vector indicates whether the exception is to be executed in the Thumb state. Since the Cortex-M processors support only Thumb instructions, the LSB of all the exception vectors should be set to 1.
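The vector address arithmetic above can be captured in a few small helpers. This is an illustrative sketch with hypothetical names: `vtor` stands for the vector table base (0x00000000 after reset, or whatever is later written to VTOR), and external interrupt IRQn corresponds to exception number 16 + n on Cortex-M.

```c
#include <stdint.h>

/* Exception number of external interrupt 'irqn' (IRQ0 is exception 16). */
uint32_t irq_to_exception_number(uint32_t irqn)
{
    return irqn + 16u;
}

/* Address of the vector for an exception number: each entry is one 32-bit
   word, and entry 0 holds the initial MSP value rather than a handler. */
uint32_t vector_address(uint32_t vtor, uint32_t exception_number)
{
    return vtor + exception_number * 4u;
}

/* A handler address stored in the table must have bit 0 set (Thumb state). */
uint32_t make_vector_entry(uint32_t handler_address)
{
    return handler_address | 1u;
}
```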
Vector Table
LSB of exception vectors should be set to 1 to indicate Thumb state
Fault Handling
System Control Block (SCB)
One part of the processor that is merged into the NVIC unit is the
SCB. The SCB contains various registers for:
• Controlling processor configurations (e.g., low power modes)
• Providing fault status information (fault status registers)
• Vector table relocation (VTOR)
• The SCB is memory-mapped. Similar to the NVIC registers, the
SCB registers are accessible from the System Control Space (SCS).
CoreSight Debug and Trace Technology
CoreSight Features
• CoreSight features can be accessed through a JTAG or Serial Wire interface. Debugging in JTAG and Serial Wire mode at the same time is not possible. Cortex-M processor-based devices can include:
• Debug Interface
• The debug interface offers two modes:
• JTAG Debug is the industry-standard interface that allows device chaining.
• Serial Wire Debug is a 2-pin interface with an optional Serial Wire Trace Output. In
contrast to JTAG, devices cannot be chained.
• The Debug Interface communicates with the following units:
• Run Control: allows the user to start, stop, and single-step through the source code.
• Breakpoint Unit: allows the user to set breakpoints even while the processor is
running.
• Memory Access Unit: allows the user to read or write to memory and peripheral
registers even while the program is running.
Debug Connections
Trace Port Interface
The Trace Port Interface encodes and provides trace information via two possible
interfaces:
• The Serial Wire Trace Output pin (SWO) can be used in Serial Wire Debug mode
only.
• The 4-Pin Trace Output has a greater bandwidth than Serial Wire Trace Output and
uses 5 functional pins. It is the only way to output ETM trace data.
The Trace Port Interface communicates with the following units:
• Embedded Trace Macrocell (ETM): can be used for instruction tracing to debug
historical sequences, for software profiling, and code coverage analysis. ETM data
are output through an extra 4-bit interface.
• Instrumentation Trace Macrocell (ITM): provides application information like debug
printf(), RTOS information, unit test, or UML annotation.
• Data Watchpoint & Trace Unit (DWT): provides PC sampling, event counters, timing,
and interrupt execution information. In addition, it allows Access Breakpoints for up
to four memory addresses.
Trace Connections
Cortex Reset Sequence & Startup
Vector Table at Startup
Reset Sequence
Reset Behavior
Status of SP and PC during reset
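C compiler. Memory map. Program in RAM
[Figure: all sections are placed in RAM, with the vectors (RAM) at 0x20000000 and the stack at the top of RAM, 0x2001FFFF.]
C compiler. Memory map. Program in Flash
[Figure: Flash memory (0x00000000 to 0x000FFFFF) holds the vectors (Flash) at 0x00000000, then .text from 0x00000040, .rodata and the initial values of .data, followed by free Flash. RAM (0x20000000 to 0x2001FFFF) holds an unused vector area at 0x20000000, .data (copied on startup) from 0x20000040, .bss up to _end, free RAM, and the stack at the top.]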
C compiler. Memory map. Program in Flash
• The flasher program (stlinkv2_utility.exe) writes the .text, .rodata and .data sections into the Flash. This storage is non-volatile.
• The C runtime startup code (startup_stm32f407vgtx.s file) does some processing before calling the "main" routine. In particular, it:
• Copies the .data section into RAM, because variables have to be stored in read/write memory
• Fills the .bss section with zeroes
• The free RAM area can be dynamically allocated by providing suitable "malloc" and "free" functions
In summary, the program is built from:
• startup_stm32f407vgtx.s: the first code to be executed, written in assembler, including:
• Reset, interrupt and exception vectors
• Basic I/O and system initialization (clocks, UART, etc.)
• .data and .bss initialization
• Call to "main"
• Simple I/O library (printf-like, provided by the author, non-standard)
• User source code. Must include the “main” function
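The two data-initialization steps that the startup code performs before `main` can be modelled on a host as plain memory operations. In STM32CubeIDE startup files these are assembly loops driven by linker-script symbols (such as `_sidata`, `_sdata`, `_edata`, `_sbss` and `_ebss`); the function below is a hypothetical C equivalent that takes explicit pointers and lengths instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the startup code's data initialization.
   data_load: initial values of .data as stored in Flash (load address)
   data_run:  location of .data in RAM (run address)
   bss:       location of .bss in RAM */
void init_data_and_bss(const uint32_t *data_load, uint32_t *data_run, size_t data_words,
                       uint32_t *bss, size_t bss_words)
{
    /* 1. Copy initialized variables from non-volatile Flash into RAM. */
    memcpy(data_run, data_load, data_words * sizeof(uint32_t));

    /* 2. Zero-fill the .bss section. */
    memset(bss, 0, bss_words * sizeof(uint32_t));

    /* 3. The real startup code would now call main(); not modelled here. */
}
```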
C compiler. Command line options
arm-none-eabi-gcc -O2 -g -mcpu=cortex-m4 -mthumb -mfloat-abi=softfp -mfpu=fpv4-sp-d16 -nostartfiles -static -T STM32F407VGTX_RAM.ld -o code.elf -DCRLF startup_stm32f407vgtx.s main.c
Initialization Summary
Memory System
Memory Map
Memory Endianness
[Figure: data on the AHB bus for the Cortex-M3 in byte-invariant big-endian mode (BE-8) and in little-endian mode.]
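To see what the two byte orders mean in practice, the sketch below stores a 32-bit word into four memory bytes in little-endian order and in big-endian order. The functions are illustrative helpers, not part of any library. Most Cortex-M3/M4 microcontrollers, including the STM32 devices, use little-endian memory.

```c
#include <stdint.h>

/* Store 'word' at mem[0..3] with the least significant byte first. */
void store_le32(uint8_t mem[4], uint32_t word)
{
    mem[0] = (uint8_t)(word);
    mem[1] = (uint8_t)(word >> 8);
    mem[2] = (uint8_t)(word >> 16);
    mem[3] = (uint8_t)(word >> 24);
}

/* Store 'word' at mem[0..3] with the most significant byte first. */
void store_be32(uint8_t mem[4], uint32_t word)
{
    mem[0] = (uint8_t)(word >> 24);
    mem[1] = (uint8_t)(word >> 16);
    mem[2] = (uint8_t)(word >> 8);
    mem[3] = (uint8_t)(word);
}
```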
Data alignment and unaligned data access
Bit Banding
• Normally, bit manipulation requires a read-modify-write sequence, which is expensive in terms of the number of CPU cycles taken.
• To overcome this limitation, a technique called bit banding allows direct bit manipulation on sections of the peripheral and SRAM memory, without the need for special instructions.
• When performing bit-banding operations, we need to consider two memory regions:
• Bit-band region [1 MB]
• Bit-band alias region [32 MB]
• Bit banding works by mapping each bit in the bit-band region to a word in the alias region.
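Bit Access to Bit-Band Region
Bit-Banding Mapping
Cortex-M3 Bit Banding
• Calculate the bit-band alias address for a given bit-band memory address and bit position.
• Example: bit 5 of the memory location 0x20000200, accessed through its alias address.
Formula:
bit_band_alias_address = alias_base + (byte_offset x 32) + (bit_number x 4)
where byte_offset = address - region_base; for SRAM, region_base = 0x20000000 and alias_base = 0x22000000.
For the example: 0x22000000 + (0x200 x 32) + (5 x 4) = 0x22000000 + 0x4000 + 0x14 = 0x22004014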
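The bit-band formula can be written as a small C helper. The base addresses used here (SRAM bit-band region 0x20000000 with alias 0x22000000, peripheral bit-band region 0x40000000 with alias 0x42000000) are the Cortex-M3/M4 architectural defaults; the function and macro names are illustrative. On the target, the returned address would be accessed through a `volatile uint32_t *` pointer, so writing 1 or 0 to it sets or clears the single bit.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_BASE       0x20000000u  /* 1 MB SRAM bit-band region */
#define SRAM_BB_ALIAS_BASE 0x22000000u  /* 32 MB SRAM alias region */
#define PERI_BB_BASE       0x40000000u  /* 1 MB peripheral bit-band region */
#define PERI_BB_ALIAS_BASE 0x42000000u  /* 32 MB peripheral alias region */
#define BB_REGION_SIZE     0x00100000u

/* Alias word address for bit 'bit' of the memory at 'addr'.
   Each bit maps to one 32-bit word (4 bytes) of alias space, so each byte
   maps to 32 bytes. 'bit' may be 0-31 when 'addr' is word-aligned. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t region_base, alias_base;

    assert(bit < 32u);
    if (addr >= PERI_BB_BASE && addr < PERI_BB_BASE + BB_REGION_SIZE) {
        region_base = PERI_BB_BASE;
        alias_base  = PERI_BB_ALIAS_BASE;
    } else {
        assert(addr >= SRAM_BB_BASE && addr < SRAM_BB_BASE + BB_REGION_SIZE);
        region_base = SRAM_BB_BASE;
        alias_base  = SRAM_BB_ALIAS_BASE;
    }
    return alias_base + (addr - region_base) * 32u + bit * 4u;
}
```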
Read-Modify-Write vs Bit Banding
[Figure: assembler sequence to write a bit with and without bit-band.]
Cortex-M3 Bit Banding
• Writes to a word address in the bit-band alias affect a single bit in the bit-band region.
• The write is translated to an atomic read-modify-write by the Cortex-M3 bus matrix.
• Bit 0 of the stored register is written to the appropriate bit.
Advantages of Bit-Band
• Bit-Band vs. Bit-Bang
• In the Cortex-M3, we use the term bit-band to indicate that the feature is a special memory band (region) that provides bit accesses.
• Bit-bang commonly refers to driving I/O pins under software control to provide serial communication functions. The bit-band feature in the Cortex-M3 can be used for bit-banging implementations, but the definitions of these two terms are different.
• Without bit-band, checking a status bit requires reading the whole register, masking the unwanted bits, then comparing and branching.
You can simplify the operations by bit banding to:
• Reading the status bit via the bit-band alias (get 0 or 1), then comparing and branching
Memory Access Attributes
Memory Types and Properties
Memory Attributes in Relation to Memory Types
Multi-Processor System Shareable Memory
[Figure: buffered write operation.]
Exclusive Access via Mutex Semaphore
Memory System in a Microcontroller
In many microcontroller devices, the designs integrate additional
memory system features such as:
• Boot loader
• Memory remapping
• Memory alias
There are many different reasons why chip designers put a boot loader
into the system. For example, to:
• Provide a flash programming utility, so that you can program the
flash using a simple UART interface, or even program some parts of
the flash memory dynamically.
• Provide Built-In Self Test (BIST) for the chip.
Bootloaders and Memory Remapping
• For chips with a boot loader ROM, the boot loader is executed when the system is started, so it has to be located at address 0 when the system starts at power up.
• However, the next time the system starts, it might not need to
execute the boot loader again and can run the application in the
flash directly, so the memory map needs to be changed.
• In order to do this, the address decoder needs to be
programmable. A hardware register (e.g., a peripheral register in
a system control unit) can be used.
• The operation to switch the memory map is called “Memory
Remap.” This operation is done by the boot loader.
Memory Remap Implementation with Boot Loader
Possibilities of Remapping
Flash Memory Starting Address after Reset
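All ARM Cortex-M based MCUs, right after reset:
1) Load the value at memory address 0x00000000 into the MSP
2) Load the value at memory address 0x00000004 into the PC (this value is the address of the reset handler)
In STM32 microcontrollers (when booting from the main Flash, which starts at 0x08000000 and is aliased to address 0x00000000):
1) The initial MSP value is stored at 0x08000000, the first word of the vector table
2) The exception vectors start from 0x08000004
3) The address of the reset handler is found at 0x08000004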
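The two reads performed at reset can be modelled by treating the start of the Flash image as an array of 32-bit words. This is a hypothetical host-side sketch: on an STM32, word 0 at 0x08000000 holds the initial MSP value and word 1 holds the reset handler address with bit 0 set for Thumb state; the core forces the low bits of the SP to zero and moves bit 0 of the vector into the T bit when loading the PC.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t msp;    /* initial Main Stack Pointer */
    uint32_t pc;     /* first instruction of the reset handler */
    bool     thumb;  /* T bit, taken from bit 0 of the reset vector */
} reset_state_t;

/* Model of the hardware reset sequence, given the first words of the
   vector table (for example the start of Flash at 0x08000000 on STM32). */
reset_state_t model_reset(const uint32_t *vector_table)
{
    reset_state_t s;
    s.msp   = vector_table[0] & ~3u;            /* word at offset 0x0, word-aligned */
    s.thumb = (vector_table[1] & 1u) != 0u;     /* must be 1 on Cortex-M */
    s.pc    = vector_table[1] & ~1u;            /* word at offset 0x4, bit 0 cleared */
    return s;
}
```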
Exception Type
Exception types
List of Interrupts
Commonly Used CMSIS-Core Functions
Definition of Priority
Group Priority
• If the priority-level configuration registers are 8 bits wide, why are there only 128 pre-emption levels?
• This is because the 8-bit register is further divided into two parts: group priority and sub-priority.
• Using a configuration field called PRIGROUP (Priority Group) in the SCB's Application Interrupt and Reset Control Register (AIRCR), the priority-level configuration register of each exception with a programmable priority level is divided into two parts. The sub-priority always gets at least one bit, so at most 7 bits (128 levels) remain for the group priority.
• The group priority level defines whether an interrupt can take place when the processor is already running another interrupt handler.
• The sub-priority level is used only when two exceptions with the same group-priority level occur at the same time.
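The split can be computed directly. The sketch below follows the ARMv7-M definition of PRIGROUP: for PRIGROUP = n, bits [7:n+1] of the 8-bit priority value form the group priority and bits [n:0] form the sub-priority, so even PRIGROUP = 0 leaves one sub-priority bit and at most 128 pre-emption levels. Devices that implement fewer priority bits (four on STM32F4, in the upper bits of the byte) leave the unused low bits at zero. The function names are illustrative.

```c
#include <stdint.h>

/* Group (pre-emption) priority: bits [7:prigroup+1], prigroup in 0..7. */
uint32_t group_priority(uint8_t priority, uint32_t prigroup)
{
    return (uint32_t)priority >> (prigroup + 1u);
}

/* Sub-priority: bits [prigroup:0]. */
uint32_t sub_priority(uint8_t priority, uint32_t prigroup)
{
    return (uint32_t)priority & ((1u << (prigroup + 1u)) - 1u);
}

/* Number of distinct pre-emption levels for a PRIGROUP setting. */
uint32_t preemption_levels(uint32_t prigroup)
{
    return 1u << (7u - prigroup);
}
```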
Group Priority
Vector Table and its Relocation
• When the Cortex-M processor accepts an exception request, the
processor needs to determine the starting address of the
exception handler (or ISR if the exception is an interrupt).
• This information is stored in the vector table in the memory. By
default, the vector table starts at memory address 0.
• The vector table is normally defined in the startup codes provided
by the microcontroller vendors.
• Usually, the starting address (0x00000000) should be boot
memory, and it will usually be either flash memory or ROM
devices.
• The Vector Table Relocation feature provides a programmable register called the Vector Table Offset Register (VTOR).
Vector Table Offset Register
Vector Table Relocation for Boot ROM
Booting from Pen Drive or SD Card
Interrupt Inputs and Pending Behavior
There are various status attributes applicable to each interrupt:
• Each interrupt can either be disabled (default) or enabled
• Each interrupt can either be pending (a request is waiting to be
served) or not pending
• Each interrupt can either be in an active (being served) or
inactive state
An interrupt request can be accepted
by the processor if:
• The pending status is set,
• The interrupt is enabled, and
• The priority of the interrupt is higher than the current level
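The three acceptance conditions can be expressed as a single predicate. In this sketch a numerically lower priority value means higher urgency, as on Cortex-M, and the current level is the priority of the running exception. The type, the `THREAD_MODE_LEVEL` constant and the function name are hypothetical, and priority masking (PRIMASK/BASEPRI) and the group/sub-priority split are not modelled.

```c
#include <stdbool.h>

typedef struct {
    bool pending;   /* a request is waiting to be served */
    bool enabled;   /* set through the NVIC enable register */
    int  priority;  /* lower number = higher priority */
} irq_state_t;

/* Current level in Thread mode with no active exception: below every
   programmable priority, so any enabled, pending interrupt is accepted. */
#define THREAD_MODE_LEVEL 256

/* True if the processor would accept (start servicing) this interrupt now.
   An equal priority is not enough: it must be strictly higher. */
bool irq_can_be_accepted(const irq_state_t *irq, int current_level)
{
    return irq->pending && irq->enabled && irq->priority < current_level;
}
```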
Interrupt Pending and Activation Behavior
Registers in the NVIC for Interrupt Control
Exception Handling in Detail
Exception Entrance and Stacking
Nested Interrupt Stacking
Feature Description of NVIC
Note: Interrupt inputs are active HIGH.
NVIC Operations: Exception Entry/Exit
Interrupt Preemption
Tail Chaining
Pop Preemption
Late Arrival
Exception for System Level Services
• ARM Cortex-M processors support two important system-level exceptions:
• SVC (SuperVisor Call)
• PendSV (Pendable SerVice)
• Supervisor calls are typically used to request privileged operations or access to system resources from an operating system.
• The SVC exception is mainly used in an OS environment. For example, an unprivileged user task can trigger an SVC exception to get system-level services (e.g., accessing device drivers or peripherals) from the OS kernel.
• PendSV is mainly used in an OS environment to carry out the context-switching operation between two tasks when no other exceptions are active in the system.
SVC
• The SVC handler executes right after the SVC instruction is executed, unless
a higher-priority exception arrives at the same time.
• The SVC instruction always takes an 8-bit immediate number (0 to 255), which can be used to
identify the request type.
__asm volatile("SVC #0"); // call SVC 0
• The OS keeps a lookup table of SVC numbers and calls the handler
corresponding to the request number.
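A minimal sketch of the SVC lookup-table dispatch, assuming the standard exception stack frame order (r0, r1, r2, r3, r12, lr, pc, xPSR) and the 16-bit Thumb SVC encoding `0xDF00 | imm8`. The service routines, the table contents and the error code are hypothetical. On target, a short assembly wrapper would select MSP or PSP (from bit 2 of EXC_RETURN in LR) and pass that stack pointer as `frame`; here the opcode is passed in directly so the dispatch logic can run off-target.

```c
#include <stdint.h>

/* Hypothetical OS services; names and numbering are illustrative only. */
typedef uint32_t (*svc_fn_t)(uint32_t arg0, uint32_t arg1);

static uint32_t svc_get_version(uint32_t arg0, uint32_t arg1)
{
    (void)arg0;
    (void)arg1;
    return 0x0100u;                  /* hypothetical version number */
}

static uint32_t svc_add(uint32_t arg0, uint32_t arg1)
{
    return arg0 + arg1;
}

/* Lookup table indexed by the SVC number: SVC #0, SVC #1, ... */
static const svc_fn_t svc_table[] = { svc_get_version, svc_add };
#define SVC_COUNT ((uint8_t)(sizeof svc_table / sizeof svc_table[0]))
#define SVC_ERROR 0xFFFFFFFFu        /* hypothetical "unknown request" code */

/* The immediate is the low byte of the 16-bit Thumb SVC opcode. */
uint8_t svc_number(uint16_t insn)
{
    return (uint8_t)(insn & 0xFFu);
}

/* frame points at the stacked r0..xPSR. On target the opcode is the halfword
   just before the stacked PC: insn = ((const uint16_t *)frame[6])[-1]. */
void svc_dispatch(uint32_t *frame, uint16_t insn)
{
    uint8_t n = svc_number(insn);
    if ((insn & 0xFF00u) != 0xDF00u || n >= SVC_COUNT) {
        frame[0] = SVC_ERROR;
        return;
    }
    /* Arguments come from stacked r0/r1; the result is written to stacked r0,
       which the caller sees in r0 after exception return. */
    frame[0] = svc_table[n](frame[0], frame[1]);
}
```

With this table, a task that executes `SVC #1` with 2 in r0 and 3 in r1 gets 5 back in r0.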
PendSV
System Without PendSV
Low Power and System Control Features
CDAC ACTS, Pune
Power Management
• Multiple sleep modes supported, controlled by the NVIC
• Sleep-Now: entered with the WFI/WFE (Wait for Interrupt/Event) instructions
• Sleep-on-Exit: sleep immediately on return from the last ISR
• Deep Sleep: long-duration sleep, so the PLL can be stopped; exports an
additional output signal, SLEEPDEEP
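The sleep variant is selected through the System Control Register before executing WFI or WFE. A minimal sketch, assuming the ARMv7-M SCR address (0xE000ED10) with SLEEPONEXIT at bit 1 and SLEEPDEEP at bit 2; `scr_configure` is a hypothetical helper that takes the register pointer so it can be tested off-target. The CMSIS-Core equivalent of the deep-sleep case is `SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk;` followed by `__WFI();`.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCB_SCR_ADDR     0xE000ED10u  /* System Control Register */
#define SCR_SLEEPONEXIT  (1u << 1)    /* sleep on return from the last ISR */
#define SCR_SLEEPDEEP    (1u << 2)    /* select deep sleep (drives SLEEPDEEP) */

/* Read-modify-write so other SCR bits (e.g. SEVONPEND) are preserved. */
void scr_configure(volatile uint32_t *scr, bool deep, bool sleep_on_exit)
{
    uint32_t v = *scr;
    v = deep ? (v | SCR_SLEEPDEEP) : (v & ~SCR_SLEEPDEEP);
    v = sleep_on_exit ? (v | SCR_SLEEPONEXIT) : (v & ~SCR_SLEEPONEXIT);
    *scr = v;
}

/* On target, Deep Sleep would then be entered with:
       scr_configure((volatile uint32_t *)SCB_SCR_ADDR, true, false);
       __WFI();                                                       */
```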
Clock Gating
• Clock gating is a popular technique
used in many synchronous circuits for
reducing dynamic power dissipation: the clock to
circuit blocks that are not in use is switched off.
Various Power Modes
Entering Low Power Modes
Wakeup Interrupt Controller
WIC
WIC Entry
State Retention Power Gating
• In some designs, advanced power-saving techniques called
State Retention Power Gating (SRPG) can be used to reduce
the leakage current of the chip by a wide margin.
• In SRPG designs, the registers (often called flip-flops in IC
design terminology) have a separate power supply for the state
retention elements inside the registers.
• When the system is in Deep Sleep mode, the normal power
supply can be turned off, leaving only the power to the state
retention elements on.
• The leakage in this type of design is greatly reduced because
the combinational logic, clock buffers, and most parts of the
registers are powered down.
SRPG
System Timer (SysTick)
Cortex System Clock Tree
Cortex-M3 Pipeline
Cortex-M3 has a 3-stage fetch-decode-execute pipeline
Similar to ARM7
Cortex-M3 does more in each stage to increase
overall performance
• Fetch: instruction fetch (prefetch)
• Decode: decode & register read; branch (branch forwarding & speculation)
• Execute: multiply & divide, shift, ALU & branch, write
The Cortex Microcontroller Software Interface Standard (CMSIS)
• CMSIS was developed by ARM to allow microcontroller and software vendors to use a
consistent software infrastructure to develop software solutions for Cortex-M
microcontrollers.
• Many software products for Cortex-M microcontrollers are CMSIS-compliant.
Currently the Cortex-M microcontroller market comprises:
• More than 15 microcontroller vendors shipping Cortex-M microcontroller
products, with some other silicon vendors providing Cortex-M based
FPGA and ASICs
• More than 10 toolchain vendors
• More than 30 embedded operating systems
• Additional Cortex-M middleware software providers for codecs, communication
protocol stacks, etc.
With such a large ecosystem, some form of standardization of the way the software
infrastructure works becomes necessary to ensure software compatibility with
various development tools and between different software solutions.
CMSIS
To increase the interoperability of various software components, ARM worked with various
microcontroller vendors, tools vendors, and software solution providers to develop CMSIS, a software
framework covering most Cortex-M processors and Cortex-M microcontroller products.
Standardization in CMSIS-Core
• Standardized functions to access the processor’s features -
These include various functions for interrupt control using the NVIC, and functions for
accessing special registers in the processor.
• Standardized functions for system initialization - Most modern feature-rich
microcontroller products require some configuration of the clock circuitry and power
management registers before the application starts. In CMSIS-compliant
device-driver libraries, these configuration steps are placed in a function called
“SystemInit().” Having a standardized function name, and a standardized location
where this function can be found, makes it much easier for a designer to pick up and start
using a new Cortex-M microcontroller device.
• Standardized software variables for clock speed information - This
might not be obvious, but application code often needs to know the clock
frequency the system is running at. For example, such information might be needed for
setting up the baud rate divider in a UART, or to initialize the SysTick timer for an
embedded OS. A software variable called “SystemCoreClock” is defined in CMSIS-Core
for this purpose.
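To illustrate why SystemCoreClock matters, here is a minimal sketch of the reload calculation behind an OS tick. It assumes a 16 MHz core clock and defines `SystemCoreClock` locally as a stand-in; on a real CMSIS device the variable comes from the vendor's system file, and the CMSIS-Core call `SysTick_Config(SystemCoreClock / 1000u)` performs this range check and programs the timer itself. The helper name `systick_reload_for` is hypothetical.

```c
#include <stdint.h>

/* Stand-in for the CMSIS variable; assumed 16 MHz core clock. */
uint32_t SystemCoreClock = 16000000u;

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* SysTick reload value is 24 bits */

/* Returns the SysTick reload value for the requested tick rate, or 0 if the
   rate is zero or the period does not fit in the 24-bit counter. */
uint32_t systick_reload_for(uint32_t tick_hz)
{
    if (tick_hz == 0u)
        return 0u;
    uint32_t ticks = SystemCoreClock / tick_hz;
    if (ticks == 0u || ticks - 1u > SYSTICK_MAX_RELOAD)
        return 0u;
    /* The counter runs from the reload value down to 0, i.e. reload + 1
       clock cycles per tick. */
    return ticks - 1u;
}
```

With a 16 MHz clock, a 1 kHz tick needs a reload value of 15999; at 168 MHz a 1 Hz tick would not fit in 24 bits, so a slower clock source or a software prescaler would be needed.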
Organization of CMSIS-Core
In a general sense, we can divide CMSIS into multiple layers:
• Core Peripheral Access Layer - Name definitions, address definitions, and helper functions
to access core registers and core peripherals. This is processor specific and is provided by ARM.
• Device Peripheral Access Layer - Name definitions, address definitions of peripheral
registers, as well as system implementations including interrupt assignments, exception vector
definitions, etc. This is device specific (note: multiple devices from the same vendor might use
the same file set).
• Access Functions for Peripherals - The driver code for peripheral accesses. This is vendor
specific and is optional. You can choose to develop your application using the peripheral driver
code provided by the microcontroller vendor, or you can program the peripherals directly if you
prefer.
There is also a proposed additional layer for peripheral accesses:
Middleware Access Layer - This layer does not exist in the current version of CMSIS. The idea is
to develop a set of APIs for interfacing common peripherals such as UART, SPI, and Ethernet. If
this layer existed, middleware developers could build their applications on it, allowing
software to be ported between devices easily.
CMSIS-Core structure
Using CMSIS in Project
Cortex-A8
• ARMv7-A Architecture
• Thumb-2
• Thumb-2EE (Jazelle-RCT)
• TrustZone extensions
• Custom or synthesized design
• MMU
• 64-bit or 128-bit AXI Interface
• L1 caches
• 16 or 32KB each
• Unified L2 cache
• 0-2MB in size
• 8-way set-associative
Optional features
VFPv3 Vector Floating-Point
NEON media processing engine
Super-scalar 13-stage pipeline
Superscalar Processor
• An approach that equips the processor with multiple processing
units to handle several instructions in parallel in each processing stage.
• With this arrangement, several instructions start execution in the
same clock cycle, and the processor is said to use multiple issue.
• Such processors are capable of achieving an instruction execution
throughput of more than one instruction per cycle. They are
known as ‘superscalar processors’.
Cortex-R: Ideal for safety-critical applications
Safety features
• Supports lockstep
• Memory Protection Unit (MPU)
• Error-Correcting Code (ECC)
Higher performance
• 8-stage processor pipeline
• Dual issue: two instructions can execute in parallel
[Figure: Lockstep implementation, with compare logic, output and error signals, control, CCM, cycle delay and self test]
Cortex-R4 features
Architecture: Armv7-R
Microarchitecture: Eight-stage pipeline with instruction pre-fetch, branch prediction and selected dual-issue execution.
Parallel execution paths for load-store, MAC, shift-ALU, divide and floating point.
Tightly-Coupled Memories: Optional Tightly-Coupled Memory (TCM) interfaces are used for highly deterministic or low-latency
applications that may not respond well to caching (e.g. instruction code for interrupt service routines and data that
requires intense processing). One or two logical TCMs, A and B, can be used for any mix of code and data.
TCM size can be up to 8 MB.
Interrupt Interface: Standard interrupt (IRQ) and non-maskable fast interrupt (FIQ) inputs are provided, together with a
vector port for a VIC interrupt controller. The GIC interrupt controller can also be used if more complex priority-based
interrupt handling is required. The processor includes low-latency interrupt technology that allows long multi-cycle
instructions to be interrupted and restarted. Lengthy memory accesses are also deferred in certain circumstances.
Worst-case interrupt response can be as low as 20 cycles using the FIQ alone.
ECC: Optional single-bit error correction and double-bit error detection for cache and/or TCM memories with ECC bits.
Single-bit soft errors are automatically corrected by the processor. ECC protection is possible on all external interfaces.
Dual-core: A dual-core processor configuration implements a redundant Cortex-R4 CPU in lockstep with offset clocks
and comparison logic for fault-tolerant/fault-detecting dependable systems.
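The single-bit-correct, double-bit-detect behaviour described under ECC can be illustrated with a textbook extended Hamming (8,4) code. This is a swapped-in teaching example on a 4-bit value, not the Cortex-R4's actual ECC scheme (which protects much wider words); `secded_encode`, `secded_decode` and `ecc_status_t` are hypothetical names.

```c
#include <stdint.h>

/* Bit i of the codeword holds Hamming position i: 0 = overall parity,
   1, 2, 4 = check bits, 3, 5, 6, 7 = data bits. */
static unsigned bit(uint8_t w, unsigned pos) { return (w >> pos) & 1u; }

typedef enum { ECC_OK, ECC_CORRECTED, ECC_DOUBLE_ERROR } ecc_status_t;

uint8_t secded_encode(uint8_t data)  /* uses the low 4 bits of data */
{
    uint8_t w = (uint8_t)(((data & 1u) << 3) | (((data >> 1) & 1u) << 5) |
                          (((data >> 2) & 1u) << 6) | (((data >> 3) & 1u) << 7));
    unsigned p1 = bit(w, 3) ^ bit(w, 5) ^ bit(w, 7);
    unsigned p2 = bit(w, 3) ^ bit(w, 6) ^ bit(w, 7);
    unsigned p4 = bit(w, 5) ^ bit(w, 6) ^ bit(w, 7);
    w |= (uint8_t)((p1 << 1) | (p2 << 2) | (p4 << 4));
    unsigned all = 0;
    for (unsigned pos = 1; pos < 8; ++pos)
        all ^= bit(w, pos);
    return (uint8_t)(w | all);       /* overall parity makes total parity even */
}

ecc_status_t secded_decode(uint8_t word, uint8_t *data)
{
    unsigned syndrome = 0, parity = 0;
    for (unsigned pos = 0; pos < 8; ++pos) {
        if (bit(word, pos)) {
            syndrome ^= pos;         /* XOR of set positions; 0 when valid */
            parity ^= 1u;
        }
    }
    ecc_status_t status = ECC_OK;
    if (parity) {                    /* odd parity: assume a single-bit error */
        word ^= (uint8_t)(1u << syndrome);  /* syndrome 0 = parity bit itself */
        status = ECC_CORRECTED;
    } else if (syndrome != 0u) {     /* even parity, bad syndrome: two errors */
        return ECC_DOUBLE_ERROR;
    }
    *data = (uint8_t)(bit(word, 3) | (bit(word, 5) << 1) |
                      (bit(word, 6) << 2) | (bit(word, 7) << 3));
    return status;
}
```

For example, the value 11 (binary 1011) encodes to 0xAA; flipping any one bit of 0xAA still decodes to 11 with `ECC_CORRECTED`, while flipping two bits is reported as `ECC_DOUBLE_ERROR`.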
Enabling key IoT technologies in mbed!
Interview Tips
1. Your resume is the index of your knowledge; you should be able to explain every keyword in it.
2. C language: complete Ashok Pathak (Advance Test in C) and Cracking the IT Interview.
3. C++: also cover the C++ chapter from Cracking the IT Interview.
4. Microcontrollers: study the ARM architecture, and pay particular attention to buses such as SPI, I2C, CAN, and UART.
5. OS: focus on the user-space side of the OS, such as threads, IPC, deadlocks, semaphores,
mutexes, and critical sections.
6. You should know everything that is mentioned in your CV.
7. Use diagrams while explaining in offline interviews, and use Paint in online interviews (have a few
diagrams ready before the interview).
8. Practice by recording your introduction and project information on your mobile phone.
9. When giving your introduction, always prefer this sequence: name, native place,
10th, 12th, B.Tech, CDAC, and end by mentioning a few keywords from your project.
10. Use pitch manipulation and keywords to steer the interview in your direction, since your
answer frames the next question.
11. Offer a variety of coding approaches when answering questions.
12. Things also depend on your second account; pay attention to it.
13. Practice solving questions on HackerRank and HackerEarth to avoid first-day panic in the written
exam.
Project Discussion-1