CO Slides Unit1 Part1

This document outlines a course on ARM processors. Module 1 introduces basic computer architecture concepts like the Harvard and Von Neumann architectures. It also covers ARM processor fundamentals like registers and pipelines. Module 2 discusses the ARM9 instruction set, including data processing, branch, load-store, and other instructions. Module 3 covers arithmetic units and multiplication/division algorithms. Module 4 discusses processor design topics. Module 5 covers input/output and memory system organization. The document lists textbooks and provides differences between computer organization and architecture. It also outlines the computer organization laboratory.

Uploaded by

Vinod Kumar

Outline of the Course

Module 1
-Introduction
• Basic structure
• Basic operational concepts of computers
• Performance considerations
• Different architectures: Harvard architecture, Von Neumann architecture, RISC and CISC

-ARM Processor Fundamentals
• Registers, current program status register, pipeline, exceptions, interrupts and vector table
Outline of the Course
Module 2
-Introduction to ARM9 Instruction Set
• Data processing instructions
• Branch instructions
• Load-store instructions
• Software interrupt instruction
• Program status register instructions
• Loading constants and conditional execution
• Introduction to Thumb instruction set
• Programming examples
Outline of the Course
Module 3
-Arithmetic Unit
• Booth's algorithm for multiplication
• High speed multipliers, division
• Arithmetic operations on floating point numbers (IEEE std)
Outline of the Course

Module 4
-Processor Design
• Design of a basic processor
• Units in instruction execution
• Functional units and their interconnection
• Hardware for generating internal control signals
• Microprogramming approach
Outline of the Course
Module 5
-Input/Output Organization
• Accessing I/O devices
• Program controlled I/O
• Interrupts and supporting software and hardware
• DMA
• Design of interface circuits

-Memory System
• Organization of memory
• Hierarchical memory system
• Cache memory and its operation
• Cache memory mapping
Text and Reference Books
Book Type | Code | Author and Title | Edition | Publisher | Year
Text Book | 1 | Computer Organization by Carl Hamacher, Z. Vranesic and Zaky | 5th | McGraw Hill | 2011
Text Book | 2 | ARM System Developer's Guide: Designing and Optimizing System Software by Andrew N. Sloss, Dominic Symes and Chris Wright | - | Morgan Kaufmann | 2004
Reference Book | 3 | Computer Architecture and Organisation by J.P. Hayes | 2nd | Tata McGraw-Hill | 1988
Reference Book | 4 | Computer Organization and Design by David A. Patterson and John L. Hennessy | 4th | Morgan Kaufmann | 2010
Computer Organization Laboratory
• Design and implement applications on ARM based controllers.
• Write applications in embedded C, with an introduction to assembly and Python in embedded systems.
• Interface peripherals with standard buses like UART and SPI.
• Understand embedded system's hardware components and
software tool chain.
• Design an embedded system, debug and test it.
Differences between Computer Organization and Computer Architecture
• Computer Organization: refers to the operational units and their interconnections that realize the architectural specifications.
• Computer Architecture: refers to the overall design or structure of a computer system, including the hardware and the software required to run it, especially the internal structure of the microprocessor.
Unit 1: Introduction
Computer Types
• A computer is a fast electronic calculating machine that stores and processes data, typically in binary form, according to instructions given to it in the form of a program

• The basic functional units of a computer are electronic circuits that work with electrical signals
Computer Types
There are many types of computers, which differ in size, cost, computational power and intended use. They are:
• Desktop computers or personal computers
• Portable notebook computers
• Workstations
• Mainframes
• Supercomputers

Portable notebook computers:
• A portable notebook computer is a compact version of a personal computer, with all components of a PC packaged into a single unit that is handy and portable.
• A laptop is an example of this type of computer.
Computer Types
Workstations:
• A workstation is a personal computer with high computational power, high-resolution graphics and improved input/output capabilities.
• It is often used in engineering applications and interactive design work.
Mainframes:
• A mainframe computer is a large data processing system used in medium and large-sized business units.
• It is implemented using two or more central processing units and is designed to operate at very high speeds on large volumes of data.
• A mainframe is also known as an enterprise system.
Computer Types
Super Computers:
• A super computer is a high-performance computing device
meant for large scale numerical calculations required in
applications such as weather forecasting, aircraft design and
simulation.
Functional Units
• A computer in its simplest form consists of five components.
They are: input, output, memory, arithmetic and logic unit,
and control unit as shown in the following Figure.

Figure 1: Basic Functional units of a computer


Functional Units
Input Unit
• Computers accept coded information through input units
Ex: Keyboard, mouse, joystick, microphone
Functional Units
Input Unit

How can information be provided to computers?

• Machine language: a computer programming language consisting of binary or hexadecimal instructions that a computer can respond to directly
• Assembly language: a low-level symbolic code that is converted to machine language by an assembler
• High-level language: a programming language with strong abstraction from the details of the computer
Functional Units
Memory Unit

• Storage unit used to store programs and data

What are programs?
• A set of instructions arranged in some order to perform a task

What are instructions?
• Commands that direct the computer to perform a basic operation
• Each has two fields: opcode and operands
Ex: ADD R0, R1, R2
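As a rough illustration of the two fields, the sketch below splits an assembly-style instruction string into its opcode and its operand list. The function name `split_instruction` is hypothetical, and a real assembler does far more than this (encoding, addressing modes, error checking).

```python
def split_instruction(text):
    """Split an instruction such as 'ADD R0, R1,R2' into (opcode, operands)."""
    # The opcode is the first space-separated token; the rest is the operand field.
    opcode, _, operand_field = text.strip().partition(" ")
    # Operands are comma-separated; surrounding spaces are not significant.
    operands = [op.strip() for op in operand_field.split(",") if op.strip()]
    return opcode.upper(), operands


opcode, operands = split_instruction("ADD R0, R1,R2")
print(opcode, operands)  # ADD ['R0', 'R1', 'R2']
```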
Functional Units
Memory Unit
• Two classes of storage: primary and secondary

Primary Storage
• Fast memory that operates at electronic speeds
• Contains a large number of semiconductor storage cells, each capable of storing one bit of information
• A word in memory is accessed using the distinct address associated with its location
Functional Units
Memory Unit

• The number of bits in each word is often referred to as the word length of the computer, which is usually in the range of 16 to 64 bits
• The capacity of the memory is one factor that characterizes the size of a computer
• When the memory is accessed, usually one word of data is read or written
Functional Units
Memory Unit

• RAM: Random access memory, in which any location can be reached in a short and fixed amount of time after specifying its address
• The time required to access one word is called the access time (10 ns to 100 ns)
• The small, fast RAM units are called caches; they are tightly coupled with the processor and often contained on the same IC to achieve high performance
• The largest and slowest unit is referred to as main memory
Secondary Memory
• Used to store large amounts of data and programs that are not accessed frequently
Ex: Magnetic disks, tapes, CD-ROMs (optical disks)
Functional Units
Arithmetic and logic Unit

• Most computer operations are executed in the ALU of the processor
Ex: Addition of two numbers
• When operands are brought into the processor, they are stored in high-speed storage elements called registers
• Each register can store one word of data
• Access time to registers is shorter than access time to the fastest cache unit (as seen in memory hierarchy design)
Functional Units
Output Unit

• Sends processed results to the outside world
Ex: Printer
• Some units, such as graphic displays, provide both an output function and an input function. Because of this dual role, such units are called I/O units.
Functional Units
Control Unit

• I/O transfers are controlled by the instructions of I/O programs, which identify the devices involved and the information to be transferred
• The actual timing signals that govern the transfers are generated by control circuits
• Timing signals determine when a given action is to take place
• Data transfers between the processor and the memory are also controlled by the control unit through timing signals
Functional Units
Summary of Basic Functional Units
The operations performed by a computer using the functional
units can be summarized as follows:
• It accepts information (program and data) through input unit
and transfers it to the memory
• Information stored in the memory is fetched, under program
control, into an arithmetic and logic unit (ALU) for processing
• Processed information leaves the computer through an output
unit
• The control unit controls all activities taking place inside a
computer.
Basic Operational Concepts
Interconnection between Processor and Memory
Basic Operational Concepts
• Activity in a computer is governed by instructions
• The list of instructions (in the form of data) to perform a task is stored in memory
• Individual instructions are brought one after the other from memory into the processor, which executes the operations specified
Basic Operational Concepts
Steps involved in execution of an instruction

• The instruction is fetched from memory into the processor
• Example: ADD LOCA, R0
• The operand at memory location LOCA is added to the contents of register R0
• The sum is placed in the destination register R0, replacing its previous contents

[LOCA] + [R0] → R0
Basic Operational Concepts
Alternate method
How is addition performed when the Add instruction cannot access memory directly?
• Ex: Load LOCA, R1
      Add R1, R0
• The previous contents of R1 and R0 are overwritten
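The two ways of performing the addition can be compared with a small simulation. This is only a sketch: memory and registers are modelled as Python dictionaries, the function names are hypothetical, and the stored values are made up for illustration.

```python
def add_memory_operand(memory, regs):
    """ADD LOCA, R0: add the operand at LOCA to R0; the sum replaces R0."""
    regs["R0"] = memory["LOCA"] + regs["R0"]


def load_then_add(memory, regs):
    """Load LOCA, R1 followed by Add R1, R0 (the Add uses registers only)."""
    regs["R1"] = memory["LOCA"]             # Load LOCA, R1 overwrites R1
    regs["R0"] = regs["R1"] + regs["R0"]    # Add R1, R0 overwrites R0


memory = {"LOCA": 7}               # illustrative value stored at LOCA
regs_a = {"R0": 5, "R1": 99}       # illustrative register contents
regs_b = dict(regs_a)

add_memory_operand(memory, regs_a)
load_then_add(memory, regs_b)
print(regs_a)  # {'R0': 12, 'R1': 99}
print(regs_b)  # {'R0': 12, 'R1': 7}
```

Both sequences leave the same sum in R0 and leave LOCA unchanged; the two-instruction version additionally destroys the old contents of R1.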
Interconnection between Processor and Memory
Basic Operational Concepts
Registers
• Instruction Register (IR): holds the instruction that is currently being executed (e.g., ADD LOCA, R0)
• Program Counter (PC): holds the address of the next instruction to be executed
• General Purpose Registers (GPR): R0 to Rn-1, used to hold the information needed to process an instruction
Basic Operational Concepts
Registers
• Temporary registers: internal to the processor and not accessible by the user
• Memory Address Register (MAR): holds the address of the location to be accessed (e.g., the address LOCA in ADD LOCA, R0)
• Memory Data Register (MDR): contains the data to be written into or read out of the addressed location
Basic Operational Concepts
Execution of program
Instruction Execution
• Execution begins when the PC is set to point to the first instruction
• [PC] → MAR, and a Read control signal is sent to the memory
• After the access time has elapsed, the addressed word is loaded into the MDR
• [MDR] → IR
• The instruction is now ready to be decoded and executed
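The fetch steps above can be traced with a toy model. The class name `TinyProcessor` is hypothetical, memory is a dictionary from address to stored word, and the sketch assumes word-addressed memory so that the PC advances by 1 per instruction. Real processors typically advance by the instruction size in bytes.

```python
class TinyProcessor:
    """Toy model of the registers involved in fetching one instruction."""

    def __init__(self, memory, start_address):
        self.memory = memory       # maps address -> stored word
        self.pc = start_address    # PC is set to the first instruction
        self.mar = None
        self.mdr = None
        self.ir = None

    def fetch(self):
        self.mar = self.pc                 # [PC] -> MAR, Read signal sent
        self.mdr = self.memory[self.mar]   # addressed word arrives in MDR
        self.ir = self.mdr                 # [MDR] -> IR
        self.pc += 1                       # PC now points to the next instruction
        return self.ir


program = {0: "Load LOCA, R1", 1: "Add R1, R0"}
cpu = TinyProcessor(program, start_address=0)
print(cpu.fetch(), "| PC =", cpu.pc)  # Load LOCA, R1 | PC = 1
print(cpu.fetch(), "| PC =", cpu.pc)  # Add R1, R0 | PC = 2
```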


Basic Operational Concepts
Execution of program

• During execution of the instruction, the PC is incremented to point to the next instruction

Note: The computer can also accept data from input devices and send data to output devices

• Normal execution of programs can be preempted if some device needs urgent servicing
• E.g., a monitoring device in a computer-controlled industrial process may detect a dangerous condition
Basic Operational Concepts
Execution of program
ALU operation
• If an instruction involves an ALU operation, its operands are obtained from registers or from memory; a memory operand requires a Read cycle
• If an operand resides in memory, it is read into the MDR and then transferred to the ALU
• The ALU operation is carried out after all operands have been obtained; if the result is to be stored in memory, it is sent to the MDR
• The address of the location where the result is to be stored is sent to the MAR, and a Write cycle is initiated
Bus Structure
• To form an operational system, all functional parts must be connected in some organized way
• To achieve a reasonable speed of operation, all parts of the computer need to be organized so that they can handle one full word of data at a given time
• When a word is transferred between units, all its bits are transferred simultaneously (in parallel) over many wires/lines, one bit per line
Bus Structure
• A group of lines that serves as a connecting path for several devices is called a bus
• A bus can carry data, address or control signals

Single Bus Transfer
• A simple way of connecting functional units
• Only two units can actively use the bus at any given time, because the bus supports one transfer at a time
Bus Structure
Single Bus Transfer

• Bus arbitration (with the help of control lines) is used to handle multiple requests for use of the bus
• Advantages: simple, low cost, and flexible for attaching several peripheral devices

Bus Structure
Multiple Bus Structure
• Multiple bus organization is primarily used in industrial systems
• In this structure, devices that have different transfer rates can be connected
• Maximum throughput is maintained, and a larger number of devices can be connected to the computer
Bus Structure
Communication issues

• Mismatch in speeds among the devices connected to a computer
Ex: Slower devices: keyboard, printer (I/O)
    Faster devices: processor, memory
• An efficient transfer mechanism is required to cope with the speed mismatch
• Common approach: inclusion of buffer registers

Bus Structure
Solution for Speed mismatch
Ex: A printer is slow compared to the processor
• The processor sends a character over the bus to the printer buffer
• The processor and the bus are then released for other useful work
• Until it has finished printing the character, the printer is not available for further transfers
• This prevents high-speed devices from being blocked by low-speed peripherals
• The processor switches between devices/activities and is thus able to manage several devices over a single bus
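The effect of the printer buffer can be sketched with a tick-by-tick simulation. The timing model is an assumption made only for illustration: the buffer holds one character, a bus transfer takes one tick, and the printer needs `print_ticks` ticks per character. The function name `simulate_printing` is hypothetical.

```python
def simulate_printing(text, print_ticks):
    """Count ticks the processor spends on transfers vs. other work.

    Assumes a one-character printer buffer, one tick per bus transfer,
    and `print_ticks` ticks for the printer to print one character.
    """
    pending = list(text)
    busy_for = 0           # ticks until the printer has finished its character
    transfer_ticks = 0
    other_work_ticks = 0
    while pending or busy_for > 0:
        if pending and busy_for == 0:
            pending.pop(0)           # processor sends one character over the bus
            busy_for = print_ticks   # printer starts printing from its buffer
            transfer_ticks += 1
        else:
            other_work_ticks += 1    # processor and bus are free for other work
            busy_for -= 1
    return transfer_ticks, other_work_ticks


print(simulate_printing("Hi", print_ticks=3))  # (2, 6)
```

With these made-up numbers the processor is tied up for only 2 of the 8 ticks; without the buffer it would have to wait for the printer during the other 6.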
Software
• Computer software is a collection of data or computer
instructions (programs) that tell the computer how to work

• It helps the user to enter and run application programs

• Two forms
- System Software
- Application Software
Software
System Software
• A collection of programs that are executed as needed to perform functions such as:
- Boot loading and initialization
- Receiving and interpreting user commands
- Entering and editing application programs
- Running standard programs along with data supplied by the user
- Controlling I/O units
Software
System Software
- Translating source code into object files consisting of machine instructions
- Linking and running user-written application programs with existing standard library routines
• System software is responsible for the coordination of all activities in a computing system
Software
Application Programs/Software
• Written in an HLL such as C, C++ or Java
• Programs are written independently of the computer used to execute them
• A system software program called a compiler translates the HLL program into a suitable machine language program
• Text editor: an important system program used by all programmers for entering and editing application programs


Software
Operating System
• An Operating System (OS) is an interface between a computer
user and computer hardware.
• It is a large program or collection of routines.
• It is used to control the sharing of various computer units and
also to control the interaction among these units.
Software
Operating System
• It is software that performs all the basic tasks, such as:
- File management
- Memory management
- Process management
- Handling input and output
- Controlling peripheral devices such as disk drives and printers
Software
Steps involved in execution of application program

Figure: User program and OS routine sharing the processor


Software
Steps involved in execution of application program
• Assume the machine language program is stored on disk; it first has to be brought into memory
• After the transfer is complete, execution of the program starts
• Consider a case where execution of the program involves reading data from disk, processing it and printing the results
• When data is required, the program requests the OS to transfer the file from disk to memory
• Once the OS completes this, control is transferred back to the application program
Software
Steps involved in execution of application program
• The application program proceeds with the computation to get results
• When the program has results, it again sends a request to the OS
• An OS routine is executed to print the results
Note: The disk and the processor are idle for most of the time the printer is printing
• While the printer is busy printing, the OS can load the next program to be executed into memory
• This concurrency makes efficient use of the available resources and increases throughput
• This pattern of concurrent execution is called multiprogramming or multitasking
Performance
• Performance is a measure of how quickly a computer can execute programs
• Speed depends on the hardware and its machine instructions
• Compilers also affect performance, as programs are mostly written in an HLL
• Processor performance is computed from the time periods during which the processor is active
• Elapsed time depends on all units in a computer system, whereas processor time depends on the hardware involved in executing individual machine instructions (e.g., in the figure, the time taken to execute the program is (t5 - t0), which is the elapsed time)
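The difference between elapsed time and processor time can be observed on a real machine. The sketch below uses two timers from Python's standard `time` module: `perf_counter` measures wall-clock time, while `process_time` counts only the CPU time used by the current process. The helper names and the 0.1 s sleep, which stands in for waiting on a slow device, are assumptions for illustration.

```python
import time


def measure(task):
    """Return (elapsed_seconds, processor_seconds) for one call of task()."""
    wall_start = time.perf_counter()   # wall-clock timer: everything counts
    cpu_start = time.process_time()    # CPU time of this process only
    task()
    processor_time = time.process_time() - cpu_start
    elapsed_time = time.perf_counter() - wall_start
    return elapsed_time, processor_time


def sample_task():
    sum(i * i for i in range(200_000))   # computation: the processor is active
    time.sleep(0.1)                      # stand-in for waiting on a slow device


elapsed, processor = measure(sample_task)
print(f"elapsed = {elapsed:.3f} s, processor = {processor:.3f} s")
```

The exact figures vary from run to run, but the elapsed time includes the wait while the processor time does not.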
Performance
Processor cache

• Hardware: processor, memory
• A cache is included as part of the processor
• Prefetching into the cache improves throughput
• Program execution is faster if there are fewer main memory accesses
Ex: In a loop, a small set of instructions is executed repeatedly
Performance
Processor clock

• Processor circuits are controlled by a timing signal called the clock
• The clock defines regular time intervals called clock cycles
• To execute a machine instruction, the processor divides the action into a sequence of basic steps, each of which can be completed in one clock cycle
• P: length of one clock cycle (P affects processor performance)
• R = 1/P: the clock rate, measured in cycles per second (Hz), typically a few hundred million to a billion cycles per second (e.g., a clock rate of 500 million cycles per second is 500 MHz; what is the corresponding clock period?)
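The question about the 500 MHz clock can be answered directly from R = 1/P: P = 1/R = 1/(500 × 10^6) s = 2 ns. A minimal sketch of the same calculation follows; the function name `clock_period_ns` is hypothetical.

```python
def clock_period_ns(rate_hz):
    """Clock period P in nanoseconds for a clock rate R in cycles per second."""
    return 1e9 / rate_hz


print(clock_period_ns(500e6))  # 2.0 (nanoseconds, for a 500 MHz clock)
```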
Performance
Basic Performance Equation
• T: processor time required to execute a program written in an HLL
• The compiler generates a machine language program; N is the actual number of instruction executions
• N need not be equal to the number of instructions in the object program (some instructions are executed repeatedly in loops, and others may not be executed at all)
• S: average number of basic steps needed to execute one machine instruction
• If each basic step takes one clock cycle and the clock rate is R cycles per second, then the program execution time T is given by:
T = (N × S) / R
which is called the basic performance equation
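A short worked example of the basic performance equation is sketched below. The values of N, S and R are illustrative only and are not taken from the slides; the function name `execution_time` is hypothetical.

```python
import math


def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Basic performance equation: T = (N x S) / R, in seconds."""
    return (n_instructions * steps_per_instruction) / clock_rate_hz


# Illustrative values: N = 2 million executed instructions,
# S = 4 basic steps per instruction, R = 500 MHz.
t_500mhz = execution_time(2_000_000, 4, 500e6)
# Same N and S with the clock rate doubled to 1 GHz.
t_1ghz = execution_time(2_000_000, 4, 1e9)

print(f"T at 500 MHz: {t_500mhz * 1e3:.1f} ms")  # T at 500 MHz: 16.0 ms
print(f"T at 1 GHz:   {t_1ghz * 1e3:.1f} ms")    # T at 1 GHz:   8.0 ms
print(math.isclose(t_500mhz / t_1ghz, 2.0))      # True
```

With N and S held fixed, doubling R halves T, which is the sense in which a higher clock rate improves performance.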
Performance
Basic Performance Equation
• T is what concerns the user (T should be as small as possible)
• The computer designer works to reduce N and S and to increase R
• If R is high, each step takes less time to complete
• N, S and R are interdependent: changing one has an impact on the others
Performance
Pipelining and Superscalar Operation

• Overlapping the execution of successive instructions is called pipelining. E.g., Add R1, R2, R3
• If all instructions are overlapped to the maximum degree possible, execution proceeds at the rate of one instruction completed in each clock cycle
• A higher degree of concurrency is possible with multiple instruction pipelines (i.e., multiple functional units)
• Execution of several instructions in parallel in one clock cycle is called superscalar execution
Note: Parallel execution must preserve the logical correctness of programs
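The benefit of overlapping can be estimated by counting clock cycles. The sketch below assumes an idealized pipeline: every instruction passes through the same number of one-cycle stages and there are no stalls. Both function names are hypothetical, and real pipelines fall short of this ideal.

```python
def cycles_without_pipelining(n_instructions, n_stages):
    """Each instruction finishes all its steps before the next one starts."""
    return n_instructions * n_stages


def cycles_with_ideal_pipeline(n_instructions, n_stages):
    """The first instruction takes n_stages cycles to pass through the
    pipeline; after that, one instruction completes in every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


print(cycles_without_pipelining(100, 4))   # 400
print(cycles_with_ideal_pipeline(100, 4))  # 103
```

For a long instruction sequence the pipelined count approaches one cycle per instruction, matching the rate stated above.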
Performance
Clock rate
• Role of clock rate in improving performance
• Two possibilities for increasing the clock rate:
- Improving IC technology, which reduces the time needed for each basic step (this affects all aspects of the processor's operation equally, except access to main memory)
- Reducing the amount of processing done in one basic step, which reduces P
• In the presence of a cache, the percentage of accesses to main memory is small
• With improved IC technology, the value of T is therefore reduced by the same factor as R is increased, because S and N are not affected
Performance
Compiler
• An optimizing compiler takes advantage of several features of the processor to reduce the total number of clock cycles (i.e., N × S) needed to execute a program
• The number of cycles depends not only on the number of instructions but also on the order in which they appear in the program
• The compiler rearranges program instructions to achieve better performance, but must not affect the result of the computation
• High-quality compilers are built with much interaction between the compiler designers and the processor designers to achieve the best results
Performance
Performance Measurement
• Computing the value of T is not simple; clock speed and various architectural features are not reliable indicators
• Computer engineers therefore use benchmark programs
• Standardized programs are used to make comparisons possible
• The Standard Performance Evaluation Corporation (SPEC) selects and publishes representative application programs for different application domains, along with test results for many commercially available computers
Performance
Performance Measurement
• The programs selected range from game playing, compiler and database applications to numerically intensive programs
• The program is compiled for the computer under test, and its running time on the real computer is measured
• The same program is also compiled and run on a reference computer

SPEC rating = (Running time on the reference computer) / (Running time on the computer under test)

• A SPEC rating of 50 means that the computer under test is 50 times faster than the reference computer for the particular benchmark
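The rating for a single benchmark is just the ratio of the two running times, as the sketch below shows with made-up times. The second helper goes beyond what the slide states: it assumes the usual convention that an overall rating over several benchmarks is the geometric mean of the individual ratings. Both function names are hypothetical.

```python
import math


def spec_rating(reference_time, test_time):
    """Rating for one benchmark: reference running time / test running time."""
    return reference_time / test_time


def overall_spec_rating(ratings):
    """Geometric mean of per-benchmark ratings (an assumption, see above)."""
    return math.prod(ratings) ** (1.0 / len(ratings))


# Illustrative running times in seconds (not measured data).
print(spec_rating(reference_time=500.0, test_time=10.0))  # 50.0
print(round(overall_spec_rating([4.0, 9.0]), 6))          # 6.0
```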
Instruction Set: CISC and RISC
• Simple instructions require a small number of basic steps to execute, but a large number of instructions is needed to perform a given task, which leads to a large value of N and a small value of S
• Complex instructions require more basic steps each, but fewer instructions are needed, which leads to a smaller N and a larger S
• A key consideration in comparing the two choices is the use of pipelining
Comparing CISC and RISC
• Example: Multiplying two numbers in memory
• The task is to find the product of two numbers, one stored in location 2:3 and the other in location 5:2, and then store the product back in location 2:3
Instruction Set: CISC and RISC
The CISC Approach:
• The entire task of multiplying two numbers can be completed
with one instruction:
MULT 2:3, 5:2
• MULT is what is known as a "complex instruction."
• It operates directly on the computer's memory banks and does
not require the programmer to explicitly call any loading or
storing functions.
• One of the primary advantages of this system is that the
compiler has to do very little work to translate a high-level
language statement into assembly.
• Because the length of the code is relatively short, very little
RAM is required to store instructions.
Instruction Set: CISC and RISC
The RISC Approach
• RISC processors only use simple instructions that can be
executed within one clock cycle.
• To perform MULT equivalent, a programmer would need to
code four lines of assembly:
LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
• This may seem like a much less efficient way of completing the
operation.
• Because there are more lines of code, more RAM is needed to
store the assembly level instructions.
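The two approaches can be placed side by side in a small simulation. This is only a sketch: memory is a dictionary keyed by the location labels used above, registers A and B are dictionary entries, the function names are hypothetical, and the stored values are made up. Each function returns the number of instructions it models.

```python
def run_cisc(memory):
    """MULT 2:3, 5:2: one instruction reads both operands and stores the product."""
    memory["2:3"] = memory["2:3"] * memory["5:2"]
    return 1   # instructions executed


def run_risc(memory):
    """LOAD, LOAD, PROD, STORE: only LOAD and STORE touch memory."""
    regs = {}
    regs["A"] = memory["2:3"]             # LOAD A, 2:3
    regs["B"] = memory["5:2"]             # LOAD B, 5:2
    regs["A"] = regs["A"] * regs["B"]     # PROD A, B
    memory["2:3"] = regs["A"]             # STORE 2:3, A
    return 4   # instructions executed


cisc_memory = {"2:3": 6, "5:2": 7}   # illustrative operand values
risc_memory = dict(cisc_memory)

print(run_cisc(cisc_memory), cisc_memory)  # 1 {'2:3': 42, '5:2': 7}
print(run_risc(risc_memory), risc_memory)  # 4 {'2:3': 42, '5:2': 7}
```

Both versions leave the same product in location 2:3; they differ in how many instructions are needed and in which instructions access memory.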
Instruction Set: CISC and RISC
• However, the RISC strategy also brings some very important
advantages.
• Because each instruction requires only one clock cycle to
execute, the entire program will execute in approximately the
same amount of time as the multi-cycle "MULT" command.
• These RISC "reduced instructions" require fewer transistors of hardware space than the complex instructions, leaving more room for general purpose registers.
• Because all of the instructions execute in a uniform amount of
time (i.e. one clock), pipelining is possible.
Instruction Set: CISC and RISC

CISC | RISC
Complex instruction set computer | Reduced instruction set computer
Emphasis on hardware | Emphasis on software
Multiple instruction sizes and formats | Instructions of the same size, with few formats
Fewer registers | More registers
More addressing modes | Fewer addressing modes
Instructions take a varying number of clock cycles | Instructions take one clock cycle
Pipelining is difficult | Pipelining is efficient
Many instructions can reference memory (Ex: Add R0, 05h) | Only Load and Store can reference memory (Ex: LDR R1, 05h followed by ADD R0, R1)
Large number of complex instructions | Smaller number of instructions
Uses a microprogrammed control unit | Uses a hardwired control unit
Ex: Pentium, SHARC (DSP) | Ex: ARM7, ARM9
References
Acknowledgements:
These slides were prepared with reference to the following textbook. I would like to thank the authors and publishers of the textbook.

• Computer Organization by Carl Hamacher, Z. Vranesic and Zaky, 5th Edition, McGraw Hill Publishers.
