Personal computer (PC): A computer designed for use by an individual, usually
incorporating a graphics display, a keyboard, and a mouse.
Server: A computer used for running larger programs for multiple users, often
simultaneously, and typically accessed only via a network.
Supercomputer: A class of computers with the highest performance and cost; they are
configured as servers and typically cost tens to hundreds of millions of dollars.
Embedded computer: A computer inside another device used for running one
predetermined application or collection of software.
Personal mobile devices (PMDs): Small wireless devices used to connect to the Internet;
they rely on batteries for power, and software is installed by downloading apps. Typical
examples are smartphones and tablets.
Cloud Computing: Refers to large collections of servers that provide services over the
Internet; some providers rent dynamically varying numbers of servers as a utility.
Software as a Service (SaaS): Delivers software and data as a service over the Internet,
usually via a thin program, such as a browser, that runs on local client devices instead of
binary code that must be installed and run wholly on those devices. Examples include web
search and social networking.
Multicore Microprocessor: A microprocessor containing multiple processors (“cores”) in a
single integrated circuit.
Acronym: A word constructed by taking the initial letters of a string of words. For example:
RAM is an acronym for Random Access Memory, and CPU is an acronym for Central
Processing Unit.
Eight Great Ideas in Computer Architecture
1. Design for Moore’s Law
Moore’s Law states that integrated circuit resources double every 18–24 months. Therefore,
computer architects must anticipate where the technology will be when the design
finishes rather than design for where it starts.
2. Use Abstraction to Simplify Design
Lower-level details are hidden to offer a simpler model at higher levels.
3. Make the Common Case Fast
Making the common case fast will tend to enhance performance better than
optimizing the rare case.
4. Performance via Parallelism
Computer architects have offered designs that get more performance by performing
operations in parallel.
5. Performance via Pipelining
A particular pattern of parallelism is so prevalent in computer architecture that it
merits its own name: pipelining. A pipeline can be pictured as a sequence of pipes, with
each section representing one stage of the pipeline.
6. Performance via Prediction
In some cases, it can be faster on average to guess and start working rather than
wait until you know for sure, assuming that the mechanism to recover from a
misprediction is not too expensive and the prediction is relatively accurate.
7. Hierarchy of Memories
The fastest, smallest, and most expensive memory per bit is placed at the top of the
hierarchy and the slowest, largest, and cheapest per bit at the bottom.
8. Dependability via Redundancy
Since any physical device can fail, systems are made dependable by including
redundant components that can take over when a failure occurs and that help detect
failures.
Systems software: Software that provides services that are commonly useful, including
operating systems, compilers, loaders, and assemblers.
Operating system: Supervising program that manages the resources of a computer for the
benefit of the programs that run on that computer.
Compiler: A program that translates high-level language statements into assembly language
statements.
Binary Digit: Also called a bit. One of the two numbers in base 2 (0 or 1) that are the
components of information.
Instruction: A command that computer hardware understands and obeys.
Assembler: A program that translates a symbolic version of instructions into the binary
version.
Assembly Language: A symbolic representation of machine instructions.
Machine Language: A binary representation of machine instructions.
High-Level Programming Language: A portable language such as C, C++, Java, or Visual
Basic that is composed of words and algebraic notation that can be translated by a compiler
into assembly language.
Input Device: A mechanism through which the computer is fed information, such as a
keyboard.
Output Device: A mechanism that conveys the result of a computation to a user, such as a
display, or to another computer.
Liquid Crystal Display: A display technology using a thin layer of liquid polymers that can
be used to transmit or block light according to whether a charge is applied.
Active-Matrix Display: A liquid crystal display using a transistor to control the transmission
of light at each individual pixel.
Pixel: The smallest individual picture element. Screens are composed of hundreds of
thousands to millions of pixels, organized in a matrix.
Integrated Circuit: Also called a chip. A device combining dozens to millions of transistors.
Central Processor Unit (CPU): Also called processor. The active part of the computer,
which contains the datapath and control and which adds numbers, tests numbers, signals
I/O devices to activate, and so on.
Datapath: The component of the processor that performs arithmetic operations.
Control: The component of the processor that commands the datapath, memory, and I/O
devices according to the instructions of the program.
Memory: The storage area in which programs are kept when they are running and that
contains the data needed by the running programs.
Dynamic Random Access Memory (DRAM): Memory built as an integrated circuit; it
provides random access to any location.
Cache Memory: A small, fast memory that acts as a buffer for a slower, larger memory.
Static Random Access Memory (SRAM): Memory built as an integrated circuit, but faster
and less dense than DRAM.
Instruction Set Architecture: Also called architecture. An abstract interface between the
hardware and the lowest-level software that encompasses all the information necessary to
write a machine language program that will run correctly, including instructions, registers,
memory access, I/O, and so on.
Application Binary Interface (ABI): The user portion of the instruction set plus the
operating system interfaces used by application programmers. It defines a standard for
binary portability across computers.
Implementation: Hardware that obeys the architecture abstraction.
Volatile Memory: Storage, such as DRAM, that retains data only if it is receiving power.
Non-Volatile Memory: A form of memory that retains data even in the absence of a power
source and that is used to store programs between runs. A DVD disk is non-volatile.
Main Memory: Also called primary memory. Memory used to hold programs while they are
running; typically consists of DRAM in today’s computers.
Secondary Memory: Non-volatile memory used to store programs and data between runs;
typically consists of flash memory in PMDs and magnetic disks in servers.
Magnetic Disk: Also called hard disk. A form of non-volatile secondary memory composed
of rotating platters coated with a magnetic recording material.
Flash Memory: A non-volatile semiconductor memory. It is cheaper and slower than DRAM
but more expensive per bit and faster than magnetic disks.
Advantages of Networked Computers:
1. Communication: Information is exchanged between computers at high speeds.
2. Resource Sharing: Rather than each computer having its own I/O devices,
computers on the network can share I/O devices.
3. Non-Local Access: By connecting computers over long distances, users need not be
near the computer they are using.
Local Area Network (LAN): A network designed to carry data within a geographically
confined area, typically within a single building.
Wide Area Network (WAN): A network extended over hundreds of kilometers that can
span a continent.
Transistor: An on/off switch controlled by an electric signal.
Very large-scale integrated (VLSI) circuit: A device containing hundreds of thousands to
millions of transistors.
Silicon: A natural element (found in sand) that is a semiconductor.
Semiconductor: A substance that does not conduct electricity well.
Silicon Crystal Ingot: A rod composed of a silicon crystal that is between 8 and 12 inches
in diameter and about 12 to 24 inches long.
Wafer: A slice from a silicon ingot no more than 0.1 inches thick, used to create chips.
Defect: A microscopic flaw in a wafer or in patterning steps that can result in the failure of
the die containing that defect.
Die: The individual rectangular sections that are cut from a wafer, more informally known as
chips.
Yield: The percentage of good dies from the total number of dies on the wafer.
Response Time: Also called execution time. The total time required for the computer to
complete a task, including disk accesses, memory accesses, I/O activities, operating system
overhead, CPU execution time, and so on.
Throughput: Also called bandwidth. Another measure of performance, it is the number of
tasks completed per unit time.
CPU Execution Time: Also called CPU time. The actual time the CPU spends computing for
a specific task.
User CPU Time: The CPU time spent in a program itself.
System CPU Time: The CPU time spent in the operating system performing tasks on behalf
of the program.
Clock Cycle: Also called tick, clock tick, clock period, clock, or cycle. The time for one clock
period, usually of the processor clock, which runs at a constant rate.
Clock Period: The length of each clock cycle.
Clock Cycles Per Instruction (CPI): Average number of clock cycles per instruction for a
program or program fragment.
Instruction Count: The number of instructions executed by the program.
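Combined with the clock cycle time, these quantities give the basic CPU performance equation: CPU time = instruction count × CPI × clock cycle time (equivalently, divided by the clock rate). A minimal C sketch with made-up example numbers (2 billion instructions, CPI of 1.5, a 2 GHz clock):

#include <stdio.h>

int main(void) {
    /* Illustrative numbers only, not from any real benchmark. */
    double instruction_count = 2.0e9;   /* instructions executed        */
    double cpi               = 1.5;     /* average clock cycles/instr.  */
    double clock_rate_hz     = 2.0e9;   /* 2 GHz processor clock        */

    /* CPU time = instruction count x CPI x clock cycle time
                = instruction count x CPI / clock rate            */
    double cpu_time_s = instruction_count * cpi / clock_rate_hz;

    printf("CPU execution time: %.2f seconds\n", cpu_time_s);   /* 1.50 */
    return 0;
}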
Instruction Mix: A measure of the dynamic frequency of instructions across one or many
programs.
Instruction Set: The vocabulary of commands understood by a given architecture.
Stored-Program Concept: The idea that instructions and data of many types can be stored
in memory as numbers, leading to the stored-program computer.
Registers: The operands of arithmetic instructions are restricted; they must be from a limited
number of special locations built directly in hardware called registers. Registers are
primitives used in hardware design that are also visible to the programmer when the
computer is completed.
Word: The natural unit of access in a computer, usually a group of 32 bits; corresponds to
the size of a register in the MIPS architecture.
Data Transfer Instruction: A command that moves data between memory and registers.
Address: A value used to delineate the location of a specific data element within a memory
array.
Load: The data transfer instruction that copies data from memory to a register is traditionally
called load. The actual MIPS name for this instruction is lw, standing for load word.
Alignment Restriction: A requirement that data be aligned in memory on natural
boundaries.
Computers divide into those that use the address of the leftmost or “big end” byte as the
word address versus those that use the rightmost or “little end” byte. MIPS is in the
big-endian camp.
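The difference can be observed in C by looking at the individual bytes of a word in memory; a small sketch (the value 0x11223344 is just an illustration):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t word = 0x11223344;
    /* View the same 32-bit word one byte at a time. */
    uint8_t *bytes = (uint8_t *)&word;

    /* On a big-endian machine the "big end" byte (0x11) sits at the
       lowest address; on a little-endian machine 0x44 comes first.  */
    if (bytes[0] == 0x11)
        printf("big-endian\n");
    else
        printf("little-endian\n");
    return 0;
}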
Store: The instruction complementary to load is traditionally called store; it copies data from
a register to memory. The actual MIPS name is sw, standing for store word.
Spilling Registers: The process of putting less commonly used variables (or those needed
later) into memory.
Registers take less time to access and have higher throughput than memory, making data in
registers both faster to access and simpler to use. Accessing registers also uses less energy
than accessing memory.
Binary Digit: Also called bit. One of the two numbers in base 2, 0 or 1, that are the
components of information.
Least Significant Bit: The rightmost bit in a MIPS word.
Most Significant Bit: The leftmost bit in a MIPS word.
Two’s Complement: Two’s complement gets its name from the rule that the unsigned sum
of an n-bit number and its n-bit negative is 2^n; hence, the negation or complement of a
number x is 2^n − x, or its “two’s complement.”
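A small C sketch of the rule for 8-bit values (n = 8, so 2^n = 256), together with the familiar invert-and-add-one shortcut:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t x = 5;                                   /* 8-bit example     */

    uint8_t neg_by_rule     = (uint8_t)(256u - x);   /* 2^n - x           */
    uint8_t neg_by_shortcut = (uint8_t)(~x + 1);     /* invert, add one   */

    printf("%d %d\n", neg_by_rule, neg_by_shortcut); /* both print 251    */
    printf("as signed: %d\n", (int8_t)neg_by_rule);  /* prints -5         */
    return 0;
}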
One’s Complement: A notation that represents the most negative value by 10…000₂
and the most positive value by 01…11₂, leaving an equal number of negatives and
positives but ending up with two zeros, one positive (00…00₂) and one negative
(11…11₂). The term is also used to mean the inversion of every bit in a pattern: 0 to 1
and 1 to 0.
Biased Notation: A notation that represents the most negative value by 00…000₂ and
the most positive value by 11…11₂, with 0 typically having the value 10…00₂,
thereby biasing the number such that the number plus the bias has a non-negative
representation.
Instruction Format: A form of representation of an instruction composed of fields of binary
numbers.
Machine language: Binary representation used for communication within a computer
system.
The fields of the R-type (register-type) instruction format are:
1. Op: Basic operation of the instruction, traditionally called the opcode. It is the field
that denotes the operation and format of an instruction.
2. Rs: The first register source operand.
3. Rt: The second register source operand.
4. Rd: The register destination operand. It gets the result of the operation.
5. Shamt: Shift amount.
6. Funct: This field, often called the function code, selects the specific variant of the
operation in the op field.
For the I-type (immediate-type) instruction format:
In a load word instruction, the rt field specifies the destination register, which receives the
result of the load.
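As a sketch, the R-type fields can be pulled out of a 32-bit instruction word with shifts and masks, using the standard MIPS R-type field widths (op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits). The example word encodes add $t0, $s1, $s2:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t instr = 0x02324020;   /* add $t0, $s1, $s2 */

    /* R-type layout, from bit 31 down to bit 0:
       op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) */
    unsigned op    = (instr >> 26) & 0x3F;
    unsigned rs    = (instr >> 21) & 0x1F;
    unsigned rt    = (instr >> 16) & 0x1F;
    unsigned rd    = (instr >> 11) & 0x1F;
    unsigned shamt = (instr >>  6) & 0x1F;
    unsigned funct =  instr        & 0x3F;

    printf("op=%u rs=%u rt=%u rd=%u shamt=%u funct=%u\n",
           op, rs, rt, rd, shamt, funct);
    /* prints: op=0 rs=17 rt=18 rd=8 shamt=0 funct=32 */
    return 0;
}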
AND: A logical bit-by-bit operation with two operands that calculates a 1 only if there is a 1 in
both operands.
OR: A logical bit-by-bit operation with two operands that calculates a 1 if there is a 1 in either
operand.
NOT: A logical bit-by-bit operation with one operand that inverts the bits; that is, it replaces
every 1 with a 0, and every 0 with a 1.
NOR: A logical bit-by-bit operation with two operands that calculates the NOT of the OR of
the two operands. That is, it calculates a 1 only if there is a 0 in both operands.
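A small C sketch of the four operations on arbitrary 8-bit example values:

#include <stdio.h>

int main(void) {
    unsigned a = 0x3C;                     /* 0011 1100 */
    unsigned b = 0x0F;                     /* 0000 1111 */

    unsigned and_ab = a & b;               /* 0000 1100: 1 only where both are 1 */
    unsigned or_ab  = a | b;               /* 0011 1111: 1 where either is 1     */
    unsigned not_a  = ~a & 0xFFu;          /* 1100 0011: every bit inverted      */
    unsigned nor_ab = ~(a | b) & 0xFFu;    /* 1100 0000: 1 only where both are 0 */

    printf("%02X %02X %02X %02X\n", and_ab, or_ab, not_a, nor_ab);
    return 0;
}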
Conditional Branch: An instruction that requires the comparison of two values and that
allows for a subsequent transfer of control to a new address in the program based on the
outcome of the comparison.
Basic Block: A sequence of instructions without branches (except possibly at the end) and
without branch targets or branch labels (except possibly at the beginning).
Jump address table: Also called jump table. A table of addresses of alternative instruction
sequences.
PC-relative addressing: An addressing regime in which the address is the sum of the
program counter (PC) and a constant in the instruction.
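In MIPS the constant is a word offset measured from the instruction after the branch, so the target works out to (PC + 4) + (offset × 4). A sketch with made-up addresses:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Example only: a branch at 0x00400018 whose 16-bit field holds 3. */
    uint32_t pc     = 0x00400018;
    int16_t  offset = 3;                       /* offset counted in words */

    /* Target = address of the following instruction (PC + 4)
              + sign-extended offset shifted left by 2.          */
    uint32_t target = (pc + 4) + ((int32_t)offset << 2);

    printf("branch target = 0x%08X\n", (unsigned)target);   /* 0x00400028 */
    return 0;
}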
Arithmetic Logic Unit (ALU): Hardware that performs addition, subtraction, and usually
logical operations such as AND and OR.
Exception: Also called interrupt on many computers. An unscheduled event that disrupts
program execution; used to detect overflow.
Interrupt: An exception that comes from outside of the processor (Some architectures use
the term interrupt for all exceptions).
MIPS includes a register called the exception program counter (EPC) to contain the
address of the instruction that caused the exception.
Saturation: When a calculation overflows, the result is set to the largest positive number or
most negative number, rather than a modulo calculation as in two’s complement arithmetic.
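A minimal C sketch of a saturating 32-bit add, computed in a wider type so overflow can be detected and clamped rather than wrapped:

#include <stdio.h>
#include <stdint.h>

/* Clamp the true sum to the most positive or most negative
   representable 32-bit value instead of wrapping around.     */
static int32_t sat_add(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;   /* exact sum in 64 bits */
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}

int main(void) {
    printf("%d\n", sat_add(INT32_MAX, 1));    /* stays at  2147483647 */
    printf("%d\n", sat_add(INT32_MIN, -1));   /* stays at -2147483648 */
    printf("%d\n", sat_add(3, 4));            /* 7, unaffected        */
    return 0;
}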
Scientific Notation: A notation that renders numbers with a single digit to the left of the
decimal point.
Normalized: A number in floating-point notation that has no leading 0s.
Floating Point: Computer arithmetic that represents numbers in which the binary point is not
fixed.
Fraction: The value, generally between 0 and 1, placed in the fraction field. The fraction is
also called the mantissa.
Exponent: In the numerical representation system of floating-point arithmetic, the value that
is placed in the exponent field.
A designer of a floating-point representation must find a compromise between the size of the
fraction and the size of the exponent, because a fixed word size means you must take a bit
from one to add a bit to the other. This trade-off is between precision and range: increasing
the size of the fraction enhances the precision of the fraction, while increasing the size of the
exponent increases the range of numbers that can be represented.
Overflow (Floating-point): A situation in which a positive exponent becomes too large to fit in
the exponent field.
Underflow (Floating-point): A situation in which a negative exponent becomes too large to fit
in the exponent field.
Single precision: A floating-point value represented in a single 32- bit word.
Double precision: A floating-point value represented in two 32-bit words.
The desirable notation must therefore represent the most negative exponent as 00…00₂
and the most positive as 11…11₂. This convention is called biased notation, with the
bias being the number subtracted from the normal, unsigned representation to determine the
real value.
Single Precision Bias: 127
Double Precision Bias: 1023
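For example, the single-precision fields of -0.75 (which is -1.1 in binary times 2^-1, so the stored exponent is -1 + 127 = 126) can be decoded with a short C sketch:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = -0.75f;                 /* -0.75 = -1.1 (binary) x 2^-1   */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the 32-bit pattern */

    unsigned sign     = bits >> 31;            /* 1 bit                        */
    unsigned exponent = (bits >> 23) & 0xFF;   /* 8 bits, stored with bias 127 */
    unsigned fraction = bits & 0x7FFFFF;       /* 23 bits                      */

    printf("sign=%u exponent=%u (true %d) fraction=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
    /* prints: sign=1 exponent=126 (true -1) fraction=0x400000 */
    return 0;
}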
A Basic MIPS Implementation
We will be examining an implementation that includes a subset of the core MIPS instruction
set:
1. The memory-reference instructions load word (lw) and store word (sw)
2. The arithmetic-logical instructions add, sub, AND, OR, and slt
3. The instructions branch equal (beq) and jump (j), which we add last.
For every instruction, the first two steps are identical:
1. IF - Send the program counter (PC) to the memory that contains the code and fetch
the instruction from that memory.
2. ID - Read one or two registers, using fields of the instruction to select the registers to
read. For the load word instruction, we need to read only one register, but most other
instructions require reading two registers.
After these two steps, the actions required to complete the instruction depend on the
instruction class.
Combinational Element: An operational element, such as an AND gate or an ALU.
State Element: A memory element, such as a register or a memory.
Clocking Methodology: The approach used to determine when data is valid and stable
relative to the clock.
Edge-Triggered Clocking: A clocking scheme in which all state changes occur on a clock
edge.
Control signal: A signal used for multiplexor selection or for directing the operation of a
functional unit.
Data signal: Contains information that is operated on by a functional unit.
Asserted: The signal is logically high or true.
De-asserted: The signal is logically low or false.
Datapath element: A unit used to operate on or hold data within a processor. In the MIPS
implementation, the datapath elements include the instruction and data memories, the
register file, the ALU, and adders.
Memory unit: A unit that stores the instructions of a program and supplies an instruction
given its address.
Program counter (PC): The register containing the address of the instruction in the program
being executed.
Adder: A unit used to increment the PC to the address of the next instruction.
Register file: A state element that consists of a set of registers that can be read and written
by supplying a register number to be accessed.
Sign-extend: A datapath element used to increase the size of a data item by replicating the
high-order sign bit of the original data item in the high-order bits of the larger, destination
data item.
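A minimal C sketch of sign-extending a 16-bit immediate to 32 bits:

#include <stdio.h>
#include <stdint.h>

/* Replicate bit 15 (the sign bit) into the upper 16 bits of the result. */
static int32_t sign_extend16(uint16_t imm) {
    return (int32_t)(int16_t)imm;
}

int main(void) {
    printf("0x%08X\n", (unsigned)sign_extend16(0x0005));   /* 0x00000005      */
    printf("0x%08X\n", (unsigned)sign_extend16(0xFFFB));   /* 0xFFFFFFFB (-5) */
    return 0;
}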
Branch Target Address: The address specified in a branch, which becomes the new
program counter (PC) if the branch is taken. In the MIPS architecture the branch target is
given by the sum of the offset field of the instruction and the address of the instruction
following the branch.
Branch Taken: A branch where the branch condition is satisfied and the program counter
(PC) becomes the branch target. All unconditional jumps are taken branches.
Branch Not Taken or (untaken branch): A branch where the branch condition is false and
the program counter (PC) becomes the address of the instruction that sequentially follows
the branch.
Delayed Branch: A type of branch where the instruction immediately following the branch is
always executed, independent of whether the branch condition is true or false.
ALU Control
For branch equal, the ALU must perform a subtraction.
Main Control Unit
The main control unit sets seven single-bit control lines, plus the 2-bit ALUOp field that
feeds the ALU control.
Pipelining: An implementation technique in which multiple instructions are overlapped in
execution, much like an assembly line. All steps, called stages in pipelining, operate
concurrently. As long as we have separate resources for each stage, we can pipeline the
tasks. Pipelining is faster for many tasks because everything works in parallel, so more
tasks are finished per hour.
MIPS instructions classically take five steps:
1. IF Fetch instruction from memory.
2. ID Read registers while decoding the instruction. The regular format of MIPS
instructions allows reading and decoding to occur simultaneously.
3. EX Execute the operation or calculate an address.
4. MEM Access an operand in data memory.
5. WB Write the result into a register.
Pipelining improves performance by increasing instruction throughput, as opposed to
decreasing the execution time of an individual instruction, but instruction throughput is the
important metric because real programs execute billions of instructions.
Why MIPS is designed for pipeline execution:
1. All MIPS instructions are the same length. This restriction makes it much easier
to fetch instructions in the first pipeline stage and to decode them in the second
stage.
2. MIPS has only a few instruction formats, with the source register fields being
located in the same place in each instruction.
3. Memory operands only appear in loads or stores in MIPS.
4. Operands must be aligned in memory.
There are situations in pipelining when the next instruction cannot execute in the following
clock cycle. These events are called hazards, and there are three different types.
Structural Hazard: When a planned instruction cannot execute in the proper clock cycle
because the hardware does not support the combination of instructions that are set to
execute.
Data Hazard: Also called a pipeline data hazard. When a planned instruction cannot execute
in the proper clock cycle because data that is needed to execute the instruction is not yet
available.
Forwarding: Also called bypassing. A method of resolving a data hazard by retrieving the
missing data element from internal buffers rather than waiting for it to arrive from
programmer-visible registers or memory.
Load-use Data Hazard: A specific form of data hazard in which the data being loaded by a
load instruction has not yet become available when it is needed by another instruction.
Pipeline stall: Also called bubble. A stall initiated in order to resolve a hazard.
Control Hazard: Also called branch hazard. When the proper instruction cannot execute in
the proper pipeline clock cycle because the instruction that was fetched is not the one that is
needed; that is, the flow of instruction addresses is not what the pipeline expected.
Computers use prediction to handle branches. One simple approach is to predict always that
branches will be untaken. When you’re right, the pipeline proceeds at full speed. Only when
branches are taken does the pipeline stall.
Branch Prediction: A method of resolving a branch hazard that assumes a given outcome
for the branch and proceeds from that assumption rather than waiting to ascertain the actual
outcome.
One popular approach to dynamic prediction of branches is keeping a history for each
branch as taken or untaken, and then using the recent past behavior to predict the future.
There is a third approach to the control hazard, called delayed decision. Called the delayed
branch in computers, and mentioned above, this is the solution actually used by the MIPS
architecture. The delayed branch always executes the next sequential instruction, with the
branch taking place after that one instruction delay.
Pipelining is a technique that exploits parallelism among the instructions in a sequential
instruction stream.
Latency (pipeline): The number of stages in a pipeline or the number of stages between two
instructions during execution.
Instructions and data generally move from left to right through the five stages as they
complete execution. There are, however, two exceptions to this left-to-right flow of
instructions:
1. The write-back stage, which places the result back into the register file in the middle
of the datapath.
2. The selection of the next value of the PC, choosing between the incremented PC and
the branch address from the MEM stage.
Two pairs of hazard conditions (checked when forwarding):
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
Nop: An instruction that does no operation to change state.
Instruction-Level Parallelism: The parallelism among instructions.
Multiple issue: A scheme whereby multiple instructions are launched in one clock cycle.
Static Multiple Issue: An approach to implementing a multiple-issue processor where many
decisions are made by the compiler before execution.
Dynamic Multiple Issue: An approach to implementing a multiple-issue processor where
many decisions are made during execution by the processor.
Issue slots: The positions from which instructions could issue in a given clock cycle; by
analogy, these correspond to positions at the starting blocks for a sprint.
Speculation: An approach whereby the compiler or processor guesses the outcome of an
instruction to remove it as a dependence in executing other instructions.
Issue Packet: The set of instructions that issues together in one clock cycle; the packet may
be determined statically by the compiler or dynamically by the processor.
Very Long Instruction Word (VLIW): A style of instruction set architecture that launches
many operations that are defined to be independent in a single wide instruction, typically with
many separate opcode fields.
Use Latency: Number of clock cycles between a load instruction and an instruction that can
use the result of the load without stalling the pipeline.
Loop Unrolling: A technique to get more performance from loops that access arrays, in
which multiple copies of the loop body are made and instructions from different iterations are
scheduled together.
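A C sketch of unrolling a simple array-scaling loop by a factor of four (assuming, to keep it short, that the trip count is a multiple of 4):

/* Original loop: one array element per iteration. */
void scale(double *x, int n, double a) {
    for (int i = 0; i < n; i++)
        x[i] = a * x[i];
}

/* Unrolled by four: four copies of the body per iteration give the
   compiler or scheduler more independent work to overlap. A real
   version would add a cleanup loop for leftover iterations.        */
void scale_unrolled(double *x, int n, double a) {
    for (int i = 0; i < n; i += 4) {
        x[i]     = a * x[i];
        x[i + 1] = a * x[i + 1];
        x[i + 2] = a * x[i + 2];
        x[i + 3] = a * x[i + 3];
    }
}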
Register Renaming: The renaming of registers by the compiler or hardware to remove anti-
dependences.
Anti-dependence: Also called name dependence. An ordering forced by the reuse of a
name, typically a register, rather than by a true dependence that carries a value between two
instructions.
Superscalar: An advanced pipelining technique that enables the processor to execute more
than one instruction per clock cycle by selecting them during execution.
Dynamic Pipeline Scheduling: Hardware support for reordering the order of instruction
execution so as to avoid stalls.
Commit Unit: The unit in a dynamic or out-of-order execution pipeline that decides when it
is safe to release the result of an operation to programmer visible registers and memory.
Reservation Station: A buffer within a functional unit that holds the operands and the
operation.
Reorder Buffer: The buffer that holds results in a dynamically scheduled processor until it is
safe to store the results to memory or a register.
Out-Of-Order Execution: A situation in pipelined execution when an instruction blocked
from executing does not cause the following instructions to wait.
In-Order Commit: A commit in which the results of pipelined execution are written to the
programmer visible state in the same order that instructions are fetched.
Principle of Locality: The principle of locality states that programs access a relatively small
portion of their address space at any instant of time.
Temporal Locality: The principle stating that if a data location is referenced then it will tend
to be referenced again soon.
Spatial locality: The locality principle stating that if a data location is referenced, data
locations with nearby addresses will tend to be referenced soon.
Memory Hierarchy: A structure that uses multiple levels of memories; as the distance from
the processor increases, the size of the memories and the access time both increase. The
faster memories are more expensive per bit than the slower memories and thus are smaller.
Block (or Line): The minimum unit of information that can be either present or not present in
a cache.
Hit Rate: The fraction of memory accesses found in a level of the memory hierarchy.
Miss Rate: The fraction of memory accesses not found in a level of the memory hierarchy.
Hit Rate + Miss Rate = 1
Hit Time: The time required to access a level of the memory hierarchy, including the time
needed to determine whether the access is a hit or a miss.
Miss Penalty: The time required to fetch a block into a level of the memory hierarchy from
the lower level, including the time to access the block, transmit it from one level to the other,
insert it in the level that experienced the miss, and then pass the block to the requestor.
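Hit time, miss rate, and miss penalty combine into the usual average memory access time relation, AMAT = hit time + miss rate × miss penalty. A small sketch with made-up numbers:

#include <stdio.h>

int main(void) {
    /* Illustrative numbers only. */
    double hit_time_cycles     = 1.0;     /* time to access the cache       */
    double miss_rate           = 0.05;    /* 5% of accesses miss            */
    double miss_penalty_cycles = 100.0;   /* time to fetch from lower level */

    double amat = hit_time_cycles + miss_rate * miss_penalty_cycles;

    printf("AMAT = %.1f clock cycles\n", amat);   /* 6.0 */
    return 0;
}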
Memory Technologies
Track: One of thousands of concentric circles that make up the surface of a magnetic disk.
Sector: One of the segments that make up a track on a magnetic disk. A sector is the
smallest amount of information that is read or written on a disk.
Cylinder: The term cylinder is used to refer to all the tracks under the heads at a given point
on all surfaces.
Seek: The process of positioning a read/write head over the proper track on a disk.
Seek Time: The time to move the head to the desired track.
Rotational latency: Also called rotational delay. The time required for the desired sector of
a disk to rotate under the read/write head. Usually assumed to be half the rotation time.
Transfer time: The time to transfer a block of bits. The transfer time is a function of the
sector size, the rotation speed, and the recording density of a track.
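Putting these pieces together, a disk access costs roughly seek time + rotational latency + transfer time (plus any controller overhead). A sketch with made-up figures for a 7200 RPM drive:

#include <stdio.h>

int main(void) {
    /* Illustrative figures only. */
    double rpm            = 7200.0;
    double seek_ms        = 4.0;                    /* average seek         */
    double rotation_ms    = 60.0 * 1000.0 / rpm;    /* one full rotation    */
    double rot_latency_ms = 0.5 * rotation_ms;      /* average: half a turn */
    double transfer_ms    = 0.1;                    /* one sector           */

    double access_ms = seek_ms + rot_latency_ms + transfer_ms;

    printf("rotational latency = %.2f ms\n", rot_latency_ms);   /* 4.17 */
    printf("disk access time   = %.2f ms\n", access_ms);        /* 8.27 */
    return 0;
}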
Cache: A safe place for hiding or storing things.
Direct-mapped cache: A cache structure in which each memory location is mapped to
exactly one location in the cache. Almost all direct-mapped caches use this mapping to find
a block: (Block address) modulo (Number of blocks in the cache).
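A small C sketch of the mapping for a direct-mapped cache of 8 one-word blocks (an example geometry); the tag is simply the remaining upper bits of the block address:

#include <stdio.h>

int main(void) {
    unsigned num_blocks    = 8;     /* example: 8 blocks in the cache */
    unsigned block_address = 29;    /* example memory block address   */

    /* (Block address) modulo (Number of blocks in the cache) */
    unsigned index = block_address % num_blocks;   /* 29 mod 8 = 5       */
    unsigned tag   = block_address / num_blocks;   /* upper address bits */

    printf("index=%u tag=%u\n", index, tag);       /* index=5 tag=3 */
    return 0;
}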
Tag: A field in a table used for a memory hierarchy that contains the address information
required to identify whether the associated block in the hierarchy corresponds to a requested
word.
Valid Bit: A field in the tables of a memory hierarchy that indicates that the associated block
in the hierarchy contains valid data.
Cache Miss: A request for data from the cache that cannot be filled because the data is not
present in the cache.
Write-Through: A scheme in which writes always update both the cache and the next lower
level of the memory hierarchy, ensuring that data is always consistent between the two.
Write Buffer: A queue that holds data while the data is waiting to be written to memory.
Write-Back: A scheme that handles writes by updating values only to the block in the cache,
then writing the modified block to the lower level of the hierarchy when the block is replaced.
Split Cache: A scheme in which a level of the memory hierarchy is composed of two
independent caches that operate in parallel with each other, with one handling instructions
and one handling data.
Error Detection Code: A code that enables the detection of an error in data, but not the
precise location and, hence, correction of the error.