
Instruction Count and Cpi

Uploaded by

thuw1310
Copyright
© All Rights Reserved
INSTRUCTION COUNT AND CPI

4.
Instruction count for a program
• Determined by program, ISA, and compiler
Average cycles per instruction (CPI)
• Determined by CPU hardware
• If different instructions have different CPI
o Average CPI is affected by the instruction mix
Formula:
Clock cycles = Σ (CPI_i × Instruction count_i)
Average CPI = Clock cycles / Total instruction count

Solution: To solve this problem, we need to calculate the CPI (Cycles per
Instruction) for each code sequence and then determine which one is faster.
Given information:
Instruction class A has a CPI of 1.
Instruction class B has a CPI of 3.
Instruction class C has a CPI of 4.
Code sequence 1 requires 2 million instructions of class A, 1 million instructions of
class B, and 2 million instructions of class C.
Code sequence 2 requires 4 million instructions of class A, 3 million instructions of
class B, and 1 million instructions of class C.
a. Calculating the CPI for each code sequence:
Code sequence 1:
CPI = [(2 million × 1) + (1 million × 3) + (2 million × 4)] / (2 million + 1 million + 2 million)
CPI = (2 + 3 + 8) / 5 = 13 / 5 = 2.6
Code sequence 2:
CPI = [(4 million × 1) + (3 million × 3) + (1 million × 4)] / (4 million + 3 million + 1 million)
CPI = (4 + 9 + 4) / 8 = 17 / 8 = 2.125
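b. Determining the faster code sequence:
A lower CPI does not by itself mean faster execution, because the two sequences execute different numbers of instructions. Assuming both run on the same processor at the same clock rate, execution time is proportional to total clock cycles (instruction count × CPI).
Clock cycles of code sequence 1 = 5 million × 2.6 = 13 million
Clock cycles of code sequence 2 = 8 million × 2.125 = 17 million
Code sequence 1 needs fewer clock cycles, so it is faster even though its CPI is higher.
To calculate the difference in speed:
Speedup = Clock cycles of code sequence 2 / Clock cycles of code sequence 1 = 17 / 13 ≈ 1.31
Therefore, code sequence 1 is about 1.31 times as fast as code sequence 2 (about 31% faster).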
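The CPI and clock-cycle arithmetic for the two code sequences can be checked with a few lines of Python. This is a minimal sketch, assuming both sequences run on the same processor at the same clock rate, so that execution time is proportional to total clock cycles. The names `total_cycles` and `average_cpi` are illustrative, not from the original problem.

```python
# CPI per instruction class and instruction counts, from the given information.
CPI = {"A": 1, "B": 3, "C": 4}
SEQ1 = {"A": 2_000_000, "B": 1_000_000, "C": 2_000_000}
SEQ2 = {"A": 4_000_000, "B": 3_000_000, "C": 1_000_000}


def total_cycles(counts, cpi=CPI):
    """Total clock cycles: sum over classes of (instruction count x CPI)."""
    return sum(counts[cls] * cpi[cls] for cls in counts)


def average_cpi(counts, cpi=CPI):
    """Average CPI: total clock cycles divided by total instruction count."""
    return total_cycles(counts, cpi) / sum(counts.values())


if __name__ == "__main__":
    for name, seq in (("Code sequence 1", SEQ1), ("Code sequence 2", SEQ2)):
        print(f"{name}: cycles = {total_cycles(seq)}, CPI = {average_cpi(seq):.3f}")
```

Under the same-clock-rate assumption, the sequence with fewer total clock cycles finishes first, regardless of which one has the lower average CPI.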
5.
Theory:
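Instruction count for a program
• Determined by program, ISA, and compiler
Average cycles per instruction (CPI)
• Determined by CPU hardware
• If different instructions have different CPI
o Average CPI is affected by the instruction mix

Formula:
Execution time = Clock cycles / Clock rate = Σ (CPI_i × Instruction count_i) / Clock rate
Relative speedup = Execution time on 1 processor / Execution time on p processors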
Solution:
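A. Calculating the total execution time for the program on 1, 2, 4, and 8 processors:
The calculations use CPIs of 1 (arithmetic), 12 (load/store), and 5 (branch) and a clock rate of 2 × 10^9 Hz. For p > 1 processors, the arithmetic and load/store instruction counts per processor are divided by 0.7 × p, while the branch instruction count per processor stays the same.

Single processor (p = 1):

Arithmetic instructions per processor = 2.56 × 10^9
Load/store instructions per processor = 1.28 × 10^9
Branch instructions per processor = 256 × 10^6

Total execution time = [(2.56 × 10^9 × 1) + (1.28 × 10^9 × 12) + (256 × 10^6 × 5)] / (2 × 10^9)
Total execution time = (19.2 × 10^9) / (2 × 10^9) = 9.6 seconds
Relative speedup = 9.6 / 9.6 = 1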

Dual processors (p = 2):

Arithmetic instructions per processor = (2.56 × 10^9) / (0.7 × 2) ≈ 1.83 × 10^9
Load/store instructions per processor = (1.28 × 10^9) / (0.7 × 2) ≈ 0.914 × 10^9
Branch instructions per processor = 256 × 10^6

Total execution time = [(1.83 × 10^9 × 1) + (0.914 × 10^9 × 12) + (256 × 10^6 × 5)] / (2 × 10^9)
Total execution time ≈ 7.04 seconds
Relative speedup = 9.6 / 7.04 ≈ 1.36

Quad processors (p = 4):

Arithmetic instructions per processor = (2.56 × 10^9) / (0.7 × 4) ≈ 0.914 × 10^9
Load/store instructions per processor = (1.28 × 10^9) / (0.7 × 4) ≈ 0.457 × 10^9
Branch instructions per processor = 256 × 10^6

Total execution time = [(0.914 × 10^9 × 1) + (0.457 × 10^9 × 12) + (256 × 10^6 × 5)] / (2 × 10^9)
Total execution time ≈ 3.84 seconds
Relative speedup = 9.6 / 3.84 = 2.5

Octa processors (p = 8):

Arithmetic instructions per processor = (2.56 × 10^9) / (0.7 × 8) ≈ 0.457 × 10^9
Load/store instructions per processor = (1.28 × 10^9) / (0.7 × 8) ≈ 0.229 × 10^9
Branch instructions per processor = 256 × 10^6

Total execution time = [(0.457 × 10^9 × 1) + (0.229 × 10^9 × 12) + (256 × 10^6 × 5)] / (2 × 10^9)
Total execution time ≈ 2.24 seconds
Relative speedup = 9.6 / 2.24 ≈ 4.29
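
The per-configuration arithmetic in part A follows one pattern, which can be sketched in Python. This is a minimal sketch that keeps full precision in the per-processor instruction counts. The function and constant names are illustrative, and the CPIs (1, 12, 5), the 2 × 10^9 Hz clock rate, and the 0.7 × p scaling are the values used in the calculations above.

```python
CLOCK_RATE_HZ = 2e9
CPI = {"arithmetic": 1, "load_store": 12, "branch": 5}
SINGLE_CPU_COUNTS = {"arithmetic": 2.56e9, "load_store": 1.28e9, "branch": 256e6}


def per_processor_counts(p):
    """Instruction counts per processor when the program runs on p processors."""
    # Arithmetic and load/store counts are divided by 0.7 * p for p > 1;
    # the branch count per processor stays the same.
    scale = 1.0 if p == 1 else 0.7 * p
    return {
        "arithmetic": SINGLE_CPU_COUNTS["arithmetic"] / scale,
        "load_store": SINGLE_CPU_COUNTS["load_store"] / scale,
        "branch": SINGLE_CPU_COUNTS["branch"],
    }


def execution_time(p, cpi=CPI):
    """Execution time in seconds: total clock cycles divided by clock rate."""
    counts = per_processor_counts(p)
    cycles = sum(counts[cls] * cpi[cls] for cls in counts)
    return cycles / CLOCK_RATE_HZ


def relative_speedup(p):
    """Speedup of p processors relative to a single processor."""
    return execution_time(1) / execution_time(p)


if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        print(f"p = {p}: {execution_time(p):.2f} s, speedup {relative_speedup(p):.2f}")
```

Small differences from hand calculations come from rounding the per-processor instruction counts before multiplying. The fixed branch term (0.64 s) is why the speedup stays well below p.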
B. If the CPI of arithmetic instructions is doubled to 2, the total execution time of every processor configuration increases, but not proportionally: only the arithmetic cycles double, while the load/store and branch cycles are unchanged.
New total execution time (single processor) = [(2.56 × 10^9 × 2) + (1.28 × 10^9 × 12) + (256 × 10^6 × 5)] / (2 × 10^9)
New total execution time (single processor) = 10.88 seconds

New total execution time (dual processors) = [(1.8286 × 10^9 × 2) + (0.9143 × 10^9 × 12) + (256 × 10^6 × 5)] / (2 × 10^9)
New total execution time (dual processors) ≈ 7.954 seconds

New total execution time (quad processors) = [(0.9143 × 10^9 × 2) + (0.4571 × 10^9 × 12) + (256 × 10^6 × 5)] / (2 × 10^9)
New total execution time (quad processors) ≈ 4.297 seconds

New total execution time (octa processors) = [(0.4571 × 10^9 × 2) + (0.2286 × 10^9 × 12) + (256 × 10^6 × 5)] / (2 × 10^9)
New total execution time (octa processors) ≈ 2.469 seconds
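
The effect of doubling the arithmetic CPI can be sketched in Python by making that CPI a parameter. This is a minimal sketch under the same values used in the calculations (load/store CPI 12, branch CPI 5, 2 × 10^9 Hz clock, counts divided by 0.7 × p for p > 1). The names `execution_time` and `slowdown` are illustrative.

```python
CLOCK_RATE_HZ = 2e9


def execution_time(p, arithmetic_cpi):
    """Execution time in seconds on p processors for a given arithmetic CPI."""
    # Arithmetic and load/store counts are divided by 0.7 * p for p > 1;
    # the branch count per processor is unchanged.
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (
        2.56e9 / scale * arithmetic_cpi  # arithmetic instructions
        + 1.28e9 / scale * 12            # load/store instructions, CPI 12
        + 256e6 * 5                      # branch instructions, CPI 5
    )
    return cycles / CLOCK_RATE_HZ


def slowdown(p):
    """Ratio of execution time with arithmetic CPI 2 to that with CPI 1."""
    return execution_time(p, 2) / execution_time(p, 1)


if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        print(f"p = {p}: {execution_time(p, 2):.3f} s, slowdown {slowdown(p):.3f}")
```

The slowdown factor differs between configurations because arithmetic cycles are only part of the total, and that share shrinks as the fixed branch term becomes more significant at higher processor counts.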
C. For 4 processors: when the program is parallelized to run over multiple cores, the arithmetic and load/store instruction counts per processor are divided by 0.7 × p, where p is the number of processors, and the branch instruction count stays the same. With p = 4:
Clock cycles = (2560000000 × 1) / (0.7 × 4) + (1280000000 × 12) / (0.7 × 4) + (256000000 × 5) = 7680000000
Execution time = 7680000000 / (2 × 10^9) = 3.84 sec
Reducing the load/store CPI of a single processor to match the performance of 4 processors:
Let a be the new load/store CPI. The clock cycles on a single processor are:
Clock cycles = (2560000000 × 1) + (1280000000 × a) + (256000000 × 5) = 2560000000 + 1280000000 × a + 1280000000 = 3840000000 + 1280000000 × a
The execution time follows from:
Execution time for a processor = Clock cycles / Clock rate
Therefore:
CPU execution time = (3840000000 + 1280000000 × a) / (2 × 10^9)
3.84 = 1.92 + 0.64 × a
a = 3
The new CPI relative to the original is:
a / Original CPI for load/store instructions = 3 / 12 = 0.25 (or 25%)
Thus, the load/store CPI must be reduced to 3, which is 25% of its original value of 12 (a 75% reduction).
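The step of solving for the load/store CPI can be sketched in Python by rearranging the execution-time equation. This is a minimal sketch under the values used in the calculations above, and the names `time_on_p_processors` and `required_load_store_cpi` are illustrative.

```python
CLOCK_RATE_HZ = 2e9
ARITHMETIC, LOAD_STORE, BRANCH = 2.56e9, 1.28e9, 256e6  # single-processor counts


def time_on_p_processors(p):
    """Execution time in seconds with the original CPIs (1, 12, 5)."""
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = ARITHMETIC / scale * 1 + LOAD_STORE / scale * 12 + BRANCH * 5
    return cycles / CLOCK_RATE_HZ


def required_load_store_cpi(target_time_s):
    """Load/store CPI a single processor needs to finish in target_time_s."""
    # Cycles that do not depend on the load/store CPI.
    fixed_cycles = ARITHMETIC * 1 + BRANCH * 5
    # target_time * clock_rate = fixed_cycles + LOAD_STORE * a, solved for a.
    return (target_time_s * CLOCK_RATE_HZ - fixed_cycles) / LOAD_STORE


if __name__ == "__main__":
    target = time_on_p_processors(4)
    a = required_load_store_cpi(target)
    print(f"target = {target:.2f} s, load/store CPI = {a:.2f}, ratio = {a / 12:.2f}")
```

Without rounding the intermediate instruction counts, the required load/store CPI comes out at 3, a quarter of the original 12.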
6.
Theory:
Propagation of electronic signals:
o The assumption that electronic signals can travel at a constant speed of
300,000 km/s.
o The time it takes for an electronic signal to travel a certain distance is
directly proportional to the distance.
Clock rate and period:
o The relationship between clock rate (frequency) and clock period, where
the period is the inverse of the frequency.
o The requirement that the time for an electronic signal to travel from one
edge of the chip to the other should be less than or equal to the clock
period.
Practical limitations of chip design:
o The feasibility of chip manufacturing and the size constraints based on
current technology.
o The tradeoff between chip size and clock rate, where higher clock rates
require smaller chip dimensions.
Solution:
Limitation on the diameter for a 1 GHz clock rate:
The time it takes for an electronic signal to travel from one edge of the chip to the
other should be less than or equal to the period of the clock (1/clock rate).
Time for an electronic signal to travel from one edge to the other = Diameter /
(300,000 km/s)
Period of the clock = 1 / (1 GHz) = 1 × 10^-9 s
Diameter ≤ (300,000 km/s) × (1 × 10^-9 s) = 0.3 m = 30 cm
Therefore, the maximum diameter of the chip for a 1 GHz clock rate is 30 cm.

Limitation on the diameter for a 1 THz clock rate:


The time it takes for an electronic signal to travel from one edge of the chip to the
other should be less than or equal to the period of the clock (1/clock rate).
Time for an electronic signal to travel from one edge to the other = Diameter /
(300,000 km/s)
Period of the clock = 1 / (1 THz) = 1 × 10^-12 s
Diameter ≤ (300,000 km/s) × (1 × 10^-12 s) = 0.3 × 10^-3 m = 0.3 mm
Therefore, the maximum diameter of the chip for a 1 THz clock rate is 0.3 mm.
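
Both diameter limits come from the same relation, diameter ≤ signal speed × clock period, which can be sketched in Python. This is a minimal sketch under the stated assumption of a constant signal speed of 300,000 km/s, and the function name `max_chip_diameter_m` is illustrative.

```python
SIGNAL_SPEED_M_PER_S = 3.0e8  # 300,000 km/s, the assumed constant signal speed


def max_chip_diameter_m(clock_rate_hz):
    """Largest diameter (in metres) a signal can cross within one clock period."""
    clock_period_s = 1.0 / clock_rate_hz
    return SIGNAL_SPEED_M_PER_S * clock_period_s


if __name__ == "__main__":
    for label, rate in (("1 GHz", 1e9), ("1 THz", 1e12)):
        print(f"{label}: {max_chip_diameter_m(rate) * 100:.2f} cm")
```

Because the limit is inversely proportional to the clock rate, raising the clock rate by a factor of 1000 shrinks the maximum diameter by the same factor.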

Feasibility:

A 1 GHz clock rate is feasible: 30 cm is an upper bound on the diameter, and practical chips are far smaller than that, so the constraint is easily met.
A 1 THz clock rate is not feasible under this constraint: the whole chip would have to fit within a diameter of 0.3 mm, which is too small for a practical processor with current technology.
In conclusion, the maximum diameter of the chip is limited by the time it takes for an electronic signal to travel from one edge of the chip to the other, relative to the clock period. A 1 GHz chip only has to stay under 30 cm, which is feasible, while a 1 THz chip would have to stay under 0.3 mm, which is not feasible with current technology.
