Performance Analysis for Core 2 and K8: Part 1


Branch Prediction

One of the first and most important elements of a modern microprocessor pipeline is the branch predictor. Branch prediction is used to predict the direction and target of a branch so that the processor can speculatively execute instructions after the branch is encountered but before its outcome is known. This keeps enough instructions in flight to keep the functional units busy. Mispredicted branches seriously reduce performance: typically, when a branch is mispredicted, the entire pipeline and all work in flight is squashed, wasting both performance and power.

Branch prediction has been a major focus area for all of Intel's microarchitectures, and many improvements have been added over time. AMD has adopted some of these refinements, for instance adding indirect branch predictor arrays in Barcelona and later processors. However, the K8 has older and less accurate branch predictors than the Core 2. Figure 5 below shows the percentage of instructions that are branches for the K8 and Core 2.


Figure 5 – Branches per Instruction Retired

Both the Core 2 and K8 tend to encounter a branch every 5-7 instructions. Since each processor has an instruction window of roughly 100 instructions, a full window holds roughly 14-20 predicted branches at once. If any one of those branches is mispredicted, every instruction after it must be flushed from the pipeline, causing a huge performance hit.
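To make the cost concrete, here is a back-of-the-envelope sketch. It assumes a full 100-instruction window and, unrealistically, that every branch is mispredicted independently with the same probability; real mispredictions are correlated, so treat the numbers as illustrative only. The function names are hypothetical.

```python
# Back-of-the-envelope model (hypothetical helper names).
# Assumptions: a full ~100-instruction window, and each branch is predicted
# correctly independently with the same probability. Real mispredictions are
# correlated, so this is only an illustration.

def branches_in_flight(window_size: int, insts_per_branch: float) -> float:
    """Expected number of branches in a full instruction window."""
    return window_size / insts_per_branch


def prob_window_clean(n_branches: float, accuracy: float) -> float:
    """Probability that every branch in the window is predicted correctly,
    assuming independent predictions with the given per-branch accuracy."""
    return accuracy ** n_branches


if __name__ == "__main__":
    for insts_per_branch in (5, 7):
        n = branches_in_flight(100, insts_per_branch)
        for acc in (0.94, 0.97):
            p = prob_window_clean(n, acc)
            print(f"1 branch per {insts_per_branch} insts: {n:.1f} branches, "
                  f"accuracy {acc:.0%}: P(no mispredict) = {p:.2f}")
```

Under these assumptions, a window holding 20 branches predicted with 94% accuracy is entirely free of mispredictions only about 29% of the time, which shows why a few points of predictor accuracy matter so much.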


Figure 6 – Mispredicted Branches per Instruction Retired

Microarchitects and engineers measure branch predictor accuracy and quality in mispredicts per thousand instructions (MPKI); for this metric, lower is better. As Figure 6 shows, Intel's MPKI stays under 7 for all the workloads we measured, while the K8's can be up to 2X worse. Intel's branch predictors are far more accurate, beating the K8 by 2-4 MPKI across the board. This is unsurprising, since Intel has invested heavily in branch prediction with almost every iteration of its microarchitecture. These measurements merely confirm the strength of Intel's predictors and the payoff from investing significant resources in this area.
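MPKI is simply a ratio of two event counts. The sketch below assumes both counts (mispredicted branches retired and instructions retired) are read from hardware performance counters; the function name and the sample counter values are hypothetical and are not the data behind Figure 6.

```python
def mpki(mispredicted_branches: int, instructions_retired: int) -> float:
    """Mispredicted branches per thousand retired instructions (lower is better)."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return 1000.0 * mispredicted_branches / instructions_retired


# Hypothetical counter readings for a single workload (illustrative only).
core2 = mpki(mispredicted_branches=4_500_000, instructions_retired=1_000_000_000)
k8 = mpki(mispredicted_branches=8_000_000, instructions_retired=1_000_000_000)
print(f"Core 2: {core2:.1f} MPKI, K8: {k8:.1f} MPKI, ratio {k8 / core2:.2f}x")
```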


Figure 7 – Branch Predictor Accuracy

Another way to evaluate branch prediction, more intuitive but less useful to architects, is the percentage of branches that are correctly predicted (recall that architects prefer to normalize to instructions retired). Figure 7 above shows accuracy in this form; note that for this chart, higher is better. Branch prediction accuracy is above 94% for all but one benchmark, a testament to the design of the branch predictors at both AMD and Intel. This chart also restates Intel's advantage in more familiar terms: its predictors are anywhere from 1 to 4 percentage points more accurate, with an average difference of 2.7 points.
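The two views are linked by the branch frequency shown in Figure 5. If a fraction b of retired instructions are branches and a fraction a of those branches are predicted correctly, then MPKI = 1000 (1 - a) b. A minimal sketch of the conversion in both directions follows; the function names and the example inputs are hypothetical.

```python
def mpki_from_accuracy(accuracy: float, branches_per_inst: float) -> float:
    """MPKI implied by per-branch accuracy (0..1) and branches per instruction."""
    return 1000.0 * (1.0 - accuracy) * branches_per_inst


def accuracy_from_mpki(mpki: float, branches_per_inst: float) -> float:
    """Per-branch accuracy (0..1) implied by an MPKI value and branch frequency."""
    return 1.0 - mpki / (1000.0 * branches_per_inst)


# Hypothetical example: one branch every 6 instructions, 97% predicted correctly.
b = 1 / 6
m = mpki_from_accuracy(0.97, b)
print(f"{m:.2f} MPKI")
print(f"round-trip accuracy: {accuracy_from_mpki(m, b):.4f}")
```

Because MPKI scales with b, two workloads with identical per-branch accuracy can have quite different MPKI. Normalizing to instructions retired reflects how often the pipeline is actually flushed per unit of work, which is one reason architects prefer it.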
