Computer Architecture

SMU CSE 5381/7381

Fall 1998


Chapter 4


Instruction-Level Parallelism (4.1)

Introduction


Instruction-Level Parallelism (cont)

Pipeline Scheduling

Loop Unrolling


Instruction-Level Parallelism (cont)

Dependencies

Data Dependence


Instruction-Level Parallelism (cont)

Handling Data Dependencies

Name Dependencies


Instruction-Level Parallelism (cont)

Control Dependence

Dealing With Control Dependencies


Instruction-Level Parallelism (cont)

Loop-Level Parallelism


Dynamic Scheduling (4.2)

Introduction


Dynamic Scheduling (cont)

Scoreboard

Execution Steps


Dynamic Scheduling (cont)

Execution Steps (cont)

Comments


Dynamic Scheduling (cont)

Scoreboard Structure

Limitations to Performance Increase


Dynamic Scheduling (cont)

Tomasulo Algorithm

Basic Structure


Dynamic Scheduling (cont)

Steps of Execution

Comparison to Scoreboard


Dynamic Scheduling (cont)

Data Structures


Dynamic Hardware Prediction (4.3)

Branch Prediction

Branch Prediction Buffer


Dynamic Hardware Prediction (cont)

Correlating Predictors

Branch-Target Buffer


Dynamic Hardware Prediction (cont)

Variations


Multiple Issue (4.4)

Introduction

Superscalar


Multiple Issue (cont)

Static Superscalar DLX

Dynamic Superscalar DLX


Multiple Issue (cont)

VLIW

Limitations


Compiler Support for ILP (4.5)

Dependence Analysis

Software Pipeline


Compiler Support for ILP (cont)

Trace Scheduling


Hardware Support for Parallelism

Strategies

Conditional Instructions

Limitations


Hardware Support for Parallelism (cont)

Compiler Speculation

Hardware-Software Cooperation


Hardware Support for Parallelism (cont)

Poison Bits

Renaming


Hardware Support for Parallelism (cont)

Hardware-Based Speculation

Advantages


Hardware Support for Parallelism (cont)

Example with Tomasulo

Reorder Buffer


Hardware Support for Parallelism (cont)

Four Steps of Execution

Comments


Studies of ILP (4.7)

Introduction

Model


Studies of ILP (cont)

Perfect World

Window Size


Studies of ILP (cont)

Branch and Jump Prediction

Finite Registers


Studies of ILP (cont)

Alias Analysis

Reality


PowerPC 620 (4.8)

Introduction

Compare to Hypothetical


PowerPC 620 (cont)

Pipeline


PowerPC 620 (cont)

What is a Stall

Fetch Performance


PowerPC 620 (cont)

Issue Performance


PowerPC 620 (cont)

Execute Performance


PowerPC 620 (cont)

Commit Performance

Summary


Fallacies and Pitfalls

Fallacies

Pitfalls


Review: Chapter 3

  1. Given the basic DLX pipeline, what was the ideal CPI? (Answer: 1.)
  2. Once the pipeline is full, what is the maximum number of memory reads per cycle? Memory writes? (Answer: 2 reads, 1 write.)
  3. Once the pipeline is full, what is the maximum number of register reads per cycle? Register writes? (Answer: 2 reads, 1 write.)
  4. What is the purpose of the pipeline registers?
  5. Give an example of some information that needs to be propagated through the pipeline.
  6. T or F: All stages are used by every type of instruction. (Answer: F.)
  7. Name the three types of hazards.
  8. What is a stall?
  9. What are the three types of data hazards?
  10. Describe forwarding and give an example of a case where it does not prevent a stall.
  11. Describe the "software" solution to control hazards.
  12. Name some techniques for filling a branch-delay slot.
  13. What are precise exceptions? Why are they important?
  14. For the multicycle DLX discussed, what types of data hazards are possible?
  15. Name a problem with out-of-order completion.

Note: These notes are based on the text
Computer Architecture: A Quantitative Approach, Second Edition
J. Hennessy and D. Patterson
Morgan Kaufmann, 1990, 1996
ISBN 1-55860-329-8
and are copyright © 1998 Matthew Diaz, Southern Methodist University. These notes may be freely copied and distributed so long as this notice is included.