Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture

Author: Jon Stokes
Publisher: No Starch Press
Published: 2007
ISBN: 9781593271046


[Figure 7-4: The basic microarchitecture of the G4e. The original diagram shows the back end: reservation stations feeding the vector permute unit, the vector arithmetic logic units (VSIU, VCIU, VFPU), the floating-point unit, the scalar integer units, and the load-store unit, with finished instructions draining into the completion queue and commit unit.]


Before instructions can enter the G4e’s pipeline, they have to be available in its 32KB instruction cache. This instruction cache, together with the 32KB data cache, makes up the G4e’s 64KB L1 cache. An instruction leaves the L1 and goes down through the various front-end stages until it hits the back end, at which point it’s executed by one of the G4e’s eight execution units (not counting the branch execution unit, which we’ll talk about in a second).

As I’ve already noted, the G4e breaks down the G4’s classic four-stage pipeline into seven shorter stages:

G4                        G4e
1  Fetch                  1  Fetch-1
                          2  Fetch-2
2  Decode/dispatch        3  Decode/dispatch
                          4  Issue
3  Execute                5  Execute
                          6  Complete
4  Write-back             7  Write-back (Commit)

Notice that the G4e dedicates one pipeline stage each to the characteristic issue and complete phases that bracket the out-of-order execution phase of a dynamically scheduled instruction’s lifecycle.
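
To make the trade-off between pipeline depth and clock speed concrete, here is a minimal sketch that walks one instruction through each of the two pipelines in the table above. Only the stage lists come from the table; the clock periods are purely illustrative assumptions.

# Minimal sketch: one instruction traversing each pipeline.
# Stage lists are from the table above; clock periods are made up,
# to illustrate that shorter stages permit a faster clock.

G4_STAGES = ["Fetch", "Decode/dispatch", "Execute", "Write-back"]
G4E_STAGES = ["Fetch-1", "Fetch-2", "Decode/dispatch", "Issue",
              "Execute", "Complete", "Write-back (Commit)"]

def traversal_time_ns(stages, clock_period_ns):
    # One cycle per stage: the latency for a single instruction to pass
    # through an otherwise empty pipeline.
    return len(stages) * clock_period_ns

print(traversal_time_ns(G4_STAGES, clock_period_ns=2.0))    # 4 stages x 2.0 ns = 8.0 ns
print(traversal_time_ns(G4E_STAGES, clock_period_ns=1.2))   # 7 stages x 1.2 ns = 8.4 ns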

Let’s take a quick look at the basic pipeline stages of the G4e, because this will highlight some of the ways in which the G4e differs from the original G4. Also, an understanding of the G4e’s more classic RISC pipeline will provide you with a good foundation for our upcoming discussion of the Pentium 4’s much longer, more peculiar pipeline.

Stages 1 and 2: Instruction Fetch

These two stages are both dedicated primarily to grabbing an instruction from the L1 cache. Like its predecessor, the G4, the G4e can fetch up to four instructions per clock cycle from the L1 cache and send them on to the next stage. Hopefully, the needed instructions are in the L1 cache. If they aren’t, the G4e has to hit the much slower L2 cache to find them, which can add up to nine cycles of delay to the instruction pipeline.
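
As a rough back-of-the-envelope illustration, the sketch below estimates average fetch delay from a hypothetical L1 hit rate, using the nine-cycle figure above as the worst-case L2 penalty. The one-cycle L1 latency and the hit rates are assumptions, not numbers from the text.

# Estimate average instruction-fetch delay under assumed hit rates.

L1_FETCH_CYCLES = 1     # assumed cost of an L1 instruction-cache hit
L2_MISS_PENALTY = 9     # worst-case extra cycles when the fetch must go to L2

def average_fetch_cycles(l1_hit_rate):
    return L1_FETCH_CYCLES + (1.0 - l1_hit_rate) * L2_MISS_PENALTY

for hit_rate in (0.90, 0.99):
    print(f"L1 hit rate {hit_rate:.0%}: {average_fetch_cycles(hit_rate):.2f} cycles per fetch")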

Stage 3: Decode/Dispatch

Once an instruction has been fetched, it goes into the G4e’s 12-entry instruction queue to be decoded. Once instructions are decoded, they’re dispatched at a rate of up to three non-branch instructions per cycle to the proper issue queue.

Note that the G4e’s dispatch logic dispatches instructions to the issue queues in accordance with “The Four Rules of Instruction Dispatch” on page 127. The only modification to the rules is in the issue buffer rule; instead of requiring that the proper execution unit and reservation station be available before an instruction can be dispatched, the G4e requires that there be space in one of the three issue queues.
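
Here is a minimal sketch of that modified issue buffer rule. The queue names and capacities are placeholders (the text tells us only that there are three issue queues), so treat them as assumptions rather than the G4e’s real parameters.

# Sketch of the modified issue-buffer rule: at dispatch time an instruction
# needs only a free slot in the appropriate issue queue.

from collections import deque

CAPACITY = {"integer": 6, "floating-point": 2, "vector": 4}   # hypothetical sizes
issue_queues = {name: deque() for name in CAPACITY}

def dispatch(instr, queue_name):
    # Dispatch succeeds as long as the proper issue queue has space,
    # regardless of whether an execution unit or reservation station is free.
    q = issue_queues[queue_name]
    if len(q) >= CAPACITY[queue_name]:
        return False    # dispatch stalls only when that queue is full
    q.append(instr)
    return True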

Stage 4: Issue

The issue stage is the place where the G4e differs the most from the G4. Specifically, the presence of the G4e’s three issue queues endows it with power and flexibility that the G4 lacks.

As you learned in Chapter 6, instructions can stall in the original G4’s dispatch stage if there is no execution unit available to take them. The G4e eliminates this potential dispatch stall condition by placing a set of buffers, called issue queues, in between the dispatch stage and the reservation stations. On the G4e, it doesn’t matter if the execution units are busy and their reservation stations are full; an instruction can still dispatch to the back end if there is space in the proper issue queue.
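
To see how this plays out over time, here is a toy cycle-by-cycle sketch. It assumes a single hypothetical execution unit, a one-entry reservation station, and a made-up three-cycle operation latency, none of which come from the text; the point is only that while the execution unit is busy, newly dispatched instructions accumulate in the issue queue instead of stalling the dispatch stage.

# Toy cycle-by-cycle view of an issue queue decoupling dispatch from execution.

from collections import deque

issue_queue = deque()        # buffers instructions after dispatch
reservation_station = None   # at most one instruction waiting at the unit
busy_cycles_left = 0         # cycles until the execution unit is free again

def cycle(dispatched_instr):
    global reservation_station, busy_cycles_left
    if busy_cycles_left > 0:                            # unit works on its current op
        busy_cycles_left -= 1
    if reservation_station is None and issue_queue:
        reservation_station = issue_queue.popleft()     # issue from the queue
    if busy_cycles_left == 0 and reservation_station is not None:
        busy_cycles_left = 3                            # assumed 3-cycle operation
        reservation_station = None
    issue_queue.append(dispatched_instr)                # dispatch never stalls here

for n, instr in enumerate(["a", "b", "c", "d"]):
    cycle(instr)
    print(f"cycle {n}: issue queue = {list(issue_queue)}, unit busy for {busy_cycles_left}")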


