The Pentium Chronicles: The People, Passion, and Politics Behind Intel's Landmark Chips (Practitioners) by Robert P. Colwell

Author:Robert P. Colwell
Language: eng
Format: epub
Published: 2008-09-05T08:44:00+00:00


The Unbreakable Computer

Several of us P6 architects were interested in designing a computer that would never crash. Anyone who has experienced the utter frustration of a particularly inopportune system crash will empathize. As computer systems visions go, this one is killer. No matter what happens, no matter what breaks in the hardware or the software, the machine can slow down, but it can't stop working. That would be nice, wouldn't it?

That hardware designers have no control over the operating system or the applications should have shown us an upper bound on system stability that fell far short of our "unbreakable" vision. But even things we could control stubbornly refused to configure into anything approaching unbreakability. Permit me a Bill Nye moment (you know, the science guy with the great jingle): Electrically speaking, we live in a noisy, hostile universe. Electromagnetic waves of all frequencies and amplitudes are constantly bombarding people and computing equipment. Very energetic charged particles from the Big Bang or cosmic events collide with atoms in the atmosphere to generate streams of high-energy neutrons, some of which end up smashing into the silicon of microprocessors and generating unexpected electrical currents. Temperatures and power supplies fluctuate. Internal electrical currents generate capacitive and inductive sympathetic currents in adjacent wires. The universe really does conspire against us.

On the basis of the statistics we observe from these and other events, we design recovery mechanisms into our microprocessors. If one of these unfortunate events occurs, the machine can detect the anomaly and correct the resulting error before it can propagate and cause erroneous data to enter the computation stream.

Error detection and correction schemes have their dark side, however. They impose an overhead in performance and complexity and a real cost in die size. Worse, although they help make the machine more reliable, they are not foolproof. For example, if an error-correcting code is applied across a section of memory, then, typically, a single-bit error will be correctable, but if two bits are defective, our scheme will note that fact but be unable to correct it. And if more than two bits are erroneous, our scheme may not even notice that any of them are wrong.
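To make the one-bit, two-bit, and many-bit cases concrete, here is a minimal sketch of a textbook extended Hamming (8,4) code, a common single-error-correcting, double-error-detecting scheme. It is an illustration only: the text does not say which codes the P6 actually used, and the names `encode`, `decode`, and `flip` are hypothetical helpers invented for this example.

```python
# Illustrative only: a textbook extended Hamming (8,4) SECDED code,
# not the scheme used in any particular Intel design.

def encode(data):
    """Encode 4 data bits into an 8-bit codeword (index 0 = overall parity)."""
    d1, d2, d3, d4 = data
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d1, d2, d3, d4
    word[1] = d1 ^ d2 ^ d4        # parity over positions 1, 3, 5, 7
    word[2] = d1 ^ d3 ^ d4        # parity over positions 2, 3, 6, 7
    word[4] = d2 ^ d3 ^ d4        # parity over positions 4, 5, 6, 7
    word[0] = sum(word[1:]) % 2   # overall parity over positions 1..7
    return word

def decode(word):
    """Return (status, data bits) for a possibly corrupted codeword."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos       # XOR of the positions holding a 1
    overall = sum(word) % 2       # 0 for any valid codeword
    fixed = list(word)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and repair it.
        # A syndrome of 0 here points at the overall parity bit itself.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not fixable.
        status = "uncorrectable"
    return status, [fixed[3], fixed[5], fixed[6], fixed[7]]

def flip(word, *positions):
    """Return a copy of word with the given bit positions inverted."""
    out = list(word)
    for p in positions:
        out[p] ^= 1
    return out

data = [1, 0, 1, 1]
word = encode(data)
print(decode(word))                       # ('ok', [1, 0, 1, 1])
print(decode(flip(word, 5)))              # ('corrected', [1, 0, 1, 1])
print(decode(flip(word, 5, 6))[0])        # uncorrectable
print(decode(flip(word, 2, 3, 6, 7)))     # ('ok', [0, 0, 0, 0])
```

The last line is the troubling case: four flipped bits happen to land on another valid codeword, so the decoder reports success while handing back the wrong data.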

The most stringent constraint on the ability to design an unbreakable engine, though, is that while the "state space" of a correctly functioning microprocessor is enormous, the possibility space of a malfunctioning machine is many orders of magnitude larger. Basically, unless you are designing an extremely simple machine, you cannot practically anticipate every way in which the machine might fail, which is what you need for a detection and recovery scheme. Moreover, even if you could somehow catalog and sandbag every single-event failure, it still would not be good enough. Failures can and do occur in pairs, or triples, or n-tuples.

Perhaps the day will come when a very different approach to this problem will present some affordable solutions, but today the best we can do is buttress the machine against its clearest threats and test it extensively and as exhaustively as human resources will permit.


