Site Reliability Engineering by Betsy Beyer Chris Jones Jennifer Petoff & Niall Richard Murphy

Author: Betsy Beyer, Chris Jones, Jennifer Petoff & Niall Richard Murphy
Language: eng
Format: epub
Publisher: O'Reilly Media, Inc.
Published: 2016-04-05T16:00:00+00:00


1. For example, see Doorman, which provides a cooperative distributed client-side throttling system.

Chapter 22. Addressing Cascading Failures

Written by Mike Ulrich

If at first you don’t succeed, back off exponentially.

Dan Sandler, Google Software Engineer

Why do people always forget that you need to add a little jitter?

Ade Oshineye, Google Developer Advocate

A cascading failure is a failure that grows over time as a result of positive feedback.¹ It can occur when a portion of an overall system fails, which increases the probability that other portions of the system fail. For example, a single replica of a service can fail due to overload. Its load then shifts to the remaining replicas, increasing their probability of failing in turn and causing a domino effect that takes down all of the service's replicas.
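The feedback loop described above can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions, not how any real load balancer behaves: load (in queries per second) is assumed to be spread evenly across healthy replicas, each replica is assumed to have a fixed capacity, and a replica is assumed to fail the moment its share exceeds that capacity. The function name `simulate_cascade` and its parameters are hypothetical.

```python
def simulate_cascade(capacities: list[float], total_qps: float) -> list[int]:
    """Toy model of a cascading failure (hypothetical, for illustration only).

    Assumptions: total_qps is split evenly across healthy replicas, and any
    replica whose share exceeds its capacity fails immediately. Its share is
    then redistributed to the survivors in the next round.

    Returns the number of healthy replicas after each round.
    """
    healthy = sorted(capacities)
    history = [len(healthy)]
    while healthy:
        share = total_qps / len(healthy)  # even split across survivors
        survivors = [c for c in healthy if c >= share]
        if len(survivors) == len(healthy):
            break  # no new failures this round: the system has stabilized
        healthy = survivors  # overloaded replicas drop out...
        history.append(len(healthy))  # ...raising the share of the rest
    return history


if __name__ == "__main__":
    caps = [100, 110, 120, 130, 140]  # aggregate capacity: 600 QPS
    print(simulate_cascade(caps, 500))  # every replica copes with 100 QPS
    print(simulate_cascade(caps, 520))  # one overload cascades to total loss
```

With 520 QPS, the weakest replica fails first (a 104 QPS share against 100 QPS of capacity), which pushes the share to 130 QPS and takes out two more, after which the last two cannot carry 260 QPS each, giving the sequence `[5, 4, 2, 0]`. The whole service collapses even though its aggregate capacity (600 QPS) exceeds the offered load, which is exactly the positive feedback that distinguishes a cascading failure from simple overload.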

We’ll use the Shakespeare search service discussed in “Shakespeare: A Sample Service” as an example throughout this chapter. Its production configuration might look something like Figure 22-1.





