Lean DevOps: A Practical Guide to On Demand Service Delivery by Robert Benefield

Author: Robert Benefield
Language: eng
Format: epub, pdf
Publisher: Addison-Wesley Professional
Published: 2022-08-01T16:00:00+00:00


Anyone who has spent a number of years in their career has likely been caught out by a cascade of seemingly little problems that snowballed into a disaster. Some, such as Facebook/Meta’s BGP mishap in October 2021,[1] the 365 Main datacenter power failure in 2007,[2] or the Azure Leap Year outage in February 2012,[3] were big enough to shake thousands of affected customers out of their catastrophically passive reliance on poorly understood dependencies. However, you do not have to be a large service provider to be lulled into such traps. I have personally witnessed hundreds of lesser-known but no less destructive failure cascades, many of which I had been repeatedly assured beforehand would never happen. By encouraging staff to understand the service ecosystem and the outcomes it is attempting to deliver, you can greatly reduce both the potential length of an outage and the damage it can cause.

1. More details about the October 4 outage, Meta: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/

2. 365 Main’s credibility outage, Gawker: https://www.gawker.com/282257/365-mains-credibility-outage

3. Summary of Windows Azure Service Disruption on February 29, 2012, Microsoft: https://azure.microsoft.com/en-us/blog/summary-of-windows-azure-service-disruption-on-feb-29th-2012/

There are a number of good ways to track supportability. One place to start is to measure how quickly someone can come up to speed on installing and supporting a component or service. I have done this in the past by instituting an onboarding exercise in which new staff are given the task of installing various key service components. They have to find out how to install each component, perform the installation, and document what they experienced. This is followed by a debrief with the new starter so that they can suggest what should be fixed and describe what value they believe their proposed fixes would bring.

At one company this exercise began by placing a new server fresh out of the box on the person’s desk. They then had to figure out not only how to install the service, but also how to get the server in a rack and networked, get the right operating system on it, and determine all the other configuration and support details. This was an impressively effective way of improving the installation and configuration of services. Everyone who went through it could recall all the pain they had experienced, and was motivated to find ways to continually simplify it.

It is easy to run similar exercises when services are delivered in the cloud, such as guiding code through a delivery pipeline or working out how to spin up new service instances. I know of several companies that have new hires perform pushes to production as part of the onboarding process, both to help them become familiar with the delivery and push process and to gain insight into ways that process can be improved.

There is also value in putting in place similar processes that measure how quickly current team members can come fully up to speed on any parts of the ecosystem they have not previously worked on. This is especially beneficial in larger organizations that have a plethora of service components scattered across many areas.
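One lightweight way to keep track of these measurements is to log each exercise and summarize the results per component. The following is only a minimal sketch in Python, not a method taken from the book: the `OnboardingRun` record, its fields, and the `min_reports` threshold are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Tuple


@dataclass
class OnboardingRun:
    """One person's attempt at installing a component (hypothetical record)."""
    person: str
    component: str
    hours_to_install: float
    pain_points: List[str] = field(default_factory=list)


def median_hours_by_component(runs: List[OnboardingRun]) -> Dict[str, float]:
    """Return the median time to install each component across all runs."""
    hours = defaultdict(list)
    for run in runs:
        hours[run.component].append(run.hours_to_install)
    return {component: median(values) for component, values in hours.items()}


def recurring_pain_points(
    runs: List[OnboardingRun], min_reports: int = 2
) -> List[Tuple[str, str]]:
    """Return (component, pain point) pairs reported by at least min_reports runs."""
    counts = defaultdict(int)
    for run in runs:
        # Use a set so one run cannot count the same pain point twice.
        for point in set(run.pain_points):
            counts[(run.component, point)] += 1
    return sorted(key for key, count in counts.items() if count >= min_reports)


# Illustrative data only; the people, components, and numbers are made up.
runs = [
    OnboardingRun("new hire A", "billing-api", 6.0, ["missing config docs"]),
    OnboardingRun("new hire B", "billing-api", 4.0, ["missing config docs", "stale wiki"]),
    OnboardingRun("existing staff C", "search-indexer", 3.0, []),
]

print(median_hours_by_component(runs))
print(recurring_pain_points(runs))
```

Comparing the median per component over time gives a rough signal of whether the fixes proposed in the debriefs are making a service easier to install and support, and pain points reported by more than one person are natural candidates to fix first.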


