Engineering and Management of Data Centers by Jorge Marx Gómez Manuel Mora Mahesh S. Raisinghani Wolfgang Nebel & Rory V. O’Connor

Author: Jorge Marx Gómez, Manuel Mora, Mahesh S. Raisinghani, Wolfgang Nebel & Rory V. O’Connor
Language: eng
Format: epub
Publisher: Springer International Publishing, Cham


2 Related Work

In this section, related work is discussed by introducing the availability management process, availability modeling techniques, and definitions of and solution algorithms for the redundancy allocation problem.

2.1 Availability Management and Modeling

Availability management is an important process of the service design stage in IT service management (ITSM) and is included in ITSM frameworks such as the IT Infrastructure Library (ITIL) (Hunnebeck 2011). Its objective is to ensure that an IT service meets its availability objectives cost-effectively. To increase the availability of a system, four principal approaches can be distinguished: fault forecasting, fault removal, fault prevention, and fault tolerance (Laprie 1995).

Fault forecasting means that a running system is carefully monitored to anticipate future faults so that countermeasures can be applied. Fault removal approaches aim at minimizing the time to recover after a fault has occurred. These two approaches are therefore applied in the operational phase of an IT service (reactive approaches). Fault prevention and fault tolerance techniques, on the other hand, can be introduced in the design phase (proactive approaches). However, the effectiveness of fault prevention, which aims at minimizing the probability of faults, is limited because faults can never be excluded entirely (Lee and Anderson 1990). Since fault tolerance is defined as ensuring availability even in the presence of faults, it is an effective approach for designing high-availability systems. Fault tolerance is normally achieved by introducing redundancy mechanisms in which spare components are installed to take over when a primary component fails (Shooman 2002).
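
A minimal worked example of why such redundancy helps, assuming (for illustration only) one primary component with availability A, one identical spare whose faults are independent of the primary's, and instantaneous switchover. The pair is unavailable only if both components are down at the same time:

```latex
A_{\text{pair}} = 1 - (1 - A)^2,
\qquad A = 0.99 \;\Rightarrow\; A_{\text{pair}} = 1 - (0.01)^2 = 0.9999
```

Under these assumptions, a single spare cuts unavailability from 1% (about 87.6 hours per year) to 0.01% (about 53 minutes per year).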

Since important decisions are made in the service design stage that are costly to correct afterwards (Terlit and Krcmar 2011), availability modeling techniques should be applied to estimate the future availability of a service (Hunnebeck 2011). For this purpose, measurement-based and model-based approaches can be distinguished.

In measurement-based or black-box approaches, no knowledge about the inner structure and behavior of a system is required. Data mining and machine learning techniques are applied to model the relation between input values (design parameters) and output values (availability, costs), e.g., in Hoffmann et al. (2004) and Silic et al. (2014). Although these approaches can be very effective, they require training examples, which in turn require running instances of comparable systems that may not be available (Immonen and Niemelä 2008).
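
To make the black-box idea concrete, the following minimal sketch swaps in k-nearest-neighbour regression as a simple stand-in for the data mining and machine learning techniques cited above; it is not the method of Hoffmann et al. or Silic et al. The design parameters (number of redundant servers, mean time to repair in hours), the function name predict_availability, and all training values are hypothetical.

```python
import math

# Hypothetical training examples from comparable running systems.
# Each entry: (design parameters, observed availability).
# Assumed design parameters: (redundant servers, mean time to repair in hours).
# The numbers are illustrative only, not measured data.
TRAINING_DATA = [
    ((1, 4.0), 0.990),
    ((2, 4.0), 0.998),
    ((2, 1.0), 0.999),
    ((3, 2.0), 0.9995),
]


def predict_availability(design, data, k=2):
    """Black-box estimate: average the observed availability of the k
    known designs closest to `design` (Euclidean distance).

    The system's internal structure is never used; only input/output
    pairs are. Features are not scaled in this sketch.
    """
    if not data:
        # Without running instances of comparable systems there is
        # nothing to learn from.
        raise ValueError("black-box prediction needs training examples")
    k = min(k, len(data))
    nearest = sorted(data, key=lambda item: math.dist(design, item[0]))[:k]
    return sum(avail for _, avail in nearest) / k


# Estimate a new, not yet built design: 2 servers, 2 hours MTTR.
print(predict_availability((2, 2.0), TRAINING_DATA))
```

The model only interpolates between observed designs, which is also why it cannot help when no comparable running systems exist, as the empty-data check makes explicit. A practical model would at least normalize the features, since here one extra server and one extra hour of repair time weigh equally in the distance.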

In this case, additional information about the system internals has to be used, which leads to model-based or white-box approaches. Depending on the underlying model, these can be further classified into combinatorial, state-space-based, and hierarchical approaches (Trivedi et al. 2008). In combinatorial models, each component is characterized by an availability value. Based on probability theory and assuming independent component faults, system availability can be computed quickly and easily. One example is reliability block diagrams, which model a series-parallel system (Anon 1981). Redundant components with the same function form subsystems (parallel systems), each of which is crucial for system availability (series system). System availability is then defined as the probability that at least one component is available in each subsystem.
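
A minimal Python sketch of the series-parallel computation behind a reliability block diagram, assuming independent component faults as stated above. The function names and the two-tier example system are hypothetical and not taken from the chapter.

```python
from math import prod


def parallel_availability(components):
    """Availability of a parallel (redundant) subsystem.

    The subsystem is up if at least one component is up; with independent
    faults this equals 1 - prod(1 - a_j) over its components.
    """
    for a in components:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must lie in [0, 1], got {a}")
    return 1.0 - prod(1.0 - a for a in components)


def series_parallel_availability(subsystems):
    """Availability of a series system of parallel subsystems.

    The system is up only if every subsystem is up, so (again assuming
    independence) the subsystem availabilities are multiplied.
    """
    return prod(parallel_availability(sub) for sub in subsystems)


# Hypothetical example: two redundant web servers (0.99 each)
# in series with a single database server (0.995).
system = [[0.99, 0.99], [0.995]]
print(round(series_parallel_availability(system), 4))  # prints 0.9949
```

Because only products of component availabilities are involved, the computation is fast even for large diagrams; the price is the independence assumption, which such a model cannot relax.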

However, the assumption of independent faults limits the accuracy of these approaches, especially for software systems (Callou et al.


