OpenStack for Architects by Ben Silverman

Author: Ben Silverman
Language: eng
Format: epub
Tags: COM091000 - COMPUTERS / Cloud Computing, COM088000 - COMPUTERS / System Administration / General, COM011000 - COMPUTERS / Systems Architecture / General
Publisher: Packt Publishing
Published: 2018-05-31T11:23:04+00:00


The future of OpenStack troubleshooting and Artificial Intelligence-driven operations

As systems and workloads become increasingly abstracted, the velocity, frequency, and variety of data continue to multiply at exponential rates. Years ago, it was sufficient for administrators to log into an unresponsive server and comb through a handful of log files in order to perform root cause analysis (RCA).

Today, in OpenStack for example, the control plane servers create more than 15 different log files, and each compute server adds multiple unique logs of its own. All of these logs, combined with logs from the operating systems, routers, switches, load balancers, and WAN compressors, add up to a mountain of data to search in order to find the true root cause of an incident. The variety, velocity, and volume of data to search through manually decrease an administrator's ability to find the root cause and solve issues. This, plus the number of new servers added to enterprises daily, is contributing to a climbing Mean Time To Recovery (MTTR). Today, IT professionals take an average of 3 hours to repair a single problem. That is simply unacceptable when we want our latest purchases delivered to our house via drone in 30 minutes.
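To make the MTTR metric concrete, here is a minimal Python sketch that averages the time between detection and resolution over a set of incidents. The function name and the incident timestamps are invented for illustration and do not come from any real environment.

```python
from datetime import datetime


def mean_time_to_recovery(incidents):
    """Return the average (resolved - detected) duration in hours.

    `incidents` is a list of (detected, resolved) datetime pairs.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    seconds = [
        (resolved - detected).total_seconds()
        for detected, resolved in incidents
    ]
    return sum(seconds) / len(seconds) / 3600


# Hypothetical incident records: (detected, resolved)
INCIDENTS = [
    (datetime(2018, 5, 1, 9, 0), datetime(2018, 5, 1, 11, 0)),    # 2 hours
    (datetime(2018, 5, 2, 14, 0), datetime(2018, 5, 2, 18, 0)),   # 4 hours
    (datetime(2018, 5, 3, 8, 30), datetime(2018, 5, 3, 11, 30)),  # 3 hours
]

mttr_hours = mean_time_to_recovery(INCIDENTS)
print(f"MTTR: {mttr_hours:.1f} hours")
```

Tracking this number over time is one simple way to check whether a change in tooling or process actually shortens recovery.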

One of the recent technological use cases for cloud computing has been machine learning (ML) and Artificial Intelligence (AI). The ability to harness capacity on demand and configure large clusters of systems to power massive parallel processing of data is quickly becoming a reality. Some companies are taking this concept of AI one step further and using operational data to enable what is being called Artificial Intelligence Operations (AIOps). These platforms use correlation and resolution data to help administrators arrive at an RCA more quickly by continually scanning log files in real time. As logs stream in, the platform needs to recognize where there may be a problem by correlating sections of multiple log files and applying complex rules across multiple devices, and even multiple environments.
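The idea of correlating across log files can be illustrated with a very small Python sketch. It is not how any AIOps product works internally; it only groups ERROR lines from different sources that occur within a few seconds of each other. The line layout, the file names, the sample messages, and the 5-second window are all assumptions made for this example.

```python
import re
from datetime import datetime, timedelta

# Assumed, simplified line layout: "<timestamp> <pid> <LEVEL> <logger> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \d+ "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<msg>.*)$"
)


def parse(source, line):
    """Turn one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "source": source,
        "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "level": m.group("level"),
        "message": m.group("msg"),
    }


def correlate(streams, window_seconds=5):
    """Group ERROR events from different sources that occur close in time."""
    events = []
    for source, lines in streams.items():
        for line in lines:
            event = parse(source, line)
            if event is not None and event["level"] == "ERROR":
                events.append(event)
    events.sort(key=lambda e: e["time"])

    window = timedelta(seconds=window_seconds)
    groups = []
    for event in events:
        # Extend the current group if this event follows the previous one closely.
        if groups and event["time"] - groups[-1][-1]["time"] <= window:
            groups[-1].append(event)
        else:
            groups.append([event])

    # Only groups that span more than one source count as correlated.
    return [g for g in groups if len({e["source"] for e in g}) > 1]


# Invented sample lines, keyed by a hypothetical log file name.
SAMPLE = {
    "nova-compute.log": [
        "2018-05-01 10:00:01.120 2211 INFO nova.compute.manager Starting instance",
        "2018-05-01 10:00:04.310 2211 ERROR nova.compute.manager Failed to allocate network",
    ],
    "neutron-server.log": [
        "2018-05-01 10:00:03.900 1874 ERROR neutron.api.v2 Port binding failed",
    ],
    "cinder-volume.log": [
        "2018-05-01 10:07:30.000 3090 ERROR cinder.volume.manager Volume backend unavailable",
    ],
}

incidents = correlate(SAMPLE)
for group in incidents:
    print([(e["source"], e["message"]) for e in group])
```

A fixed time window is the crudest possible rule. It is used here only because it keeps the example short, and a real platform would combine many signals beyond timestamps.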

One such company, Loom Systems (https://www.loomsystems.com/), has created Loom Cloud Intelligence (LUCI), which provides a solution to the log sprawl problem by applying an AI to monitor all platforms and systems in a single pane of glass. LUCI, shown in the screenshot below, can see and correlate all IT sources in real time and push alerts via legacy alerting or ChatOps. LUCI also allows administrators to drill down on alerts and correlated issues, and it creates stories of incidents based on empirical data retrieved from numerous log files.

LUCI also suggests possible root causes and remediations based on AI-driven problem determination. Loom Systems supports many different platforms, including OpenStack, VMware, AWS, Microsoft Azure, and Google Cloud. Loom has a SaaS solution as well as an on-premises solution for those who cannot send log data offsite:


