Cloud Computing and Big Data by Unknown

Author: Unknown
Language: eng
Format: epub
ISBN: 9783030277130
Publisher: Springer International Publishing


2 Big Data and the Imbalanced Classification Problem

This section first gives a brief introduction to the most widely used Big Data frameworks (Sect. 2.1). It then reviews imbalanced classification and describes the methods proposed for it in Big Data settings (Sect. 2.2).

2.1 Big Data Technologies

The rise of Big Data prompted new technologies designed to cope with it. The most significant of these is MapReduce [3], developed by Google in 2003. This framework was designed around a “divide-and-conquer” scheme in order to process Big Data on a cluster using parallel and distributed implementations. The MapReduce model has two stages, called Map and Reduce. The former receives the data and applies operations that transform them. The latter processes the results of the previous stage and summarizes them. The model works with key-value pairs: so that they can be processed in parallel, all pairs that share the same key are sent to the same node.
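The two stages and the key-based distribution can be illustrated with a minimal single-process sketch. It is not the Google or Hadoop implementation: the word-count task, the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the hash-based assignment of keys to nodes are all assumptions made for the example, and the "nodes" are simulated with plain dictionaries.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: transform one input record (a line of text) into (key, value) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs, num_nodes):
    """Send every pair with the same key to the same (simulated) node."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for key, value in pairs:
        # Hypothetical partitioning rule: the node is chosen from the key's hash,
        # so equal keys always end up together.
        nodes[hash(key) % num_nodes][key].append(value)
    return nodes


def reduce_phase(key, values):
    """Reduce: summarize all values collected for one key."""
    return key, sum(values)


def map_reduce(records, num_nodes=2):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    nodes = shuffle(mapped, num_nodes)
    result = {}
    for node in nodes:  # on a real cluster each node would run in parallel
        for key, values in node.items():
            reduced_key, total = reduce_phase(key, values)
            result[reduced_key] = total
    return result


counts = map_reduce(["big data on a cluster", "big data in memory"])
```

Here `counts` maps each word to its number of occurrences. Because each key lives on exactly one node, the reduce step for a key never needs data from another node, which is what allows the stage to run in parallel.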

The most popular open-source frameworks based on the MapReduce programming model are Apache Hadoop [8] and Apache Spark [9, 10]. The main difference between them is that Hadoop relies heavily on disk, whereas Spark relies heavily on memory. As a result, Spark generally outperforms Hadoop, particularly when the same data are reused across several passes. Spark also integrates with many libraries, such as MLlib [11] (its Machine Learning library) and Spark Streaming [12] (for working with streams of data), among others. These are some of the reasons why Spark is currently the most widespread Big Data framework.

Sect. 1 described two design approaches that differ in how data and models are distributed: the local and the global [4]. Depending on which of them is applied, the results of the developed algorithm will be approximate or exact.
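The difference between an approximate and an exact result can be seen in a small sketch. It assumes one common reading of the two designs: in the local one each partition produces its own answer and the answers are then combined, while in the global one all partitions contribute to a single computation over the full data. The median is used purely for illustration and does not come from the text; the function names and the sample partitions are hypothetical.

```python
from statistics import median


def local_median(partitions):
    """Local design (assumed reading): solve each partition independently,
    then combine the partial answers. The result is only an approximation."""
    return median(median(p) for p in partitions)


def global_median(partitions):
    """Global design (assumed reading): one computation that sees all the data,
    so the result matches the non-distributed answer exactly."""
    return median(x for p in partitions for x in p)


# Three simulated data partitions, as they might be spread over three nodes.
partitions = [[1, 2, 9], [3, 4, 10], [5, 6, 7]]

approx = local_median(partitions)   # median of the partition medians 2, 4, 6
exact = global_median(partitions)   # median of all nine values
```

With these partitions the local result is 4 while the exact median of the whole data set is 5. The local design needs far less communication between nodes, which is the usual reason for accepting the approximation.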


