Cloud Computing and Big Data by Unknown
Author:Unknown
Language: eng
Format: epub
ISBN: 9783030277130
Publisher: Springer International Publishing
2 Big Data and the Imbalanced Classification Problem
This section first gives a brief introduction to the most widely used Big Data frameworks (Sect. 2.1). It then reviews imbalanced classification and describes the methods proposed to address it in Big Data settings (Sect. 2.2).
2.1 Big Data Technologies
The rise of Big Data prompted new technologies designed to cope with it. The most significant of these is MapReduce [3], developed by Google in 2003. This framework was designed around a “divide-and-conquer” scheme in order to process Big Data on a cluster using parallel and distributed implementations. The MapReduce model has two stages, called Map and Reduce. The former receives the data and performs operations to transform them. The latter processes the results of the previous phase to summarize them. The model works with key-value pairs: so that they can be processed in parallel, all pairs with the same key are sent to the same node.
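The Map and Reduce stages described above can be sketched in a few lines of plain Python. This is only a single-process illustration under stated assumptions, not the Hadoop or Spark API: the function names (map_phase, shuffle, reduce_phase, run_mapreduce) and the toy records are hypothetical, and the shuffle step merely imitates the routing of same-key pairs to one node. The example counts instances per class label, a quantity that is useful when inspecting an imbalanced dataset.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: transform one input record into a list of (key, value) pairs."""
    _features, label = record
    return [(label, 1)]


def shuffle(pairs):
    """Group values by key, imitating the delivery of same-key pairs to one node."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: summarize all values that share the same key."""
    return key, sum(values)


def run_mapreduce(partitions):
    """Run Map over every record of every partition, then shuffle and Reduce."""
    mapped = chain.from_iterable(
        map_phase(record) for partition in partitions for record in partition
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical data: two partitions of (features, label) records.
partitions = [
    [((0.1, 0.2), "majority"), ((0.4, 0.1), "majority"), ((0.9, 0.8), "minority")],
    [((0.3, 0.3), "majority"), ((0.2, 0.5), "majority")],
]

class_counts = run_mapreduce(partitions)
print(class_counts)
```

In a real cluster the partitions would live on different nodes and the shuffle would move data across the network; here everything runs in one process so that the data flow is easy to follow.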
The most popular open-source frameworks based on the MapReduce programming model are Apache Hadoop [8] and Apache Spark [9, 10]. The main difference between them is that Hadoop relies heavily on disk, whereas Spark relies heavily on memory. As a result, Spark generally outperforms Hadoop. Spark also integrates with many libraries, such as MLlib [11] (its Machine Learning library) and Spark Streaming [12] (for working with streams of data), among others. These are some of the reasons why Spark is currently the most widespread Big Data framework.
Sect. 1 described two design approaches concerning how data and models are distributed: the local and the global [4]. Depending on which approach is applied, the results of the developed algorithm will be approximate or exact, respectively.
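The difference between an approximate and an exact result can be illustrated with a deliberately simple statistic. The sketch below is a toy analogy, not the method of [4]: it assumes that a local design combines results computed independently on each partition, while a global design aggregates contributions from all partitions into a single result. The function names and the data are hypothetical.

```python
def local_mean(partitions):
    """Local design (assumed): each partition computes its own mean,
    and the partial results are then averaged."""
    partial_means = [sum(p) / len(p) for p in partitions]
    return sum(partial_means) / len(partial_means)


def global_mean(partitions):
    """Global design (assumed): each partition emits (sum, count),
    and a single mean is computed from the combined statistics."""
    stats = [(sum(p), len(p)) for p in partitions]
    total = sum(s for s, _ in stats)
    count = sum(n for _, n in stats)
    return total / count


# Hypothetical partitions of unequal size.
partitions = [[1.0, 2.0, 3.0], [10.0]]

approx = local_mean(partitions)   # average of the partition means 2.0 and 10.0
exact = global_mean(partitions)   # 16.0 / 4, identical to the sequential mean
print(approx, exact)
```

With partitions of unequal size the local result differs from the value obtained on the whole dataset, whereas the global result matches it. The local variant needs less coordination between nodes, which is the usual trade-off between the two designs.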
