Moving to the Cloud: Developing Apps in the New World of Cloud Computing by Dinkar Sitaram & Geetha Manjunath
Author: Dinkar Sitaram & Geetha Manjunath
Language: eng
Format: epub
ISBN: 978-1-59749-725-1
Publisher: Syngress
Published: 2012-03-25T16:00:00+00:00
• partition(key2) : Determine a data partition and return the reducer number
The partition function is given the key together with the number of reducers, and returns the index of the reducer that should process that key.
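As a rough illustration of the contract just described, the following Python sketch maps a key to a reducer index by hashing. The signature `partition(key, num_reducers)` and the choice of CRC32 are assumptions made for this example; they are not taken from the book or from any particular MapReduce framework.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Return the index of the reducer that should receive `key`.

    Hypothetical signature: the key plus the number of reducers, as the
    text describes. Hash partitioning is only one possible strategy.
    """
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")
    # CRC32 is stable across processes and machines, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_reducers
```

Because the result depends only on the key and the reducer count, every mapper sends all values for a given key to the same reducer.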
The combine and partition functions help optimize the execution of the parallel algorithm. The combine function reduces unnecessary communication between the Mapper and Reducer functions by locally consolidating co-located data that share the same key. The partition function can be used to partition the input data efficiently for subsequent parallel execution. Typically, different records from the data sources (which could be different files, a set of lines from a given file, or rows of a database) are used as the basis for partitioning. More sophisticated techniques, such as horizontal partitioning in databases and data sharding, described in an earlier section, can also be implemented within this function. Sharding is most effective in a shared-nothing architecture such as the one in MapReduce, and it can also replicate shared data to achieve good performance.
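The effect of local consolidation can be seen in a small word-count sketch. The function names `map_words` and `combine` are hypothetical and the code runs in a single process; it only shows how a combiner shrinks the number of pairs a mapper would have to send to the reducers.

```python
from collections import defaultdict


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the values of identical keys on the mapper's side."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


mapped = map_words("to be or not to be")   # six pairs, one per word
combined = combine(mapped)                 # one pair per distinct word
```

Here the mapper produces six pairs but only four leave the node after combining. This works because addition is associative and commutative; a combiner for an operation without those properties would need more care.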
Ideally, the communication between the input data and the Mapper task can be minimized by running the Mapper logic where the data split resides, without moving the data. However, this depends on where the input data is stored and whether Mapper processes can be executed on the same node. For HDFS and Cassandra, it is possible to run the Mapper task on the storage node itself, and the Job Tracker takes responsibility for co-locating the Mapper with the data split it processes, which significantly reduces data movement. On the other hand, pure data stores such as Amazon S3 do not allow Mapper logic to be executed at the storage node. When running Hadoop on Amazon, it is necessary to create a Hadoop cluster in EC2, copy the data from S3 to EC2 (which is free), store intermediate results from the MapReduce steps in HDFS on EC2, and write the final results back to S3 [37].
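The co-location idea can be sketched as a simple greedy assignment. This is not how the Hadoop Job Tracker is implemented; `assign_mappers`, its arguments, and the host names are all hypothetical, and the sketch only shows the preference for a node that already holds the split.

```python
def assign_mappers(split_locations, free_slots):
    """Assign each data split to a host, preferring hosts that store it.

    split_locations: dict of split id -> list of hosts holding a replica
    free_slots:      dict of host -> number of free mapper slots
    Returns a dict of split id -> (host, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is untouched
    assignment = {}
    remote = []
    # First pass: place a split on a host that already stores it.
    for split, hosts in split_locations.items():
        local = next((h for h in hosts if slots.get(h, 0) > 0), None)
        if local is None:
            remote.append(split)
        else:
            slots[local] -= 1
            assignment[split] = (local, True)
    # Second pass: remaining splits go to any free host (data must move).
    for split in remote:
        host = next((h for h, n in slots.items() if n > 0), None)
        if host is None:
            raise RuntimeError("no free mapper slots")
        slots[host] -= 1
        assignment[split] = (host, False)
    return assignment
```

With a store such as S3, no compute host holds a replica, so every split would fall through to the second pass and its data would have to be copied to the cluster first.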
In general, the MapReduce APIs are very simple to use and allow the parallelism in the application to be specified within the specific design paradigm of distributed merge-sort. As far as the specific application is concerned, the MapReduce platform (Hadoop, for example) is expected to take care of automatic parallelization, fault tolerance, load balancing, data distribution, and network performance when implemented on a large network of clusters.
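The merge-sort flavor of the paradigm can be shown in a few lines: each mapper emits a sorted run of key-value pairs, and the reducer merges the runs and processes each key group in order. This is a single-process sketch with hypothetical function names, not the API of Hadoop or any other platform.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def run_mapper(lines):
    """Emit (word, 1) pairs for one input split as a sorted run."""
    pairs = [(word.lower(), 1) for line in lines for word in line.split()]
    return sorted(pairs)


def run_reducer(sorted_runs):
    """Merge the sorted runs from all mappers, then sum values per key."""
    merged = heapq.merge(*sorted_runs)  # the merge step of merge-sort
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(merged, key=itemgetter(0))
    }


runs = [run_mapper(["a b a"]), run_mapper(["b c"])]
counts = run_reducer(runs)
```

The application supplies only the per-record and per-key logic; sorting, merging, and moving the runs between nodes is the part a real platform handles on its behalf.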
