Big Data Computing by Vivek Kale
Author:Vivek Kale [Kale, Vivek]
Language: eng
Format: epub
Published: 2016-10-31T13:20:28+00:00
of a reliable distributed computing approach that would scale to the demand of the vast
amount of website data that the tool would be collecting. A year later, Google published
papers on the GFS and MapReduce, an algorithm and distributed programming platform
for processing large data sets; they utilized large numbers of commodity servers
and built GFS and MapReduce in a way that assumed hardware failures would be commonplace
and were simply something that the software needed to deal with.
Hadoop was modeled after two papers produced by Google, one of the many
companies to have these kinds of data-intensive processing problems. The
first, presented in 2003, describes the Google File System (GFS), a pragmatic,
scalable, distributed file system optimized for storing enormous data sets.
In addition to simple storage, GFS was built to support large-scale,
data-intensive, distributed processing applications. The following year, another
paper, titled “MapReduce: Simplified Data Processing on Large Clusters,” was
presented, defining a programming model and accompanying framework that provided
automatic parallelization, fault tolerance, and the scale to process hundreds of
terabytes of data in a single job over thousands of machines. When paired, these two
systems could be used to build large data processing clusters on relatively
inexpensive commodity machines. These papers directly inspired the development of
the Hadoop Distributed File System and Hadoop MapReduce, respectively.
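To make the programming model concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce phases in Python, using word count as the example. The function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and chosen for illustration only; they are not part of Hadoop or of Google's implementation. A real framework would distribute these phases across many machines and recover from failures, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, the step that sits between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process (illustration only)."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data computing", "big clusters process big data"]
    print(run_job(docs))
```

Because map_phase handles each record independently and reduce_phase handles each key independently, both steps can in principle be run in parallel on separate machines, which is the property the paragraph above refers to as automatic parallelization.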
10.3.1 Apache Hadoop
In 2006, after struggling with the same “big data” challenges related to indexing massive amounts of information for its search engine, and after watching the progress of the Nutch project, Yahoo! hired Doug Cutting and decided to adopt Hadoop as its distributed framework for solving its search engine challenges. Yahoo! spun out the storage and processing parts of Nutch to form Hadoop as an open source Apache project, and the Nutch web
crawler remained its own separate project. Shortly thereafter, Yahoo! began rolling out
Hadoop as a means to power analytics for various production applications. The platform
was so effective that Yahoo! merged its search and advertising into one unit to better leverage Hadoop technology.
In the past 10 years, Hadoop has evolved from its search engine-related origins to one
of the most popular general-purpose computing platforms for solving big data challenges.
It is rapidly becoming the foundation for the next generation of data-based applications.
It is predicted that Hadoop will be driving a big data market that should hit more than $23
billion by 2016. Since the launch of the first Hadoop-centered company, Cloudera, in 2008, dozens of Hadoop-based start-ups have attracted hundreds of millions of dollars in venture capital investment. Simply put, organizations have found that Hadoop offers a proven approach to big data analytics.
Apache Hadoop has revolutionized data management and processing. Hadoop’s technical capabilities have made it possible for organizations across a range of industries to solve problems that were previously impractical. These capabilities include the following:
1. Scalable processing of massive amounts of data on commodity hardware
2. Flexibility for data processing, regardless of the format and structure (or lack of structure) of the data