Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem by Douglas Eadline

Author: Douglas Eadline
Language: eng
Format: epub
Publisher: Addison-Wesley Professional
Published: 2016-07-14T16:00:00+00:00


7. Essential Hadoop Tools

In This Chapter:

The Pig scripting tool is introduced as a way to quickly examine data both locally and on a Hadoop cluster.

The Hive SQL-like query tool is explained using two examples.

The Sqoop RDBMS tool is used to import and export data between MySQL and HDFS.

The Flume streaming data transport utility is configured to capture weblog data into HDFS.

The Oozie workflow manager is used to run basic and complex Hadoop workflows.

The distributed HBase database is used to store and access data on a Hadoop cluster.

The Hadoop ecosystem offers many tools to help with data input, high-level processing, workflow management, and creation of huge databases. Each tool is managed as a separate Apache Software Foundation project, but is designed to operate with the core Hadoop services, including HDFS, YARN, and MapReduce. Background on each tool is provided in this chapter, along with a start-to-finish example.
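To give a sense of the command-line flavor of these tools before the detailed sections, the short sketch below shows one representative invocation for each. It is illustrative only; the script names, table names, database URLs, and agent names are placeholders rather than examples drawn from the chapter.

# Pig: run a script locally for a quick test, then on the cluster (script name is a placeholder)
$ pig -x local examine.pig
$ pig examine.pig

# Hive: run an SQL-like query from the command line (table name is a placeholder)
$ hive -e 'SELECT COUNT(*) FROM web_logs;'

# Sqoop: import a MySQL table into HDFS (database URL, user, and table are placeholders)
$ sqoop import --connect jdbc:mysql://dbhost/retail --username sqoopuser -P --table orders

# Flume: start an agent defined in a configuration file (file and agent names are placeholders)
$ flume-ng agent --conf conf --conf-file weblog.conf --name agent1

# Oozie: submit and run a workflow (properties file path is a placeholder)
$ oozie job -oozie http://localhost:11000/oozie -config job.properties -run

# HBase: create a table from the HBase shell (table and column family names are placeholders)
$ echo "create 'webtable', 'contents'" | hbase shell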


