Getting Started with Impala by Russell John

Author: Russell, John, Date: January 31, 2016

Language: eng
Format: epub, pdf
ISBN: 9781491905777
Publisher: O'Reilly Media
Published: 2014-09-24T16:00:00+00:00

Understanding Cluster Topology

As a developer, you might work with a different cluster setup than the one used in production. Here are some things to watch out for as your application moves from a dev/test setup into production, so that you understand the performance and scalability implications:

For basic functional testing, you might use a single-node setup, perhaps running inside a virtual machine. You can check SQL compatibility, try out built-in functions, check data type compatibility and experiment with CAST(), confirm that your custom UDFs work correctly, and so on. A relatively small data volume is usually enough at this stage, because you are only checking correctness.
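Because single-node testing is about correctness, it lends itself to small table-driven checks: a list of SQL statements paired with the rows you expect for your small test tables. The sketch below is a hypothetical harness, not part of Impala. `check_queries` and the `run_query` callable are illustrative names; you would wire `run_query` to whichever client you use to talk to Impala, and work out the expected rows by hand.

```python
from typing import Any, Callable, Iterable, List, Tuple

Row = Tuple[Any, ...]
# Hypothetical: run_query(sql) executes one statement and returns its rows as tuples.
QueryRunner = Callable[[str], List[Row]]


def check_queries(run_query: QueryRunner,
                  cases: Iterable[Tuple[str, List[Row]]]) -> List[str]:
    """Run each SQL statement and return a description of every mismatch."""
    failures = []
    for sql, expected in cases:
        actual = run_query(sql)
        if actual != expected:
            failures.append(f"{sql!r}: expected {expected!r}, got {actual!r}")
    return failures


if __name__ == "__main__":
    # Stand-in runner with canned results, so the harness can be tried without a cluster.
    canned = {"SELECT CAST('42' AS INT)": [(42,)]}
    cases = [("SELECT CAST('42' AS INT)", [(42,)])]
    print(check_queries(lambda sql: canned[sql], cases))  # prints []
```

An empty list means every statement returned what you expected; rerunning the same cases after moving to a larger cluster is a cheap way to confirm that results did not change along with the topology.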

To see what happens with distributed queries, you could use a relatively small cluster, such as two or four nodes. This lets you see some of the performance and scalability benefits of parallelizing the queries. On a dev/test cluster, the name node is probably on the same host as one of the data nodes, which is not a problem while the cluster is running under a light workload.
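The benefit of parallelizing comes from each node working only on the data stored locally and sending back a small partial result. The following is a simplified toy model of that pattern for an AVG() query, not Impala's actual execution engine: the function names are hypothetical, threads stand in for nodes, and each inner list stands in for the data one node holds. Each "node" returns a (sum, count) pair, and a coordinator merges the pairs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def partial_sum_count(local_rows: List[float]) -> Tuple[float, int]:
    """Work done on one node: aggregate only the rows stored locally."""
    return sum(local_rows), len(local_rows)


def distributed_avg(partitions: List[List[float]]) -> float:
    """Toy AVG(): per-node partial aggregates, then a merge on the coordinator."""
    if not partitions:
        raise ValueError("need at least one partition")
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


if __name__ == "__main__":
    # Four "nodes"; one happens to hold no rows for this table.
    print(distributed_avg([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], []]))  # prints 3.5
```

Merging (sum, count) pairs instead of averaging the per-node averages keeps the result correct when nodes hold different numbers of rows. In Python the threads only model the shape of the computation; they are not a source of real speedup.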

For production, you’ll probably have a separate host for the name node and a substantial number of data nodes. With more hosts, the chances of a node failing are greater; if one fails, rerun any queries that were in flight at the time. Alternatively, one node might experience a performance issue that drags down the response time of queries. (This type of problem is best detected with monitoring software such as Cloudera Manager.) This is also the time to double-check the guideline about installing Impala on all the data nodes in the cluster (to avoid I/O slowdowns due to remote reads) and only on the data nodes (to avoid using up memory and CPU unnecessarily on the name node, which has a lot of work to do on a busy cluster).
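The "all the data nodes, and only the data nodes" guideline can be checked mechanically once you know which hosts are data nodes and which hosts run the Impala daemon. The sketch below assumes you have already gathered those host lists yourself (for example from your cluster management tool); `check_impala_placement` and the host names are hypothetical.

```python
from typing import Dict, Set


def check_impala_placement(datanodes: Set[str],
                           impalad_hosts: Set[str],
                           namenode: str) -> Dict[str, Set[str]]:
    """Report hosts that break the 'all data nodes, only data nodes' guideline."""
    # Data nodes without Impala: their blocks must be read remotely.
    missing = datanodes - impalad_hosts
    # Impala hosts without local data: memory and CPU spent for no local reads.
    extra = impalad_hosts - datanodes
    return {
        "missing_impalad": missing,
        "impalad_without_local_data": extra,
        # Only flagged when the name node is NOT also a data node (typical in production).
        "impalad_on_dedicated_namenode": extra & {namenode},
    }


if __name__ == "__main__":
    report = check_impala_placement(
        datanodes={"dn1", "dn2", "dn3"},
        impalad_hosts={"dn1", "dn2", "nn1"},
        namenode="nn1",
    )
    for problem, hosts in report.items():
        print(problem, sorted(hosts))
```

Because the name-node check only looks at hosts outside the data-node set, a dev/test cluster where the name node shares a host with a data node is not flagged, which matches the point above that this layout is fine under a light workload.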
