
SQL on Big Data by Sumit Pal

Author: Sumit Pal, Date: February 4, 2018

Language: eng
Format: epub, pdf
Publisher: Apress, Berkeley, CA

Queries can be submitted to Impala through either the Impala shell or JDBC/ODBC drivers. Once a query is submitted, the query planner turns the request into a collection of plan fragments, and the coordinator then initiates execution on the remote impalad daemons. Intermediate results are streamed between the impalad daemons before the query results are streamed back to the client.

The coordinator orchestrates interactions between the impalad daemons across all the data nodes and aggregates the results produced by each data node. Each impalad also exposes a remote procedure call (RPC) interface that other impalad daemons use to connect and exchange data. The coordinator uses the same interface to assign work to each impalad.

After the client submits a query, it goes through the usual validation steps before the query engine optimizes it. Every query is checked syntactically, to make sure it is well formed, and semantically, to make sure it makes sense, for example, that the tables and columns it refers to exist. The metadata needed for these checks is held in the Hive metastore. Query planning follows, in which Impala works out the best way to execute the query and produce the results. The EXPLAIN statement shows the plan Impala has chosen: it outlines the steps that impalad will perform and gives the relevant details of how the workload will be distributed among the nodes.
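The two validation passes described above can be sketched in a few lines. This is only an illustration, not Impala's analyzer: the `METASTORE` dictionary is a hypothetical stand-in for the Hive metastore, and the `validate` function and its toy `SELECT ... FROM ...` grammar are assumptions made for the example.

```python
import re

# Hypothetical stand-in for the Hive metastore: table name -> column names.
METASTORE = {"sales": {"region", "amount", "sold_on"}}

# Toy grammar: SELECT <cols> FROM <table>. Real Impala SQL is far richer.
QUERY_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)\s*;?\s*$",
    re.IGNORECASE,
)


def validate(query, metastore=METASTORE):
    """Return (table, columns) if the query is valid; raise ValueError otherwise."""
    # Syntactic check: does the text match the (toy) grammar at all?
    match = QUERY_PATTERN.match(query)
    if match is None:
        raise ValueError("syntax error: expected SELECT <cols> FROM <table>")

    table = match.group("table").lower()
    columns = [c.strip().lower() for c in match.group("cols").split(",")]

    # Semantic check: do the referenced table and columns exist in the metadata?
    if table not in metastore:
        raise ValueError(f"semantic error: unknown table {table!r}")
    unknown = [c for c in columns if c not in metastore[table]]
    if unknown:
        raise ValueError(f"semantic error: unknown columns {unknown}")
    return table, columns
```

With this sketch, `validate("SELECT region, amount FROM sales")` passes both checks, while `validate("SELECT price FROM sales")` is well formed but fails the semantic check because `price` is not in the metadata.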

Optimization involves generating the best physical plan for the query in terms of execution cost, as well as generating code for the query so that it executes faster on the hardware. The optimized query is submitted to the coordinator process, which orchestrates its execution: it assigns work units to the impalad processes, manages the interactions between the Impala nodes, and aggregates the results once each impalad process has produced its part. The coordinator is part of the impalad process and resides on every node, so any node can act as the query coordinator.
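The coordinator's assign-then-aggregate role can be illustrated with a small simulation. This is a sketch under stated assumptions, not Impala code: the `NODE_DATA` rows are made up, threads stand in for remote impalad processes, and `run_fragment` and `coordinate` are hypothetical names for a partial `SUM(amount) ... GROUP BY region` and its merge step.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each list is the slice of rows local to one node.
NODE_DATA = {
    "node1": [("east", 10), ("west", 5)],
    "node2": [("east", 7)],
    "node3": [("west", 3), ("north", 4)],
}


def run_fragment(rows):
    """Work done on one node: a partial SUM(amount) grouped by region."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def coordinate(node_data):
    """Assign one fragment per node, then merge the partial results."""
    # Threads stand in for remote impalad processes in this sketch.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_fragment, node_data.values()))

    # Aggregation step: combine the per-node subtotals into the final answer.
    merged = {}
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] = merged.get(region, 0) + subtotal
    return merged
```

Calling `coordinate(NODE_DATA)` returns the per-region totals across all three simulated nodes. Each node only ever sees its own rows, which is the point of splitting the plan into fragments.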

Once the coordinator has assigned work to each impalad process, the impalad works with the storage engine to execute the query operators, applying optimizations such as constant folding and predicate pushdown so that only the portions of the data actually needed to satisfy the query are extracted.
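A toy example may make the two optimizations named above more concrete. It is a sketch only: expressions are modeled as nested `(op, left, right)` tuples, and `fold` and `scan` are hypothetical helpers, not part of Impala or any storage engine.

```python
import operator

# Comparison and arithmetic operators supported by this toy example.
COMPARE = {">": operator.gt, "<": operator.lt, "=": operator.eq}
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Constant folding: collapse nested (op, left, right) tuples of literals."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return ARITH[op](fold(left), fold(right))
    return expr  # already a literal


def scan(rows, column, op, expr):
    """Scan with the predicate pushed down: non-matching rows are never yielded."""
    threshold = fold(expr)  # evaluated once, not once per row
    compare = COMPARE[op]
    for row in rows:
        if compare(row[column], threshold):
            yield row
```

For a predicate such as `amount > 2 * (1 + 2)`, `fold(("*", 2, ("+", 1, 2)))` reduces the right-hand side to the single constant 6 before any row is read, and `scan(rows, "amount", ">", ("*", 2, ("+", 1, 2)))` then hands the caller only the rows that pass the filter.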

The execution engine (the executor process) in the impalad daemon executes the optimized query by reading from the data source at high speed. It uses all the disks and their controllers to maximize read throughput and executes query fragments that have been optimized by the code optimizer, which includes Impala's LLVM-based code generation. The Impala executor can serve hundreds of plan fragments at any given time.

Impala does nothing special for failover. Because HDFS provides failover through replication, if the impalad daemon is installed on the nodes that hold the replicas, Impala will seamlessly start using the impalad on those nodes.
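The idea of falling back to a node that holds a replica can be sketched as follows. This is an illustrative model only, not Impala's scheduler: the block names, node names, and the `pick_node` and `assign_blocks` helpers are all hypothetical, and the sketch covers only the choice of node, not what happens to a query already in flight.

```python
def pick_node(block_replicas, live_nodes):
    """Return the first live node that holds a replica, or None if there is none."""
    for node in block_replicas:
        if node in live_nodes:
            return node
    return None


def assign_blocks(block_map, live_nodes):
    """Map every block to a live node that stores one of its replicas."""
    assignment = {}
    for block, replicas in block_map.items():
        node = pick_node(replicas, live_nodes)
        if node is None:
            raise RuntimeError(f"no live replica for {block}")
        assignment[block] = node
    return assignment
```

With a block map such as `{"blk_1": ["node1", "node2", "node3"]}` and `node1` down, `assign_blocks` places the work for `blk_1` on `node2`, which already stores a copy of the data.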
