Practical Apache Spark by Subhashini Chellappan & Dharanitharan Ganesan

Author: Subhashini Chellappan & Dharanitharan Ganesan
Language: eng
Format: epub
ISBN: 9781484236529
Publisher: Apress


6. Run the same SQL query, which finds the number of unique IP addresses in each location, directly on the JSON file you created, without first loading it into a DataFrame.
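Spark SQL can query a file in place by naming the format and a backtick-quoted path in the FROM clause. The following is a minimal PySpark sketch of that step, not the book's own solution. The column names `ip` and `location`, the path `/tmp/weblogs.json`, and the helper names are assumptions made for illustration; substitute the names used in the earlier steps of the exercise.

```python
def direct_file_query(json_path: str) -> str:
    """Build a Spark SQL statement that reads a JSON file in place.

    The column names `ip` and `location` are assumed, not taken from the book.
    """
    # The json.`<path>` form tells Spark SQL to read the file directly
    # instead of looking up a registered table or view.
    return (
        "SELECT location, COUNT(DISTINCT ip) AS unique_ips "
        f"FROM json.`{json_path}` "
        "GROUP BY location"
    )


def unique_ips_per_location(spark, json_path: str):
    """Run the query on an existing SparkSession and return the result."""
    return spark.sql(direct_file_query(json_path))


def main() -> None:
    # pyspark is imported here so the query builder above can be used
    # and inspected without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("DirectJsonQuery").getOrCreate()
    # Hypothetical path; point this at the JSON file created earlier.
    unique_ips_per_location(spark, "/tmp/weblogs.json").show()
    spark.stop()
```

Calling `main()` from a script, or `unique_ips_per_location(spark, path)` from the `pyspark` shell, runs the query. No temporary view is registered at any point, which is what distinguishes this step from the earlier ones.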

Points to Remember

Spark SQL is the Spark module for processing structured data.

A DataFrame is a Dataset organized into named columns, which makes it easy to query. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood.

Dataset is an interface added in Spark 1.6 that combines the benefits of RDDs, such as strong typing and the ability to use lambda functions, with Spark SQL's optimized execution engine.


