Apache Spark 2 for Beginners by Rajanarayanan Thottuvaikkatumana

Author: Rajanarayanan Thottuvaikkatumana
Language: eng
Format: epub
Publisher: Packt Publishing


Figure 13

In the preceding section, Spark DataFrames were created to produce the datasets for the number of action movies and drama movies released over the last 10 years. The data was collected into Python collection objects, and both line graphs were drawn in the same figure.

Python, in conjunction with the matplotlib library, offers a rich set of methods for producing publication-quality charts and plots. Spark can serve as the workhorse for processing data from heterogeneous sources, and the results can be saved to a wide variety of data formats.

Readers familiar with the Python data analysis library pandas will find the material covered in this chapter easy to follow, because Spark DataFrames were designed from the ground up with inspiration from both the R data frame and pandas.

This chapter has covered only a few sample charts and plots that can be created with the matplotlib library. Its main aim was to show how this library can be used in conjunction with Spark, with Spark doing the data processing and matplotlib doing the charting and plotting.

The data file used in this chapter is read from the local filesystem. It could equally be read from HDFS or any other Spark-supported data source.

When Spark is the primary data processing framework, the most important point to keep in mind is that all the data processing that can be done in Spark should be done in Spark, because Spark distributes the work across the cluster while the driver program runs as a single process. Only the processed results, which are typically small, should be returned to the Spark driver program for charting and plotting.
