Learning Real-time Processing with Spark Streaming by Gupta Sumit
Author:Gupta, Sumit
Language: eng
Format: epub
Publisher: Packt Publishing
Published: 2015-09-27T16:00:00+00:00
We are done with our setup. Now the logs are being generated in real time and we need to consume them and perform further analysis.
Let us move to the next section where we will discuss the transformation operations provided by Spark Streaming.
Functional operations
A discretized stream, or DStream, is a series of RDDs. It is represented by the class org.apache.spark.streaming.dstream.DStream (DStream.scala), with additional operations for streams of key-value pairs in org.apache.spark.streaming.dstream.PairDStreamFunctions (PairDStreamFunctions.scala). DStream defines various higher-order functions, such as map and reduce, that accept a given function, apply it to the elements of each RDD in the stream, and produce a new DStream whose RDDs hold the results. As per Wikipedia, higher-order functions are functions that accept a function as input, return a function as output, or both. It is a common concept in functional programming languages such as Clojure, Lisp, Erlang, and Haskell, and it is now available in Java 8 too.
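To make the definition of a higher-order function concrete, here is a minimal sketch in plain Python, with no Spark involved. The names apply_to_each and make_multiplier are hypothetical helpers introduced only for this illustration: the first accepts a function as input, the second returns a function as output.

```python
from typing import Callable, Iterable, List


def apply_to_each(func: Callable[[int], int], items: Iterable[int]) -> List[int]:
    """Higher-order: accepts a function and applies it to every element."""
    return [func(item) for item in items]


def make_multiplier(factor: int) -> Callable[[int], int]:
    """Higher-order: builds and returns a new function."""
    def multiply(value: int) -> int:
        return value * factor
    return multiply


# make_multiplier returns a function; apply_to_each receives it as an argument.
double = make_multiplier(2)
doubled = apply_to_each(double, [1, 2, 3])
```

The DStream operations listed in this section follow the first pattern: the caller supplies the function and the stream decides where and when to apply it.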
The following are the different higher-order functions and operations provided by DStream:
flatMap(flatMapFunc): DStream[U]: Similar to map(…), except that each input element can produce zero or more output elements; the per-element results are flattened into a single result set before being returned.
foreachRDD(foreachFunc): Applies the given function to every RDD generated by the stream. It is a special type of function: the given function itself runs on the driver node, but it usually contains RDD actions, and those actions are performed over the cluster. foreachRDD is categorized as an output operation and, by default, output operations are executed one at a time, sequentially, in the order in which they are defined in the application.
filter(filterFunc): DStream[T]: Applies the provided function to all the elements of each RDD and returns a new DStream containing only those elements for which the function returns true.
map(mapFunc): DStream[U]: Applies the given function mapFunc to all the elements of each RDD and returns a new DStream containing the results.
mapPartitions(mapPartFunc, preservePartitioning): DStream[U]: Returns a new DStream in which each RDD is generated by applying mapPartitions() to the corresponding RDD of the invoking DStream. Applying mapPartitions() to an RDD applies the given function once to each partition of the RDD, rather than once to each element.
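The semantics of these operations can be sketched on ordinary Python lists. This is only a model, not the Spark API: a stream is assumed to be a list of batches, each batch stands in for one RDD and is a list of partitions, and every function name below (map_stream, flat_map_stream, filter_stream, map_partitions_stream, foreach_batch) is a hypothetical helper. The sample log lines are made up for the illustration.

```python
from typing import Any, Callable, Iterable, Iterator, List

# Model (an assumption for illustration): stream -> batches -> partitions -> elements.
Stream = List[List[List[Any]]]


def map_stream(stream: Stream, func: Callable[[Any], Any]) -> Stream:
    """One output element per input element."""
    return [[[func(x) for x in part] for part in batch] for batch in stream]


def flat_map_stream(stream: Stream, func: Callable[[Any], Iterable[Any]]) -> Stream:
    """Zero or more output elements per input element, flattened."""
    return [[[y for x in part for y in func(x)] for part in batch] for batch in stream]


def filter_stream(stream: Stream, pred: Callable[[Any], bool]) -> Stream:
    """Keep only the elements for which pred returns True."""
    return [[[x for x in part if pred(x)] for part in batch] for batch in stream]


def map_partitions_stream(
    stream: Stream, func: Callable[[Iterator[Any]], Iterable[Any]]
) -> Stream:
    """Call func once per partition, passing an iterator over its elements."""
    return [[list(func(iter(part))) for part in batch] for batch in stream]


def foreach_batch(stream: Stream, func: Callable[[List[List[Any]]], None]) -> None:
    """Output-style operation: call func on each batch, in order, for its side effects."""
    for batch in stream:
        func(batch)


# Two batches; the first has two partitions, the second has one.
logs: Stream = [[["GET /a", "POST /b"], ["GET /c"]], [["GET /d"]]]

lengths = map_stream(logs, len)
tokens = flat_map_stream(logs, str.split)
gets = filter_stream(logs, lambda line: line.startswith("GET"))
per_partition_counts = map_partitions_stream(logs, lambda it: [sum(1 for _ in it)])

batch_sizes: List[int] = []
foreach_batch(logs, lambda batch: batch_sizes.append(sum(len(p) for p in batch)))
```

In this model the per-partition function runs three times in total, once for each partition, whereas the per-element function passed to map_stream runs four times, once for each log line. That difference is the usual reason to prefer a per-partition operation when the function carries a fixed setup cost.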
