Essential PySpark for Scalable Data Analytics by Sreeram Nudurupati

Author: Sreeram Nudurupati
Language: eng
Format: epub
Publisher: Packt Publishing
Published: 2021-10-29


Summary

In this chapter, you learned about the concept of ML and the different types of ML algorithms. You also learned about some real-world applications of ML that help businesses minimize losses, maximize revenue, and accelerate their time to market. You were introduced to the need for scalable ML and to two different techniques for scaling out ML algorithms. Apache Spark's native ML library, MLlib, was introduced, along with its major components.

Finally, you learned a few data wrangling techniques for cleaning, manipulating, and transforming data to make it more suitable for the data science process. In the following chapter, you will learn about the second phase of the ML process, called feature extraction and feature engineering, where you will apply various scalable algorithms to transform individual data fields and make them even more suitable for data science applications.


