Spark for Python Developers by 2015

Author:2015
Language: eng
Format: mobi, epub
Publisher: Packt Publishing


Supervised and unsupervised learning

Here we delve more deeply into the traditional machine learning algorithms offered by Spark MLlib. We distinguish between supervised and unsupervised learning depending on whether the data is labeled. We also distinguish between categorical and continuous cases depending on whether the data is discrete or continuous.

The following diagram explains the Spark MLlib supervised and unsupervised machine learning algorithms and preprocessing techniques:

The following supervised and unsupervised MLlib algorithms and preprocessing techniques are currently available in Spark:

Clustering: This is an unsupervised machine learning technique where the data is not labeled. The aim is to extract structure from the data:

K-Means: This partitions the data into K distinct clusters

Gaussian Mixture: Each point is assigned to the cluster whose mixture component has the maximum posterior probability

Power Iteration Clustering (PIC): This groups vertices of a graph based on pairwise edge similarities

Latent Dirichlet Allocation (LDA): This is used to group collections of text documents into topics

Streaming K-Means: This dynamically clusters streaming data using a windowing function on the incoming data
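To make the K-Means entry above more concrete, here is a minimal sketch in plain Python of how unlabeled points can be partitioned into K clusters by alternating an assignment step and a center update step. It is an illustration only and does not use Spark; in MLlib the corresponding entry point is pyspark.mllib.clustering.KMeans.train. The helper names (squared_distance, closest, mean, kmeans), the sample points, and the seed are assumptions made for this example and do not come from the book.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest(point, centers):
    """Index of the center nearest to the given point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iterations=20, seed=0):
    """Illustrative K-Means: returns (centers, assignment)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct input points chosen at random.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest center.
        new_assignment = [closest(p, centers) for p in points]
        if new_assignment == assignment:
            break  # No point changed cluster, so the partition is stable.
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # Leave a center in place if it has no members.
                centers[i] = mean(members)
    return centers, assignment


# Hypothetical unlabeled data: two visibly separate groups of 2D points.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4),
          (9.0, 9.0), (9.5, 8.8), (8.9, 9.3)]
centers, assignment = kmeans(points, k=2)
print(centers)
print(assignment)
```

No labels are supplied at any point: the grouping emerges only from the distances between points, which is what makes this an unsupervised technique. Which group receives index 0 or 1 depends on the random initialization, so only the grouping itself is meaningful.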


