Python: Data Analytics and Visualization by Vo.T.H Phuong & Czygan Martin & Kumar Ashish & Raman Kirthi

Author: Vo.T.H, Phuong & Czygan, Martin & Kumar, Ashish & Raman, Kirthi
Language: eng
Format: epub
Publisher: Packt Publishing
Published: 2017-03-30T17:00:00+00:00


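The goal of this algorithm is to find a configuration of cluster centers and cluster assignments that minimizes the overall squared error function J, also called the J-score:

$$J = \sum_{i=1}^{c} \sum_{j=1}^{c_i} \lVert x_j - V_i \rVert^2$$

Here, $c$ is the number of clusters, $c_i$ is the number of points in the ith cluster, $V_i$ is the centroid of the ith cluster, and $x_j$ is the jth point assigned to that cluster.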

The squared error function J is the sum of the squared distances of points from their respective cluster centroids. A smaller value of J implies tightly packed, homogeneous clusters, which in turn suggests that most of the points have been placed in the right clusters.
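To make the J-score concrete, here is a minimal sketch that computes it with NumPy for a small hand-made data set. The function name `j_score` and the toy points, labels, and centroids are illustrative assumptions, not part of the original text.

```python
import numpy as np

def j_score(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    # centroids[labels] picks, for every point, the centroid of its cluster
    diffs = points - centroids[labels]
    return float(np.sum(diffs ** 2))

# Hypothetical toy data: two well-separated groups of two points each
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

good_labels = np.array([0, 0, 1, 1])  # each point assigned to its nearby centroid
bad_labels = np.array([1, 1, 0, 0])   # assignments deliberately swapped

tight = j_score(points, good_labels, centroids)
loose = j_score(points, bad_labels, centroids)
print(tight, loose)
```

With the sensible assignment every point is one unit from its centroid, so J is small. Swapping the assignments leaves the points and centroids unchanged but makes J much larger, which is the sense in which a low J-score indicates well-placed points.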

Let us try the k-means clustering algorithm on some random numbers between 0 and 1. Python's SciPy library has built-in methods that perform the algorithm and return a list defining which observation belongs to which cluster:

Define a set of observations consisting of random numbers ranging from 0 to 1. In this case, we have defined an observation set of 30x3:

    import numpy as np
    obs = np.random.random(90).reshape(30, 3)
    obs
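The excerpt stops after this first step. As a hedged sketch of how the clustering might continue from the observation set defined above, the following uses the `whiten`, `kmeans`, and `vq` functions from `scipy.cluster.vq`. The choice of three clusters and the fixed random seed are assumptions made only for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

np.random.seed(0)  # assumption: fixed seed so the random data is repeatable
obs = np.random.random(90).reshape(30, 3)

# Rescale each column to unit variance, as SciPy recommends before k-means
whitened = whiten(obs)

k = 3  # assumption: the number of clusters is chosen arbitrarily here
centroids, distortion = kmeans(whitened, k)

# Assign every observation to its nearest centroid
labels, distances = vq(whitened, centroids)

print(labels)      # cluster index for each of the 30 observations
print(distortion)  # SciPy's distortion: mean (non-squared) distance to centroids
```

Note that the `distortion` value returned by `kmeans` is the mean Euclidean distance to the nearest centroid, not the sum of squared distances that defines J. To obtain a J-style value, one could sum `distances ** 2` from the `vq` call.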


