Data Science Foundations by Fionn Murtagh


Author: Fionn Murtagh
Publisher: CRC Press


5.10 Linear Time and Direct Reading Hierarchical Clustering

5.10.1 Linear Time, or O(N) Computational Complexity, Hierarchical Clustering

A point of departure for our work has been the computational objective of bypassing computationally demanding hierarchical clustering methods (typically quadratic time, or O(N²), for N input observation vectors), while also providing a framework of great practical importance across application domains.

Agglomerative hierarchical clustering algorithms are based on pairwise distances (or dissimilarities), implying computational time that is O(N²), where N is the number of observations. For most agglomerative criteria, the implementation that achieves this bound is the nearest-neighbour chain algorithm, together with reciprocal nearest neighbours, which furnishes inversion-free hierarchies whenever the cluster criterion satisfies Bruynooghe's reducibility property (see [167]).
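The nearest-neighbour chain idea can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the book's implementation: it grows a chain of nearest neighbours until a reciprocal nearest-neighbour pair is found, merges that pair, and updates dissimilarities with the Lance-Williams formula for single linkage (which satisfies the reducibility property, so no inversions occur).

```python
import numpy as np

def nn_chain_single_linkage(D):
    """Agglomerative clustering via the nearest-neighbour chain.

    D: symmetric (N, N) dissimilarity matrix.
    Returns a list of merges [(i, j, height), ...], where j then
    represents the merged cluster. Single linkage satisfies
    Bruynooghe's reducibility property, so the resulting
    hierarchy is inversion-free.
    """
    D = D.astype(float).copy()
    active = set(range(D.shape[0]))
    merges, chain = [], []
    while len(active) > 1:
        if not chain:
            chain.append(next(iter(active)))  # start a new chain anywhere
        while True:
            i = chain[-1]
            # nearest active neighbour of the chain's tip
            j = min((k for k in active if k != i), key=lambda k: D[i, k])
            if len(chain) > 1 and j == chain[-2]:
                break  # reciprocal nearest neighbours found
            chain.append(j)
        i = chain.pop()
        j = chain.pop()
        merges.append((min(i, j), max(i, j), D[i, j]))
        # Lance-Williams update for single linkage:
        # d(k, i ∪ j) = min(d(k, i), d(k, j))
        for k in active:
            if k != i and k != j:
                D[j, k] = D[k, j] = min(D[i, k], D[j, k])
        active.remove(i)  # cluster j now stands for the merged cluster
        # reducibility guarantees the remaining chain is still valid
    return merges
```

Each observation enters and leaves the chain a bounded number of times, which is what yields the O(N²) total cost rather than the O(N³) of a naive closest-pair search at every merge.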

This quadratic time requirement is a worst-case performance result. It is also most often the average time, since the pairwise agglomerative algorithm is applied directly to the data without any preprocessing speed-ups (such as preprocessing that facilitates fast nearest-neighbour finding). An example of a linear average-time algorithm for (worst-case quadratic-time) agglomerative hierarchical clustering is given in [162].

With the Baire-based hierarchical clustering algorithm, we have an algorithm for linear-time, worst-case hierarchical clustering. It can be characterized as a divisive rather than an agglomerative algorithm.
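The "direct reading" character of this divisive approach can be sketched as follows. This is a hypothetical illustration under simplifying assumptions (values scaled to [0, 1), base-10 digits): at level p, two values share a cluster exactly when their first p decimal digits agree, i.e. when their Baire (longest-common-prefix) distance is small enough. One pass over the data per digit level gives linear time in N for a fixed number of levels.

```python
from collections import defaultdict

def baire_hierarchy(values, precision=3):
    """Divisive hierarchical clustering by direct reading of digits.

    values: sequence of floats assumed scaled to [0, 1).
    Returns one dict per digit level p = 1..precision, mapping each
    p-digit prefix to the indices of the values sharing that prefix.
    Cost is O(N * precision), i.e. linear in N for fixed precision.
    """
    levels = []
    for p in range(1, precision + 1):
        clusters = defaultdict(list)
        for idx, v in enumerate(values):
            # the first p decimal digits act as the cluster key;
            # "0.12345" -> "1", "12", "123", ... as p grows
            key = f"{v:.{precision}f}"[2:2 + p]
            clusters[key].append(idx)
        levels.append(dict(clusters))
    return levels
```

Because each level only refines the partition of the previous one (prefixes nest), the nested partitions form a hierarchy without any pairwise distance computation at all.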


