PDF

Description

Hierarchical clustering is a common method used to determine clusters of similar data points in multi-dimensional spaces. O(n^2) algorithms, where n is the number of points to cluster, have long been known for this problem. This paper discusses parallel algorithms to perform hierarchical clustering using various distance metrics. I describe O(n) time algorithms for clustering using the single link, average link, complete link, centroid, median, and minimum variance metrics on an n node CRCW PRAM and O(n log n) algorithms for these metrics (except average link and complete link) on n\log n node butterfly networks or trees. Thus, optimal efficiency is achieved for a significant number of processors using these distance metrics. A general algorithm is given that can be used to perform clustering with the complete link and average link metrics on a butterfly. While this algorithm achieves optimal efficiency for the general class of metrics, it is not optimal for the specific cases of complete link and average link clustering.

Details

Files

Statistics

from
to
Export
Download Full History