Estimating Generalized Dunn's Cluster Validity Indices for Big Data
- Resource Type
- Conference
- Authors
- Rathore, Punit; Ghafoori, Zahra; Bezdek, James C.; Palaniswami, Marimuthu; Leckie, Christopher
- Source
- 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 656-661, Oct. 2018
- Subject
- Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Clustering algorithms
Approximation algorithms
Partitioning algorithms
Big Data
Indexes
Couplings
Measurement
- ISSN
- 2577-1655
Dunn's internal cluster validity index and its generalizations assess the quality of a partition. For partitions of n samples of p-dimensional feature vectors, all but two of the generalized Dunn's indices (GDIs) have quadratic time complexity, O(pn^2), so computing them is untenable for very large n. In this paper, we present two methods for approximating GDIs based on Maximin (MM) sampling. MM sampling identifies a skeleton of the full partition that usually contains some of the boundary points of each cluster, and these skeleton points are used to compute the GDIs. We compare our algorithms with a support vector machine (SVM)-based boundary extraction method and a random-sampling-based estimation method. Our experiments on four real and synthetic datasets show that approximating three GDIs with the MM skeleton is both computationally tractable and reliably accurate.
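The abstract does not spell out the two MM-based estimators, so the following is only a minimal sketch under stated assumptions. It computes Dunn's original index (smallest Euclidean distance between points of different clusters divided by the largest within-cluster diameter) exactly in O(pn^2). It then estimates the same index on a plain maximin (farthest-first) sample of m points, which costs O(pnm) for sampling. The function names `dunn_index`, `maximin_sample`, and `estimate_dunn_mm`, and the choice to evaluate the index directly on the sampled points with their original labels, are illustrative assumptions, not the authors' algorithms.

```python
import numpy as np


def dunn_index(X, labels):
    """Dunn's original index: min between-cluster point distance divided by
    max cluster diameter (Euclidean). Builds the full n x n distance matrix,
    so time and memory are O(p n^2). Assumes at least two clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    max_diameter = D[same].max()      # largest within-cluster distance
    min_separation = D[~same].min()   # smallest between-cluster distance
    return min_separation / max_diameter


def maximin_sample(X, m, first=0):
    """Hypothetical helper: choose m indices by maximin sampling. Each new
    point maximizes its distance to the nearest already-chosen point."""
    X = np.asarray(X, dtype=float)
    m = min(m, len(X))  # cannot pick more distinct points than exist
    chosen = [first]
    # Distance from every point to its nearest chosen point so far.
    nearest = np.linalg.norm(X - X[first], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)


def estimate_dunn_mm(X, labels, m, first=0):
    """Hypothetical estimator: Dunn's index computed only on the MM sample.
    The sample must contain points from at least two clusters."""
    idx = maximin_sample(X, m, first)
    return dunn_index(np.asarray(X)[idx], np.asarray(labels)[idx])


if __name__ == "__main__":
    X = [[0, 0], [1, 0], [10, 0], [11, 0]]
    y = [0, 0, 1, 1]
    print(dunn_index(X, y))           # exact index on all points
    print(estimate_dunn_mm(X, y, 3))  # estimate from a 3-point MM sample
```

On this toy data the exact index is 9.0, while the 3-point sample {0, 3, 1} misses the boundary point at x = 10 and yields 10.0. This illustrates why the quality of the estimate depends on whether the skeleton captures each cluster's boundary points.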