Slice_OP: Selecting Initial Cluster Centers Using Observation Points

Publication Date: 2018-12-29

This paper proposes a new algorithm, Slice_OP, for selecting the initial cluster centers of the k-means algorithm on high-dimensional data. A set of observation points is allocated to transform the high-dimensional data into one-dimensional distance data. Multiple Gamma models are built on the distance data and fitted with the expectation-maximization (EM) algorithm, and the best-fitted model is selected with the second-order Akaike information criterion (AICc). Candidate initial centers are estimated from the objects in each component of the best-fitted model. A cluster tree is then built from the distance matrix of the candidate initial centers and divided into K branches, and the objects in each branch are analyzed with the k-nearest neighbor algorithm to select the initial cluster centers. Experimental results on synthetic and real-world datasets show that Slice_OP outperformed both the state-of-the-art Kmeans++ algorithm and random center initialization for the k-means algorithm.
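The first two stages of the pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: Euclidean distance is used for the transformation, the observation point is supplied by the caller, the Gamma fits are done elsewhere and enter only as log-likelihood values, and each fitted model is treated as a Gamma mixture with a shape and scale per component plus free mixing weights. The function names (`distances_to_observation_point`, `aicc`, `n_params_gamma_mixture`, `select_best_model`) are hypothetical.

```python
import math


def distances_to_observation_point(data, observation_point):
    """Map each object to its distance from one observation point.

    Assumption: Euclidean distance; the abstract does not name the metric.
    """
    return [math.dist(x, observation_point) for x in data]


def aicc(log_likelihood, n_params, n_samples):
    """Second-order Akaike information criterion (AICc); lower is better."""
    if n_samples <= n_params + 1:
        raise ValueError("AICc requires n_samples > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (n_samples - n_params - 1)
    return aic + correction


def n_params_gamma_mixture(n_components):
    """Free parameters of a Gamma mixture (assumed parameterization).

    Two parameters (shape, scale) per component plus n_components - 1
    independent mixing weights.
    """
    return 3 * n_components - 1


def select_best_model(candidates, n_samples):
    """Return the component count with the lowest AICc.

    candidates: list of (n_components, log_likelihood) pairs, assumed to
    come from EM fits performed elsewhere.
    """
    best = min(
        candidates,
        key=lambda c: aicc(c[1], n_params_gamma_mixture(c[0]), n_samples),
    )
    return best[0]


if __name__ == "__main__":
    # Toy 3-dimensional objects and an observation point at the origin.
    data = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0)]
    print(distances_to_observation_point(data, (0.0, 0.0, 0.0)))

    # Illustrative log-likelihoods only; these are not results from the paper.
    fits = [(1, -520.0), (2, -480.0), (3, -479.5)]
    print(select_best_model(fits, n_samples=200))
```

In this sketch the three-component fit gains too little log-likelihood to offset its extra parameters, so the two-component model is selected. The later stages (building the cluster tree, cutting it into K branches, and the k-nearest neighbor analysis) are not covered here.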

