Spherical k-Means Clustering

Kurt Hornik; Ingo Feinerer; Martin Kober; Christian Buchta

doi:10.18637/jss.v050.i10


Abstract

Clustering text documents is a fundamental task in modern data analysis, requiring approaches that perform well in terms of both solution quality and computational efficiency. Spherical k-means clustering is one approach that addresses both issues, employing cosine dissimilarities to perform prototype-based partitioning of term weight representations of the documents.
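To make the two ingredients named above concrete, here is a minimal NumPy sketch (not code from the paper or from skmeans). It assumes the common definition of cosine dissimilarity, d(x, p) = 1 - cos(x, p), with documents given as rows of a term weight matrix. For a fixed cluster, the unit-length prototype minimizing the summed dissimilarity is the normalized sum of the normalized member vectors, because the sum of 1 - x_i·p equals n - p·s, which is smallest when p points along s. The helper names are hypothetical.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row of X to unit Euclidean length (all-zero rows stay zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    return X / norms

def cosine_dissimilarity(X, P):
    """Matrix of 1 - cos(x_i, p_j) between the rows of X and the rows of P."""
    return 1.0 - normalize_rows(X) @ normalize_rows(P).T

def cluster_prototype(X):
    """Unit-length prototype minimizing the summed cosine dissimilarity of one cluster.

    Assumes the normalized rows of X do not sum to the zero vector.
    """
    s = normalize_rows(X).sum(axis=0)
    return s / np.linalg.norm(s)
```

For example, two orthogonal documents such as [1, 0] and [0, 2] have dissimilarity 1, and their prototype is [1, 1]/sqrt(2). Because only directions matter, the document in the second row counts the same as [0, 1].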

This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package skmeans, which provides a computational environment for spherical k-means clustering featuring several solvers: a fixed-point algorithm, a genetic algorithm, and interfaces to two external solvers (CLUTO and Gmeans). The performance of these solvers is investigated by means of a large-scale benchmark experiment.
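The fixed-point solver mentioned above can be pictured as a Lloyd-style alternation. The following minimal NumPy sketch is not the skmeans implementation. It assumes the dissimilarity 1 - cos(x, p), unit-length prototypes, and initialization from k randomly chosen documents, and it stops once the partition no longer changes. The function name `spherical_kmeans`, the random initialization, and the empty-cluster handling are all assumptions made for illustration.

```python
import numpy as np

def spherical_kmeans(X, k, max_iter=100, seed=0):
    """Alternate assignment and prototype updates until the partition stops changing."""
    rng = np.random.default_rng(seed)
    # Project documents onto the unit sphere so dot products equal cosines.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    # Assumed initialization: k distinct randomly chosen documents as prototypes.
    P = Xn[rng.choice(len(Xn), size=k, replace=False)].copy()
    labels = np.full(len(Xn), -1)
    for _ in range(max_iter):
        # Assignment step: minimizing 1 - cos means maximizing the dot product.
        new_labels = np.argmax(Xn @ P.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # fixed point reached: the partition did not change
        labels = new_labels
        # Prototype step: normalized sum of each cluster's member vectors.
        for j in range(k):
            members = Xn[labels == j]
            if len(members) == 0:
                continue  # keep the old prototype for an empty cluster
            s = members.sum(axis=0)
            n = np.linalg.norm(s)
            if n > 0:
                P[j] = s / n
    # Criterion value: total cosine dissimilarity of documents to their prototypes.
    objective = float(np.sum(1.0 - np.sum(Xn * P[labels], axis=1)))
    return labels, P, objective
```

Each step cannot increase the total dissimilarity, so the iteration settles at a local optimum. Like ordinary k-means, the result depends on the starting prototypes, which is one reason to compare different solvers.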

Files: Paper; R package (skmeans); Replication materials (code for examples/simulations and data); R package (tm.corpus.Oz.Books)

Published: Sep 18, 2012

