StreamKM++
Research Area:  Practical theory for clustering algorithms  
Status:  Finished  
Description:  
StreamKM++ is a new kmeans clustering algorithm for data streams. It computes a small weighted sample of the data stream and solves the kmeans problem on this sample using the kmeans++ algorithm by Arthur and Vassilvitskii. We use two new techniques. First, we use nonuniform sampling similar to the kmeans++ algorithm. This leads to a rather easy implementable algorithm and to a runtime which has only a low dependency on the dimensionality of the data. Second, we develop a new data structure called coreset tree in order to significantly speed up the time necessary for sampling nonuniformly during the coreset construction.
StreamKM++ was published in: Ackermann, Lammersen, Märtens, Raupach, Sohler, Swierkot: "StreamKM++: A Clustering Algorithm for Data Streams". In Proceedings of the 12th Workshop on Algorithm Engineering and Experiments (ALENEX 2010)
