Practical theory for clustering algorithms
Description
Prof. Dr. rer. nat. Johannes Blömer, Universität Paderborn
Prof. Dr. rer. nat. Christian Sohler, TU Dortmund
Project website
Clustering is the partition of a set of objects into subsets (cluster) such that objects inside one cluster are similar to each other while objects from different clusters are not. In order to compute such a partition it is required to define what is meant by similarity. E.g. if we have a map and choose the recorded cities as our set of objects, we can interpret a short distance between the cities as similarity. Then the clustering corresponds to a division of our map into urban areas. Clustering algorithms try to compute clusterings that are as good as possible. One application area is data reduction. For that purpose every cluster is replaced by a proper substitute.
In a research project aided by the DFG we concentrate on two aspects. On the one hand we analyse algorithms of widespread use in order to find out how good or bad they are in theory. On the other hand we try to design provably good algorithms. Particular similarity measures like asymmetric measures are of special interest throughout our examination. One example of an asymmetric measure is the Kullback-Leibler divergence that enables us to talk about the similarity of probability distributions. The figure shows the asymmetry of the KLD. The two colored curves have the same distance to the marked center, in one case measured from the curve to the center and in the other case the other way around.
Publications
- M. R. Ackermann, J. Blömer and C. Scholz. "Hardness and Non-Approximability of Bregman Clustering Problems", Electronic Colloquium on Computational Complexity, ECCC, Vol. 18. 2011, pp. 1-20. [More]
- M. R. Ackermann, J. Blömer, D. Kuntze and C. Sohler. "Analysis of Agglomerative Clustering". 28th International Symposium on Theoretical Aspects of Computer Science, STACS 2011. 2011. pp. 308-319. [More]
- M. R. Ackermann, J. Blömer and C. Sohler. "Clustering for Metric and Non-Metric Distance Measures", ACM Transactions on Algorithms, Vol. 6, August, 2010, pp. 59:1-26. [More]
- M. R. Ackermann. "Algorithms for the Bregman k-Median Problem". 2010. [More]
- M. Monemizadeh. "Non-uniform Sampling in Clustering and Streaming". TU Dortmund. 2010. [More]
- M. Monemizadeh and D. P. Woodruff. "1-Pass Relative-Error L_p-Sampling with Applications". SODA. 2010. pp. 1143-1160. [More]
- M. R. Ackermann and J. Blömer. "Bregman Clustering for Separable Instances". Proceedings of the 12th Scandinavian Symposium and Workshop on Algorithm Theory, SWAT 2010. 2010. pp. 212-223. [More]
- M. R. Ackermann, C. Lammersen, M. Märtens, C. Raupach, C. Sohler and K. Swierkot. "StreamKMtt ++: A Clustering Algorithm for Data Streams". Proceedings of the Workshop on Algorithm Engineering and Experiments, ALENEX 2010. 2010. pp. 175-187. [More]
- A. Krüger, V. Leutnant, R. Häb-Umbach, M. R. Ackermann and J. Blömer. "On the Initialisation of Dynamic Models for Speech Features". Sprachkommunikation 2010 -- Beiträge der 9. ITG-Fachtagung. 2010. pp. 25:1-4. [More]
- D. Feldman, M. Monemizadeh, C. Sohler and D. P. Woodruff. "Coresets and sketches for high dimensional subspace approximation problems". Proceedings of the 21th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010. 2010. pp. 630-649. [More]
- M. R. Ackermann and J. Blömer. "Coresets and Approximate Clustering for Bregman Divergences". SODA 2009. 2009. pp. 1088-1097. [More]
- D. Feldman, M. Monemizadeh and C. Sohler. "A PTAS for k-means clustering based on weak coresets". Proceedings of the 23th annual symposium on Computational geometry, SoCG 2007. 2007. pp. 11-18. [More]