Practical theory for clustering algorithms
Description
Prof. Dr. rer. nat. Johannes Blömer, Universität Paderborn
Prof. Dr. rer. nat. Christian Sohler, TU Dortmund
Project website
Clustering is the partition of a set of objects into subsets (cluster) such that objects inside one cluster are similar to each other while objects from different clusters are not. In order to compute such a partition it is required to define what is meant by similarity. E.g. if we have a map and choose the recorded cities as our set of objects, we can interpret a short distance between the cities as similarity. Then the clustering corresponds to a division of our map into urban areas. Clustering algorithms try to compute clusterings that are as good as possible. One application area is data reduction. For that purpose every cluster is replaced by a proper substitute.
In a research project aided by the DFG we concentrate on two aspects. On the one hand we analyse algorithms of widespread use in order to find out how good or bad they are in theory. On the other hand we try to design provably good algorithms. Particular similarity measures like asymmetric measures are of special interest throughout our examination. One example of an asymmetric measure is the Kullback-Leibler divergence that enables us to talk about the similarity of probability distributions. The figure shows the asymmetry of the KLD. The two colored curves have the same distance to the marked center, in one case measured from the curve to the center and in the other case the other way around.
Publications
- M. R. Ackermann, J. Blömer and C. Scholz. "Hardness and Non-Approximability of Bregman Clustering Problems", Electronic Colloquium on Computational Complexity, ECCC, Vol. 18. 2011, pp. 1-20. [More]
- M. R. Ackermann, J. Blömer, D. Kuntze and C. Sohler. "Analysis of Agglomerative Clustering". 28th International Symposium on Theoretical Aspects of Computer Science, STACS 2011. 2011. pp. 308-319. [More]
- M. R. Ackermann, J. Blömer and C. Sohler. "Clustering for Metric and Non-Metric Distance Measures", ACM Transactions on Algorithms, Vol. 6, August, 2010, pp. 59:1-26. [More]
- M. R. Ackermann. "Algorithms for the Bregman k-Median Problem". 2010. [More]
- M. Monemizadeh. "Non-uniform Sampling in Clustering and Streaming". TU Dortmund. 2010. [More]