Projects

  • StreamKM++

     

    StreamKM++ is a new k-means clustering algorithm for data streams. It computes a small weighted sample of the data stream and solves the k-means problem on this sample using the k-means++ algorithm by Arthur and Vassilvitskii. We use two new techniques. First, we use non-uniform sampling similar to the k-means++ seeding step. This leads to an easily implementable algorithm whose running time depends only weakly on the dimensionality of the data; a generic sketch of such seeding is given below. Second, we develop a new data structure called a coreset tree in order to significantly speed up the non-uniform sampling during the coreset construction.
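
    To illustrate the non-uniform sampling, the following is a minimal, generic sketch of k-means++ style seeding on a weighted point set (plain C++; all names are illustrative and this is not the StreamKM++ code, which additionally uses the coreset tree to speed up this sampling):

        #include <algorithm>
        #include <cstddef>
        #include <limits>
        #include <random>
        #include <vector>

        struct WeightedPoint {
            std::vector<double> coords;
            double weight;
        };

        // Squared Euclidean distance between two points.
        static double sqDist(const WeightedPoint& a, const WeightedPoint& b) {
            double d = 0.0;
            for (std::size_t i = 0; i < a.coords.size(); ++i) {
                const double diff = a.coords[i] - b.coords[i];
                d += diff * diff;
            }
            return d;
        }

        // k-means++ style seeding: the first center is drawn proportional to the
        // point weights, every further center proportional to
        // weight * (squared distance to the nearest center chosen so far).
        std::vector<std::size_t> kmeansppSeed(const std::vector<WeightedPoint>& pts,
                                              std::size_t k, std::mt19937& rng) {
            std::vector<std::size_t> centers;
            std::vector<double> minDist(pts.size(), std::numeric_limits<double>::max());
            std::vector<double> prob;
            for (const WeightedPoint& p : pts) prob.push_back(p.weight);

            for (std::size_t c = 0; c < k && c < pts.size(); ++c) {
                std::discrete_distribution<std::size_t> pick(prob.begin(), prob.end());
                const std::size_t chosen = pick(rng);
                centers.push_back(chosen);
                // Update distances to the nearest center and the sampling weights.
                for (std::size_t i = 0; i < pts.size(); ++i) {
                    minDist[i] = std::min(minDist[i], sqDist(pts[i], pts[chosen]));
                    prob[i] = pts[i].weight * minDist[i];
                }
            }
            return centers;
        }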

    We compared our algorithm experimentally with two well-known streaming implementations (BIRCH and StreamLS). In terms of quality (sum of squared errors), our algorithm is comparable to StreamLS and significantly better than BIRCH (up to a factor of 2). In terms of running time, our algorithm is slower than BIRCH. Comparing the running time with StreamLS, it turns out that our algorithm scales much better with an increasing number of centers. We also gave a theoretical justification that our sample set is small in low-dimensional spaces.

     

    StreamKM++ was published in: Ackermann, Lammersen, Märtens, Raupach, Sohler, Swierkot: "StreamKM++: A Clustering Algorithm for Data Streams". In Proceedings of the 12th Workshop on Algorithm Engineering and Experiments (ALENEX 2010)


    Note that our algorithm has been integrated as a package into MOA, a framework for data stream mining.

    For more information on StreamKM++, see also the StreamKM++ Website.

  • SCIL

     

    The SCIL (Symbolic Constraints in Integer Linear Programming) software package facilitates and accelerates the implementation of exact solution algorithms for hard optimization problems. It has been applied successfully in many different areas. SCIL is based on an efficient branch-and-cut algorithm. However, in contrast to other software, it allows the user to specify symbolic constraints instead of linear constraints. Symbolic constraints play an important role in Constraint Programming and make SCIL a user-friendly tool, allowing even non-experts to easily access powerful state-of-the-art optimization techniques; an illustration of the idea is given below.
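
    To illustrate the idea (this is not SCIL's actual syntax), a single symbolic constraint can stand for an exponentially large family of linear constraints that the branch-and-cut algorithm separates on demand. For the travelling salesman problem, for example, a symbolic "tour" constraint on binary edge variables $x_e$ represents the degree equations together with all subtour elimination constraints:

        \[
            \sum_{e \in \delta(v)} x_e = 2 \quad \text{for all } v \in V,
            \qquad
            \sum_{e \in \delta(S)} x_e \ge 2 \quad \text{for all } \emptyset \neq S \subsetneq V .
        \]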

  • Generic Dantzig-Wolfe Decomposition

     

    Many recent successful approaches to solving structured mathematical programs build explicitly or implicitly on Dantzig-Wolfe reformulation (DWR). "Structured" in this context refers to the property that a complex optimization problem may contain (much) easier to solve, rather local subproblems which are linked by rather global constraints. In order to allow non-experts, too, to benefit from the advantages of the method, and to give experts a platform for standardized computational experiments, we develop a mixed integer programming solver which is based on DWR and which automatically detects decomposable structures in mixed integer programs; a sketch of the reformulation is given below.
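
    For reference, a brief textbook-style sketch of the reformulation (stated for a bounded subproblem for simplicity): given a program with "complicating" constraints $Ax \ge b$ and an easier subproblem $X = \{x \ge 0 : Dx \ge d\}$ (possibly with integrality requirements), every point of $\operatorname{conv}(X)$ is a convex combination of its extreme points $x^1, \dots, x^q$, which yields the master problem

        \[
            \min \sum_{j=1}^{q} (c^\top x^j)\,\lambda_j
            \quad \text{s.t.} \quad
            \sum_{j=1}^{q} (A x^j)\,\lambda_j \ge b,
            \qquad \sum_{j=1}^{q} \lambda_j = 1,
            \qquad \lambda \ge 0 .
        \]

    Its exponentially many columns are generated on demand by a pricing problem over $X$ (column generation), which is exactly where the easier subproblem structure pays off; integrality is then enforced by branching, e.g. on the original variables as mentioned below.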


    Our implementation is based on the branch-cut-and-price framework SCIP and in fact turns it into a branch-cut-and-price solver. It detects matrix structures (block-diagonal, bordered block-diagonal, arrowhead) and performs a full DWR without any user interaction. The package inherits SCIP's state-of-the-art MIP solving machinery and complements it with several features particular to column generation and branch-and-price, among them tailored branching rules (e.g., Ryan/Foster), branching and cutting on original variables, etc.

     

    GCG 1.0.0 was released on August 1 and is available for download here.

  • LinTim

     

    This project makes extensive use of the software toolbox LinTim, a collection of real public transportation data and algorithms for planning lines, timetables and schedules, and for delay management. For more information, please visit the LinTim homepage.

  • ExactIP

     

    Our objective is to design a framework that can exactly solve MIP instances given by rational data. For a feasible instance, a true optimum solution will be computed; for an infeasible instance, a provably correct infeasibility certificate will be issued. Based on the software SCIP (Solving Constraint Integer Programs), we want to provide a fully fledged production code that includes all state-of-the-art components of modern MIP solvers in order to achieve reasonable running times.
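
    A tiny, self-contained illustration of why exact rational arithmetic matters (not part of ExactIP or SCIP): the constraint x1 + x2 + x3 <= 3/10 is satisfied exactly at the rational point x1 = x2 = x3 = 1/10, yet a naive floating-point check rejects it, because 0.1 + 0.1 + 0.1 is slightly larger than 0.3 in IEEE doubles.

        #include <iostream>

        // A minimal exact rational type, for illustration only; a production code
        // would use an arbitrary-precision library such as GMP.
        struct Rational {
            long long num, den;   // invariant assumed here: den > 0, values small
        };

        static Rational add(Rational a, Rational b) {
            return Rational{a.num * b.den + b.num * a.den, a.den * b.den};
        }

        // a <= b  <=>  a.num * b.den <= b.num * a.den   (denominators positive)
        static bool leq(Rational a, Rational b) {
            return a.num * b.den <= b.num * a.den;
        }

        int main() {
            const double xd = 0.1;
            std::cout << "double check:   " << ((xd + xd + xd) <= 0.3) << "\n";       // prints 0

            const Rational x{1, 10}, rhs{3, 10};
            std::cout << "rational check: " << leq(add(add(x, x), x), rhs) << "\n";   // prints 1
            return 0;
        }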

  • Diffusion-based Partitioning (DibaP)

     



    DibaP is a tool for graph partitioning and repartitioning. While its MPI-parallel version focuses on repartitioning, the sequential and thread-parallel versions can also partition graphs from scratch. With these implementations of our algorithms, we were able to improve the best known edge-cut results for a number of popular benchmark graphs.
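
    As background, diffusion-based partitioning builds on the observation that load spread by a diffusion process equalizes quickly within densely connected regions of a graph. The following is a generic first-order diffusion step on an unweighted graph (an illustrative sketch only, not DibaP's actual implementation):

        #include <cstddef>
        #include <vector>

        // One step of a first-order diffusion scheme: every vertex exchanges a
        // fraction alpha of the load difference with each of its neighbours.
        // Iterating this smooths the load towards the average; how fast the load
        // spreads reflects the graph structure, which the partitioner exploits.
        std::vector<double> diffusionStep(const std::vector<std::vector<std::size_t>>& adj,
                                          const std::vector<double>& load,
                                          double alpha) {
            std::vector<double> next(load);
            for (std::size_t v = 0; v < adj.size(); ++v)
                for (std::size_t u : adj[v])
                    next[v] += alpha * (load[u] - load[v]);
            return next;
        }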

  • Exact computation of the star discrepancy of a pointset

     

    Programs for the exact computation of the star discrepancy of a point set: discr_calc.tar.gz. This includes an implementation of the DEM algorithm. Faster, but inexact, algorithms for this problem are forthcoming.
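
    For reference, the quantity being computed: for a point set $P = \{p_1, \dots, p_n\} \subset [0,1]^d$, the star discrepancy measures the largest deviation between the empirical measure of $P$ and the volume, taken over all axis-parallel boxes anchored at the origin,

        \[
            D^*(P) = \sup_{x \in [0,1]^d}
            \left| \frac{\lvert P \cap [0, x) \rvert}{n} - \prod_{i=1}^{d} x_i \right| .
        \]

    Computing this quantity exactly becomes hard as the dimension grows, which is why faster but inexact algorithms are of interest.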

    For inquiries about the library for randomized and derandomized rounding, please contact Magnus Wahlström.

  • A Generator for Dynamic Clustered Random Graphs

     

    This section offers supplementary information on our random generator for graphs which are both dynamic and have an implanted clustering. A full description including data structures, usage and background can be found in our technical report.

     

    Download

    Downloading and using our generator is free. Please use this link to obtain the command-line version of the generator in this jar-file: dcrgenerator.jar

    The generator is then run by calling java -jar dcrgenerator.jar. Although all command-line parameters come with reasonable default values, you will have to look them up in our documentation if you intend to use the generator. This document is also listed in the electronic archive of KIT: technical report.

     

    License: GPL

    The version of our generator which is to be included in visone will be available as soon as visone fully incorporates dynamics in a future release.

  • Dynamic arXiv Collaboration Graphs

     

    Since 1992, the arXiv.org e-Print archive has been a popular repository for scientific e-prints, stored in several categories alongside timestamped metadata. Our Collecting Spider, see below, can be used to collect data from the arXiv.org e-Print archive. The Scheduler, see below, extracts networks of collaboration between scientists based on coauthorship. For each e-print it adds equally weighted clique edges among the contributors such that each author gains a total edge weight of 1.0 per e-print contributed to; a sketch of this weighting is given below. It lets e-prints time out after two years and removes disconnected authors.
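
    A sketch of the edge weighting just described (illustrative code, not the actual Scheduler): for an e-print with k coauthors, each of the k(k-1)/2 clique edges receives weight 1/(k-1), so every author, being incident to k-1 of these edges, gains a total weight of exactly 1.0 for that e-print.

        #include <algorithm>
        #include <cstddef>
        #include <map>
        #include <string>
        #include <utility>
        #include <vector>

        // Accumulated edge weights of the collaboration graph, keyed by author pair.
        using EdgeWeights = std::map<std::pair<std::string, std::string>, double>;

        // Add the clique edges of one e-print: every pair of coauthors gets weight
        // 1/(k-1), so each author gains (k-1) * 1/(k-1) = 1.0 per e-print.
        void addEPrint(EdgeWeights& graph, const std::vector<std::string>& authors) {
            const std::size_t k = authors.size();
            if (k < 2) return;   // single-author e-prints contribute no edges
            const double w = 1.0 / static_cast<double>(k - 1);
            for (std::size_t i = 0; i < k; ++i) {
                for (std::size_t j = i + 1; j < k; ++j) {
                    const auto key = std::minmax(authors[i], authors[j]);
                    graph[{key.first, key.second}] += w;
                }
            }
        }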

     

    The Collecting Spider

    The arXiv API provides an interface to the arXiv database, returning query results in the format of an Atom XML feed. Our arXiv spider is a simple Python program which automates querying the database and parsing the results.

    Download: arxivspider.zip

    License: GPL

     

    The Scheduler

    Our tools for extracting a stream of graph events from arXiv dumps are available here. Note that outputting intermediate graphs (which can be enabled in the source code) requires the yFiles graph libraries; otherwise the corresponding functions need to be removed from the source files.

    Download: scheduler.zip

    License: GPL

  • SAT related software

     

    Our SAT-related software can basically be classified into two categories. We implement different SAT solvers to evaluate SAT-solving heuristics in single-threaded and multi-threaded environments. Moreover, we develop tools to analyse and visualise SAT instances and the solving process of state-of-the-art SAT solvers.

  • Simultaneous Graph Drawing Tool (SGDT)

     

    SGDT is an interactive user-friendly tool for displaying simultaneous drawings of a set of up to 32 graphs in 2.5D style.

  • Open Graph Drawing Framework (OGDF)

     

    OGDF is a self-contained C++ class library for the automatic layout of diagrams. OGDF offers sophisticated algorithms and data structures to use within your own applications or scientific projects. The library is available under the GNU General Public License.
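
    A minimal usage sketch, modelled on the typical OGDF examples (exact headers, attribute flags and iteration style may differ between OGDF versions):

        #include <ogdf/basic/Graph.h>
        #include <ogdf/basic/GraphAttributes.h>
        #include <ogdf/layered/SugiyamaLayout.h>
        #include <iostream>

        int main() {
            ogdf::Graph G;
            ogdf::GraphAttributes GA(G,
                ogdf::GraphAttributes::nodeGraphics | ogdf::GraphAttributes::edgeGraphics);

            // Build a small triangle graph.
            ogdf::node a = G.newNode(), b = G.newNode(), c = G.newNode();
            G.newEdge(a, b); G.newEdge(b, c); G.newEdge(a, c);

            // Compute a layered (Sugiyama-style) layout and print the coordinates.
            ogdf::SugiyamaLayout layout;
            layout.call(GA);
            for (ogdf::node v : G.nodes)
                std::cout << "node at (" << GA.x(v) << ", " << GA.y(v) << ")\n";
            return 0;
        }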

  • CluE

     

    CluE Website


    CluELab - Clustering Environment


    CluE is a template-based C++ library for clustering arbitrary data. All of CluE's algorithms can be instantiated for any data type. The algorithms only need implementations of so-called attachments for a new data type, for example to compute the distance between two objects. The library is still in an alpha stage, but so far there are implementations of two agglomerative algorithms, the famous Lloyd's algorithm, and two coreset algorithms.
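
    For readers unfamiliar with it, one iteration of Lloyd's algorithm alternates between assigning every point to its nearest center and moving each center to the mean of its assigned points. A generic sketch for d-dimensional points (not CluE's interface, whose algorithms are parameterized via the attachments mentioned above):

        #include <cstddef>
        #include <vector>

        using Point = std::vector<double>;

        static double sqDist(const Point& a, const Point& b) {
            double d = 0.0;
            for (std::size_t i = 0; i < a.size(); ++i) {
                const double diff = a[i] - b[i];
                d += diff * diff;
            }
            return d;
        }

        // One Lloyd iteration: assign each point to its nearest center, then move
        // every center to the mean of the points assigned to it.
        void lloydStep(const std::vector<Point>& points, std::vector<Point>& centers) {
            const std::size_t k = centers.size(), d = points.front().size();
            std::vector<Point> sums(k, Point(d, 0.0));
            std::vector<std::size_t> counts(k, 0);

            for (const Point& p : points) {
                std::size_t best = 0;
                for (std::size_t c = 1; c < k; ++c)
                    if (sqDist(p, centers[c]) < sqDist(p, centers[best])) best = c;
                for (std::size_t i = 0; i < d; ++i) sums[best][i] += p[i];
                ++counts[best];
            }
            for (std::size_t c = 0; c < k; ++c)
                if (counts[c] > 0)
                    for (std::size_t i = 0; i < d; ++i)
                        centers[c][i] = sums[c][i] / counts[c];
        }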

     

    CluELab - Clustering Environment Laboratory


    The CluELab application is based on the CluE library. With CluELab one can easily analyse the clusterings computed by CluE's algorithms. So far it supports a data type for d-dimensional points and is able to visualize 2-dimensional points.

  • SeqAn

     

    SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data. Our library applies a unique generic design that guarantees high performance, generality, extensibility, and integration with other libraries. SeqAn is easy to use and simplifies the development of new software tools with a minimal loss of performance.

    The library provides basic data structures such as various efficient string implementations and graphs, as well as basic algorithms commonly used in bioinformatics such as DP alignment algorithms, online string search algorithms and graph algorithms. Furthermore, it provides advanced data structures such as various indices, algorithms on these data structures, and advanced algorithms like string filters and heuristic sequence alignment and realignment algorithms.
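
    A small usage sketch along the lines of the SeqAn tutorials, computing a global DP alignment of two DNA sequences (details such as the scoring-scheme constructor may differ between SeqAn versions):

        #include <iostream>
        #include <seqan/align.h>
        #include <seqan/sequence.h>

        int main() {
            seqan::Dna5String seq1 = "ACGTACGT";
            seqan::Dna5String seq2 = "ACGGACT";

            // Set up a pairwise alignment object over the two sequences.
            seqan::Align<seqan::Dna5String, seqan::ArrayGaps> align;
            seqan::resize(seqan::rows(align), 2);
            seqan::assignSource(seqan::row(align, 0), seq1);
            seqan::assignSource(seqan::row(align, 1), seq2);

            // Global alignment with match score 1, mismatch and gap penalties -1.
            const int score = seqan::globalAlignment(
                align, seqan::Score<int, seqan::Simple>(1, -1, -1));
            std::cout << "score: " << score << "\n" << align << "\n";
            return 0;
        }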

  • The Multi-Core Standard Template Library (MCSTL)

     

    The Multi-Core Standard Template Library (MCSTL) is a parallel implementation of the standard C++ library. It makes use of multiple processors and/or multiple cores of a processor with shared memory. It blends in transparently, so that in principle no change to the program itself is necessary. Applications that make use of STL algorithms like sort, random_shuffle, partial_sum or for_each benefit from the improved speed due to parallelism; see the sketch below. MCSTL cooperates particularly well with the STXXL, the STL library for huge data sets in external memory.
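
    For illustration, the code below is ordinary sequential STL code. With the MCSTL (respectively the libstdc++ parallel mode that grew out of it, enabled for example via -D_GLIBCXX_PARALLEL together with OpenMP) the very same calls are executed by multiple threads, without any change to the source:

        #include <algorithm>
        #include <numeric>
        #include <random>
        #include <vector>

        int main() {
            // Fill a large vector with random digits.
            std::vector<int> data(50'000'000);
            std::mt19937 rng(42);
            std::uniform_int_distribution<int> digit(0, 9);
            std::generate(data.begin(), data.end(), [&] { return digit(rng); });

            // Plain STL calls; they run in parallel when the parallel mode is active.
            std::sort(data.begin(), data.end());
            std::vector<int> prefix(data.size());
            std::partial_sum(data.begin(), data.end(), prefix.begin());
            return 0;
        }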

  • Standard Template Library for Extra Large Data Sets (STXXL)

     


    The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the compatibility with the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

     

    The following are some key features of STXXL:
    · Transparent support of parallel disks. The library provides implementations of basic parallel disk algorithms. STXXL is the only external memory algorithm library supporting parallel disks.
    · The library is able to handle problems of very large size (up to dozens of terabytes).
    · Improved utilization of computer resources. STXXL supports explicit overlapping of I/O and computation. STXXL implementations of external memory algorithms and data structures benefit from this overlapping.
    · Small constant factors in I/O volume. A unique library feature, "pipelining", can save more than half the number of I/Os performed by many applications.
    · Shorter development times due to well-known STL-compatible interfaces for external memory algorithms and data structures. STL algorithms can be directly applied to STXXL containers; moreover, the I/O complexity of the algorithms remains optimal in most cases.
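
    A minimal usage sketch, following the classic STXXL examples (the vector generator and the sorting comparator with its min_value/max_value sentinels follow the traditional STXXL conventions, which may differ between versions):

        #include <cstdlib>
        #include <limits>
        #include <stxxl/sort>
        #include <stxxl/vector>

        // Comparator for stxxl::sort; STXXL additionally requires sentinel values.
        struct CmpIntLess {
            bool operator()(const int& a, const int& b) const { return a < b; }
            int min_value() const { return std::numeric_limits<int>::min(); }
            int max_value() const { return std::numeric_limits<int>::max(); }
        };

        int main() {
            // An external-memory vector: blocks are paged to disk automatically.
            stxxl::VECTOR_GENERATOR<int>::result v;
            for (int i = 0; i < 100 * 1000 * 1000; ++i)
                v.push_back(std::rand());

            // External-memory sort using 512 MiB of internal memory.
            stxxl::sort(v.begin(), v.end(), CmpIntLess(), 512 * 1024 * 1024);
            return 0;
        }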
