Practical Approximate Pattern Matching with Index Structures

Description

Prof. Dr. Ernst W. Mayr, Technische Universität München

 

Project website

 

In this research project we apply the whole spectrum of algorithm engineering to index structures for approximate pattern matching. In contrast to the case of exact searching there is a big gap between theory and practice for the approximate case. We want to close this gap with our project. Some of the approaches which yield theoretical advances, seem to be not applicable in practice due to big constants.

A library of pattern matching algorithms would be of great benefit for other researches and computer scientists, providing efficient methods for index-based approximate pattern matching. The library is supposed to be an interface for the ongoing progress in the field of indexes and algorithms. The implementations form a working basis for the further algorithm engineering process, but as well can be used by e.g. biologists to solve real world problems, like analyzing biological data. The library is supposed to offer index structures and search algorithms, developed and optimized espacially considering the following aspects:

  • Efficient applicability to for big, real world problem instances, while using general and flexible error models.
  • Index structures and search algorithms in secondary memory.
  • Distributed index structures and search algorithms.
  • Cache-efficient search algorithms.
  
  

Publications

  • A. Dau and J. Krugel. "tt-analyze and tt-generate: Tools to Analyze and Generate Sequences with Trained Statistical Properties". German Conference on Bioinformatics 2011, Weihenstephan, Germany (GCB 2011). 2011. [More] 
  • T. Stadler. "Konstruktion von komprimierten Indizes für riesige Texte". Technische Universität München. 2011. [More] 
  • H. Poppe. "Approximative Suche in Textindizes". Technische Universität München. 2010. [More] 
  • A. Dau. "Analyse der Struktur und statistischer Eigenschaften von Texten und Erzeugung zufälliger Texte". Technische Universität München. 2010. [More] 
[ Back ]