An Efficient Combinatorial Approach for Solving the DNA Motif Finding Problem

Filippo Geraci, Marco Pellegrini and M.Elena Renda


Istituto di Informatica e Telematica (IIT)
Consiglio Nazionale delle Ricerche (CNR)
I-56100
Pisa (PI) ITALY

Contacts:
Filippo.Geraci_AT_iit.cnr.it
Marco.Pellegrini_AT_iit.cnr.it
Elena.Renda_AT_iit.cnr.it


Abstract. The detection of an over-represented sub-sequence in a set of (carefully chosen) DNA sequences is often the main clue leading to the investigation of a possible functional role for such a subsequence. Over-represented substrings (with possibly local mutations) in a biological string are termed motifs. A typical functional unit that can be modeled by a motif is a Transcription Factor Binding Site (TFBS), a portion of the DNA sequence apt to the binding of a protein that participates in complex transcriptomic biochemical reactions. In the literature it has been proposed a simplified combinatorial problem called the planted (l-d)-motif problem (known also as the (l-d) Challenge Problem) that captures the essential combinatorial nature of the motif finding problem. In this paper we propose a novel graph-based algorithm for solving a refinement of the (l-d) Challenge Problem . Experimental results show that instances of the (l-d) Challenge Problem considered difficult for competing state of the art methods in literature can be solved efficiently in our framework.

 


BibTex

@InProceedings{Geraci_et_alISDA09,
author = "Geraci, Filippo and Pellegrini, Marco and Renda, M. Elena",
title = "An Efficient Combinatorial Approach for Solving the DNA Motif Finding Problem",
booktitle = "Proc. of the 9th International Conference on Intelligent Systems Design and Applications (ISDA’09)",
year = "2009",
pages = "335--340"
address = "Pisa, Italy"
}