New algorithms for finding monad patterns in DNA sequences

Authors

    Authors

    R. V. Satya;A. Mukherjee

    Comments

    Authors: contact us about adding a copy of your work at STARS@ucf.edu

    Keywords

    Computer Science, Artificial Intelligence; Computer Science, Theory &; Methods

    Abstract

    In this paper, we present two new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l,d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time-complexity of some of the best known algorithms to date is O(nt(2)l(d)sigma(d)), where t is the number of input sequences, n is the length of each input sequence, and sigma = \Sigma\ is the size of the alphabet. The first algorithm that we present in this paper takes O(n(2)t(2)l(d/2)) time and O(ntl(d/2)sigma(d/2)) space; and the second algorithm takes O(n(3)t(3)l(d/2)sigma(d/2)) time using O(l(d/2)sigma(d/2)) space. In practice, our algorithms have much better performance provided the d/l ratio is small. The second algorithm performs very well even for large values l and d as long as the d/l ratio is small.

    Journal Title

    String Processing and Information Retrieval, Proceedings

    Volume

    3246

    Publication Date

    1-1-2004

    Document Type

    Article

    Language

    English

    First Page

    273

    Last Page

    285

    WOS Identifier

    WOS:000224377200040

    ISSN

    0302-9743; 3-540-23210-9

    Share

    COinS