New algorithms for finding monad patterns in DNA sequences
Computer Science, Artificial Intelligence; Computer Science, Theory &; Methods
In this paper, we present two new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l,d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time-complexity of some of the best known algorithms to date is O(nt(2)l(d)sigma(d)), where t is the number of input sequences, n is the length of each input sequence, and sigma = \Sigma\ is the size of the alphabet. The first algorithm that we present in this paper takes O(n(2)t(2)l(d/2)) time and O(ntl(d/2)sigma(d/2)) space; and the second algorithm takes O(n(3)t(3)l(d/2)sigma(d/2)) time using O(l(d/2)sigma(d/2)) space. In practice, our algorithms have much better performance provided the d/l ratio is small. The second algorithm performs very well even for large values l and d as long as the d/l ratio is small.
String Processing and Information Retrieval, Proceedings
"New algorithms for finding monad patterns in DNA sequences" (2004). Faculty Bibliography 2000s. 4772.