2012, Number 2
Rev Mex Ing Biomed 2012; 33 (2)
A Consensus Algorithm for Approximate Pattern Matching in Protein Sequences
Alba A, Rubio-Rincon M, Rodríguez-Kessler M, Arce-Santana ER, Mendez MO
Language: Spanish
References: 21
Page: 87-99
ABSTRACT
In bioinformatics, approximate string matching is one of the main tools
that allow scientists to find common characteristics in protein or DNA
sequences of different species. From a computational point of view, the
difficulty of approximate string matching lies in finding adequate
measures to efficiently compare two strings, since in many cases one is
interested in performing searches in real time within large databases. In
this paper we propose a novel method for approximate string matching
based on a generalization of the algorithm proposed by Baeza-Yates and
Perleberg in 1996 for computing the Hamming distance between two
sequences. In addition, a post-processing stage that significantly reduces
the number of false positives is presented. The proposed method has been
evaluated on synthetic cases of random sequences and on real cases of
plant protein sequences. Results show that the proposed algorithm is
highly efficient in computational terms and in specificity, especially when
compared against a previously published method based on the
phase correlation function.
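As an illustration of the kind of Hamming-distance matching the abstract refers to, the following is a minimal Python sketch of a match-counting search: for every text symbol, it increments a counter for each pattern alignment at which that symbol would agree, and then reports the alignments with few enough mismatches. It is not the generalized method or the post-processing stage proposed in the paper; the function name `hamming_matches`, the parameter `max_mismatches`, and the sample sequences are assumptions made only for this example.

```python
from collections import defaultdict


def hamming_matches(text, pattern, max_mismatches):
    """Return (start, distance) for each alignment of `pattern` in `text`
    whose Hamming distance is at most `max_mismatches`.

    Illustrative sketch only; names and interface are hypothetical.
    """
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return []

    # Positions at which each symbol occurs in the pattern.
    positions = defaultdict(list)
    for j, symbol in enumerate(pattern):
        positions[symbol].append(j)

    # matches[s] counts agreeing symbols when the pattern starts at text[s].
    matches = [0] * (n - m + 1)
    for i, symbol in enumerate(text):
        for j in positions.get(symbol, ()):
            start = i - j
            if 0 <= start <= n - m:
                matches[start] += 1

    # Hamming distance = pattern length minus number of agreeing symbols.
    return [(s, m - c) for s, c in enumerate(matches) if m - c <= max_mismatches]


if __name__ == "__main__":
    # Hypothetical amino-acid strings, chosen only for demonstration.
    print(hamming_matches("ACDEFGHIKACDEYGH", "ACDEFG", 1))
```

With a large alphabet such as the 20 amino acids, each text symbol typically occurs at only a few pattern positions, so the inner loop stays short; this is the reason counting approaches of this kind are attractive for protein sequences.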
REFERENCES
Smith TF, Waterman MS. Identification of common molecular subsequences. Journal of Molecular Biology, 1981; 147: 195-197.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology, 1990; 215: 403-410.
Hamming RW. Error detecting and error correcting codes. Bell System Technical Journal, 1950; 29(2): 147-160.
Levenshtein VI. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 1966; 10: 707-710.
Howie JM. Automata and Languages. Oxford University Press, 1991.
Ukkonen E. Algorithms for approximate string matching. Inf. Control, 1985; 64(1-3): 100-118.
Jokinen P, Tarhio J, Ukkonen E. A comparison of approximate string matching algorithms. Softw. Pract. Exper., 1996; 26(12): 1439-1458.
Navarro G. A guided tour to approximate string matching. ACM Computing Surveys, 2001; 33(1): 31-88.
Navarro G, Baeza-Yates RA, Sutinen E, Tarhio J. Indexing methods for approximate string matching. IEEE Data Engineering Bulletin, 2001; 24(4): 19-27.
Boytsov L. Indexing methods for approximate dictionary searching: Comparative analysis. J. Exp. Algorithmics, 2011; 16: 1.1:1.1-1.1:1.91.
Buhler J. Efficient large-scale sequence comparison by locality sensitive hashing. Bioinformatics, 2001; 17(5): 419-428.
Baeza-Yates RA, Perleberg CH. Fast and practical approximate string matching. Inf. Process. Lett., 1996; 59(1): 21-27.
Alba A, Rodriguez-Kessler M, Arce-Santana ER, Mendez MO. Approximate string matching using phase correlation. Proceedings of the 34th Annual International Conference of the IEEE EMBS, 2012; pp. 6309-6312.
Frigo M, Johnson SG. The Design and Implementation of FFTW3. Proc. IEEE, 2005; 93(2): 216-231.
Baeza-Yates RA, Gonnet GH. A new approach to text searching. Proceedings of the 12th Annual ACM-SIGIR Conference on Information Retrieval, 1989; pp. 168-175.
Wu S, Manber U. Fast Text Searching With Errors. Technical Report TR 91-11, Department of Computer Science, University of Arizona, 1991.
Ochoa-Alfaro A, Rodríguez-Kessler M, Perez-Morales M, Delgado-Sanchez P, Cuevas-Velazquez C, Gomez-Anduro G, Jimenez-Bremont J. Functional characterization of an acidic SK3 dehydrin isolated from an Opuntia streptacantha cDNA library. Planta, 2012; 235: 565-578.
Hundertmark M, Hincha DK. LEA (late embryogenesis abundant) proteins and their encoding genes in Arabidopsis thaliana. BMC Genomics, 2008; 9: 118.
Jimenez-Bremont JF, Maruri-Lopez I, Ochoa-Alfaro A, Delgado-Sanchez P, Bravo J, Rodríguez-Kessler M. LEA gene introns: is the intron of dehydrin genes a characteristic of the serine-segment? Plant Mol Biol Rep. (DOI: 10.1007/s11105-012-0483-x). In press.
Allagulova CR, Gimalov FR, Shakirova FM, Vakhitov VA. The plant dehydrins: structure and putative functions. Biochemistry (Moscow), 2003; 68: 945-951.
Kosova K, Prasil IT, Vítamvas P. Role of dehydrins in plant stress response. In: Pessarakli M, editor, Handbook of Plant and Crop Stress, 3rd ed., CRC Press (Florida), 2010: 239-285.