From sequencing to hardware acceleration of DNA alignment software: An integral review

D Pacheco Bautista; M González Pérez; I Algredo Badillo

2015, Number 3

Rev Mex Ing Biomed 2015; 36 (3)

Pacheco BD, González PM, Algredo BI

Language: Spanish
References: 43
Page: 257-275

ABSTRACT

In recent years, massively parallel sequencing machines, also called next-generation sequencing (NGS) machines, have progressed impressively; recent machines such as the Illumina HiSeq can generate millions of reads in a single run. However, these technologies can sequence only small fragments of genetic material (35 to 1100 nucleotides), so complete-genome sequencing requires dividing the chain, sequencing the fragments, and then assembling the resulting short reads. This paper reviews and compares recent NGS technologies, analyzes the sequence assembly problem, and formally states the alignment problem. It also examines the main alignment programs and the algorithms that underlie them. Finally, after concluding that sequencing technologies are more than 10 times faster than the alignment programs, it reviews hardware acceleration as an alternative for speeding up these programs. This work, a comprehensive analysis and review, aims to contribute to the development of bioinformatics research in the country.
