Next Generation Sequencing and Sequence Assembly by Ali Masoudi-Nejad, Zahra Narimani & Nazanin Hosseinkhan

Author: Ali Masoudi-Nejad, Zahra Narimani & Nazanin Hosseinkhan
Language: eng
Format: epub
Publisher: Springer New York, New York, NY


High sequence coverage helps resolve repeats, but errors in the sequence data make repeat discovery harder. Resolving repeats that are longer than the reads requires paired-end reads (paired-end [mate-pair] technologies are described in Chap. 1), which is a more complicated task than resolving repeats shorter than the read length using single reads. Inexact repeats can be separated by aligning reads at high stringency and correlating reads that share the same pattern of base-call differences [11]. How each assembly algorithm resolves repeats is explained in Chap. 4. Together with genome size and the large number of reads, these issues make assembly a complicated problem that requires efficient algorithms and data structures as well as high-performance computing platforms. Well-chosen heuristics play an important role in overcoming these difficulties.
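
Why read length matters for repeats can be illustrated with a small sketch. It is not taken from the book: the toy genome, the `REPEAT` sequence, and the function name `ambiguous_reads` are hypothetical, and the sketch assumes error-free, single-stranded reads, which real data do not provide. It lists every read that could have come from more than one position in the genome.

```python
from collections import defaultdict


def ambiguous_reads(genome, read_len):
    """Return {read: [start positions]} for every substring of length
    read_len that occurs at more than one position in genome."""
    positions = defaultdict(list)
    for start in range(len(genome) - read_len + 1):
        positions[genome[start:start + read_len]].append(start)
    # Keep only reads that cannot be placed uniquely.
    return {read: starts for read, starts in positions.items() if len(starts) > 1}


# Hypothetical toy genome: one 8-base repeat embedded twice between
# flanking sequences that differ on both sides of each copy.
REPEAT = "ACGTTGCA"
GENOME = "GGATA" + REPEAT + "TTAGG" + REPEAT + "CCGAT"

# Reads shorter than the repeat can fall entirely inside it.
short_reads = ambiguous_reads(GENOME, 6)
# Reads longer than the repeat always include some unique flanking sequence.
long_reads = ambiguous_reads(GENOME, 10)

print("ambiguous 6-base reads:", len(short_reads))
print("ambiguous 10-base reads:", len(long_reads))
```

In this toy case, three distinct 6-base reads map to both copies of the repeat, while no 10-base read is ambiguous. When the repeat is longer than any available read, single reads cannot break the tie, which is the situation where paired-end information is needed.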

Assembly is not applied only to genomic data. Assembling transcriptomic data such as expressed sequence tags (ESTs), which gives a view of the biological state of a cell, is also very important in practice. However, the challenges differ between kinds of data. The discontinuity of transcriptomic data results in less contiguity than genomic data. Since repeats mainly exist in intron regions of the genome, repeats are not a major issue in assembling transcriptomic data. However, a single region of the genome can be transcribed in different patterns (i.e., from different start and end positions), which adds complexity to the assembly of transcriptomic data. Algorithmic approaches are also needed to handle other EST-specific situations, for example, differing expression rates (highly expressed genes), alternative splicing, and paralogous genes. These problems become even more serious when the cDNA library is contaminated with genomic sequences [12].


