Big Data Analysis for Bioinformatics and Biomedical Discoveries by Ye Shui Qing

Author: Ye, Shui Qing
Language: eng
Format: epub, pdf
Published: 2015-11-21T08:00:06+00:00


can determine whether a position is methylated or not. After read mapping, the aligned short reads that cover the same genomic location are summarized for each potential methylation site on one row: a count of how many reads are methylated and how many are unmethylated at that site. Figure 8.1 shows the flowchart of the procedure.
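The per-site summary described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize_sites` and the input layout (one `(chromosome, position, methylated)` tuple per read-level call) are hypothetical and are not the output format of any particular tool.

```python
from collections import defaultdict


def summarize_sites(calls):
    """Collapse read-level calls into one row per genomic site.

    calls: iterable of (chrom, pos, methylated) tuples, where `methylated`
    is True or False for a single read covering that position
    (hypothetical input layout, chosen for illustration).
    Returns a dict mapping (chrom, pos) -> (methylated_count, unmethylated_count).
    """
    counts = defaultdict(lambda: [0, 0])
    for chrom, pos, methylated in calls:
        # Index 0 holds methylated reads, index 1 holds unmethylated reads.
        counts[(chrom, pos)][0 if methylated else 1] += 1
    return {site: tuple(pair) for site, pair in sorted(counts.items())}


if __name__ == "__main__":
    example_calls = [
        ("chr1", 100, True),
        ("chr1", 100, False),
        ("chr1", 100, True),
        ("chr1", 250, False),
    ]
    for site, (meth, unmeth) in summarize_sites(example_calls).items():
        print(site, meth, unmeth)
```

Each key in the returned dictionary corresponds to one row of the summary: a site with its methylated and unmethylated read counts.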

The methylation pipeline in Figure 8.1 uses Bismark together with Bowtie. Bismark begins with read conversion: each sequence read is first transformed into a fully bisulfite-converted forward version (C->T) and its cognate reverse version (G->A conversion of the reverse strand), and these are then aligned to similarly converted versions of the genome (also C->T and G->A converted). Bismark runs all four possible alignments for each read in parallel and picks the best one. Reads that produce a unique best alignment from the four alignment processes against the bisulfite genomes are then compared to the normal genomic sequence, and the methylation state of every cytosine position in the read is inferred. With Bowtie1, a read is considered to align uniquely if a single alignment exists that has fewer mismatches to the genome than any alternative alignment, if there is one. With Bowtie2, a read is considered to align uniquely if one alignment has a unique best alignment score. If a read produces several alignments with the same number of mismatches or with the same alignment score, the read (or read pair) is discarded altogether. Finally, Bismark writes its calling results in SAM format, adding several extended fields and dropping a few fields from the original Bowtie output.

FIGURE 8.1 Flowchart of DNA methylation data analysis. [Flowchart labels: sequencing machine; short reads; FASTQ; quality control; human genome; converted genome (C->T, G->A); Bismark; Bowtie; aligned reads; SAM; methylation calling; output; downstream analysis.]
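To make the conversion, uniqueness, and calling steps concrete, here is a minimal Python sketch. It is not Bismark's implementation: the function names, the alignment labels, and the `(label, mismatches)` tuples are assumptions made for illustration, the uniqueness rule follows the mismatch-based description given for Bowtie1, and the calling step assumes an ungapped forward-strand alignment.

```python
def convert_c_to_t(seq):
    """Fully bisulfite-convert a sequence in silico (forward version, C->T)."""
    return seq.upper().replace("C", "T")


def convert_g_to_a(seq):
    """Fully bisulfite-convert a sequence in silico (reverse version, G->A)."""
    return seq.upper().replace("G", "A")


def unique_best(alignments):
    """Pick the unique best alignment, or None if the read must be discarded.

    alignments: list of (label, mismatches) tuples, one per candidate
    alignment (hypothetical structure). A tie for the lowest mismatch
    count means there is no unique best alignment.
    """
    if not alignments:
        return None
    ranked = sorted(alignments, key=lambda a: a[1])
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: same number of mismatches
    return ranked[0]


def call_methylation(read, genome_segment):
    """Compare an unconverted read with the unconverted genome segment.

    Assumes an ungapped forward-strand alignment of equal length.
    Returns (offset, methylated) for every genomic C covered by the read:
    a C that is still C in the read is called methylated, a C read as T
    is called unmethylated; any other base is skipped.
    """
    calls = []
    for offset, (r, g) in enumerate(zip(read.upper(), genome_segment.upper())):
        if g != "C":
            continue
        if r == "C":
            calls.append((offset, True))
        elif r == "T":
            calls.append((offset, False))
    return calls


if __name__ == "__main__":
    genome = "ATCGTTCGTT"   # hypothetical reference segment with two CpG sites
    read = "ATCGTTTGTT"     # hypothetical read after bisulfite treatment
    print(convert_c_to_t(read), convert_c_to_t(genome))
    print(unique_best([("CT_read/CT_genome", 0), ("GA_read/GA_genome", 3)]))
    print(call_methylation(read, genome))
```

The converted sequences are used only to find where a read aligns; the methylation call itself is made by comparing the original read with the unconverted genome, as the text describes.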

After methylation calling at every detected site, we need to determine the methylation status of each cytosine site from a population of cells of the same type, that is, from the short reads covering that site. Two alternative statuses can appear at a site, methylated and unmethylated, because random errors arise for various reasons (see Figure 8.2a). A statistical method is therefore needed to determine whether a site is really methylated; Figure 8.2b illustrates this scenario. Although bisulfite treatment is used to check whether a C is methylated, many factors can produce different outcomes across reads, so we statistically test which outcome is dominant and conclude the true methylation status of each site. In Figure 8.2a, the DNA sequence contains two CpG sites: the first C is methylated and is not converted by bisulfite treatment (the highlighted area), whereas the second C is not methylated and is converted to T. After bisulfite treatment, therefore, methylated cytosines are most likely left unchanged, whereas unmethylated Cs are most probably converted to Ts.

FIGURE 8.2 Population of short reads in DNA methylation. (a) Bisulfite treatment of a short read. (b) A population of treated short reads. [Panel labels: methylated site (C remains C); bisulfite treatment; unmethylated site (C becomes T); a population of short reads.]
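The text does not say which statistical method is used, so the following is only one possible sketch: a one-sided binomial test that asks whether the number of methylated reads at a site exceeds what random error alone would produce. The error rate of 0.01, the significance level of 0.05, and the function names are assumptions chosen for illustration, not values from the source.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_methylated(meth, unmeth, error_rate=0.01, alpha=0.05):
    """Decide whether a site is methylated given its read counts.

    meth / unmeth: number of reads calling the site methylated / unmethylated.
    error_rate: assumed probability that a read from a truly unmethylated
        site is called methylated by mistake (illustrative value).
    alpha: assumed significance threshold (illustrative value).
    The site is called methylated when seeing `meth` or more methylated
    reads would be unlikely under random error alone.
    """
    total = meth + unmeth
    if total == 0:
        return False  # no coverage, no call
    return binomial_tail(meth, total, error_rate) < alpha


if __name__ == "__main__":
    for meth, unmeth in [(8, 2), (1, 99), (0, 10)]:
        print(meth, unmeth, is_methylated(meth, unmeth))
```

A site with 8 methylated reads out of 10 is called methylated under these assumptions, while 1 methylated read out of 100 is compatible with random error and is not.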


