Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

fkrueger replied

09-12-2013, 05:48 AM
The test data set comes from the 2009 Lister et al. paper. The reads were not specifically trimmed for adapters, just shortened to 50 bp. Even so, many of the reads suffer from poor-quality sequence (as was the norm back in those days) and possibly adapter contamination. I am sure that if you removed those reads you would also see an increased mapping efficiency. If you follow this QC and trimming guide you should see fairly good results for your application (very long reads might need some specific attention though, e.g. using Bowtie2 for mapping).

The test dataset is meant as a quick check that the program runs correctly after installation; it was not intended to showcase a staggeringly high mapping efficiency of Bisulfite-Seq in general.
litc replied

09-11-2013, 06:58 PM
Sorry fkrueger, I reached a wrong conclusion because I wrote the wrong sequences for OB(-) and OB(+).

In my post above, OB(-) should be TTAGTGT and OB(+) should be ACACTAA.
OB(C>T) ATATTAA
OB(G>A) ACACTAA, which can map to Genome(G>A)

So in this case, all strands (OT, OB) can theoretically map to either Genome(C>T) or Genome(G>A), regardless of whether the original strands contained methylated bases. The mapping efficiency of BS-seq should therefore not be too low.
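The reasoning above can be checked with a small Python sketch. It is only an illustration of the in-silico conversion idea, not Bismark's actual code. The two reads are hypothetical: they assume the CpG in the example genome ACGCTGA is methylated on both strands, so the C of that CpG survives bisulfite treatment in each read.

```python
# Illustrative sketch of in-silico bisulfite conversion (not Bismark's code).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def c_to_t(seq):
    """Fully convert C to T, as done in silico for reads and the genome."""
    return seq.replace("C", "T")


def g_to_a(seq):
    """Fully convert G to A, the equivalent conversion seen from the other strand."""
    return seq.replace("G", "A")


genome = "ACGCTGA"
genome_ct = c_to_t(genome)  # ATGTTGA
genome_ga = g_to_a(genome)  # ACACTAA

# Hypothetical reads, written 5'->3', with the methylated C of the CpG retained.
ot_read = "ACGTTGA"  # original top strand after bisulfite treatment
ob_read = "TTAGCGT"  # original bottom strand after bisulfite treatment

# Converting the reads in silico removes any dependence on methylation state.
ot_hit = c_to_t(ot_read) == genome_ct
ob_hit = reverse_complement(c_to_t(ob_read)) == genome_ga

print("OT read matches Genome(C>T):", ot_hit)
print("OB read matches Genome(G>A):", ob_hit)
```

Because the remaining Cs are converted in silico before the comparison, both reads match regardless of which Cs were methylated in the sample.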

I finally found out why my data's mapping efficiency is 0.1%: my data is 250 bp paired-end, and adapter sequence in the last part of the reads causes the mapping to fail. After trimming the reads to 50 bp, the mapping efficiency rises to 76%. But I don't know why the Bismark test dataset (http://www.bioinformatics.babraham.a...d.html#bismark) has a low mapping efficiency of 47.6%; it confused me and gave me the impression that the mapping efficiency of BS-seq is low.
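For reference, hard-trimming reads to a fixed length can be sketched in a few lines of Python. This is a minimal illustration that assumes plain four-line FASTQ records; the function name and the toy record are made up for the example. It only cuts reads to a fixed length and does not detect adapters, so a dedicated adapter and quality trimming tool is still the better choice for real data.

```python
from itertools import islice


def hard_trim_fastq(lines, max_len=50):
    """Yield FASTQ lines with sequence and quality cut to max_len bases.

    Assumes plain four-line records; an incomplete trailing record is dropped.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break
        header, seq, plus, qual = (field.rstrip("\n") for field in record)
        yield header
        yield seq[:max_len]
        yield plus
        yield qual[:max_len]


# Toy record with an 80 bp read (hypothetical data).
example = ["@read1", "ACGT" * 20, "+", "I" * 80]
trimmed = list(hard_trim_fastq(example, max_len=50))
print(len(trimmed[1]), len(trimmed[3]))
```

Sequence and quality strings are trimmed together so that they stay the same length, which aligners require.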
litc replied

09-11-2013, 06:06 PM
Thanks fkrueger, you are very kind. I would like to understand the reason in more depth.

here is an example:
genome sequence is: ACGCTGA
the real sample's sequence is:
ACGCTGA
TGCGACT
The red "C" is the methylated base.

then:
Genome(C>T) is ATGTTGA
Genome(G>A) is ACACTAA
OT(+) is ACGTTGA
OB(-) is TGTGATT
OB(+) is AATCACA

In a directional library, both the OT and OB strands can be sequenced.
OT(C>T) ATGTTGA, which can be mapped to Genome(C>T)
OB(C>T) AATTATA, which cannot be mapped to Genome(C>T) or Genome(G>A)
OB(G>A) AATCACA, which cannot be mapped to Genome(C>T) or Genome(G>A)

So in this example only OT can be aligned and OB cannot. Is this the cause of the low mapping efficiency of BS-seq?

Last edited by litc; 09-11-2013, 06:07 PM. Reason: formatting for easy reading.
fkrueger replied

09-11-2013, 01:01 AM
The mapping efficiency for very short bisulfite converted sequences is substantially lower than for 'normal' sequencing, but for read lengths of 40bp or longer the difference is only a few percent. Fig. 2a of this review compares the mapping efficiencies of BS-Seq vs. normal alignments as a function of read length.

A mapping efficiency of 0.1% sounds extremely low; this is the kind of value you would probably see if you aligned sequences to the wrong genome (e.g. human/mouse).
litc replied

09-10-2013, 11:19 PM
I would like to know about the mapping efficiency of bisulfite sequencing. I ran the test data (the Bismark test dataset at http://www.bioinformatics.babraham.a...d.html#bismark) and its mapping efficiency is 47.6%. My own bisulfite-sequencing data has a mapping efficiency of 0.1% (this may mostly be due to the lab staff using the wrong protocol).

Is the mapping efficiency of bisulfite sequencing lower than that of normal sequencing? Can the C>T and G>A versions of every template's OT and OB strands map to Genome(C>T) and Genome(G>A)?
fkrueger replied

09-03-2013, 04:37 AM
Originally posted by frozenlyse

Ah sure, that makes sense - I may test it out for myself on the unaligned reads from a cancer cell line with some known translocations and see if anything falls apart - if I get around to testing it I'll let you know how it goes.

Great, I'll be interested to hear about the outcome!

Last edited by fkrueger; 09-11-2013, 12:54 AM. Reason: typo
frozenlyse replied

09-03-2013, 04:36 AM
Ah sure, that makes sense - I may test it out for myself on the unaligned reads from a cancer cell line with some known translocations and see if anything falls apart - if I get around to testing it I'll let you know how it goes.
fkrueger replied

09-03-2013, 04:30 AM
Hi Aaron,
I have to admit that I haven't spent any time thinking about whether it would be possible, or how difficult it would be, to allow these settings. I would imagine that just enabling these options in the code would probably cause some other part to fail, even though it is difficult to predict how. This sounds very straightforward to implement, but it might turn out to be surprisingly difficult ...
frozenlyse replied

09-03-2013, 04:19 AM
Hi,

I was wondering about using WGBS data for structural variant prediction. According to the Bismark manual, the Bowtie2 paired-end options --no-mixed and --no-discordant are always switched on. Is there any way of disabling them apart from editing the source code? Perhaps these could be changed to --allow-mixed and --allow-discordant so that the default behaviour does not change? It seems a bit odd to have options which are impossible to turn off!
Cheers,
Aaron
dpryan replied

08-27-2013, 07:17 AM
Offhand, I can't think of any application where this would cause a problem. With genome viewers, you need to coordinate sort anyway and the pairing isn't done at the read-name level (there's no fast index for querying the position of reads in BAM files by name).
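As a small illustration of why the /1 or /2 suffix is not needed, the mate number can be read from the FLAG field of each alignment. The bit values below come from the SAM specification; the helper name and the example FLAG values are just for illustration and are not taken from the alignments in this thread.

```python
# FLAG bits defined by the SAM specification.
FLAG_PAIRED = 0x1   # read is paired in sequencing
FLAG_READ1 = 0x40   # first read in the pair
FLAG_READ2 = 0x80   # second read in the pair


def mate_number(flag):
    """Return 1 or 2 for a paired read, or None if the FLAG does not say.

    If both mate bits are set, this sketch reports 1.
    """
    if not flag & FLAG_PAIRED:
        return None
    if flag & FLAG_READ1:
        return 1
    if flag & FLAG_READ2:
        return 2
    return None


# Illustrative FLAG values for a properly paired forward/reverse pair.
for flag in (99, 147):
    print(flag, "-> mate", mate_number(flag))
```

A tool that relies on the FLAG bits in this way is unaffected by both mates carrying the same name suffix.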
oria34 replied

08-27-2013, 12:33 AM
Hi all,

Running the latest version of Bismark and looking at the read names (we discussed this a while ago here), I have found that the names of the pairs no longer end in /1 and /2. In my case both members of a pair are named "...../1" and "....../1".

I know it doesn't matter too much since Bismark does the methylation call properly, but I was wondering whether it could interfere with other downstream applications or genome viewers.

Left alignment
----------------------
Read name = FCD1LHLACXX:8:2308:5026:30317#ACCAGACT/1
Location = groupXXI:767
Alignment start = 756 (+)
Cigar = 99M
Mapped = yes
Mapping quality = 255
----------------------
Base = A
Base phred quality = 39
----------------------
Pair start = groupXXI:900 (-)
Pair is mapped = yes
Insert size = 242
Pair orientation = F2R1
----------------------
Second in pair
-------------------
XG = GA
NM = 16
XM = ...........x.............h..........x...xh......hh..xh..........x.h...x........h..x.....x.h........
XR = GA
XX = 11G13G10G3GG6GG2GG10G1G3G8G2G5G1G8
-------------------
Right alignment
----------------------
Read name = FCD1LHLACXX:8:2308:5026:30317#ACCAGACT/1
Location = groupXXI:767
Alignment start = 900 (-)
Cigar = 98M
Mapped = yes
Mapping quality = 255
----------------------
----------------------
Pair start = groupXXI:756 (+)
Pair is mapped = yes
Insert size = -242
Pair orientation = F2R1
----------------------
First in pair
-------------------
XG = GA
NM = 19
XM = ........x..hh..x.Z..xh....x.....xh.....h.........Z...xh......x............xh..xh.......x.....Z....
XR = CT
XX = 8G2GG2G4GG4G5GG5G13GG6G12GG1TGG7G10
-------------------
fkrueger replied

08-24-2013, 12:24 PM
Seems like a point worth addressing... But now I shall focus on my holiday!
gerald2545 replied

08-24-2013, 12:12 PM
Just a follow-up: the same file took 1500 minutes to sort with the -k3,3 parameter.

gerald
gerald2545 replied

08-23-2013, 06:28 AM
Thank you Felix for your time, but don't forget: you are on holiday.

Gérald
fkrueger replied

08-23-2013, 05:37 AM
Sorting by chromosome in addition to position might indeed be a relic of former versions of the script, from back when files weren't split into individual chromosome files. I'll take a look at this once I am back, but for the moment you should be fine just deleting the -k 3,3 from the sort command.
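To illustrate why the chromosome key is redundant for a per-chromosome file, here is a rough Python sketch. The column layout is an assumption: the chromosome sits in column 3 (as implied by -k 3,3) and the position is taken to be in column 4. The records are made up.

```python
def sort_by_position(lines, pos_col=4):
    """Sort tab-separated records numerically by position only."""
    return sorted(lines, key=lambda line: int(line.split("\t")[pos_col - 1]))


def sort_by_chrom_then_position(lines, chrom_col=3, pos_col=4):
    """Sort by chromosome name first, then numerically by position."""
    def key(line):
        fields = line.split("\t")
        return fields[chrom_col - 1], int(fields[pos_col - 1])
    return sorted(lines, key=key)


# Hypothetical records that all come from one chromosome file.
records = [
    "read3\t+\tchr1\t900\tZ",
    "read1\t-\tchr1\t120\tz",
    "read2\t+\tchr1\t450\tZ",
]

same_order = sort_by_position(records) == sort_by_chrom_then_position(records)
print("Chromosome key changes nothing:", same_order)
```

When every record carries the same chromosome, the extra key only adds comparison work without changing the resulting order.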
