Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

gene_x replied

09-11-2014, 07:51 AM
Originally posted by fkrueger View Post

Of you can, just use the Bismark files as they are generated. If you have to
using samtools sort -n should do the trick as well.

OK. I'm running that now.

Another question, I'm not sure if my previous run without doing any sorting was correct.

Here is my command

Code:

samtools merge input.bam plate1/plate1_all_sort.bam plate2/plate2_all_sort.bam deduplicate_bismark -p --bam input.bam bismark_methylation_extractor -p --no_overlap --ignore 3 --ignore_r2 3 --bedGraph --counts --zero_based --report input.deduplicated.bam 2> input.meth_extractor_log.txt

The resulting input.deduplicated.zero.cov file have methylation coverage on bases that are not Cs. Here are the top rows:

chr2 133 134 0 0 8
chr2 134 135 0 0 1
chr2 228 229 0 0 9
chr2 229 230 0 0 2
chr2 262 263 0 0 15
chr2 263 264 0 0 2
chr2 304 305 0 0 13
chr2 305 306 0 0 1
chr2 316 317 0 0 11
chr2 317 318 0 0 1
chr2 318 319 0 0 12
chr2 319 320 0 0 1
chr2 326 327 0 0 11

Do you have any suggestions what could go wrong?
Leave a comment:
fkrueger replied

09-11-2014, 07:45 AM
Of you can, just use the Bismark files as they are generated. If you have to
using samtools sort -n should do the trick as well.
Leave a comment:
gene_x replied

09-11-2014, 07:36 AM
Originally posted by fkrueger View Post

Indeed, you should not sort the files by coordinates at all before running the deduplication.

Should I sort it by read names using -n in samtools sort just like I listed before doing deduplication? Or should I not sort it at all?

Last edited by gene_x; 09-11-2014, 07:43 AM.
Leave a comment:
fkrueger replied

09-11-2014, 05:47 AM
Indeed, you should not sort the files by coordinates at all before running the deduplication.
Leave a comment:
gene_x replied

09-11-2014, 05:41 AM
Originally posted by fkrueger View Post

It should be generally possible to merge several bam files before running the extractor without a problem, and files do not have to be sorted for this. You only need to ensure that read 1 and read 2 always follow each other in the bam file, else you would run into problems with stand identity and the no_overlap function. Some merging tools do not guarantee this order, in which case it might help to sort the reads by read id to team pairs up again. I'm currently at a conference so I'm somewhat slow replying.

I indeed run into problem after sorting bam files based on chromosomal coordinates.. the methylation extractor report an error saying that read1 and read2 are not the same and suggested me to use an unsorted bam..

I then tried to sort the deduplicated bam with the command

Code:

samtools sort -n deduplicated.bam deduplicated_sort

And then run methylation extractor on the deduplicated_sort.bam, however, the same error message came up again and methylation extractor couldn't proceed.. Is it because after deduplication, read1 and read2 within one bam file don't always match even after I sort them based on read names? Should I not sort anything from the beginning at all?

Thanks!
Leave a comment:
fkrueger replied

09-11-2014, 03:46 AM
Originally posted by gene_x View Post

Hi, Felix,
I have another question about the alignment files. I just checked the bam file after running bismark alignment and it is unsorted. Then using the bam output to do deduplication, the resulting deduplicated.bam is also unsorted. My first question is deduplication step doesn't require the input bam to be sorted?

I was trying to merge several deduplicated.bam files from different lanes into one big final bam and run methylation extractor on it and the resulting coverage file (.zero.cov with --zero_based specified) has methylation calls on non-Cs locations like chr1 133 134. That's how I traced it back to the issue of unsorted bam..

So it's probably better to implement sorting into the bismark pipeline?

It should be generally possible to merge several bam files before running the extractor without a problem, and files do not have to be sorted for this. You only need to ensure that read 1 and read 2 always follow each other in the bam file, else you would run into problems with stand identity and the no_overlap function. Some merging tools do not guarantee this order, in which case it might help to sort the reads by read id to team pairs up again. I'm currently at a conference so I'm somewhat slow replying.
Leave a comment:
dpryan replied

09-11-2014, 01:41 AM
Originally posted by AndrewMartins

Your post is about a tool that can be used for to would be completely obscured because these would likely happen at various positions within the read. i can told you for use another site of tool its mainly supply pipeline on customers.

Gotta love incoherent spam
Leave a comment:
gene_x replied

09-10-2014, 08:30 AM
sorted bam

Hi, Felix,
I have another question about the alignment files. I just checked the bam file after running bismark alignment and it is unsorted. Then using the bam output to do deduplication, the resulting deduplicated.bam is also unsorted. My first question is deduplication step doesn't require the input bam to be sorted?

I was trying to merge several deduplicated.bam files from different lanes into one big final bam and run methylation extractor on it and the resulting coverage file (.zero.cov with --zero_based specified) has methylation calls on non-Cs locations like chr1 133 134. That's how I traced it back to the issue of unsorted bam..

So it's probably better to implement sorting into the bismark pipeline?
Leave a comment:
christophe replied

09-08-2014, 01:02 PM
Originally posted by adeirossi View Post

Hi Christophe,

I have also seen a falling number of methylation calls for read 2 in my data. I have always assumed this was because I ran bismark_methylation_extractor with the --no_overlap option and many of the read pairs had overlaps (i.e. insert length less than twice the read length). With --no_overlap, when the extractor encounters an overlap between read 1 and read 2, I believe it uses the read 1 calls only and discards the read 2 calls, which would explain why the number of read 2 calls decreases as you go toward its 3' end. Because this "no overlap" filtering occurs after mapping, I assume there is no impact on the alignment of read 2.

If, on the other hand, you are not using the --no_overlap option or you don't have many reads pairs with inserts <200 bp, then I have no idea what's going on.

Andrew

Hi Andrew and Felix,

Sorry for this late message.

Thanks Andrew for your reply to my question regarding M-bias.
I effectively ran the Extractor with the --no_overlap option on before. As suggested I just rerun Bismark Methyl Extractor without the --no_overlap option and it worked. I do not see the number of calls falling anymore for CHH and CHG.

Thanks Felix for confirming Andrew's answer. Yes, you are right, the CpG calls was also falling slightly but better now.

I also noticed that the R2 calls counts are more heterogeneous across the read length than the number of calls across R1. I do not know if anyone else noticed that.

Thanks,
Best,
Christophe
Attached Files

M.bismark_bt2_pe.deduplicated.M-bias_R1.png (7.1 KB, 41 views)

M.bismark_bt2_pe.deduplicated.M-bias_R2.png (8.2 KB, 32 views)
Leave a comment:
fkrueger replied

09-04-2014, 12:38 PM
Originally posted by gene_x View Post

You mean "This should affect all C contexts and not only CHH?
How do you click the CHH and CHG away? I can only zoom in certain bases...

If you click on e.g. 'CHH total calls' in the legend that track will disappear/show again and the scale adjust accordingly.

Can you help me clarify one thing? In the M-bias plot for read2, is position 1 the start of the read2? I'm a bit confused about the orientation of the read2. I thought in M-bias plot, position 1 is the the end of read2 (after read 2 is reverse complemented)? probably my understanding was wrong...

The first position does indeed correspond to the first base or read 2 as it is read by the sequencer. If this was the reverse complemented end of the sequenced fragment most of the artefacts would be completely obscured because these would likely happen at various positions within the read.

Also, in M-bias plot for read2, the first 2 bases have much lower average CpG methylation, that suggests they should be ignored in the downstream analysis? I could use the -ignore_r2 option in methylation extractor to do it?
Thanks!

The first positions of read 2 are the result of carrying out the end-repair/A-tailing procedure with unmethylated cytosines. Sonication is quite likely to leave lagging ends, so these would be filled in with unmethylated Cs (of course only at cytosine positions) which would subsequently be converted to Ts in the bisulfite reaction. If you then sequence Read 2, which is the reverse complement of such a T, it would appear as A and be called as unmethylated. As you rightly said this can be remedied by using --ignore_r2 2.
Leave a comment:
gene_x replied

09-04-2014, 08:56 AM
Originally posted by fkrueger View Post

It is perfectly normal to see a decline of total calls towards the end of read 2, see an example here. This should affect all C contexts and only CHH, if you click the CHH and CHG away in the linked graph you will see it much better because the higher number of CHH calls obscures the other contexts somewhat.

You mean "This should affect all C contexts and not only CHH?
How do you click the CHH and CHG away? I can only zoom in certain bases...

The reason for this is simply that the --no_overlap function looks for overlaps only for Read 2 (since historically the basecall qualities for Read 1 were often better), so as soon as read 2 overlaps with already covered positions its methylation calls are being ignored. Again this is fine, because overlapping positions have already been extracted for Read 1 and are thus not lost.

Can you help me clarify one thing? In the M-bias plot for read2, is position 1 the start of the read2? I'm a bit confused about the orientation of the read2. I thought in M-bias plot, position 1 is the the end of read2 (after read 2 is reverse complemented)? probably my understanding was wrong...

Also, in M-bias plot for read2, the first 2 bases have much lower average CpG methylation, that suggests they should be ignored in the downstream analysis? I could use the -ignore_r2 option in methylation extractor to do it?

Thanks!
Leave a comment:
fkrueger replied

09-04-2014, 01:14 AM
Good idea, shall have that for the next release.
Leave a comment:
dpryan replied

09-04-2014, 01:12 AM
Out of curiosity, is there a reason that --no_overlap is an option rather than just being the default behaviour? Not using it would result in incorrect metrics/statistics, so I would think that defaulting to not doing that would make it rather easy for people to shoot themselves in the foot.
Leave a comment:
fkrueger replied

09-04-2014, 01:01 AM
It is perfectly normal to see a decline of total calls towards the end of read 2, see an example here. This should affect all C contexts and only CHH, if you click the CHH and CHG away in the linked graph you will see it much better because the higher number of CHH calls obscures the other contexts somewhat.

The reason for this is simply that the --no_overlap function looks for overlaps only for Read 2 (since historically the basecall qualities for Read 1 were often better), so as soon as read 2 overlaps with already covered positions its methylation calls are being ignored. Again this is fine, because overlapping positions have already been extracted for Read 1 and are thus not lost.
Leave a comment:
gene_x replied

09-03-2014, 07:55 PM
no_overlap

hi, Felix,

I just tried bismark on PE bisulfite seq data and I initially didn't use the no_overlap option and the M-bias plot looks like OK. However, with no_overlap option, there is a drop of CHH total calls at the end of the reads. I'm not sure why this would happen and how to interpret the pattern. Do you think there is something wrong with the library, any ideas?

Without no_overlap

With no_overlap
Leave a comment:

Previous 1 6 13 14 15 16 17 18 19 26 34 template Next

Recent Advances in Sequencing Analysis Tools

by seqadmin

The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
- Channel: Articles
05-06-2024, 07:48 AM
Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
- Channel: Articles
04-22-2024, 07:01 AM

Topics	Statistics	Last Post
A Closer Look at the Enigmatic Genomes of Oikopleura dioica by seqadmin Started by seqadmin, Yesterday, 06:35 AM	0 responses 15 views 0 likes	Last Post by seqadmin Yesterday, 06:35 AM
Advanced Epigenome Editing Platform Explores Gene Regulation Mechanisms by seqadmin Started by seqadmin, 05-09-2024, 02:46 PM	0 responses 21 views 0 likes	Last Post by seqadmin 05-09-2024, 02:46 PM
Telomere Maintenance by PARP1: A New Perspective in Cancer Research by seqadmin Started by seqadmin, 05-07-2024, 06:57 AM	0 responses 18 views 0 likes	Last Post by seqadmin 05-07-2024, 06:57 AM
Enhanced Neoantigen Detection: Introducing NeoHunter by seqadmin Started by seqadmin, 05-06-2024, 07:17 AM	0 responses 19 views 0 likes	Last Post by seqadmin 05-06-2024, 07:17 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News