Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

akramdi replied

10-03-2017, 07:54 AM
Hi Felix,

Thanks for the quick reply,

"So if a read has an alignment which is at least equally good as the best alignment, the read is considered ambiguous, and as such written out to the ambiguous FastQ files."

This statement clears things up for me. To make things clearer, here are the cases I was wondering about when using Bismark and how I see things now:

- If AS and XS are equally good ==> Read is considered ambiguous because Bismark does not report random alignments for the reason you stated.
- If AS is better than XS ==> Read is not considered ambiguous because, even though it produces multiple alignments, the reported one is the best and the rest are not equally good. Right?
- If XS is better than AS ==> I am not sure whether that is possible, because I would expect the reported alignment to be the best one (unless the read is part of a concordantly-aligned pair, according to the Bowtie2 manual). How do you deal with this case?

Good to know about the "--ambig_bam" option. I had been working with an older version of Bismark, so I hadn't seen it. I have updated now.
fkrueger replied

10-03-2017, 01:51 AM
Hi Amira,

I think that reads that produce more than one valid alignment should be reported based on Bowtie2 default mode (not all alignments, only one). How does having the same number of lowest mismatches affect how these reads are reported?

>> Just generally, if a read aligns with Bowtie2 it will get an alignment score (the AS:i field):

Code:

AS:i:<N> Alignment score. Can be negative. Only present if SAM record is for an aligned read.

If a read has a second alignment, the alignment score of the second best alignment is reported as well:

Code:

XS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.

So if a read has an alignment which is at least equally good as the best alignment, the read is considered ambiguous, and as such written out to the ambiguous FastQ files. In Bismark we do not want to go down the route of reporting a random alignment if there are several different (equally good) alignments, because if we can’t be sure where a read came from we also don’t want to assign a methylation status to that region. If you are interested in seeing where in the genome a read aligned, you might want to consider using the option --ambig_bam (even if this procedure won't give you any methylation information).
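
As a rough illustration of the rule described above, the following Python sketch reads the AS:i and XS:i tags from a single SAM record and flags the read as ambiguous when a second alignment scores at least as well as the reported one. The function names and the example record are invented for this illustration, and this is not Bismark's own code; it only mirrors the comparison as it is worded in this post.

```python
from typing import Optional


def get_int_tag(sam_line: str, tag: str) -> Optional[int]:
    """Return the integer value of an optional SAM tag such as AS or XS, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    prefix = f"{tag}:i:"
    # Optional fields start at column 12 of a SAM record (index 11).
    for field in fields[11:]:
        if field.startswith(prefix):
            return int(field[len(prefix):])
    return None


def is_ambiguous(sam_line: str) -> bool:
    """Rule from the post: a second alignment at least as good as the reported one."""
    best = get_int_tag(sam_line, "AS")
    second = get_int_tag(sam_line, "XS")
    if best is None or second is None:
        # Unaligned read, or no second alignment found: not ambiguous under this rule.
        return False
    return second >= best


# Hypothetical single-end record, invented for this example.
example = "\t".join([
    "read1", "0", "chr1", "100", "42", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII", "AS:i:-6", "XS:i:-6",
])
print(is_ambiguous(example))
```

Because the test is `second >= best`, a record with XS greater than AS would also be flagged, which matches the "at least equally good" wording.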

Does this help clear things up?
akramdi replied

10-02-2017, 08:15 AM
Hi Felix,

I'm a bit confused about the reads reported as "ambiguous" by Bismark, which made me doubt my understanding of how Bismark reports alignments (using Bowtie2).

So to my understanding, Bismark uses neither -a mode nor -k N mode; it uses Bowtie2 default mode: "search for multiple alignments, report the best one", and how hard it looks for alignments is determined by the effort options (-D, -R). So reads that have multiple distinct alignments are reported once: their best alignment is reported, or a randomly chosen alignment when equally good choices are found.

On the other hand, ambiguous reads are defined as:

--ambiguous Write all reads which produce more than one valid alignment with the same number of lowest
mismatches or other reads that fail to align uniquely to a file in the output directory.
Written reads will appear as they did in the input, without any of the translation of quality
values that may have taken place within Bowtie or Bismark. Paired-end reads will be written to two
parallel files with _1 and _2 inserted in their filenames, i.e. _ambiguous_reads_1.txt and
_ambiguous_reads_2.txt. These reads are not written to the file specified with --un.

I think that reads that produce more than one valid alignment should be reported based on Bowtie2 default mode (not all alignments, only one). How does having the same number of lowest mismatches affect how these reads are reported?
The same goes for reads that fail to align uniquely: I think they should be reported once (by the way, to me, failing to align uniquely is equivalent to producing more than one valid alignment).

Could you please correct my understanding of how Bismark reports alignments, and anything else I wrote that might be off?

Thanks a lot!!

Amira
twotwo replied

08-08-2017, 05:53 AM
Thanks fkrueger!
fkrueger replied

08-08-2017, 02:53 AM
Originally posted by twotwo View Post

Hi, fkrueger,
Is there any annotation file for the methylation data? After I do the alignment, I get a probe like: chr, start position.... How can I tell which gene it is from? Thank you.

We tend to use SeqMonk for this purpose (https://www.bioinformatics.babraham....jects/seqmonk/). It is a very fast and powerful genome viewer and quantitation tool. Some examples can be found in the documentation of this training course: https://www.bioinformatics.babraham....ing.html#bsseq.
twotwo replied

08-07-2017, 01:30 PM
Hi, fkrueger,
Is there any annotation file for the methylation data? After I do the alignment, I get a probe like: chr, start position.... How can I tell which gene it is from? Thank you.
fkrueger replied

07-28-2017, 12:58 AM
Erm, what would you like to compare exactly? SeqMonk can certainly do a number of things; I suggest you follow the guidelines and the practical of this methylation analysis course: http://www.bioinformatics.babraham.a...ing.html#bsseq.

Cheers, Felix
twotwo replied

07-27-2017, 12:27 PM
Hi, fkrueger,
If I want to compare paired samples (one vs. one), can I do it with SeqMonk on Unix? For example, by using some Unix command to obtain a table with a p-value per probe?
Juulluu21 replied

06-28-2017, 09:40 PM
We have sequenced a genome using Illumina's TruSeq bisulfite sequencing kit. Now that we have the sequences back, we are analyzing the methylation rate using Bismark. I need help with interpreting the results and with the proper way of normalizing them.

Before sequencing, the sample DNA was divided into 2 groups:
1. Bisulfite treatment was carried out and the DNA was subsequently sequenced (group 1, methylated group).
2. The DNA was sequenced without bisulfite treatment (group 2, control group).

Both groups were sequenced in paired-end fashion.

I am using Bismark to analyze the sequences and trying to get the methylation rate in this particular genome. After running Bismark on the methylated files I got these final percentages:

C methylated in CpG context: 0.6%

C methylated in CHG context: 0.5%

C methylated in CHH context: 0.7%

Whereas after running Bismark on my Control files I got these percentages:

C methylated in CpG context: 99.6%

C methylated in CHG context: 99.3%

C methylated in CHH context: 99.9%

So, how would I interpret my data?

a. Is 0.6% (CpG) the actual methylation percentage in my genome?

b. I have read in the literature that if the CpG, CHG, and CHH percentages are very close, the genome does not actually carry out methylation. Is that true?

c. What was the purpose of using the control group (group 2)? Do I still need a spike-in control to normalize the data? If so, what could that be?

Thank you very much for reading this long post!!

Bests!!!
twotwo replied

05-12-2017, 12:43 PM
Thank you very much!
fkrueger replied

05-12-2017, 12:33 PM
I am not quite sure if I understand your question here to be honest.

Code:

chr11 113509 113509 100 4 0

This example line means that for the position 113509 on chromosome 11 you had 4 methylation calls in total that were methylated (in the entire dataset), and 0 calls that were unmethylated. This translates into a 100% methylation percentage at this position (column 4). Also, the positions here are simply cytosines in the genome, not SNPs.

Just to remind you, this is the format:

Code:

The coverage output looks like this (tab-delimited, 1-based genomic coords):
============================================================================
<chromosome> <start position> <end position> <methylation percentage> <count methylated> <count non-methylated>
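
To make the column layout concrete, here is a small Python sketch that splits one coverage line into named fields and recomputes the percentage from the two counts. `CoverageRecord`, `parse_coverage_line` and `recomputed_percentage` are hypothetical names made up for this example; they are not part of Bismark.

```python
from typing import NamedTuple


class CoverageRecord(NamedTuple):
    chromosome: str
    start: int
    end: int
    percent_methylated: float
    count_methylated: int
    count_unmethylated: int


def parse_coverage_line(line: str) -> CoverageRecord:
    """Parse one tab- or space-delimited line of the coverage output."""
    chrom, start, end, pct, meth, unmeth = line.split()
    return CoverageRecord(chrom, int(start), int(end), float(pct), int(meth), int(unmeth))


def recomputed_percentage(rec: CoverageRecord) -> float:
    """Methylation percentage = methylated calls / all calls * 100."""
    total = rec.count_methylated + rec.count_unmethylated
    return 100.0 * rec.count_methylated / total


rec = parse_coverage_line("chr11 113509 113509 100 4 0")
print(rec.count_methylated + rec.count_unmethylated)  # total calls at this position
print(recomputed_percentage(rec))
```

For the example line, the two counts (4 methylated, 0 unmethylated) reproduce the 100 reported in column 4.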

I hope this helps.
twotwo replied

05-12-2017, 10:03 AM
Originally posted by fkrueger View Post

You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations about what methylation analysis in SeqMonk looks like: https://www.bioinformatics.babraham....ing.html#bsseq

Best, Felix

Hi, Felix,
Thanks for your quick answer. Here is the head of the coverage file. Does that mean that I should merge the data (get all the information for one SNP) and get the methylation percentage?

chr11 110190 110190 100 1 0
chr11 110212 110212 100 2 0
chr11 113465 113465 100 1 0
chr11 113509 113509 100 4 0
chr11 113510 113510 100 1 0
chr11 113525 113525 100 2 0
chr11 113526 113526 100 1 0
chr11 123421 123421 100 1 0
chr11 123450 123450 100 1 0
chr11 123849 123849 100 5 0
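
The point about low coverage can be checked directly on the lines above. The following Python sketch counts the calls at each position and keeps only the positions that reach a threshold. The function names and the `min_calls` value are hypothetical choices for this illustration, not a Bismark feature or a recommended cutoff.

```python
from typing import List, Tuple

# The ten coverage lines quoted above, copied as-is.
COVERAGE_LINES = """\
chr11 110190 110190 100 1 0
chr11 110212 110212 100 2 0
chr11 113465 113465 100 1 0
chr11 113509 113509 100 4 0
chr11 113510 113510 100 1 0
chr11 113525 113525 100 2 0
chr11 113526 113526 100 1 0
chr11 123421 123421 100 1 0
chr11 123450 123450 100 1 0
chr11 123849 123849 100 5 0
""".splitlines()


def total_calls(line: str) -> int:
    """Sum of methylated (column 5) and unmethylated (column 6) calls."""
    fields = line.split()
    return int(fields[4]) + int(fields[5])


def positions_with_min_calls(lines: List[str], min_calls: int) -> List[Tuple[str, int, int]]:
    """Return (chromosome, position, calls) for positions with at least min_calls calls."""
    kept = []
    for line in lines:
        fields = line.split()
        calls = total_calls(line)
        if calls >= min_calls:
            kept.append((fields[0], int(fields[1]), calls))
    return kept


single_call = sum(1 for line in COVERAGE_LINES if total_calls(line) == 1)
print(f"{single_call} of {len(COVERAGE_LINES)} positions have a single call")
# min_calls=4 is an arbitrary threshold for this example, not a recommendation.
print(positions_with_min_calls(COVERAGE_LINES, min_calls=4))
```

Most of these positions rest on one or two calls, which is why every one of them shows 100%.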
fkrueger replied

05-12-2017, 07:32 AM
You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations about what methylation analysis in SeqMonk looks like: https://www.bioinformatics.babraham....ing.html#bsseq

Best, Felix
