  • akramdi
    replied
    Hi Felix,

    Thanks for the quick reply,

    "So if a read has an alignment which is at least equally good as the best alignment, the read is considered ambiguous, and as such written out to the ambiguous FastQ files"

    This statement clears things up for me. To make things clearer, here are the cases I was wondering about when using Bismark and how I see things now:

    - If AS and XS are equally good ==> Read is considered ambiguous because Bismark does not report random alignments for the reason you stated.
    - If AS is better than XS ==> Read is not considered ambiguous because, even though it produces multiple alignments, the reported one is the best and the rest are not equally good. Right?
    - If XS is better than AS ==> I'm not sure whether that's possible, because I would expect the reported alignment to be the best one (unless the read is part of a concordantly-aligned pair, according to the Bowtie2 manual). How do you deal with this case?

    Good to know about the "--ambig_bam" option. I had been working with an older version of Bismark, so I didn't see it. I've updated now.


  • fkrueger
    replied
    Hi Amira,

    I think that reads that produce more than one valid alignment should be reported based on Bowtie2 default mode (not all alignments, only one). How does having the same number of lowest mismatches affect how these reads are reported?

    >> Just generally, if a read aligns with Bowtie2 it will get an alignment score (the AS:i field):

    Code:
    AS:i:<N>	Alignment score. Can be negative. Only present if SAM record is for an aligned read.
    If a read has a second alignment, the alignment score of the second best alignment is reported as well:

    Code:
    XS:i:<N>	Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.

    So if a read has an alignment which is at least equally good as the best alignment, the read is considered ambiguous, and as such written out to the ambiguous FastQ files. In Bismark we do not want to go down the route of reporting a random alignment if there are several different (equally good) alignments, because if we can’t be sure where a read came from we also don’t want to assign a methylation status to that region. If you are interested in seeing where in the genome a read aligned, you might want to consider using the option --ambig_bam (even though this procedure won't give you any methylation information).
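
    The rule described above can be sketched in a few lines of Python. This is only an illustration of the score comparison, not Bismark's own code: the helper names parse_score and classify are made up for this example, the SAM record at the bottom is invented, and the sketch assumes the AS:i and XS:i optional fields are present as described in the Bowtie2 manual excerpt.

```python
from typing import Optional


def parse_score(sam_line: str, tag: str) -> Optional[int]:
    """Return the integer value of an optional SAM field such as AS:i or XS:i."""
    prefix = tag + ":i:"
    # Optional fields start at column 12 of a tab-delimited SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith(prefix):
            return int(field[len(prefix):])
    return None


def classify(sam_line: str) -> str:
    """Label a record 'unaligned', 'unique_best' or 'ambiguous' (hypothetical labels)."""
    best = parse_score(sam_line, "AS")
    second = parse_score(sam_line, "XS")
    if best is None:
        return "unaligned"  # no AS:i field, so the read did not align
    if second is None:
        return "unique_best"  # only one alignment was found
    # A second alignment that scores at least as well as the reported one
    # means we cannot tell where the read came from.
    return "ambiguous" if second >= best else "unique_best"


# Invented SAM record, for illustration only.
example = "\t".join([
    "read1", "0", "chr1", "100", "42", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII", "AS:i:-6", "XS:i:-6",
])
print(classify(example))  # ambiguous
```

    With XS:i:-12 instead of XS:i:-6 the same record would be labelled "unique_best", because the second-best alignment is then worse than the reported one.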

    Does this help clear things up?


  • akramdi
    replied
    Hi Felix,

    I'm a bit confused about the reads reported as "ambiguous" by Bismark, which made me doubt my understanding of how Bismark reports alignments (using Bowtie2).

    So to my understanding, Bismark uses neither -a mode nor -k N mode; it uses the Bowtie2 default mode: "search for multiple alignments, report the best one", and how hard it looks for alignments is determined by the effort options (-D, -R). So reads that have multiple distinct alignments are reported once: their best alignment is reported, or a randomly chosen alignment when equally good choices are found.

    On the other hand, ambiguous reads are defined as:

    --ambiguous Write all reads which produce more than one valid alignment with the same number of lowest
    mismatches or other reads that fail to align uniquely to a file in the output directory.
    Written reads will appear as they did in the input, without any of the translation of quality
    values that may have taken place within Bowtie or Bismark. Paired-end reads will be written to two
    parallel files with _1 and _2 inserted in their filenames, i.e. _ambiguous_reads_1.txt and
    _ambiguous_reads_2.txt. These reads are not written to the file specified with --un.

    I think that reads that produce more than one valid alignment should be reported based on Bowtie2 default mode (not all alignments, only one). How does having the same number of lowest mismatches affect how these reads are reported?
    The same goes for reads that fail to align uniquely: I think they should be reported once (by the way, to me, failing to align uniquely is equivalent to producing more than one valid alignment).

    Could you please correct my understanding of how Bismark reports alignments, and anything else I wrote that might be off?

    Thanks a lot!!

    Amira


  • twotwo
    replied
    Thanks fkrueger!


  • fkrueger
    replied
    Originally posted by twotwo
    Hi, fkrueger,
    Is there any annotation file for the methylation data? Like after I do the alignment, I can get a probe like: chr, start position.... How can I know which gene it is from? Thank you.
    We tend to use SeqMonk for this purpose (https://www.bioinformatics.babraham....jects/seqmonk/). It is a very fast and powerful genome viewer and quantitation tool. Some examples can be found in the documentation of this training course: https://www.bioinformatics.babraham....ing.html#bsseq.


  • twotwo
    replied
    Hi, fkrueger,
    Is there any annotation file for the methylation data? Like after I do the alignment, I can get a probe like: chr, start position.... How can I know which gene it is from? Thank you.


  • fkrueger
    replied
    Erm, what would you like to compare exactly? SeqMonk can certainly do a number of things; I suggest you follow the guidelines and the practical of this methylation analysis course: http://www.bioinformatics.babraham.a...ing.html#bsseq.

    Cheers, Felix


  • twotwo
    replied
    Hi, fkrueger,
    If I want to compare paired samples (one vs one), can I do it with SeqMonk in Unix? For example, by using some Unix command to obtain a table with a p-value per probe?


  • Juulluu21
    replied
    Data Analysis with Bismark

    We have sequenced a genome using Illumina's TruSeq bisulfite sequencing kit. Now that we have the sequence data back, we are analyzing the methylation rate using Bismark. I need help with interpreting the results and with the proper way to normalize them.

    Before sequencing, the sample DNA was divided into 2 groups:

    1. Bisulfite treatment was carried out and the DNA was subsequently sequenced (group 1, methylated group).
    2. DNA was sequenced without bisulfite treatment (group 2, control group).

    Both groups were sequenced in paired-end fashion.

    I am using Bismark to analyze the sequences and trying to get the methylation rate in this particular genome. After running Bismark on the methylated-group files I got these final percentages:

    C methylated in CpG context: 0.6%

    C methylated in CHG context: 0.5%

    C methylated in CHH context: 0.7%

    Whereas after running Bismark on my Control files I got these percentages:

    C methylated in CpG context: 99.6%

    C methylated in CHG context: 99.3%

    C methylated in CHH context: 99.9%

    So, how would I interpret my data?

    a. Is 0.6% (CpG) the actual methylation percentage in my genome?

    b. I have read in the literature that if the CpG, CHG, and CHH percentages are very close, the genome does not actually methylate its DNA. Is that true?

    c. What was the purpose of using the control group (group 2)? Do I still need a spike-in control to normalize the data? If so, what could that be?

    Thank you very much for reading this long post!!

    Bests!!!


  • twotwo
    replied
    Thank you very much!


  • fkrueger
    replied
    I am not quite sure if I understand your question here to be honest.

    Code:
    chr11 113509 113509 100 4 0
    This example line means that for position 113509 on chromosome 11 you had 4 methylation calls in total that were methylated (in the entire dataset), and 0 calls that were unmethylated. This translates into a 100% methylation percentage at this position (column 4). Also, the positions here are simply cytosines in the genome, not SNPs.
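
    To make the column meanings concrete, here is a small Python sketch that splits such a line into named fields and recomputes the percentage from the two counts. The names CoverageRecord, parse_coverage_line and recompute_percentage are invented for this illustration and are not part of Bismark; the sketch assumes the six-column layout of the coverage output.

```python
from typing import NamedTuple


class CoverageRecord(NamedTuple):
    chromosome: str
    start: int
    end: int
    percent_methylated: float
    count_methylated: int
    count_unmethylated: int


def parse_coverage_line(line: str) -> CoverageRecord:
    """Split one coverage line into typed fields."""
    # split() without arguments accepts tabs as well as the spaces
    # seen when lines are pasted into a forum post.
    chrom, start, end, pct, meth, unmeth = line.split()
    return CoverageRecord(chrom, int(start), int(end),
                          float(pct), int(meth), int(unmeth))


def recompute_percentage(rec: CoverageRecord) -> float:
    """Methylation percentage = methylated calls / all calls * 100."""
    total = rec.count_methylated + rec.count_unmethylated
    if total == 0:
        raise ValueError("position has no methylation calls")
    return 100.0 * rec.count_methylated / total


rec = parse_coverage_line("chr11 113509 113509 100 4 0")
total_calls = rec.count_methylated + rec.count_unmethylated
print(total_calls, recompute_percentage(rec))  # 4 100.0
```

    The recomputed value matches column 4 of the example line: 4 methylated calls out of 4 calls in total gives 100%.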

    Just to remind you, this is the format:

    Code:
    The coverage output looks like this (tab-delimited, 1-based genomic coords):
    ============================================================================================================================================
    
    <chromosome>  <start position>  <end position>  <methylation percentage>  <count methylated>  <count non-methylated>
    I hope this helps.


  • twotwo
    replied
    Originally posted by fkrueger
    You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

    To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations about what methylation analysis in SeqMonk looks like: https://www.bioinformatics.babraham....ing.html#bsseq

    Best, Felix
    Hi, Felix,
    Thanks for your quick answer. Here is the head of the coverage file. Does that mean that I should merge the data (get all the information for one SNP) and get the methylation percentage?


    chr11 110190 110190 100 1 0
    chr11 110212 110212 100 2 0
    chr11 113465 113465 100 1 0
    chr11 113509 113509 100 4 0
    chr11 113510 113510 100 1 0
    chr11 113525 113525 100 2 0
    chr11 113526 113526 100 1 0
    chr11 123421 123421 100 1 0
    chr11 123450 123450 100 1 0
    chr11 123849 123849 100 5 0
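
    Since most of these positions rest on a single call, one way to screen them is to keep only positions with a minimum number of calls. The Python sketch below does that for the ten lines above. The helper names and the threshold of 4 calls are arbitrary choices made for this illustration, not a Bismark or SeqMonk recommendation.

```python
from typing import List

# The ten coverage lines quoted above.
COVERAGE_HEAD = """\
chr11 110190 110190 100 1 0
chr11 110212 110212 100 2 0
chr11 113465 113465 100 1 0
chr11 113509 113509 100 4 0
chr11 113510 113510 100 1 0
chr11 113525 113525 100 2 0
chr11 113526 113526 100 1 0
chr11 123421 123421 100 1 0
chr11 123450 123450 100 1 0
chr11 123849 123849 100 5 0
"""


def total_calls(line: str) -> int:
    """Sum the methylated (column 5) and unmethylated (column 6) counts."""
    fields = line.split()
    return int(fields[4]) + int(fields[5])


def filter_min_calls(lines: List[str], min_calls: int) -> List[str]:
    """Keep only the positions supported by at least min_calls calls."""
    return [line for line in lines if total_calls(line) >= min_calls]


lines = COVERAGE_HEAD.splitlines()
kept = filter_min_calls(lines, min_calls=4)  # threshold is an assumption
for line in kept:
    print(line)
```

    With a threshold of 4, only positions 113509 (4 calls) and 123849 (5 calls) remain; six of the ten positions are backed by a single call.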


  • fkrueger
    replied
    You should probably look at the coverage file because this will also tell you how many counts you saw methylated or unmethylated. If you see 100% then I would suspect you saw only a single call for this position, which in this case happened to be methylated.

    To compare different samples we tend to use SeqMonk, a lightweight but fast and powerful genome browser and analysis tool. Here are some presentations about what methylation analysis in SeqMonk looks like: https://www.bioinformatics.babraham....ing.html#bsseq

    Best, Felix
