Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Seems like there are a lot of pipelines out there that get big differences in methylation results. Is there a paper out that that shows a complete workflow and has control data and output results I can compare?

    I can get bismark to work but I'm doing epigenetic sequence capture and want to know my results are in line with industry standard methods.

    Comment


    • How do you handle control DNA?

      I have Bisulfite Sequencing data with lambda control (NC_001416) spike in.

      I'm looking at the alignment files (SAM) and I see there are reads aligned to the lambda (NC_001416) reference.

      So how do I separate methyl stats of these two samples? bismark_methylation_extractor seems to give me only one set of statistics and doesn't differentiate the control group from the sample group.

      I don't see any guidance in the manual or tutorials.

      This command is not ideal: --split_by_chromosome
      Last edited by Jon17; 09-30-2016, 07:18 AM.

      Comment


      • You could either align the to the lambda sequence by itself which should be really quick (probably several hundred million sequences/hour), or if you have the output already I would probably just filter for reads containing the Lambda sequence (NC_001416) and then run the bismark_methylation_extractor on that new SAM file. A command like this should get you the alignments:

        cat current_alignment_file.sam | grep NC_001416 > lambda_alignments.sam

        bismark_methylation_extracor --bedGraph --gzip lambda_alignments.sam

        If you are starting from BAM files the commands need to include Samtools but the idea is the same. I hope this helps. Cheers, Felix

        Comment


        • How does Bismark handle multiple read groups?

          I know you can set the read group flag in Bismark but that's only for 1 read group. What if you have multiple? Does Bismark align those read-groups separately against the reference? Do any re-calibrations?

          GATK seems to imply recalibration (BQSR) is necessary for SNP calling. What about methylation calls?

          I'm wondering if I should:

          1) Run Bismark on each read group separately
          2) Combine the multiple bam files
          3) Re-calibrate base scores (BQSR)
          4) Run bismark_methylation_extractor on the final re-calibrated bam file

          Comment


          • Hi Jon,

            Bismark allows you to specify a read group for each sample, so that if you were to combine several different samples afterwards you could still tell which read came from which sequencing run. Other than that I am afraid the methylation extractor doesn't do anything sophisticated such as the BQSR step, it will simply extract the methylation information from the BAM files as is.

            In contrast to tools like GATK which specifically ask you not to perform any quality trimming up-front but rather use soft clipping during the alignment step, Bismark is designed to to be used only with properly adapter and base call quality trimmed data (Trim Galore uses a Phred score cut-off of 20 by default). Because of this additional step we do not check the basecall quality again during the extraction process. I hope this helps, Best, Felix

            Comment


            • Thank you for the response. I'm currently using GATK and Picard for alignment metrics (depth of coverage, insert size, on target bases, FOLD_80_BASE_PENALTY, etc). Given I'm using trim_galore and bismark, would you consider that use of GATK & Picard ok? Or should I use a different tool? What do you recommend?

              If Bismark is "designed to to be used only with properly adapter and base call quality trimmed data", what are the pitfalls?

              Comment


              • Hi Jon,

                I would assume that GATK and Picard will probably run but I don't know exactly what they need to look at the metrics you mentioned, you might have to ask their developers to be honest. We personally use SeqMonk to get all of the stats we are interested in. It is a genome browser that does all sorts of other relevant quantitations as well. Cheers, Felix

                Comment


                • Is there a way to use your tools to get CpG coverage? For instance

                  55% of CpG sites are at 30x
                  45% of CpG sites are at 20x
                  25% of CpG sites are at 10x

                  As well as methylation rates given certain depths at CpG sites?

                  55% of CpG sites are methylated at 30x
                  45% of CpG sites are methylated at 20x
                  25% of CpG sites are methylated at 10x

                  Comment


                  • Not directly I am afraid, but one could probably write a quick script (or even use some clever awk/sed constructs) to extract this kind of information from either the coverage files or genome-wide methylation reports.

                    Comment


                    • Hi Felix,
                      When I tried to run the recently added script:
                      Code:
                      filter_non_conversion
                      I got the following error message:
                      Code:
                      syntax error at ~/bismark-0.17.0/filter_non_conversion line 280, near "last nless ("
                      Execution of ~/bismark-0.17.0/filter_non_conversion aborted due to compilation errors.

                      Comment


                      • Sorry for this, I have just fixed this typo. Please clone the latest development version here:

                        A tool to map bisulfite converted sequence reads and determine cytosine methylation states - FelixKrueger/Bismark


                        Cheers, Felix

                        Comment


                        • Directional/Non-directional amplicons

                          Hi (hopefully) Felix,

                          I'm working with amplicons generated from bisulfite converted DNA. The PCR primers are not methylation specific, so they will amplify both converted and non-converted strands. Standard TruSeq adapters were ligated and sequencing as single-end. As I understand it, this might still be 'directional' due to the TruSeq adapters.

                          This is the SE report, which confuses me a little bit, since the bottom strand is seeing more coverage. As well, in amplicon regions many of the converted CpG bases are present in only one strand (either forward or reverse). In some samples there are no reverse reads, in others there are both forward and reverse reads. Thanks for any insights!

                          Final Alignment report
                          ======================
                          Sequences analysed in total: 57545
                          Number of alignments with a unique best hit from the different alignments: 36008
                          Mapping efficiency: 62.6%
                          Sequences with no alignments under any condition: 21427
                          Sequences did not map uniquely: 110
                          Sequences which were discarded because genomic sequence could not be extracted: 0

                          Number of sequences with unique best (first) alignment came from the bowtie output:
                          CT/CT: 1667 ((converted) top strand)
                          CT/GA: 13421 ((converted) bottom strand)
                          GA/CT: 1088 (complementary to (converted) top strand)
                          GA/GA: 19832 (complementary to (converted) bottom strand)

                          Final Cytosine Methylation Report
                          =================================
                          Total number of C's analysed: 448283

                          Total methylated C's in CpG context: 22378
                          Total methylated C's in CHG context: 6624
                          Total methylated C's in CHH context: 5983

                          Total unmethylated C's in CpG context: 9707
                          Total unmethylated C's in CHG context: 221431
                          Total unmethylated C's in CHH context: 182160

                          C methylated in CpG context: 69.7%
                          C methylated in CHG context: 2.9%
                          C methylated in CHH context: 3.2%

                          Comment


                          • Originally posted by seqfast View Post
                            Hi (hopefully) Felix,

                            I'm working with amplicons generated from bisulfite converted DNA. The PCR primers are not methylation specific, so they will amplify both converted and non-converted strands. Standard TruSeq adapters were ligated and sequencing as single-end. As I understand it, this might still be 'directional' due to the TruSeq adapters.

                            This is the SE report, which confuses me a little bit, since the bottom strand is seeing more coverage. As well, in amplicon regions many of the converted CpG bases are present in only one strand (either forward or reverse). In some samples there are no reverse reads, in others there are both forward and reverse reads. Thanks for any insights!

                            Final Alignment report
                            ======================
                            Sequences analysed in total: 57545
                            Number of alignments with a unique best hit from the different alignments: 36008
                            Mapping efficiency: 62.6%
                            Sequences with no alignments under any condition: 21427
                            Sequences did not map uniquely: 110
                            Sequences which were discarded because genomic sequence could not be extracted: 0

                            Number of sequences with unique best (first) alignment came from the bowtie output:
                            CT/CT: 1667 ((converted) top strand)
                            CT/GA: 13421 ((converted) bottom strand)
                            GA/CT: 1088 (complementary to (converted) top strand)
                            GA/GA: 19832 (complementary to (converted) bottom strand)

                            Final Cytosine Methylation Report
                            =================================
                            Total number of C's analysed: 448283

                            Total methylated C's in CpG context: 22378
                            Total methylated C's in CHG context: 6624
                            Total methylated C's in CHH context: 5983

                            Total unmethylated C's in CpG context: 9707
                            Total unmethylated C's in CHG context: 221431
                            Total unmethylated C's in CHH context: 182160

                            C methylated in CpG context: 69.7%
                            C methylated in CHG context: 2.9%
                            C methylated in CHH context: 3.2%
                            Hi seqfast,

                            When you are generating and sequencing PCR amplified regions it is quite common that you see only the top strand (with both the OT and CTOT strands) or the bottom strand (with both OB and CTOB as in your case), depending on which strand you targeted when designing the primers.

                            To help getting your head around this it really helps to draw the sequence of an amplicon of interest out on a sheet of paper. You should see that the bisulfite treatment will change one of the two strands so much that the oligos will only amplify one of the two strands, and this PCR product is then usually sequenced from both sides, e.g. the CTOB and OB strands, but this may be different depending on how the library preparation was done.

                            So in short: PCR amplicons are not normally directional libraries but you only sequence both versions of either the top or the bottom strand, depending on how the primers were designed. I hope this helps, let me know if I can be of any further assistance. Cheers, Felix

                            Comment


                            • Thank you Felix,

                              I've been told to PCR primers are 'agnostic' to bisulfite conversion, but I believe there will still be some effects and a likelihood of some strand-favoring in the amplification.

                              Appreciate your quick reply, and thanks for a very valuable tool!

                              -sf

                              Comment


                              • Hi,

                                I am trying to launch methylation_extractor and it does not like my files =( It says:

                                Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1654, <IN> line 27.
                                Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1657, <IN> line 27.
                                Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1660, <IN> line 27.
                                Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1663, <IN> line 27.
                                Use of uninitialized value $read_conversion in concatenation (.) or string at ./bismark_methylation_extractor line 1667, <IN> line 27.
                                Use of uninitialized value $genome_conversion in concatenation (.) or string at ./bismark_methylation_extractor line 1667, <IN> line 27.
                                Unexpected combination of read and genome conversion: '' / ''
                                The line 27 is the first read of sam file:

                                Code:
                                27 D00796:121:C9MR4ANXX:1:1103:15888:85556 1:N:0:GATCAG    16      chr10   71747   255     51M     *       0       0       CTAAACAACATCACAAAACACTATCTCTATAATTTCTTTTTAAACCAAC     CG     3<BBBGCGGG1FG1@F0EFGGGGEG1>1?1:@FGGGG1FC<=FBFFGGGFG     NM:i:9  MD:Z:2AAA6CA3A2A2A24A3  XM:Z:Z..x........................x..z..h...h.......hhx..
                                The methylation extractor worked well for me before (even for the same files), what can cause the problem? (I've tried to play with cmd modes, did not help...)

                                The command line: ./bismark_methylation_extractor -s test.sam

                                Thanks!
                                Last edited by Kuckunniwi; 04-10-2017, 12:59 PM. Reason: added the command line

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Genetic Variation in Immunogenetics and Antibody Diversity
                                  by seqadmin



                                  The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
                                  11-06-2024, 07:24 PM
                                • seqadmin
                                  Choosing Between NGS and qPCR
                                  by seqadmin



                                  Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                                  10-18-2024, 07:11 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 11-08-2024, 11:09 AM
                                0 responses
                                35 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 11-08-2024, 06:13 AM
                                0 responses
                                28 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 11-01-2024, 06:09 AM
                                0 responses
                                32 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-30-2024, 05:31 AM
                                0 responses
                                23 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X