  • Ttsutsui
    Junior Member
    • Nov 2017
    • 2

    Bismark PE mapping low efficiency

    Hi all,

    I am currently learning WGBS analysis using Bismark ver1.9.

    I'm facing a low mapping efficiency problem. When I run Bismark in PE mode, the mapping efficiency is only 1.8%. But when I map either of the two read files on its own in SE mode, I get 88% mapping efficiency.
    My library is not PBAT.

    I can't solve this problem by myself. Could anyone help me?

    Here is my procedure:
    1. Remove reads with poor quality.
    2. Remove adapter sequences.
    3. Convert the hg19 reference genome with bismark_genome_preparation.
    4. Map with bismark in either PE or SE mode.

    PE mode
    Code:
    bismark -q --bowtie2 -N 0 -L 20 -u 10000 -X 2000 --score_min L,0,-0.6 /refgenome --1 R1.fastq --2 R2.fastq --sam  -o ./bismark_result
    ======================
    Sequence pairs analysed in total: 10000
    Number of paired-end alignments with a unique best hit: 175
    Mapping efficiency: 1.8%

    Sequence pairs with no alignments under any condition: 9817
    Sequence pairs did not map uniquely: 8
    Sequence pairs which were discarded because genomic sequence could not be extracted: 0

    Number of sequence pairs with unique best (first) alignment came from the bowtie output:
    CT/GA/CT: 78 ((converted) top strand)
    GA/CT/CT: 0 (complementary to (converted) top strand)
    GA/CT/GA: 0 (complementary to (converted) bottom strand)
    CT/GA/GA: 97 ((converted) bottom strand)

    Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 6079

    Total methylated C's in CpG context: 201
    Total methylated C's in CHG context: 7
    Total methylated C's in CHH context: 23
    Total methylated C's in Unknown context: 0

    Total unmethylated C's in CpG context: 157
    Total unmethylated C's in CHG context: 1271
    Total unmethylated C's in CHH context: 4420
    Total unmethylated C's in Unknown context: 14


    C methylated in CpG context: 56.1%
    C methylated in CHG context: 0.5%
    C methylated in CHH context: 0.5%
    C methylated in unknown context (CN or CHN): 0.0%
    =====================
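As a quick check on the numbers above: the report's percentages can be reproduced from its raw counts, with mapping efficiency = unique best hits / pairs analysed and per-context methylation = methylated / (methylated + unmethylated). This minimal Python sketch is reconstructed from the figures in these reports, not from Bismark's source, and the helper names are hypothetical. Note that the PE command used -u 10000, so only the first 10,000 pairs were analysed, whereas the SE run below used all reads.

```python
"""Reproduce the percentages in a Bismark report from its raw counts.

Formulas inferred from the numbers in this thread, not from Bismark's source;
pct and context_pct are hypothetical helper names.
"""

def pct(part: int, whole: int) -> str:
    """Percentage formatted to one decimal place, as printed in the report."""
    return f"{100.0 * part / whole:.1f}" if whole else "0.0"

def context_pct(methylated: int, unmethylated: int) -> str:
    """Share of methylated calls among all calls in one sequence context."""
    return pct(methylated, methylated + unmethylated)

# Counts copied from the PE report above
total_pairs, unique, no_aln, ambiguous = 10000, 175, 9817, 8
assert unique + no_aln + ambiguous == total_pairs  # every pair falls into exactly one bucket

print(pct(unique, total_pairs))   # 1.8
print(context_pct(201, 157))      # 56.1 (CpG)
print(context_pct(7, 1271))       # 0.5 (CHG)
print(context_pct(23, 4420))      # 0.5 (CHH)

# Same formula on the SE run: 2664742 / 3014078
print(pct(2664742, 3014078))      # 88.4
```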


    SE mode
    Code:
     bismark -q --bowtie2 -N 0 -L 20 --score_min L,0,-0.6 /refgenome --se R1.fastq --sam  -o ./bismark_result
    ======================
    Sequences analysed in total: 3014078
    Number of alignments with a unique best hit from the different alignments: 2664742
    Mapping efficiency: 88.4%

    Sequences with no alignments under any condition: 107753
    Sequences did not map uniquely: 241583
    Sequences which were discarded because genomic sequence could not be extracted: 10

    Number of sequences with unique best (first) alignment came from the bowtie output:
    CT/CT: 1329899 ((converted) top strand)
    CT/GA: 1334833 ((converted) bottom strand)
    GA/CT: 0 (complementary to (converted) top strand)
    GA/GA: 0 (complementary to (converted) bottom strand)

    Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 43696755

    Total methylated C's in CpG context: 1442578
    Total methylated C's in CHG context: 29761
    Total methylated C's in CHH context: 110235
    Total methylated C's in Unknown context: 654

    Total unmethylated C's in CpG context: 395138
    Total unmethylated C's in CHG context: 9033660
    Total unmethylated C's in CHH context: 32685383
    Total unmethylated C's in Unknown context: 13481

    C methylated in CpG context: 78.5%
    C methylated in CHG context: 0.3%
    C methylated in CHH context: 0.3%
    C methylated in Unknown context (CN or CHN): 4.6%
    ===================================

    Thanks a lot,
    Taiki
  • fkrueger
    Senior Member
    • Sep 2009
    • 627

    #2
    Hi Tsutsui,

    In a case like yours, Read 1 seems to be absolutely fine and your library is directional, so that all looks fine. If I had to guess the reason for the low mapping efficiency in PE mode, I would consider the following options:

    1. The FastQ files for R1 and R2 are not in the same order. Going back to the raw FastQ files and trimming with Trim Galore in --paired mode will fix this problem.

    2. Read 2 has particularly poor base qualities or suffered a disastrous fault during the run. The FastQC profile of R2 might tell you if this was the case. Again, Trim Galore should fix quality issues, at least at the 3' end.

    3. R2 is somehow special, e.g. the first 8 bp could be a UMI sequence that prevents the reads from mapping. To see whether there is a general mappability problem with R2 alone, you can run the same SE command as for Read 1, but you also need to include --pbat. If that efficiency is as high as for R1, then the read order is the most likely suspect.
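Option 1 can be checked directly before re-trimming. The sketch below is a hypothetical Python helper, not part of Bismark or Trim Galore; it assumes uncompressed 4-line FASTQ records and headers whose first whitespace-separated token is the read ID, optionally ending in /1 or /2. It reports the first record where the R1 and R2 IDs disagree, or where one file runs out before the other.

```python
"""Minimal sketch: check that paired FASTQ files list their reads in the same order.

Assumes uncompressed 4-line FASTQ records whose header's first token is the read ID,
optionally ending in /1 or /2. first_mismatch is a hypothetical helper.
"""
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple

def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield normalised read IDs from the header line of each 4-line record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            rid = (line.split() or [""])[0].lstrip("@")
            if rid.endswith(("/1", "/2")):
                rid = rid[:-2]  # drop the mate suffix so R1 and R2 IDs compare equal
            yield rid

def first_mismatch(r1_lines: Iterable[str],
                   r2_lines: Iterable[str]) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record index, R1 ID, R2 ID) at the first disagreement, or None if in sync.
    An ID of None means that file ran out of records first."""
    for n, (a, b) in enumerate(zip_longest(read_ids(r1_lines), read_ids(r2_lines))):
        if a != b:
            return n, a, b
    return None

# Usage with the files from the thread:
# with open("R1.fastq") as f1, open("R2.fastq") as f2:
#     print(first_mismatch(f1, f2))

# Toy data (made up): R2 has its two records swapped
r1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "ACGT", "+", "IIII"]
r2 = ["@r2/2", "ACGT", "+", "IIII", "@r1/2", "ACGT", "+", "IIII"]
print(first_mismatch(r1, r2))  # (0, 'r1', 'r2')
```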

    Let me know how you are getting on. I could also take a quick look for you if you send me some 100-200K reads via email.

    Cheers, Felix


    • Ttsutsui
      Junior Member
      • Nov 2017
      • 2

      #3
      Hi Felix,

      Thank you for your kind reply.
      I tried Trim Galore instead of fastq_quality_filter, which I had used previously.

      In the end, Trim Galore worked fine!
      I got 84% mapping efficiency in PE mode with Bismark.
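A likely explanation for why this worked (an inference, not confirmed in the thread): fastq_quality_filter processes one FASTQ file at a time, so filtering R1 and R2 separately can remove different reads from each file and shift every later pair out of register, which is Felix's option 1. Trim Galore in --paired mode keeps or discards the two mates together. The minimal Python sketch below illustrates that pair-aware principle with a hypothetical mean-quality cutoff; it is not Trim Galore's actual trimming algorithm.

```python
"""Minimal sketch of pair-aware read filtering (not Trim Galore's algorithm).

A pair is kept only if BOTH mates pass, so R1 and R2 stay in the same order.
Assumes 4-line FASTQ records with Phred+33 qualities; the mean-quality
cutoff of 20 is a hypothetical choice for illustration.
"""
from typing import Iterator, List, Sequence, Tuple

Record = Tuple[str, ...]  # (header, sequence, "+", quality)

def records(lines: Sequence[str]) -> Iterator[Record]:
    """Group FASTQ lines into 4-line records."""
    for i in range(0, len(lines) - 3, 4):
        yield tuple(line.rstrip("\n") for line in lines[i:i + 4])

def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0

def filter_pairs(r1: Sequence[str], r2: Sequence[str], min_mean_q: float = 20.0):
    """Keep a pair only if both mates reach min_mean_q."""
    kept1: List[Record] = []
    kept2: List[Record] = []
    for a, b in zip(records(r1), records(r2)):
        if mean_quality(a[3]) >= min_mean_q and mean_quality(b[3]) >= min_mean_q:
            kept1.append(a)
            kept2.append(b)
    return kept1, kept2

# Toy data (made up): the second R2 mate has very low quality ('#' = Phred 2)
r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "ACGT", "+", "IIII"]
r2 = ["@a/2", "ACGT", "+", "IIII", "@b/2", "ACGT", "+", "####"]
k1, k2 = filter_pairs(r1, r2)
print([r[0] for r in k1], [r[0] for r in k2])  # ['@a/1'] ['@a/2']
```

Filtering each file on its own would instead keep both R1 records but only one R2 record; in a real file, every later R1 read would then be matched with the wrong R2 mate.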

      Thank you Felix.


      • shawpa
        Member
        • Aug 2011
        • 73

        #4
        When to merge PE and SE alignments in Bismark

        Due to some R2 quality issues (I think), I am getting low paired-end mapping efficiencies. When I align the unmapped reads in single-end mode, I am able to recover quite a few of them. I am unsure where in the pipeline I can "merge" the outputs of the paired-end and single-end alignments. Can both files be given to the methylation extractor to get one output file, or do I need to merge the counts in the reports, such as the coverage output, afterwards?


        • fkrueger
          Senior Member
          • Sep 2009
          • 627

          #5
          When you have both paired-end (PE) and single-end (SE) alignments, I would run the methylation extractor on the files separately (it should auto-detect what to do), and then use the CpG* output files from both PE and SE as input for bismark2bedGraph to generate a coverage file. The command should be something like this:

          Code:
          bismark2bedGraph --buffer 10G -o output_file CpG*
          I hope this is what you were looking for.
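bismark2bedGraph combines all the CpG* files it is given, so no separate merge step should be needed. If you do merge at the coverage level afterwards instead, as asked in the question, the counts (not the percentages) should be summed per position and the percentage recomputed. A minimal Python sketch, assuming the tab-separated Bismark coverage layout (chromosome, start, end, methylation percentage, count methylated, count unmethylated); merge_coverage is a hypothetical helper, not a Bismark tool.

```python
"""Minimal sketch: merge Bismark-style coverage lines from several runs (e.g. PE and SE).

Assumes tab-separated columns: chromosome, start, end, % methylation,
count methylated, count unmethylated. merge_coverage is a hypothetical helper.
"""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def merge_coverage(*sources: Iterable[str]) -> List[str]:
    """Sum methylated/unmethylated counts per position and recompute the percentage."""
    counts: Dict[Tuple[str, int, int], List[int]] = defaultdict(lambda: [0, 0])
    for lines in sources:
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            key = (fields[0], int(fields[1]), int(fields[2]))
            counts[key][0] += int(fields[4])
            counts[key][1] += int(fields[5])

    merged = []
    for (chrom, start, end), (meth, unmeth) in sorted(counts.items()):
        total = meth + unmeth
        if total == 0:
            continue
        pct = 100.0 * meth / total  # recomputed from summed counts, never averaged
        merged.append(f"{chrom}\t{start}\t{end}\t{pct:.2f}\t{meth}\t{unmeth}")
    return merged

# Toy input (made up): the SE run adds coverage at chr1:100
pe = ["chr1\t100\t100\t50\t1\t1\n", "chr1\t200\t200\t100\t2\t0\n"]
se = ["chr1\t100\t100\t100\t3\t0\n"]
for row in merge_coverage(pe, se):
    print(row)
```

Averaging the two percentages at chr1:100 would give 75%, whereas the summed counts give 4/5 = 80%, which weights each run by how many calls it actually contributed.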


          • shawpa
            Member
            • Aug 2011
            • 73

            #6
            Thanks so much. That sounds like it will work.


            • shawpa
              Member
              • Aug 2011
              • 73

              #7
              Just for clarification: do R2 singles need to be aligned in PBAT mode to map properly?

                • fkrueger
                  Senior Member
                  • Sep 2009
                  • 627

                #8
                Originally posted by shawpa
                Just for clarification: do R2 singles need to be aligned in PBAT mode to map properly?
                Yes, that's correct.
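The reasoning behind this, stated from standard bisulfite-library logic rather than from the thread: in a directional library, Read 1 derives from the original top or bottom strand and looks C-to-T converted (C-depleted), while Read 2 is read from the complementary strand and looks G-to-A converted (G-depleted). Bismark's default directional mode expects the former; --pbat restricts alignment to the complementary strands (CTOT/CTOB) and so matches the latter. The hypothetical helper below is a rough base-composition heuristic to see which pattern a set of reads shows; it is not a Bismark feature, and methylated CpGs or unusual genomes can blur the signal.

```python
"""Rough heuristic (hypothetical, not part of Bismark): does a read set look
C-to-T converted (Read 1 style) or G-to-A converted (Read 2 / PBAT style)?"""
from typing import Dict, Iterable

def base_fractions(seqs: Iterable[str]) -> Dict[str, float]:
    """Fraction of A, C, G and T among all called bases (N and others ignored)."""
    counts = {b: 0 for b in "ACGT"}
    for seq in seqs:
        for base in seq.upper():
            if base in counts:
                counts[base] += 1
    total = sum(counts.values())
    return {b: (n / total if total else 0.0) for b, n in counts.items()}

def suggest_mode(seqs: Iterable[str]) -> str:
    """C-depleted reads suit the default directional mode; G-depleted reads suit --pbat."""
    f = base_fractions(seqs)
    return "directional (default)" if f["C"] < f["G"] else "--pbat"

# Toy reads (made up): R1-like reads lack C; the R2-like reads are their
# reverse complements and therefore lack G
r1_like = ["TTGATTGTAGGTTTAG", "AGTTAGGTTGATTTAG"]
r2_like = ["CTAAACCTACAATCAA", "CTAAATCAACCTAACT"]
print(suggest_mode(r1_like))  # directional (default)
print(suggest_mode(r2_like))  # --pbat
```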
