No announcement yet.

Low mapping efficiency of WGBS data for DNA methylation

  • Filter
  • Time
  • Show
Clear All
new posts

  • Low mapping efficiency of WGBS data for DNA methylation

    I am analysing WGBS data (Illumina Hiseq) of Bovine FAT tissues for differential methylation. I used TrimGalore for adaptor removal and qulity check for the the pilot sample . All the adoptors were removed and quality was good (both survived paired end reads 98%). Then I used Bismark for unique alignment to the Bisulfite converted reference genome and I got 57.8% mapping efficiency.
    I want to ask wether 57.8% mapping efficiency is good to proceed further ? What is gold standard for mapping efficiency in WGBS? Please guide me.

    Naveed Jhamat.

  • #2
    Hi Naveed,

    ~58% doesn't sound too bad to me, I just checked a few sample we had run over here (100bp single-end) and we also got mapping efficiencies around 53%. I did notice however that the number of ambiguous alignments was fairly large (around 37%) compared to human or mouse samples, so that is certainly one aspect that can bring mapping efficiencies down. Other factors may include read-length, software version and parameters used, single-end or paired-end libraries, or contamination with other organisms or spike-ins (you could test this out using FastQ screen). If you've got any specific questions just drop me a line. Cheers, Felix


    • #3
      Thanks Felix

      Please find here the more detail about my output and guide.
      I had about 40 million PE reads in the sample (WGBS) with 126 bp length.
      I executed trimgalore with following script:-
      trim_galore -o testing --paired sample_L001_R1_001.fastq.gz \
      Every thing was clear in FastQC screen except Kmer content and per base sequence content.
      I downloaded the latest version of Bovine genome (Bos_taurus8.UMD.3.1.fa) from Ensemble for bismark_genome_preparation.

      Then I executed following bismark script and got 57.6% mapping efficiecy:-
      bismark --path_to_bowtie /path/ --samtools_path /path/ -p 4 -N 1 --bowtie2 \
      --unmapped --basename output /path_to_ref_genome/ \
      -1 sample_L001_R1_001_val_1.fq.gz \
      -2 sample_L001_R2_001_val_2.fq.gz

      Here is the text report generated by bismark that you can see and guide me to proceed further. Thanking you in anticipation. / Naveed Jhamat.

      Final Alignment report
      Sequence pairs analysed in total: 40502929
      Number of paired-end alignments with a unique best hit: 23323660
      Mapping efficiency: 57.6%
      Sequence pairs with no alignments under any condition: 9511716
      Sequence pairs did not map uniquely: 7667553
      Sequence pairs which were discarded because genomic sequence could not be extracted: 0

      Number of sequence pairs with unique best (first) alignment came from the bowtie output:
      CT/GA/CT: 12000211 ((converted) top strand)
      GA/CT/CT: 0 (complementary to (converted) top strand)
      GA/CT/GA: 0 (complementary to (converted) bottom strand)
      CT/GA/GA: 11323449 ((converted) bottom strand)

      Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

      Final Cytosine Methylation Report
      Total number of C's analysed: 1091274815

      Total methylated C's in CpG context: 64749692
      Total methylated C's in CHG context: 5352112
      Total methylated C's in CHH context: 8195938
      Total methylated C's in Unknown context: 2884

      Total unmethylated C's in CpG context: 23570501
      Total unmethylated C's in CHG context: 277948464
      Total unmethylated C's in CHH context: 711458108
      Total unmethylated C's in Unknown context: 7633

      C methylated in CpG context: 73.3%
      C methylated in CHG context: 1.9%
      C methylated in CHH context: 1.1%
      C methylated in unknown context (CN or CHN): 27.4%


      • #4
        Hi Naveed,

        That looks all very sensible to me. If I were you I would a) make sure you were using the latest version (0.16.3), b) maybe lose the parameter -N 1 so that the default of 0 is used (I am still investigating reports that -N 1 sometimes does weird things...). And finally c) seeing that you have specified --unmapped you could try to align just the unmapped Read 1 in single-end mode (with default parameters) to see if a substantial fraction of unmapped pairs would align as single-end file. And yea maybe run FastQ_Screen on the data to see if there are any contaminations. If you wanted you could also email me say 100K reads and I run FastQ_Screen for you over here where we have set it up already. Cheers, Felix