Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Low mapping efficiency of WGBS data for DNA methylation

    Hi,
    I am analysing WGBS data (Illumina Hiseq) of Bovine FAT tissues for differential methylation. I used TrimGalore for adaptor removal and qulity check for the the pilot sample . All the adoptors were removed and quality was good (both survived paired end reads 98%). Then I used Bismark for unique alignment to the Bisulfite converted reference genome and I got 57.8% mapping efficiency.
    I want to ask wether 57.8% mapping efficiency is good to proceed further ? What is gold standard for mapping efficiency in WGBS? Please guide me.

    Regards,
    Naveed Jhamat.

  • #2
    Hi Naveed,

    ~58% doesn't sound too bad to me, I just checked a few sample we had run over here (100bp single-end) and we also got mapping efficiencies around 53%. I did notice however that the number of ambiguous alignments was fairly large (around 37%) compared to human or mouse samples, so that is certainly one aspect that can bring mapping efficiencies down. Other factors may include read-length, software version and parameters used, single-end or paired-end libraries, or contamination with other organisms or spike-ins (you could test this out using FastQ screen). If you've got any specific questions just drop me a line. Cheers, Felix

    Comment


    • #3
      Thanks Felix

      Please find here the more detail about my output and guide.
      I had about 40 million PE reads in the sample (WGBS) with 126 bp length.
      I executed trimgalore with following script:-
      trim_galore -o testing --paired sample_L001_R1_001.fastq.gz \
      sample_L001_R2_001.fastq.gz
      Every thing was clear in FastQC screen except Kmer content and per base sequence content.
      I downloaded the latest version of Bovine genome (Bos_taurus8.UMD.3.1.fa) from Ensemble for bismark_genome_preparation.

      Then I executed following bismark script and got 57.6% mapping efficiecy:-
      bismark --path_to_bowtie /path/ --samtools_path /path/ -p 4 -N 1 --bowtie2 \
      --unmapped --basename output /path_to_ref_genome/ \
      -1 sample_L001_R1_001_val_1.fq.gz \
      -2 sample_L001_R2_001_val_2.fq.gz

      Here is the text report generated by bismark that you can see and guide me to proceed further. Thanking you in anticipation. / Naveed Jhamat.

      Final Alignment report
      ======================
      Sequence pairs analysed in total: 40502929
      Number of paired-end alignments with a unique best hit: 23323660
      Mapping efficiency: 57.6%
      Sequence pairs with no alignments under any condition: 9511716
      Sequence pairs did not map uniquely: 7667553
      Sequence pairs which were discarded because genomic sequence could not be extracted: 0

      Number of sequence pairs with unique best (first) alignment came from the bowtie output:
      CT/GA/CT: 12000211 ((converted) top strand)
      GA/CT/CT: 0 (complementary to (converted) top strand)
      GA/CT/GA: 0 (complementary to (converted) bottom strand)
      CT/GA/GA: 11323449 ((converted) bottom strand)

      Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

      Final Cytosine Methylation Report
      =================================
      Total number of C's analysed: 1091274815

      Total methylated C's in CpG context: 64749692
      Total methylated C's in CHG context: 5352112
      Total methylated C's in CHH context: 8195938
      Total methylated C's in Unknown context: 2884


      Total unmethylated C's in CpG context: 23570501
      Total unmethylated C's in CHG context: 277948464
      Total unmethylated C's in CHH context: 711458108
      Total unmethylated C's in Unknown context: 7633


      C methylated in CpG context: 73.3%
      C methylated in CHG context: 1.9%
      C methylated in CHH context: 1.1%
      C methylated in unknown context (CN or CHN): 27.4%

      Comment


      • #4
        Hi Naveed,

        That looks all very sensible to me. If I were you I would a) make sure you were using the latest version (0.16.3), b) maybe lose the parameter -N 1 so that the default of 0 is used (I am still investigating reports that -N 1 sometimes does weird things...). And finally c) seeing that you have specified --unmapped you could try to align just the unmapped Read 1 in single-end mode (with default parameters) to see if a substantial fraction of unmapped pairs would align as single-end file. And yea maybe run FastQ_Screen on the data to see if there are any contaminations. If you wanted you could also email me say 100K reads and I run FastQ_Screen for you over here where we have set it up already. Cheers, Felix

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Strategies for Sequencing Challenging Samples
          by seqadmin


          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
          03-22-2024, 06:39 AM
        • seqadmin
          Techniques and Challenges in Conservation Genomics
          by seqadmin



          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

          Avian Conservation
          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
          03-08-2024, 10:41 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Yesterday, 06:37 PM
        0 responses
        10 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, Yesterday, 06:07 PM
        0 responses
        9 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-22-2024, 10:03 AM
        0 responses
        51 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-21-2024, 07:32 AM
        0 responses
        67 views
        0 likes
        Last Post seqadmin  
        Working...
        X