  • Strong CpG methylation bias between R1 and R2

    Hello,
    I'm new to bioinformatics, and as a first training exercise I've been given a set of WGBS 100 bp PE reads from a few human cancer tissues.
    I filtered the reads with prinseq, sorted them, and aligned them with Bismark in PE mode to hg38 from UCSC (genome prepared with Bismark).
    Mapping efficiency is ~20%, with ~80% of C's methylated in CpG context.
    OK, low mappability of reads from BS-treated DNA has been mentioned many times.
    Then I tried mapping reads 1 and 2 separately in SE mode.
    Read 1: mapping efficiency ~60%, with ~80% of C's methylated in CpG context.
    Read 2: mapping efficiency ~50%, with ~40% of C's methylated in CpG context.
    Additional trimming of 10-20 nt from either end of read 2 slightly increases mappability but doesn't affect the methylation rate.
    This result seems extremely odd to me.
    If the DNA was treated with BS, how can only read 2 of the pair show 2x less methylation in CpG context?
    Could anybody take a fresh look?
    Thank you in advance.
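
For reference, the CpG methylation percentages compared above can be recomputed per read from Bismark's methylation call strings, in which `Z` marks a methylated C in CpG context and `z` an unmethylated one. The following is only a minimal sketch: the function name and the toy call strings are made up for illustration, and real strings would have to be pulled from the Bismark alignment output.

```python
def cpg_methylation_percent(call_strings):
    """Percent of methylated CpG calls across Bismark-style call strings.

    'Z' = methylated C in CpG context, 'z' = unmethylated C in CpG context.
    All other characters (other contexts, non-C positions) are ignored.
    """
    methylated = sum(s.count("Z") for s in call_strings)
    unmethylated = sum(s.count("z") for s in call_strings)
    total = methylated + unmethylated
    if total == 0:
        return None  # no CpG calls at all, so the percentage is undefined
    return 100.0 * methylated / total


# Hypothetical toy call strings, NOT taken from the poster's data.
read1_calls = ["..Z...Z..h.", "Z..x..Z..z."]
read2_calls = ["..z...Z..h.", "z..x..Z..z."]

print("Read 1 CpG methylation:", cpg_methylation_percent(read1_calls))
print("Read 2 CpG methylation:", cpg_methylation_percent(read2_calls))
```

Running the same calculation separately on R1 and R2 alignments of the same fragments is one way to check whether the difference is already present in the raw calls.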

  • #2
    Would you have the following information?
    1- Kit or method used for library prep
    2- Read length
    3- Library peak size
    4- FastQC output for reads


    • #3
      This is what I could extract from core lab personnel:

      1- Kit or method used for library prep

      Genomic DNA was extracted from tissue, BS treated, sonicated, end repaired, and dA-tailed. Then standard Illumina adaptors were used for PE sequencing.

      2- Read length

      100 bases (adaptors already trimmed)

      3- Library peak size

      ~200 nt

      4- FastQC output for reads

      Sorry, I can't attach a picture right now, but the FastQC report is good for all reads: median quality is 30 at the 5' end and ~15 at the 3' end. I also performed quality trimming with the threshold set above 15.
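
To make the trimming step concrete, here is a rough sketch of 3' quality trimming at a Phred threshold of 15. It is not prinseq's actual algorithm (which may use windows or other rules); the function names and the toy read are hypothetical, and Phred+33 quality encoding is assumed.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def trim_3prime(seq, quals, threshold=15):
    """Drop trailing bases whose quality is at or below `threshold`.

    Trimming stops at the first base (scanning from the 3' end) whose
    quality is above the threshold; internal low-quality bases are kept.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] <= threshold:
        end -= 1
    return seq[:end], quals[:end]


# Hypothetical toy read with quality decaying toward the 3' end.
seq = "ACGTACGT"
quals = [30, 30, 30, 28, 20, 16, 12, 8]
trimmed_seq, trimmed_quals = trim_3prime(seq, quals, threshold=15)
print(trimmed_seq, trimmed_quals)
```

With a median 3' quality of ~15, a cutoff like this would shorten a large share of the reads, which is worth keeping in mind when comparing mapping efficiencies.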


      • #4
        Generally there are three WGBS library prep methods:
        1- Post-ligation bisulfite conversion: DNA fragmentation and standard library preparation with methylated adapters, followed by bisulfite conversion and amplification.
        2- Post-bisulfite conversion library preparation by second-strand synthesis from converted ssDNA, followed by standard end repair, A-tailing, adapter ligation, and PCR amplification of the double-stranded DNA.
        3- Post-bisulfite conversion library preparation by synthesis of the second strand with random primers carrying one partial Illumina adapter sequence, then tagging the 3’ end of the new strand with a Terminal Tagging Oligo carrying the other partial Illumina adapter, followed by PCR amplification.

        I assume your library was prepared with method 1. A library with a peak size of ~200 nt would have an average insert size of about 75 nt, so I would expect that a large number of reads have been trimmed at the 5’ end.
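
The arithmetic behind that estimate can be sketched as follows. The combined adapter length of 125 nt is an assumption inferred from the numbers above (200 nt peak, 75 nt insert), not a value given in the thread, and the function name is hypothetical.

```python
def insert_and_readthrough(peak_size, adapter_total, read_length):
    """Estimate insert size and per-read overlap with adapter sequence.

    peak_size     : library peak size in nt (insert plus both adapters)
    adapter_total : combined adapter length in nt (ASSUMED, see note above)
    read_length   : sequencing read length in nt
    """
    insert = peak_size - adapter_total
    # Bases of each read that extend past the insert into adapter sequence.
    readthrough = max(0, read_length - insert)
    return insert, readthrough


insert, readthrough = insert_and_readthrough(
    peak_size=200, adapter_total=125, read_length=100
)
print(f"insert ~{insert} nt, ~{readthrough} nt of each read is adapter")
```

With these numbers, roughly a quarter of every 100-base read would be adapter rather than insert, which is why heavy trimming would be expected.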

        It would be interesting to see the FastQC “per base sequence content” plot for the reads; R1 and R2 should show a similar proportion of converted Cs. For an example, see the following plots for a low-diversity RRBS library, which show low %C in R1 and correspondingly low %G in R2. If your plots show similarly low %C in R1 and %G in R2, then the issue could be in the analysis step.

        RRBS.pdf
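
If the FastQC plots are not at hand, the same comparison can be approximated directly from the read sequences. This is only a sketch of the idea, not FastQC's implementation; the function names and the toy reads are hypothetical (the toy R2 reads are simply reverse complements of the toy R1 reads).

```python
from collections import Counter


def per_base_content(reads):
    """Return one {base: percent} dict per read position.

    Percentages are relative to all reads covering that position,
    so any N calls lower the A/C/G/T totals slightly.
    """
    if not reads:
        return []
    length = max(len(r) for r in reads)
    content = []
    for pos in range(length):
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(counts.values())
        content.append({b: 100.0 * counts.get(b, 0) / total for b in "ACGT"})
    return content


def mean_percent(content, base):
    """Average percentage of `base` over all positions."""
    return sum(p[base] for p in content) / len(content)


# Hypothetical toy reads: bisulfite-converted R1 is depleted of C,
# and the paired R2 is correspondingly depleted of G.
r1_reads = ["TTGATTGT", "ATTGTAGT", "TCGTTAGA"]
r2_reads = ["ACAATCAA", "ACTACAAT", "TCTAACGA"]

r1_content = per_base_content(r1_reads)
r2_content = per_base_content(r2_reads)
print("mean %C in R1:", round(mean_percent(r1_content, "C"), 2))
print("mean %G in R2:", round(mean_percent(r2_content, "G"), 2))
```

If %C in R1 and %G in R2 come out similar on the real data, the reads themselves look consistently converted, and the R1/R2 methylation gap is more likely to arise during alignment or methylation calling.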


        • #5
          Something in this description seems wrong. After bisulfite conversion the DNA should be (mostly) single-stranded, since bisulfite conversion requires single-stranded DNA. Thus the standard end repair, A-tailing, and ligation of Illumina Y-adapters will not work.

          Originally posted by zubr:
          This is what I could extract from core lab personnel:

          1- Kit or method used for library prep

          Genomic DNA was extracted from tissue, BS treated, sonicated, end repaired, and dA-tailed. Then standard Illumina adaptors were used for PE sequencing.
