
explain unimodal GC-content bias


  • explain unimodal GC-content bias


    I am a statistician rather than a geneticist or biologist, so I would be grateful if someone could explain the cause of GC-content bias in sequencing coverage. Many studies have observed a unimodal relationship in which coverage decreases at both high AT and high GC.
    From what I understand, A-T base pairs are weaker than G-C base pairs (two hydrogen bonds versus three), so in the PCR step, fragments with extremely high GC may not denature completely into single-stranded DNA. That would explain the trend of decreasing coverage as GC increases.
    But what about the decreasing coverage in regions of extremely high AT (low GC)?
    Can anyone explain?
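
    For readers who want to see the unimodal relationship in their own data, here is a minimal sketch of how the curve is typically tabulated: compute the GC fraction of each genomic window, bin the windows by GC, and average the read counts in each bin. The function names, the bin count, and the toy windows are all hypothetical and made up for illustration; they do not come from any particular study or tool.

```python
from collections import defaultdict


def gc_counts(seq):
    """Return (gc, total) base counts for seq, ignoring N and other symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc, gc + at


def mean_coverage_by_gc(windows, n_bins=10):
    """Average read count per GC bin.

    windows: iterable of (sequence, read_count) pairs, one per genomic window.
    Returns {bin_index: mean_read_count}; bin i covers GC fractions in
    [i / n_bins, (i + 1) / n_bins), with GC = 1.0 placed in the last bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, depth in windows:
        gc, total = gc_counts(seq)
        if total == 0:
            continue  # window with no A/C/G/T bases, e.g. all N
        # Integer arithmetic avoids floating-point surprises at bin edges.
        b = min(gc * n_bins // total, n_bins - 1)
        sums[b] += depth
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(counts)}


# Toy data (invented): low coverage at both GC extremes, high in the middle.
toy_windows = [
    ("ATATATATAT", 4),
    ("ATATGATATA", 8),
    ("ACGTACGTAC", 20),
    ("ACGTTGCAAC", 22),
    ("GCGCGCATGC", 9),
    ("GCGCGCGCGC", 3),
]
curve = mean_coverage_by_gc(toy_windows, n_bins=5)
print(curve)
```

    On real data the windows would come from a reference genome and the read counts from an alignment; plotting the resulting means against GC bin is what shows the single peak.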

  • #2
    Who knows?

    I think you are right to focus on PCR, because libraries constructed with no "enrichment" PCR give much less coverage bias.

    But you could run down a laundry list of potential issues with high-GC/high-AT and PCR. They could involve a higher extent of ssDNA secondary structure as a result of the effective drop in sequence complexity with high-GC or high-AT, some issue with the polymerase not "liking" high-GC/AT sequence, unequal depletion of dNTP reactant pools or a host of other possible causes.
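
    The "effective drop in sequence complexity" can be made concrete with a rough proxy: the Shannon entropy of the base composition. The sketch below assumes G and C are equally frequent, as are A and T, which is a simplification of my own and not something measured. Under that assumption the entropy is highest at 50% GC and falls off symmetrically toward either extreme. It says nothing about actual secondary structure or PCR efficiency; it only illustrates why both extremes look "simpler" than balanced sequence.

```python
import math


def composition_entropy(gc):
    """Shannon entropy (bits per base) of base composition at GC fraction gc.

    Assumes p(G) = p(C) = gc / 2 and p(A) = p(T) = (1 - gc) / 2.
    """
    probs = [gc / 2, gc / 2, (1 - gc) / 2, (1 - gc) / 2]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)


for gc in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"GC={gc:.1f}  entropy={composition_entropy(gc):.2f} bits/base")
```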

    Maybe someone will post a link to a paper that addresses this issue. No doubt there are some out there. Actually, since you raise the question, maybe you could do the search? If you do, please post your results.



    • #3
      Excerpt from "Summarizing and correcting the GC content bias in high-throughput sequencing" by Benjamini and Speed (2012), which gives some suggestions and citations:

      While GC effect is commonly corrected for, until recently studies regarding the nature of this bias have been rare. Dohm et al. (2008, 1) first described the effect of the GC on fragment coverage in Illumina GA. ... Identifying the source of the bias was also hard, because the composition of the DNA molecule can affect many stages of the protocol. Sequence-related biases in the priming (9), size selection (3), PCR (10) and probability of sequencing errors (11–13) have all been found. In a recent analysis (12), PCR was shown to play the dominant role in the stages before the sequencing. While sequencing protocols have partially evolved to accommodate this new understanding (10,12), estimation and correction methods have not.
      The full paper, with references for more details, is here: