  • GC-bias

    Hello!

    As far as I know, PacBio sequencing should show no GC bias at all, or at most a very small one.

    When I compared GC bias across a couple of samples sequenced with different technologies (using Picard's CollectGcBiasMetrics), I noticed that the PacBio graph looked particularly strange: http://i57.tinypic.com/2rxbgc0.png
    It is basically an inverted normal distribution rather than the flat line I would expect if there were no bias.

    The GC bias is calculated from a BAM file in which the PBcR (error-corrected) reads are aligned with BLASR back to the scaffolds produced by an assembly.
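
For context, a GC-bias curve of this kind is, roughly, the mean coverage of reference windows at each GC percentage divided by the mean coverage over all windows, so an unbiased library gives a flat line at 1.0. Below is a minimal Python sketch of that normalization. It assumes the per-window GC percentage and read counts have already been tallied; the function name and input layout are hypothetical, and this is an illustration of the idea rather than Picard's actual code.

```python
from collections import defaultdict

def normalized_coverage_by_gc(windows):
    """windows: iterable of (gc_percent, read_count) pairs, one per reference window.

    Returns {gc_percent: mean reads per window in that GC bin,
             divided by mean reads per window over all bins}.
    A value of 1.0 means the bin is covered exactly as well as average.
    """
    reads_in_bin = defaultdict(int)
    windows_in_bin = defaultdict(int)
    for gc, count in windows:
        reads_in_bin[gc] += count
        windows_in_bin[gc] += 1

    total_windows = sum(windows_in_bin.values())
    total_reads = sum(reads_in_bin.values())
    if total_windows == 0 or total_reads == 0:
        return {}

    overall_mean = total_reads / total_windows
    return {
        gc: (reads_in_bin[gc] / windows_in_bin[gc]) / overall_mean
        for gc in sorted(windows_in_bin)
    }
```

An "inverted normal distribution" in this plot would mean values above 1.0 at the low-GC and high-GC ends and below 1.0 in the middle.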

    The graphs I produced for the other technologies look more or less as expected.

    Does anyone know what is going on, or am I doing something terribly wrong?
    Could it be caused by the fact that the reads are error-corrected?

  • #2
    There is no noticeable GC-bias in the raw reads. The error-corrected reads are a different matter; that would depend on many factors like repeat composition of the organism and biases in the algorithm. I would expect neutral-GC areas to error-correct better than high or low, so that graph does seem odd, but it could also be a problem with the way the data was normalized, or mapped. I would rather see a GC-content profile of the raw and error-corrected reads; the graph you displayed, in my opinion, has too much processing to be able to correlate it well with the bias of the reads themselves.

    • #3
      Originally posted by Brian Bushnell
      There is no noticeable GC-bias in the raw reads. The error-corrected reads are a different matter; that would depend on many factors like repeat composition of the organism and biases in the algorithm. I would expect neutral-GC areas to error-correct better than high or low, so that graph does seem odd, but it could also be a problem with the way the data was normalized, or mapped. I would rather see a GC-content profile of the raw and error-corrected reads; the graph you displayed, in my opinion, has too much processing to be able to correlate it well with the bias of the reads themselves.
      Thank you for your answer.
      To see whether this was the case, I ran BLASR on all my filtered reads (polymerase read quality > 0.75, read length > 50), which are not error-corrected.
      BLASR was run with the same settings as before, and I again plotted the GC bias with Picard.
      The graph looks almost the same: http://imgur.com/d5pdxsC
      Could it be a problem with how Picard handles the longer reads, or do I really have a bias like that?

      • #4
        There's still an issue of blasr being used. Can you just plot the raw gc dist of the raw reads and the raw fasta?

        • #5
          Originally posted by Brian Bushnell
          There's still an issue of blasr being used. Can you just plot the raw gc dist of the raw reads and the raw fasta?
          Here is the GC distribution of the reads: http://imgur.com/EqLnvp0
          But that doesn't really reveal GC-related coverage bias, does it?

          • #6
            Well... if the gc-distribution of the reads looks identical to the gc-distribution of the reference, that implies no bias, assuming all of the reads originated from that reference. Mapping will give you a much better indication of gc bias, but then you no longer know if the gc bias came from the raw reads or from the mapping. So with PacBio reads, which are hard to map due to the high error rates, I'd just compare the ref gc dist to the read gc dist without mapping.
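
A mapping-free comparison like the one described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, input is plain FASTA, reads contribute one GC value each, and the reference is cut into non-overlapping windows (ideally of a length close to the typical read length, which is itself an assumption of this sketch).

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T); None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else None

def read_fasta(handle):
    """Yield one sequence string per FASTA record from an iterable of text lines."""
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)

def gc_histogram(seqs, window=None, bins=20):
    """Normalized histogram of GC fraction.

    window=None: one value per sequence (use for reads).
    window=N:    one value per non-overlapping N-base window (use for the reference).
    """
    counts = [0] * bins
    for seq in seqs:
        if window is None:
            pieces = [seq]
        else:
            pieces = [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]
        for piece in pieces:
            frac = gc_fraction(piece)
            if frac is None:
                continue  # piece contains only N or other ambiguous bases
            counts[min(int(frac * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

Running `gc_histogram(read_fasta(open("reads.fasta")))` and `gc_histogram(read_fasta(open("ref.fasta")), window=5000)` (file names and window size are placeholders) gives two distributions that can be plotted on the same axes; if they overlap closely, that suggests little GC bias in the reads.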

            Note that the reference could be wrong, too. High and low gc areas have lower complexity and thus are more likely to be repetitive, and collapsed by an assembler. So if you have a poor assembly or a highly-repetitive organism, it's possible that the seemingly higher coverage of extreme gc areas is actually due to the fact that they are collapsed repeats.
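
As a rough worked example of the collapsed-repeat effect (a simplification that assumes reads from every true copy map to the collapsed copy, with symbols introduced here purely for illustration): if a repeat has $n$ true copies in the genome, the assembly represents it with $c$ copies, and the true per-copy depth is $d$, then the apparent depth is

```latex
d_{\text{obs}} = d \cdot \frac{n}{c}
\qquad\text{e.g. } d = 30,\; n = 4,\; c = 1 \;\Rightarrow\; d_{\text{obs}} = 120 .
```

So a four-copy repeat collapsed into one copy would look like 4x excess coverage at whatever GC content that repeat happens to have, without any bias in the reads.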

            • #7
              Mapping with high error rates is not an issue, but there could be edge effects if you are mapping to short contigs. Is this using a short read de novo assembly?

              Looking at GC content of the original reads without mapping can be problematic if read trimming isn't working well. I believe it is working somewhat well with the latest software release.

              -mark
