  • Myggan
    Junior Member
    • Aug 2013
    • 3

    GC-bias

    Hello!

    As far as I know, PacBio sequencing shouldn't have any GC bias at all, or at most a very small one.

    When I compared GC bias across a couple of samples sequenced with different technologies (using Picard's CollectGcBiasMetrics), I noticed that the PacBio graph looked particularly strange: http://i57.tinypic.com/2rxbgc0.png
    It is basically an inverted normal distribution rather than the flat line I would expect with no bias.

    The GC bias is calculated from a BAM file in which the PBcR (error-corrected PacBio reads) are aligned with BLASR back to the scaffolds produced in an assembly.

    The graphs I produced for the other technologies look more or less as expected.

    Does anyone know what is going on, or am I doing something terribly wrong?
    Could it be due to the fact that the reads are error-corrected?
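
For readers unfamiliar with this kind of plot, here is a simplified Python sketch of a normalized-coverage-by-GC calculation. It is not Picard's implementation; the function name, the `read_starts` input (0-based alignment start positions on a single reference sequence), and the 100-base window are all assumptions made for illustration. A value near 1.0 in every GC bin corresponds to the "flat line" described above.

```python
from collections import Counter
from typing import Dict, Iterable


def normalized_coverage_by_gc(ref: str, read_starts: Iterable[int],
                              window: int = 100) -> Dict[int, float]:
    """Map GC percent of a window -> read-start rate relative to the mean rate.

    Hypothetical helper: every position that can start a full window is one
    window; a read is assigned to the window beginning at its start position.
    """
    ref = ref.upper()
    n_windows = len(ref) - window + 1
    if n_windows <= 0:
        return {}
    # Count read starts per position, ignoring starts without a full window.
    starts = Counter(s for s in read_starts if 0 <= s < n_windows)
    windows_per_gc = Counter()
    reads_per_gc = Counter()
    for pos in range(n_windows):
        chunk = ref[pos:pos + window]
        gc_pct = round(100 * (chunk.count("G") + chunk.count("C")) / window)
        windows_per_gc[gc_pct] += 1
        reads_per_gc[gc_pct] += starts[pos]
    total_reads = sum(reads_per_gc.values())
    if total_reads == 0:
        return {}
    mean_rate = total_reads / n_windows  # read starts per window, genome-wide
    return {gc: (reads_per_gc[gc] / windows_per_gc[gc]) / mean_rate
            for gc in sorted(windows_per_gc)}
```

Bins with very few windows give noisy values, so the extreme ends of such a curve are usually the least trustworthy part.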
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    There is no noticeable GC-bias in the raw reads. The error-corrected reads are a different matter; that would depend on many factors like repeat composition of the organism and biases in the algorithm. I would expect neutral-GC areas to error-correct better than high or low, so that graph does seem odd, but it could also be a problem with the way the data was normalized, or mapped. I would rather see a GC-content profile of the raw and error-corrected reads; the graph you displayed, in my opinion, has too much processing to be able to correlate it well with the bias of the reads themselves.


    • Myggan
      Junior Member
      • Aug 2013
      • 3

      #3
      Thank you for your answer.
      To check this, I ran BLASR on all of my filtered reads (polymerase read quality > 0.75, read length > 50), which are not error-corrected.
      BLASR was run with the same settings as before, and I again plotted the GC bias with Picard.
      The graph looks almost the same: http://imgur.com/d5pdxsC
      Could it be a problem with Picard and the longer reads, or do I really have a bias like that?


      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        There's still the issue that BLASR is being used. Can you just plot the GC distribution of the raw reads and of the raw FASTA?
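
A mapping-free GC distribution like the one requested here could be computed along these lines. This is a minimal Python sketch using only the standard library; the function names and the choice of 20 bins are assumptions for illustration, not something from the thread.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def gc_histogram(seqs: Iterable[str], bins: int = 20) -> Dict[int, float]:
    """Share of sequences falling in each GC bin (bin 0 = lowest GC)."""
    counts = Counter(min(int(gc_fraction(s) * bins), bins - 1) for s in seqs)
    total = sum(counts.values())
    return {b: counts.get(b, 0) / total for b in range(bins)} if total else {}


# Usage sketch; "reads.fasta" is a placeholder file name:
# with open("reads.fasta") as handle:
#     hist = gc_histogram(seq for _, seq in read_fasta(handle))
```

Each sequence counts once regardless of its length, so a histogram of long reads and one of a few large scaffolds are not directly comparable without some form of windowing.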


        • Myggan
          Junior Member
          • Aug 2013
          • 3

          #5
          Here is the GC distribution of the reads: http://imgur.com/EqLnvp0
          But that doesn't really reveal GC-related coverage bias, does it?


          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            Well... if the GC distribution of the reads looks identical to the GC distribution of the reference, that implies no bias, assuming all of the reads originated from that reference. Mapping will give you a much better indication of GC bias, but then you no longer know whether the bias came from the raw reads or from the mapping. So with PacBio reads, which are hard to map due to the high error rates, I'd just compare the reference GC distribution to the read GC distribution without mapping.

            Note that the reference could be wrong, too. High- and low-GC areas have lower complexity and are thus more likely to be repetitive and to be collapsed by an assembler. So if you have a poor assembly or a highly repetitive organism, it's possible that the seemingly higher coverage of extreme-GC areas is actually due to their being collapsed repeats.
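
One way to make the read-versus-reference comparison concrete is to cut both into equal-sized windows, bin the windows by GC, and measure how far apart the two profiles are. The Python sketch below does this under several assumptions that are not from the thread: the 500-base window, the 20 bins, and the use of total variation distance as the summary number are illustrative choices only.

```python
from collections import Counter
from typing import Iterable, List


def window_gc(seq: str, window: int = 500) -> List[float]:
    """GC fraction of each full, non-overlapping window of `seq`."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        if acgt:  # skip windows made only of N or other ambiguity codes
            out.append(gc / acgt)
    return out


def gc_profile(seqs: Iterable[str], window: int = 500, bins: int = 20) -> List[float]:
    """Share of windows per GC bin, pooled over all sequences."""
    counts = Counter()
    for seq in seqs:
        for frac in window_gc(seq, window):
            counts[min(int(frac * bins), bins - 1)] += 1
    total = sum(counts.values())
    return [counts[b] / total if total else 0.0 for b in range(bins)]


def total_variation(p: List[float], q: List[float]) -> float:
    """Half the L1 distance between two profiles: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

A distance near zero is consistent with no bias, but as noted above it cannot distinguish a real sequencing bias from a reference in which extreme-GC repeats have been collapsed.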


            • mchaisso
              Member
              • Apr 2008
              • 84

              #7
              Mapping with high error rates is not an issue, but there could be edge effects if you are mapping to short contigs. Was the reference produced by a short-read de novo assembly?

              Looking at GC content of the original reads without mapping can be problematic if read trimming isn't working well. I believe it is working somewhat well with the latest software release.

              -mark
