  • GC-bias

    Hello!

    As far as I know, PacBio sequencing shouldn't have any GC bias at all, or at most a very small one.

    When I was comparing GC bias in a couple of samples sequenced with different technologies (using Picard's CollectGcBiasMetrics), I noticed that the PacBio graph looked particularly strange: http://i57.tinypic.com/2rxbgc0.png
    It is basically an inverted normal distribution rather than the flat line I would expect if there were no bias.

    The GC bias is calculated from a BAM file in which the PBcR (error-corrected) reads are aligned with BLASR back to the scaffolds produced by an assembly.

    The graphs I produced for the other technologies look more or less as expected.

    Anyone know what is going on, or am I doing something terribly wrong?
    Could it be due to the fact that the reads are error-corrected?
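
For readers unfamiliar with this kind of plot, here is a minimal sketch of a normalized-coverage-per-GC-bin calculation. It is a simplified illustration and not Picard's implementation: the function names, the `read_starts` dictionary (reference position mapped to the number of reads starting there) and the single-sequence reference are all assumptions made for the example. With no bias, every GC bin should come out close to 1.0, which is the "flat line" mentioned above.

```python
from collections import defaultdict


def gc_percent(seq):
    """Rounded GC percentage of seq (0-100); None if it has no A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return round(100 * gc / acgt) if acgt else None


def normalized_coverage_by_gc(reference, read_starts, window=100):
    """Return {gc_percent: normalized coverage} over sliding reference windows.

    reference   -- one reference sequence as a string (hypothetical input)
    read_starts -- dict: 0-based reference position -> reads starting there
    window      -- window length used to compute GC content
    """
    windows = defaultdict(int)  # number of windows per GC bin
    starts = defaultdict(int)   # read starts falling in those windows
    for pos in range(len(reference) - window + 1):
        gc = gc_percent(reference[pos:pos + window])
        if gc is None:
            continue
        windows[gc] += 1
        starts[gc] += read_starts.get(pos, 0)

    total_windows = sum(windows.values())
    total_starts = sum(starts.values())
    if total_windows == 0 or total_starts == 0:
        return {}

    # Mean read starts per window over the whole reference.
    mean = total_starts / total_windows
    return {gc: (starts[gc] / windows[gc]) / mean for gc in sorted(windows)}
```

Values well above 1.0 at the GC extremes and below 1.0 in the middle would give the inverted-bell shape described in this post.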

  • #2
    There is no noticeable GC-bias in the raw reads. The error-corrected reads are a different matter; that would depend on many factors like repeat composition of the organism and biases in the algorithm. I would expect neutral-GC areas to error-correct better than high or low, so that graph does seem odd, but it could also be a problem with the way the data was normalized, or mapped. I would rather see a GC-content profile of the raw and error-corrected reads; the graph you displayed, in my opinion, has too much processing to be able to correlate it well with the bias of the reads themselves.

    • #3
      Originally posted by Brian Bushnell
      There is no noticeable GC-bias in the raw reads. The error-corrected reads are a different matter; that would depend on many factors like repeat composition of the organism and biases in the algorithm. I would expect neutral-GC areas to error-correct better than high or low, so that graph does seem odd, but it could also be a problem with the way the data was normalized, or mapped. I would rather see a GC-content profile of the raw and error-corrected reads; the graph you displayed, in my opinion, has too much processing to be able to correlate it well with the bias of the reads themselves.
      Thank you for your answer.
      To see if this was the case, I ran BLASR on all my filtered reads (polymerase read quality > 0.75, read length > 50), which are not error-corrected.
      BLASR was run with the same settings as before, and I again plotted the GC bias with Picard.
      The graph looks almost the same: http://imgur.com/d5pdxsC
      Could it be a problem with Picard and the longer reads, or do I really have a bias like that?

      • #4
        There's still an issue of blasr being used. Can you just plot the raw gc dist of the raw reads and the raw fasta?

        • #5
          Originally posted by Brian Bushnell
          There's still an issue of blasr being used. Can you just plot the raw gc dist of the raw reads and the raw fasta?
          Here is the GC distribution of the reads: http://imgur.com/EqLnvp0
          But that doesn't really reveal GC-related coverage bias, does it?

          • #6
            Well... if the gc-distribution of the reads looks identical to the gc-distribution of the reference, that implies no bias, assuming all of the reads originated from that reference. Mapping will give you a much better indication of gc bias, but then you no longer know if the gc bias came from the raw reads or from the mapping. So with PacBio reads, which are hard to map due to the high error rates, I'd just compare the ref gc dist to the read gc dist without mapping.

            Note that the reference could be wrong, too. High and low gc areas have lower complexity and thus are more likely to be repetitive, and collapsed by an assembler. So if you have a poor assembly or a highly-repetitive organism, it's possible that the seemingly higher coverage of extreme gc areas is actually due to the fact that they are collapsed repeats.
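
The mapping-free comparison described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the reference is cut into non-overlapping chunks of a fixed, read-like length (`chunk_size`, a hypothetical parameter, for example the median read length), both sets of sequences are binned by GC percentage, and the two histograms are compared with a simple distance between 0 (identical) and 1 (no overlap). All function names are made up for the example.

```python
from collections import Counter


def gc_percent_floor(seq):
    """GC percentage of seq rounded down (0-100); None if no A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return (100 * gc) // acgt if acgt else None


def gc_histogram(seqs, bin_width=5):
    """Return {bin_start_percent: fraction of sequences in that GC bin}."""
    counts = Counter()
    for seq in seqs:
        pct = gc_percent_floor(seq)
        if pct is not None:
            counts[(pct // bin_width) * bin_width] += 1
    total = sum(counts.values())
    return {b: n / total for b, n in sorted(counts.items())} if total else {}


def chunks(seq, chunk_size):
    """Yield non-overlapping chunks of the reference; a short tail is dropped."""
    for i in range(0, len(seq) - chunk_size + 1, chunk_size):
        yield seq[i:i + chunk_size]


def histogram_distance(h1, h2):
    """Half the summed absolute difference: 0 = identical, 1 = disjoint."""
    bins = set(h1) | set(h2)
    return 0.5 * sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in bins)
```

A small distance between `gc_histogram(reads)` and `gc_histogram(chunks(reference, chunk_size))` would suggest little GC bias in the reads, subject to the caveats above: all reads must originate from that reference, and the reference itself may contain collapsed repeats.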

            • #7
              Mapping with high error rates is not an issue, but there could be edge effects if you are mapping to short contigs. Is this using a short read de novo assembly?
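
One rough way to judge whether such edge effects are plausible is to ask how much of the assembly lies within one read length of a contig end. The heuristic below is hypothetical and not something the mapping tools report: `contig_lengths` and `read_length` are assumed inputs (for example, contig lengths taken from the assembly FASTA and a typical read length).

```python
def edge_fraction(contig_lengths, read_length):
    """Fraction of assembly bases within read_length of a contig end.

    contig_lengths -- iterable of contig lengths in bases (assumed input)
    read_length    -- typical read length in bases (assumed input)
    """
    total = 0
    near_edge = 0
    for length in contig_lengths:
        total += length
        # Each contig has two ends; a short contig is entirely "edge".
        near_edge += min(length, 2 * read_length)
    return near_edge / total if total else 0.0
```

A value close to 1.0 means most reads would overhang a contig end when mapped, so coverage-based metrics on that assembly should be read with caution.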

              Looking at GC content of the original reads without mapping can be problematic if read trimming isn't working well. I believe it is working somewhat well with the latest software release.

              -mark
