  • GC-bias

    Hello!

    As far as I know, PacBio sequencing shouldn't have any GC-bias at all, or at most a very small one.

    When I was comparing GC-bias in a couple of samples sequenced with different technologies (using Picard's CollectGcBiasMetrics), I noticed that the PacBio graph looked particularly strange: http://i57.tinypic.com/2rxbgc0.png
    Basically, it is an inverted normal distribution rather than the flat line expected if there were no bias.

    The GC-bias is calculated from a BAM file in which the PBcR error-corrected reads are aligned with BLASR back to the scaffolds produced by an assembly.
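For intuition about what such a GC-bias plot shows, here is a simplified sketch, assuming the curve is built per GC bin as (share of reads starting in reference windows with that GC) divided by (share of reference windows with that GC), so a flat line at 1.0 means no bias. This is an illustrative approximation, not Picard's actual code; `normalized_coverage` and its inputs are hypothetical.

```python
from collections import Counter


def normalized_coverage(window_gc, reads_per_window):
    """Simplified GC-bias curve (hypothetical helper, not Picard's implementation).

    window_gc: GC percent (an int, 0-100) of each reference window.
    reads_per_window: number of reads whose alignment starts in each window.
    Returns {gc_percent: normalized coverage}; 1.0 in every bin means no bias.
    """
    if len(window_gc) != len(reads_per_window):
        raise ValueError("inputs must have the same length")
    total_windows = len(window_gc)
    total_reads = sum(reads_per_window)
    if total_windows == 0 or total_reads == 0:
        raise ValueError("need at least one window and one read")
    windows_by_gc = Counter(window_gc)
    reads_by_gc = Counter()
    for gc, n in zip(window_gc, reads_per_window):
        reads_by_gc[gc] += n
    curve = {}
    for gc, n_windows in sorted(windows_by_gc.items()):
        read_share = reads_by_gc[gc] / total_reads
        window_share = n_windows / total_windows
        curve[gc] = read_share / window_share
    return curve


# Toy example: 40%-GC windows get three times the reads of 60%-GC windows.
print(normalized_coverage([40, 60], [30, 10]))  # {40: 1.5, 60: 0.5}
```

On this definition, an inverted-U curve means reads are over-represented in both GC-poor and GC-rich windows relative to how common those windows are in the scaffolds.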

    The graphs I produced for the other technologies look more or less as expected.

    Does anyone know what is going on, or am I doing something terribly wrong?
    Could it be because the reads are error-corrected?

  • #2
    There is no noticeable GC-bias in the raw reads. The error-corrected reads are a different matter; that would depend on many factors like repeat composition of the organism and biases in the algorithm. I would expect neutral-GC areas to error-correct better than high or low, so that graph does seem odd, but it could also be a problem with the way the data was normalized, or mapped. I would rather see a GC-content profile of the raw and error-corrected reads; the graph you displayed, in my opinion, has too much processing to be able to correlate it well with the bias of the reads themselves.
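A minimal sketch of such a GC-content profile, assuming the raw and the error-corrected reads are available as plain sequence strings (for example, read from FASTA/FASTQ). Running it on both sets and comparing the two histograms needs no mapping at all. The helper names and toy reads are hypothetical.

```python
from collections import Counter


def gc_fraction(seq):
    """G+C fraction over unambiguous bases (A/C/G/T); None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else None


def gc_histogram(reads, bins=20):
    """Share of reads falling into each of `bins` equal-width GC bins."""
    counts = Counter()
    for seq in reads:
        frac = gc_fraction(seq)
        if frac is None:
            continue  # skip reads made entirely of N or other ambiguity codes
        # min() keeps a read with GC == 1.0 in the top bin
        counts[min(int(frac * bins), bins - 1)] += 1
    n = sum(counts.values())
    return [counts[b] / n if n else 0.0 for b in range(bins)]


raw = ["ACGTACGT", "GGGCCCAT"]        # toy stand-ins for raw reads
corrected = ["ACGTACGT", "AAATTTGC"]  # toy stand-ins for corrected reads
print(gc_histogram(raw, bins=4))
print(gc_histogram(corrected, bins=4))
```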


    • #3
      In reply to Brian Bushnell (#2):
      Thank you for your answer.
      To check whether this was the case, I ran BLASR on all of my filtered reads (polymerase read quality > 0.75, read length > 50), which are not error-corrected.
      BLASR was run the same way as before, and I again plotted the GC bias with Picard.
      The graph looks almost the same: http://imgur.com/d5pdxsC
      Could it be a problem with Picard and the longer reads, or do I really have a bias like that?
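The two filter thresholds mentioned above can be restated as a small sketch, assuming each read carries its sequence and a polymerase read quality score. The poster presumably used PacBio's own filtering step; the `Read` record and the function names here are hypothetical and only make the cut-offs explicit.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    seq: str
    read_quality: float  # hypothetical field holding polymerase read quality (0-1)


def passes_filter(read, min_quality=0.75, min_length=50):
    """Strict thresholds, matching 'quality > 0.75, read length > 50' in the post."""
    return read.read_quality > min_quality and len(read.seq) > min_length


def filter_reads(reads, **thresholds):
    return [r for r in reads if passes_filter(r, **thresholds)]


reads = [
    Read("a", "A" * 60, 0.80),  # kept
    Read("b", "A" * 60, 0.75),  # dropped: quality not > 0.75
    Read("c", "A" * 50, 0.90),  # dropped: length not > 50
]
print([r.name for r in filter_reads(reads)])  # ['a']
```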


      • #4
        There's still the issue that BLASR is being used. Can you just plot the raw GC distribution of the raw reads and of the raw FASTA?
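For the reference side of that comparison, a minimal sketch, assuming the assembly is a FASTA file of scaffolds: cut each scaffold into non-overlapping windows of roughly typical read length and record each window's GC fraction, so the reference values are on a comparable footing with per-read GC. The window size, file name, and function names are illustrative assumptions, not something specified in the thread.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(seq):
    """G+C fraction over unambiguous bases (A/C/G/T); None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else None


def window_gc(seq, window):
    """GC fraction of each full, non-overlapping window; a short tail is dropped."""
    values = []
    for start in range(0, len(seq) - window + 1, window):
        frac = gc_fraction(seq[start:start + window])
        if frac is not None:  # skip windows that are all N (e.g. scaffold gaps)
            values.append(frac)
    return values


# Usage (file name and window size are placeholders):
# with open("scaffolds.fasta") as fh:
#     ref_gc = [g for _, s in parse_fasta(fh) for g in window_gc(s, 5000)]
```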


        • #5
          In reply to Brian Bushnell (#4):
          Here is the GC distribution of the reads: http://imgur.com/EqLnvp0
          But that doesn't really reveal GC-related coverage bias, does it?


          • #6
            Well... if the GC distribution of the reads looks identical to the GC distribution of the reference, that implies no bias, assuming all of the reads originated from that reference. Mapping will give you a much better indication of GC bias, but then you no longer know whether the bias came from the raw reads or from the mapping. So with PacBio reads, which are hard to map due to their high error rates, I'd just compare the reference GC distribution to the read GC distribution without mapping.
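That comparison can be sketched as a per-bin ratio, assuming you already have per-read GC fractions and per-window reference GC fractions as lists of floats between 0 and 1. The function names and bin count are illustrative; values near 1.0 in every populated bin would suggest no bias, under the same assumption that all reads came from this reference.

```python
def bin_shares(gc_values, bins=20):
    """Share of values in each of `bins` equal-width GC bins over [0, 1]."""
    counts = [0] * bins
    for g in gc_values:
        counts[min(int(g * bins), bins - 1)] += 1
    n = len(gc_values)
    return [c / n for c in counts] if n else [0.0] * bins


def read_to_ref_ratio(read_gc, ref_gc, bins=20):
    """Per-bin (read share / reference share); None where the reference is empty."""
    reads = bin_shares(read_gc, bins)
    ref = bin_shares(ref_gc, bins)
    return [r / f if f > 0 else None for r, f in zip(reads, ref)]


print(read_to_ref_ratio([0.1, 0.1, 0.9, 0.9], [0.1, 0.9, 0.9, 0.9], bins=2))
```

Per-read GC and per-window GC are only directly comparable when the reference windows are roughly read-sized, since shorter sequences spread more widely in GC; that is one reason to keep the bins coarse.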

            Note that the reference could be wrong, too. High- and low-GC areas have lower complexity and are thus more likely to be repetitive and to be collapsed by an assembler. So if you have a poor assembly or a highly repetitive organism, it's possible that the seemingly higher coverage of extreme-GC areas is actually due to their being collapsed repeats.
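A rough check for this, sketched under the assumption that per-window coverage and GC are already in hand: flag windows whose coverage is well above the median and see whether they concentrate at extreme GC. The factor of 2 is an arbitrary illustrative threshold (a repeat with two copies collapsed into one would be expected to show roughly double coverage), and `candidate_collapsed` is a hypothetical name.

```python
from statistics import median


def candidate_collapsed(coverages, gcs, factor=2.0):
    """(index, gc, coverage) for windows with coverage >= factor x median coverage.

    factor=2.0 is an arbitrary illustrative cut-off, not a recommended value.
    """
    if len(coverages) != len(gcs):
        raise ValueError("inputs must have the same length")
    med = median(coverages)
    if med <= 0:
        return []
    return [(i, g, c) for i, (c, g) in enumerate(zip(coverages, gcs))
            if c >= factor * med]


cov = [10, 11, 9, 10, 25]            # toy per-window coverage
gc = [0.45, 0.50, 0.48, 0.47, 0.22]  # toy per-window GC fraction
print(candidate_collapsed(cov, gc))
```

If the flagged windows cluster at extreme GC, that would fit the collapsed-repeat explanation rather than a sequencing bias.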


            • #7
              Mapping with high error rates is not an issue, but there could be edge effects if you are mapping to short contigs. Is this using a short-read de novo assembly?
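To gauge the edge effects mentioned here, one option (an assumption, not something prescribed in the thread) is to recompute the GC-bias curve using only windows that lie well inside each contig, on the hypothesis that reads overhanging a contig end can only partly align there. A minimal sketch of selecting such windows; `interior_window_starts` and the margin value are hypothetical.

```python
def interior_window_starts(contig_len, window, margin):
    """Start positions of full windows lying at least `margin` bp from both ends.

    Windows closer to a contig end than `margin` are skipped, because reads
    overhanging the end may only partly align there (hypothesized edge effect).
    """
    return [start for start in range(0, contig_len - window + 1, window)
            if start >= margin and start + window <= contig_len - margin]


# A 1 kb contig, 100 bp windows, 200 bp margin:
print(interior_window_starts(1000, 100, 200))
```

With a margin on the order of a PacBio read length, many short contigs from a short-read assembly would contribute no interior windows at all, which gives a sense of how much of such an assembly edge effects could touch.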

              Looking at the GC content of the original reads without mapping can be problematic if read trimming isn't working well. I believe trimming works reasonably well in the latest software release.

              -mark
