
  • PacBio consensus quality

    Hello

    I have sequenced a BAC clone with PacBio RSII.
    For the assembly I used Falcon through pbbioconda, and for polishing I used quiver.

    To estimate the consensus quality, I remapped the original reads (BAM file) against the consensus.

    How can I estimate a mean quality value, in other words a consensus Phred score for the base calls of the consensus? :-)

    Thank you in advance
    Philippe
    Last edited by phleroy; 01-25-2019, 01:07 AM.

  • #2
    We polish with arrow and just list one of the outputs as fastq "-o sample_consensus.fastq" and it generates a fastq file with a consensus for each contig and the quality score. You might check if quiver has the same option, or switch to arrow (here's a blog about doing so https://dazzlerblog.wordpress.com/tag/arrow/ ).
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


    • #3
      Thank you very much for this suggestion.
      We can indeed obtain a fastq file with quiver using the option -o out.fastq, as you mentioned for arrow.

      The question is then: how do you recover the mean QV for the consensus?

      In the fastq file we can see:
      @000000F|quiver
      ATCATTGTTACTACTAGAGGAAGAATCTTTCTTG ...
      +
      "RQQPQQQRQQQQQSRRQSTSSQRQRSSSRRRQQRSRRRQRSRQ ...

      I guess the quality value for each consensus nucleotide is encoded in the last line (the one after the "+")? But how do I calculate it?

      Thank you again for any help
      Philippe
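
For readers with the same question, here is a minimal sketch of how the quality line can be decoded. It assumes the common Phred+33 (Sanger) encoding, where each character's ASCII code minus 33 is the QV of the base at the same position; the function names are illustrative and are not part of quiver or arrow.

```python
def phred33_to_qvs(quality_string):
    """Decode a FASTQ quality string, ASSUMING Phred+33 (Sanger) encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def qv_to_error_prob(qv):
    """Convert a Phred QV to an error probability: p = 10^(-QV/10)."""
    return 10 ** (-qv / 10)


# First characters of the quality line quoted in the post above.
snippet = '"RQQPQQQRQQ'
for ch, qv in zip(snippet, phred33_to_qvs(snippet)):
    print(ch, qv, f"{qv_to_error_prob(qv):.2e}")
```

Under that assumption, "Q", "R" and "S" decode to QV 48, 49 and 50, while the leading '"' decodes to QV 1.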



      • #4
        You can convert the Phred QV scores to error probabilities, then sum the probabilities over the entire sequence to get the expected number of errors.

        You can use this Python script to calculate the expected accuracy from a FASTQ file (although it lives in a repo meant for PacBio transcriptome data, the script is generic):
        Miscellaneous collection of Python and R scripts for processing Iso-Seq data - Magdoll/cDNA_Cupcake
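
The calculation described above can be sketched as follows. This is not the cDNA_Cupcake script itself, only an illustration of the same idea under two assumptions: qualities are Phred+33 encoded, and each FASTQ record occupies exactly four lines (no wrapped sequence lines). All function names are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].strip(), seq, qual


def expected_errors(qual, offset=33):
    """Sum per-base error probabilities (ASSUMES Phred+33 by default)."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def summarize(handle):
    """Map contig name -> (length, expected errors, expected accuracy)."""
    results = {}
    for name, seq, qual in read_fastq(handle):
        if not seq:
            continue  # skip empty records to avoid dividing by zero
        errs = expected_errors(qual)
        results[name] = (len(seq), errs, 1 - errs / len(seq))
    return results


# Toy record: four bases, each with quality 'I' (QV 40 under Phred+33).
demo = io.StringIO("@contig1|quiver\nACGT\n+\nIIII\n")
for name, (length, errs, acc) in summarize(demo).items():
    print(name, length, f"{errs:.2e}", f"{acc:.6f}")
```

For a real file, the `io.StringIO` demo handle would be replaced by something like `open("out.fastq")`.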



        • #5
          Thank you so much, I will try this option as soon as possible and tell you :-)



          • #6
            I tried the Python script (calc_expected_accuracy_from_fastq.py) on our consensus fastq, which was obtained with quiver, and got the expected accuracy as output: expected_accuracy=0.997

            In a previous analysis I used two smrtlink Python scripts to estimate the mean QV:
            - summarize_coverage.py to obtain an alignment summary gff file
            - polished_assembly.py to obtain the csv file, which gives a mean_qv of 48.65

            I have the feeling that the two values measure different things? I am not a specialist in this area and would welcome any remarks or suggestions.

            Nevertheless, both values, mean_qv and expected_accuracy, should give an estimate of the quality of the consensus assembly. I just need to understand precisely how to interpret each one.

            Thank you in advance
            Philippe
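
One possible reading of the gap between the two numbers, offered as a sketch rather than a statement about what polished_assembly.py actually computes: an arithmetic mean of per-base QVs and a QV derived from the mean error probability are different quantities, and a few very low-QV positions pull the second one down far more than the first. The helper names and the toy QV list below are illustrative only.

```python
import math


def accuracy_to_qv(accuracy):
    """QV equivalent of an accuracy: QV = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)


def qv_to_accuracy(qv):
    """Accuracy equivalent of a QV: 1 - 10^(-QV/10)."""
    return 1 - 10 ** (-qv / 10)


def mean_qv(qvs):
    """Arithmetic mean of the per-base QVs."""
    return sum(qvs) / len(qvs)


def qv_of_mean_error(qvs):
    """QV corresponding to the mean per-base error probability."""
    mean_err = sum(10 ** (-q / 10) for q in qvs) / len(qvs)
    return -10 * math.log10(mean_err)


# The two values reported in the post above, each expressed on the other scale.
print(f"accuracy 0.997 as a QV: {accuracy_to_qv(0.997):.1f}")
print(f"QV 48.65 as an accuracy: {qv_to_accuracy(48.65):.7f}")

# Toy example (made-up numbers): 99 bases at QV 50 and one base at QV 1.
toy = [50] * 99 + [1]
print(f"mean of QVs: {mean_qv(toy):.2f}")
print(f"QV of mean error: {qv_of_mean_error(toy):.2f}")
```

An accuracy of 0.997 corresponds to roughly QV 25, well below a mean QV of 48.65. In the toy example a single QV 1 base barely moves the mean of the QVs but dominates the mean error, which is the same pattern on a small scale.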



            • #7
              If you assemble a set of reads, then use them to polish the assembly, there is no way to measure any truly meaningful consensus quality without an orthogonal datatype, or knowledge of ground truth. The expected accuracy from the fastq that results from polishing is highly dependent on the consensus algorithm and may not be a true indication of the quality of the consensus.
