

PacBio consensus quality




  • PacBio consensus quality


    I have sequenced a BAC clone on a PacBio RS II.
    To make the assembly I used FALCON (installed through pbbioconda), and for polishing I used Quiver.

    To estimate the consensus quality, I remapped the original BAM read file against the consensus.

    How can I estimate a mean quality value, in other words a consensus Phred score for the base calls of the consensus? :-)

    Thank you in advance
    Last edited by phleroy; 01-25-2019, 01:07 AM.

  • #2
    We polish with Arrow and simply request one of the outputs as FASTQ ("-o sample_consensus.fastq"); this generates a FASTQ file with the consensus sequence and per-base quality scores for each contig. You might check whether Quiver has the same option, or switch to Arrow (here's a blog post about doing so: https://dazzlerblog.wordpress.com/tag/arrow/ ).
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


    • #3
      Thank you very much for this suggestion.
      Quiver can also produce a FASTQ file with the option -o out.fastq, as you mentioned for Arrow.

      The question, then, is how to recover the mean QV for the consensus.

      In the FASTQ file we can see:

      I guess the quality value for each consensus nucleotide is on the second line? But how do I calculate it?

      Thank you again for any help
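In the standard FASTQ layout each record spans four lines: an @ header, the sequence, a + separator, and a quality string holding one ASCII character per base. A minimal Python sketch for reading those records and turning the quality string into integer QVs, assuming unwrapped four-line records and the common Phred+33 (Sanger) offset; the function names are hypothetical and not part of any PacBio tool:

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality_string) from an iterable of FASTQ lines.

    Assumes unwrapped records: exactly four lines each
    (@header, sequence, '+' separator, quality string).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line, not needed here
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record {header!r}") from None
        yield header[1:], seq, qual


def decode_phred33(qual):
    """Convert an ASCII quality string to integer Phred scores (offset 33 assumed)."""
    return [ord(ch) - 33 for ch in qual]
```

With a real file, pass the open handle, for example: with open("out.fastq") as fh, iterate over parse_fastq(fh) and call decode_phred33 on each quality string to get one QV per consensus base.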


      • #4
        You can convert the Phred QV scores to error probabilities, then sum those probabilities over the entire sequence to get the expected number of errors.
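The conversion works like this: a base with quality Q has error probability 10^(-Q/10), so Q20 means a 1% chance of error and Q30 a 0.1% chance. Summing these over the sequence gives the expected number of errors, and one minus (expected errors / length) gives the expected accuracy. A minimal Python sketch, taking a list of integer Phred scores; the function names are hypothetical, and this is not the repository script linked in this thread:

```python
def expected_errors(phred_scores):
    """Expected number of errors: the sum of per-base error probabilities 10**(-Q/10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def expected_accuracy(phred_scores):
    """Expected fraction of correct bases: 1 - expected_errors / sequence length."""
    if not phred_scores:
        raise ValueError("empty quality list")
    return 1 - expected_errors(phred_scores) / len(phred_scores)


# Three bases at Q10, Q20 and Q30: 0.1 + 0.01 + 0.001 = 0.111 expected errors,
# so the expected accuracy is 1 - 0.111 / 3 = 0.963 (up to float rounding).
print(expected_errors([10, 20, 30]))
print(expected_accuracy([10, 20, 30]))
```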

        You can use this Python script to calculate the expected accuracy from a FASTQ file (though it is in a repo meant for PacBio transcriptome data, the script is generic):


        • #5
          Thank you so much, I will try this option as soon as possible and let you know :-)


          • #6
            I tried the Python script (calc_expected_accuracy_from_fastq.py) on our FASTQ consensus sequence, which was obtained with Quiver, and got, as expected, an expected accuracy value: expected_accuracy=0.997

            In a previous analysis I used two SMRT Link Python scripts to estimate the mean QV:
            - summarize_coverage.py to obtain an alignment summary GFF file
            - polished_assembly.py to obtain a CSV file, which gives a mean_qv of 48.65

            I have the feeling that the two values measure different things? I am not a specialist in this area and would welcome any remarks or suggestions.

            Nevertheless, both values, mean_qv and expected_accuracy, should give an estimate of the quality of the consensus assembly. I just need to understand precisely how to interpret each one.

            Thank you in advance
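The two numbers summarize per-base QVs in different ways. Expected accuracy averages error probabilities; put back on the Phred scale, expected_accuracy=0.997 corresponds to QV = -10 log10(1 - 0.997), about 25.2. A mean QV, if it is the arithmetic mean of per-base QVs (an assumption here; check how polished_assembly.py defines mean_qv), averages on the log scale instead, and by Jensen's inequality it can never be lower than the QV of the mean error rate. A few low-quality bases pull the expected accuracy down far more than they pull down a mean QV. The two values above also came from separate analyses, so they may not describe exactly the same consensus. A minimal Python sketch with made-up scores and hypothetical function names:

```python
import math


def mean_qv(phred_scores):
    """Arithmetic mean of the per-base Phred scores (averaging on the log scale)."""
    return sum(phred_scores) / len(phred_scores)


def expected_accuracy(phred_scores):
    """1 - mean per-base error probability (averaging on the probability scale)."""
    errors = sum(10 ** (-q / 10) for q in phred_scores)
    return 1 - errors / len(phred_scores)


def qv_from_accuracy(accuracy):
    """Phred-scale an overall error rate: QV = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)


# Made-up consensus: 99 bases at Q60 and a single base at Q5.
scores = [60] * 99 + [5]
print(round(mean_qv(scores), 2))                                # 59.45
print(round(qv_from_accuracy(expected_accuracy(scores)), 2))    # 25.0
```

Here one bad base in a hundred barely moves the mean QV but caps the accuracy-derived QV at about 25, which is why the two summaries can disagree by a wide margin on the same data.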


            • #7
              If you assemble a set of reads, then use them to polish the assembly, there is no way to measure any truly meaningful consensus quality without an orthogonal datatype, or knowledge of ground truth. The expected accuracy from the fastq that results from polishing is highly dependent on the consensus algorithm and may not be a true indication of the quality of the consensus.
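When an orthogonal truth set is available (for example, an independently finished sequence of the same BAC; hypothetical here), one common way to report consensus quality is the Phred-scaled observed error rate, QV = -10 log10(errors / aligned bases), where errors counts mismatches plus inserted and deleted bases in an alignment of the consensus to the truth. A minimal Python sketch of that arithmetic only; counting the errors is left to an aligner, and the function name is hypothetical:

```python
import math


def empirical_qv(n_errors, aligned_bases):
    """Phred-scaled observed error rate of a consensus against a trusted sequence.

    n_errors: mismatches + inserted + deleted bases in the consensus-vs-truth alignment.
    aligned_bases: number of aligned positions compared.
    Returns math.inf when no errors are observed (observed error rate of zero).
    """
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    if n_errors == 0:
        return math.inf
    return -10 * math.log10(n_errors / aligned_bases)


# Hypothetical numbers: 3 errors over a 150,000 bp BAC insert.
print(round(empirical_qv(3, 150_000), 1))  # 47.0
```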

