'n' in PacBio assembled sequences?

  • 'n' in PacBio assembled sequences?

    I used Celera Assembler to assemble PacBio reads, with error correction using 454 sequences.

    However, I found the letter n (in addition to a, t, c and g) in the assembled result. Why, and how do I fix it?

  • #2
    You can get an "N" in assemblies from all read types, and typically this means there was no clear consensus: some of the reads suggested one base, other reads another. Some assemblers might use other IUPAC ambiguity codes if they can tell, for example, that the base is either an A or a C.

    Such positions could be SNPs if you are sequencing a mixed population, or different alleles if you are sequencing something with two (or more) copies of each chromosome, or errors in assembly (e.g. merging two similar regions into one), etc.

    It is also possible by bad luck in a low coverage region that all your reads at that position happen to have an N rather than a clear base.
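
As a rough illustration of the "no clear consensus" idea above, here is a minimal Python sketch of calling one consensus base from the bases seen at a single position. This is not how Celera Assembler (or any particular assembler) actually does it; the function name `call_consensus` and the `min_fraction` and `use_iupac` parameters are made up for the example.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_consensus(column, min_fraction=0.3, use_iupac=True):
    """Call one base from the read bases observed at a single position.

    column       -- string of observed bases, e.g. "AAAACCCC"
    min_fraction -- hypothetical threshold: a base counts as supported if
                    at least this fraction of the informative reads show it
    use_iupac    -- if False, any disagreement is reported as "N"
    """
    # Ignore anything that is not a plain base (including N in the reads).
    counts = Counter(b for b in column.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        # No informative reads at all, e.g. every read has N here.
        return "N"
    supported = frozenset(
        base for base, n in counts.items() if n / total >= min_fraction
    )
    if len(supported) == 1 or use_iupac:
        # An empty set (no base reaches the threshold) falls back to "N".
        return IUPAC.get(supported, "N")
    return "N"
```

With these assumptions, a column like "AAAAAAAT" is called as A, an even A/C split is reported as M (or as N when `use_iupac=False`), and a position where every read has N stays N.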

    • #3
      Good eye. The N comes from Celera Assembler in this case, rather than from PacBio.

      If you've got enough PacBio coverage, you can run Quiver for assembly polishing, which can lead to final accuracy of > 99.999%. See www.pacbiodevnet.com/quiver. Polishing can remove or introduce Ns, depending on the situation. In certain cases, the following option reduces the number of Ns:

      --noEvidenceConsensusCall=reference
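
To see whether polishing actually changed the number of Ns, one simple check is to count them per contig before and after. Below is a small Python sketch that does this for FASTA text; the function name `count_n_per_contig` is made up for the example, and it only assumes ordinary FASTA input (header lines starting with ">").

```python
def count_n_per_contig(fasta_lines):
    """Return {contig_name: number of N/n characters} for FASTA input.

    fasta_lines -- any iterable of lines, e.g. an open file handle
    """
    counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts[name] = 0
        elif name is not None:
            counts[name] += line.upper().count("N")
    return counts


# Small in-memory example; with a real assembly you would pass an open
# file handle instead of a list of lines.
example = [">contig1 length=9", "ACGTN", "nnAC", ">contig2", "ACGT"]
print(count_n_per_contig(example))
```

Running the same count on the assembly before and after polishing shows which contigs gained or lost Ns.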

      • #4
        Do you mean use Quiver instead of Celera?

        Is Quiver a standard/common tool for assembling PacBio? (I'm new to PacBio)

        • #5
          Quiver is a beta tool for PacBio that will become standard in the next release. It's available on github now. Here's a documentation link:

          https://github.com/PacificBioscience...owToQuiver.rst

          • #6
            Thank you for your reply. I have 2 more questions.

            1. Does Quiver still need next-generation sequences for correction?

            2. Is there any chance that, by changing some parameters, I could reduce the number of n's in Celera?

            • #7
              Good questions. Quiver is a consensus and variant caller, like the GATK Haplotype Caller, rather than a de novo assembly algorithm.

              1. HGAP does not need 2nd gen sequences (or PacBio CCS) for error correction, if that's your question. Quiver is used by HGAP, but only for generating the final assembly sequence after you've got your contigs. It's not an "error correction" algorithm.

              2. I'm not aware of which parameters would reduce the Ns in Celera Assembler. Sorry!
