SPAdes plasmids output

  • SPAdes plasmids output

    Hello all. I have run plasmidSPAdes on my bacterial reads to look for plasmids. The final output directory contains a contigs.fasta file with the plasmid contigs, each carrying the suffix component_X that denotes the plasmid it belongs to.
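To see how many putative plasmids (components) are in contigs.fasta and how large each one is, the contigs can be grouped by that suffix. This is a minimal Python sketch, assuming contig names of the form NODE_1_length_5000_cov_120.5_component_0 (check this against your own file); `read_fasta` and `group_by_component` are hypothetical helper names, not part of SPAdes.

```python
import re
from collections import defaultdict

# Assumed naming: the contig name ends in "_component_<N>".
COMPONENT_RE = re.compile(r"_component_(\d+)$")

def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word of each header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def group_by_component(lines):
    """Map component number -> list of (contig name, contig length)."""
    groups = defaultdict(list)
    for name, seq in read_fasta(lines):
        match = COMPONENT_RE.search(name)
        key = int(match.group(1)) if match else None  # None = no component tag
        groups[key].append((name, len(seq)))
    return dict(groups)
```

Usage: `with open("contigs.fasta") as fh: groups = group_by_component(fh)`, then sum the lengths in each group to get an approximate size per component.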

    Looking at the k-mer folders, the final_contigs.fasta files in K21 to K99 seem to contain the bacterial contigs, as a normal SPAdes run would produce. However, final_contigs.fasta in the K127 folder resembles the contigs.fasta file mentioned above: it seems to contain only plasmids and not the bacterial contigs.
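A quick way to compare the k-mer folders is to tabulate the number of contigs and the total assembled length in each final_contigs.fasta; a chromosome-scale total versus a plasmid-scale total makes the difference obvious. This is a minimal Python sketch, assuming the K21 ... K127 folders sit directly under one output directory as described above; `fasta_lengths` and `summarize_k_folders` are hypothetical names.

```python
from pathlib import Path

def fasta_lengths(path):
    """Return the length of every record in a FASTA file."""
    lengths, current = [], None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def summarize_k_folders(outdir):
    """Return {k: (number of contigs, total bp)} for each K<k>/final_contigs.fasta."""
    summary = {}
    for kdir in Path(outdir).glob("K*"):
        fasta = kdir / "final_contigs.fasta"
        # Only folders named K followed by digits, e.g. K21 or K127.
        if kdir.is_dir() and kdir.name[1:].isdigit() and fasta.exists():
            lengths = fasta_lengths(fasta)
            summary[int(kdir.name[1:])] = (len(lengths), sum(lengths))
    return summary
```

Usage: `for k, (n, bp) in sorted(summarize_k_folders("spades_out").items()): print(f"K{k}: {n} contigs, {bp} bp")`, where `spades_out` stands for your output directory.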

    Comparing the K127 final_contigs.fasta and contigs.fasta with diff shows that the two files differ. Why are they different? Is contigs.fasta different because SPAdes ran the mismatch corrector? And also why are plasmids only output in the K127 folder final_contigs.fasta and not the other kmer value folders?

    Last edited by ronaldrcutler; 07-26-2016, 07:51 PM.
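Note that diff compares text line by line, so renamed contigs (different coverage values in the header, for example), a different record order, or different line wrapping all register as differences even when the sequences are identical. A minimal Python sketch, assuming plain DNA FASTA input, that compares the two files by sequence content only, treating a contig and its reverse complement as the same; the function names are hypothetical.

```python
# Complement table for upper-case DNA.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word of each header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def canonical(seq):
    """Return the lexicographically smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, seq.translate(COMPLEMENT)[::-1])

def compare_by_sequence(lines_a, lines_b):
    """Count sequences shared by, or unique to, two FASTA inputs (names and wrapping ignored)."""
    a = {canonical(seq) for _, seq in read_fasta(lines_a)}
    b = {canonical(seq) for _, seq in read_fasta(lines_b)}
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}
```

Usage: open K127/final_contigs.fasta and contigs.fasta and pass both file handles to `compare_by_sequence`. Any remaining `only_a`/`only_b` counts point to real sequence differences rather than header or formatting differences.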

  • #2

    The K127 folder also contains a before_chromosome_removal.fasta file. When I check whether the contigs in final_contigs.fasta are present in that file, I do not find them. So are the putative plasmid sequences (from final_contigs.fasta in the K127 folder and from contigs.fasta in the main SPAdes output directory) also present in the final_contigs.fasta files in the other folders? What exactly is the difference between all these files?
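One way to check whether each contig from one file occurs inside the contigs of another, such as K127/final_contigs.fasta against before_chromosome_removal.fasta or against a lower-k final_contigs.fasta, is an exact substring search on both strands. This is a minimal Python sketch with hypothetical function names. It only finds exact matches, so a single base change or a contig that extends past the end of a target contig is reported as not found. A `False` result therefore does not prove that the sequence is absent; an aligner would be needed for that.

```python
# Complement table for upper-case DNA.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word of each header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_in_targets(query_lines, target_lines):
    """Map each query contig name -> True if it (or its reverse complement)
    occurs exactly inside any target sequence."""
    targets = [seq.upper() for _, seq in read_fasta(target_lines)]
    found = {}
    for name, seq in read_fasta(query_lines):
        seq = seq.upper()
        rc = reverse_complement(seq)
        found[name] = any(seq in t or rc in t for t in targets)
    return found
```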


    • #3
      And also why are plasmids only output in the K127 folder final_contigs.fasta and not the other kmer value folders?
      Increasing the k-mer length decreases the depth of coverage for each k-mer. At some point, the depth becomes insufficient to assemble the de Bruijn graph accurately. Since plasmid sequences are present at much higher read depth, their assembly can tolerate a larger k than the host cell sequences can. It appears that at k=127 the plasmid reads still assemble but the host cell reads do not.

      Also, if you're only interested in the plasmid sequences, you may want to use plasmidSPAdes (described here).
      Last edited by HESmith; 07-28-2016, 08:30 AM.
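The depth argument in reply #3 can be made concrete with the standard relation between read coverage C and k-mer coverage C_k for reads of length L: C_k = C * (L - k + 1) / L. This Python sketch only illustrates that relation and does not model the assembler. The read length (150 bp), chromosome depth (40x) and plasmid copy number (10 copies, so 400x) are hypothetical numbers chosen for illustration.

```python
def kmer_coverage(read_coverage, read_length, k):
    """Expected k-mer coverage C_k = C * (L - k + 1) / L for reads of length L."""
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return read_coverage * (read_length - k + 1) / read_length

if __name__ == "__main__":
    # Hypothetical numbers: 150 bp reads, chromosome at 40x, plasmid at 10 copies (400x).
    for k in (21, 55, 99, 127):
        chrom = kmer_coverage(40, 150, k)
        plasmid = kmer_coverage(400, 150, k)
        print(f"k={k}: chromosome {chrom:.1f}x, plasmid {plasmid:.1f}x")
```

With these numbers, the chromosome drops from about 34.7x at k=21 to 6.4x at k=127, while the plasmid is still at 64x at k=127.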


      • #4
        I was using plasmidSPAdes. If I understand correctly, the plasmid contigs can assemble at higher k-mer values because they are present at higher read depth. Why are plasmids present at higher read depth than the host cell genome? And what about coverage?


        • #5
          Plasmids are present in multiple copies per cell, whereas the host genome is (typically) a single copy. More copies = higher read depth.
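Because read depth scales with copy number, the ratio of a plasmid contig's coverage to the typical chromosomal coverage gives a rough copy-number estimate. This is a minimal Python sketch under several assumptions. It assumes SPAdes-style contig names carrying `_length_<N>_cov_<X>` fields. It also assumes both sets of contigs come from assemblies with the same final k, so that the (L - k + 1)/L factor between read and k-mer coverage cancels. Finally, it uses the median coverage of chromosomal contigs of at least 1000 bp as the baseline, which is an arbitrary choice. The function names and `min_length` are hypothetical.

```python
import re
from statistics import median

# Assumed SPAdes-style naming, e.g. NODE_3_length_52000_cov_41.7
HEADER_RE = re.compile(r"_length_(\d+)_cov_([\d.]+)")

def length_and_cov(name):
    """Parse (length, coverage) from a SPAdes-style contig name."""
    match = HEADER_RE.search(name)
    if match is None:
        raise ValueError(f"no length/cov fields in {name!r}")
    return int(match.group(1)), float(match.group(2))

def estimate_copy_numbers(chromosome_names, plasmid_names, min_length=1000):
    """Approximate copies per chromosome as plasmid cov / median chromosomal cov."""
    chrom_cov = [cov for length, cov in map(length_and_cov, chromosome_names)
                 if length >= min_length]
    if not chrom_cov:
        raise ValueError("no chromosomal contigs at or above min_length")
    baseline = median(chrom_cov)
    return {name: length_and_cov(name)[1] / baseline for name in plasmid_names}
```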


          • #6
            Okay, thanks for the clarification. Since no plasmids are labeled in the other final_contigs.fasta files (from k-mer values below 127), can I confidently say that those assemblies contain no plasmids and that plasmids were found only with a k-mer value of 127?