  • 10 kb plasmid de novo assembly

    I have what should be the simplest task for a de novo assembler: assembling a short (10 kb) bacterial plasmid without repeats from 90 bp Illumina reads. The plasmid was sequenced to an average depth of about 1,000x. I used default parameters in Velvet and SeqMan NGen and couldn't get it assembled. Could anyone suggest an assembler and parameters that I could use for this task?

  • #2
    First throw away 90% of your reads. 1000x is too high.
    --
    Phillip
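
To put rough numbers on this suggestion, here is a minimal Python sketch of the coverage arithmetic and a random sub-sample. The function names (`coverage`, `keep_fraction`, `subsample`) are made up for illustration and do not come from any assembler. It assumes single-end reads already loaded into a list; with paired-end data you would want to sample pairs together.

```python
import random


def coverage(n_reads, read_len, genome_len):
    """Average depth: total sequenced bases divided by target length."""
    return n_reads * read_len / genome_len


def keep_fraction(current_cov, target_cov):
    """Fraction of reads to keep to go from current_cov to target_cov."""
    return min(1.0, target_cov / current_cov)


def subsample(reads, fraction, seed=0):
    """Pick a random subset of reads without replacement.

    The seed is fixed so that the same subset can be drawn again.
    """
    rng = random.Random(seed)
    n_keep = round(len(reads) * fraction)
    return rng.sample(reads, n_keep)


# A 10 kb plasmid at 1,000x with 90 bp reads is roughly 111,000 reads.
depth = coverage(111_111, 90, 10_000)
fraction = keep_fraction(1000, 100)  # "throw away 90%" means keeping 0.1
```

Dropping from 1,000x to 100x keeps one read in ten, which matches the 90% figure above.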

    • #3
      Originally posted by pmiguel
      First throw away 90% of your reads. 1000x is too high.
      --
      Phillip
      I have a similar project to this one and I'm having trouble assembling my viral sequences, for which I'm expecting 10,000-100,000x coverage. Why does too much coverage cause a problem for de novo assembly?

      • #4
        Originally posted by e.dobbs
        I have a similar project to this one and I'm having trouble assembling my viral sequences which I'm expecting 10,000-100,000x coverage. Why does too much coverage produce a problem for de novo assembly?
        I think it is because random errors get repeated over and over and start to look like real base calls. This complicates the solution path through the assembly graph, and you get many highly related but separate contigs. Phillip is correct: take a sub-sample of your data and do the assembly. You know what your solution should look like (1 contig), so run an experiment at 10X, 20X, 30X, 50X and 100X and see what you get. The N50 value should get better and better as you add reads and should approach your largest contig size (which hopefully is close to 10 kb). After some level of coverage the N50 will fall and your largest contig will get shorter. With a 10 kb plasmid you'll probably peak at 30X or 50X.
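
For scoring such a coverage sweep, a small Python sketch of the N50 calculation may help. It assumes you have already run the assembler at each depth and collected the contig lengths yourself; `n50` and `best_coverage` are hypothetical helper names, not part of Velvet or any other tool.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs at all


def best_coverage(contigs_by_coverage):
    """Return the coverage level whose assembly has the highest N50.

    contigs_by_coverage maps a coverage (e.g. 10, 20, 30, 50, 100) to the
    list of contig lengths obtained from the assembly at that depth.
    """
    return max(contigs_by_coverage, key=lambda cov: n50(contigs_by_coverage[cov]))
```

For a 10 kb plasmid, the ideal result is a single contig, in which case the N50 equals the length of that contig.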

        Travis

        • #5
          Originally posted by e.dobbs
          I have a similar project to this one and I'm having trouble assembling my viral sequences which I'm expecting 10,000-100,000x coverage. Why does too much coverage produce a problem for de novo assembly?
          The guys writing the code probably did not see >100X coverage as a common use case, so it is not optimized for that read depth. It's kind of like ordering a dump truck full of mulch for your landscaping: if you get that much, you landscape your lawn. But if an aircraft carrier load of mulch gets dumped on you, it crushes your house and smothers you.

          --
          Phillip

          • #6
            Thanks for the answers, guys! I've retried my assembly with 1/6000th of my data and the assembly looks much better.

            • #7
              Given your abundance of data, some really aggressive trimming of read ends might help. Another approach would be to use a tool such as MUSKET, which trims/corrects reads to eliminate ultra-rare k-mers.
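
To illustrate the rare k-mer idea, here is a toy Python sketch. This is not MUSKET's algorithm: it only counts k-mers and discards any read containing a k-mer seen fewer than `min_count` times, whereas a real corrector would try to repair the read. It also ignores reverse complements and quality scores, and the function names are made up for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of a read."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def drop_reads_with_rare_kmers(reads, k, min_count):
    """Keep only reads whose k-mers all occur at least min_count times.

    Reads shorter than k have no k-mers and are kept unchanged.
    """
    counts = count_kmers(reads, k)
    return [
        read for read in reads
        if all(counts[kmer] >= min_count for kmer in kmers(read, k))
    ]
```

At very high depth a true k-mer should be seen many times, so a sensible `min_count` would scale with coverage. The right value is something to find by experiment.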
