  • 10 kb plasmid de novo assembly

    I have what should be the simplest task for a de novo assembler: assembling a short (10 kb) bacterial plasmid without repeats from 90 bp Illumina reads. The plasmid was sequenced to an average coverage of 1,000x. I used default parameters in Velvet and SeqMan NGen and couldn't get it assembled. Could anyone suggest an assembler and parameters that I could use for the task?

  • #2
    First throw away 90% of your reads. 1000x is too high.
    --
    Phillip
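
As a rough sketch of the arithmetic behind that suggestion: the function names below are made up for illustration, and the 90 bp read length and 10 kb plasmid size are taken from the original post.

```python
import math

def keep_fraction(current_cov, target_cov):
    """Fraction of reads to keep to go from current_cov to target_cov."""
    return min(1.0, target_cov / current_cov)

def reads_needed(target_cov, read_len, genome_len):
    """Number of reads of length read_len giving target_cov over genome_len."""
    return math.ceil(target_cov * genome_len / read_len)

# Going from 1000x down to 100x means keeping 10% of the reads.
fraction = keep_fraction(1000, 100)

# For a 10 kb plasmid and 90 bp reads, 100x needs roughly 11 thousand reads.
n_reads = reads_needed(100, 90, 10_000)
```

The same two functions work for any other target depth you want to try.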

    • #3
      Originally posted by pmiguel
      First throw away 90% of your reads. 1000x is too high.
      --
      Phillip
      I have a similar project to this one and I'm having trouble assembling my viral sequences, for which I'm expecting 10,000-100,000x coverage. Why does too much coverage cause a problem for de novo assembly?

      • #4
        Originally posted by e.dobbs
        I have a similar project to this one and I'm having trouble assembling my viral sequences, for which I'm expecting 10,000-100,000x coverage. Why does too much coverage cause a problem for de novo assembly?
        I think it is because random sequencing errors get repeated over and over and start to look like real base calls. This complicates the solution path through the assembly graph, and you get many highly related but separate contigs. Phillip is correct: take a sub-sample of your data and do the assembly. You know what your solution should look like (1 contig), so do an experiment with 10X, 20X, 30X, 50X, and 100X and see what you get. The N50 value will get better and better as you add reads and should approach your largest contig size (which hopefully is close to 10 kb). Beyond some level of coverage the N50 will fall and your largest contig will get shorter. With a 10 kb plasmid you'll probably peak at 30X or 50X.

        Travis
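
The experiment described above could be sketched roughly like this. It is only an illustration: `fastq_records`, `subsample`, and `n50` are hypothetical helper names, the FASTQ input is assumed to be plain 4-line records, and the assembler run itself is left out.

```python
import random
from itertools import islice

def fastq_records(lines):
    """Yield 4-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        yield record

def subsample(records, fraction, seed=0):
    """Keep each record independently with probability `fraction`."""
    rng = random.Random(seed)
    for record in records:
        if rng.random() < fraction:
            yield record

def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig down, first reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0

# Fractions to keep for each target depth, starting from 1000x.
targets = [10, 20, 30, 50, 100]
fractions = {t: t / 1000 for t in targets}
```

For each fraction you would write the kept records to a new file, run the assembler on it, and feed the resulting contig lengths to `n50`. With paired-end data, using the same seed on both files keeps mates together as long as both files list the pairs in the same order.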

        • #5
          Originally posted by e.dobbs
          I have a similar project to this one and I'm having trouble assembling my viral sequences, for which I'm expecting 10,000-100,000x coverage. Why does too much coverage cause a problem for de novo assembly?
          The guys writing the code probably did not see >100X coverage as a common use case, so it is not optimized for that read depth. It's kind of like ordering a dump truck full of mulch for your landscaping: if you get that much, you landscape your lawn. But if an aircraft carrier load of mulch gets dumped on you, it crushes your house and smothers you.

          --
          Phillip

          • #6
            Thanks for the answers, guys! I've re-tried my assembly with 1/6000th of my data and the assembly looks much better.

            • #7
              Given your abundance of data, some really aggressive trimming of read ends might help. Another approach would be to use a tool such as MUSKET that trims/corrects reads to eliminate ultra-rare k-mers.
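
To give a feel for the rare k-mer idea, here is a toy sketch. It is not how MUSKET works internally: it simply counts k-mers and drops any read containing a k-mer seen fewer than `min_count` times. The function names are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def count_kmers(reads, k):
    """Count k-mer occurrences across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def drop_reads_with_rare_kmers(reads, k, min_count):
    """Keep only reads whose k-mers all occur at least min_count times."""
    counts = count_kmers(reads, k)
    return [
        read for read in reads
        if all(counts[km] >= min_count for km in kmers(read, k))
    ]

# Five identical reads plus one with a single-base error in the middle.
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
kept = drop_reads_with_rare_kmers(reads, k=4, min_count=2)
```

In this toy example the error read introduces k-mers that appear only once, so it is the only read removed. A real corrector would try to fix the erroneous base instead of discarding the whole read.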
