  • contig assembly

    Hello,

    We've de novo assembled our RNA-Seq reads (about 50 million 2×75 reads) into contigs using several de novo assemblers with different parameters. Most of the contigs we got were very short because of poor sequencing quality and low sequencing depth. The contigs from each assembler and parameter set differed from one another, but some of them overlapped, so it should be possible to assemble them into longer contigs. The problem is that we can't assemble millions of contigs into supercontigs manually. Moreover, our computing resources are very limited (12 GB RAM, 8 CPU cores, 500 GB of disk space). Does anyone know how to assemble these contigs into longer contigs with such limited resources, and which software could handle the task?

    Thanks

    YH-GU
    Last edited by yh_gu; 08-07-2010, 01:57 AM.

  • #2
    If you have enough memory to assemble reads into contigs then you clearly have enough memory to assemble contigs into supercontigs, as that is an easier feat.

    When an assembler produces contigs that ostensibly overlap and yet remain separate, there is likely some path ambiguity that has not been resolved. It sounds like you'll need more paired-end sequence for your assembly to coalesce.
    --
    Jeremy Leipzig
    Bioinformatics Programmer


    • #3
      The last point is certainly true, but can anyone recommend the most suitable software tool for the task? I am currently looking to do something similar and so far have only tried PAVE (which died with an error message that I'm tracking down). In my case I have performed de novo assembly on a number of genotypes of the same species, and now I want to merge those assemblies, identify SNPs, and see whether longer ESTs can be made by merging contigs across the per-genotype assemblies.

      So, what are the current favourite tools for merging large numbers of contigs coming from de novo transcript assemblies of short read data?


      • #4
        Originally posted by Zigster:
        When an assembler produces contigs that ostensibly overlap and yet remain separate there is some likely path ambiguity that has not been resolved. It sounds like you'll need more paired-end sequence for your assembly to coalesce.
        The overlaps I mentioned are mainly between contigs produced by different assemblers, so we are looking for suitable software to assemble them into longer contigs.
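
        Before a second-round assembly, contigs from the different assemblers would need to be pooled into one set. The sketch below is only an illustration of that preparation step, not any tool's own method: it tags each contig with the assembler it came from and drops exact duplicates, including reverse-complement duplicates. The labels ("asmA", "asmB") and the function names are made up for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def pool_contigs(assemblies):
    """Pool contigs from several assemblies into one list.

    assemblies: dict mapping a hypothetical assembler label to an
    iterable of (name, sequence) pairs.
    Exact duplicates (on either strand) are kept only once; the first
    occurrence wins. Names are prefixed with the label so the origin
    of each contig stays traceable.
    """
    seen = set()
    pooled = []
    for label, records in assemblies.items():
        for name, seq in records:
            s = seq.upper()
            # Strand-independent key: the smaller of the sequence
            # and its reverse complement.
            key = min(s, reverse_complement(s))
            if key in seen:
                continue
            seen.add(key)
            pooled.append((f"{label}|{name}", s))
    return pooled


example = {
    "asmA": [("c1", "ACGTT")],
    "asmB": [("c1", "AACGT"), ("c2", "GGGA")],  # asmB c1 is the reverse complement of asmA c1
}
pooled_example = pool_contigs(example)
for name, seq in pooled_example:
    print(f">{name}\n{seq}")
```

        This only removes identical sequences; contigs that overlap partially are left for the assembler to merge.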


        • #5
          Velvet seems to work fairly well with contig assemblies in my hands, though as Zigster pointed out, assembly path ambiguity will ultimately prevent you from using as many of the overlaps as you'd expect, because the assemblers will disagree in just the wrong places in each contig.

          It may be interesting to "go conservative" with the different assemblers' contigs by trimming away their weak points. For example, try trimming the low-quality ends of contigs so that fewer ambiguous sequence spans go into your secondary assembly. Otherwise you'd expect to get good overlaps in the middle of the contigs but poor alignments at the ends. Each assembler has its own challenge area, though, so you may want to deal with each one in its own way. At some point we (the collective) should put together some cross-assembler lessons learned, and maybe pre-configurations that help tools like Velvet use each assembler's strengths more natively.
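
          A rough sketch of the end-trimming idea, assuming the contigs are in plain FASTA with no per-base quality values: it simply clips a fixed number of bases from both ends of every contig and drops contigs that become too short. The function names, the trim length and the minimum length are placeholders for illustration, not recommended settings, and a quality-aware trim would be better where the assembler reports qualities.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_contig_ends(records, trim=10, min_len=100):
    """Clip `trim` bases from both ends of each contig.

    Contigs shorter than `min_len` after clipping are discarded.
    Both values are arbitrary placeholders.
    """
    for header, seq in records:
        trimmed = seq[trim:len(seq) - trim] if trim > 0 else seq
        if len(trimmed) >= min_len:
            yield header, trimmed


# Tiny in-memory example; a real run would open the contig file instead.
demo = io.StringIO(">c1\nAAACCCGGGTTT\n>c2\nACGT\n")
trimmed_demo = list(trim_contig_ends(read_fasta(demo), trim=3, min_len=5))
for header, seq in trimmed_demo:
    print(f">{header}\n{seq}")
```

          Because the records are streamed one at a time, memory use stays small even with millions of contigs.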

          Velvet does have numerous options to tweak, though, which I think gives it promise, and you can try Oases, a layer on top of Velvet that is intended to handle splice variants. Marcel Schulz and Daniel Zerbino seem to have put together a very useful (and timely) tool suite for this type of work. Kudos to them, and thanks to them as well for providing it as they continue perfecting it.
