  • de novo 454 assembly w/ newbler ... how long?

    I'm having some issues with a newbler assembly that I posted about in another forum (but probably should have posted here ... hopefully this isn't a ghost town!). Essentially, my concern is this: can someone give me an idea of how long their assembly runs with newbler have taken? I've got ~1.7 million reads (N50 ~250bp) from a plant genome, and I have an assembly run that's been going for ~65 hours now ... is that normal, or excessive?

    Any comments would be appreciated.

    ~Joe

  • #2
    Hi Joe,

    Your difficulty assembling plant 454 data is expected.

    Plant sequences are highly repetitive. The 454 assembly running time is proportional to the degree of repeats in the data set.

    Typically, for bacterial data of your size, it takes only a couple of hours to finish. But for plants, it can go on for several days, or not finish at all and crash out of memory.

    If you pre-process your data to remove the repetitive reads, it may help to get results faster, and maybe better contigs to start with. Generally, plants are tough on bioinformatics for de novo assembly.
    Last edited by hlu; 01-02-2009, 02:27 PM.


    • #3
      Hi,
      I've been working with much smaller genomes: bacterial, approx. 4.5 Mb in size, 1.6 million assembled reads. Using Newbler version 2.0, 64-bit, with the 'complex large genome' tab checked, it took approx. 40 min to perform the de novo assembly.

      As mentioned in the previous post, plant genomes are a lot more of a headache bioinformatically and require a hefty amount of processing time. But 65h+ does seem a lot when compared to the bacterial genome. Check with Roche, as newbler may be RAM dependent; increasing the RAM may speed up the assembly?


      • #4
        Thanks Raj -- I should have noted that I think I sounded the alarm too soon; my runs are finishing in several days ... it just appeared for a while that there was no progress and I was unfamiliar with newbler's behavior. I'm using the '-m' flag to keep all reads in memory, which should speed up the runs ... and they appear to be maxing out at ~10G.

        I've also removed reads that blasted well to RepBase's various plant libraries, and am re-assembling, but unfortunately haven't been timing the assembly runs exactly ... if I get a chance to benchmark raw and no-repeat assemblies against each other, I'll try to post results here.
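
        The repeat-filtering step described above could be sketched roughly as follows. This is only an illustration, not the exact procedure used in this thread: it assumes the reads have been exported to FASTA and BLASTed against a repeat library with tabular output (BLAST+ `-outfmt 6`), and the function names and both thresholds (`max_evalue`, `min_identity`) are hypothetical values chosen for the example.

```python
import csv
import io


def repeat_read_ids(blast_tab, max_evalue=1e-10, min_identity=80.0):
    """Collect query IDs with at least one strong hit in BLAST tabular output.

    Assumes the standard 12-column layout: column 1 is the query ID,
    column 3 the percent identity, column 11 the e-value.
    The default thresholds are illustrative assumptions only.
    """
    ids = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        identity = float(row[2])
        evalue = float(row[10])
        if identity >= min_identity and evalue <= max_evalue:
            ids.add(row[0])
    return ids


def filter_fasta(fasta_in, fasta_out, drop_ids):
    """Copy FASTA records whose ID is not in drop_ids; return (kept, dropped)."""
    kept = dropped = 0
    keep = False
    for line in fasta_in:
        if line.startswith(">"):
            parts = line[1:].split()
            read_id = parts[0] if parts else ""
            keep = read_id not in drop_ids
            if keep:
                kept += 1
            else:
                dropped += 1
        if keep:
            fasta_out.write(line)
    return kept, dropped


# Tiny in-memory demonstration with made-up reads and hits.
blast_hits = io.StringIO(
    "read1\trepA\t95.0\t200\t5\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "read2\trepB\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t30\n"
)
reads = io.StringIO(">read1 example\nACGT\n>read2\nGGCC\nTTAA\n")
filtered = io.StringIO()
counts = filter_fasta(reads, filtered, repeat_read_ids(blast_hits))
print(counts)
```

        With real data the in-memory `io.StringIO` objects would be replaced by opened files. How strict the thresholds should be is a judgment call: too loose and genuine low-copy reads are thrown away along with the repeats.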


        • #5
          Hi,

          I recently got some 454 transcriptome sequencing data and want to analyze it with CLC workbench. For the de novo assembly, I got hugely different results just by changing the minimum contig length, which confuses me for further analysis. Has anyone used the same software, and what are the recommendations?


          • #6
            Originally posted by AAWT
            Hi,

            I recently got some 454 transcriptome sequencing data and want to analyze it with CLC workbench. For the de novo assembly, I got hugely different results just by changing the minimum contig length, which confuses me for further analysis. Has anyone used the same software, and what are the recommendations?
            Well, first, please don't hijack threads; open a new one.

            Concerning your concern :-) ... if you raise the minimum contig length to something longer than the length of a substantial number of contigs in your assembly, then of course this influences the overall result.
            E.g. if you have a lot of very short fragments of 300bp and you raise the minimum contig length to 350bp, then you will lose a lot of contigs.
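
            To make the effect concrete, here is a minimal sketch of how a minimum-length cutoff changes the contig count. The length distribution is entirely made up for illustration, and `surviving_contigs` is a hypothetical helper, not part of CLC or any other assembler.

```python
def surviving_contigs(lengths, min_length):
    """Return (number of contigs, total bases) kept at a given length cutoff."""
    kept = [n for n in lengths if n >= min_length]
    return len(kept), sum(kept)


# Made-up assembly: many short 300bp fragments plus a few longer contigs.
lengths = [300] * 1000 + [800] * 50 + [1500] * 10

for cutoff in (100, 300, 350):
    count, bases = surviving_contigs(lengths, cutoff)
    print(f"min length {cutoff}bp: {count} contigs, {bases} bases")
```

            In this toy data set, moving the cutoff from 300bp to 350bp drops every 300bp fragment at once, so the summary statistics change sharply even though the underlying assembly is the same.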

            Did I get you right?

            hth, Sven


            • #7
              Originally posted by sklages
              Well, first, please don't hijack threads; open a new one.

              Concerning your concern :-) ... if you raise the minimum contig length to something longer than the length of a substantial number of contigs in your assembly, then of course this influences the overall result.
              E.g. if you have a lot of very short fragments of 300bp and you raise the minimum contig length to 350bp, then you will lose a lot of contigs.

              Did I get you right?

              hth, Sven
              Yes, losing some data is one concern. The other thing I did was a local BLAST of the de novo assembled contigs against an already available reference sequence. What I found is that many contigs aligned to the same reference unigene, which suggests that many contigs share the same sequence. So why is the same sequence included in so many contigs during de novo assembly? Does it mean that the assembly is very poor, or something else?


              • #8
                Originally posted by AAWT
                Yes, losing some data is one concern. The other thing I did was a local BLAST of the de novo assembled contigs against an already available reference sequence. What I found is that many contigs aligned to the same reference unigene, which suggests that many contigs share the same sequence. So why is the same sequence included in so many contigs during de novo assembly? Does it mean that the assembly is very poor, or something else?
                It does not necessarily mean that your contig sequences are identical;
                probably they are very similar, *almost* identical. Depending on the
                kind of assembler, these are put together or, in your case, not.
                CLC is not really a cDNA de novo assembler, and the quality of the
                results obtained may vary.

                And, did you trim your data (polyA, potential adaptors)? This will influence
                your assembly as well.
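
                As a rough illustration of the polyA part of that trimming, the sketch below removes a trailing run of A's from a read. It is a simplified assumption-laden example: `trim_polya_tail` and the `min_run` threshold of 10 are hypothetical, it ignores sequencing errors inside the tail and polyT at the start of reverse-strand reads, and adaptor removal (which needs the actual adaptor sequences) is not shown.

```python
import re


def trim_polya_tail(seq, min_run=10):
    """Remove a trailing run of at least min_run A's (upper or lower case).

    Reads with a shorter trailing run are returned unchanged.
    The min_run default is an illustrative assumption.
    """
    match = re.search(r"[Aa]{%d,}$" % min_run, seq)
    return seq[:match.start()] if match else seq


# Made-up reads: one with a 12-base polyA tail, one without.
for read in ("ACGTACGT" + "A" * 12, "ACGTAAAA"):
    print(read, "->", trim_polya_tail(read))
```

                A dedicated trimming tool would normally tolerate mismatches in the tail and handle adaptors at the same time; this only shows the basic idea.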

                Last but not least, to give you a kind of feeling for your dataset,
                try to use another assembler, at least as a "reference assembly",
                e.g. Roche's Newbler or MIRA.
                However, if your dataset is huge and the library is not normalised, you may
                run into problems with most straightforward assembly approaches.

                hth, Sven
