  • De novo assembly: raw data type & volume

    Hi,
    I'm trying to assemble 454 raw reads (gDNA) with Newbler 2.6. Can anyone tell me the maximum volume of raw data (in nt) that Newbler can take in, either in a single step or incrementally (given that it can reportedly assemble large genomes of up to 3 Gb)? Also, what proportion of shotgun to mate-paired reads should we use to get a better assembly?

  • #2
    I have done some assemblies with pretty large data sets, in the 40-50 Gb range. With the large and het options, I can get an assembly; without them, the runs simply never finish. By never, I mean that after 6 weeks of processing, the status files had not been updated for several weeks. I did these on a machine with 1 TB RAM. I tried various incremental assemblies and different parameters and essentially ended up in the same place as when I gave Newbler all the data at once. I didn't see any improvements with the CIO options.


    • #3
      Thanks Bob. Did you use the '-m' option or other advanced options? And did you also trim the dataset? I ran into trouble when I tried to feed in trimmed and split 454 mate-paired reads, because Newbler couldn't detect them as mate pairs. I don't see this problem with Illumina paired-end reads after trimming.

      And for a large eukaryotic genome, say 3 Gb in size, the 40-50 Gb dataset you used covers only about 13-17x of the whole genome, which might not be quite enough, whereas for a 300 Mb genome the same data would give about 133-167x! So, is there any rule for what coverage we should aim for initially when trying to assemble a large genome?
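
The coverage figures in this post come from dividing the total sequenced bases by the genome size. A minimal sketch of that arithmetic in Python follows; the function name `coverage` and the constants are illustrative only, and the calculation assumes raw bases with no allowance for trimming or filtered reads.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage = sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


GB = 1e9  # bases in one gigabase
MB = 1e6  # bases in one megabase

# Dataset sizes and genome sizes taken from the thread.
for data_gb in (40, 50):
    for label, genome in (("3 Gb", 3 * GB), ("300 Mb", 300 * MB)):
        depth = coverage(data_gb * GB, genome)
        print(f"{data_gb} Gb of reads over a {label} genome: {depth:.1f}x")
```

After quality and contaminant trimming the usable depth will be lower than these raw figures, so they are best read as upper bounds.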


      • #4
        Hi,
        Yes, we do quality and contaminant trimming. Newbler looks for the linker, so if you mean you are splitting the reads and removing the linker, that doesn't work. Or at least, it didn't the last time I tried it.
        As for rules of thumb, I always refer to the Broad's guidelines. They often don't work, but you have to start somewhere.
        And yes, we did try the -m and other options. I probably tried about 20-30 different combinations.
