  • P-Richmond
    Member
    • Oct 2010
    • 13

    Genome Assembly

    Hello,
    I am trying to assemble a genome (estimated at ~28 Mb) and I have the following types of sequencing data:
    454 reads (~3 million reads)
    Illumina single-end 50 bp (~45 million reads)
    Illumina paired-end 50 bp, 2 kb insert (~40 million pairs)
    Illumina single-end RNA-seq, multiple conditions pooled (~50 million reads)


    I am looking for assembly software that can take in multiple different data types and create a single assembled genome. Previously this has been done by using a different assembler for each data type and then merging the assemblies. However, I imagine that a single assembler given all the sequencing datasets at once would produce a better assembly than simply merging separate assemblies together.

    If this doesn't exist, or is discussed in another thread then please point me in the right direction.

    Cheers,
    Phil
  • Jeremy
    Senior Member
    • Nov 2009
    • 190

    #2
    Is that a typo, or is the genome really only about 28 Mb? A genome that small is considered pretty easy as far as assembly goes.
    With that much data you should have several hundred fold coverage across the DNA data sets combined. I would leave the RNA-seq data out of the genome assembly.
    Anyway, this thread should help you.
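    A rough check of these coverage figures is total sequenced bases divided by genome size. The sketch below does that arithmetic for the three DNA libraries in the question. The 454 read length is not given in the thread, so the 400 bp used here is an assumption; each paired-end pair is counted as two 50 bp reads.

```python
# Fold coverage (sequencing depth) = total sequenced bases / genome size.
GENOME_SIZE_BP = 28_000_000  # ~28 Mb, the estimate given in the question

# (name, number of reads, read length in bp)
# ASSUMPTION: the 454 read length (400 bp) is not stated in the thread;
# it is a typical value used here only for illustration.
DATASETS = [
    ("454", 3_000_000, 400),
    ("Illumina SE 50 bp", 45_000_000, 50),
    ("Illumina PE 50 bp (2 kb insert)", 2 * 40_000_000, 50),  # 40M pairs = 80M reads
]


def coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Return fold coverage: number of reads * read length / genome size."""
    return n_reads * read_len / genome_size


def coverage_table(datasets, genome_size):
    """Map each dataset name to its fold coverage."""
    return {name: coverage(n, length, genome_size) for name, n, length in datasets}


if __name__ == "__main__":
    table = coverage_table(DATASETS, GENOME_SIZE_BP)
    for name, cov in table.items():
        print(f"{name:35s} {cov:6.1f}x")
    print(f"{'combined':35s} {sum(table.values()):6.1f}x")
```

    Under those assumptions the libraries give roughly 43x, 80x and 143x on their own and about 266x together.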


    • Wallysb01
      Senior Member
      • Feb 2011
      • 286

      #3
      First off, the thread Jeremy linked to is very good. I've found some guidance in that same place.

      Personally, my first strategy would probably be to leave the 454 data alone for now, since you have plenty of Illumina coverage. First assemble the Illumina data with something like ABySS or SOAPdenovo, including scaffolding. Then throw in the 454 data to fill gaps (BaseClear has a stand-alone tool that I believe accepts 454 reads).
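      One simple way to judge whether the 454 gap-filling step helped is to count the scaffold gaps (runs of N that scaffolders insert between contigs) before and after it. This is a hypothetical helper, not part of ABySS, SOAPdenovo or the BaseClear tool, and it assumes the scaffolds are in plain FASTA with gaps written as runs of N.

```python
import re

# Scaffolders typically represent an unresolved gap between contigs as a run of Ns.
GAP_RE = re.compile(r"[Nn]+")


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines.

    The name is the first word of the header; wrapped sequence lines are joined.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_stats(records):
    """Return (number of gaps, total gap length in bp) across all scaffolds."""
    n_gaps = total_bp = 0
    for _name, seq in records:
        for match in GAP_RE.finditer(seq):
            n_gaps += 1
            total_bp += match.end() - match.start()
    return n_gaps, total_bp
```

      For example, `gap_stats(read_fasta(open("scaffolds.fa")))` (the file name is hypothetical) returns the gap count and total gap length. Fewer and shorter gaps after gap filling suggest the 454 reads closed some of them, although this says nothing about whether the filled-in sequence is correct.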

      If that doesn't work out as you need, which I doubt, you could assemble only contigs from the Illumina and 454 data separately and merge them with something like CAP3. Then scaffold and gap-fill again using stand-alone programs.
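      CAP3 merges sequences by finding overlaps between them and building a consensus. The toy sketch below only illustrates that core idea: two contigs are joined when the end of one exactly matches the start of the other (or of its reverse complement) over at least `min_len` bases. It is not CAP3's algorithm, which tolerates sequencing errors within overlaps, handles many sequences at once and computes a consensus; the function names and the 20 bp default are illustrative assumptions.

```python
# Complement table for DNA; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest exact suffix of `a` equal to a prefix of `b`,
    or 0 if there is no such overlap of at least `min_len` bases."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def merge_contigs(a: str, b: str, min_len: int = 20):
    """Join contig `b` (or its reverse complement) onto the end of contig `a`
    using an exact suffix-prefix overlap; return None if neither overlaps.

    Only the orientation "a followed by b" is tried, for brevity."""
    for candidate in (b, revcomp(b)):
        k = longest_overlap(a, candidate, min_len)
        if k:
            return a + candidate[k:]
    return None
```

      For instance, `merge_contigs("ACGTACGTTTGACCA", "TTGACCAGGGCAT", min_len=5)` joins the two on their 7 bp overlap and returns `"ACGTACGTTTGACCAGGGCAT"`.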

      Alternatively, you could give all types of data to Ray and assemble them together. Ideally, you'd try all three approaches and compare what you get. Don't just trust simple stats like N50 or NG50. I'd suggest aligning your genome assemblies to whatever is the most closely related species with a high-quality genome and visualizing the alignments somehow. BWA-SW could help you with this, as could something like LASTZ or MUMmer. With a genome that small you should be able to get a decent sense of how the assembly is going just by scrolling along the alignments in IGV and checking for any sort of funny business (yes, that's the technical term).
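      For reference, N50 is the contig length at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size; NG50 is the same but measured against the (estimated) genome size, so assemblies of different total size can be compared fairly. A minimal sketch:

```python
def nx_stat(lengths, total, fraction=0.5):
    """Contig length at which the running sum of lengths, taken from longest
    to shortest, first reaches `fraction * total`; None if it never does."""
    target = fraction * total
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def n50(lengths):
    """N50: measured against the total assembly size."""
    return nx_stat(lengths, sum(lengths))


def ng50(lengths, genome_size):
    """NG50: measured against an (estimated) genome size instead."""
    return nx_stat(lengths, genome_size)
```

      Because both statistics depend only on lengths, wrongly joining two contigs makes them look better: `n50([100, 80, 50, 30, 20, 10])` is 80, but merging the two longest contigs into one 180 bp contig gives `n50([180, 50, 30, 20, 10])` of 180. That is why checking the assembly against a related genome matters.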

      Ignore the RNA-seq data until you have a genome that you like, then align the reads to that genome to aid in the annotation process. You could also de novo assemble the RNA-seq reads into transcripts and align those to the genome, or do both. MAKER is a nice program for guiding you through annotating your genome. Incorporating RNA-seq into a genome assembly could prove useful one day, but it's pretty difficult to do now. That said, the RNA-seq alignments and/or de novo assembled transcript alignments will also help you judge the quality of your assemblies: gaps or misassemblies in the genome will interfere with both transcript and raw read alignments, which you can also visualize in IGV. So you may want to carry every major version of your assembly through this far to see which ones contain the most complete genes.
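      One rough proxy for "which assembly contains the most complete genes" is the fraction of each assembled transcript covered by its best single alignment to the genome: a transcript split by a gap or misassembly ends up in several shorter pieces and scores low. The sketch assumes the transcript-to-genome alignments were produced by a splice-aware aligner and saved in PAF format (tab-separated; columns 1-4 are query name, query length, query start, query end); the choice of aligner, the function names and the 90% threshold are assumptions, not something from the thread.

```python
def best_coverage(paf_lines):
    """Map each transcript to the largest fraction of its length covered by
    any single alignment record in PAF format.

    PAF columns 1-4 are: query name, query length, query start, query end
    (0-based, end-exclusive). Lines with fewer than 12 columns are skipped.
    """
    best = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        name = fields[0]
        q_len, q_start, q_end = int(fields[1]), int(fields[2]), int(fields[3])
        frac = (q_end - q_start) / q_len
        best[name] = max(best.get(name, 0.0), frac)
    return best


def count_complete(best, threshold=0.9):
    """Count transcripts whose best single alignment covers >= threshold."""
    return sum(1 for frac in best.values() if frac >= threshold)
```

      Comparing `count_complete` across assembly versions gives one crude way to rank them; it complements, rather than replaces, looking at the alignments in IGV.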

      Good luck!
      Last edited by Wallysb01; 08-06-2012, 11:43 PM.
