  • de novo assembling, please help

    Dear all,

    I have just started analyzing my first Illumina GAII paired-end (PE) data, so please bear with me if I ask stupid questions. I have a dataset of multiplexed mitochondrial shotgun sequences. Because there is no reference sequence available, I used ABySS to assemble the sequences after demultiplexing, merging, and quality control. I got thousands, if not tens of thousands, of contigs from ABySS. The longest one is only 3 kb, and most of them are very short. What can I do next to assemble them into whole mitochondrial genomes?

    I read a paper (Perry et al, 2010 MolEcol) that describes filtering the ABySS output sequences by their coverage and by their similarity to a reference mtgenome; in their case, a sequence from the same species was available. They then used the selected contigs to assemble the final mtgenome, but no details were given. I also tried SOAPdenovo and got a "segmentation fault". I know it's probably my fault, but I doubt it can give me a fully assembled mtgenome anyway. So please help if you have any suggestions or can point me in the right direction.

    Many thanks in advance.
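
The coverage part of the filtering mentioned above can be sketched in a few lines of Python. This is only an illustration, not the procedure from Perry et al., which the paper does not spell out. It assumes that each contig header has the form `>id length kmer_coverage` (the layout ABySS is assumed to write here; check your own output), and the thresholds `min_length` and `min_mean_cov` are hypothetical values to be tuned per dataset. The similarity filter against a related mtgenome is not covered.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, k, min_length=500, min_mean_cov=20.0):
    """Keep contigs that are long enough and well covered.

    Assumes headers look like '>id length kmer_coverage' (an assumption
    about the assembler output). The thresholds are hypothetical defaults.
    Returns (id, length, mean_kmer_coverage) tuples, highest coverage first.
    """
    kept = []
    for header, seq in read_fasta(lines):
        fields = header.split()
        if len(fields) < 3:
            raise ValueError(f"unexpected header format: {header!r}")
        contig_id, kmer_cov = fields[0], float(fields[2])
        n_kmers = len(seq) - k + 1  # number of k-mers in the contig
        if n_kmers <= 0:
            continue
        mean_cov = kmer_cov / n_kmers
        if len(seq) >= min_length and mean_cov >= min_mean_cov:
            kept.append((contig_id, len(seq), mean_cov))
    return sorted(kept, key=lambda c: c[2], reverse=True)
```

Because `filter_contigs` accepts any iterable of lines, it can be given an open file handle for one sample's contig file, with `k` set to the k-mer size used for that assembly. Mitochondrial contigs are usually expected to stand out by much higher coverage than nuclear contamination, which is why the result is sorted by coverage.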

  • #2
    Gosh, you are asking the 'holy grail' question that everyone in the world is trying to figure out! What you got out of the assembler is exactly what we always get. Do not expect any short-read assembler to spit out perfect chromosomes. That is as unlikely as sending a new graduate student to the bench for the first time and expecting her to come up with a Nobel-winning result.

    At the step where you are, most people try the following things -

    i) play with ABySS parameters, such as k-mer length, and see whether the fragments get larger,
    ii) filter out reads that have already been assembled and then send the rest into SOAPdenovo (assuming memory size was the reason for the crash),
    iii) sequence libraries with multiple mate-pair sizes (say something short like 250 nt and something longer in the kb range).
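
For option i), one way to judge "whether the fragments get larger" is to compare simple summary statistics across runs. The sketch below is only an illustration: it assumes you have already collected the contig lengths from each run, and the helper names `n50`, `summarize`, and `best_k` are made up for this example.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(lengths):
    """Basic statistics for one assembly run."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


def best_k(runs):
    """Given {k: [contig lengths]}, return the k with the largest N50.

    Ties are broken in favor of the longer longest contig.
    """
    return max(runs, key=lambda k: (n50(runs[k]), max(runs[k], default=0)))
```

N50 is only a rough guide. For a genome of about 16 kb, the length of the longest contig is arguably just as informative, so it is worth looking at the full `summarize` output for each k rather than at a single number.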

    However, I would recommend something else. If this is a metagenomic sample with different mitochondrial chromosomes present at different frequencies, you may want to try some transcriptome assemblers. I explained the difference between genome assemblers and transcriptome assemblers in the last paragraph here (http://www.homolog.us/blogs/?p=158). If you follow my argument, you will find that metagenomes have more in common with transcriptomes than with genomes.

    Hope that helps.
    http://homolog.us


    • #3
      You said multiplexed, as in barcoded, right? So each barcoded sample is a single organism's mitochondria? If so, you should be fine trying to assemble each one individually.

      Are these animals, plants, yeast? My point is that there should be published mito genomes that are not too distant. I would therefore also try some reference-guided assemblies. Once you have one done well, it could be used to guide the assembly of your other samples (assuming they are related in some way). If the available complete mito genomes are too distant to help, you might also consider getting some long reads for at least one mito to help assemble it as your reference circle.


      • #4
        Samanta, maubp,

        Thanks a lot for all suggestions. I will try those methods.

        Yes, those are 96 barcoded samples in one GAII lane. They are fish mt-genomes, each of which should be around 16 kb. So there should be enough coverage for each sample, and assembly should be easy because the mt-genome is so small. Maybe my raw data are not good, or the mitochondrial sequences differ a lot between species. I tried BWA using closely related species as the reference, but didn't get many hits. I will mess around with different parameters and try again.
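
The "enough coverage" reasoning above can be checked with a back-of-the-envelope calculation: per-sample coverage is roughly (reads per sample × read length) / genome size. The sketch below is an assumption-laden illustration. The lane yield and read length in the usage note are hypothetical, not values from this thread, and `on_target_fraction` is an invented parameter for the share of reads that are actually mitochondrial.

```python
def expected_coverage(total_reads, read_length, n_samples, genome_size,
                      on_target_fraction=1.0):
    """Estimate mean per-sample coverage for an evenly pooled lane.

    total_reads        -- reads in the whole lane (hypothetical input)
    read_length        -- read length in bases
    n_samples          -- number of barcoded samples sharing the lane
    genome_size        -- target genome size in bases (about 16,000 here)
    on_target_fraction -- assumed share of reads that come from the target
    """
    reads_per_sample = total_reads / n_samples
    return reads_per_sample * on_target_fraction * read_length / genome_size
```

For example, with a hypothetical 9.6 million reads of 100 bases spread evenly over 96 samples, `expected_coverage(9_600_000, 100, 96, 16_000)` gives 625x per sample, and half that if only 50% of the reads are mitochondrial. Real pools are rarely even, so some samples may fall well below the mean.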
