  • cli
    Member
    • Mar 2011
    • 29

    de novo assembling, please help

    Dear all,

    I have just started analyzing my first Illumina GAII paired-end data, so please bear with me if I ask stupid questions. I have a dataset of multiplexed mitochondrial shotgun sequences. Because no reference sequence is available, I used ABySS to assemble the reads after demultiplexing, merging, and quality control. ABySS output thousands, if not tens of thousands, of contigs. The longest is only 3 kb, and most of them are very short. What can I do next to assemble them into whole mitochondrial genomes?

    I read a paper (Perry et al., 2010, MolEcol) that describes filtering the ABySS output contigs by their coverage and by their similarity to a reference mt-genome (a sequence from the same species was available in their case). They then used the selected contigs to assemble the final mt-genome, but no details were given. I also tried SOAPdenovo and got a "segmentation fault". It's probably my fault, but I doubt it would give me a fully assembled mt-genome anyway. So please help if you have any suggestions or can point me in some direction.

    Many thanks in advance.
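
The coverage part of the filtering described above could be sketched roughly as follows. This is only a minimal sketch, not the method from the paper (which gave no details). It assumes the contig FASTA headers have the form `>id length summed_kmer_coverage`, which is how I understand ABySS writes them, so please check your own file first. The function names and the thresholds `min_length` and `min_coverage` are hypothetical, and the similarity-to-reference step is not covered here.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def mean_kmer_coverage(header: str, k: int) -> float:
    """Mean k-mer coverage of a contig.

    ASSUMPTION: the header is "<id> <length> <summed k-mer coverage>".
    """
    fields = header.split()
    length, summed_coverage = int(fields[1]), int(fields[2])
    n_kmers = length - k + 1  # number of k-mers in a contig of this length
    return summed_coverage / n_kmers if n_kmers > 0 else 0.0


def filter_contigs(lines: Iterable[str], k: int,
                   min_length: int, min_coverage: float) -> List[Tuple[str, str]]:
    """Keep contigs that are long enough and covered deeply enough."""
    kept = []
    for header, seq in read_fasta(lines):
        if len(seq) >= min_length and mean_kmer_coverage(header, k) >= min_coverage:
            kept.append((header, seq))
    return kept
```

To use it on a real file, something like `filter_contigs(open("contigs.fa"), k=31, min_length=500, min_coverage=20)` would do, where the file name and all three values are placeholders to tune. Since mitochondrial DNA is usually present in many more copies than nuclear DNA, contigs with unusually high coverage are the more plausible mitochondrial candidates.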
  • samanta
    Senior Member
    • Feb 2010
    • 108

    #2
    Gosh, you are asking the 'holy grail' question that everyone in the world is trying to figure out! What you got out of the assembler is exactly what we always get. Do not expect any short-read assembler to spit out perfect chromosomes. That is as unlikely as sending a new graduate student to the bench for the first time and expecting her to come up with a Nobel-winning result.

    At the step where you are, most people try the following things -

    i) play with ABySS parameters, such as the k-mer length, and see whether the fragments get larger,
    ii) filter out the reads that have already been assembled and then send the rest into SOAPdenovo (assuming memory size was the reason for the crash),
    iii) sequence libraries with multiple mate-pair insert sizes (say something short like 250 nt and something longer in the kb range).
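
For option i), one way to judge "whether the fragments get larger" is to compare contig count, longest contig, and N50 across the k values you tried. Below is a minimal sketch. It assumes you have already collected the contig lengths from each run into a dictionary keyed by k; the function names are made up for illustration, and N50 is only one of several reasonable ways to rank the runs.

```python
from typing import Dict, List, Tuple


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def summarize(runs: Dict[int, List[int]]) -> List[Tuple[int, int, int, int]]:
    """Return (k, number of contigs, longest contig, N50) for each run, sorted by k."""
    rows = []
    for k in sorted(runs):
        lengths = runs[k]
        rows.append((k, len(lengths), max(lengths, default=0), n50(lengths)))
    return rows


def best_k(runs: Dict[int, List[int]]) -> int:
    """Pick the k whose assembly has the highest N50 (ties go to the smaller k)."""
    return max(sorted(runs), key=lambda k: n50(runs[k]))
```

For a mitochondrial genome, the longest contig is probably the more telling number: a single contig close to the expected genome size matters more than a good N50 over thousands of fragments.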

    However, I would recommend something else. If this is a metagenomic sample with different mitochondrial chromosomes present at different frequencies, you may try out some transcriptome assemblers. I explained the difference between genome assemblers and transcriptome assemblers in the last paragraph here (http://www.homolog.us/blogs/?p=158). If you follow my argument, you will find that metagenomes have more in common with transcriptomes than with genomes.

    Hope that helps.
    http://homolog.us


    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      You said multiplexed, as in barcoded, right? So each barcoded sample is a single organism's mitochondria? If so, you should be fine trying to assemble each one individually.

      Are these animals, plants, yeast? My point is that there should be published mito genomes which are not too distant. I would therefore also try some reference-guided assemblies. Once you have one done well, it could be used to guide the assembly of your other samples (assuming they are related in some way). If the available complete mito genomes are too distant to help, you might also consider getting some long reads for at least one sample to help assemble it as your reference circle.


      • cli
        Member
        • Mar 2011
        • 29

        #4
        Samanta, maubp,

        Thanks a lot for all the suggestions. I will try those methods.

        Yes, those are 96 barcoded samples in one GAII lane. They are fish mt-genomes, and each should be around 16 kb. So there should be enough coverage for each sample, and the small size of the mt-genome should make it easy to assemble. Maybe my raw data are not good, or the mitochondrial sequences differ a lot between species. I tried BWA with closely related species as the reference, but didn't get many hits. I will mess around with different parameters and try again.
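
The "enough coverage" expectation above can be checked with a quick back-of-the-envelope calculation. This is just a sketch: the 96 samples and the 16 kb genome size come from the post, while the read count, read length, and `mito_fraction` (the share of reads that are really mitochondrial) are hypothetical placeholders. It also assumes the samples were pooled in equal amounts, which is rarely exactly true.

```python
def expected_coverage(total_reads: int, read_length: int, n_samples: int,
                      genome_size: int, mito_fraction: float = 1.0) -> float:
    """Expected average depth per sample, assuming equal pooling across samples.

    mito_fraction is the (assumed) fraction of reads that derive from the
    mitochondrial genome rather than nuclear DNA or contamination.
    """
    bases_per_sample = total_reads * read_length * mito_fraction / n_samples
    return bases_per_sample / genome_size


# Hypothetical lane: 9.6 million usable reads of 100 nt after quality control.
depth_all_mito = expected_coverage(9_600_000, 100, 96, 16_000)
depth_ten_percent = expected_coverage(9_600_000, 100, 96, 16_000, mito_fraction=0.1)
print(f"all reads mitochondrial: {depth_all_mito:.1f}x")
print(f"10% of reads mitochondrial: {depth_ten_percent:.1f}x")
```

Replacing the placeholders with the real per-barcode read counts after demultiplexing would show whether some samples are much shallower than the lane average, which could explain fragmented assemblies for those samples.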
