  • de novo assembling, please help

    Dear all,

    I have just started analyzing my first Illumina GAII paired-end data, so please bear with me if I ask stupid questions. I have a dataset of multiplexed mitochondrial shotgun sequences. Because there is no reference sequence available, I used ABySS to assemble the sequences after demultiplexing, merging and quality control. I got thousands, if not tens of thousands, of contigs from ABySS. The longest one is only 3 kb and most of them are very short. What can I do next to assemble them into whole mitochondrial genomes?

    I read a paper (Perry et al, 2010 MolEcol) that describes filtering the ABySS output sequences by their coverage and by their similarity to a reference mt-genome, which in their case was available for the same species. They then used the selected contigs to assemble the final mt-genome, but no details were given. I also tried SOAPdenovo and got a "segmentation fault". I know it's probably my fault, but I doubt it would give me a fully assembled mt-genome anyway. So please help if you have any suggestions or can point me in the right direction.
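
A minimal sketch of the coverage half of that filtering idea, not the procedure from the paper (which, as noted, gives no details). It assumes each contig header has the layout `>id length total_kmer_coverage`, which should be checked against your actual ABySS output; the names `filter_contigs`, `min_len`, `min_cov` and the threshold values are hypothetical. The similarity filter against a reference is not covered here.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_contigs(lines: Iterable[str], k: int,
                   min_len: int = 200,
                   min_cov: float = 20.0) -> List[Tuple[str, int, float]]:
    """Keep contigs that are long enough and have high mean k-mer coverage.

    Assumption: header fields are "<id> <length> <total k-mer coverage>".
    Mean coverage is estimated as total coverage / number of k-mers.
    """
    kept = []
    for header, seq in parse_fasta(lines):
        fields = header.split()
        if len(fields) < 3:
            continue  # header does not match the assumed layout
        length = len(seq)
        n_kmers = max(length - k + 1, 1)
        mean_cov = float(fields[2]) / n_kmers
        if length >= min_len and mean_cov >= min_cov:
            kept.append((fields[0], length, mean_cov))
    return kept


if __name__ == "__main__":
    demo = [">0 10 500", "ACGTACGTAC", ">1 4 8", "ACGT"]
    for contig_id, length, cov in filter_contigs(demo, k=3, min_len=5):
        print(contig_id, length, round(cov, 1))
```

With real data you would pass an open file handle instead of the `demo` list. Mitochondrial contigs usually sit at much higher coverage than nuclear contamination, so a histogram of the mean coverage values is a reasonable way to pick `min_cov`.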

    Many thanks in advance.

  • #2
    Gosh, you are asking the 'holy grail' question that everyone in the world is trying to figure out! What you got out of the assembler is exactly what we always get. Do not expect any short-read assembler to spit out perfect chromosomes. That is as unlikely as sending a new graduate student to the bench for the first time and expecting her to come up with a Nobel-winning result.

    At the step where you are, most people try the following things -

    i) play with ABySS parameters, such as k-mer length, and see whether the fragments get larger,
    ii) filter out reads that have already assembled and then send the rest into SOAPdenovo (assuming memory size was the reason for the crash),
    iii) sequence libraries with multiple mate-pair sizes (say something short like 250 nt and something longer, in the kb range).
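
For option (i), one way to judge "whether the fragments get larger" is to compare contig count, longest contig and N50 across runs. A rough sketch, assuming you have already collected the contig lengths from each run into a dictionary keyed by k; the function names and the example lengths are made up for illustration.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the contig length at which the running total of
    lengths, sorted longest first, reaches half of the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(runs: Dict[int, List[int]]) -> Dict[int, Dict[str, int]]:
    """Summarize each run (keyed by k-mer length) with basic contiguity stats."""
    return {
        k: {
            "contigs": len(lengths),
            "longest": max(lengths, default=0),
            "n50": n50(lengths),
        }
        for k, lengths in runs.items()
    }


if __name__ == "__main__":
    # Hypothetical contig lengths from two runs with different k.
    runs = {25: [3000, 500, 400, 100], 31: [9000, 4000, 200]}
    for k, stats in sorted(summarize(runs).items()):
        print(k, stats)
```

For a ~16 kb circular genome, the run to prefer is the one where a few contigs add up to roughly the expected size, not simply the one with the highest N50.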

    However, I would recommend something else. If this is a metagenomic sample with different mitochondrial chromosomes present at different frequencies, you may want to try some transcriptome assemblers. I explained the difference between genome assemblers and transcriptome assemblers in the last paragraph here (http://www.homolog.us/blogs/?p=158). If you follow my argument, you will find that metagenomes have more in common with transcriptomes than with genomes.

    Hope that helps.
    http://homolog.us


    • #3
      You said multiplexed, as in barcoded, right? So each barcoded sample is a single organism's mitochondria? If so, you should be fine trying to assemble each one individually.

      Are these animals, plants, yeast? My point is that there should be published mito genomes which are not too distant. I would therefore also try some reference-guided assemblies. Once you have one done well, it could be used to guide the assembly of your other samples (assuming they are related in some way). If the available complete mito genomes are too distant to help, you might also consider getting some long reads for at least one mito to help assemble it as your reference circle.


      • #4
        Samanta, maubp,

        Thanks a lot for all suggestions. I will try those methods.

        Yes, those are 96 barcoded samples in one GAII lane. They are fish mt-genomes, each of which should be around 16 kb. So there should be enough coverage for each sample, and assembly should be easy because the mt-genome is so small. Maybe my raw data are not good, or the mitochondrial sequences differ a lot between species. I tried BWA using closely related species as the reference, but didn't get many hits. I will mess around with different parameters and try again.
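
The "enough coverage" estimate can be checked with quick arithmetic. A back-of-the-envelope sketch: only the 96 samples and the ~16 kb genome size come from the post; the read count, read length and mitochondrial fraction below are placeholder assumptions to be replaced with the real numbers from the run, and the calculation assumes reads are spread evenly across barcodes.

```python
def expected_coverage(total_reads: float, read_len: int, n_samples: int,
                      genome_size: int, mito_fraction: float = 1.0) -> float:
    """Estimate mean per-sample depth, assuming reads are split evenly
    across barcodes and `mito_fraction` of them are truly mitochondrial."""
    bases_per_sample = total_reads * read_len / n_samples
    return bases_per_sample * mito_fraction / genome_size


if __name__ == "__main__":
    # From the post: 96 samples, ~16 kb genomes.
    # Assumed placeholders: 20 million reads of 75 bp in the lane.
    best_case = expected_coverage(20e6, 75, 96, 16_000)
    # Assumed placeholder: only 5% of reads are mitochondrial.
    pessimistic = expected_coverage(20e6, 75, 96, 16_000, mito_fraction=0.05)
    print(f"all reads mitochondrial: {best_case:.0f}x")
    print(f"5% mitochondrial:        {pessimistic:.0f}x")
```

Counting reads per barcode after demultiplexing shows whether the even-split assumption holds; a few samples with very low read counts would explain fragmented assemblies for those samples even if the lane as a whole looks deep enough.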
