  • de novo assembling, please help

    Dear all,

    I have just started analyzing my first Illumina GAII paired-end data, so please bear with me if I ask stupid questions. I have a dataset of multiplexed mitochondrial shotgun sequences. Because there is no reference sequence available, I used ABySS to assemble the sequences after demultiplexing, merging and quality control. I got thousands, if not tens of thousands, of contigs out of ABySS. The longest one is only 3 kb and most of them are very short. What can I do next to assemble them into whole mitochondrial genomes?

    I read a paper (Perry et al, 2010 MolEcol) that describes filtering the ABySS output sequences by their coverage and their similarity to a reference mtgenome (a sequence from the same species was available in their case). They then used the selected contigs to assemble the final mtgenome, but no details were given. I also tried SOAPdenovo and got a "segmentation fault". I know it's probably my fault, but I doubt it would give me a fully assembled mtgenome anyway. So please help if you have any suggestions or can point me in the right direction.
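For anyone wanting to try the coverage-filtering idea, here is a minimal sketch of what it might look like. It is not the method from the paper, which gives no details. It assumes each FASTA header has the form `<id> <length> <summed k-mer coverage>` (my understanding of ABySS contig output, so please check your own files), and the function names and thresholds are made up for illustration. The similarity filter against a reference is not covered here.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines: Iterable[str], k: int, min_len: int,
                   min_cov: float) -> List[Tuple[str, int, float]]:
    """Keep contigs that are long enough and have high mean k-mer coverage.

    ASSUMPTION: headers look like '<id> <length> <summed k-mer coverage>'.
    Returns (contig_id, length, mean_kmer_coverage) for each kept contig.
    """
    kept = []
    for header, seq in read_fasta(lines):
        fields = header.split()
        contig_id, kmer_cov_sum = fields[0], float(fields[2])
        length = len(seq)
        n_kmers = length - k + 1  # number of k-mers in the contig
        if n_kmers <= 0:
            continue  # contig shorter than k, cannot compute coverage
        mean_cov = kmer_cov_sum / n_kmers
        if length >= min_len and mean_cov >= min_cov:
            kept.append((contig_id, length, mean_cov))
    return kept
```

Usage would be something like `filter_contigs(open("contigs.fa"), k=31, min_len=200, min_cov=50.0)`, where the file name and cutoffs are placeholders to tune by looking at the coverage distribution of your own contigs.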

    Many thanks in advance.

  • #2
    Gosh, you are asking the 'holy grail' question that everyone in the world is trying to figure out!! What you got out of the assembler is exactly what we always get. Do not expect any short-read assembler to spit out perfect chromosomes. That is as unlikely as sending a new graduate student to the bench for the first time and expecting her to come up with a Nobel-winning result.

    At the step where you are, most people try the following things -

    i) play with ABySS parameters, such as K-mer lengths, and see whether the fragments get larger,
    ii) filter out reads that already assembled and then send the rest into SOAPdenovo (assuming memory size was the reason for crash),
    iii) sequence libraries with multiple mate pair sizes (say something short like 250 nt and something longer in kBs).
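For option (i), it helps to have a consistent way to judge whether the fragments actually got larger between runs. A minimal sketch, assuming you have already collected the contig lengths from each run into a dictionary keyed by k (the `summarize` helper and the example lengths below are hypothetical):

```python
from typing import Dict, List, Tuple


def n50(lengths: List[int]) -> int:
    """Return N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def summarize(runs: Dict[int, List[int]]) -> List[Tuple[int, int, int, int]]:
    """Return one (k, n_contigs, longest_contig, N50) row per run, sorted by k."""
    return [
        (k, len(lengths), max(lengths, default=0), n50(lengths))
        for k, lengths in sorted(runs.items())
    ]


if __name__ == "__main__":
    # Hypothetical contig lengths from two assemblies with different k.
    example_runs = {25: [500, 400, 300, 200], 31: [3000, 1200, 800]}
    for row in summarize(example_runs):
        print("k=%d contigs=%d longest=%d N50=%d" % row)
```

N50 and longest contig are only rough guides; for a roughly 16 kb target, a run that yields one or two contigs near that size matters more than the overall statistics.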

    However, I would recommend something else. If this is a metagenomic sample with different mitochondrial chromosomes present at different frequencies, you may want to try some transcriptome assemblers. I explained the difference between genome assemblers and transcriptome assemblers in the last paragraph here (http://www.homolog.us/blogs/?p=158). If you follow my argument, you will find that metagenomes have more in common with transcriptomes than with genomes.

    Hope that helps.
    http://homolog.us

    • #3
      You said multiplexed - as in barcoded, right? So each barcoded sample is a single organism's mitochondria? If so, you should be fine trying to assemble each one individually.

      Are these animals, plants, yeast? My point is that there should be published mito genomes which are not too distant. I would therefore also try some reference-guided assemblies. Once you have one done well, it could be used to guide the assembly of your other samples (assuming they are related in some way). You might also consider getting some long reads for at least one mito to help assemble it as your reference circle, if the available complete mito genomes are too distant to help.

      • #4
        Samanta, maubp,

        Thanks a lot for all suggestions. I will try those methods.

        Yes, those are 96 barcoded samples in one GAII lane. They are fish mt-genomes, each of which should be around 16 kb. So there should be enough coverage for each sample, and assembly should be easy given the small size of the mt-genome. Maybe my raw data are not good, or the mitochondrial sequences differ a lot between species. I tried BWA using closely related species as the reference, but didn't get many hits. I will mess around with different parameters and try again.
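To put a rough number on "enough coverage", here is a back-of-the-envelope sketch. Only the 96 samples and the ~16 kb genome size come from this thread; the read count, read length and usable fraction are hypothetical placeholders, so plug in the real values from your run.

```python
def expected_coverage(total_reads: int, read_length: int, n_samples: int,
                      genome_size: int, usable_fraction: float = 1.0) -> float:
    """Estimate mean per-sample depth, assuming reads are split evenly
    across samples and `usable_fraction` of them come from the target genome."""
    bases_per_sample = total_reads * read_length * usable_fraction / n_samples
    return bases_per_sample / genome_size


if __name__ == "__main__":
    # HYPOTHETICAL lane: 9.6 million reads of 100 nt, half of them mitochondrial.
    depth = expected_coverage(
        total_reads=9_600_000,
        read_length=100,
        n_samples=96,
        genome_size=16_000,
        usable_fraction=0.5,
    )
    print("expected depth per sample: %.1fx" % depth)
```

Barcodes are rarely represented evenly, so checking the actual read count per barcode after demultiplexing is a better guide than the lane-wide average.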
