  • Create longer contigs from transcriptome assembly

    Hi everyone,

    I am doing transcriptome assembly on short Illumina reads (both single-end and paired-end, with an average length of 100 bp).
    I have tried different transcriptome assemblers, such as MIRA, Velvet Oases, and SOAPdenovo-Trans.
    The average contig length I get varies from 400 to 1000 bp.
    I tried different parameters, such as different k-mer sizes and minimum contig sizes, but the results did not improve much.
    I am interested in making much longer contigs (a few thousand bp), so I was wondering whether there is any way to increase the current contig length.
    Also, I would like to know why this length varies so much between different transcriptome assemblers (there is a huge difference between an average of 400 bp and one of 1000 bp).
    I know that Velvet and SOAPdenovo are based on de Bruijn graphs, while MIRA is OLC-based.

    I would appreciate it a lot if someone could share similar experience with me.
    Thank you very much,
    Best Regards,
    Natasha

  • #2
    So ... what average contig length do you expect? And why? While the transcriptome projects that come through my hands do generate some transcripts in the 'few thousands' range, most of them are much shorter.

    The average length *may* vary because the programs keep different numbers of short transcripts. If one program keeps all transcripts while another throws away transcripts shorter than 200 bases, then the averages will differ even if the longest transcripts do not. Really, you cannot say much about average lengths unless you also know the shortest and longest lengths and the distribution.
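
To illustrate the point about length cutoffs, here is a minimal Python sketch. The contig lengths are made-up numbers, and `length_summary` and its `min_len` parameter are hypothetical names used only for this example; no assembler necessarily filters this way. It only shows that dropping short contigs raises the mean while the longest contig stays the same.

```python
from statistics import mean

def length_summary(lengths, min_len=0):
    """Summarize contig lengths, keeping only contigs of at least min_len bases."""
    kept = [n for n in lengths if n >= min_len]
    return {
        "count": len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": mean(kept),
    }

# Made-up contig lengths (bp), purely for illustration.
lengths = [150, 180, 250, 400, 900, 3200]

all_contigs = length_summary(lengths)            # program that keeps everything
filtered = length_summary(lengths, min_len=200)  # program that drops contigs < 200 bp

print(all_contigs)
print(filtered)
```

With these numbers the filtered set has a clearly higher mean even though both sets share the same longest contig, which is why the mean alone says little without the minimum, maximum, and distribution.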

    • #3
      Also, if you are going to do de novo transcriptome assembly, then you really owe it to yourself to try out Trinity instead of using non-transcriptome assemblers.

      • #4
        The length of your assembled contigs will be greatly affected by the expression level of the genes you're assembling. Even in very deeply sequenced samples, and with replicates, it's going to be very hard to assemble very many genes from TSS to polyA, or even from start to stop codon. Simple statistics like N50 or mean length just don't mean much for transcriptomes.

        You need to do some sort of orthology assignment to get an idea of how complete your assembly is, or how one assembly compares to another. If you're going for simple statistics about your assembly, I'd much rather just look at the number of transcripts >1 kbp than at N50 or average length, because it's usually in the 500-1000 bp range that you start getting meaningful information for downstream analysis.

        And to try to answer your question about how to improve assembly length, I would just say try trans-ABySS (which uses a multiple k-mer approach and might be the best assembler in terms of completeness) and Trinity (which does a nice job with length and ease of downstream analysis). From your use of Velvet Oases, it sounds like you're doing this on microbes, but you may still find success with those two.
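
For comparing the simple statistics mentioned above, here is a rough Python sketch that computes N50 next to the number of transcripts longer than 1 kbp. The lengths are made up, and `n50` and `count_longer_than` are hypothetical helper names for this example only; in practice the lengths would come from your own assembly FASTA.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together hold at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    raise ValueError("no lengths given")

def count_longer_than(lengths, threshold=1000):
    """Count transcripts strictly longer than threshold bases."""
    return sum(1 for n in lengths if n > threshold)

# Made-up transcript lengths (bp), purely for illustration.
lengths = [300, 450, 700, 1200, 2500, 4100]

print("N50:", n50(lengths))
print("transcripts > 1 kbp:", count_longer_than(lengths))
```

Neither number says anything about completeness by itself, but the count above 1 kbp is at least not skewed by how many short fragments an assembler chooses to report.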
