  • NatashaPavlovikj
    Junior Member
    • Feb 2013
    • 8

    Create longer contigs from transcriptome assembly

    Hi everyone,

    I am doing transcriptome assembly on short Illumina reads (both single-end and paired-end, with an average length of 100 bp).
    I have used several transcriptome assemblers, such as MIRA, Velvet Oases and SOAPdenovoTrans.
    The average contig length I get varies from 400 to 1000 bp.
    I tried different parameters, such as different k-mer sizes and minimum contig sizes, but the results do not improve much.
    I am interested in producing much longer contigs (a few thousand bp), so I was wondering whether there is any way to increase the current contig length.
    I would also like to know why this length varies so much between transcriptome assemblers (there is a huge difference between 400 and 1000 bp).
    I know that Velvet and SOAPdenovo are based on de Bruijn graphs, while MIRA is OLC-based (overlap-layout-consensus).
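    To make the de Bruijn idea concrete, here is a minimal Python sketch, not how Velvet, Oases or SOAPdenovoTrans are actually implemented: every k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs are read off as maximal non-branching paths. The function names and toy reads are hypothetical, strands are ignored, and isolated cycles are not reported. In this toy example the 3-bp sequence CGT occurs twice; with k = 4 it becomes a shared node that breaks the contig, while with k = 5 it no longer causes a branch.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out every maximal non-branching path as a contig (isolated cycles are skipped)."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def is_simple(node):
        # Exactly one way in and one way out: a contig can be extended through it.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if is_simple(start):
            continue
        # Every reported path starts at a source, sink or branching node.
        for nxt in graph.get(start, ()):
            contig, current = start + nxt[-1], nxt
            while is_simple(current):
                current = next(iter(graph[current]))
                contig += current[-1]
            contigs.append(contig)
    return sorted(contigs)


reads = ["AACGTGGC", "GTGGCGTTT"]  # toy overlapping, error-free reads
print(unitigs(build_de_bruijn(reads, 4)))  # ['AACGT', 'CGTGGCGT', 'CGTTT']
print(unitigs(build_de_bruijn(reads, 5)))  # ['AACGTGGCGTTT']
```

    The trade-off is that a larger k needs more coverage, because every k-mer along a transcript must actually be observed in the reads; with too little coverage the graph breaks apart instead.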

    I would really appreciate it if someone could share a similar experience with me.
    Thank you very much,
    Best Regards,
    Natasha
  • westerman
    Rick Westerman
    • Jun 2008
    • 1104

    #2
    So ... what average contig length do you expect? And why? While the transcriptome projects that come through my hands do generate some transcripts in the 'few thousands', most of them are much shorter.

    The average length *may* vary because of how many short transcripts each program keeps. If one program keeps all transcripts while another throws away transcripts shorter than 200 bases, then your average will vary even if the longest transcripts do not. Really, you cannot say much about average lengths unless you also know the shortest and longest lengths and the distribution.
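    As a quick illustration of that point, the Python sketch below summarizes one list of contig lengths with and without a minimum-length cutoff. The lengths are made up and the function name is hypothetical; the 200-base cutoff is the example from above. Only short contigs are dropped, yet the mean jumps while the longest contig stays the same.

```python
from statistics import mean, median


def length_summary(lengths, min_len=0):
    """Count, min, max, mean and median of the contigs at least min_len long."""
    kept = [n for n in lengths if n >= min_len]
    if not kept:
        raise ValueError("no contigs pass the length cutoff")
    return {
        "count": len(kept),
        "min": min(kept),
        "max": max(kept),
        "mean": mean(kept),
        "median": median(kept),
    }


# Hypothetical contig lengths from a single assembly.
lengths = [120, 150, 180, 250, 400, 900, 2500, 3100]
print(length_summary(lengths))               # count 8, mean 950, median 325.0, max 3100
print(length_summary(lengths, min_len=200))  # count 5, mean 1430, median 900, max 3100
```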

    • westerman
      Rick Westerman
      • Jun 2008
      • 1104

      #3
      Also, if you are going to do de novo transcriptome assembly, then you really owe it to yourself to try out Trinity instead of using non-transcriptome assemblers.

      • Wallysb01
        Senior Member
        • Feb 2011
        • 286

        #4
        The length of your assemblies will be greatly affected by the expression level of the genes you're assembling. Even in very deeply sequenced samples, and with replicates, it's going to be very hard to assemble many genes from TSS to polyA, or even from start to stop codon. Simple statistics like N50 or mean length just don't mean much for transcriptomes.

        You need to do some sort of orthology assignment to get an idea of how complete your assembly is, or how one assembly compares to another. If you're going for simple statistics about your assembly, I'd much rather just look at the number of transcripts >1 kbp than at N50/average length, because it's usually in the 500-1000 bp range that you start getting meaningful information for downstream analysis.
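        To show why N50 alone can mislead, here is a small Python sketch that computes N50 and the number of transcripts longer than 1 kbp for two made-up assemblies (the function names and length distributions are hypothetical). Assembly A is one long contig plus many short fragments; assembly B has several mid-length transcripts. A wins easily on N50, but B has more transcripts over 1 kbp. This is still only a length statistic, not a substitute for the orthology-based completeness check described above.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    raise ValueError("empty assembly")


def count_longer_than(lengths, threshold=1000):
    """Number of transcripts strictly longer than threshold (bp)."""
    return sum(1 for n in lengths if n > threshold)


# Hypothetical length distributions for two assemblies.
assembly_a = [8000] + [200] * 10
assembly_b = [1500] * 6 + [200] * 30
for name, lengths in [("A", assembly_a), ("B", assembly_b)]:
    print(name, "N50 =", n50(lengths), "| >1 kbp:", count_longer_than(lengths))
# A N50 = 8000 | >1 kbp: 1
# B N50 = 1500 | >1 kbp: 6
```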

        As for your question about how to improve assembly length, I would just say try trans-ABySS (which uses a multiple k-mer approach and might be the best assembler in terms of completeness) and Trinity (which does a nice job with length and ease of downstream analysis). From your use of Velvet Oases, it sounds like you're doing this on microbes, but you may still find success with those two.
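        The general idea behind a multiple k-mer strategy is to assemble at several k values (a smaller k recovers low-coverage transcripts, a larger k resolves repeats in well-covered ones) and then pool the results. The Python sketch below is a deliberately simplified, hypothetical merge step, not trans-ABySS's actual algorithm: it pools the contigs from every k and drops any contig that appears, on either strand, inside a contig already kept (kept contigs are always at least as long). The input format, function names and sequences are toy assumptions.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_multi_k(assemblies):
    """Pool contigs from several k-mer assemblies and drop contained duplicates.

    assemblies maps k -> list of contig sequences (hypothetical input format).
    """
    pooled = {contig for contigs in assemblies.values() for contig in contigs}
    # Longest first (ties broken alphabetically) so containers are kept before what they contain.
    ordered = sorted(pooled, key=lambda c: (-len(c), c))
    kept = []
    for contig in ordered:
        rc = reverse_complement(contig)
        if any(contig in other or rc in other for other in kept):
            continue  # redundant: already covered on one strand or the other
        kept.append(contig)
    return kept


assemblies = {
    21: ["ACGTACG", "CCATG"],
    31: ["TTACGTACGAA"],
    41: ["CATGG"],
}
print(merge_multi_k(assemblies))  # ['TTACGTACGAA', 'CATGG']
```

        This is quadratic in the number of contigs and only catches exact containment; a real pipeline would also have to deal with near-identical contigs that differ by sequencing errors or SNPs.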
