  • Create longer contigs from transcriptome assembly

    Hi everyone,

    I am doing transcriptome assembly on short Illumina reads (both single-end and paired-end, with an average length of 100 bp).
    I have used several transcriptome assemblers: MIRA, Velvet Oases, and SOAPdenovoTrans.
    The average contig length I get varies from 400 to 1000 bp.
    I tried different parameters, such as different k-mer sizes and different minimum contig sizes, but the results do not improve much.
    I am interested in making much longer contigs (a few thousand bp), so I was wondering whether there is any way to increase the current contig length.
    I would also like to know why this length varies so much between transcriptome assemblers (an average of 400 bp versus 1000 bp is a huge difference).
    I know that Velvet and SOAPdenovo are based on de Bruijn graphs, while MIRA is OLC-based.

    I would really appreciate it if someone could share similar experience with me.
    Thank you very much,
    Best Regards,
    Natasha

  • #2
    So ... what average contig length do you expect? And why? While the transcriptome projects that come through my hands do generate some transcripts in the 'few thousands', most of them are much shorter.

    The average length *may* vary because the programs differ in how many short transcripts they keep. If one program keeps all transcripts while another throws away transcripts shorter than 200 bases, then your average will vary even if the longest transcripts do not. Really, you cannot say much about average lengths unless you also know the shortest and longest lengths and the distribution.
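
    To make the point about cutoffs concrete, here is a minimal Python sketch. The contig lengths are invented for illustration and `length_summary` is a hypothetical helper, not part of any assembler. It applies a 200-base minimum to the same set of lengths and reports the shortest, longest, mean, and median, so the shift in the average can be seen next to the unchanged maximum.

```python
from statistics import mean, median


def length_summary(lengths, min_length=0):
    """Summarise contig lengths, keeping only those >= min_length."""
    kept = sorted(n for n in lengths if n >= min_length)
    if not kept:
        raise ValueError("no contigs pass the length cutoff")
    return {
        "count": len(kept),
        "shortest": kept[0],
        "longest": kept[-1],
        "mean": mean(kept),
        "median": median(kept),
    }


# Invented lengths (bp), standing in for one assembly's contigs.
lengths = [120, 150, 180, 250, 400, 900, 1500, 3200]

# "Assembler A" reports everything; "assembler B" drops contigs < 200 bases.
keep_all = length_summary(lengths)
drop_short = length_summary(lengths, min_length=200)

for label, stats in (("keep all", keep_all), ("drop < 200", drop_short)):
    print(label, stats)
```

    With these made-up numbers the longest contig is identical in both summaries, while the mean changes purely because of which short contigs were reported.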

    • #3
      Also, if you are going to do de novo transcriptome assembly, then you really owe it to yourself to try out Trinity instead of using non-transcriptome assemblers.

      • #4
        The length of your assembled transcripts will be greatly affected by the expression level of the genes you're assembling. Even in very deeply sequenced samples, and with replicates, it's going to be very hard to assemble very many genes from TSS to polyA, or even from start to stop codon. Simple statistics like N50 or mean length just don't mean much for transcriptomes.

        You need to do some sort of orthology assignment to get an idea of how complete your assembly is, or how one assembly compares to another. If you're going for simple statistics about your assembly, I'd much rather just look at the number of transcripts >1 kbp than N50 or average length, because it's usually in the 500-1000 bp range that you start getting meaningful information for downstream analysis.
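
        As a rough sketch of the simple statistics mentioned here, the Python below reads contig lengths from FASTA-formatted lines and reports both N50 and the number of transcripts longer than 1 kbp. The function names (`fasta_lengths`, `n50`, `count_longer_than`) and the example lengths are made up for illustration, and N50 is taken in its usual sense: the length of the contig at which the running total, summed from the longest contig down, first reaches half of all assembled bases.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None  # None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if 2 * running >= total:
            return n
    raise ValueError("cannot compute N50 of an empty assembly")


def count_longer_than(lengths, cutoff=1000):
    """Number of contigs strictly longer than cutoff bases."""
    return sum(1 for n in lengths if n > cutoff)


# Invented lengths (bp) for illustration only.
example = [3200, 1500, 900, 400, 250]
print("N50:", n50(example))
print("transcripts > 1 kbp:", count_longer_than(example))
```

        For a real assembly the lengths could come from something like `fasta_lengths(open("contigs.fa"))`, where the file name is a placeholder. Neither number says anything about completeness, which is why the orthology check above still matters.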

        And to try to answer your question about how to improve assembly length, I would just say try trans-ABySS (which uses a multiple k-mer approach and might be the best assembler in terms of completeness) and Trinity (which does a nice job with length and ease of downstream analysis). From your use of Velvet Oases, it sounds like you're doing this on microbes, but you may still find success with those two.
