Seqanswers

  • Knowledge about SOAPdenovo-trans

    Hi all,
    I found an unpublished tool for de novo transcriptome assembly, SOAPdenovo-Trans, but I haven't found anything about its performance compared with, e.g., Trinity or Oases. I have heard that it does a good job.

    Can anyone say something about its performance?

    Thanks in advance,

  • #2
    Knowledge about SOAPdenovo-trans

    Here is a summary of my use of these RNA transcript assemblers,
    comparing what I see as three good and improving programs for
    this task: Velvet/Oases, Trinity and SOAPdenovo-Trans.


    Very briefly, the three de novo assemblers tested here are closely ranked,
    and the ranking depends on the particular species and data set used:
    Locust insect: Velvet/O > Trinity > SOAPTrans
    Cacao plant: SOAPTrans > Trinity > Velvet/O >> Cufflinks
    Daphnia waterflea: Velvet/O > SOAPTrans > Trinity >> Cufflinks

    SOAPdenovo-Trans in particular can assemble better and faster, with less memory, than the other two. It can also fail inexplicably, or do worse than the others.
    My recommendation is to try all three, see which works for you, and if possible use them all and extract the best subset by gene-evidence criteria (such as homology, high coding ratio, ...).
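As a rough illustration of the pooling step, the Python sketch below merges transcripts from several assemblers and keeps those with a high coding ratio (longest ORF length divided by transcript length). This is only one possible gene-evidence filter: the six-frame ATG-to-stop ORF scan, the 0.5 cutoff and all function names are assumptions for illustration, not part of any of the assemblers.

```python
"""Pool transcripts from several assemblers and keep those with a high
coding ratio (longest ORF length / transcript length).

Assumptions for illustration only: a six-frame ATG-to-stop ORF scan with the
standard stop codons, and a 0.5 cutoff."""

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_nt(seq: str) -> int:
    """Length in nt (stop included) of the longest ATG...stop ORF in six frames."""
    best = 0
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens an ORF in this frame
                elif start is not None and codon in STOPS:
                    best = max(best, i + 3 - start)
                    start = None  # ORF closed; look for the next ATG
    return best


def coding_ratio(seq: str) -> float:
    seq = seq.upper()
    return longest_orf_nt(seq) / len(seq) if seq else 0.0


def best_subset(assemblies: dict[str, dict[str, str]],
                cutoff: float = 0.5) -> dict[str, str]:
    """assemblies maps assembler name -> {transcript id -> sequence}."""
    kept = {}
    for assembler, transcripts in assemblies.items():
        for tid, seq in transcripts.items():
            if coding_ratio(seq) >= cutoff:
                kept[f"{assembler}|{tid}"] = seq  # prefix keeps ids unique
    return kept


if __name__ == "__main__":
    pooled = {
        "velvet_oases": {"t1": "ATGAAACCCGGGTTTTAA"},     # one full-length ORF
        "trinity": {"t1": "CCCCCCCCCCATGTAACCCCCCCCCC"},  # tiny ORF only
    }
    print(sorted(best_subset(pooled)))
```

Homology evidence could be folded in the same way, as a second per-transcript score combined with the coding ratio.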


    • #3
      Hi Don,
      thanks for your post, it was really helpful.

      Regarding SOAPdenovo-Trans: which k-mer size seems best to you, given that it has no multi-k option like Oases? I am interested in both the lowly and the highly expressed genes in my transcriptome.



      • #4
        Somewhere between 25 and 31 would be good for RNA-seq.
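Because SOAPdenovo-Trans has no multi-k mode, one workaround is to run a separate assembly for each k in that range and compare the results. A minimal sketch that only builds the command lines, reusing the `all` step and the `-s`, `-o` and `-K` options that appear in this thread; the binary path, config name, output prefixes and the odd-k check are assumptions.

```python
"""Build one SOAPdenovo-Trans command per k-mer size, as a stand-in for the
multi-k mode it lacks. The `all` step and -s, -o, -K options appear in this
thread; the binary path, config name and output prefixes are placeholders."""

import shlex


def kmer_sweep_commands(k_values, binary="./SOAPdenovo-Trans",
                        config="soap.config", prefix="outputGraph"):
    commands = []
    for k in k_values:
        # Odd k is our own convention (an even k-mer can be its own reverse
        # complement); it is not a documented SOAPdenovo-Trans rule.
        if k % 2 == 0:
            raise ValueError(f"expected an odd k-mer size, got {k}")
        commands.append([binary, "all", "-s", config,
                         "-K", str(k), "-o", f"{prefix}_k{k}"])
    return commands


if __name__ == "__main__":
    for cmd in kmer_sweep_commands(range(25, 32, 2)):
        print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

Giving each run its own output prefix keeps the per-k assemblies separate, so they can be pooled and filtered afterwards.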


        • #5
          SOAPdenovo-trans and kmer size for best gene assembly

          This is an old thread, but it is still relevant, as there is much misinformation about this topic. Given the great improvements in Illumina read quality since the early generation of short 35 bp reads, we need to revisit how best to assemble these reads. The k-mer size determines how reads are shredded into smaller pieces for assembly, but when reads are accurate, shredding them into small k-mers introduces errors by allowing mis-mated reads to be assembled together.
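To make the shredding argument concrete: in a de Bruijn graph assembler, two reads become linked when they share a k-mer. In this toy Python sketch (made-up sequences), two unrelated reads that share a 25 bp stretch have 5 k-mers in common at k=21, one at k=25 and none at k=31, so only the smaller k lets them be joined.

```python
"""Toy illustration (made-up reads): in a de Bruijn graph, reads are linked
when they share a k-mer, so a short shared stretch joins unrelated reads at
small k but not at large k."""


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq; a read of length L has L - k + 1 positions."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a: str, b: str, k: int) -> int:
    return len(kmers(a, k) & kmers(b, k))


SHARED = "ACGTTGCAAGGCTTACCGATGCATC"  # 25 bp present in both reads
read_a = "T" * 15 + SHARED + "G" * 15
read_b = "C" * 15 + SHARED + "A" * 15

if __name__ == "__main__":
    for k in (21, 25, 31):
        print(f"k={k}: {shared_kmers(read_a, read_b, k)} shared k-mers")
```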

          For highly expressed genes that are long and somewhat repetitive (e.g. muscle genes), small k-mers lead to inaccurate gene assembly, even though their use can improve the technical measure of "more reads assembled". We should care more about "more accurately assembled genes". When I use k-mer sizes up to the read length (e.g. 100 bp or longer), I get the most accurate gene assemblies for some of the well-expressed loci. On average, the most accurate gene assemblies come from k-mers of 35 up to 95. This holds for SOAPdenovo-Trans, Velvet/Oases and idba-trans, and is why these do better than Trinity, since the latter is restricted to a k-mer of 25 or 31.
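One reason large k-mers only pay off for well-expressed loci is coverage: a read of length L yields only L - k + 1 k-mers, so for read coverage C the effective k-mer coverage is C_k = C (L - k + 1) / L (the relation given in the Velvet documentation). A small sketch of that arithmetic; the 100 bp, 50x example numbers are hypothetical.

```python
"""Effective k-mer coverage C_k = C * (L - k + 1) / L for read coverage C
and read length L. Example numbers below are hypothetical."""


def kmer_coverage(read_coverage: float, read_len: int, k: int) -> float:
    if not 1 <= k <= read_len:
        raise ValueError("k must lie between 1 and the read length")
    # Each read contributes only L - k + 1 k-mers, so coverage shrinks with k.
    return read_coverage * (read_len - k + 1) / read_len


if __name__ == "__main__":
    for k in (25, 35, 65, 95, 100):
        print(k, kmer_coverage(50, 100, k))
```

With 100 bp reads at 50x, k=25 leaves 38x k-mer coverage but k=95 only 3x, so very large k can only work where expression, and hence coverage, is high.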

          Here is a recent example from the malaria mosquito Anopheles: the best k-mer size for each gene, counted over the 10,000 longest genes assembled (10k_longest) and a 1,000-gene set (1k_long):
          k-mer   10k_longest   1k_long
          k05          226          18
          k25         1224          92
          k35         2912         335
          k45         1852         197
          k55         1553         182
          k65         1069         106
          k75          522          28
          k85          414          21
          k95          228          21
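Summing the table supports the "above 35" point. A quick check in Python, using the counts exactly as posted:

```python
"""Share of genes whose best assembly came from k >= 35, using the counts
from the table above."""

BEST_K_10K = {5: 226, 25: 1224, 35: 2912, 45: 1852, 55: 1553,
              65: 1069, 75: 522, 85: 414, 95: 228}
BEST_K_1K = {5: 18, 25: 92, 35: 335, 45: 197, 55: 182,
             65: 106, 75: 28, 85: 21, 95: 21}


def share_at_or_above(counts: dict[int, int], min_k: int) -> float:
    """Fraction of genes whose best k-mer size is at least min_k."""
    total = sum(counts.values())
    return sum(n for k, n in counts.items() if k >= min_k) / total


if __name__ == "__main__":
    for name, counts in (("10k_longest", BEST_K_10K), ("1k_long", BEST_K_1K)):
        print(name, sum(counts.values()), share_at_or_above(counts, 35))
```

By this tally, 8,550 of the 10,000 longest genes (85.5%) and 890 of the 1k_long set (89%) had their best assembly at k ≥ 35, while k05 or k25 was best for only 1,450 (14.5%) of the 10,000.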

          Best assembler:
          Assembler          10k_longest   1k_long
          Velvet/Oases           4684         580
          idba-trans             3675         275
          SOAPdenovo-Trans       1306         116
          Trinity                 335          29

          In short, Velvet/Oases remains the most capable and accurate gene assembler, in part because it does well with k > 30 gene assemblies. SOAPdenovo-Trans remains good, but idba-trans has surpassed it in producing the second most accurate assemblies. Trinity is still in last place (and this is with the most recent 2014/2015 version).

          Another important note: these genes assembled from mRNA-seq are more accurate and more orthology-complete than the MAKER gene models predicted on the mosquito genome assemblies. The RNA-seq and MAKER genes are reported in "Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes", Science, 2015 (doi: 10.1126/science.1258522).


          • #6

            I'm using SOAPdenovo-Trans to assemble SOLiD single-end reads, 50 bp long. The input FASTQ file contains 112,537,370 reads. My config file is as follows:

            #maximal read length
            #in which part(s) the reads are used
            #fastq file for single reads
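For reference, the three comment lines above come from the standard SOAPdenovo(-Trans) config template, and the value lines under them seem to be missing from the post. A hedged sketch of a minimal single-end config: `max_rd_len=50` matches "max seq len 50" in the log, while `asm_flags=1` (contig assembly only, assumed because single-end reads carry no pairing information for scaffolding) and the reads path are assumptions, not the poster's actual settings.

```python
"""Write a minimal single-end SOAPdenovo-Trans config.
Assumed values: asm_flags=1 (contigs only) and the reads path."""


def single_end_config(fastq_path: str, max_rd_len: int = 50,
                      asm_flags: int = 1) -> str:
    lines = [
        "#maximal read length",
        f"max_rd_len={max_rd_len}",
        "[LIB]",
        "#in which part(s) the reads are used",
        f"asm_flags={asm_flags}",
        "#fastq file for single reads",
        f"q={fastq_path}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "reads.fq" is a hypothetical path; substitute the real FASTQ file.
    print(single_end_config("reads.fq"), end="")
```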

            Then I ran the following command:
            ./SOAPdenovo-Trans all -s config_file -o outputGraph -R -L 300

            It has been two days since the process started and it is still running, but there have been no changes in the output directory. This makes me wonder whether the process is stuck somewhere or my command is incorrect. The only output I can see at my command prompt is this:

            The version 1.03: released on July 19th, 2013

            pregraph -s soap.config -K 23 -o outputGraph
            In soap.config, 1 libs, max seq len 50, max name len 256
            8 thread created
            read from file:
            --- 100000000th reads
            --- 200000000th reads
            --- 300000000th reads
            --- 400000000th reads
            --- 500000000th reads
            --- 600000000th reads
            --- 700000000th reads
            --- 800000000th reads
            --- 900000000th reads
            --- 1000000000th reads
            --- 1100000000th reads
            --- 1200000000th reads
            --- 1300000000th reads
            --- 1400000000th reads
            And it is still continuing. This puzzles me, because the count already exceeds the number of reads in the input file.
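A quick sanity check is to count the records in the input file yourself and compare with the 112,537,370 reads stated above. The sketch assumes plain four-line FASTQ records (optionally gzip-compressed) and only checks the `@` and `+` marker lines; it does not diagnose why the log counter keeps growing.

```python
"""Count four-line FASTQ records as a sanity check on the input size.
Assumes one sequence line and one quality line per record."""

import gzip
from collections.abc import Iterable


def count_records(lines: Iterable[str]) -> int:
    it = iter(lines)
    n = 0
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        _seq, plus, qual = next(it, ""), next(it, ""), next(it, "")
        if not header.startswith("@") or not plus.startswith("+") or not qual:
            raise ValueError(f"malformed FASTQ record #{n + 1}")
        n += 1
    return n


def count_fastq_records(path: str) -> int:
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return count_records(fh)
```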

