De Novo Assembly of a transcriptome

  • berath
    replied
    I would like to take the community's opinions on the differential expression analysis of a de novo assembled transcriptome.

    We are studying a non-model organism with no genome sequence information. We have 99-bp SE Illumina reads and are testing differential expression between two experimental conditions, with two biological replicates each.

    For the de novo transcriptome assembly, we used all four lanes with both the Velvet/Oases (multi-k) and Trinity packages. Both the assembly metrics and the biological annotation suggested that Velvet/Oases produced a (slightly) better assembly.

    For the DE analysis, is it a better approach to use alignment software to map the quality-checked sequencing reads (from each individually tested condition) to the annotated contigs built by the combined assembly (from all conditions) and calculate RPKM values

    or

    to construct a separate de novo assembly for each experimental condition, extract the number of reads and fragments for an annotated contig, and compare them with those of the contig coding for the same gene in the other assembly?

    The second approach seems to be integrated in the Trinity package (as FPKM values for each contig); however, as noted earlier in this thread, the authors agree that the values are approximate. I assume the read_tracking and amos file options in Velvet would allow similar information to be extracted.
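For reference, the RPKM calculation in the first approach can be sketched as below. This is a minimal illustration, assuming the per-contig read counts have already been extracted from the alignments; the contig names, lengths, and counts are hypothetical.

```python
def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def rpkm_table(counts, lengths):
    """counts: {contig: reads mapped in one sample}; lengths: {contig: length in bp}."""
    total = sum(counts.values())
    return {contig: rpkm(n, lengths[contig], total) for contig, n in counts.items()}


# Hypothetical numbers for a single sample, for illustration only.
lengths = {"contig_1": 2000, "contig_2": 500}
counts = {"contig_1": 600, "contig_2": 400}
values = rpkm_table(counts, lengths)
for contig, value in sorted(values.items()):
    print(contig, value)
```

Because every sample is mapped to the same combined set of contigs, the lengths are identical across conditions and the values for a given contig are directly comparable between samples.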

    Any thoughts?
    Thanks..


  • Bueller_007
    replied
    Why aren't people using STM for combining runs from multiple k values?
    http://genome.cshlp.org/content/20/10/1432.full


  • boetsie
    replied
    I don't think it is a good idea to use SSPACE for merging assemblies. Contigs can of course be joined where linking read pairs are found, but SSPACE will not merge whole assemblies. You will still end up with the total size of the pooled assemblies from the different k-mers.

    The best way to go is to use a tool that merges assemblies, like Zorro or GAM. Have a look at this thread for a list of these tools:

    http://seqanswers.com/forums/showthr...ighlight=zorro

    Boetsie

    Originally posted by dnusol
    Hi, just some more info on memory use

    velvetg with k-mer 31 and 127M reads peaked at 250 GB of RAM on 18 cores, took half an hour to run, and produced about 320 GB of output data.

    Regarding merging output from different kmers, how about Minimus2 or SSPACE?

    HTH,

    D


  • dnusol
    replied
    Hi, just some more info on memory use

    velvetg with k-mer 31 and 127M reads peaked at 250 GB of RAM on 18 cores, took half an hour to run, and produced about 320 GB of output data.

    Regarding merging output from different kmers, how about Minimus2 or SSPACE?

    HTH,

    D


  • Jenzo
    replied
    Originally posted by ikim
    For people running multiple k-mers with Velvet, any suggestions on how to combine the assemblies? I used to use vmatch, but its 'nonredundant' setting seems to cluster together much more than just redundant data.
    Dear ikim,
    I am also trying to combine different assemblies and was not satisfied with the results from vmatch and cd-hit-est. For me, assembling all the contigs with CAP3 or the TIGR assembler works much better than clustering with vmatch or cd-hit-est.
    To get an idea of how redundant my final dataset is, I think I will BLAST it against itself.
    If you have a good solution for efficient clustering to obtain a nonredundant set of contigs, please let me know :-)
    Best wishes!
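The self-BLAST check mentioned above could be summarised along these lines. This is only a sketch: it assumes the all-against-all search was written in BLAST+ tabular format (`-outfmt 6`, default columns), and the 95% identity and 90% coverage cut-offs are arbitrary placeholders, as are the contig names and lengths.

```python
def redundant_contigs(blast_lines, lengths, min_identity=95.0, min_coverage=0.9):
    """Return query contigs that have a strong hit to a *different* contig.

    blast_lines: lines of BLAST tabular output (qseqid, sseqid, pident, length, ...).
    lengths: {contig id: length in bp}.
    """
    redundant = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        if query == subject:
            continue  # every contig trivially matches itself
        identity = float(fields[2])
        aligned = int(fields[3])  # alignment length, gaps included
        if identity >= min_identity and aligned / lengths[query] >= min_coverage:
            redundant.add(query)
    return redundant


# Hypothetical hits, for illustration only.
hits = [
    "c1\tc1\t100.0\t1000\t0\t0\t1\t1000\t1\t1000\t0.0\t1800",
    "c2\tc1\t98.5\t480\t7\t0\t1\t480\t101\t580\t0.0\t850",
    "c3\tc1\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-40\t180",
]
lengths = {"c1": 1000, "c2": 500, "c3": 900}
flagged = redundant_contigs(hits, lengths)
print(sorted(flagged))
```

The fraction `len(flagged) / len(lengths)` then gives a rough redundancy estimate. Only the best single alignment per pair is considered here, so contigs covered by several shorter hits are not counted.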


  • ikim
    replied
    Originally posted by dnusol
    Hi Apexy, thanks for your input,

    I thought small k-mers would work worse for long reads (105 bp), which is why I chose values in the 31-45 range.
    Since my last post I have some more news: velvetg peaked at 56 GB of RAM for k-mer 31 and about 40M reads (keep in mind that read_trkg was on, as suggested in the manual, and it seems to be memory-hungry).

    Best,

    David
    My experience is similar: very small k-mer settings for longer reads are far from optimal and take a great deal of resources. My runs are generally 31-61-mer. Memory usage is between 8 and 28 GB for our typical ~60M 90 bp paired-end reads, and a run takes 5-6 hours.
    A single equivalent run of Trinity seems to top out at 68 GB (4 days to run using 5 processors, 3 days using 8). We set the Butterfly memory allocation to 10 GB, so when run with -CPU 8 the maximum would have been 80 GB, though it never used that much. Our latest 150M mixed-library run at CPU 10 took 5 days.
    Initial annotations suggest a single Trinity run yields better results than one Velvet/Oases run (N50 size, assembly size, RefSeq matches, CDS numbers).
    I like how Trinity being three programs allows better handling of recovery runs.
    For people running multiple k-mers with Velvet, any suggestions on how to combine the assemblies? I used to use vmatch, but its 'nonredundant' setting seems to cluster together much more than just redundant data.


  • dnusol
    replied
    Hi Mbandi,

    I am setting the k-mer length using the automatic option for multiple k-mers in velveth. I ran velveth first and then tried only the first k-mer length in velvetg to assess memory usage, so I still have to run velvetg on the three other k-mers specified. I do not intend to run everything simultaneously, but I do plan to try velvetg on my full set of reads (127M) to test memory needs for the future.

    I have already preprocessed my set and then selected a random subset to reduce its size, but I don't think going below 30% of my full set is a good idea.
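One way to draw such a random subset in a single pass is reservoir sampling. The sketch below is an assumption about how this could be done, not the method actually used here; it expects plain four-line FASTQ records, and the in-memory demo input is made up.

```python
import io
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped, 4-line records)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample(handle, n, seed=0):
    """Reservoir sampling: keep n records chosen uniformly at random in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(read_fastq(handle)):
        if i < n:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = record
    return reservoir


# Tiny made-up input standing in for a real FASTQ file.
demo = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10)))
picked = subsample(demo, 3, seed=1)
print(len(picked))
```

Memory use is proportional to the number of reads kept, not to the size of the input. For paired-end data the two mates would need to be sampled together (for example by iterating over both files in step) so that pairs stay intact.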

    There is a thread on the Oases user list regarding memory usage that may be of interest.

    http://listserver.ebi.ac.uk/pipermai...ne/000190.html


    Best,

    David


  • Apexy
    replied
    Hi David,

    Just to add to my previous post: -read_trkg and -amos_file were both set to yes for my Oases runs. My reads were of varying lengths (min = 30, max = 60). I do not dispute the memory usage you report; I was just amazed, given my limited experience. I will run mine on 31M reads this time and see whether it crashes. Are you running one k at a time? Does velvetg follow velveth immediately (in a script), or do you run them separately? From what I gather, unprocessed reads increase the complexity of the de Bruijn graph and with it the memory footprint. You could also ask on the Velvet and Oases mailing lists and benefit from more experienced hands.

    HTH,

    Mbandi


  • dnusol
    replied
    Hi Apexy, thanks for your input,

    I thought small k-mers would work worse for long reads (105 bp), which is why I chose values in the 31-45 range.
    Since my last post I have some more news: velvetg peaked at 56 GB of RAM for k-mer 31 and about 40M reads (keep in mind that read_trkg was on, as suggested in the manual, and it seems to be memory-hungry).

    Best,

    David


  • Apexy
    replied
    Hi dnusol,
    I'm not experienced with assembly, but I started running velveth -> velvetg -> oases (a k-iteration pipeline) with 10,601,688 reads (paired and single). The memory constraint was severe and it always crashed. I was advised to avoid very low k values. I now do my iterations over 19 <= k <= 29 with only 5 GB of memory allocated to the whole process (although not all of it is used, judging by the log file for the job ID), and it takes 31.55 minutes. I use a 31 GB, 16-processor machine that I share with others. With your 40M reads, you will obviously need more memory. However, I advise you to start with k = 19.
    Cheers
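A k-iteration pipeline of this kind can be scripted roughly as follows. This is a sketch only: it builds the command lines without running them, treats all reads as a single `-short` FASTQ file (paired reads would need the corresponding velveth options), and the file name `reads.fq` and output directory prefix are hypothetical.

```python
def oases_commands(reads_fastq, k_min=19, k_max=29, prefix="asm"):
    """Build velveth -> velvetg -> oases command lines for each odd k in [k_min, k_max]."""
    commands = []
    for k in range(k_min, k_max + 1):
        if k % 2 == 0:
            continue  # Velvet works with odd hash lengths only
        out_dir = f"{prefix}_k{k}"
        commands.append(["velveth", out_dir, str(k), "-fastq", "-short", reads_fastq])
        # Oases needs read tracking switched on in velvetg
        commands.append(["velvetg", out_dir, "-read_trkg", "yes"])
        commands.append(["oases", out_dir])
    return commands


cmds = oases_commands("reads.fq")
for cmd in cmds:
    print(" ".join(cmd))
```

Each entry can be passed to `subprocess.run(cmd, check=True)` to execute the steps one k at a time, which keeps only one assembly graph in memory at any moment.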


  • dnusol
    replied
    Hi, here are my two cents: my plan is to use Trinity and then Velvet/Oases with different k-mers for the de novo transcriptome. I will run both and then assemble the results to create a consensus transcriptome. So far I have run Trinity on 127M 105 bp reads (mixed paired and single, but used as single-end, as Trinity seems to use pairing information only for mate pairs, not paired-end reads) on my 8-processor box with 24 GB of RAM, with no problems on default parameters (I think it took two days or so).

    I am now trying to run Velvet on a subset of those (40M mixed single and paired) and am running out of memory, so I am trying a larger computer. I guess I will also run into problems when it comes to Oases.

    Best.


  • lletourn
    replied
    For those interested this just came out:
    Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance.
    http://www.biomedcentral.com/1471-2164/12/317/abstract


  • Wallysb01
    replied
    Some data for those trying to figure out which programs to run for transcriptome data:

    I tried to run Trinity on 1 lane from the HiSeq, ~100M 105 bp paired-end reads, on a machine with 64 GB of RAM and 4 Xeon processors (though the processors are not the problem), and it crashed after creating all the k-mers in the de Bruijn graph, while trying to create contigs.

    I'll be moving on to ABySS, as it seems to be much more memory efficient, and despite having access to one of the world's largest supercomputers, I can't get more than 64 GB of RAM (makes me wonder what's so super about it).


  • Apexy
    replied
    Evaluating transcriptome assemblies from k-mer iterations

    Originally posted by blackgore
    How are people evaluating their transcriptome assemblies? The standard N50 assessment can't be that useful, as the goal here isn't exactly to generate a tiny set of huge contigs...?
    Hi,

    A comparative approach was suggested by a user on the oases mailing list.
    http://listserver.ebi.ac.uk/pipermai...ry/000008.html

    HTH

    Mbandi
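For anyone who wants to see what the N50 in the quoted question actually measures, here is a minimal sketch. It uses the usual definition (the contig length at which the running total of the length-sorted contigs first reaches half the assembly size); the example lengths are made up.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Made-up contig lengths, for illustration only.
example = [100, 200, 300, 400, 500]
print(n50(example))
```

The value depends only on the distribution of contig lengths, which is why it says nothing about whether the contigs correspond to real transcripts.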


  • panos_ed
    replied
    Originally posted by Celia
    Wallysb01,

    thanks for answering...as soon as trinity stops running I will have a look at what you said about the 5000 and 10000th contig.
    Celia,

    I don't know if Trinity is still running, but if it is taking too long at the Butterfly step, you might find this note from the Trinity FAQ interesting.

    However, they say this should not be an issue after version 2011-05-19.
