  • berath
    replied
    I would like to hear the community's opinions on differential expression analysis of a de novo assembled transcriptome.

    We are studying a non-model organism with no genome sequence information; we have 99-bp single-end Illumina reads and are testing differential expression between two experimental conditions with two biological replicates each.

    For the de novo transcriptome assembly, we used all four lanes and ran both the Velvet/Oases (multi-k) and Trinity packages. Both assembly metrics and biological annotation suggested that Velvet/Oases produced a (slightly) better assembly.

    For the DE analysis, is it better to use an aligner to map the quality-checked reads from each condition to the annotated contigs of the combined assembly (built from all conditions) and calculate RPKM values,

    or

    to construct a separate de novo assembly for each condition, extract the read and fragment counts for each annotated contig, and compare them with the counts for the contig coding the same gene in the other assembly?

    The second approach seems to be built into the Trinity package (as FPKM values per contig); however, as noted earlier in this thread, the authors agree that these values are approximate. I assume the read_trkg and -amos_file options in Velvet would allow similar information to be extracted.
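
    For reference, the first approach boils down to the standard RPKM formula. A minimal Python sketch (the function name and the example numbers are just illustrations, not from any particular package):

    def rpkm(contig_reads, contig_length_bp, total_mapped_reads):
        """Reads per kilobase of contig per million mapped reads."""
        return contig_reads * 1e9 / (contig_length_bp * total_mapped_reads)

    # e.g. 500 reads on a 2,000-bp contig out of 10M mapped reads:
    # rpkm(500, 2000, 10_000_000) -> 25.0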

    Any thoughts?
    Thanks..

  • Bueller_007
    replied
    Why aren't people using STM for combining runs from multiple k values?

  • boetsie
    replied
    I don't think it is a good idea to use SSPACE for merging assemblies. Contigs can of course be joined where read pairs link them, but SSPACE will not merge full assemblies: you will still end up with the combined size of the assemblies from the different k-mers.

    The best way to go is to use a tool designed for merging assemblies, such as Zorro or GAM. Have a look at this thread for a list of these tools:



    Boetsie

    Originally posted by dnusol View Post
    Hi, just some more info on memory use

    velvetg with k-mer 31 and 127M reads peaked at 250 GB of RAM across 18 cores, took half an hour to run, and produced about 320 GB of output data.

    Regarding merging the output from different k-mers, how about Minimus2 or SSPACE?

    HTH,

    D

  • dnusol
    replied
    Hi, just some more info on memory use

    velvetg with k-mer 31 and 127M reads peaked at 250 GB of RAM across 18 cores, took half an hour to run, and produced about 320 GB of output data.

    Regarding merging the output from different k-mers, how about Minimus2 or SSPACE?

    HTH,

    D

  • Jenzo
    replied
    Originally posted by ikim View Post
    For people running multiple k-mers of Velvet, any suggestions on how to combine the assemblies? I used to use vmatch, but its 'nonredundant' setting seems to cluster together much more than just redundant sequences.
    Dear ikim,
    I have also been trying to combine different assemblies and was not satisfied with the results of vmatch and cd-hit-est. For me, assembling all the contigs together with CAP3 or the TIGR assembler works much better than clustering with vmatch or cd-hit-est.
    To get an idea of how redundant my final dataset is, I think I will BLAST it against itself.
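    A minimal sketch of that redundancy check, assuming a tabular self-BLAST report produced beforehand (the file name and thresholds are hypothetical):

    # Assumed prior commands:
    #   makeblastdb -in contigs.fa -dbtype nucl
    #   blastn -query contigs.fa -db contigs.fa -outfmt 6 -out self_hits.tsv
    redundant = set()
    with open("self_hits.tsv") as hits:
        for line in hits:
            query, subject, pct_id, aln_len = line.split("\t")[:4]
            # skip the trivial self-match; flag strong hits elsewhere in the set
            if query != subject and float(pct_id) >= 95 and int(aln_len) >= 200:
                redundant.add(query)
    print(len(redundant), "contigs hit another contig at >=95% identity over >=200 bp")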
    If you have found a good solution for efficient clustering to obtain a nonredundant set of contigs, please let me know :-)
    Best wishes!

  • ikim
    replied
    Originally posted by dnusol View Post
    Hi Apexy, thanks for your input,

    I thought small k-mers would work worse for long reads (105 bp), which is why I chose the 31-45 range.
    Since my last post I have some more news: velvetg peaked at 56 GB of RAM for k-mer 31 and about 40M reads (keep in mind that read_trkg was on, as suggested in the manual, and it seems to be memory-hungry).

    Best,

    David
    My experience is similar: very small k-mer settings for longer reads are far from optimal and take a great deal of resources. My runs generally span k = 31-61, with memory usage between 8 and 28 GB for our typical ~60M 90-bp paired-end reads, and they finish in 5-6 hours.
    A single equivalent Trinity run seems to top out at 68 GB (4 days using 5 processors, 3 days using 8). We set the Butterfly memory allocation to 10 GB, so with -CPU 8 the maximum would have been 80 GB, though it never came close to using that much. Our latest run, on a 150M-read mixed library at -CPU 10, took 5 days.
    Initial annotations suggest a single Trinity run yields better results than a single Velvet/Oases run (N50, assembly size, RefSeq matches, number of CDSs).
    I also like how Trinity's split into three programs allows better handling of recovery runs.
    For people running multiple k-mers of Velvet, any suggestions on how to combine the assemblies? I used to use vmatch, but its 'nonredundant' setting seems to cluster together much more than just redundant sequences.

  • dnusol
    replied
    Hi Mbandi,

    I am setting the k-mer length using velveth's automatic option for multiple k-mers: I first ran velveth and then tried velvetg on just the first k-mer length to assess memory usage, so I still have to run velvetg on the three other k-mers specified. I am not intending to run everything simultaneously, but I do plan to try velvetg on my full set of reads (127M) to gauge memory needs for the future.

    I have already preprocessed my read set and then selected a random subset to reduce its size, but I don't think going below 30% of the full set is a good idea.
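
    (In case it helps anyone, a minimal sketch of that kind of random subsetting via reservoir sampling; the file name and target count are illustrative, and note that the selected records are held in memory:)

    import random

    def subsample_fastq(path, n_keep, seed=1):
        """Uniformly sample n_keep records (4 lines each) from a FASTQ file."""
        random.seed(seed)
        reservoir = []
        with open(path) as handle:
            # zip(handle, handle, handle, handle) yields one 4-line record at a time
            for i, record in enumerate(zip(handle, handle, handle, handle)):
                if i < n_keep:
                    reservoir.append(record)
                else:
                    j = random.randint(0, i)
                    if j < n_keep:
                        reservoir[j] = record
        return reservoir

    # e.g. keep roughly 30% of a 127M-read set:
    # subset = subsample_fastq("reads.fastq", 38_000_000)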

    There is a thread on the Oases user list regarding memory usage that may be of interest to some of you.




    Best,

    David

  • Apexy
    replied
    Hi David,

    Just to add to my previous post: -read_trkg and -amos_file yes were on for Oases. My reads were of varying lengths (min = 30, max = 60). I don't dispute the memory usage you require; I was just amazed, given my limited experience. I will run mine this time on 31M reads and see whether it crashes. Are you running one k at a time? Does velveth precede velvetg immediately (in a script), or do you run them separately? From what I gather, unprocessed reads increase the complexity of the de Bruijn graph and hence the memory footprint. You could also pay a visit to the Velvet and Oases mailing lists and benefit from more experienced hands.

    HTH,

    Mbandi

  • dnusol
    replied
    Hi Apexy, thanks for your input,

    I thought small k-mers would work worse for long reads (105 bp), which is why I chose the 31-45 range.
    Since my last post I have some more news: velvetg peaked at 56 GB of RAM for k-mer 31 and about 40M reads (keep in mind that read_trkg was on, as suggested in the manual, and it seems to be memory-hungry).

    Best,

    David

  • Apexy
    replied
    Hi dnusol,
    I'm not experienced with assembly, but I started running a velveth -> velvetg -> oases pipeline (iterating over k) with 10,601,688 reads (paired and single). The memory constraint was severe and it always crashed. I was advised to abstain from very low k values. I now iterate over 19 <= k <= 29 with only 5 GB of memory allocated to the whole process (although not all of it is used, judging by the job's log file), and it takes 31.55 minutes. I use a 31 GB, 16-processor machine that I share with others. With your 40M reads, you would obviously need more memory; however, I advise you to start with k=19.
    Cheers
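
    For reference, a minimal sketch of that kind of k iteration (the file names, k range, and options are illustrative; velveth can also generate the directories for a whole k range in one call with its m,M,step syntax):

    import subprocess

    READS = "reads.fa"            # hypothetical preprocessed input
    for k in range(19, 31, 2):    # odd k from 19 to 29, one at a time
        outdir = f"velvet_k{k}"
        # hash the reads, build the graph, then assemble transcripts
        subprocess.run(["velveth", outdir, str(k), "-fasta", "-short", READS], check=True)
        subprocess.run(["velvetg", outdir, "-read_trkg", "yes"], check=True)
        subprocess.run(["oases", outdir], check=True)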

  • dnusol
    replied
    Hi, here are my two cents: the idea I am following is to use both Trinity and Velvet/Oases at different k-mers for the de novo transcriptome. I will run both and then assemble the results into a consensus transcriptome. So far I have run Trinity on 127M 105-bp reads (a mix of paired and single, but all used as single-end, since Trinity seems to use pairing information only for mate pairs, not paired reads) on my 24 GB RAM, 8-processor box with no problems on default parameters (I think it took two days or so).

    I am now trying to run Velvet on a subset of those reads (40M mixed single/paired) and am running out of memory, so I am moving to a larger computer. I guess I will also run into problems when it is Oases' turn.

    Best.

  • lletourn
    replied
    For those interested, this just came out:
    Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance.

    Background: Until recently, read lengths on the Solexa/Illumina system were too short to reliably assemble transcriptomes without a reference sequence, especially for non-model organisms. However, with read lengths of up to 100 nucleotides available in the current version, an assembly without a reference genome should be possible. For this study we created an EST data set for the common pond snail Radix balthica by Illumina sequencing of a normalized transcriptome, and compared the performance of three different short-read assemblers with respect to the number of contigs, their length, depth of coverage, their quality in various BLAST searches, and the alignment to mitochondrial genes.

    Results: A single sequencing run of a normalized RNA pool resulted in 16,923,850 paired-end reads with a median read length of 61 bases. The assemblies generated by VELVET, OASES, and SeqMan NGEN differed in the total number of contigs, contig length, the number and quality of gene hits obtained by BLAST searches against various databases, and contig performance in the mt genome comparison. While VELVET produced the highest overall number of contigs, a large fraction of these were of small size (< 200 bp) and gave redundant hits in BLAST searches and the mt genome alignment. The best overall contig performance resulted from the NGEN assembly: it produced the second largest number of contigs, which on average were comparable to the OASES contigs but gave the highest number of gene hits in two out of four BLAST searches against different reference databases. A subsequent meta-assembly of the four contig sets resulted in larger contigs, less redundancy, and a higher number of BLAST hits.

    Conclusion: Our results document the first de novo transcriptome assembly of a non-model species using Illumina sequencing data. We show that de novo transcriptome assembly using this approach yields results useful for downstream applications, in particular if a meta-assembly of contig sets is used to increase contig quality. These results highlight the ongoing need for improvements in assembly methodology.

  • Wallysb01
    replied
    Some data for those trying to figure out which programs to run for transcriptome data:

    I tried to run Trinity on one lane from the HiSeq (~100M 105-bp paired-end reads) on a machine with 64 GB of RAM and 4 Xeon processors (though the processors are not the problem), and it crashed after creating all the k-mers in the de Bruijn graph, while trying to build contigs.

    I'll be moving on to ABySS, as it seems to be much more memory-efficient; despite having access to one of the world's largest supercomputers, I can't get more than 64 GB of RAM (which makes me wonder what's so super about it).

  • Apexy
    replied
    Evaluating transcriptome assemblies from k-mer iterations

    Originally posted by blackgore View Post
    How are people evaluating their transcriptome assemblies? The standard N50 assessment can't be that useful, as the goal here isn't exactly to generate a tiny set of huge contigs...?
    Hi,

    A comparative approach was suggested by a user on the Oases mailing list.


    HTH

    Mbandi
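
    (If N50 does stay among your metrics, it is at least cheap to compute from the contig lengths; a minimal sketch:)

    def n50(contig_lengths):
        """Smallest length L such that contigs of length >= L hold half the assembly."""
        total = sum(contig_lengths)
        running = 0
        for length in sorted(contig_lengths, reverse=True):
            running += length
            if running * 2 >= total:
                return length

    # e.g. n50([2, 2, 2, 3, 3, 4, 8, 8]) -> 8  (8 + 8 = 16, half of 32)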

  • panos_ed
    replied
    Originally posted by Celia View Post
    Wallysby01,

    Thanks for answering... As soon as Trinity stops running I will have a look at what you said about the 5,000th and 10,000th contigs.
    Celia,

    I don't know if Trinity is still running, but if it is taking too long at the Butterfly step, you might find interesting this note that I found in the Trinity FAQ.

    They say, however, that this shouldn't be an issue after version 2011-05-19...
