Hi Aurelien
I was told that Trinity would work with non-strand-specific paired-ends as if they were single reads.
I don't know about the interpretation of strand-specific or non-strand-specific data in Velvet/Oases, but they seem to perform better with paired reads anyway.
HTH
Dave
Originally posted by Aurelien Mazurie: most tools that are mentioned for the transcriptome assembly (Rnnotator, Oases, ABySS, Multiple-k) use Velvet internally
Originally posted by Aurelien Mazurie: My first question would be: are paired-ends a big plus, or are they not worth the extra cost?
Very interesting thread. I am collecting information about the best strategy to perform de novo transcriptome assembly for a plant for which we have no reference genome. From what I read here it seems that most people are going for Illumina rather than 454 reads (which answers my first question, about which NGS technology should be used for this task). However, I am still wondering about the following choices:
- most tools that are mentioned for transcriptome assembly (Rnnotator, Oases, ABySS, Multiple-k) use Velvet internally; the only exception appears to be Trinity, which has its own assembly algorithm. That means those tools can make use of both single- and paired-end reads. However, there is little information about which of those tools actually use pairing information to improve the results (e.g., to detect splicing variants). My first question would be: are paired-ends a big plus, or are they not worth the extra cost?
- some tools explicitly state they work best with strand-specific data (e.g., Trinity). Others mention using it, but do not say whether strand-specific data is mandatory (e.g., Rnnotator). My second question is: should I prefer strand-specific sequences?
Best,
Aurelien
Originally posted by dnusol: Hi,
is web-based blastx able to digest a full contig output from Velvet or Oases, or is it better to download both BLAST and the UniProt database and work locally?
Best,
Dave
Running local NCBI BLAST is a nightmare unless you have a good computing facility. I suggest you run netblast (blastx) and limit the output to 10 hits or fewer. Using the tabular or XML output will make it easier to parse the results. Your computer should have at least 6-8 GB of memory for netblast.
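To illustrate why the tabular output is easier to parse, here is a minimal sketch in Python. It assumes standard 12-column BLAST tabular output (the `-m 8` / `-outfmt 6` style mentioned above); the column layout and the top-hit limit are illustrative, not prescribed by the post.

```python
# Toy parser for BLAST tabular output: keep at most n hits per query,
# relying on BLAST's best-first ordering of hits within each query.
from collections import defaultdict

def top_hits(lines, n=10):
    """Map each query id to its first n subject ids."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        if len(hits[query]) < n:
            hits[query].append(subject)
    return dict(hits)

# Made-up example rows in the 12-column tabular layout.
example = [
    "contig_1\tsp|P12345\t98.2\t300\t5\t0\t1\t300\t1\t300\t1e-50\t200",
    "contig_1\tsp|P67890\t90.1\t280\t20\t1\t1\t280\t1\t280\t1e-30\t150",
    "contig_2\tsp|Q11111\t85.0\t250\t30\t2\t1\t250\t1\t250\t1e-20\t120",
]
print(top_hits(example, n=1))  # one best hit per contig
```

Parsing the XML output works too, but for a quick first pass over thousands of contigs the tab-separated form needs nothing beyond `split`.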
thanks lletourn, we'd have some ESTs available too, though from my initial search through them, coverage in the EST library is pretty poor. So I think we'd basically be in the same situation: using them for validation, but not assembly.
I've used Velvet+Oases with GAIIx 108PE data. We actually mixed in different samples of the same species for the assembly.
We got pretty good results when comparing against available ESTs. We didn't put the ESTs in the assembly because we weren't sure how "good" they were. It turns out we found 93% of the full-length ESTs in the assembly.
We also used blastx locally on NR to try to identify the genes. This took a long, long time. It was actually the thing that took the most time by far.
Having the mixed samples, we used in-house software on the Oases output to extract how many reads were used per transcript for each sample, to get a feel for variation in expression. This is in no way precise, given that a read can be in multiple transcripts (isoforms, for example), but it gives insight into differences between the samples.
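The counting idea above can be sketched in a few lines. This is a hedged toy version, not the in-house tool the poster describes: it assumes you have already extracted (sample, read, transcript) assignments from the Oases output, and, as the post warns, a read landing in several isoforms is counted once per transcript, so the totals are only indicative.

```python
# Tally reads per (transcript, sample) pair from read-to-transcript
# assignments. Sample and transcript names here are made up.
from collections import Counter

def count_reads(assignments):
    """assignments: iterable of (sample, read_id, transcript_id) tuples."""
    counts = Counter()
    for sample, read_id, transcript in assignments:
        counts[(transcript, sample)] += 1
    return counts

mapping = [
    ("leaf", "r1", "tx1"),
    ("leaf", "r2", "tx1"),
    ("root", "r3", "tx1"),
    ("leaf", "r2", "tx2"),  # same read assigned to a second isoform
]
counts = count_reads(mapping)
print(counts[("tx1", "leaf")])  # 2
```

Comparing such counts across samples gives the rough expression-difference signal the post mentions; anything quantitative would need proper normalization.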
We're still waiting for the reads, but we were planning on using the Trinity package from the Broad: http://trinityrnaseq.sourceforge.net/
Basically, we figure if it's good enough for the Broad, it's good enough for us. But we're green at this and trying to assemble a vertebrate transcriptome, so I'm certainly open to suggestions. Can anyone compare runtimes/processing requirements and the like for ABySS and other programs? The Broad suggests 2GB of memory per million reads, for example. We expect to have roughly 100M reads of paired-end 100bp. Do we really need 200GB of memory? We have access to a cluster that would make that possible, but it sounds like ABySS maybe runs on less, with that breast cancer paper saying they used 20 nodes with 2GB each for 194M reads of 36bp?
Also, for assessing quality, I'd guess the best way would be to simply compare to the distribution of a related but more fully annotated species. I don't expect that would be easy, however, requiring a large batch-BLAST-type analysis while accounting for sequence divergence and gene duplication/deletion issues. Other than that, I just don't know how telling these kinds of k-mer analyses really are. So you got X contigs bigger than 100bp, or a max of 10kb; who cares, exactly? Especially when you look through RNA-seq data that was aligned to a reference genome and see all kinds of regions coming up outside gene regions, even in well-annotated species like mouse. How much of this is just genomic contamination, or a kind of "phantom" or random transcription of regions that do nothing? Basically, I just want to know how well you covered the ~20K genes in a vertebrate genome. After you show me that, I can start caring about micro-RNAs, or your k-mers.
Last edited by Wallysb01; 05-10-2011, 09:29 AM.
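For what it's worth, the memory figure in the post is just the rule of thumb multiplied out; a quick sanity check (the 2GB-per-million-reads number comes from the post, not from any measurement of mine):

```python
# Back-of-envelope check of the Broad's suggested rule of thumb
# for Trinity memory usage (2 GB per million reads).
reads_millions = 100   # ~100M paired-end 100 bp reads, as in the post
gb_per_million = 2     # quoted rule of thumb
estimated_gb = reads_millions * gb_per_million
print(estimated_gb)    # 200, matching the 200GB figure above
```

Actual peak usage depends heavily on read length, error rate, and transcriptome complexity, so treat this strictly as a planning estimate.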
Hi,
is web-based blastx able to digest a full contig output from Velvet or Oases, or is it better to download both BLAST and the UniProt database and work locally?
Best,
Dave
We've tried a few different programs for de novo transcriptome assembly; you can see our paper that came out about a year ago here: http://www.biomedcentral.com/1471-2164/11/663.
As Marcel pointed out, Oases seems to do a pretty good job with the latest updates. In our paper we introduce Rnnotator, which adds some additional pre/post-processing steps to further improve the assembly. We've been able to assemble a plant transcriptome, but we are still evaluating the results.
Originally posted by Neil: Hi all,
We are planning to perform an mRNA-seq run using the Illumina GAII platform. We are worried about assembling the transcriptome when we get our data back. Most of the RNA-seq papers I read are assembling to a reference genome/transcriptome; we don't have either of these! Is there anyone out there who has assembled cDNA short reads de novo? If so, are paired reads as important as they are for genome assembly?
also, what software would you recommend for this?
hope someone can help
best regards
neil
Dear petang,
It's ok...
No worries about it...
I'm just not sure whether my idea of identifying alternative splicing variants without using a reference genome sequence can work.
Does it sound logical or not?
It seems really quite difficult to identify alternative splicing variants without a reference genome sequence.
Thanks in advance for any advice.
In my experience, the number of contigs assembled from 20 million 2x100bp reads varies from 30,000-50,000, depending on the complexity of the genome. The first question is how many contigs you want to annotate, and the purpose of your experiment. If you are aiming at gene discovery, the first 10,000 highly expressed contigs should be good enough. Alternatively, you can choose the long contigs (say, longer than 1000bp).
If you are doing comparative transcriptomics, obviously you can choose the differentially expressed contigs.
In either case, it is impossible to annotate all contigs without the support of a bioinformatics group.
The quickest way to annotate the transcripts is BLASTx against UniProt, then retrieve all the information (Pfam, GO, KEGG, etc.) from the hit. However, you will miss all the conserved hypothetical proteins, which are only available from NCBI. So I would start with BLASTx against UniProt, then use the un-hit contigs for BLASTx against NCBI nr.
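The two-pass strategy above (UniProt first, then send only the un-hit contigs to nr) comes down to set subtraction on the first round's results. A minimal sketch, assuming tabular BLAST output where column 1 is the query contig id; all contig and accession names here are made up:

```python
# Select contigs with no hit in the first BLAST round, so only they
# go into the (much slower) second round against NCBI nr.

def unhit_contigs(all_contigs, blast_tab_lines):
    """Return contig ids absent from the first round's hit table."""
    hit = {line.split("\t")[0] for line in blast_tab_lines if line.strip()}
    return [c for c in all_contigs if c not in hit]

contigs = ["contig_1", "contig_2", "contig_3"]
uniprot_hits = [
    "contig_1\tsp|P12345\t97.0",  # truncated made-up tabular rows
    "contig_3\tsp|Q99999\t88.5",
]
print(unhit_contigs(contigs, uniprot_hits))  # ['contig_2'] goes to nr
```

Since the nr search dominates the runtime (as noted earlier in the thread), shrinking its input to the un-hit contigs is where this ordering pays off.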
Just a quick question on de novo assembly for transcriptomics: say I have Illumina 2x100bp RNA-seq reads; once they are assembled by a de novo assembler, how do we annotate the transcripts? By doing BLAST or Blast2GO? Is that sufficient?