-
Originally posted by lletourn: I highly suggest nr, with blastx, for your transcripts. Conserved proteins are easier to find this way.
Originally posted by grassgirl:
I have about 6600 isotigs and I'm not sure how to find out how much memory it would take (and that I would request on the cluster) to blast them all to nr. I have heard that I can test a subset (10, 100, 1000), but don't know how to go about doing this. Any suggestions? Should I split my isotig file up before blasting?
I've done this with 33,000 assembled transcripts on a 192-node cluster. It took a few days (2-4, I don't remember) to get all the XML results. Basically, I broke the 33k transcripts down into 192 parts and ran one per node.
Code:
blastall -p blastx -d <db> -i <contigFile.fasta> -U -f 14 -F "m S" -e 1e-10 -b 20 -v 20 -a <#_cpus>
(Adapted from "BLAST" by Korf, Yandell & Bedell.)
Sorry that this is the command using the old BLAST toolset. If you are using BLAST+, as NCBI is urging people to do, you'll have to translate this to the new command/options. The BLAST+ package includes a Perl script (legacy_blast.pl) which can do the translation for you.
Also, the memory requirement is independent of the size of your query set. BLAST does not store the query or the results in RAM. The major contributor to RAM consumption is the size of the target database. Here again, sticking to more narrowly targeted DBs will help, but by today's standards of RAM even nr should not be a problem.
-
Hi there,
we just did some Velvet/Oases assemblies on several non-normalized 60 bp PE libraries, and I would like to share the resources needed:
Set 1 with 101 million reads: up to 38 GB RAM for Velvet, up to 63 GB for Oases with k=25
Set 2 with 87 million reads: up to 25 GB for Velvet, up to 46 GB for Oases, also k=25
Set 3 with 100 million reads: up to 37 GB for Velvet and 27 GB for Oases, k=25
Runtimes were up to 4.5 h for Velvet and up to 1 h for Oases.
We explored more k-mers, and the resources needed were smaller for higher k-mer values (as expected).
We also had a set of 454 reads and assembled them with MIRA, which took 13 h and no more than 7 GB of RAM. The N50 value here was about 450 bp for 60k contigs.
In addition, after clustering all sets (together with the 454 contigs), all transcripts have an N50 value of around 670 bp.
We plan to do the same assemblies with Trans-ABySS and Trinity as well; I can post the resources needed here if you are interested.
-
Originally posted by grassgirl: ...but was told by a researcher with much experience that the 454 de novo data would assemble better than Illumina because of the long reads.
Originally posted by grassgirl: Also, I was told by Roche that there is no protocol for paired-end cDNA libraries, because the reads are so long and it isn't a necessity.
Originally posted by grassgirl: As for assembly, a fellow researcher runs GS De Novo Assembler followed by CAP3.
In your case, with 454 reads, I would suggest an overlap-based one like MIRA. MIRA has always given good results with 454 and Sanger-type reads for EST and transcriptome analysis. It also works well with Illumina but is *very* resource-demanding.
Originally posted by grassgirl: I have access to a cluster with blastall and would like to blast to nt or nr.
Originally posted by grassgirl: I have about 6600 isotigs and I'm not sure how to find out how much memory it would take (and that I would request on the cluster) to blast them all to nr. I have heard that I can test a subset (10, 100, 1000), but don't know how to go about doing this. Any suggestions? Should I split my isotig file up before blasting?
I've done this with 33,000 assembled transcripts on a 192-node cluster. It took a few days (2-4, I don't remember) to get all the XML results. Basically, I broke the 33k transcripts down into 192 parts and ran one per node.
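The splitting step can be scripted; below is a minimal sketch in Python (the function names are hypothetical, and the parser assumes plain single-record-per-header FASTA):

```python
def parse_fasta(text):
    """Minimal FASTA parser: returns a list of (header, sequence) tuples."""
    records = []
    for chunk in text.split(">")[1:]:
        lines = chunk.splitlines()
        records.append((lines[0], "".join(lines[1:])))
    return records

def split_fasta(records, n_parts):
    """Distribute records round-robin into n_parts lists (one per node),
    so part sizes differ by at most one record."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts

def write_part(records, path):
    """Write one part back out as FASTA."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">%s\n%s\n" % (header, seq))
```

Each part then gets its own blastall job submitted to its own node.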
-
Originally posted by Aurelien Mazurie: I am collecting information about the best strategy to perform de novo transcriptome assembly for a plant for which we have no reference genome. From what I read here, it seems that most people are going for Illumina rather than 454 reads (which answers my first question, about which NGS technology should be used for this task).
Also, I was told by Roche that there is no protocol for paired-end cDNA libraries, because the reads are so long and it isn't a necessity.
As for assembly, a fellow researcher runs GS De Novo Assembler followed by CAP3.
Regarding BLASTing: I have questions about the best way to do this with my isotigs. I have access to a cluster with blastall and would like to blast to nt or nr. I have about 6600 isotigs, and I'm not sure how to find out how much memory it would take (and that I would request on the cluster) to blast them all to nr. I have heard that I can test a subset (10, 100, 1000), but don't know how to go about doing this. Any suggestions? Should I split my isotig file up before blasting?
-
Wallysb01,
thanks for answering. As soon as Trinity stops running, I will have a look at what you said about the 5,000th and 10,000th contig.
Our computer has an 8-core processor and 16 GB of memory plus 18 GB of swap.
And as I said before, ABySS always finished overnight (max 10 hours).
-
Celia,
From my understanding your N50 for transcriptome assembly should be pretty low. You're just not going to have large contigs when the average spliced gene is something like 1500-2000 base pairs. Then, depending on your RNA extraction and purification methods, you might have also captured microRNAs. To a certain extent you probably have some genomic contamination, and certainly a lot of pre-spliced mRNAs. Then, what's your coverage on lowly expressed genes? For larger, but lowly expressed genes, you probably have a bunch of small contigs. Anyway, I wouldn't worry about the N50 for transcriptome assembly so much. I'd much rather see what the size of the 5,000th and 10,000th contig is to get a measure of how "complete" the transcriptome is. And unless you have the whole body of the animal and a variety of ages (including embryonic), I wouldn't expect you to get much more than 10K reasonably well assembled genes, if that.
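The "size of the 5,000th and 10,000th contig" check is easy to script; here is a minimal sketch in Python, assuming the contig lengths have already been extracted from the assembly FASTA (the function name is hypothetical):

```python
def nth_largest_contig(lengths, n):
    """Length of the n-th largest contig (1-based rank);
    returns None if the assembly has fewer than n contigs."""
    ranked = sorted(lengths, reverse=True)
    return ranked[n - 1] if len(ranked) >= n else None
```

A healthy transcriptome assembly should still have reasonably long contigs at those ranks, even if the overall N50 looks low.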
I can't really help with the other question, though; I'm still behind on the actual "doing" of the analysis. But do you mind sharing what kind of specs the computer you were running ABySS on had for processors/RAM?
-
Hi. We are running tests on different de novo assemblers, like ABySS and even CLC, which are both fairly quick (maximum 8 hours). Now we are running Trinity, which is already taking days.
Has anyone done a comparison of the different programs and can comment on (first of all) how long Trinity actually takes, and also on what seems to be the best program to use? (I cannot run Velvet, as we don't have enough memory.)
We have about 48 million 109 bp Illumina reads.
Also, I am having issues with the N50, as it seems to be one quality measure for assessing de novo assemblies; however, mine are really low (my reads were trimmed and everything is above a quality score of at least 30). What N50 values are good for a de novo assembly?
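For reference, N50 is the contig length L such that contigs of length >= L together cover at least half of the total assembly span; a minimal sketch in Python:

```python
def n50(lengths):
    """N50 of a list of contig lengths: walk the contigs from longest to
    shortest and return the length at which the running sum reaches half
    of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

Note that a few very long contigs can dominate the statistic, which is one reason N50 is a questionable quality measure for transcriptomes.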
-
ikim,
Do you mind sharing what kind of computational power your Velvet/Oases and Trinity assemblies are taking?
We're gearing up to start some jobs on a shared campus computer, but they are being kind of fussy about letting us run jobs that might take many tens or even hundreds of GB of RAM and run for more than a couple of days. Right now they are basically saying we can have only 64 GB of RAM for a few days, or 8 GB of RAM for 14 days. (Makes me wonder why we even have this supercomputer and why they brag about having TBs of RAM.)
All ranting aside, do you have any idea if 64 GB for 3-4 days would be enough? Or, if that's all we can get out of them, what programs might be able to work within those constraints?
Last edited by Wallysb01; 06-02-2011, 10:21 AM.
-
From their advanced guide online:
"FPKM_all: expression value for this transcript computed based on all fragment pairs corresponding to this path.
FPKM_rel: expression value accounting for fragments that map to multiple reported paths (fragment count is equally divided among paths, yes not optimal... we're working on more advanced methods ala cufflinks to better estimate expression values.)"
Guess it's still better to use a mapper plus Cufflinks for now.
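The equal-division scheme the guide describes can be sketched as follows; this is a toy illustration of the idea, not Trinity's actual code, and all names are hypothetical:

```python
def fpkm_rel(fragment_paths, transcript_lengths, total_fragments):
    """Equal-split FPKM: each fragment contributes 1/k of a count to each
    of the k transcripts (paths) it maps to; counts are then normalized
    by transcript length in kb and library size in millions."""
    counts = {t: 0.0 for t in transcript_lengths}
    for paths in fragment_paths:      # the transcripts one fragment maps to
        share = 1.0 / len(paths)      # fragment count equally divided
        for t in paths:
            counts[t] += share
    return {
        t: counts[t] / (transcript_lengths[t] / 1000.0) / (total_fragments / 1e6)
        for t in transcript_lengths
    }
```

The equal split is exactly the "not optimal" part the guide admits to: a likelihood-based assignment (as in Cufflinks) would weight the shares instead of dividing them uniformly.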
-
I can't comment on Trinity vs. BWA, but using BWA on Oases assemblies has always been problematic for me, since there are hairpins in the assembly, and the isoforms in transcripts.fa need to be filtered out first.
I generate the FPKM values using the read-tracking option from Velvet/Oases.
With the contig-ordering and LastGraph files you can get the reads per transcript. It's not perfect, because if Oases decides to cut a contig in the final transcript you can't know which reads not to count. I have rarely seen this, though.
I'll try trinity to compare.
-
We have been using Velvet/Oases for de novo transcriptome assembly of several large eukaryotes. I'm running Trinity tests at the moment, and it seems to need computational resources similar to our current pipeline (we run multiple Velvet assemblies in parallel). I'm hopeful the FPKM values generated will greatly reduce mapping and expression efforts in terms of time and resources. Does anyone understand whether the Trinity FPKM calculations will be more or less accurate than, say, those from a BWA mapping?
-
Aurelien
Originally posted by Aurelien Mazurie: This is something I am wondering: is there any way to come up with an RPKM-like measure of expression level when doing de novo transcriptome assembly? Counting the number of reads per contig (cDNA) appears to be a crude way, but that's the only one I can think of. Any better suggestion? Normalizing by the library's length (number of reads), maybe?
I know less about the other programs, given that they all seem to require a little more knowledge than I currently have. Trinity seems pretty much plug-and-chug, so long as you have plenty of computing power.
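For what it's worth, the "reads per contig, normalized by library size" idea in the quote above is essentially RPKM once you also divide by contig length; a minimal sketch, assuming per-contig read counts are already available:

```python
def rpkm(read_count, contig_length_bp, mapped_reads_total):
    """RPKM: reads per kilobase of contig per million mapped reads.
    Normalizing by contig length makes long and short contigs comparable;
    normalizing by library size makes different libraries comparable."""
    return read_count / (contig_length_bp / 1000.0) / (mapped_reads_total / 1e6)
```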
-
Originally posted by lletourn: Having the mixed samples, we used in-house software on the Oases output to extract how many reads were used per transcript for each sample, to get a feel for the variation in expression. This is in no way precise, given that a read can be in multiple transcripts (isoforms, for example), but it gives insight into differences between the samples.
Aurelien