De Novo Assembly of a transcriptome

Marta replied

02-28-2012, 10:44 AM
Hi edge,

Not sure what you are referring to with "Fail to get main class". You'll need to contact the support team of the program you are using. If this is from the CLC bio trial, write to [email protected]
It looks like it may be Java related, but that is the extent of my understanding.

Regarding the evaluation of assemblies: I have a very old (almost 3 years old) presentation from when we compared assemblers and evaluated the output produced by the CLC de novo assembler. The presentation is very long and is available here:

https://docs.google.com/file/d/0B9g4lIAKxQTaMjgyN2UzYzgtNjViNy00MDIzLTkxOGYtZjI4YzQzMzQ5MzQz/edit

All you need is slide #25, and you will see what I mean.

Best,

Marta
edge replied

02-28-2012, 01:16 AM
Hi Marta,

I have a few more questions and would appreciate your advice.
If the number of contigs assembled by two different assemblers differs, is the best approach simply to select the top 1000 longest contigs from each?
Would you mind explaining a little more about "I like to see that about 90% of my contigs are at about 3x of the overlap length"? I'm not sure how to evaluate the BLASTX results.

I'm currently a research student. I know of CLC bio, which does well in bioinformatics analysis.

Many thanks for your reply.
edge replied

02-27-2012, 11:41 PM
Hi Marta,

I'm getting the error message "Fail to get main class".
Any ideas or advice on how to solve this issue?
Thanks.
Marta replied

02-27-2012, 09:14 PM
Hi Edge,

You may want to try the following:

1. Save the contigs created by each assembler in FASTA format, or save just the 1000 (or 100) longest contigs.
2. Download the RefSeq proteins for your group of species to your machine.
3. Run BLASTX (you will need to install it on your machine, or run a batch at the NCBI site) using your contigs as the query and the downloaded RefSeq proteins as the database.
4. Parse the BLASTX output into a table so you can see the length of each query sequence (contig) and the length of the overlap from the BLAST output.
5. I like to see that about 90% of my contigs are at about 3x of the overlap length.
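
Steps 4 and 5 could be sketched roughly like this in Python. This is only an illustration under a few assumptions that are not part of the advice above: BLASTX is run with BLAST+ tabular output (`-outfmt "6 qseqid sseqid qlen length bitscore"`), the `length` column is the alignment length in amino acids, "3x" means the contig length in nucleotides is about three times that aligned length, and a 20% tolerance counts as "about". The sample rows and function names are made up for the example.

```python
import csv
import io

# Hypothetical sample of BLASTX tabular output (tab-separated columns:
# qseqid, sseqid, qlen, length, bitscore). Replace with your own file handle.
SAMPLE = (
    "contig1\tprotA\t900\t295\t500\n"
    "contig1\tprotB\t900\t100\t120\n"
    "contig2\tprotC\t1200\t150\t200\n"
    "contig3\tprotD\t600\t198\t350\n"
)


def best_hits(handle):
    """Keep the highest-bitscore hit for each contig."""
    best = {}
    for qseqid, _sseqid, qlen, length, bitscore in csv.reader(handle, delimiter="\t"):
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][2]:
            best[qseqid] = (int(qlen), int(length), score)
    return best


def fraction_near_3x(best, tolerance=0.2):
    """Fraction of contigs whose length is roughly 3x the aligned length."""
    if not best:
        return 0.0
    ok = sum(
        1
        for qlen, aln_len, _score in best.values()
        if abs(qlen / aln_len - 3.0) <= 3.0 * tolerance
    )
    return ok / len(best)


hits = best_hits(io.StringIO(SAMPLE))
fraction = fraction_near_3x(hits)
print(f"{fraction:.0%} of contigs with a hit are close to 3x the overlap length")
```

Note that contigs with no BLASTX hit never appear in the table, so the fraction here is relative to contigs that have at least one hit.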

There are other QCs I do on assemblies, but this one would be the first filter.

I work for CLC bio, so I should point you to the product you can use to simplify this kind of analysis. You can download the Genomics Workbench trial, which will allow you to run BLASTX and many other things, from here:

http://www.clcbio.com/index.php?id=1292

Not sure if I can provide commercial links here when associated with the organization. I will hear from the site owner, I guess, if this is not OK :-)
edge replied

02-27-2012, 08:38 PM
Hi Marta,

Many thanks for your prompt reply.
I really appreciate it.

Would you mind explaining a little more about your idea?
If I assemble my newly sequenced transcriptome using two different third-party assemblers (Velvet, ABySS, etc.),
how can I tell which program produced the better transcriptome assembly?

Looking forward to your reply.
Thanks.

Originally posted by Marta

How about BLASTX against RefSeq of related species or group? I always wanted to see uninterrupted ORFs of expected size. I also always check for the presence of the longest transcript expected in the particular species. The latter may not apply to all organisms.
Marta replied

02-27-2012, 08:22 PM
How about BLASTX against RefSeq of related species or group? I always wanted to see uninterrupted ORFs of expected size. I also always check for the presence of the longest transcript expected in the particular species. The latter may not apply to all organisms.
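
As a rough companion to the BLASTX check, a quick look at uninterrupted reading frames can be done without external tools. The sketch below is only one possible reading of "uninterrupted ORF": it assumes the standard stop codons and measures the longest stop-free stretch of codons in any of the six frames (contigs are often partial, so a start codon is not required). The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_frame(seq, both_strands=True):
    """Length in codons of the longest stop-free stretch over all frames."""
    seq = seq.upper()
    strands = [seq]
    if both_strands:
        strands.append(reverse_complement(seq))
    best = 0
    for strand in strands:
        for frame in range(3):
            run = 0
            # Walk the frame codon by codon, resetting the run at each stop.
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0
                else:
                    run += 1
                    best = max(best, run)
    return best


contig = "ATGAAATAG"
print(longest_open_frame(contig, both_strands=False))
print(longest_open_frame(contig))
```

Comparing the resulting codon count with the protein length of the best BLASTX hit gives a sense of whether the contig carries an ORF of the expected size.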
edge replied

02-27-2012, 08:06 PM
Does anybody know how to evaluate whether a transcriptome assembly produced by a third-party program is good or not?
Are there any specific rules to follow?

Thanks.
m.nyine replied

08-26-2011, 01:12 AM
Dear All,

Can the Newbler software help me answer questions such as how many SNPs and INDELs there are within the cDNA sequences, how frequently they occur, and what percent coverage I have of the already existing ESTs? I am generating a reference transcriptome, but I have no reference genome sequence.
scalabrin replied

08-22-2011, 05:50 AM
Originally posted by boetsie

I don't think it is a good idea to use SSPACE for merging assemblies. Of course, contigs can be combined if pairs can be found; however, it will not merge full assemblies. You will still end up with the initial size of the total assembly of different k-mers.

The best way to go is to use a tool that merges assemblies, like Zorro or GAM. Have a look at this thread for a list of these tools:

http://seqanswers.com/forums/showthread.php?t=10834&highlight=zorro

Boetsie

Just a quick update on GAM: at the moment it is Sanger-based only, but we are modifying it to accept NGS (BAM) files as well!

http://services.appliedgenomics.org/software/gam/
lletourn replied

08-09-2011, 06:43 AM
I wanted to add parameters to allow for -long or not. Right now the script assumes -long is passed (in 4 of our experiments we added 454 RNA-Seq reads to the mix).

I'll fix this and post the code...somewhere :-)
berath replied

08-09-2011, 06:23 AM
So far, we have tried vmatch and CD-HIT-EST (which performed better) to merge our assemblies. CAP3 is the next one to try.

@lletourn,

Would you mind sharing your script to extract read counts from the velvet-oases assemblies?

I guess there is no way of choosing a DE approach without comparing the results. One way to do it is probably to try it on a set of contigs representing a sample set of genes and compare the fold changes from the two approaches.

What I am wondering is whether a de novo assembly using the reads from one experimental condition would be as comprehensive as one constructed from all the reads at hand. The second approach I mentioned is certainly shorter and maybe less prone to errors than mapping the reads back to the combined assembly.
lletourn replied

08-09-2011, 04:36 AM
Yeah I know, I computed RPKM separately. I mis-phrased what I meant.
I should go back and edit.
Thanks.
DZhang replied

08-09-2011, 04:18 AM
Hi lletourn,

In #69, for DESeq and edgeR, you should use the raw counts, not RPKM.
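
For reference, here is a minimal sketch of how RPKM is derived from raw counts, using the usual definition (reads per kilobase of transcript per million mapped reads). The transcript names, counts, and lengths are made up for illustration. The point is that RPKM is already normalized, while DESeq and edgeR expect the untransformed `raw_counts` table as input.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical raw counts and transcript lengths for one sample.
raw_counts = {"transcript_1": 500, "transcript_2": 50}
lengths_bp = {"transcript_1": 2000, "transcript_2": 500}
total_mapped = 10_000_000

# Normalized values, useful for eyeballing abundance within a sample.
rpkm_values = {
    name: rpkm(count, lengths_bp[name], total_mapped)
    for name, count in raw_counts.items()
}

for name in raw_counts:
    print(name, raw_counts[name], rpkm_values[name])
```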
Jenzo replied

08-08-2011, 10:39 PM
Dear berath & lletourn,
We also mapped the reads of four conditions back to the set of contigs from all lanes and will now calculate the RPKM values. Since we found no tool that can do DE on de novo transcriptomes, I think we will have to write a script of our own. Or are there any tools available now for this special purpose?

Again, to merge assemblies, we used the TGICL/CAP3 package. In retrospect, I'm not sure whether it does a good job, because sometimes an annotation is found on the (+) strand and the same one on another transcript on the (-) strand. After reverse complementing a sample (-) transcript and aligning it against the (+) transcript, the sequences turn out to be similar, but with a few small gaps of between 30 and 60 bp. Could these be real isoforms?
lletourn replied

08-08-2011, 06:41 PM
I'm also really interested in the opinions of others on this topic.

I've used the approach of making one assembly (with velvet-oases), annotating it, and comparing RPKMs from the assembly. It seems to work well for "obvious" differences in genes (the ones that are far more abundant in one condition than the other).

What I was going to try is to fit the RPKMs in edgeR or DESeq and check the results. I'm not too sure what to expect, though.

BTW, I had to write my own script to extract read counts, as you mentioned. With read_trkg on, plus LastGraph and the contig-ordering file, it's not too hard to count reads per transcript.

What did you use to merge assemblies? Did you pass them back into velvet, or use CAP3, Zorro, GAM, or something else?