**westerman** · 09-21-2011, 10:31 AM

Ah, I should have noted that you are a "Senior Member" and thus undoubtedly already know more about sequencing than many of us. My response below was aimed more at the many new people we get on SeqAnswers, so it may not be applicable to you. I wish I had more than a rough guide to offer, such as an actual formula to use.

-------------------

Originally posted by edge:

Do we need consider coverage and depth of data...

Yes, you do. In particular, for a non-normalized transcriptome or a sample that has not been rRNA-depleted, you need to be concerned with picking up low-expression genes.

You do not give enough information for us to make an intelligent decision for your particular case (e.g., we would need to know the organism you are sequencing, the complexity of its genes, whether your sample is normalized, etc.). However, we can play around with some very rough numbers.

Let us assume that your sample is completely normalized. In other words each transcript (gene) is present once and only once in your sample. Assume a complex eukaryotic organism. Then our numbers could look like:

100,000 genes at 1000 bases each ... equals a sequence space of 100 Mbase

Desire 30x sequencing coverage ... means we need 3 Gbase of sequence.

Your 14 GB will do quite nicely.
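
The arithmetic above can be written out as a small sketch. The function name and parameters are illustrative only (they do not come from any sequencing toolkit), and sizes are counted in bases, not in bytes of a FASTQ file.

```python
def required_bases(n_transcripts: int, mean_length: int, coverage: float) -> int:
    """Bases of sequence needed to cover a fully normalized transcriptome."""
    # Sequence space: total bases across all transcripts, each present once.
    sequence_space = n_transcripts * mean_length
    return int(sequence_space * coverage)


# Rough numbers from the post: 100,000 genes of 1,000 bases each at 30x.
needed = required_bases(100_000, 1_000, 30)
print(f"{needed / 1e9:.1f} Gbase")  # 3.0 Gbase
```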

On the other hand, let us assume that you do not have a normalized sample. Then some genes will be present thousands of times and others only once. I am sure that there is some graph out there that describes this behavior and provides a multiplication factor, but I'll make a wild guess that this increases the required sequence by a factor of at least 10. Thus you would need 30 Gbase of sequence.
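
To see where a multiplication factor could come from, here is a minimal sketch under a deliberately simple assumption: reads are drawn in proportion to each transcript's abundance times its length. The abundance values are toy numbers chosen for illustration, not measured data, and the function name is hypothetical.

```python
def bases_for_min_depth(abundances, lengths, min_depth):
    """Total bases needed so the rarest transcript reaches min_depth on average.

    Assumes reads are sampled in proportion to abundance * length, so the
    expected depth of transcript i is total_bases * abundance_i / sum(a * l).
    """
    total_mass = sum(a * l for a, l in zip(abundances, lengths))
    rarest = min(abundances)
    return min_depth * total_mass / rarest


lengths = [1_000] * 4
normalized = bases_for_min_depth([1, 1, 1, 1], lengths, 30)
# Toy case: one transcript is 37 times more abundant than the other three.
skewed = bases_for_min_depth([1, 1, 1, 37], lengths, 30)
print(skewed / normalized)  # 10.0
```

In this toy case a single highly expressed transcript soaks up most of the reads, so ten times as much sequence is needed before the rare transcripts reach 30x. Real expression distributions are far more skewed and would need to be estimated from data.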

The numbers above are very, very rough, so do not base your research on them. They are meant more as a way of saying "... it depends ..."

**tbanks** · 09-21-2011, 11:22 AM

The following publication shows a number of simulations of transcriptome assembly and the effects of coverage and sequencing technology. It's a bit dated now but should help you out. I believe they also have some online software so you can do your own rough simulation.

Wall PK, Leebens-Mack J, Chanderbali AS, Barakat A, Wolcott E, Liang H, Landherr L, Tomsho LP, Hu Y, Carlson JE, Ma H, Schuster SC, Soltis DE, Soltis PS, Altman N, dePamphilis CW. Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics. 2009 Aug 1;10:347.

**edge** · 09-21-2011, 11:18 PM

Many thanks, westerman.

I have an RNA-seq human lung sample, 2x100 bp paired-end reads, with a total file size of 14 GB right now.
I plan to map my RNA-seq data against a transcriptome database downloaded from NCBI.
After that, I plan to cluster the short reads by the transcript group they map to.
My problem is determining the minimum number of paired-end reads to use as a cut-off for assembly.
In the mapping results, some transcript groups are mapped by only about a thousand read pairs.

Thanks for any advice.
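
One way to explore a cut-off like this is to tally read pairs per transcript and convert the counts to an approximate depth. The sketch below assumes the mapping result has already been reduced to (read pair ID, transcript ID) tuples; that input format, the placeholder IDs, the threshold, and the 2,000-base transcript length are all hypothetical.

```python
from collections import Counter


def transcripts_above_cutoff(pair_hits, min_pairs):
    """Return {transcript: read-pair count} for transcripts with >= min_pairs."""
    # set() drops duplicate (pair, transcript) records so each pair counts once.
    counts = Counter(transcript for _pair_id, transcript in set(pair_hits))
    return {t: n for t, n in counts.items() if n >= min_pairs}


def approx_depth(n_pairs, read_length, transcript_length):
    """Approximate mean depth: two reads per pair, spread over the transcript."""
    return n_pairs * 2 * read_length / transcript_length


hits = [("p1", "tx_A"), ("p2", "tx_A"), ("p3", "tx_B"), ("p1", "tx_A")]
print(transcripts_above_cutoff(hits, min_pairs=2))  # {'tx_A': 2}

# 1,000 pairs of 2x100 bp reads on a hypothetical 2,000-base transcript.
print(approx_depth(1_000, 100, 2_000))  # 100.0
```

Thinking in depth rather than raw read counts makes the cut-off comparable across transcripts of different lengths.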

**mruizm** · 08-25-2013, 09:16 PM

Minimum depth of coverage in transcriptome assembly

Hi everyone, I have 4.46 GB of Illumina MiSeq paired-end reads from sequencing transcripts in various tissues. I have assembled all of these reads and found that the mean depth of coverage is 27.9x (depth of coverage = efficiency of sequencing / efficiency of assembly).
My question is: what is the minimum depth of coverage needed to obtain robust information from the assembled transcriptome in a de novo transcriptome analysis?

Thanks!
Best regards!
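
For reference, a commonly used definition of mean depth is the total number of read bases mapped back to the assembly divided by the total assembled length; this may not match the formula quoted in the post. The sketch below uses that definition with made-up contig names and numbers, and also lists contigs whose own depth falls below a chosen threshold, since a mean can hide poorly covered transcripts.

```python
def mean_depth(mapped_bases, contig_lengths):
    """Mean depth = total mapped read bases / total assembled bases."""
    return sum(mapped_bases.values()) / sum(contig_lengths.values())


def low_depth_contigs(mapped_bases, contig_lengths, min_depth):
    """Names of contigs whose individual depth is below min_depth."""
    return [
        name
        for name, length in contig_lengths.items()
        if mapped_bases.get(name, 0) / length < min_depth
    ]


# Hypothetical assembly: two contigs and the read bases mapped to each.
contig_lengths = {"c1": 1_000, "c2": 2_000}
mapped_bases = {"c1": 50_000, "c2": 10_000}

print(mean_depth(mapped_bases, contig_lengths))             # 20.0
print(low_depth_contigs(mapped_bases, contig_lengths, 10))  # ['c2']
```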

Minimum short read required for transcriptome assembly