**Wallysb01** · 08-28-2013, 07:19 PM

If you're pooling 40 samples together to spread across all the lanes, then it's very important to get the molar ratios correct. I suppose that much is obvious. It is also important that all the samples have a similar size distribution. Illumina clustering favors smaller inserts, so if you had some samples with 250bp average sizes and some with 600bp, that could make a big difference in the clustering efficiency of each sample. RNA integrity is an issue too, but of the three it shouldn't have as much effect on sequencing depth in pooled samples. If you got the other two right, it shouldn't be an issue in this case (it could lead to poor data for other reasons, just not really in relative sequencing depth).
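To make the molar-ratio point concrete, here is a minimal sketch of the usual conversion from a measured library concentration to molarity, assuming double-stranded libraries at roughly 660 g/mol per base pair. The function names and inputs are hypothetical, not anything the facility necessarily uses.

```python
# Approximate average mass of one base pair of double-stranded DNA, in g/mol.
BP_MASS = 660.0


def library_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a library concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS * mean_fragment_bp)


def equimolar_volumes(libraries, fmol_per_sample):
    """Volume (uL) of each library that contributes fmol_per_sample femtomoles.

    libraries maps a sample name to (concentration in ng/uL, mean fragment bp).
    """
    volumes = {}
    for name, (conc, size) in libraries.items():
        # 1 nM is the same as 1 fmol/uL, so volume = femtomoles wanted / nM.
        volumes[name] = fmol_per_sample / library_nM(conc, size)
    return volumes
```

Two libraries with the same ng/uL reading but different mean fragment sizes come out at different molarities, so pooling equal volumes would not give equal representation.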

Also, if only 5 samples came within 90% of your expected sequencing depth, I would suspect something went wrong with the actual sequencing. Either clustering didn't work well or barcodes weren't read correctly for a lot of the reads. I am curious, though: you say you have 40 samples total and you're sequencing to 100M PE reads each. That would be 20 HiSeq lanes, as you shouldn't expect more than 200M PE reads per lane. Is that what you actually did? Or are you counting them like single-end reads, leading to 400M total reads per lane?

Most people are now sequencing about 50M reads per sample (either 2x100, so really 100M reads, but they are paired so statistically it's still 50M, or 1x50). So, if most of your samples are around 50M-80M, that should be fine.
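The lane arithmetic can be written out as a quick sketch. The 200M PE reads per lane is the rough upper bound quoted in this post, not a guaranteed yield, and the function name is just for illustration.

```python
def lanes_needed(n_samples, pe_reads_per_sample, pe_reads_per_lane):
    """Number of whole lanes needed to reach the target depth for every sample."""
    total_pe_reads = n_samples * pe_reads_per_sample
    # Integer ceiling division: a partly used lane still counts as a full lane.
    return -(-total_pe_reads // pe_reads_per_lane)


# 40 samples at 100M PE reads each, assuming at most 200M PE reads per lane.
print(lanes_needed(40, 100_000_000, 200_000_000))  # 20
```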

**rnastar** · 08-28-2013, 07:48 PM

Thank you for the reply! For clarification, I meant that we are getting only 50-80 million paired-end reads, that is, only 100-160 million reads total for a given sample. It sounds like something went wrong with the sequencing, but the facility may not want to tell us (this was not done through Illumina but at a local university).

In terms of downstream analyses, we tried to look at alternative splicing (our main interest) by running cuffdiff on all samples, and when we did so we found no significant alternative splicing events. When I filtered out samples that had fewer than 90 million paired-end reads off the sequencer, we got about 600 significant alternatively spliced genes and a lot of DE genes. I am wondering whether filtering out samples based on the resulting sequencing depth is the way to go, or if we should question the entire set to begin with. In mapping with TopHat, in almost all samples I am seeing a lot of reads mapping to multiple places in the genome. So if we had 200M sequenced reads (100M paired-end reads), we observe almost 300-400M reads in the accepted_hits.bam file. This is all making me a bit nervous.
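One way to check how much of that inflation is due to multi-mapping is to compare the number of alignment records with the number of distinct reads. Below is a rough sketch that works on SAM text lines (for example the output of `samtools view accepted_hits.bam`); the function name and the returned keys are made up for illustration.

```python
from collections import Counter


def summarize_alignments(sam_lines):
    """Count alignment records versus distinct mapped reads in SAM text."""
    hits = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:  # read is unmapped
            continue
        # Tell the two mates of a pair apart using the first/last-in-pair bits.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        hits[(qname, mate)] += 1
    return {
        "records": sum(hits.values()),  # alignment lines for mapped reads
        "reads": len(hits),             # distinct reads with at least one hit
        "multi_mapped_reads": sum(1 for n in hits.values() if n > 1),
    }
```

If "records" is much larger than "reads", the extra lines in the BAM file are repeated alignments of the same reads rather than extra sequenced reads.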

**Wallysb01** · 08-28-2013, 08:43 PM

50-80M paired-end reads per sample and 20 replicates for control and cancer cells is a huge data set for RNA-seq. Even if that's not what you paid for, you should be able to find plenty of differentially expressed isoforms, if they are there to find. And TopHat -> cuffdiff is probably the best way to go with isoforms. The other option is to use DESeq for exon-level tests to find differentially expressed exons, then track them back to the isoforms they could be from. It's interesting that you chose 2x50 reads for isoform tests. While it's good you went for paired end, the extra 50bp on each read would have been pretty helpful when it comes to resolving isoforms.

I think you are right to set a read depth cutoff for including your replicates. I'd suggest maybe 20M-30M PE reads, but it might depend on what your distribution of read depth per sample looks like.
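A depth cutoff like that is simple to apply once you have per-sample PE read counts. This is only a sketch; the sample names and counts in the usage lines are placeholders, not values from this data set.

```python
def split_by_depth(pe_reads, min_pe_reads):
    """Split samples into (kept, dropped) dicts by a minimum PE read count."""
    kept = {s: n for s, n in pe_reads.items() if n >= min_pe_reads}
    dropped = {s: n for s, n in pe_reads.items() if n < min_pe_reads}
    return kept, dropped


# Placeholder depths, purely for illustration.
depths = {"control_01": 62_000_000, "control_02": 18_000_000, "cancer_01": 75_000_000}
kept, dropped = split_by_depth(depths, min_pe_reads=25_000_000)
print(sorted(kept))     # ['cancer_01', 'control_01']
print(sorted(dropped))  # ['control_02']
```

It is worth checking how many replicates each group keeps after the cut, since a cutoff that removes most of one condition would hurt more than the shallow samples do.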

As for whether there was a problem with the sequencing, do you know how many lanes you paid for? Without knowing that, it's hard to judge just how wrong the sequencing might have gone.

**rnastar** · 08-29-2013, 09:45 AM

I just followed up on this, and it looks like we sequenced two individuals per lane, so we duplexed the sequencing. From what we are seeing, it looks like the variability in sequencing depth is specific to this set of samples and is not seen as much in other projects we have done.


RNA-seq read depths: observed vs. expected
