  • rnastar
    Member
    • Aug 2013
    • 13

    RNA-seq read depths: observed vs. expected

    Dear all,

    We recently submitted RNA-seq samples for sequencing to a local facility (20 cancer samples, 20 controls), expecting approximately 100 million paired-end 50 bp reads per sample. However, after sequencing we found that several samples contained only 6 million reads and others 50-80 million reads. Only about 5 samples had 90 million reads or more, with one sample having 180 million reads.

    I am not sure what to make of this. There were some concerns regarding the RNA quality of some of the samples, but I am not sure whether that could lead to such low output. Our contact at the facility seems to suggest it is the RNA quality, but I wanted to ask you experts just to be sure.

    The FastQC analysis of the sequences does not show any significant quality issues; however, it appears the filtering may be occurring during or just after the sequencing itself. If anyone has any ideas, I would be most appreciative.
  • Wallysb01
    Senior Member
    • Feb 2011
    • 286

    #2
    If you're pooling 40 samples together to spread across all the lanes, then it's very important to get the molar ratios correct. I suppose that much is obvious. It is also important that all the samples have a similar size distribution. Illumina tech just prefers smaller inserts, so if you had some samples with 250 bp average sizes and some with 600 bp, that could make a big difference in the clustering efficiency of each sample. RNA integrity is an issue too, but of the three it shouldn't have as much effect on sequencing depth in pooled samples. If you got the other two right, it just shouldn't be an issue in this case (it could lead to poor data for other reasons, just not really relative sequencing depth).

    Also, if only 5 samples came in at 90% or more of your expected sequencing depth, I would suspect something went wrong with the actual sequencing. Either clustering didn't work well or barcodes weren't read correctly for a lot of the reads. I am curious, though: you say you have 40 samples total and you're sequencing to 100M PE reads each. That would be 20 HiSeq lanes, as you shouldn't expect more than 200M PE reads per lane. Is that what you actually did? Or are you counting them like single-end reads, which would give 400M total reads per lane?

    Most people are now sequencing about 50M reads per sample (either 2x100, so really 100M reads, but they are paired so statistically it's still 50M, or 1x50). So if most of your samples are around 50M-80M, that should be fine.
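The lane estimate in the post above can be checked with a few lines of arithmetic. This is only a sketch: the function name `lanes_needed` is made up for illustration, and the 200M read pairs per lane figure is the rough upper bound quoted in the post, not a specification.

```python
import math

def lanes_needed(n_samples, pairs_per_sample, pairs_per_lane):
    """Whole lanes required to give every sample its target number of read pairs."""
    total_pairs = n_samples * pairs_per_sample
    return math.ceil(total_pairs / pairs_per_lane)

# Figures from the thread: 40 samples, 100M read pairs each,
# and an assumed yield of about 200M read pairs per lane.
print(lanes_needed(40, 100_000_000, 200_000_000))  # 20

# If the 100M target counted individual reads (50M pairs) instead,
# the same yield would need half as many lanes.
print(lanes_needed(40, 50_000_000, 200_000_000))   # 10
```

The second call shows why the distinction between reads and read pairs matters when comparing observed depth with what was ordered.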


    • rnastar
      Member
      • Aug 2013
      • 13

      #3
      Thank you for the reply! For clarification, I meant that we are getting only 50-80 million paired-end reads, that is, only 100-160 million reads total for a given sample. It sounds like something went wrong with the sequencing, but the facility may not want to tell us (this was not done through Illumina but at a local university).

      In terms of downstream analyses, we tried to look at alternative splicing (our main interest) with Cuffdiff using all samples, and when we did so we found no significant alternative splicing events. When I filtered out samples that had fewer than 90 million paired-end reads off the sequencer, we got about 600 significant alternatively spliced genes and a lot of DE genes. I am wondering whether filtering out samples based on the resulting sequencing depth is the way to go, or whether we should question the entire set to begin with. In mapping with TopHat, in almost all samples I am seeing a lot of reads mapping to multiple places in the genome. So if we had 200M sequenced reads (100M paired-end reads), we observe almost 300-400M reads in the accepted_hits.bam file. This is all making me a bit nervous.
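One way to see why accepted_hits.bam can hold more records than reads sequenced is to count alignment records separately from distinct reads, since a read that maps to several places appears once per location. The sketch below assumes the alignments have been converted to SAM text (for example with `samtools view`); the function name `summarize_alignments` and the toy records are made up for illustration.

```python
from collections import Counter

def summarize_alignments(sam_lines):
    """Count alignment records versus distinct mapped reads in SAM text lines."""
    hits = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        # 0x40 marks the first mate of a pair, 0x80 the second
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        hits[(qname, mate)] += 1
    return {
        "records": sum(hits.values()),
        "reads": len(hits),
        "multi_mapped_reads": sum(1 for n in hits.values() if n > 1),
    }

# Toy records (not real data): pair r1 maps once, pair r2 maps to two loci.
toy = [
    "r1\t99\tchr1\t100\t50\t50M\t=\t300\t250\t*\t*\tNH:i:1",
    "r1\t147\tchr1\t300\t50\t50M\t=\t100\t-250\t*\t*\tNH:i:1",
    "r2\t99\tchr2\t500\t1\t50M\t=\t700\t250\t*\t*\tNH:i:2",
    "r2\t147\tchr2\t700\t1\t50M\t=\t500\t-250\t*\t*\tNH:i:2",
    "r2\t355\tchr5\t900\t1\t50M\t=\t1100\t250\t*\t*\tNH:i:2",
    "r2\t403\tchr5\t1100\t1\t50M\t=\t900\t-250\t*\t*\tNH:i:2",
]
print(summarize_alignments(toy))
```

In the toy input there are 6 records but only 4 distinct reads, 2 of which are multi-mapped. Comparing the `reads` count, rather than the record count, with the number of reads off the sequencer gives a like-for-like check.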


      • Wallysb01
        Senior Member
        • Feb 2011
        • 286

        #4
        50-80M paired-end reads per sample and 20 replicates each for control and cancer cells is a huge data set for RNA-seq. Even if that's not what you paid for, you should be able to find plenty of differentially expressed isoforms, if they are there to find. And TopHat -> Cuffdiff is probably the best way to go with isoforms. The other option, though, is to use DESeq for exon-level tests to find differentially expressed exons, then track them back to the isoforms they could be from. It's interesting that you chose 2x50 reads for isoform tests. While it's good you went for paired end, the extra 50 bp on each read would have been pretty helpful when it comes to resolving isoforms.

        I think you are right to set a read depth cutoff for including your replicates. I'd suggest maybe 20M-30M PE reads. But it might depend on what your distribution of read depth per sample looks like.

        As for whether there was a problem with the sequencing, do you know how many lanes you paid for? Without knowing that, it's hard to judge just how wrong the sequencing might have gone.
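Before settling on a cutoff, it can help to see how many replicates per group survive at each candidate threshold. The following is a minimal sketch: the sample names and depths are hypothetical (chosen within the ranges mentioned in the thread), and `replicates_after_cutoff` is an illustrative helper, not part of any tool.

```python
from collections import Counter

def replicates_after_cutoff(samples, min_pairs):
    """Count samples per group that have at least min_pairs read pairs."""
    return Counter(
        group for group, pairs in samples.values() if pairs >= min_pairs
    )

# Hypothetical samples: name -> (group, read pairs)
samples = {
    "cancer_01": ("cancer", 6_000_000),
    "cancer_02": ("cancer", 55_000_000),
    "cancer_03": ("cancer", 92_000_000),
    "control_01": ("control", 78_000_000),
    "control_02": ("control", 180_000_000),
    "control_03": ("control", 50_000_000),
}

for cutoff in (20_000_000, 30_000_000, 90_000_000):
    print(cutoff, dict(replicates_after_cutoff(samples, cutoff)))
```

With these made-up numbers, a 20M or 30M cutoff drops only the 6M sample, while a 90M cutoff leaves a single replicate per group, which illustrates the trade-off between depth and replicate count.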


        • rnastar
          Member
          • Aug 2013
          • 13

          #5
          I just followed up on this, and it looks like we sequenced two individuals per lane, so we duplexed the sequencing. From what we are seeing, it looks like the variability in sequencing depth is specific to this set of samples and is not seen as much in other projects we have done.
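With two samples per lane, each sample's expected share can be compared directly with what came off the sequencer. This is a rough sketch only: it assumes the approximate yield of 200M read pairs per lane mentioned earlier in the thread, and the helper `fraction_of_expected` and the labels are illustrative. The depths are the values quoted in the first post.

```python
def fraction_of_expected(observed_pairs, samples_per_lane=2,
                         pairs_per_lane=200_000_000):
    """Observed depth of each sample as a fraction of an even lane share."""
    expected = pairs_per_lane / samples_per_lane
    return {name: n / expected for name, n in observed_pairs.items()}

# Depths quoted in the thread (in read pairs); labels are placeholders.
observed = {
    "lowest": 6_000_000,
    "typical_low": 50_000_000,
    "typical_high": 80_000_000,
    "highest": 180_000_000,
}

for name, frac in fraction_of_expected(observed).items():
    print(f"{name}: {frac:.0%} of expected")
```

Under that assumed yield, a sample at 180M read pairs would have taken most of its lane, which is the kind of imbalance that uneven pooling ratios between the two libraries could produce.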
