
How many reads/replicates do I need for bacterial RNA-seq? MiSeq?


  • How many reads/replicates do I need for bacterial RNA-seq? MiSeq?

    I'm aware of the ENCODE best practices and other recent research that give guidelines about the number of reads you need for RNA-seq in mammalian genomes. I generally recommend ~40-50M/sample for most applications, as low as 20M if the goal is just expression at the gene level, 100M+ if the goal is rare/aberrant isoform identification. I'm taking on a bacterial RNA-seq project where the goal is to identify differentially expressed genes and isoforms in a WT vs mutant strain of pathogenic F. tularensis. While there isn't splicing, I'm coming to appreciate that the prokaryotic transcriptome is still complex - overlapping genes, strand specificity, sRNAs, etc.

    1. How many reads do I need? What length? Is paired-end sequencing as necessary as with complex (spliced) mammalian genomes? A recent paper gave guidelines for another bacterium, P. syringae. With 3.5 million prefiltered reads they were able to cover 95% of the annotated genes with at least 10 reads (average 190). P. syringae has a larger genome and about 3 times as many annotated ORFs as our bacterium, F. tularensis. So can I get away with fewer reads, say, 2 million before any filtering?

    2. If I'm about right on #1 above, needing ~2M reads/sample, and I want to sequence, say, 2-4 samples from each condition (WT vs Mut), what's my best choice for platform? Will MiSeq have the capacity to do this on a single flowcell, or should I use a single lane on our GAIIx?

    3. What counts as a biological replicate in this case? I would imagine taking aliquots from the same flask would be more like technical replication, and taking two different flasks grown from two different colonies to be biological replicates. Am I thinking about this correctly?
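
A rough way to sanity-check the estimate in question 1, as a sketch only: it assumes the read requirement scales linearly with the number of annotated ORFs, which ignores differences in expression distribution and rRNA content. The function name `scaled_read_target` is made up for illustration; the 3.5M reads and the ~3x ORF ratio are the figures quoted in the post.

```python
import math


def scaled_read_target(reference_reads: int, orf_ratio: float) -> int:
    """Scale a reference read count by an ORF-count ratio.

    Assumption (not from any paper): reads needed are proportional
    to the number of annotated ORFs, so a genome with 1/orf_ratio as
    many ORFs needs 1/orf_ratio as many reads for similar per-gene depth.
    """
    if orf_ratio <= 0:
        raise ValueError("orf_ratio must be positive")
    return math.ceil(reference_reads / orf_ratio)


# P. syringae: 3.5M prefiltered reads; ~3x as many ORFs as F. tularensis.
estimate = scaled_read_target(3_500_000, 3.0)
print(f"Scaled read target: {estimate:,}")
```

Under that linear assumption, 2 million reads before filtering sits above the scaled figure, but only if most of those reads survive filtering and land on mRNA.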

  • #2
    Hi Turner,

    Regarding the replicates, you are thinking about that correctly. The important part of replication is to replicate around your largest source of experimental variation, which is usually (not always) biological. As for the 2-4 samples per condition, I would change that to 3-5. 2 imo is never an option and really is no better than 1.

    For read length and paired vs single, there are a few publications out there now that state that short single-end reads are sufficient. The RSEM paper describes this as well. We did a little study where we took 101-cycle PE data from mouse and created, in silico, a set of data sets that ranged from 36 cycle SE and 36 cycle PE up to the full data set, including partial read subsets to explore multiplexing possibilities. We looked at our sensitivity to splice variants and detection of known transcript d/dx. What we found was that the optimum was somewhere between 50 and 76 cycle SE, which includes a little personal bias towards longer reads. The multiplexing question is a bit more ambiguous, so we really don't (yet? not sure) have a good handle on that. What we have been telling people is that if you have to choose between longer reads and more reads, choose more.

    On the MiSeq vs GA: on the MiSeq, you will be doing 2-3 samples at a time for 2-3M reads per replicate, while on the GAIIx, if Yongde has a good run, you should be able to do all 6 (thinking triplicates) in one go and get 2-3M+ per replicate. Tell your core you want >30M reads.
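
The multiplexing arithmetic above can be sketched like this. The run yields used in the example calls are placeholders, not platform specifications: the 30M figure is just the ">30M reads" request mentioned above, and the 6M figure is a hypothetical small run. Both function names are invented for illustration, and the calculation assumes reads split evenly across barcodes, which real runs rarely do.

```python
def reads_per_sample(run_reads: int, n_samples: int) -> int:
    """Reads each sample gets if the run is split evenly (an idealization)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return run_reads // n_samples


def samples_per_run(run_reads: int, target_reads_per_sample: int) -> int:
    """How many samples fit in one run at a given per-sample target."""
    if target_reads_per_sample <= 0:
        raise ValueError("target_reads_per_sample must be positive")
    return run_reads // target_reads_per_sample


# Six libraries (triplicates of WT and mutant) on a 30M-read lane.
print(reads_per_sample(30_000_000, 6))

# Hypothetical 6M-read run with a 2M-read target per sample.
print(samples_per_run(6_000_000, 2_000_000))
```

Any reads lost to filtering or uneven barcode representation come off the top of these numbers, which is one reason to ask for more than the bare minimum.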

    Good luck.
    GO CAVS!
    Last edited by bioBob; 03-01-2012, 05:20 AM.


    • #3
      In your 2 million reads you have to take into account whether the original RNA has been rRNA depleted or not. If your libraries are from total RNA and there was no ribosomal RNA depletion you will not get sufficient mRNA coverage in 2 million reads.
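
To put a number on the rRNA point, here is a minimal sketch. The rRNA fractions passed in are illustrative assumptions, not measurements from any library; plug in whatever your own depletion QC shows. The function name `mrna_reads` is made up for this example.

```python
def mrna_reads(total_reads: int, rrna_fraction: float) -> int:
    """Estimate reads left for non-rRNA transcripts.

    rrna_fraction is the share of reads expected to map to rRNA
    (an input you have to estimate or measure yourself).
    """
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


# 2M reads from an undepleted library, assuming 90% of reads are rRNA.
print(mrna_reads(2_000_000, 0.90))

# Same depth, assuming depletion brings rRNA down to 20% of reads.
print(mrna_reads(2_000_000, 0.20))
```

The same 2M reads can mean very different usable depth depending on how well depletion worked.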

      I agree PE is not required; for all our bacterial libraries we find 42 cycles to be sufficient.

      With regard to the biorep question, you are correct: sampling from the same flask constitutes a technical replicate, not a biological one.

      Best of luck.


      • #4
        You might get something out of this paper, and its supplemental:

        They did a lot of trial and error to find the best way to do bacterial RNA-Seq on in vivo populations. Interestingly, they did a rarefaction analysis and found that above 300,000 reads aligning to mRNA, not much more information is gained.
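
For anyone who wants to try a rarefaction analysis on their own data, the idea can be sketched as below. This is not the paper's method, just a generic version: shuffle the per-read gene assignments once, take prefixes of increasing size, and count how many genes reach a minimum read count. The function name, the `min_reads` threshold, and the toy gene labels are all assumptions for illustration.

```python
import random
from collections import Counter


def rarefaction_curve(read_genes, depths, min_reads=1, seed=0):
    """Return (depth, genes_detected) pairs for increasing subsample sizes.

    read_genes: one gene ID per mRNA-aligned read.
    depths: subsample sizes to evaluate.
    min_reads: reads required to call a gene "detected".
    """
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)  # one shuffle; prefixes act as nested subsamples
    curve = []
    for depth in sorted(depths):
        counts = Counter(shuffled[:depth])
        detected = sum(1 for n in counts.values() if n >= min_reads)
        curve.append((depth, detected))
    return curve


# Toy data: one highly, one moderately, and one lowly expressed gene.
toy_reads = ["geneA"] * 500 + ["geneB"] * 50 + ["geneC"] * 5
curve = rarefaction_curve(toy_reads, depths=[10, 100, 555], min_reads=5)
for depth, detected in curve:
    print(depth, detected)
```

Where the curve flattens is the depth beyond which extra reads stop adding newly detected genes, which is the kind of plateau described above.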


        • #5
          Resource for Bacterial RNA-seq depth recommendations

          This paper, "How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes?", does a nice job of going through the key library prep and sequencing depth parameters for a bacterial RNA-seq experiment, and concludes that "5-10 million reads per sample [...] are sufficient for most applications of bacterial RNA-Seq".

          From my experience, the critical experimental design questions for a bacterial RNA-seq experiment are how well the rRNAs will be depleted (i.e., how many reads will not be covering target genes) and how many replicates you can perform (i.e., what your statistical power will be). Depth is helpful to overcome poor rRNA depletion, but adding more replicates is the better use of increased sequencing cost, in my opinion.
          Last edited by dmking; 02-14-2020, 01:42 PM. Reason: Added additional notes for depth vs replicates