  • rebrendi
    ng
    • May 2008
    • 78

    RNA-seq results interpretation - help needed

    Hello,

    I am using a standard RNA-seq procedure, then TopHat followed by DESeq, to determine differential expression in my cell lines from total RNA sequencing. I am using 2-3 replicates per cell line, with ~30-40 million reads. What surprises me is that for ~9% of all transcripts, I am getting zero expression in all replicates in one of the cell lines. Exactly zero, no reads at all for these transcripts. It is not even possible to calculate the log2 ratio for these genes, since the log of 0 is undefined. Should I conclude that these genes are completely shut down in this cell line? Is this common?

    Thanks!
    Last edited by rebrendi; 09-01-2012, 12:03 PM.
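
    The undefined log2 ratio mentioned above is often worked around by adding a small pseudocount to both counts before taking the ratio. The sketch below is only an illustration of that idea; the function name `log2_ratio` and the default pseudocount of 1 are assumptions made here, and this is not a description of what DESeq does internally.

```python
import math


def log2_ratio(count_a, count_b, pseudocount=1.0):
    """Log2 fold change of count_a over count_b.

    A pseudocount (an assumed value, here 1) is added to both counts so
    that a zero count in either condition does not make the ratio undefined.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


# Example: 7 reads in one cell line, 0 in the other.
print(log2_ratio(7, 0))  # (7 + 1) / (0 + 1) = 8, and log2(8) = 3.0
```

    The choice of pseudocount changes the reported ratio most for low counts, so ratios involving zero-count transcripts should be read as rough indications only.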
  • kopi-o
    Senior Member
    • Feb 2008
    • 319

    #2
    I would say it's normal, yes. At least this kind of thing is what I typically observe.

    • rebrendi
      ng
      • May 2008
      • 78

      #3
      Originally posted by kopi-o View Post
      I would say it's normal, yes. At least this kind of thing is what I typically observe.
      And did you conclude that all those transcripts have no expression, or is the signal just missing?

      • kopi-o
        Senior Member
        • Feb 2008
        • 319

        #4
        Well, of course if the seq depth is very low you will get zero counts for transcripts that are really expressed. Also discarding multi-mapping reads could lead to this sort of effect. But in general, I tend to assume most of the all-zero transcripts are really not expressed.

        Perhaps I should go back to my existing RNA-seq data and plot the fraction of all-zero count genes against the sequencing depth. That might give a clue about when the fraction of zero-count genes starts to bottom out.
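
        A minimal sketch of the check described above, assuming the counts are available as one list of per-replicate read counts for each gene. Lower sequencing depths are simulated here by binomial thinning (keeping each read with a fixed probability), which is an assumption of this sketch and not something stated in the thread. All function names are hypothetical.

```python
import random


def fraction_all_zero(counts):
    """Fraction of genes whose count is zero in every replicate.

    counts: list of rows, one row per gene, one entry per replicate.
    """
    if not counts:
        raise ValueError("empty count matrix")
    zero_genes = sum(1 for row in counts if all(c == 0 for c in row))
    return zero_genes / len(counts)


def thin_counts(counts, keep_fraction, rng):
    """Simulate a shallower run: keep each read with probability keep_fraction."""
    return [
        [sum(1 for _ in range(c) if rng.random() < keep_fraction) for c in row]
        for row in counts
    ]


def zero_fraction_by_depth(counts, keep_fractions, seed=0):
    """Return (keep_fraction, fraction of all-zero genes) pairs for plotting."""
    rng = random.Random(seed)
    return [
        (f, fraction_all_zero(thin_counts(counts, f, rng)))
        for f in keep_fractions
    ]


# Toy data: 4 genes x 2 replicates (made-up numbers for illustration only).
toy_counts = [[0, 0], [5, 3], [0, 1], [0, 0]]
for keep, frac in zero_fraction_by_depth(toy_counts, [1.0, 0.5, 0.1]):
    print(keep, frac)
```

        Plotting the second value against the first shows where the zero-count fraction flattens out. The per-read loop is slow for real data, where a vectorised binomial draw would be the usual choice.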

        • rebrendi
          ng
          • May 2008
          • 78

          #5
          Originally posted by kopi-o View Post
          Perhaps I should go back to my existing RNA-seq data and plot the fraction of all-zero count genes against the sequencing depth. That might give a clue about when the fraction of zero-count genes starts to bottom out.
          Yes, that would be the best check. I have actually, for one of the cell lines, two replicate experiments with 30,000 and 5,000 mapped reads. Both of them have these ~8-9% transcripts with zero reads.

          • kopi-o
            Senior Member
            • Feb 2008
            • 319

            #6
            30,000 and 5,000 mapped reads, respectively, seems awfully low. I am surprised you have as few as 8-9% zero-count transcripts, unless it is a bacterium or something, but you said it was a cell line. Are these human cell lines or some other species? And what transcript annotation (e.g. RefSeq) do you use? I use ENSEMBL, and I suspect that in itself leads to a larger fraction of zero-count genes.

            • rebrendi
              ng
              • May 2008
              • 78

              #7
              Originally posted by kopi-o View Post
              30,000 and 5,000 mapped reads, respectively, seems awfully low. I am surprised you have as few as 8-9% zero-count transcripts, unless it is a bacterium or something, but you said it was a cell line. Are these human cell lines or some other species? And what transcript annotation (e.g. RefSeq) do you use? I use ENSEMBL, and I suspect that in itself leads to a larger fraction of zero-count genes.
              I am using Eldorado, which contains much more than RefSeq, so there is more noise. But I am getting non-zero expression for these 9% of transcripts in one cell line and zero expression in another, so this is not an annotation artifact. Sorry, I mistyped in my last post: I have 30 million and 5 million mapped reads in these two replicate experiments. What do you think?
              Last edited by rebrendi; 09-01-2012, 01:28 PM.

              • kopi-o
                Senior Member
                • Feb 2008
                • 319

                #8
                OK,

                (1) I checked my existing RNA-seq data, admittedly a small sample, but anyway. The most interesting data point is a study where we have 134 (human) biological replicates and up to 60M (paired) reads per sample. Even with this relatively deep probing, I find 23% of ENSEMBL genes with all-zero counts! (Again, it may be that ENSEMBL, which is relatively generous regarding inclusion, will systematically yield higher values.) For other organisms like Drosophila, the fraction is lower.

                (2) If we forget about this zero-count business for a while, and just focus on your core problem, which is to distinguish truly expressed transcripts from truly non-expressed, I haven't found a better way to do it than the one outlined in this paper: http://www.ploscompbiol.org/article/...l.pcbi.1000598

                Basically, one uses as controls a set of genomic regions for which there is no evidence of expression in any source. Then, by counting how many reads fall into these "gold standard negative" regions, one can calculate a false positive rate for a range of RPKM values. By finding a good compromise between a low false positive rate and a low false negative rate (calculated from annotated transcripts), one can get an estimate for an RPKM cutoff.
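
                The cutoff search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the "good compromise" is taken here to be the candidate cutoff that minimises the sum of the false positive and false negative rates, and all function names are hypothetical. RPKM is computed with the usual definition (reads per kilobase of region per million mapped reads).

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def error_rates(negative_rpkms, annotated_rpkms, cutoff):
    """Return (false positive rate, false negative rate) at a given cutoff.

    negative_rpkms:  RPKM values of "gold standard negative" regions.
    annotated_rpkms: RPKM values of annotated transcripts.
    A region is called "expressed" when its RPKM is >= cutoff.
    """
    fpr = sum(1 for v in negative_rpkms if v >= cutoff) / len(negative_rpkms)
    fnr = sum(1 for v in annotated_rpkms if v < cutoff) / len(annotated_rpkms)
    return fpr, fnr


def choose_cutoff(negative_rpkms, annotated_rpkms, candidates):
    """Pick the candidate cutoff with the smallest FPR + FNR (assumed criterion)."""
    return min(
        candidates,
        key=lambda c: sum(error_rates(negative_rpkms, annotated_rpkms, c)),
    )


# Made-up RPKM values, for illustration only.
negatives = [0.0, 0.1, 0.2, 0.5]
annotated = [0.05, 1.0, 2.0, 5.0]
best = choose_cutoff(negatives, annotated, [0.1, 0.3, 1.0])
print(best, error_rates(negatives, annotated, best))
```

                Treating every annotated transcript below the cutoff as a false negative assumes that all annotated transcripts are truly expressed, which overstates the false negative rate, so the resulting cutoff is best treated as an estimate.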

                • ETHANol
                  Senior Member
                  • Feb 2010
                  • 308

                  #9
                  You'll never be able to tell which genes are truly not expressed. That's how science works. We can only see what is; we can never see what isn't!

                  In this case you will always be able to say that if you sequenced a little deeper, a given gene would show some expression.
                  --------------
                  Ethan

                  • rebrendi
                    ng
                    • May 2008
                    • 78

                    #10
                    Originally posted by kopi-o View Post
                    (2) If we forget about this zero-count business for a while, and just focus on your core problem, which is to distinguish truly expressed transcripts from truly non-expressed, I haven't found a better way to do it than the one outlined in this paper: http://www.ploscompbiol.org/article/...l.pcbi.1000598
                    Thank you, great article!

                    • rebrendi
                      ng
                      • May 2008
                      • 78

                      #11
                      Originally posted by kopi-o View Post
                      (1) I checked my existing RNA-seq data, admittedly a small sample, but anyway. The most interesting data point is a study where we have 134 (human) biological replicates and up to 60M (paired) reads per sample. Even with this relatively deep probing, I find 23% of ENSEMBL genes with all-zero counts!
                      So were these all-zero in all 134 replicates, or just in some fraction of them?

                      • kopi-o
                        Senior Member
                        • Feb 2008
                        • 319

                        #12
                        In all 134.
