  • RNA-seq results interpretation - help needed

    Hello,

    I am using a standard RNA-seq procedure, then TopHat followed by DESeq, to determine differential expression between my cell lines from total RNA sequencing. I am using 2-3 replicates per cell line, with ~30-40 million reads. What surprises me is that for ~9% of all transcripts, I am getting zero expression in all replicates of one of the cell lines. Exactly zero, no reads at all for these transcripts. It is not even possible to calculate the log2 ratio for these genes, since the log of 0 is undefined. Should I conclude that these genes are completely shut down in this cell line? Is this common?

    Thanks!
    Last edited by rebrendi; 09-01-2012, 12:03 PM.
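
One common workaround for the undefined log2 ratio is to add a small pseudocount to both conditions before taking the ratio. The sketch below is only an illustration: the function name and the pseudocount of 1 are assumptions, the input is assumed to be already-normalized mean counts, and it is not taken from DESeq.

```python
import math

def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """log2 ratio of two normalized mean counts, shifted by a pseudocount.

    The pseudocount (an arbitrary choice here) keeps the ratio finite
    when one or both conditions have zero reads.
    """
    if mean_a < 0 or mean_b < 0 or pseudocount <= 0:
        raise ValueError("counts must be >= 0 and pseudocount must be > 0")
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))

# Hypothetical transcript: mean of 127 reads in line A, none in line B.
print(log2_fold_change(127.0, 0.0))  # log2(128 / 1) = 7.0
# Zero in both lines gives no evidence of a change.
print(log2_fold_change(0.0, 0.0))    # log2(1 / 1) = 0.0
```

The resulting value depends on the pseudocount, so it is better read as a lower bound on the size of the change than as a measured ratio.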

  • #2
    I would say it's normal, yes. At least this kind of thing is what I typically observe.


    • #3
      Originally posted by kopi-o
      I would say it's normal, yes. At least this kind of thing is what I typically observe.
      And did you conclude that all those transcripts have no expression, or just that the signal is missing?


      • #4
        Well, of course, if the sequencing depth is very low you will get zero counts for transcripts that are really expressed. Discarding multi-mapping reads could also lead to this sort of effect. But in general, I tend to assume most of the all-zero transcripts are really not expressed.

        Perhaps I should go back to my existing RNA-seq data and plot the fraction of all-zero count genes against the sequencing depth. That might give a clue about when the fraction of zero-count genes starts to bottom out.
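
A related check that needs only one library is to thin its reads in silico and watch how the fraction of zero-count genes changes with depth. This is a stand-in for comparing separate datasets, not the same analysis. The sketch assumes a plain list of per-gene read counts; the function names and the toy numbers are made up for illustration.

```python
import random

def thin_counts(counts, keep_fraction, rng):
    """Keep each read independently with probability keep_fraction (binomial thinning)."""
    return [sum(1 for _ in range(c) if rng.random() < keep_fraction) for c in counts]

def zero_fraction(counts):
    """Fraction of genes with no reads at all."""
    return sum(1 for c in counts if c == 0) / len(counts)

def zero_fraction_by_depth(counts, keep_fractions, seed=0):
    """Return (total reads kept, fraction of zero-count genes) for each keep fraction."""
    rng = random.Random(seed)
    rows = []
    for keep in keep_fractions:
        thinned = thin_counts(counts, keep, rng)
        rows.append((sum(thinned), zero_fraction(thinned)))
    return rows

# Hypothetical per-gene read counts for one sample (made-up numbers).
toy_counts = [0, 0, 1, 3, 10, 50, 200, 1000]
for depth, frac in zero_fraction_by_depth(toy_counts, [0.01, 0.1, 0.5, 1.0]):
    print(depth, round(frac, 3))
```

If the zero fraction has flattened out well before the full depth, extra sequencing is unlikely to rescue many of the remaining zero-count genes. The per-read loop is fine for toy data but would be slow on tens of millions of reads.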


        • #5
          Originally posted by kopi-o
          Perhaps I should go back to my existing RNA-seq data and plot the fraction of all-zero count genes against the sequencing depth. That might give a clue about when the fraction of zero-count genes starts to bottom out.
          Yes, that would be the best check. For one of the cell lines, I actually have two replicate experiments with 30,000 and 5,000 mapped reads. Both of them have these ~8-9% of transcripts with zero reads.


          • #6
            30,000 and 5,000 mapped reads, respectively, seems awfully low. I am surprised you have as few as 8-9% zero-count transcripts, unless it is a bacterium or something, but you said it was a cell line. Are these human cell lines or some other species? And what transcript annotation (e.g. RefSeq) do you use? I use ENSEMBL and I suspect that in itself leads to a larger fraction of zero-count genes.


            • #7
              Originally posted by kopi-o
              30,000 and 5,000 mapped reads, respectively, seems awfully low. I am surprised you have as few as 8-9% zero-count transcripts, unless it is a bacterium or something, but you said it was a cell line. Are these human cell lines or some other species? And what transcript annotation (e.g. RefSeq) do you use? I use ENSEMBL and I suspect that in itself leads to a larger fraction of zero-count genes.
              I am using Eldorado, which contains much more than RefSeq, so more noise. But I am getting non-zero expression for these 9% of transcripts in one cell line and zero expression in another, so this is not an annotation artifact. Sorry, I made a typo in my last post: I have 30 million and 5 million mapped reads in these two replicate experiments. What do you think?
              Last edited by rebrendi; 09-01-2012, 01:28 PM.
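
To separate this situation from an annotation artifact, one can list the transcripts that have zero reads in every replicate of one line but clear signal in every replicate of the other. A minimal sketch, assuming counts are held as a dict of transcript ID to per-replicate read counts; the IDs, the `min_reads` threshold of 10, and the function name are all hypothetical.

```python
def condition_specific_zeros(counts_expressed, counts_silent, min_reads=10):
    """Transcripts with zero reads in every replicate of the 'silent' line
    and at least min_reads in every replicate of the 'expressed' line."""
    hits = []
    for transcript, reps_expressed in counts_expressed.items():
        reps_silent = counts_silent.get(transcript)
        if reps_silent is None:
            continue  # transcript not quantified in the second line
        silent_everywhere = all(c == 0 for c in reps_silent)
        expressed_everywhere = all(c >= min_reads for c in reps_expressed)
        if silent_everywhere and expressed_everywhere:
            hits.append(transcript)
    return sorted(hits)

# Made-up counts: three replicates in line A, two in line B.
line_a = {"tx1": [120, 95, 130], "tx2": [0, 0, 0], "tx3": [15, 3, 22]}
line_b = {"tx1": [0, 0], "tx2": [0, 0], "tx3": [0, 0]}
print(condition_specific_zeros(line_a, line_b))  # ['tx1']
```

Transcripts that are zero in both lines (like `tx2` here) are the ones where annotation noise remains a plausible explanation.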


              • #8
                OK,

                (1) I checked my existing RNA-seq data, admittedly a small sample, but anyway. The most interesting data point is a study where we have 134 (human) biological replicates and up to 60M (paired) reads per sample. Even with this relatively deep probing, I find 23% of ENSEMBL genes with all-zero counts! (Again, it may be that ENSEMBL, which is relatively generous regarding inclusion, systematically yields higher values.) For other organisms like Drosophila, the fraction is lower.

                (2) If we forget about this zero-count business for a while, and just focus on your core problem, which is to distinguish truly expressed transcripts from truly non-expressed, I haven't found a better way to do it than the one outlined in this paper: http://www.ploscompbiol.org/article/...l.pcbi.1000598

                Basically, one uses as controls a set of genomic regions for which there is no evidence of expression in any source. Then, by counting how many reads fall into these "gold standard negative" regions, one can calculate a false positive rate for a range of RPKM values. By finding a good compromise between a low false positive rate and a low false negative rate (calculated from annotated transcripts), one can get an estimate for an RPKM cutoff.
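
The cutoff idea can be sketched roughly as follows. This is not the paper's implementation: the RPKM values are made up, every annotated transcript below the cutoff is treated as a false negative, and "good compromise" is taken to mean the smallest sum of the two error rates, all of which are assumptions made for illustration.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def error_rates(negative_rpkm, annotated_rpkm, cutoff):
    """False positive rate on the negative-control regions and
    false negative rate on annotated transcripts at a given cutoff."""
    fpr = sum(v >= cutoff for v in negative_rpkm) / len(negative_rpkm)
    fnr = sum(v < cutoff for v in annotated_rpkm) / len(annotated_rpkm)
    return fpr, fnr

def pick_cutoff(negative_rpkm, annotated_rpkm, candidates):
    """Candidate cutoff with the smallest FPR + FNR (an assumed criterion)."""
    return min(candidates,
               key=lambda c: sum(error_rates(negative_rpkm, annotated_rpkm, c)))

# Hypothetical RPKM values for control regions and annotated transcripts.
negatives = [0.0, 0.0, 0.1, 0.2, 0.5]
annotated = [0.05, 0.4, 0.8, 5.0, 20.0]
print(rpkm(10, 1000, 1_000_000))                       # 10 reads, 1 kb, 1M reads
print(pick_cutoff(negatives, annotated, [0.1, 0.3, 1.0]))
```

Weighting the two error rates differently would shift the cutoff, so the criterion should match how costly a false call is for the analysis at hand.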


                • #9
                  You'll never be able to tell which genes are truly not expressed. That's how science works. We can only see what is, you can never see what isn't!

                  In this case you will always be able to say that, if you sequenced a little deeper, a given gene would show some expression.
                  --------------
                  Ethan
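
The depth argument can be made quantitative with a simple sampling model. Assuming, purely for illustration, that reads land on a transcript as a Poisson process with mean (total reads) x (the fraction of reads that transcript accounts for), the chance of seeing no reads at all is exp(-mean). The function names and the example fraction below are hypothetical.

```python
import math

def expected_reads(total_reads, transcript_fraction):
    """Mean read count for a transcript contributing the given fraction of reads."""
    return total_reads * transcript_fraction

def prob_zero_reads(mean_reads):
    """Poisson probability of observing no reads when mean_reads are expected."""
    return math.exp(-mean_reads)

def depth_for_detection(transcript_fraction, detect_prob=0.95):
    """Reads needed so that P(at least one read) >= detect_prob under the Poisson model."""
    return math.ceil(-math.log(1.0 - detect_prob) / transcript_fraction)

# Hypothetical transcript making up one read in ten million.
fraction = 1e-7
for depth in (5_000_000, 30_000_000):
    mean = expected_reads(depth, fraction)
    print(depth, round(prob_zero_reads(mean), 3))
print(depth_for_detection(fraction))
```

Under this model a zero count never proves absence, but it does bound how abundant an undetected transcript could plausibly be at a given depth.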


                  • #10
                    Originally posted by kopi-o
                    (2) If we forget about this zero-count business for a while, and just focus on your core problem, which is to distinguish truly expressed transcripts from truly non-expressed, I haven't found a better way to do it than the one outlined in this paper: http://www.ploscompbiol.org/article/...l.pcbi.1000598
                    Thank you, great article!


                    • #11
                      Originally posted by kopi-o
                      (1) I checked my existing RNA-seq data, admittedly a small sample, but anyway. The most interesting data point is a study where we have 134 (human) biological replicates and up to 60M (paired) reads per sample. Even with this relatively deep probing, I find 23% of ENSEMBL genes with all-zero counts!
                      So were these all-zero in all 134 replicates, or just in some fraction of them?


                      • #12
                        In all 134.
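
The distinction between "zero in every replicate" and "zero in some replicates" is easy to tabulate from a count table. A minimal sketch, assuming a dict of gene ID to per-replicate read counts; the gene names and numbers are invented.

```python
def zero_count_summary(counts_by_gene):
    """Fractions of genes that are zero in all, some, or no replicates."""
    n_genes = len(counts_by_gene)
    all_zero = 0
    some_zero = 0
    for reps in counts_by_gene.values():
        zeros = sum(1 for c in reps if c == 0)
        if zeros == len(reps):
            all_zero += 1
        elif zeros > 0:
            some_zero += 1
    return {
        "all_zero": all_zero / n_genes,
        "some_zero": some_zero / n_genes,
        "never_zero": (n_genes - all_zero - some_zero) / n_genes,
    }

# Made-up table: four genes, three replicates each.
toy = {"geneA": [0, 0, 0], "geneB": [0, 4, 0], "geneC": [7, 9, 8], "geneD": [1, 0, 2]}
print(zero_count_summary(toy))
```

Only the `all_zero` fraction corresponds to the 23% figure quoted above; the `some_zero` genes are more likely to reflect sampling at low expression.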
