No announcement yet.

RNA-seq analysis is not consistent with qPCR results

  • Filter
  • Time
  • Show
Clear All
new posts

  • RNA-seq analysis is not consistent with qPCR results

    Dear all,

    I used to be of the opinion that RNAseq results do not need to be verified by qPCR, but to satisfy future reviewers I sat out to verify the genes of interest by qPCR. To my big surprise, and sadness, these genes do not show any near the same fold of dysregulation as the RNA-seq analysis shows.

    RNAseq libraries were prepared from RNA extracted from adipocytes using Trizol followed by polyA enrichment. TruSeq kit was used and samples were sequenced on HiSeq2000 2x50bp PE. For all groups I have triplicates.
    Reads were mapped to mm9 iGenome ref genome using Tophat 2.0.8 followed by Cufflinks -> Cuffdiff. I have also analyzed the data using HTseq count -> DESeq. Though these two different pipelines do not catch all the same genes as being differentially expressed, they do agree on the expression levels of the different genes.

    qPCR was performed on the same samples (not the library preparation fraction) but the whole cell trizol extraction (which was used for the library prep). RNA was converted to cDNA using iScript and qPCR performed using SYBR green mix from BioRad. I have tested several house keeping genes for normalization. Based on the RNAseq data, TBP is the most consistent one. I have also tried to normalize to HPRT and GAPDH. Does not change the picture.

    I have attached a barplot where I have taken four genes I am very interested in. The mice are heterozygous for Gene 1, so this fits pretty well. However, for Gene 2 and 4 it is completely off. We were so excited to see that the genes were upregulated (2-3 fold) in the "white"-state but according to the qPCR they are not at all upregulated. For Gene 2 it is in fact downregulated.

    What is your take on this? I find this pretty worrisome. Can I trust my RNAseq at all then?

    Thanks a lot. Appreciate all input.

    // Update. Gene 2 and 3 have several isoforms. Maybe that explains some of the variation, but Gene 4 does not have any isoforms.
    Attached Files
    Last edited by DonDolowy; 08-12-2013, 07:30 AM.

  • #2
    So you have 3 groups of mice, each with three biological replicates? And the image shows the expression normalised to TBP. First I don't think 3 is a high enough number of replicates, I understand the reasons for this, I am not criticising.

    From the image the counts do not look to be wildly different between replicates for the genes (maybe Gene3), is this the case? What sort of counts/FPKMs do you get for these genes (and for your house-keepers too?)

    Also what are the basics of the RNAseq data: numbers of reads total, aligning, etc, just out of interest?


    • #3
      Barplots are nice, but they do hide the original intensities measured.

      What were your genes ct and absolute tag counts (out of how many reads)?


      • #4
        I always to qPCR follow-up, but my particular experiments pretty much require it. In my experience, if I do qPCR on the same samples on which I did RNAseq, then the results are very close. The differences come as I increase my N by adding additional samples. In effect, the results show me how well my original samples can be used to estimate actual biological variability. So, the more samples I use, generally the more reliable things turn out to be.

        Also, keep in mind if you're using Taqman assays that while they may be billed as being 100% efficient, that's not always the case (this can lead to scale mismatches, though should result in some of the direction changes that you've seen).


        • #5
          Samples were 8-plexed giving us 25-30 million mapped reads. This should be more than sufficient for gene expression analysis. We are not interested in splicing events etc, but use RNAseq as an alternative to microarray.

          This is the info from Cuffdiff / CummeRbund.
          CuffSet instance with:
          3 samples
          22986 genes
          30963 isoforms
          25081 TSS
          24612 CDS
          68958 promoters
          75243 splicing
          60939 relCDS

          Attached are the Ct values, as well as the FPKM values for the different genes.
          Attached Files


          • #6
            Out of curiousity, did you analyse the data with anything other than cufflinks (edgeR/DESeq/limma/etc.)? Depending on the version, cufflinks has had issues


            • #7
              @dpryan: says he used HTSeq->DESeq in first post

              @Don: The barplot does not show what is in the table, eg gene1 FPKM is ~20-30 but the plot shows normalised value (with TBP) of ~1, should be ~5 based on table. Are they FPKM in plot, or is it counts?


              • #8
                Originally posted by bruce01 View Post
                @dpryan: says he used HTSeq->DESeq in first post
                Oops, missed that, thanks!


                • #9
                  Please tell us the raw count values, not the FPKM values. This is crucial to determine how much precision you can expected.

                  There is also something odd with your barchart: Gene 2 seems to go down significantly in the qPCR results, but in your table, it does not look like this: The first ct value for Gene 2 in Group #3 is even higher than all other values. How have you calculated the error bars?

                  To give a general answer to your question:

                  The precision of expression strength estimates from RNA-Seq data depends on the raw counts. The relative standard error can never be lower than 1/square_root(n), where n is the number of reads mapping to the gene in the sample under consideration. For genes with less than 100 counts, the error will always be at least 10%, which corresponds to a standard error of log2(1+0.1)=0.14 cycles.


                  • #10
                    We never use oligo-dT for RNA-Seq or any expression profiling for that matter. It gives 3' biased libraries (or expression arrays).


                    • #11
                      Thanks for all your input so far.

                      Anders, you are absolutely right. I forgot to mention that the values are first normalized to TBP and then normalized to Ctrl group, which is the first black group on the bar plot. Thus the values around 1.

                      What I am comparing is wildtype (black) versus heterozygous mouse exhibiting two different phenotypes (grey and white).

                      Attached are the analysis done by DESeq. I have included the raw counts from the htseq count matrix, as well as the normalized counts from DESeq.

                      Regarding 3' bias. When looking at gene body coverage using RSeqC, the 3'bias is ranging from almost non-existing to very mild in the samples.
                      Attached Files