  • RNA-Seq variance-based filter before differential expression analysis

    Hello,

    I am analyzing a large set of RNA-Seq data (20 controls/20 diseased, Illumina HiSeq 2000, 101 bp, paired-end) in tissue that is affected in later stages of our disease of interest. Therefore, we were expecting to observe more subtle expression differences for our DE comparisons.

    Nevertheless, when using DESeq v1.12.0 for the protein-coding genes passing a low-count threshold (e.g. at least 2 counts per million in > half the samples, ~14,300 genes), the number of differentially expressed genes at FDR level of significance is only around 50. The problem is that when doing pathway or functional analyses, a larger number of genes would be preferable.
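
The low-count threshold described above (at least 2 counts per million in more than half the samples) can be sketched in plain Python. This is only an illustration, not the poster's actual code: the function name `cpm_filter`, its parameters, and the toy counts are assumptions, and the library size is taken as the simple column sum of raw counts.

```python
def cpm_filter(counts, min_cpm=2.0, min_fraction=0.5):
    """Return the genes that reach min_cpm in more than min_fraction of samples.

    counts: dict mapping gene name -> list of raw read counts (one per sample).
    All names and defaults here are illustrative assumptions.
    """
    genes = list(counts)
    n_samples = len(next(iter(counts.values())))
    # Library size = total raw counts per sample (simple column sum).
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    kept = []
    for g in genes:
        # Count the samples in which this gene reaches the CPM threshold.
        n_pass = sum(
            1
            for j in range(n_samples)
            if lib_sizes[j] > 0 and counts[g][j] * 1e6 / lib_sizes[j] >= min_cpm
        )
        # "> half the samples" is a strict inequality.
        if n_pass > min_fraction * n_samples:
            kept.append(g)
    return kept


# Toy data: each sample sums to exactly 1,000,000 reads, so CPM equals the raw count.
toy_counts = {
    "geneA": [999990, 999990, 999990, 999990],
    "geneB": [8, 9, 1, 0],    # passes in 2 of 4 samples: dropped
    "geneC": [2, 1, 9, 10],   # passes in 3 of 4 samples: kept
}
kept_genes = cpm_filter(toy_counts)
```

Note that this kind of filter looks only at overall abundance and never at the control/diseased labels.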

    While I am aware that the recent version of DESeq is more conservative than other DE programs (e.g. edgeR), I think it might be problematic to choose a less stringent FDR p-value for this reason alone. Consequently, I was wondering if adding some additional filter(s) for the genes included in the DE analyses might help reduce the number of multiple tests and increase the number of significant results. For example, are there any accepted variance filters for RNA-Seq DE analyses? Something like removing genes with very high variance across biological replicates (I've noticed edgeR has the tendency to incorrectly label genes in this category as differentially expressed) or removing genes with low variance across all samples? Any suggestions on specific thresholds?

    Thank you for your help!
    Alexandra

  • #2
    1) They have just released DESeq2 and you might want to check that out.
    2) Read the DESeq vignette, one of the best written out there; it explains in detail how to go about pre-filtering data.
    3) The harsh truth is that you don't always get what you want. If there are only 50 truly differentially expressed genes, trying to increase that number because you want it to be higher is your bias and not what is really going on. The data is what the data is. For that matter, 50 genes is a small enough number that one can check potential functions by hand.



    • #3
      Hi chadn737,

      1, 2) I will have to look into the DESeq2 R package - thanks for pointing this out.
      3) In general, I would agree with you. Nevertheless, my question regarding the variance filters came about after having tried several methods for the DE analysis. While the version of DESeq that I used returned only 50 DE genes, other programs (including edgeR) returned surprisingly more differentially expressed genes. I like DESeq better, since it does not return genes that I do not trust (and as you said, the vignette is amazing), but at the same time I know that interesting genes do not pass this FDR threshold because DESeq was designed to be more conservative (as you can see from this reference). I do need to look at DESeq2, though.

      Alexandra



      • #4
        A non-parametric approach is probably better for designs with sufficient biological replicates.

        RUM - HTseq-count - SAMseq (samr package) is my pipeline for clinical samples with many biological replicates...



        • #5
          adumitri,

          As mentioned in your title and the vignette of DESeq, non-specific filtering of your genes on certain features, to reduce the number of tests carried out, works very well.

          I tested several variables to filter on; the total read count per gene worked best for me, but you can test anything that comes to mind. Remove increasing percentiles from your dataset, in small steps. In my case, the maximum number of significant genes was ~15% higher than without any filtering. It does, however, take a fair amount of time to loop through all the calculations. Take care when choosing the variable to filter on: it should not be correlated with the hypothesis you are testing (hence, non-specific).
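
The percentile sweep described above can be sketched roughly as follows, in plain Python rather than R. This is a hedged illustration, not the code used for the plot: it assumes you already have one raw p-value and one total read count per gene, uses Benjamini-Hochberg adjustment, and the names `bh_adjust` and `sweep_filter`, the `alpha=0.1` cutoff, and the toy numbers are all made up for the example.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sweep_filter(total_counts, pvalues, fractions, alpha=0.1):
    """For each fraction, drop that share of the lowest-count genes,
    re-adjust the remaining p-values, and count genes with adjusted p <= alpha."""
    m = len(pvalues)
    by_count = sorted(range(m), key=lambda i: total_counts[i])
    results = []
    for theta in fractions:
        n_drop = int(theta * m)
        kept = by_count[n_drop:]  # indices of genes surviving the filter
        adj = bh_adjust([pvalues[i] for i in kept])
        results.append((theta, sum(1 for q in adj if q <= alpha)))
    return results


# Toy data (hypothetical): low-count genes carry uninformative p-values.
toy_total_counts = [5, 10, 15, 20, 25, 100, 200, 300, 400, 500]
toy_pvalues = [0.9, 0.8, 0.7, 0.6, 0.5, 0.03, 0.02, 0.04, 0.5, 0.6]
sweep = sweep_filter(toy_total_counts, toy_pvalues, fractions=[0.0, 0.5])
```

In this toy case no gene passes without filtering, while three pass after dropping the lower half by total count, because the same raw p-values face a smaller multiple-testing correction. The filter statistic is computed without reference to the sample labels, which is what keeps it non-specific.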

          A typical increase in significant genes for DESeq: see https://dl.dropbox.com/u/18352887/sweet_spot_deseq.png

          For your pathway enrichment, I advise you to use the Piano package in R. See http://nar.oxfordjournals.org/conten.../26/nar.gkt111

          You can provide the complete list of p-values as assigned by DESeq (without applying your cutoff) to Piano, and let Piano run a couple of gene set enrichment algorithms on it to assign a consensus score to the pathways.

          Hope this helps.
          www.bits.vib.be
