  • saint_667
    Junior Member
    • Aug 2013
    • 5

    Within-sample correlation of 2 genes - RNA-seq

    Hi Guys,

    I am doing some differential expression analysis of RNA-seq data using DESeq2. I have 12 different samples, and I am feeding the raw count matrix into DESeq2.

    My question is: if I want to look at the correlation of Gene A and Gene B within samples (not between samples, as they are co-expressed), do I do this on the raw counts or the normalized counts?

    So I have 12 values for Gene A across 12 samples
    and 12 values for Gene B across 12 samples.

    Doing the correlation on the raw counts gives me a rho of around 0.8.
    However, normalizing with the DESeq2 method scales each sample by its own size factor, and the rho goes down to 0.5.

    Anyway, I am not sure how I should be doing the correlation (raw or normalized) and, if normalized, which method is preferable for within-sample comparisons of 2 different genes.

    Thank you for taking the time to read this and hope someone can give me some advice.
  • saint_667
    Junior Member
    • Aug 2013
    • 5

    #2
    bump!!

    just bumping the post up - as i posted it very late in the evening.

    • Michael Love
      Senior Member
      • Jul 2013
      • 333

      #3
      hi,

      Note that DESeq2 doesn't really help you out with this question, as it focuses on gene-by-gene differential expression, and the transformations are most useful for visualizing and clustering samples.

      You don't want the sequencing depth as a factor in the correlation. Consider a situation where gene A and B are not correlated, but you sequence the samples so that each sample has double the number of reads as the previous sample. Then you will get a really high correlation which has no biological significance.
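
The depth argument above can be tried on simulated data. This is a minimal base-R sketch, not DESeq2 itself: every number in it is invented (200 independent genes, 12 samples, negative binomial noise with `size = 20`), and the size factors are a hand-written median-of-ratios calculation in the spirit of what DESeq2 estimates. The helper `pairwise` is hypothetical.

```r
# Simulated counts: 200 independent genes, 12 samples, each sample sequenced
# twice as deeply as the previous one. All values are made up for illustration.
set.seed(1)
n_genes   <- 200
n_samples <- 12
depth     <- 2^(0:(n_samples - 1))         # relative sequencing depth
base_mean <- runif(n_genes, 50, 500)       # per-gene expression level
mu        <- outer(base_mean, depth)       # expected counts, genes x samples
counts    <- matrix(rnbinom(length(mu), mu = as.vector(mu), size = 20),
                    nrow = n_genes)

# Median-of-ratios size factors written out in base R (a sketch of the idea).
log_geo_mean <- rowMeans(log(counts))      # log geometric mean per gene
keep <- is.finite(log_geo_mean)            # drop genes containing a zero count
size_factors <- apply(counts[keep, , drop = FALSE], 2, function(k) {
  exp(median(log(k) - log_geo_mean[keep]))
})

# Divide each sample (column) by its size factor.
norm_counts <- sweep(counts, 2, size_factors, "/")

# Gene-gene correlations: cor() works on columns, so genes must be columns.
pairwise <- function(m) {
  r <- cor(t(m))
  r[upper.tri(r)]
}
med_raw  <- median(pairwise(counts))            # driven by depth alone
med_norm <- median(abs(pairwise(norm_counts)))  # depth removed

print(c(raw = med_raw, normalized = med_norm))
```

Under these assumptions the raw-count correlations should come out close to 1 even though the genes were simulated independently, while the normalized ones should be much smaller.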

      So you could* do:

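      # size factors must already be estimated, e.g. by estimateSizeFactors() or DESeq()
      nc <- counts(dds,normalized=TRUE)
      # cor() correlates columns, so transpose to put the genes in columns
      cor(t(nc[idx,]))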

      where idx gives the index of genes you want to find correlations for.

      *However, I would also consider batch effects if you are calculating gene-gene correlations and the samples were processed in batches. This would be another way to get spurious large-in-absolute-value correlations. You can check for batch effects using either of the transformations and the plotPCA workflow in the DESeq2 vignette.

      If the samples cluster by batch, then the cqn package vignette explains how to get "normalized expression values", where the normalization takes care of sequencing depth, GC-content bias and gene length bias:

      A normalization tool for RNA-Seq data, implementing the conditional quantile normalization method.
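
The batch-effect caveat above can also be illustrated with made-up numbers. This base-R sketch assumes two batches of six samples and two genes that are independent within each batch but share a hypothetical batch shift of 3 log2 units. Centering each gene on its batch mean with `lm` is used here only to make the point visible; it is not the PCA check or the cqn normalization suggested in the post.

```r
# Two independent genes on a log2 scale with a shared, invented batch shift.
set.seed(2)
batch <- factor(rep(c("A", "B"), each = 6))
shift <- ifelse(batch == "B", 3, 0)        # hypothetical batch effect
gene1 <- 8 + shift + rnorm(12, sd = 0.5)
gene2 <- 6 + shift + rnorm(12, sd = 0.5)

# Correlation across all 12 samples, batch effect included.
cor_with_batch <- cor(gene1, gene2)

# Remove the batch means from each gene, then correlate what is left.
res1 <- residuals(lm(gene1 ~ batch))
res2 <- residuals(lm(gene2 ~ batch))
cor_within_batch <- cor(res1, res2)

print(c(with_batch = cor_with_batch, within_batch = cor_within_batch))
```

In this setup the first correlation should be large purely because both genes move with the batch, and it should shrink once the batch means are removed.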


      • saint_667
        Junior Member
        • Aug 2013
        • 5

        #4
        Thank you for clearing this up

        Hi Michael,

        Thank you for clearing this up and giving a comprehensive response to this problem. I was thinking along the same lines. Someone suggested using the VSD-transformed data from DESeq2 and then plotting these correlations.

        - That, however, gives really high correlations, almost in line with the non-normalized data. I understand that the transformations from DESeq2 are useful if we want to perform clustering.

        - The method that you suggest, i.e. using the normalized data and then looking for batch effects as well, makes sense to me.

        thanks again
