  • DESeq2 analysis: Seeming incongruity between PCA & distance heatmap

    Hello all,

    I'm running an RNA-seq analysis with DESeq2 (R version 3.1.0, DESeq2_1.4.5). Looking at my QC plots, I noticed an odd discrepancy between the PCA plot and the distance heatmap.

    One of the samples (labeled Sample_4 in the attached images) clusters right among the other samples on the PCA, but on the heatmap it appears to be an outlier compared to the other samples.

    I've run a lot of analyses like this using DESeq2 with very similar code, and I've never seen a discrepancy this big between these two plots before. Has anyone encountered this situation before, or have a good idea as to what might explain this?

    Could it have to do with the relatively small amount of total variation explained by PC1 and PC2 (18.4% & 17.6% respectively)?

    Thanks,

    Alex

  • #2
    The distance between samples on the PCA plot is an approximation of the distance using all the genes, and the quality of the approximation depends on how much variance is captured by PC1 and PC2 (here only ~36%).

    Also, if you were using plotPCA (it looks like you are not though), the PCA is calculated on the top n genes ranked by variance, instead of all the genes.
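
    The first point can be checked on simulated data. Below is a minimal sketch in base R, not the code used for the plots in this thread: the matrix `mat`, its dimensions and the shifted gene sets are all made up for illustration. One sample is given an effect on its own set of genes, and its distances to the other samples are compared using all genes versus only the PC1/PC2 coordinates.

    ```r
    set.seed(1)
    n_genes <- 500
    n_samples <- 8

    # Hypothetical log-scale expression matrix: genes in rows, samples in columns
    mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
                  dimnames = list(NULL, paste0("Sample_", seq_len(n_samples))))

    # Two made-up group effects, intended to dominate the leading PCs
    mat[1:100, 1:4] <- mat[1:100, 1:4] + 3
    mat[101:200, c(1, 2, 5, 6)] <- mat[101:200, c(1, 2, 5, 6)] + 3

    # A made-up effect on a separate gene set in Sample_4 only
    mat[201:260, 4] <- mat[201:260, 4] + 3

    # PCA with samples as observations
    pca <- prcomp(t(mat))
    var_explained <- pca$sdev^2 / sum(pca$sdev^2)

    # Sample-to-sample Euclidean distances: all genes vs. PC1/PC2 only
    d_full <- as.matrix(dist(t(mat)))
    d_pc12 <- as.matrix(dist(pca$x[, 1:2]))

    # Mean distance from each sample to the others under both views
    mean_full <- rowSums(d_full) / (n_samples - 1)
    mean_pc12 <- rowSums(d_pc12) / (n_samples - 1)

    print(round(var_explained[1:2], 3))
    print(round(cbind(all_genes = mean_full, pc1_pc2 = mean_pc12), 2))
    ```

    Distances computed from two PCs can never exceed the full distances, so a sample whose deviation lies mostly outside PC1 and PC2 can look unremarkable on the PCA plot while standing out in the heatmap.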


    • #3
      Hey Mike. Thanks for responding. I'm working with Alex on this. We also used the plotPCA function and got the same results (the code used here to make the PCA plots was based on plotPCA; we had difficulty changing the default colors at one point in plotPCA's history).

      I believe the sample distance heatmap was made using some code that may have been part of the DESeq vignette at one point - something like `heatmap.2(as.matrix(dist(t(assay(rld)))))` where rld is the regularized log transformed dataset.

      So I think I'm hearing you say we're seeing this because the first two PCs aren't explaining that much variance. Still seems odd to me that the distance matrix based on all genes (or top N based on variance) still shows this particular sample as a pretty obvious outlier where PCA did not.

      Looking forward to those time-series examples in the vignette you promised last month in Boston


      • #4
        hi Stephen,

        Something to explore: look into the other PCs to see if this sample sticks out in one of those.
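
        One way to run that check, sketched in base R on simulated data (the matrix `mat`, the sample name and the effect sizes are made up; with the real data the input to `prcomp` would presumably be `t(assay(rld))`, as in the heatmap call quoted earlier): scale each sample's PC scores by that PC's standard deviation and see on which component the sample of interest is most extreme.

        ```r
        set.seed(2)
        n_genes <- 500
        n_samples <- 8

        # Hypothetical log-scale expression matrix: genes in rows, samples in columns
        mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
                      dimnames = list(NULL, paste0("Sample_", seq_len(n_samples))))

        # A made-up group effect plus a made-up effect in Sample_4 only
        mat[1:100, 1:4] <- mat[1:100, 1:4] + 3
        mat[101:160, 4] <- mat[101:160, 4] + 3

        # PCA with samples as observations
        pca <- prcomp(t(mat))

        # Centering leaves at most n_samples - 1 informative PCs; drop the last
        # one, whose standard deviation is numerically zero
        keep <- seq_len(n_samples - 1)

        # Scores expressed in units of each PC's standard deviation
        scaled <- sweep(pca$x[, keep], 2, pca$sdev[keep], "/")

        # How far the sample of interest sits from the centre on each PC
        target <- "Sample_4"
        extremeness <- abs(scaled[target, ])

        print(round(extremeness, 2))
        print(names(which.max(extremeness)))
        ```

        If the sample is most extreme on PC3 or later, plotting that component against PC1 should make it visible in a way the PC1/PC2 plot does not.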

        Yes, the time series dataset I submitted to Bioc is currently in review, and once that's done I can write up a workflow. It's a fission yeast time series.
