  • ronaldrcutler
    Member
    • May 2016
    • 80

    DESeq2 PCA Plots

    Hello all,

    I am running DESeq2 like so in R:
    Code:
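    library(DESeq2)
    # files: HTSeq count file names; cond: condition label per file (defined elsewhere, not shown)
    sTable <- data.frame(sampleName = files, fileName = files, condition = cond)
    dds <- DESeqDataSetFromHTSeqCount(sampleTable = sTable, directory = "", design = ~condition)
    dds <- DESeq(dds)
    res <- results(dds)
    resOrdered <- res[order(res$padj),]
    # regularized log transformation, blind to the design, used for the PCA plot
    rld <- rlogTransformation(dds, blind=TRUE)
    print(plotPCA(rld, intgroup="condition"))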
    This gives me a PCA plot, and 138 genes have padj < 0.05 between the blue and red conditions.

    I would expect the blue replicates to cluster together, and the red replicates as well. Given that there were a fair number of significant genes, I think I am plotting this PCA wrong.

    When I check the column data to make sure I am using the right ones, I get this:
    Code:
    > colData(dds)
    DataFrame with 6 rows and 2 columns
                                            condition
                                             <factor>
    ID_18_1.bam_sorted.bam_htseq_out.txt         ID18
    ID_18_2.bam_sorted.bam_htseq_out.txt         ID18
    ID_18_3.bam_sorted.bam_htseq_out.txt         ID18
    GP_18_1.bam_sorted.bam_htseq_out.txt         GP18
    GP_18_2.bam_sorted.bam_htseq_out.txt         GP18
    GP_18_3.bam_sorted.bam_htseq_out.txt         GP18
    Is this something to be concerned about, or is this the wrong way to plot a PCA?

    Thanks in advance
    -R
  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    #2
    I've had the best results from PCAs based on DESeq2 results when I used the VST and did an additional correction for transcript length (i.e. divide by the length of the longest transcript per gene, in kb). This was before rlogTransformation was visible/usable, so it might be that rlog works better for that.
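
A minimal sketch of the length correction described above, in base R. It takes one reading of the post, namely that the division is applied to already-transformed values. The matrix `vst_mat` is a simulated stand-in for VST output, and the lengths in `longest_tx_bp` are hypothetical; in practice both would come from your own data and annotation.

```r
# Simulated stand-in for a genes x samples matrix of VST values (assumption)
set.seed(7)
vst_mat <- matrix(runif(4 * 6, min = 5, max = 12), nrow = 4,
                  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:6)))

# Hypothetical length of the longest transcript per gene, in bases
longest_tx_bp <- c(gene1 = 1500, gene2 = 800, gene3 = 4200, gene4 = 2500)
longest_tx_kb <- longest_tx_bp / 1000

# Divide each gene (row) by its length in kb; lengths are matched to rows by name
vst_len_corrected <- sweep(vst_mat, 1, longest_tx_kb[rownames(vst_mat)], "/")
```

Matching the lengths by row name rather than by position guards against the annotation being in a different gene order than the expression matrix.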

    What was your experimental design? Were all these six samples separate biological replicates? It's concerning that your samples are clustering by ID first and by treatment second. In our case samples clustered primarily by cell population first, and by treatment second. If your ID18_X and GP18_X come from the same (or similar) samples, or were sequenced/extracted in batches (we've noticed sequencing batch effects as well), that might explain why they're clustering together.

    As a sanity check for PCAs, it's a good idea to make sure that the data you're generating the PCA from fits a normal distribution. You can do this by running qqnorm(<data>); values should generally be a straight line along the diagonal, usually with a bit of deviation at the extremities. If the qqnorm plot isn't approximately a straight line, then the data will need additional normalisation applied before running a PCA.
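
The qqnorm check can be sketched in base R as follows. The two vectors are simulated stand-ins (an assumption, not data from this thread) so that the difference between an acceptable and a problematic plot is visible. The helper `qq_cor` is an illustrative addition that condenses the plot into one number; it is not a formal normality test.

```r
# Simulated stand-ins for transformed expression values (assumption:
# in practice these would be the values the PCA is computed on)
set.seed(1)
vals_normal <- rnorm(2000, mean = 8, sd = 2)   # roughly normal
vals_skewed <- rexp(2000, rate = 0.5)          # strongly right-skewed

# Visual check: points should follow the reference line
qqnorm(vals_normal, main = "Approximately normal"); qqline(vals_normal)
qqnorm(vals_skewed, main = "Skewed"); qqline(vals_skewed)

# Rough numeric summary: correlation between theoretical and sample quantiles
qq_cor <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)
  cor(qq$x, qq$y)
}
cor_normal <- qq_cor(vals_normal)
cor_skewed <- qq_cor(vals_skewed)
```

A correlation close to 1 corresponds to a nearly straight Q-Q plot; the skewed vector gives a visibly lower value.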

    • ronaldrcutler
      Member
      • May 2016
      • 80

      #3
      Hi Gringer,

      The ID18_x were biological replicates from the same batch and GP18_x were biological replicates from the same batch. The two groups did not come from the same samples or the same batch.
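
If each condition was processed in its own batch, as this description suggests, then condition and batch are confounded and the PCA cannot tell them apart. A quick way to check is to cross-tabulate the two in base R. The batch labels below are hypothetical placeholders; only the sample and condition names follow the thread.

```r
# Sample sheet for the six libraries; batch labels are hypothetical
sample_info <- data.frame(
  sample    = c(paste0("ID_18_", 1:3), paste0("GP_18_", 1:3)),
  condition = rep(c("ID18", "GP18"), each = 3),
  batch     = rep(c("batch1", "batch2"), each = 3)
)

# Cross-tabulate condition against batch
design_tab <- table(sample_info$condition, sample_info$batch)
print(design_tab)

# Fully confounded if every batch contains samples from only one condition
confounded <- all(colSums(design_tab > 0) == 1)
```

With this layout `confounded` is `TRUE`: every batch holds a single condition, so a batch term could not be estimated separately from the condition effect.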

      I was not able to generate a distribution with the regularized log transformation (unsure how to extract the values from the data.frame), but came up with a plot of variance over the read counts which shows that there does not seem to be a dependence of the variance on the mean.
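
On extracting the values: the object returned by the rlog transformation is a DESeqTransform rather than a data.frame, and `assay()` returns the transformed values as an ordinary matrix. A minimal sketch, using `makeExampleDESeqDataSet` to simulate a data set in place of the real `dds` (the names `dds_example`, `rld_example` and `rlog_mat` are illustrative):

```r
library(DESeq2)

# Simulated data set standing in for the real dds (assumption)
dds_example <- makeExampleDESeqDataSet(n = 1000, m = 6)
rld_example <- rlog(dds_example, blind = TRUE)

# assay() returns the transformed values as a genes x samples matrix
rlog_mat <- assay(rld_example)
dim(rlog_mat)

# Distribution checks on exactly these values
hist(as.vector(rlog_mat), breaks = 50, main = "rlog values")
qqnorm(as.vector(rlog_mat)); qqline(as.vector(rlog_mat))
```

With the real data, the same two lines would be `rlog_mat <- assay(rld)` followed by the plots.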

      • gringer
        David Eccles (gringer)
        • May 2011
        • 845

        #4
        By plotting the rank (assuming you have actually plotted the rank), you've removed any parametric factors from the plot. If you're doing a PCA on the rank then this would be fine, but I suspect your PCA is being done on something else. You need to make sure that the same values are plotted that are observed by the PCA calculation.
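
One way to make sure the plotted values are the ones the PCA sees is to run the PCA by hand on an explicit matrix and do the distribution check on that same matrix. The sketch below uses base R with a simulated matrix standing in for `assay(rld)` (an assumption). The gene selection mirrors what DESeq2's `plotPCA` does by default as far as I understand it: keep the 500 genes with the highest variance across samples, then run `prcomp` on the transposed matrix.

```r
# Simulated genes x samples matrix standing in for assay(rld) (assumption)
set.seed(42)
mat <- matrix(rnorm(2000 * 6, mean = 8), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000),
                              c(paste0("ID18_", 1:3), paste0("GP18_", 1:3))))
condition <- factor(rep(c("ID18", "GP18"), each = 3))

# Keep the ntop most variable genes
ntop <- 500
row_var <- apply(mat, 1, var)
top_genes <- order(row_var, decreasing = TRUE)[seq_len(min(ntop, length(row_var)))]
pca_input <- mat[top_genes, ]

# PCA on samples: prcomp expects samples in rows, so transpose
pca <- prcomp(t(pca_input))
percent_var <- pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2], col = condition, pch = 19,
     xlab = sprintf("PC1: %.0f%% variance", 100 * percent_var[1]),
     ylab = sprintf("PC2: %.0f%% variance", 100 * percent_var[2]))

# The distribution check uses the same values that went into prcomp
qqnorm(as.vector(pca_input)); qqline(as.vector(pca_input))
```

Because `pca_input` feeds both `prcomp` and `qqnorm`, there is no ambiguity about whether the sanity check and the PCA are looking at the same numbers.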
