  • DESeq2 PCA Plots

    Hello all,

    I am running DESeq2 like so in R:
    Code:
    library(DESeq2)
    sTable = data.frame(sampleName = files, fileName = files, condition = cond)
    dds <- DESeqDataSetFromHTSeqCount(sampleTable = sTable, directory = "", design = ~condition)
    dds <- DESeq(dds)
    res <- results(dds)
    resOrdered <- res[order(res$padj),]
    rld <- rlogTransformation(dds, blind=TRUE)
    print(plotPCA(rld, intgroup="condition"))
    And I am getting a PCA plot that looks like so, where 138 genes have padj < 0.05 between the blue and red conditions.

    I would expect the blue replicates to cluster together, and the red ones as well. Given that there were a fair number of significant genes, I think that I am plotting this PCA wrong.

    When I check the columns to make sure I am using the right ones, I get this:
    Code:
    > colData(dds)
    DataFrame with 6 rows and 2 columns
                                                                 condition
                                                                 <factor>
    ID_18_1.bam_sorted.bam_htseq_out.txt     ID18
    ID_18_2.bam_sorted.bam_htseq_out.txt     ID18
    ID_18_3.bam_sorted.bam_htseq_out.txt     ID18
    GP_18_1.bam_sorted.bam_htseq_out.txt    GP18
    GP_18_2.bam_sorted.bam_htseq_out.txt    GP18
    GP_18_3.bam_sorted.bam_htseq_out.txt    GP18
    Is this something to be concerned about or is this the wrong way to plot PCA?

    Thanks in advance
    -R

  • #2
    I've had the best results from PCAs based on DESeq2 results when I used the VST and did an additional correction for transcript length (i.e. dividing the values by the length of the longest transcript per gene, in kb). This was before rlogTransformation was visible/usable, so it might be that rlog works better for that.

    What was your experimental design? Were all these six samples separate biological replicates? It's concerning that your samples are clustering by ID first and by treatment second. In our case samples clustered primarily by cell population first, and by treatment second. If your ID18_X and GP18_X come from the same (or similar) samples, or were sequenced/extracted in batches (we've noticed sequencing batch effects as well), that might explain why they're clustering together.

    As a sanity check for PCAs, it's a good idea to make sure that the data you're generating the PCA from fits a normal distribution. You can do this by running qqnorm(<data>); values should generally be a straight line along the diagonal, usually with a bit of deviation at the extremities. If the qqnorm plot isn't approximately a straight line, then the data will need additional normalisation applied before running a PCA.
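
    A minimal sketch of that qqnorm check in base R. The matrix below is simulated so that the snippet runs on its own; with the objects from the first post, the values would presumably be taken from the transformed data with assay(rld) instead. The names mat, vals and qq_cor are made up for illustration.

    ```r
    # Simulated stand-in for a genes x samples matrix of transformed values.
    # Assumption: with real data this would be mat <- assay(rld).
    set.seed(1)
    mat <- matrix(rnorm(6000, mean = 8, sd = 2), nrow = 1000, ncol = 6)

    # Pool all values and compare them with normal quantiles.
    vals <- as.vector(mat)
    qq <- qqnorm(vals, main = "Normal Q-Q plot of transformed values")
    qqline(vals)  # reference line through the first and third quartiles

    # Rough numeric summary: correlation of theoretical vs. observed quantiles.
    # Values close to 1 correspond to a nearly straight Q-Q plot.
    qq_cor <- cor(qq$x, qq$y)
    print(qq_cor)
    ```

    The correlation is only a convenience for scripts; the plot itself is what shows whether the deviation sits at the extremities or runs through the whole range.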



    • #3
      Hi Gringer,

      The ID18_x samples were biological replicates from one batch, and the GP18_x samples were biological replicates from another batch. The two groups did not come from the same samples or the same batch.

      I was not able to generate a distribution with the regularized log transformation (unsure how to extract the values from the data.frame), but came up with a plot of variance over the read counts which shows that there does not seem to be a dependence of the variance on the mean.



      • #4
        By plotting the rank (assuming you have actually plotted the rank), you've removed any parametric factors from the plot. If you're doing a PCA on the rank then this would be fine, but I suspect your PCA is being done on something else. You need to make sure that the same values are plotted that are observed by the PCA calculation.
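
        A rough sketch of what "the same values" could look like in practice, in base R. It assumes the transformed values are available as a plain genes x samples matrix (with the objects from the first post, assay(rld) should return one); here the matrix is simulated, and the group shift, the sample names and all variable names are hypothetical. Keeping the 500 most variable genes mirrors the default used by plotPCA, but this is not a copy of its implementation.

        ```r
        # Simulated stand-in for transformed values (genes x samples).
        # Assumption: with the thread's objects this would be mat <- assay(rld).
        set.seed(42)
        n_genes <- 1000
        mat <- matrix(rnorm(n_genes * 6, mean = 8, sd = 1), nrow = n_genes)
        colnames(mat) <- c(paste0("ID18_", 1:3), paste0("GP18_", 1:3))
        # Hypothetical group shift on the first 50 genes, for illustration only.
        mat[1:50, 4:6] <- mat[1:50, 4:6] + 3

        # Mean vs. spread per gene, on the value scale and on the rank scale.
        gene_mean <- rowMeans(mat)
        gene_sd <- apply(mat, 1, sd)
        par(mfrow = c(1, 2))
        plot(gene_mean, gene_sd, main = "mean (value scale)")
        plot(rank(gene_mean), gene_sd, main = "rank(mean)")
        par(mfrow = c(1, 1))

        # PCA on the same matrix: keep the most variable genes, samples as rows.
        ntop <- min(500, n_genes)
        top <- order(apply(mat, 1, var), decreasing = TRUE)[seq_len(ntop)]
        pca <- prcomp(t(mat[top, ]))
        pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

        # Colour the samples by condition and label axes with explained variance.
        condition <- factor(rep(c("ID18", "GP18"), each = 3))
        plot(pca$x[, 1], pca$x[, 2], col = condition, pch = 19,
             xlab = sprintf("PC1 (%.1f%%)", pct_var[1]),
             ylab = sprintf("PC2 (%.1f%%)", pct_var[2]))
        ```

        Because the diagnostic plots and prcomp both read from mat, whatever the plots show about the distribution applies to the input of the PCA as well.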
