  • How to cluster samples with PCA in R?

    Hi,

    I have RNA-seq data for 16 mouse samples. I would like to cluster the Cufflinks results for these samples using PCA in R. The data looks like this:


    Code:
    gene_id    gene_short_name    FPKM_101    FPKM_102    FPKM_103    FPKM_104    FPKM_105    FPKM_106    FPKM_107    FPKM_108    FPKM_109    FPKM_110    FPKM_111    FPKM_112    FPKM_113    FPKM_114    FPKM_115    FPKM_116
    uc007aeu.1    -    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    uc007aev.1    -    0    0    0.0095358    0.0095358    0.011704    1.48E-10    2.05E-63    0.0083273    0.014457    0.0068505    0.0053635    0.022235    0.0047757    0.018794    0    0.01661
    uc007aew.1    -    0    0    0    0    0    1.2568    0.27389    0    0    0    0    0    0    0    0    0
    uc007aex.2    -    0    0    0    0    0    7.1538    0.0096687    0    0    0    0    0    0    0    0.0050925    0
    uc007aey.1    -    8.27E-07    0.00049201    0.00043141    0.00043141    0.00079353    0    0.00074324    1.56E-09    3.39E-20    1.16E-61    1.80E-20    1.72E-09    1.56E-96    5.13E-07    5.34E-07    4.78E-07
    uc007afb.1    -    0.40549    2.08E-248    1.11E-19    1.11E-19    2.40E-93    0    0.49777    0.10711    1.22E-12    0.014644    6.48E-13    0.11777    0.02695    0.25169    0.08951    0.080144
    uc007afc.1    -    1.93E-06    0.38845    0.34061    0.34061    0.31315    0    0    5.01E-09    1.04E-19    0.00046243    5.53E-20    5.51E-09    0.17202    1.20E-06    1.16E-06    1.04E-06
    Commands in R:
    Code:
    data <- read.csv('raw_cuff_data.csv', header=TRUE)
    data_pca <- prcomp(data[, 3:18])
    How should I plot them to see how the samples cluster? I know the PCA scores are in data_pca$x, but how do I cluster them and plot one point for each sample?

    Thanks

  • #2
    Hi - Try something along these lines...

    Code:
    ## Test data: 1000 genes (rows) x 10 samples (columns)
    dat <- matrix(data= rnorm(n= 10000), ncol= 10)
    colnames(dat) <- paste('sample_', 1:ncol(dat), sep= '')
    
    ## PCA: transpose so that samples are rows (observations)
    pcaResult <- prcomp(t(dat))
    
    ## Percentage of variance explained by each component
    pctVar <- round(100 * pcaResult$sdev^2 / sum(pcaResult$sdev^2))
    
    ## Set up an empty plot of PC1 vs PC2
    plot(pcaResult$x[, 1:2],
        main= 'Principal components of samples',
        xlab= sprintf('PC1 (var: %s%%)', pctVar[1]),
        ylab= sprintf('PC2 (var: %s%%)', pctVar[2]),
        type= 'n'
    )
    
    ## Plot one label per sample
    text(x= pcaResult$x[,1], y= pcaResult$x[,2], labels= rownames(pcaResult$x), cex= 0.5)
    Samples similar to each other will group together in the plot, but keep in mind that PCA itself doesn't do any clustering.

    Dario
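
Since PCA only projects the samples and does not assign them to groups, a separate clustering step is needed if explicit cluster labels are wanted. The following is a minimal sketch, not part of the original reply: it assumes simulated data with two artificial groups, hierarchical clustering with average linkage on the first two PCs, and a cut into k = 2 clusters. All of these choices are illustrative assumptions.

```r
## Simulated data (assumption): 1000 genes x 10 samples, two groups of 5
set.seed(1)
dat <- matrix(rnorm(10000), ncol = 10)
colnames(dat) <- paste0('sample_', 1:ncol(dat))
dat[1:200, 6:10] <- dat[1:200, 6:10] + 5   # shift 200 genes in samples 6-10

## PCA with samples as rows
pcaResult <- prcomp(t(dat))

## Hierarchical clustering on the first two principal components
scores <- pcaResult$x[, 1:2]
hc <- hclust(dist(scores), method = 'average')
groups <- cutree(hc, k = 2)   # k = 2 is an assumption for this toy data

## One point per sample, coloured by cluster, with sample labels
plot(scores, col = groups, pch = 19, main = 'Samples clustered on PC1/PC2')
text(scores, labels = rownames(scores), pos = 3, cex = 0.6)
```

With real data, the number of clusters and the number of PCs to cluster on are judgment calls; plotting the dendrogram with plot(hc) can help when choosing where to cut.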


    • #3
      Thanks a lot. I also transposed my data and got exactly the PCA plot that I wanted. I just wasn't sure whether that was the correct way.
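
For a table laid out like the one in the question (gene_id, gene_short_name, then one FPKM column per sample), the transposed PCA could look roughly like the sketch below. The data frame is simulated so the example runs on its own; dropping genes that are constant across samples and using log2(FPKM + 1) are assumptions added here, not steps taken from the thread.

```r
## Simulated stand-in for the Cufflinks table (assumption): 200 genes,
## columns gene_id, gene_short_name, then FPKM_101 ... FPKM_116
set.seed(42)
fpkm <- matrix(rexp(200 * 16), nrow = 200,
               dimnames = list(NULL, paste0('FPKM_', 101:116)))
fpkm[1:20, ] <- 0   # genes with no expression, as in the posted table
data <- data.frame(gene_id = sprintf('gene_%03d', 1:200),
                   gene_short_name = '-', fpkm)

## Keep only the FPKM columns and use gene IDs as row names
expr <- as.matrix(data[, grep('^FPKM_', colnames(data))])
rownames(expr) <- as.character(data$gene_id)

## Drop genes that are constant across samples, then log-transform
expr <- expr[apply(expr, 1, var) > 0, ]
logExpr <- log2(expr + 1)

## Transpose so that each sample is a row, then run PCA
data_pca <- prcomp(t(logExpr))

## One labelled point per sample
plot(data_pca$x[, 1:2], type = 'n', main = 'PCA of samples')
text(data_pca$x[, 1], data_pca$x[, 2],
     labels = sub('^FPKM_', '', rownames(data_pca$x)), cex = 0.7)
```

Without the transpose, prcomp treats each gene as an observation, so data_pca$x would hold one row per gene rather than one row per sample.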


      • #4
        There is a nice PCA plot in the DESeq2 vignette that you could look at and adapt to your Cufflinks data.
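
As far as I understand it, the PCA plot in DESeq2 (its plotPCA function) restricts the analysis to the most variable genes (500 by default) and labels the axes with the percentage of variance explained. A rough base-R sketch of that idea is shown below. It does not call DESeq2 at all; the simulated log-scale matrix and the variable names are assumptions for illustration only.

```r
## Simulated log-scale expression (assumption): 2000 genes x 16 samples
set.seed(7)
logExpr <- matrix(rnorm(2000 * 16), nrow = 2000,
                  dimnames = list(paste0('gene_', 1:2000),
                                  paste0('FPKM_', 101:116)))

## Keep the ntop genes with the highest variance across samples
ntop <- 500
geneVar <- apply(logExpr, 1, var)
topGenes <- order(geneVar, decreasing = TRUE)[seq_len(min(ntop, length(geneVar)))]

## PCA on the selected genes, samples as rows
pca <- prcomp(t(logExpr[topGenes, ]))

## Percentage of variance explained by each component
percentVar <- 100 * pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1:2], pch = 19,
     xlab = sprintf('PC1: %.0f%% variance', percentVar[1]),
     ylab = sprintf('PC2: %.0f%% variance', percentVar[2]))
```

Note that DESeq2 itself expects raw counts rather than FPKM values, so only the plotting approach carries over directly to Cufflinks output.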
