
**Skiaphrene** · 04-27-2015, 06:11 PM

Hi Golsheed,

I haven't used the robCompositions package, but it looks very interesting. Can't you use the plot.pcaCoDa() function on the output of pcaCoDa() (which I'm assuming is what you're using to do the PCA) to generate sample and variable plots? I tried the example in the documentation, and it generates a plot that overlays the variables on top of the sample plot, so you should be able to see your clustering if it is there. Either way, in the pcaCoDa object returned by pcaCoDa(), the scores element does seem to contain the sample coordinates in PC space and the loadings element the variable coordinates in PC space, in case you need to plot each separately (which you may if you have many isoforms).
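To make the scores/loadings distinction concrete, here is a minimal NumPy sketch of a plain, non-robust PCA on centred log-ratio (clr) transformed proportions. It is not the robCompositions implementation (which, as far as I understand, uses a robust estimator on log-ratio coordinates); the function names `clr` and `clr_pca` and the simulated data are made up for illustration, and all proportions are assumed to be strictly positive.

```python
import numpy as np

def clr(props):
    """Centred log-ratio transform; each row holds positive parts summing to 1."""
    logs = np.log(props)
    return logs - logs.mean(axis=1, keepdims=True)

def clr_pca(props):
    """Plain (non-robust) PCA of clr-transformed compositions via SVD."""
    z = clr(props)
    z = z - z.mean(axis=0)              # centre each clr variable
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                      # sample coordinates in PC space
    loadings = vt.T                     # variable coordinates in PC space
    return scores, loadings, s

# Simulated example: 10 samples, one gene with 3 isoforms (hypothetical data).
rng = np.random.default_rng(0)
props = rng.dirichlet([4.0, 2.0, 1.0], size=10)
scores, loadings, sing = clr_pca(props)
print(scores[:, :2])    # what you would draw in a sample plot
print(loadings[:, :2])  # what you would draw in a variable plot
```

Plotting the first two columns of `scores` gives the sample plot, and the first two columns of `loadings` give the variable plot.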

On a more speculative note, perhaps it would be possible to arrange your data into a single data frame (with samples in rows and isoform proportions in columns), run robCompositions on each isoform separately, retain the returned PCs, and run a specially weighted PCA on these PCs, in a way similar to what a "Multiple Factor Analysis" does (cf. function MFA() in package FactoMineR). This would ensure that each isoform contributes equally to the overall analysis.

Hope this helps,

-- Alex

**Golsheed** · 04-28-2015, 05:21 AM

Thanks so much for your help, Alex.

A few questions if you don't mind:

(1) I have many isoforms for each gene (ranging from 2 isoforms to 20-30). Do you think the PC plot would make more sense if I plot it for each isoform separately, i.e., one PC plot for isoform one, and so on? Is that what you meant in the first paragraph?

(2) I'm not familiar with multiple factor analysis, do you mind elaborating a bit more about it and also how to do the weighting? or refer me to a paper or something so I can get a better idea of it.

Thanks so much,
Golsheed

**Skiaphrene** · 04-28-2015, 02:44 PM

Originally posted by Golsheed:

(1) I have many isoforms for each gene (ranging from 2 isoforms to 20-30). Do you think the PC plot would make more sense if I plot it for each isoform separately, i.e., one PC plot for isoform one, and so on? Is that what you meant in the first paragraph?

=> This isn't what I had in mind in the first paragraph... My understanding is that you have several samples for which you have isoform proportions for multiple genes. I was assuming you wanted to run the proportions PCA across all isoforms for all genes at once. This should be possible at least in theory for a normal PCA, but I don't know how it would work out with the proportions PCA, as doing it across all isoforms and all genes at once means that the sum of all proportions is not one (rather it is the number of genes). This kind of analysis would not be able to take into account that the various proportions can be grouped by gene, which is where my idea of a "proportions MFA" came in. Anyway...

=> ...Coming back to your question above, you could run a proportions PCA on each gene individually and generate plots for each (note: you'll have to check if a proportion PCA needs at least 3 proportion variables to make 2 PCs - a normal PCA does). This would highlight things like sample clustering and dimensions of major variability for each gene separately. I don't know how useful that would be.
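On the "at least 3 proportion variables" point: because the proportions of a gene sum to 1, a gene with D isoforms can carry at most D-1 independent dimensions after a log-ratio transform. The sketch below checks this numerically on simulated data, using a plain clr transform and SVD rather than robCompositions; `n_nonzero_pcs` is a hypothetical helper name, and proportions are assumed strictly positive.

```python
import numpy as np

def n_nonzero_pcs(props, tol=1e-10):
    """Count PCs with non-negligible variance after a clr transform."""
    logs = np.log(props)
    z = logs - logs.mean(axis=1, keepdims=True)   # clr transform
    z = z - z.mean(axis=0)                        # centre columns
    s = np.linalg.svd(z, compute_uv=False)
    return int(np.sum(s > tol * s.max()))

rng = np.random.default_rng(1)
for n_isoforms in (2, 3, 5):
    # 20 simulated samples for a gene with n_isoforms isoforms
    props = rng.dirichlet(np.ones(n_isoforms), size=20)
    print(n_isoforms, "isoforms ->", n_nonzero_pcs(props), "usable PCs")
```

With this simulated data the counts should come out as 1, 2 and 4, so a two-isoform gene would give only a single PC and could not be drawn as a 2-D PC plot on its own.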

Originally posted by Golsheed:

(2) I'm not familiar with multiple factor analysis, do you mind elaborating a bit more about it and also how to do the weighting? or refer me to a paper or something so I can get a better idea of it.

=> If you're in a purely numeric variable setting, then an MFA is like a PCA of PCAs, and is useful for highlighting variability patterns shared across multiple groups of variables. Here those groups would be your genes, and the variables would be the proportions of each gene's isoforms.

=> I'm sorry but I can't remember the exact weighting scheme.
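For reference, the weighting usually described for MFA on numeric variables is to divide every variable in a group by the first singular value of that group's own PCA, so that no single group can dominate the first global dimension. The sketch below illustrates that idea under this assumption; it is not the code of FactoMineR's MFA(), and `mfa_weight` and the simulated blocks are hypothetical.

```python
import numpy as np

def mfa_weight(groups):
    """Centre each block of columns and divide it by its first singular value."""
    weighted = []
    for block in groups:
        centred = block - block.mean(axis=0)
        first_sv = np.linalg.svd(centred, compute_uv=False)[0]
        weighted.append(centred / first_sv)
    return np.hstack(weighted)

# Two simulated groups of variables measured on the same 12 samples.
rng = np.random.default_rng(2)
gene_a = rng.normal(size=(12, 2)) * 10.0   # large variance, few variables
gene_b = rng.normal(size=(12, 5))          # small variance, more variables
combined = mfa_weight([gene_a, gene_b])

# Global PCA on the weighted, concatenated table.
u, s, vt = np.linalg.svd(combined, full_matrices=False)
global_scores = u * s
print(global_scores[:, :2])
```

After weighting, the leading dimension of every group has the same size, which is what lets each gene contribute comparably regardless of how many isoforms it has.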

=> You can read up more about MFA here:
- on this page of FactoMineR's website: http://factominer.free.fr/advanced-m...-analysis.html
- through the references given in the MFA function documentation in the FactoMineR package:
Escofier, B. and Pages, J. (1994) Multiple Factor Analysis (AFMULT package). Computational Statistics and Data Analysis, 18, 121-140.
Becue-Bertaut, M. and Pages, J. (2008) Multiple factor analysis and clustering of a mixture of quantitative, categorical and frequency data. Computational Statistics and Data Analysis, 52, 3255-3268.
(the latter one can be found on ResearchGate: http://www.researchgate.net/publicat...frequency_data)

=> With the concept of MFA in mind, I can imagine running a proportions PCA on each gene separately, then taking the sample coordinates for the first 2 or three PCs returned, and doing a normal PCA on that. If you take the same number of PCs for each gene then you shouldn't have to worry about any special weighting.

Let me know what you think!

Best,

-- Alex

**Golsheed** · 04-28-2015, 03:23 PM

Thanks a bunch.

So here's what you're proposing in short:
(1) doing a normal PCA on each gene separately
(2) constructing a matrix where each row corresponds to a sample and the columns are as follows:

sample, gene1_PC1, gene1_PC2, gene2_PC1, gene2_PC2, gene3_PC1, ...

and doing a normal PCA on that, right?

Just to make sure I got things right, by "sample coordinates for the first 2 or three PCs" you mean the scores?

Thanks,
Golsheed

**Skiaphrene** · 04-28-2015, 04:07 PM

You're welcome!

Originally posted by Golsheed:

(1) doing a normal PCA on each gene separately
(2) constructing a matrix where each row corresponds to a sample and the columns are as follows:

sample, gene1_PC1, gene1_PC2, gene2_PC1, gene2_PC2, gene3_PC1, ...

and doing a normal PCA on that, right?

=> (1) Well, as you pointed out previously, I'm not sure a normal PCA will work on the isoform proportions for each gene, as the variables are related (they sum to 1). The proportions PCA from robCompositions sounds like a method designed to handle this, so maybe you should do a proportions PCA on each gene rather than a normal PCA.

=> (2) yes, this is what I had in mind. Since the PC coordinates are no longer proportions, a normal PCA across this should be fine.
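Roughly, the two steps could be sketched like this in NumPy. It is only an illustration under some assumptions: a plain clr-based PCA stands in for the robust proportions PCA of robCompositions, the gene names and data are simulated, all proportions are strictly positive, and `two_stage_pca` is a made-up helper name.

```python
import numpy as np

def clr(props):
    """Centred log-ratio transform; each row holds positive parts summing to 1."""
    logs = np.log(props)
    return logs - logs.mean(axis=1, keepdims=True)

def pca_scores(x, n_pcs):
    """Scores (sample coordinates) on the first n_pcs principal components."""
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return (u * s)[:, :n_pcs]

def two_stage_pca(props_by_gene, n_pcs=2):
    # Step 1: proportions PCA per gene, keeping the same number of PCs for each.
    per_gene = [pca_scores(clr(p), n_pcs) for p in props_by_gene.values()]
    # Step 2: columns are gene1_PC1, gene1_PC2, gene2_PC1, gene2_PC2, ...
    stacked = np.hstack(per_gene)
    # Normal PCA on the stacked scores, which are no longer proportions.
    return pca_scores(stacked, n_pcs), stacked

rng = np.random.default_rng(3)
n_samples = 8
props_by_gene = {
    "geneA": rng.dirichlet(np.ones(3), size=n_samples),
    "geneB": rng.dirichlet(np.ones(4), size=n_samples),
    "geneC": rng.dirichlet(np.ones(6), size=n_samples),
}
final_scores, stacked = two_stage_pca(props_by_gene, n_pcs=2)
print(stacked.shape, final_scores.shape)
```

Genes with only two isoforms would need separate handling, since they cannot supply two informative PCs.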

=> If your data hadn't been proportions then a normal MFA would have sufficed!

Good luck!

Best,

-- Alex

PCA for compositional data and relative isoform usage
