Hello,
I have relative isoform usage data (isoform proportions) for two conditions (Non-Infected and Infected) for some genes. I have studied (using statistical hypothesis testing) the differential isoform usage for this data; i.e., whether the relative isoform usage is statistically different after infection, compared to the control (Non-Infected), for each gene. This is a bit different from differential isoform expression, since I'm dealing with isoform usage proportions and not the actual read counts for each isoform.
For a target gene g with K isoforms, I have a vector of length K whose i-th element is the relative usage of isoform i of gene g.
I want to perform a principal component analysis on this data to see whether the first (or second) principal component separates the data into two groups based on conditions (Non-Infected and Infected).
Does anyone know how this can be done in R? Since the data points are vectors for "each gene and each sample", and the elements of each vector sum to 1 (and are therefore dependent on one another), I can't use the usual PCA method in R.
I have found that the package robCompositions can do PCA for compositional data, but there's no detailed documentation for it. My main problem is how to draw a PC plot of data (PC2 vs PC1), which shows the clustering of data based on condition. Can this be done the same way as in prcomp(); i.e., using scores?
I'd appreciate any help.
Thanks,
Golsheed
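One possible way to approach this, sketched in base R only: apply a centred log-ratio (clr) transform to the proportions to deal with the sum-to-1 constraint, then run ordinary prcomp() on the transformed values and plot the scores. This is a generic compositional-data recipe, not the robCompositions interface. The data below are simulated stand-ins (one gene, 4 isoforms, 6 samples per condition), and the names props, clr, and the pseudocount 1e-6 are assumptions made for illustration.

```r
# Simulated stand-in data (hypothetical): 12 samples x 4 isoforms of one gene.
set.seed(1)
n_per_group <- 6
K <- 4
condition <- factor(rep(c("Non-Infected", "Infected"), each = n_per_group),
                    levels = c("Non-Infected", "Infected"))

# Draw positive values with different isoform preferences per condition.
# matrix() fills column-wise, so each block of n_per_group draws is one isoform.
raw <- rbind(
  matrix(rgamma(n_per_group * K, shape = rep(c(8, 4, 2, 2), each = n_per_group)),
         ncol = K),
  matrix(rgamma(n_per_group * K, shape = rep(c(2, 4, 8, 2), each = n_per_group)),
         ncol = K)
)

# Close each row so the isoform proportions of a sample sum to 1.
props <- raw / rowSums(raw)
colnames(props) <- paste0("isoform", seq_len(K))

# Centred log-ratio transform: log proportions minus their per-sample mean.
# The small pseudocount (an assumption) avoids log(0) for unused isoforms.
clr <- function(p, pseudo = 1e-6) {
  p <- p + pseudo
  p <- p / rowSums(p)
  logp <- log(p)
  logp - rowMeans(logp)
}

clr_props <- clr(props)

# Ordinary PCA on the clr-transformed data; the scores are in pca$x.
pca <- prcomp(clr_props, center = TRUE, scale. = FALSE)
scores <- pca$x

# PC2 vs PC1, coloured by condition.
cols <- c("steelblue", "firebrick")
plot(scores[, 1], scores[, 2],
     col = cols[as.integer(condition)], pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(condition), col = cols, pch = 19)
```

With this approach the scores are used exactly as with any other prcomp() result. Because clr rows sum to zero, the last principal component carries essentially no variance, which is expected. For several genes, one option would be to clr-transform each gene's block separately and bind the blocks column-wise before calling prcomp(), though whether that is appropriate depends on the data.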