Hello,
I've been using the Tuxedo suite to analyze my time series differential expression and so far I'm pretty taken by cummeRbund especially. I have a few questions and maybe a feature suggestion:
1) It would be REALLY useful if findSimilar could be used to find not only similarly expressed genes, but also isoforms, TSS and CDS. So far I have managed to "hack" it by copying the data from the files gene_exp.diff, genes.count_tracking, genes.fpkm_tracking and genes.read_group_tracking to the files for isoforms. I search for similar "genes", but it returns the isoforms, which is what I wanted.
2) (Using the hacked data) Is it possible to get a data frame with both the JS distances and the full gene names? So far I have only managed to get the names through featureNames, using the code posted below.
This gets me 2 files that I can sort by tracking_id and combine into a spreadsheet. (I'm pretty new to R and started with cummeRbund, so I don't yet know how to sort and combine them directly in R; I would welcome any help.) Can you think of a better way to do this?
3) Why can't I use fullnames=T with findSimilar when using the option returnGeneSet=F? That would make the problem in 2) disappear.
4) How can I get the significance levels for the comparisons of the whole transcription profiles and not only the differential expression of one gene between samples? Is this even feasible, since there are so many genes, and the number of comparisons would have to be a square of the number of profiles (from what I understand)?
Code (for question 2):
myGene=getGene(cuff,'<tracking_id>') # <gene_short_name> returns all the isoforms (obviously), you have to use getGenes() then
sim_to_myGene=findSimilar(cuff,'<tracking_id>',distThresh=0.1)
sim_to_myGene_dataframe=findSimilar(cuff,'<tracking_id>',distThresh=0.1,returnGeneSet=F)
write.table(sim_to_myGene_dataframe,'sim_to_myGene.tsv',sep='\t',quote=F)
write.table(featureNames(sim_to_myGene),'sim_to_myGene_names.tsv',sep='\t',quote=F)