Hi everyone,
Many apologies if I'm duplicating an existing thread; I have searched the forums and Google but can't find the specific answer.
So, I've performed my mRNAseq experiment, using the workflow:
cufflinks->cuffmerge->cuffquant->cuffdiff
then used cummeRbund to look at the results.
From cummeRbund I've generated a list of differentially expressed genes.
What I'd like to do now is look at the sequences of the genes to see what types of things are differentially expressed (I have done a brief GO analysis and would like to search the sequences against HMM profiles for protein motifs).
I tried to output the sequences from my merged.gtf file (generated by cuffmerge) using gffread. I can get them to output, but I would really, REALLY, like the gene_id "XLOC_*****" number to be in the fasta header. But it seems that whatever I do, I can't get it in there. I can get almost every other piece of info from the gtf file into the header using one or another of the gffread options, but not this.
Clearly it wouldn't be so hard to write my own script to do this, but I'm under time pressure, and I've learnt the hard way that duplicating others' efficient tools is foolhardy.
So, am I missing the crucial option here? Or do folks do this (outputting differentially expressed gene sequences from mRNAseq experiments) in a different way?
I do have the gene IDs of the annotated genes in the fasta header, but there are some novel/intergenic/anomalous genes which are only really identifiable by "XLOC_*****".
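For concreteness, the fallback script I have in mind would be roughly the sketch below (Python). It is untested on real data and rests on assumptions: that every line in merged.gtf carries both gene_id and transcript_id in its attribute column, and that the first word of each fasta header written by gffread is the transcript ID. The function names and the "gene_id=" header tag are made up for illustration.

```python
import re
import sys


def transcript_to_gene(gtf_lines):
    """Build a transcript_id -> gene_id map from GTF lines (assumed layout)."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a full GTF record
        # Attribute column looks like: gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; ...
        attrs = dict(re.findall(r'(\w+) "([^"]*)"', fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


def add_gene_ids(fasta_lines, mapping):
    """Yield fasta lines with gene_id=<XLOC id> inserted after the transcript ID."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">") and len(line) > 1:
            # Assumption: the first word of the header is the transcript ID.
            tid, _, rest = line[1:].partition(" ")
            gene = mapping.get(tid)
            if gene is not None:
                line = f">{tid} gene_id={gene}" + (f" {rest}" if rest else "")
        yield line


if __name__ == "__main__":
    # Hypothetical usage: python add_xloc.py merged.gtf transcripts.fa > with_xloc.fa
    gtf_path, fasta_path = sys.argv[1], sys.argv[2]
    with open(gtf_path) as gtf:
        tx2gene = transcript_to_gene(gtf)
    with open(fasta_path) as fasta:
        for out_line in add_gene_ids(fasta, tx2gene):
            print(out_line)
```

Headers whose first word is not found in the gtf are passed through unchanged, so nothing is silently dropped. I'd still much rather use an existing option if one exists.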
Many thanks for your help
Matt