Hey, does anyone have any pointers, advice, or experience with modifying GTF files for use with cufflinks (v2.0.2)?
While analyzing RNA-seq data with the "tuxedo" pipeline (tophat -> cufflinks), I've run into an issue: tophat maps reads to apparently non-coding (possibly regulatory) regions, but cufflinks doesn't report FPKM values for those pileups!
So one strategy we are trying, in order to get cufflinks to report FPKM values for these regions, is to modify an existing GTF annotation (or create a new one) that covers them, and then tell tophat/cufflinks to use that GTF *without* trying to find novel transcripts. The hope is that cufflinks will then quantify the annotated regions and report FPKM values for them.
One approach we tried was to create a GTF with features corresponding to the regions of interest. We wrote them as "pseudogene" "exon" records (columns 2 and 3, i.e. source and feature), reused existing Ensembl gene IDs but gave them custom transcript_ids, and fed the GTF to cufflinks. When cufflinks reached the "Loading Annotation" step at the beginning of the run, it crashed with a segmentation fault!
In the attribute column (column 9), we provided only gene_id and transcript_id, nothing else. Cufflinks may have crashed because no gene_name was given, but we really don't know!
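For reference, a GTF record has nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes), with 1-based inclusive coordinates and attributes written as `key "value";`. Below is a minimal Python sketch that builds such exon records for custom regions and adds a `gene_name` attribute alongside `gene_id` and `transcript_id`. The helper names (`make_exon_line`, `write_gtf`) and the example IDs are hypothetical, and it is only a guess, not something we've tested, that including `gene_name` avoids the segfault.

```python
from typing import List, Tuple

# Hypothetical region description:
# (chrom, start, end, strand, gene_id, transcript_id, gene_name)
Region = Tuple[str, int, int, str, str, str, str]


def make_exon_line(region: Region, source: str = "pseudogene") -> str:
    """Return one tab-separated GTF exon record (1-based, inclusive coordinates)."""
    chrom, start, end, strand, gene_id, transcript_id, gene_name = region
    # Basic sanity checks; malformed coordinates are a common cause of parser trouble.
    if start < 1 or end < start:
        raise ValueError(f"bad coordinates: {start}-{end}")
    if strand not in ("+", "-", "."):
        raise ValueError(f"bad strand: {strand}")
    attrs = (f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; '
             f'gene_name "{gene_name}";')
    # Columns: seqname, source, feature, start, end, score, strand, frame, attributes
    fields = [chrom, source, "exon", str(start), str(end), ".", strand, ".", attrs]
    return "\t".join(fields)


def write_gtf(regions: List[Region], path: str) -> None:
    """Write one exon record per region to `path`."""
    with open(path, "w") as fh:
        for region in regions:
            fh.write(make_exon_line(region) + "\n")


if __name__ == "__main__":
    # Hypothetical example region; replace with the real regions of interest.
    demo = ("chr1", 1000, 1500, "+", "GENE_X", "GENE_X.custom1", "GENE_X")
    print(make_exon_line(demo))
```

Validating coordinates before writing the file at least rules out malformed records as the cause of a crash.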
Another strategy we are currently trying is to *modify* an existing GTF (from Illumina iGenomes / Ensembl) that *does* work with cufflinks. This time, to capture regions upstream and downstream of genes, for each gene_id we take the lowest start coordinate across all of its annotations and decrease it by 1000 (to *hopefully* capture expression upstream). Similarly, we increase the highest end coordinate by 1000 to *hopefully* capture expression downstream. This run is still going, so I don't know yet whether it will finish successfully and give us the FPKM values we are looking for...
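The per-gene extension described above could be sketched in Python roughly as follows. The function name `extend_genes` is hypothetical, and editing only `exon` records is an assumption (the post doesn't say which feature types were changed), so CDS and codon records keep their original coordinates. For minus-strand genes the lowest coordinate is actually the downstream end, but since both ends are widened the gene still gains 1000 bp on each flank. End coordinates are not clamped to chromosome length, because the sketch doesn't know it.

```python
import sys
from typing import Dict, List, Tuple

FLANK = 1000  # bp added on each side of every gene, as described above


def parse_gene_id(attrs: str) -> str:
    """Extract the gene_id value from a GTF attribute column."""
    for field in attrs.strip().split(";"):
        field = field.strip()
        if field.startswith("gene_id "):
            return field.split(" ", 1)[1].strip('"')
    raise ValueError("record has no gene_id: " + attrs)


def extend_genes(lines: List[str], flank: int = FLANK) -> List[str]:
    """Widen each gene by `flank` bp. Exon records at the gene's lowest start get
    start - flank (clamped to 1); exon records at its highest end get end + flank.
    Comments, blank lines and non-exon records pass through unchanged."""
    records = []  # (gene_id or None, raw line or column list)
    bounds: Dict[str, Tuple[int, int]] = {}
    # First pass: find the lowest start and highest end of each gene's exons.
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] != "exon":
            records.append((None, line))
            continue
        gid = parse_gene_id(cols[8])
        start, end = int(cols[3]), int(cols[4])
        lo, hi = bounds.get(gid, (start, end))
        bounds[gid] = (min(lo, start), max(hi, end))
        records.append((gid, cols))
    # Second pass: rewrite only the coordinates that sit on a gene boundary.
    out = []
    for gid, item in records:
        if gid is None:
            out.append(item)
            continue
        cols = list(item)
        lo, hi = bounds[gid]
        if int(cols[3]) == lo:
            cols[3] = str(max(1, lo - flank))
        if int(cols[4]) == hi:
            cols[4] = str(hi + flank)
        out.append("\t".join(cols) + "\n")
    return out


if __name__ == "__main__":
    # Usage: python extend_gtf.py genes.gtf > genes.extended.gtf
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(extend_genes(fh.readlines()))
```

Only the boundary exons move, so internal exons and intron structure stay as annotated.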
Any pointers, advice, experience, or insight on tweaking GTF files for cufflinks would be appreciated!
We are using tophat v2.0.4 and cufflinks v2.0.2 by the way.
thanks
-Eddie