I am running a series of Tuxedo pipelines and have repeatedly run into the following error message in CuffDiff: "cannot open reference GTF file".
The GTF file was generated by CuffMerge and looks completely normal. Here are some lines to illustrate this:
Ch2 Cufflinks exon 2735 6095 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.3.1"; class_code "u"; tss_id "TSS1";
Ch2 Cufflinks exon 6172 6607 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.3.1"; class_code "u"; tss_id "TSS1";
Ch2 Cufflinks exon 7279 7433 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "3"; oId "CUFF.3.1"; class_code "u"; tss_id "TSS1";
Ch2 Cufflinks exon 7494 10061 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "4"; oId "CUFF.3.1"; class_code "u"; tss_id "TSS1";
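To back up "looks normal" with something more than eyeballing, one option is to count lines that do not have the nine tab-separated fields a GTF record should have. This is only a minimal sketch: the function name count_malformed_gtf_lines is made up for illustration, and it checks field count only, not the content of each field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cufflinks): print how many non-comment,
# non-empty lines in a file do not have exactly 9 tab-separated fields.
count_malformed_gtf_lines() {
    awk -F'\t' '!/^#/ && NF > 0 && NF != 9 { bad++ } END { print bad + 0 }' "$1"
}

# Example call (path taken from the command in this post):
# count_malformed_gtf_lines /gpfs/group1/f/flyinv/Outputs_CuffMerge/exonGene/SCR32_a/merged.gtf
```

A result of 0 would suggest the column layout is intact, for example that tabs were not converted to spaces somewhere along the way.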
I can find no error in my input command either:
cuffdiff \
-o "/gpfs/group1/f/flyinv/Outputs_CuffDiff/ExonsGenes/SCJR32_a" \
-L SC_JR32_Male,SC_JR32_Female,SC_JR32_Larvae \
--total-hits-norm \
--frag-bias-correct /gpfs/group1/f/flyinv/working_index/Dpse3_0_1.fa \
--multi-read-correct \
--library-norm-method classic-fpkm \
/gpfs/group1/f/flyinv/Outputs_CuffMerge/exonGene/SCR32_a/merged.gtf \
/gpfs/group1/f/flyinv/Outputs_TopHat/transcriptiomeSequence_exonsAndGeneAnnotationData/SC_JR32_Male/accepted_hits.bam \
/gpfs/group1/f/flyinv/Outputs_TopHat/transcriptiomeSequence_exonsAndGeneAnnotationData/SC_JR32_Female/accepted_hits.bam \
/gpfs/group1/f/flyinv/Outputs_TopHat/transcriptiomeSequence_exonsAndGeneAnnotationData/SC_JR32_Larvae/accepted_hits.bam
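Since the message is about opening the file, a quick way to rule out a path or permission problem is to test every input path exactly as it is typed in the command. The sketch below assumes a bash shell on the same machine that runs CuffDiff; check_readable is a hypothetical helper name, not a Cufflinks tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each path exists as a readable,
# non-empty regular file. Returns non-zero if any path fails a check.
check_readable() {
    local status=0
    local f
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "MISSING: $f"
            status=1
        elif [ ! -f "$f" ]; then
            echo "NOT A REGULAR FILE: $f"
            status=1
        elif [ ! -r "$f" ]; then
            echo "NOT READABLE: $f"
            status=1
        elif [ ! -s "$f" ]; then
            echo "EMPTY: $f"
            status=1
        else
            echo "OK: $f"
        fi
    done
    return "$status"
}

# Example call (paths copied verbatim from the command above):
# check_readable \
#   /gpfs/group1/f/flyinv/Outputs_CuffMerge/exonGene/SCR32_a/merged.gtf \
#   /gpfs/group1/f/flyinv/working_index/Dpse3_0_1.fa
```

Copying the paths straight from the failing command, rather than retyping them, means a misspelled directory name would show up as MISSING.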
I have found other threads where the same problem was encountered, but none of them offers a real explanation or solution.
One possible cause could be the GFF3 and GTF files used to guide the upstream TopHat, Cufflinks, and CuffMerge analyses. TopHat and Cufflinks were run with a GFF3 file containing the known gene and exon annotations for the target species. A GFF file generated by TopHat from the same data was used to guide the CuffMerge analyses. I don't see what difference this would make, but could it somehow have caused the error?