Dear community,
A similar question was asked here previously, but no answers were posted, so I'll expand on that question.
I'm trying to better understand the cufflinks --> cuffdiff workflow. Once I run cufflinks on each of my .bam files (from tophat), I have a separate .gtf assembly for each sample. To run cuffdiff I need a single unified .gtf file of my assembled transcripts.
So should I use the merged.gtf file produced by cuffmerge or the combined.gtf file produced by cuffcompare? How are these two files different, and what would be the downstream effect of using one or the other for differential expression in cuffdiff?
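To make the two options concrete, here is a rough sketch of the commands I have in mind. It only builds the command lines and does not run anything. The function names and all file names (assemblies.txt, merged_asm, cuffcmp, diff_out, the per-sample paths) are placeholders I made up for illustration. I am only using the basic options as I understand them from the manual (-o for the output directory or prefix, -L for condition labels), so please correct me if this is not how the pieces are meant to fit together.

```python
# Sketch only: builds argument lists for the tools, nothing is executed.
# All paths and function names here are hypothetical placeholders.

def cuffmerge_plan(sample_gtfs, manifest="assemblies.txt", out_dir="merged_asm"):
    """Option A: cuffmerge takes a text file listing one GTF path per line."""
    manifest_text = "\n".join(sample_gtfs) + "\n"
    cmd = ["cuffmerge", "-o", out_dir, manifest]
    # cuffmerge writes merged.gtf inside its output directory
    return manifest_text, cmd, f"{out_dir}/merged.gtf"


def cuffcompare_plan(sample_gtfs, out_prefix="cuffcmp"):
    """Option B: cuffcompare takes the GTF files directly as arguments."""
    cmd = ["cuffcompare", "-o", out_prefix, *sample_gtfs]
    # cuffcompare writes <outprefix>.combined.gtf
    return cmd, f"{out_prefix}.combined.gtf"


def cuffdiff_cmd(unified_gtf, bams_by_condition, out_dir="diff_out"):
    """Either unified GTF is then passed to cuffdiff with the per-sample BAMs.

    Replicates of one condition are comma-separated; conditions are
    separate arguments, in the same order as the -L labels.
    """
    labels = ",".join(bams_by_condition)
    replicate_groups = [",".join(bams) for bams in bams_by_condition.values()]
    return ["cuffdiff", "-o", out_dir, "-L", labels, unified_gtf, *replicate_groups]


if __name__ == "__main__":
    gtfs = ["s1/transcripts.gtf", "s2/transcripts.gtf"]
    _, merge_cmd, merged_gtf = cuffmerge_plan(gtfs)
    compare_cmd, combined_gtf = cuffcompare_plan(gtfs)
    bams = {"control": ["s1/accepted_hits.bam"], "treated": ["s2/accepted_hits.bam"]}
    print(" ".join(merge_cmd))
    print(" ".join(compare_cmd))
    print(" ".join(cuffdiff_cmd(merged_gtf, bams)))
    print(" ".join(cuffdiff_cmd(combined_gtf, bams)))
```

So the only thing that changes between the two routes is which GTF ends up as the first positional argument to cuffdiff.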
EDIT: Or would a better workflow be to forgo cuffmerge/cuffcompare altogether in favor of running cufflinks on a merge of all the .bam files (e.g. using samtools merge) to generate a single assembly that maximizes assembly accuracy, and use this as the "reference" for cuffdiff?
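For this pooled alternative, the steps I am picturing look roughly like the sketch below. Again it only builds the command lines. The function name and the names pooled.bam and pooled_asm are placeholders, and I am assuming the tophat BAM files are already coordinate-sorted, which samtools merge expects.

```python
# Sketch only: pooled-assembly alternative, nothing is executed.
# pooled.bam and pooled_asm are hypothetical names.

def pooled_assembly_cmds(bams, pooled_bam="pooled.bam", out_dir="pooled_asm"):
    """Merge all alignments, then run cufflinks once on the pooled BAM."""
    # samtools merge <out.bam> <in1.bam> <in2.bam> ...
    merge = ["samtools", "merge", pooled_bam, *bams]
    # cufflinks writes transcripts.gtf into its output directory
    assemble = ["cufflinks", "-o", out_dir, pooled_bam]
    return merge, assemble, f"{out_dir}/transcripts.gtf"


if __name__ == "__main__":
    merge, assemble, gtf = pooled_assembly_cmds(
        ["s1/accepted_hits.bam", "s2/accepted_hits.bam"]
    )
    print(" ".join(merge))
    print(" ".join(assemble))
    print("GTF to hand to cuffdiff:", gtf)
```

The resulting transcripts.gtf would then take the place of merged.gtf or combined.gtf in the cuffdiff call, with the original per-sample .bam files still supplied separately for quantification.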
More info:
When running cuffmerge I run into a problem described here previously but not fully resolved ("/lib64/libz.so.1: no version information available" and "File ./merged_asm/tmp/mergeSam_fileBpOwTS doesn't appear to be a valid BAM file"). Cuffmerge still produced the merged.gtf file, but I'm concerned about continuing with cuffdiff without knowing what these cuffmerge errors are about. I have no problem running cuffcompare to get a combined.gtf file.
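Since both tools did produce output, one sanity check I can run myself is to count the genes, transcripts, and exons in each file and see how far merged.gtf and combined.gtf diverge. Below is a minimal sketch of that check. It assumes standard tab-separated GTF with gene_id and transcript_id in the ninth column; summarize_gtf is just a helper name I made up, and it says nothing about whether the cuffmerge messages are harmless.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def summarize_gtf(lines):
    """Count distinct gene_id and transcript_id values and exon records."""
    genes, transcripts = set(), set()
    exons = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs:
            genes.add(attrs["gene_id"])
        if "transcript_id" in attrs:
            transcripts.add(attrs["transcript_id"])
        if fields[2] == "exon":
            exons += 1
    return {"genes": len(genes), "transcripts": len(transcripts), "exons": exons}


if __name__ == "__main__":
    import sys

    # Usage (paths are whatever your runs produced):
    #   python summarize_gtf.py merged_asm/merged.gtf cuffcmp.combined.gtf
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize_gtf(handle))
```

If the two summaries differ a lot, that would at least tell me the choice between the files matters for cuffdiff.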
From the cuffcompare documentation:
Cuffcompare clusters/tracks transfrags across samples, and writes a GTF
file <outprefix>.combined.gtf containing a nonredundant set of transcripts
across all input files (with a single representative transfrag chosen
for each clique of matching transfrags across samples).
From the cuffmerge documentation:
cuffmerge takes two or more Cufflinks GTF files and merges them into a
single unified transcript catalog. Optionally, you can provide the script
with a reference GTF, and the script will use it to attach gene names and other metadata to the merged catalog.