cuffcompare or cuffmerge for cuffdiff

Hi all,
This is an old topic in our community (see here and here).
Although C. Trapnell recommends the cufflinks -> cuffmerge -> cuffdiff workflow for differential expression analysis (here and in this new paper), I have to bring it up again because there is still too much confusion.

I have 3 paired-end samples and two goals:

[1] discover new isoforms and their structure
[2] analyze differential gene and transcript expression, along with the structure of the transcripts involved

TopHat + Cufflinks ran without problems on all 3 samples.

For these two goals, I use cuffcompare to analyze the transfrags that cufflinks assembled, and cuffdiff to analyze differential expression.

First workflow:

cuffcompare -o compare -s genomic_seq.fa -r known.gtf transcriptA.gtf transcriptB.gtf transcriptC.gtf
cuffmerge -g known.gtf -s genomic_seq.fa 3_assembly_GTF_list.txt
cuffdiff -o -b genomic_seq.fa -L A,B,C -u -p 6 merged.gtf A.bam B.bam C.bam

I use cuffcompare because it outputs .refmap and .tmap files for each sample. From these I can extract the reference gene and region of every cufflinks transcript in the cufflinks result transcripts.gtf. For example, for transcript ENSMUST00000048860:
In sample A:

Gene_name Transcript_id Class_code Cufflinks_transcript_id FPKM Coverage Transcript_length Ref_Transcript_length Chromosome Strand Start End Exon_num Exon_start-Exon_end;ditto
Mreg ENSMUST00000048860 c Sample_A.442.1 9.998678 41.470300 243 2493 1 . 72205812 72206054 1 72205812-72206054;
Mreg ENSMUST00000048860 = Sample_A.444.1 25.753304 108.941130 1695 2493 1 - 72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;

In sample B:

Mreg ENSMUST00000048860 j Sample_B.478.1 0.355742 1.460597 1682 2493 1 - 72206370 72243058 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72243016-72243058;
Mreg ENSMUST00000048860 = Sample_B.478.2 1.652110 6.783196 1742 2493 1 - 72206370 72258693 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258693;
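To compare structures like the ones in these rows, the exon field can be parsed into coordinate pairs. This is only a minimal sketch: the helper names `parse_exons` and `transcript_length` are made up for illustration, and it assumes the exon field has the `start-end;start-end;` form shown above with inclusive coordinates.

```python
def parse_exons(exon_field):
    """Turn 'start-end;start-end;' into a list of (start, end) integer pairs."""
    exons = []
    for part in exon_field.strip().split(";"):
        if not part:
            continue  # skip the empty piece after the trailing ';'
        start, end = part.split("-")
        exons.append((int(start), int(end)))
    return exons


def transcript_length(exons):
    """Sum exon lengths, assuming both coordinates are inclusive."""
    return sum(end - start + 1 for start, end in exons)


# Exon fields copied from the Sample_A.444.1 and Sample_B.478.1 rows above.
sample_a_444_1 = parse_exons(
    "72206430-72207593;72208896-72209059;72210646-72210736;"
    "72238617-72238776;72258591-72258706;"
)
sample_b_478_1 = parse_exons(
    "72206370-72207593;72208896-72209059;72210646-72210736;"
    "72238617-72238776;72243016-72243058;"
)

# Exons with identical start and end in both transcripts.
shared_exons = sorted(set(sample_a_444_1) & set(sample_b_478_1))

print(transcript_length(sample_a_444_1))  # agrees with Transcript_length 1695
print(transcript_length(sample_b_478_1))  # agrees with Transcript_length 1682
print(shared_exons)
```

With inclusive coordinates the summed exon lengths reproduce the Transcript_length column, and the two transcripts differ only in their first and last exons.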

And in compare.tracking:

TCONS_00001024 XLOC_000542 Mreg|ENSMUST00000048860 c q1:Sample_A.442|Sample_A.442.1|100|9.998678|4.225939|15.771418|41.470300|- - -
TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 q2:Sample_B.478|Sample_B.478.2|100|1.652110|1.194891|2.109330|6.783196|1742 -
TCONS_00002413 XLOC_000542 Mreg|ENSMUST00000048860 j - q2:Sample_B.478|Sample_B.478.1|22|0.355742|0.086913|0.624571|1.460597|- -

And this gene in the cuffdiff result (treated):

Tracking_id Gene_id Gene_name Class_code Nearest_ref_id TSS Locus Sample_1 Sample_2 FPKM_1 FPKM_2 Foldchange log2(fold_change) test_stat p_value q_value Significant
TCONS_00004275 XLOC_001277 Mreg j ENSMUST00000048860 TSS2418 1:72205806-72258881 sample_A sample_B 8.99002 0.315128 0.0350531 -4.83432 3.98775 6.67042e-05 0.00727599 yes
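As a quick sanity check on this row, the numbers are consistent with Foldchange being FPKM_2 / FPKM_1 and the log2 column being its base-2 logarithm. The sketch below only recomputes them from the posted values. The function name `fold_change` is hypothetical, and the ratio direction is inferred from the row itself rather than from any documentation.

```python
import math


def fold_change(fpkm_1, fpkm_2):
    """Return (ratio, log2 ratio), assuming the ratio is FPKM_2 / FPKM_1."""
    ratio = fpkm_2 / fpkm_1
    return ratio, math.log2(ratio)


# FPKM_1 and FPKM_2 copied from the TCONS_00004275 row above.
ratio, log2_ratio = fold_change(8.99002, 0.315128)
print(f"fold change = {ratio:.6f}, log2 = {log2_ratio:.4f}")
```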

So if I focus on ENSMUST00000048860 because of its fold change in the cuffdiff result, I need to go back to the cuffcompare result and find the cufflinks-assembled transcripts that match this known transcript, in order to decide whether an assembled transcript is known (class code = or c) or novel (class code j).
But the cuffdiff ID TCONS_00004275 is not the same as any cuffcompare TCONS ID, and the locus 1:72205806-72258881 does not match either. Because of this I cannot find the structure nearest to ENSMUST00000048860 in sample A and sample B. Is it Sample_A.442.1, Sample_A.444.1, or something else?
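One way to see why the locus alone does not settle this is to test which assembled transcripts fall inside the cuffdiff locus. The following is a rough sketch with hypothetical helper names (`parse_locus`, `contains`). The coordinates are copied from the per-sample rows above, and it assumes the Locus column is a `chrom:start-end` string with inclusive ends.

```python
def parse_locus(locus):
    """Split 'chrom:start-end' into (chrom, start, end)."""
    chrom, span = locus.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def contains(locus, chrom, start, end):
    """True if the interval chrom:start-end lies entirely inside the locus."""
    l_chrom, l_start, l_end = parse_locus(locus)
    return l_chrom == chrom and l_start <= start and end <= l_end


cuffdiff_locus = "1:72205806-72258881"

# (chromosome, start, end) taken from the sample A and sample B tables.
candidates = {
    "Sample_A.442.1": ("1", 72205812, 72206054),
    "Sample_A.444.1": ("1", 72206430, 72258706),
    "Sample_B.478.1": ("1", 72206370, 72243058),
    "Sample_B.478.2": ("1", 72206370, 72258693),
}

inside = [
    name
    for name, (chrom, start, end) in candidates.items()
    if contains(cuffdiff_locus, chrom, start, end)
]
print(inside)
```

All four candidates lie inside the locus, so the locus coordinates by themselves cannot tell which assembled transcript the cuffdiff row corresponds to.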

So I changed the workflow (without cuffmerge):

cuffcompare -o compare -s genomic_seq.fa -r known.gtf transcriptA.gtf transcriptB.gtf transcriptC.gtf
cuffdiff -o -b genomic_seq.fa -L A,B,C -u -p 6 combined.gtf A.bam B.bam C.bam

Again using ENSMUST00000048860 as the example, for the cuffcompare result (treated):
In sample A:

Mreg ENSMUST00000048860 c Sample_A.443.1 9.998678 41.470300 243 2493 1 . 72205812 72206054 1 72205812-72206054;
Mreg ENSMUST00000048860 = Sample_A.444.1 25.753304 108.941130 1695 2493 1 - 72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;

In sample B:

Mreg ENSMUST00000048860 j Sample_B.479.1 0.355742 1.460597 1682 2493 1 - 72206370 72243058 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72243016-72243058;
Mreg ENSMUST00000048860 = Sample_B.479.2 1.652110 6.783196 1742 2493 1 - 72206370 72258693 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258693;

Another point of confusion: I ran the same cufflinks + cuffcompare programs, but the cufflinks IDs differ. Sample_A.442.1 and Sample_A.444.1 in the first run correspond to Sample_A.443.1 and Sample_A.444.1 here, and the IDs in sample B changed as well.

In compare.tracking:

TCONS_00001024 XLOC_000542 Mreg|ENSMUST00000048860 c q1:Sample_A.443|Sample_A.443.1|100|9.998678|4.225939|15.771418|41.470300|- - -
TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 q2:Sample_B.479|Sample_B.479.2|100|1.652110|1.194891|2.109330|6.783196|1742 -
TCONS_00002413 XLOC_000542 Mreg|ENSMUST00000048860 j - q2:Sample_B.479|Sample_B.479.1|22|0.355742|0.086913|0.624571|1.460597|- -
TCONS_00003426 XLOC_000542 Mreg|ENSMUST00000048860 c - - q3:Sample_C.463|Sample_C.463.1|100|2.478294|1.853823|3.102766|10.598946|-
TCONS_00003427 XLOC_000542 Mreg|ENSMUST00000048860 c - - q3:Sample_C.464|Sample_C.464.1|100|2.878927|1.557985|4.199870|11.712125|-

This gene in the cuffdiff result (treated):

TCONS_00001025 XLOC_000542 Mreg = ENSMUST00000048860 TSS1655 1:72206327-72258693 sample_A sample_B 18.4693 1.30708 0.0707704 -3.82072 3.82424 0.000131174 0.0148275 yes

Here the ID TCONS_00001025 for ENSMUST00000048860 is the same as one of the cuffcompare TCONS IDs, and I know it maps to Sample_A.444.1 and Sample_B.479.2. I can then find the structures of Sample_A.444.1 and Sample_B.479.2:

Strand Start End Exon_num Exon_start-Exon_end;ditto
- 72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;
- 72206370 72258693 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258693;
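The lookup described above, from a TCONS ID to the matching per-sample cufflinks transcripts, could be scripted roughly as follows. This is a sketch under assumptions: the function name `parse_tracking_line` is hypothetical, the rows are split on whitespace as they appear in this post (a real .tracking file may need tab splitting), and each non-empty sample column is assumed to look like `qN:gene_id|transcript_id|...`.

```python
def parse_tracking_line(line):
    """Parse one compare.tracking row into a small dictionary."""
    fields = line.split()
    tcons_id, xloc_id, ref, class_code = fields[:4]
    ref_gene, _, ref_transcript = ref.partition("|")
    per_sample = {}
    for column in fields[4:]:
        if column == "-":
            continue  # no matching transfrag in this sample
        sample_tag, _, rest = column.partition(":")
        # Second '|'-separated item is the cufflinks transcript ID.
        per_sample[sample_tag] = rest.split("|")[1]
    return {
        "tcons_id": tcons_id,
        "xloc_id": xloc_id,
        "ref_gene": ref_gene,
        "ref_transcript": ref_transcript,
        "class_code": class_code,
        "per_sample": per_sample,
    }


# Rows copied from the compare.tracking excerpt of the second workflow.
tracking_rows = [
    "TCONS_00001024 XLOC_000542 Mreg|ENSMUST00000048860 c q1:Sample_A.443|Sample_A.443.1|100|9.998678|4.225939|15.771418|41.470300|- - -",
    "TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 q2:Sample_B.479|Sample_B.479.2|100|1.652110|1.194891|2.109330|6.783196|1742 -",
    "TCONS_00002413 XLOC_000542 Mreg|ENSMUST00000048860 j - q2:Sample_B.479|Sample_B.479.1|22|0.355742|0.086913|0.624571|1.460597|- -",
]

# Index the rows by TCONS ID so a cuffdiff tracking ID can be looked up directly.
by_tcons = {}
for row in tracking_rows:
    record = parse_tracking_line(row)
    by_tcons[record["tcons_id"]] = record

hit = by_tcons["TCONS_00001025"]
print(hit["class_code"], hit["per_sample"])
```

This only works when cuffdiff was run on the cuffcompare combined GTF, since the TCONS IDs then come from the same file as the tracking rows.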

Then I can move on to the next analysis.

But the cuffdiff results from these two workflows are very different for the transcript ENSMUST00000048860.
cuffdiff result (treated):

TCONS_00001025 XLOC_000542 Mreg = ENSMUST00000048860 TSS1655 1:72206327-72258693 sample_A sample_B 18.4693 1.30708 0.0707704 -3.82072 3.82424 0.000131174 0.0148275 yes

They differ in class code, FPKM, and fold change, and there are other differences between the two pipelines as well: no known transcripts are shared between the two cuffdiff results.
I would like to know which cuffdiff result is more credible, and which workflow can meet the needs of my analysis.

Thanks
Shen
