Hi all,
This is an old topic in our community (see here and here).
Although C. Trapnell recommends the cufflinks -> cuffmerge -> cuffdiff flow for differential expression analysis here and in this new paper, I must bring it up again because there is still too much confusion.
I have 3 paired-end samples and have two goals:
[1] discover new isoforms and their structure
[2] differential gene and transcript expression analysis, and their structure
TopHat + Cufflinks ran without problems for all 3 samples.
For these two aims, I use cuffcompare to analyze the transfrags that Cufflinks assembled, and cuffdiff to analyze differential expression.
First workflow:
cuffcompare -o compare -s genomic_seq.fa -r known.gtf transcriptA.gtf transcriptB.gtf transcriptC.gtf
cuffmerge -g known.gtf -s genomic_seq.fa 3_assembly_GTF_list.txt
cuffdiff -o -b genomic_seq.fa -L A,B,C -u -p 6 merged.gtf A.bam B.bam C.bam
I use cuffcompare because it outputs a .refmap and a .tmap file for each sample. From these I can extract the reference gene and region for every Cufflinks transcript in the Cufflinks result transcripts.gtf, like this:
For transcript ENSMUST00000048860
In sample A:
Gene_name Transcript_id Class_code Cufflinks_transcript_id FPKM Coverage Transcript_length Ref_Transcript_length Chromosome Strand Start End Exon_num Exon_start-Exon_end;ditto
Mreg ENSMUST00000048860 c Sample_A.442.1 9.998678 41.470300 243 2493 1 . 72205812 72206054 1 72205812-72206054;
Mreg ENSMUST00000048860 = Sample_A.444.1 25.753304 108.941130 1695 2493 1 - 72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;
In sample B:
Mreg ENSMUST00000048860 j Sample_B.478.1 0.355742 1.460597 1682 2493 1 - 72206370 72243058 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72243016-72243058;
Mreg ENSMUST00000048860 = Sample_B.478.2 1.652110 6.783196 1742 2493 1 - 72206370 72258693 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258693;
And in compare.tracking:
TCONS_00001024 XLOC_000542 Mreg|ENSMUST00000048860 c q1:Sample_A.442|Sample_A.442.1|100|9.998678|4.225939|15.771418|41.470300|- - -
TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 q2:Sample_B.478|Sample_B.478.2|100|1.652110|1.194891|2.109330|6.783196|1742 -
TCONS_00002413 XLOC_000542 Mreg|ENSMUST00000048860 j - q2:Sample_B.478|Sample_B.478.1|22|0.355742|0.086913|0.624571|1.460597|- -
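The mapping from a cuffcompare TCONS ID back to the per-sample Cufflinks transcripts can be read straight from the tracking rows above. The following is only a minimal sketch: the function name `parse_tracking_line` is hypothetical, and it assumes the columns are whitespace-separated as pasted here (the real file is tab-delimited) and that each sample column has the form `qN:gene|transcript|FMI|FPKM|conf_lo|conf_hi|cov|len`.

```python
def parse_tracking_line(line):
    """Split one cuffcompare .tracking row into its ID columns and per-sample hits.

    Assumption: columns are separated by whitespace, as in the pasted rows.
    """
    fields = line.split()
    tcons_id, xloc_id, ref, class_code = fields[:4]
    ref_gene, _, ref_transcript = ref.partition("|")
    samples = {}
    for col in fields[4:]:
        if col == "-":  # this sample has no transfrag for the row
            continue
        label, _, rest = col.partition(":")  # e.g. "q1"
        parts = rest.split("|")
        samples[label] = {
            "cuff_gene": parts[0],
            "cuff_transcript": parts[1],
            "fpkm": float(parts[3]),
        }
    return {
        "tcons_id": tcons_id,
        "xloc_id": xloc_id,
        "ref_gene": ref_gene,
        "ref_transcript": ref_transcript,
        "class_code": class_code,
        "samples": samples,
    }


row = (
    "TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = "
    "q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 "
    "q2:Sample_B.478|Sample_B.478.2|100|1.652110|1.194891|2.109330|6.783196|1742 -"
)
hit = parse_tracking_line(row)
print(hit["tcons_id"], hit["class_code"], sorted(hit["samples"]))
```

Here `q1` and `q2` correspond to the first and second GTF files given to cuffcompare; the third column is `-` because this row has no transfrag in the third sample.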
And this gene in the cuffdiff result (treated):
Tracking_id Gene_id Gene_name Class_code Nearest_ref_id TSS Locus Sample_1 Sample_2 FPKM_1 FPKM_2 Foldchange log2(fold_change) test_stat p_value q_value Significant
TCONS_00004275 XLOC_001277 Mreg j ENSMUST00000048860 TSS2418 1:72205806-72258881 sample_A sample_B 8.99002 0.315128 0.0350531 -4.83432 3.98775 6.67042e-05 0.00727599 yes
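If I focus on ENSMUST00000048860 because of its fold change in the cuffdiff result, I need to go back to the cuffcompare result and find the Cufflinks-assembled transcripts that match this known transcript, in order to decide whether the assembled transcript is known (class code = or c) or novel (class code j).
But the cuffdiff ID TCONS_00004275 is not the same as any cuffcompare TCONS ID, and the locus 1:72205806-72258881 does not match either. So I cannot find the structure nearest to ENSMUST00000048860 in sample A and sample B. Is it Sample_A.442.1, Sample_A.444.1, or something else?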
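One way to see why the locus alone does not answer this is to check which assembled transfrags fall inside the cuffdiff locus. This is only a sketch using the coordinates quoted in this post; the helper names `parse_locus` and `within` are hypothetical, and containment says nothing about how cuffmerge actually built TCONS_00004275.

```python
def parse_locus(locus):
    """Turn a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, _, span = locus.partition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def within(locus, chrom, start, end):
    """True if the interval chrom:start-end lies entirely inside the locus."""
    l_chrom, l_start, l_end = parse_locus(locus)
    return chrom == l_chrom and l_start <= start and end <= l_end


# Start/end of each transfrag, copied from the per-sample tables in this post.
transfrags = {
    "Sample_A.442.1": ("1", 72205812, 72206054),
    "Sample_A.444.1": ("1", 72206430, 72258706),
    "Sample_B.478.1": ("1", 72206370, 72243058),
    "Sample_B.478.2": ("1", 72206370, 72258693),
}

cuffdiff_locus = "1:72205806-72258881"
inside = [
    name
    for name, (chrom, start, end) in transfrags.items()
    if within(cuffdiff_locus, chrom, start, end)
]
print(inside)
```

All four transfrags lie inside the locus, so the locus is wider than any single one of them and cannot by itself identify the matching structure.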
So I changed the workflow (without cuffmerge):
cuffcompare -o compare -s genomic_seq.fa -r known.gtf transcriptA.gtf transcriptB.gtf transcriptC.gtf
cuffdiff -o -b genomic_seq.fa -L A,B,C -u -p 6 combined.gtf A.bam B.bam C.bam
Again using ENSMUST00000048860 as the example.
For the cuffcompare result (treated):
In sample A:
Mreg ENSMUST00000048860 c Sample_A.443.1 9.998678 41.470300 243 2493 1 . 72205812 72206054 1 72205812-72206054;
Mreg ENSMUST00000048860 = Sample_A.444.1 25.753304 108.941130 1695 2493 1 - 72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;
In sample B:
Mreg ENSMUST00000048860 j Sample_B.479.1 0.355742 1.460597 1682 2493 1 - 72206370 72243058 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72243016-72243058;
Mreg ENSMUST00000048860 = Sample_B.479.2 1.652110 6.783196 1742 2493 1 - 72206370 72258693 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258693;
Another confusing point: it is the same cufflinks + cuffcompare procedure, but the Cufflinks IDs differ. Sample_A.442.1 in the first workflow is Sample_A.443.1 here, and the same happens in sample B (Sample_B.478.x versus Sample_B.479.x).
In compare.tracking:
TCONS_00001024 XLOC_000542 Mreg|ENSMUST00000048860 c q1:Sample_A.443|Sample_A.443.1|100|9.998678|4.225939|15.771418|41.470300|- - -
TCONS_00001025 XLOC_000542 Mreg|ENSMUST00000048860 = q1:Sample_A.444|Sample_A.444.1|100|25.753304|24.064335|27.442272|108.941130|1695 q2:Sample_B.479|Sample_B.479.2|100|1.652110|1.194891|2.109330|6.783196|1742 -
TCONS_00002413 XLOC_000542 Mreg|ENSMUST00000048860 j - q2:Sample_B.479|Sample_B.479.1|22|0.355742|0.086913|0.624571|1.460597|- -
TCONS_00003426 XLOC_000542 Mreg|ENSMUST00000048860 c - - q3:Sample_C.463|Sample_C.463.1|100|2.478294|1.853823|3.102766|10.598946|-
TCONS_00003427 XLOC_000542 Mreg|ENSMUST00000048860 c - - q3:Sample_C.464|Sample_C.464.1|100|2.878927|1.557985|4.199870|11.712125|-
This gene in the cuffdiff result (treated):
TCONS_00001025 XLOC_000542 Mreg = ENSMUST00000048860 TSS1655 1:72206327-72258693 sample_A sample_B 18.4693 1.30708 0.0707704 -3.82072 3.82424 0.000131174 0.0148275 yes
Here the TCONS_00001025 reported for ENSMUST00000048860 is the same as one of the cuffcompare TCONS IDs, and I know it maps to Sample_A.444.1 and Sample_B.479.2. So I can find the structures of Sample_A.444.1 and Sample_B.479.2:
Strand Start End Exon_num Exon_start-Exon_end;ditto
- 72206430 72258706 5 72206430-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258706;
- 72206370 72258693 5 72206370-72207593;72208896-72209059;72210646-72210736;72238617-72238776;72258591-72258693;
Then I can do the next analysis.
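The exon strings above can be compared by their intron chains, which is what cuffcompare's `=` class code refers to (a complete intron-chain match), while `j` means at least one splice junction is shared. The sketch below only compares the assembled transfrags with each other, since the reference exon coordinates are not shown in this post; the helper names `parse_exons` and `intron_chain` are hypothetical.

```python
def parse_exons(text):
    """Parse 'start-end;start-end;...' into a list of (start, end) tuples."""
    exons = []
    for chunk in text.strip().split(";"):
        if chunk:  # skip the empty piece after the trailing ';'
            start, end = chunk.split("-")
            exons.append((int(start), int(end)))
    return exons


def intron_chain(exons):
    """Return the introns as (previous exon end, next exon start) pairs."""
    exons = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


a_444_1 = parse_exons(
    "72206430-72207593;72208896-72209059;72210646-72210736;"
    "72238617-72238776;72258591-72258706;"
)
b_479_2 = parse_exons(
    "72206370-72207593;72208896-72209059;72210646-72210736;"
    "72238617-72238776;72258591-72258693;"
)
b_479_1 = parse_exons(
    "72206370-72207593;72208896-72209059;72210646-72210736;"
    "72238617-72238776;72243016-72243058;"
)

same_chain = intron_chain(a_444_1) == intron_chain(b_479_2)
shared = set(intron_chain(a_444_1)) & set(intron_chain(b_479_1))
print(same_chain, len(shared))
```

Sample_A.444.1 and Sample_B.479.2 differ only in their outer exon boundaries, so their intron chains are identical, whereas Sample_B.479.1 shares three of the four introns and ends in a different last exon.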
But the cuffdiff results from these two workflows are very different for this transcript, ENSMUST00000048860.
cuffdiff result (treated):
Tracking_id Gene_id Gene_name Class_code Nearest_ref_id TSS Locus Sample_1 Sample_2 FPKM_1 FPKM_2 Foldchange log2(fold_change) test_stat p_value q_value Significant
TCONS_00004275 XLOC_001277 Mreg j ENSMUST00000048860 TSS2418 1:72205806-72258881 sample_A sample_B 8.99002 0.315128 0.0350531 -4.83432 3.98775 6.67042e-05 0.00727599 yes
TCONS_00001025 XLOC_000542 Mreg = ENSMUST00000048860 TSS1655 1:72206327-72258693 sample_A sample_B 18.4693 1.30708 0.0707704 -3.82072 3.82424 0.000131174 0.0148275 yes
Everything differs: class code, FPKM, and fold change. There are other differences between the two pipelines as well, and the known transcripts do not agree between the two cuffdiff results.
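As a quick sanity check on the two cuffdiff rows, the fold-change columns can be recomputed from the FPKM values. This sketch assumes the fold change is FPKM_2 / FPKM_1, which is consistent with the numbers in both rows; the function name `log2_fold_change` is hypothetical.

```python
import math


def log2_fold_change(fpkm_1, fpkm_2):
    """log2 of FPKM_2 / FPKM_1 (assumed definition, consistent with the rows above)."""
    return math.log2(fpkm_2 / fpkm_1)


# FPKM_1 and FPKM_2 for Mreg from the two cuffdiff runs quoted in this post.
with_cuffmerge = log2_fold_change(8.99002, 0.315128)     # TCONS_00004275
without_cuffmerge = log2_fold_change(18.4693, 1.30708)   # TCONS_00001025

print(round(with_cuffmerge, 3), round(without_cuffmerge, 3))
```

Both runs agree on the direction of change (lower in sample_B), but the FPKM estimates differ by about a factor of two in sample_A and a factor of four in sample_B, which is where the one-unit gap in log2 fold change comes from.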
I want to know which cuffdiff result is more credible, and which workflow can meet the needs of my analysis.
Thanks,
Shen