Hi all,
Like many I am trying to replicate the TopHat/cufflinks/Cumerbund protocol recently published and after sorting out a series of errors, I've finally managed to see the results with Cumerbund. The problem is that these do not match those of the paper. Could you please help me identifying where the problem might come from?
Software/script - underlined are the changes that were made to the original protocol so that it could run:
bowtie 0.12.7
tophat 2.0.3
tophat --bowtie1 -p 20 -G genes.gtf -r 180 -o genome C1_R1_1.fq C1_R1_2.fq
(..)
cufflinks-1.2.1
cufflinks -p 20 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
cuffmerge -g genes.gtf -s genome.fa -p 20 assemblies.txt
cuffdiff -o diff_out -b genome.fa -p 20 -L C1,C2 -u merged_asm/merged.gtf ./C1_R1_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam ./C2_R1_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam
Results
The distribution of expression levels for each sample is:
But should have been (from the paper):
The number of genes and transcripts that are differentially expressed between two samples are also different:
> cuff_data
CuffSet instance with:
2 samples
2663 genes
4376 isoforms
3086 TSS
1427 CDS
2663 promoters
3086 splicing
1304 relCDS
> gene_diff_data <- diffData(genes(cuff_data))
> sig_gene_data <- subset(gene_diff_data, (significant == 'yes'))
> nrow(sig_gene_data)
[1] 1117
> cuff_data_tux <- readCufflinks('/home/domingue/TuxedoTestData/diff_out')
> cuff_data #my analysis
CuffSet instance with:
2 samples
2663 genes
4376 isoforms
3086 TSS
1427 CDS
2663 promoters
3086 splicing
1304 relCDS
> cuff_data_tux #what should have been
CuffSet instance with:
2 samples
14353 genes
26464 isoforms
17442 TSS
13727 CDS
14353 promoters
17442 splicing
11372 relCDS
For diagnostic purposes, these are the number of fragments that map to each chromosome with my settings:
C1_R1_thout/accepted_hits.bam
2L 23011544 4657196 0
2R 21146708 4974320 0
3L 24543557 4055890 0
3R 27905053 5353776 0
4 1351857 202238 0
M 19517 0 0
X 22422827 4164550 0
* 0 0 0
(..)
Any help is appreciated.
Like many I am trying to replicate the TopHat/cufflinks/Cumerbund protocol recently published and after sorting out a series of errors, I've finally managed to see the results with Cumerbund. The problem is that these do not match those of the paper. Could you please help me identifying where the problem might come from?
Software/script - underlined are the changes that were made to the original protocol so that it could run:
bowtie 0.12.7
tophat 2.0.3
tophat --bowtie1 -p 20 -G genes.gtf -r 180 -o genome C1_R1_1.fq C1_R1_2.fq
(..)
cufflinks-1.2.1
cufflinks -p 20 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
cuffmerge -g genes.gtf -s genome.fa -p 20 assemblies.txt
cuffdiff -o diff_out -b genome.fa -p 20 -L C1,C2 -u merged_asm/merged.gtf ./C1_R1_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam ./C2_R1_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam
Results
The distribution of expression levels for each sample is:
But should have been (from the paper):
The number of genes and transcripts that are differentially expressed between two samples are also different:
> cuff_data
CuffSet instance with:
2 samples
2663 genes
4376 isoforms
3086 TSS
1427 CDS
2663 promoters
3086 splicing
1304 relCDS
> gene_diff_data <- diffData(genes(cuff_data))
> sig_gene_data <- subset(gene_diff_data, (significant == 'yes'))
> nrow(sig_gene_data)
[1] 1117
> cuff_data_tux <- readCufflinks('/home/domingue/TuxedoTestData/diff_out')
> cuff_data #my analysis
CuffSet instance with:
2 samples
2663 genes
4376 isoforms
3086 TSS
1427 CDS
2663 promoters
3086 splicing
1304 relCDS
> cuff_data_tux #what should have been
CuffSet instance with:
2 samples
14353 genes
26464 isoforms
17442 TSS
13727 CDS
14353 promoters
17442 splicing
11372 relCDS
For diagnostic purposes, these are the number of fragments that map to each chromosome with my settings:
C1_R1_thout/accepted_hits.bam
2L 23011544 4657196 0
2R 21146708 4974320 0
3L 24543557 4055890 0
3R 27905053 5353776 0
4 1351857 202238 0
M 19517 0 0
X 22422827 4164550 0
* 0 0 0
(..)
Any help is appreciated.
Comment