I am new to the Tuxedo Suite and decided to start by going through the workflow in the Nature Protocols publication. All seemed to be going well until I got to the point of differential expression analysis with Cuffdiff. There, the output file gene_exp.diff had NOTEST in the status column for almost all genes. I looked at the Cufflinks manual, which indicated that this output can result from not having enough alignments for testing in any given gene. I then used Samtools to be sure the .bam files that were generated by TopHat had a number of mapped reads that was comparable to the Nature Protocols example. The results, for all accepted_hits.bam files suggested that the number of mapped reads was very similar to what was expected according to the protocol. An example is below:
$ samtools idxstats C2_R3_thout/accepted_hits.bam
2L 23011544 4591000 0
2R 21146708 5029664 0
3L 24543557 4112826 0
3R 27905053 5317642 0
4 1351857 194612 0
M 19517 0 0
X 22422827 4157222 0
I then tried to use the Cuffdiff option -c, to reduce the minimum number of alignments needed at a locus from 10 to 5, but the resulting gene_exp.diff file still had NOTEST in the vast majority of genes.
In case it helps, the Cuffdiff command that I used was as follows:
$ cuffdiff -o diff_out -b ./Drosophila_melanogaster/Ensembl/BDGP5.25/Sequence/Bowtie2Index/genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam ./C2_R1_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam
If any additional information is necessary for you to help me troubleshoot this issue, please let me know.
$ samtools idxstats C2_R3_thout/accepted_hits.bam
2L 23011544 4591000 0
2R 21146708 5029664 0
3L 24543557 4112826 0
3R 27905053 5317642 0
4 1351857 194612 0
M 19517 0 0
X 22422827 4157222 0
I then tried to use the Cuffdiff option -c, to reduce the minimum number of alignments needed at a locus from 10 to 5, but the resulting gene_exp.diff file still had NOTEST in the vast majority of genes.
In case it helps, the Cuffdiff command that I used was as follows:
$ cuffdiff -o diff_out -b ./Drosophila_melanogaster/Ensembl/BDGP5.25/Sequence/Bowtie2Index/genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam ./C2_R1_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam
If any additional information is necessary for you to help me troubleshoot this issue, please let me know.
Comment