Hi All,
I ran htseq-count on a .sam file with an annotation in GTF format, and it does not seem to have run properly, though I might be wrong. Cufflinks reports FPKM values for some genes for which htseq-count reports a count of zero. I sorted the TopHat output by read ID and then ran htseq-count with a Drosophila GTF file. The output indicated that many reads were not counted, and the raw read count was zero for many genes that have a nonzero FPKM value from Cufflinks. Here is what my GTF file looks like
Here are a few lines from the .sam file
Here is the python command I gave
Cufflinks output
HTSeq-count output
As you can see, the Cufflinks and htseq-count outputs are not consistent. Some genes listed by htseq-count (for example FBgn0000022 and FBgn0000028) are missing from the Cufflinks output, while FBgn0000003 has an FPKM value in Cufflinks but a count of zero in htseq-count. Maybe Dr. Simon Anders can help with this; I cannot work out what is going wrong. Also, is the formatting of my GTF file correct?
Code:
chr4 FlyBase exon 1144800 1144957 . + . gene_id "FBgn0013749"; transcript_id "FBtr0089192"; exon_number "1"; gene_name "Arf102F"; parent_type=mRNA;
chr4 FlyBase exon 1145062 1145324 . + . gene_id "FBgn0013749"; transcript_id "FBtr0089192"; exon_number "2"; gene_name "Arf102F"; parent_type=mRNA;
chr4 FlyBase exon 1145394 1145519 . + . gene_id "FBgn0013749"; transcript_id "FBtr0089192"; exon_number "3"; gene_name "Arf102F"; parent_type=mRNA;
chr4 FlyBase exon 1145576 1145807 . + . gene_id "FBgn0013749"; transcript_id "FBtr0089192"; exon_number "4"; gene_name "Arf102F"; parent_type=mRNA;
chr4 FlyBase exon 1135804 1136520 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089207"; exon_number "12"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1135804 1136520 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089208"; exon_number "12"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1136582 1136705 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089207"; exon_number "11"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1136582 1136705 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089208"; exon_number "11"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1136952 1137082 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089207"; exon_number "10"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1136952 1137082 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089208"; exon_number "10"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1137141 1137224 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089207"; exon_number "9"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1137141 1137224 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089208"; exon_number "9"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1137286 1137840 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089207"; exon_number "8"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1137286 1137840 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089208"; exon_number "8"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1138477 1138531 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089207"; exon_number "7"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1138477 1138531 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089208"; exon_number "7"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1138594 1139643 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089207"; exon_number "6"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1138594 1139643 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089208"; exon_number "6"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1140563 1140673 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089207"; exon_number "5"; gene_name "cals"; parent_type=mRNA;
chr4 FlyBase exon 1140563 1140673 . - . gene_id "FBgn0039928"; transcript_id "FBtr0089208"; exon_number "5"; gene_name "cals"; parent_type=mRNA;
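On the GTF formatting question, one thing that can be checked mechanically is whether every attribute in the ninth column follows the conventional GTF form `key "value";`. The sketch below is only a minimal check under stated assumptions: the helper name `check_gtf_line` is made up for illustration, the file is assumed to be tab-separated (the forum rendering shows spaces), and attribute values are assumed not to contain semicolons. It flags `parent_type=mRNA`, which is written in GFF3 style (`key=value`). It does not establish whether htseq-count tolerates that attribute.

```python
import re

# Conventional GTF attribute form: key "value"  (e.g. gene_id "FBgn0013749")
GTF_ATTR = re.compile(r'^\s*(\w+)\s+"([^"]*)"\s*$')


def check_gtf_line(line):
    """Return (parsed_attrs, nonconforming_attrs) for one GTF record.

    Hypothetical helper: splits the attribute column on ';', which assumes
    no attribute value itself contains a semicolon.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    parsed, odd = {}, []
    for chunk in fields[8].split(";"):
        if not chunk.strip():
            continue  # empty piece after the trailing ';'
        match = GTF_ATTR.match(chunk)
        if match:
            parsed[match.group(1)] = match.group(2)
        else:
            odd.append(chunk.strip())
    return parsed, odd


# First record from the GTF excerpt above, rebuilt with tab separators.
record = "\t".join([
    "chr4", "FlyBase", "exon", "1144800", "1144957", ".", "+", ".",
    'gene_id "FBgn0013749"; transcript_id "FBtr0089192"; exon_number "1"; '
    'gene_name "Arf102F"; parent_type=mRNA;',
])

parsed, odd = check_gtf_line(record)
print("gene_id:", parsed["gene_id"])
print("attributes not in key \"value\" form:", odd)
```

Running the same check over the whole file would show whether `parent_type` is the only attribute that deviates from the convention.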
Code:
HWI-EAS313:1:1:5:36#0 0 chr3R 1291390 255 42M * 0 0 TTTTACTCCTGCAGGTCAGTAATTCAAGTCGGATATTAACTT BCA:;BCBCCCB.?A:A:@:3=7BC@9B/?BB+?)<?5=ABA NM:i:0 NH:i:1
HWI-EAS313:1:1:5:239#0 16 chr3L 16783628 255 42M * 0 0 CGCCGGACATGCCAGCGGCGAAGTTTCGTCCCGTAAGGCCGA BA?BBB@B@7=BAABBAA@BBBB9+=B@9@ABBBBBB@=ABA NM:i:1 NH:i:1
HWI-EAS313:1:1:5:462#0 16 chr3R 19011622 255 42M * 0 0 CAAGGAGATTCTTGGTATCTAAGCACTCCTCTTACCAAATTT ###B95+>5:>)+=B>-.;1<5A5)7@AC?>::=?A=>B9A> NM:i:2 NH:i:1
HWI-EAS313:1:1:5:480#0 16 chr3R 8845667 255 42M * 0 0 CAAAGGAGGATCTTGAGAGGTTCAAAGCTTTACGCATTATCG ?BCBCC7CBBCBCBCBCA=ACACBBCB@37C;C<BB@BC?CB NM:i:1 NH:i:1
HWI-EAS313:1:1:5:501#0 16 chr3R 25748393 1 42M * 0 0 CCCAGCTGCACCTGCCGCGACCAGGACTACGCCGGATGGTGG ##;39;.,51,1%0@==;(:7>?A@6@/-><AA@A=78>5;A NM:i:5 NH:i:4 CC:Z:= CP:i:25749839
HWI-EAS313:1:1:5:501#0 0 chr3R 25749839 1 42M * 0 0 CCACCATCCGGCGTAGTCCTGGTCGCGGCAGGTGCAGCTGGG A;5>87=A@AA<>-/@6@A?>7:(;==@0%1,15,.;93;## NM:i:5 NH:i:4 CC:Z:= CP:i:26313149
HWI-EAS313:1:1:5:501#0 0 chr3R 26313149 1 42M * 0 0 CCACCATCCGGCGTAGTCCTGGTCGCGGCAGGTGCAGCTGGG A;5>87=A@AA<>-/@6@A?>7:(;==@0%1,15,.;93;## NM:i:5 NH:i:4 CC:Z:= CP:i:26315145
HWI-EAS313:1:1:5:501#0 16 chr3R 26315145 1 42M * 0 0 CCCAGCTGCACCTGCCGCGACCAGGACTACGCCGGATGGTGG ##;39;.,51,1%0@==;(:7>?A@6@/-><AA@A=78>5;A NM:i:5 NH:i:4
HWI-EAS313:1:1:5:643#0 0 chr3R 25748461 3 42M * 0 0 CTACGATGGCAGCCCACTGCCCGACTGGCTCCAGTCCGTCGA @B9BBBB@BBBBA@ABB9BBBBB<BBBBBAB@==?@A<#### NM:i:0 NH:i:2 CC:Z:= CP:i:25749771
HWI-EAS313:1:1:5:643#0 16 chr3R 25749771 3 42M * 0 0 TCGACGGACTGGAGCCAGTCGGGCAGTGGGCTGCCATCGTAG ####<A@?==@BABBBBB<BBBBB9BBA@ABBBB@BBBB9B@ NM:i:0 NH:i:2
HWI-EAS313:1:1:5:761#0 16 chr3L 16031864 255 42M * 0 0 CATGACCGCATGGCAGGAATGCCGTATTGTACTCGGCGCCGT A@<?=AAABAABA=ABAB@BB4@AAABA;<>??7@=@3AA@B NM:i:0 NH:i:1
HWI-EAS313:1:1:5:871#0 16 chrX 5627098 255 42M * 0 0 CGTTTATCCTGCGGATCGATTGCGGTGCTATCAGTGCAGCGG =CC@;A@=@@CCCC>CACCAC@@BBABBAACBCCCCCCCBCB NM:i:0 NH:i:1
HWI-EAS313:1:1:5:1080#0 16 chr3L 3195955 255 42M * 0 0 AGGTCANCAACTCTGCCTTCGTGGAGCGCGTCAAGGCCCGTG >@@=9<%8>A@;<@@B@;@922BBB@<@@A@>@BBA?@8@AB NM:i:1 NH:i:1
HWI-EAS313:1:1:5:1163#0 16 chr3L 14394814 255 42M * 0 0 AAAGATCGGCATCCATCTGAGCAATGGCACCGGCAATTCTGT ##@<=<ACBBB;4@ABB;*<ACC7;;CC,BBCC?B;AABCCB NM:i:3 NH:i:1
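Since many reads appear not to be counted, it may help to see how the alignments are distributed over the `NH:i` tag, which records how many alignments were reported for a read. As far as I understand the htseq-count documentation, reads with more than one reported alignment are put into a separate "alignment_not_unique" category rather than assigned to a gene. The sketch below is only a tally under stated assumptions: the function name `nh_tally` is hypothetical, and fields are split on any whitespace because the forum rendering has replaced the tabs of the real SAM file.

```python
from collections import Counter


def nh_tally(sam_lines):
    """Count alignment records by the value of their NH:i tag (None if absent)."""
    tally = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split()  # real SAM is tab-separated; see note above
        nh = None
        for tag in fields[11:]:  # optional tags start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[len("NH:i:"):])
        tally[nh] += 1
    return tally


# Four records copied from the SAM excerpt above.
sample = """\
HWI-EAS313:1:1:5:36#0 0 chr3R 1291390 255 42M * 0 0 TTTTACTCCTGCAGGTCAGTAATTCAAGTCGGATATTAACTT BCA:;BCBCCCB.?A:A:@:3=7BC@9B/?BB+?)<?5=ABA NM:i:0 NH:i:1
HWI-EAS313:1:1:5:239#0 16 chr3L 16783628 255 42M * 0 0 CGCCGGACATGCCAGCGGCGAAGTTTCGTCCCGTAAGGCCGA BA?BBB@B@7=BAABBAA@BBBB9+=B@9@ABBBBBB@=ABA NM:i:1 NH:i:1
HWI-EAS313:1:1:5:501#0 16 chr3R 25748393 1 42M * 0 0 CCCAGCTGCACCTGCCGCGACCAGGACTACGCCGGATGGTGG ##;39;.,51,1%0@==;(:7>?A@6@/-><AA@A=78>5;A NM:i:5 NH:i:4 CC:Z:= CP:i:25749839
HWI-EAS313:1:1:5:643#0 0 chr3R 25748461 3 42M * 0 0 CTACGATGGCAGCCCACTGCCCGACTGGCTCCAGTCCGTCGA @B9BBBB@BBBBA@ABB9BBBBB<BBBBBAB@==?@A<#### NM:i:0 NH:i:2 CC:Z:= CP:i:25749771
"""

tally = nh_tally(sample.splitlines())
for nh_value, n_records in sorted(tally.items()):
    print(f"NH={nh_value}: {n_records} alignment record(s)")
```

To tally the full file instead of the sample, the same function can be given an open file handle, for example `nh_tally(open("accepted_hits.sam"))`.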
Code:
python -m HTSeq.scripts.count accepted_hits.sam ../ALLEXONS.gtf > output.out
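The command above leaves every option at its default. The sketch below writes out what I understand to be htseq-count's documented defaults (`--mode=union`, `--stranded=yes`, `--type=exon`, `--idattr=gene_id`) so that each one can be varied deliberately. These values are an assumption about the installed version and should be checked against its help output. In particular, the documentation recommends `--stranded=no` for libraries that were not prepared with a strand-specific protocol. The script only builds and prints the command line; it does not run it.

```python
import shlex

# Assumed defaults of htseq-count, written out explicitly (verify against
# the help output of the installed version before relying on them).
options = {
    "--mode": "union",
    "--stranded": "yes",
    "--type": "exon",
    "--idattr": "gene_id",
}

argv = ["python", "-m", "HTSeq.scripts.count"]
argv += [f"{flag}={value}" for flag, value in options.items()]
argv += ["accepted_hits.sam", "../ALLEXONS.gtf"]  # same inputs as the post

# shlex.join (Python 3.8+) quotes arguments where the shell would need it.
command = shlex.join(argv)
print(command, "> output.out")
```

Changing one entry in `options` at a time and comparing the resulting count tables would show which setting, if any, affects the zero counts.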
Code:
gene_id S1
FBgn0000003 208.127
FBgn0000008 2.84909
FBgn0000014 8.2101
FBgn0000015 3.71491
FBgn0000017 4.53253
FBgn0000018 4.94072
FBgn0000024 5.39896
FBgn0000032 18.2969
FBgn0000036 0.615742
FBgn0000037 0.651662
FBgn0000038 1.6011
FBgn0000039 1.12615
Code:
FBgn0000003 0
FBgn0000008 52
FBgn0000014 157
FBgn0000015 56
FBgn0000017 123
FBgn0000018 36
FBgn0000022 5
FBgn0000024 85
FBgn0000028 0
FBgn0000032 144
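The mismatch between the two outputs can be listed systematically rather than by eye. The sketch below is a minimal comparison using only the excerpts shown in this post. The helper name `parse_table` is hypothetical, and both tables are assumed to be whitespace-separated with the gene ID in the first column. It reports genes with a nonzero FPKM but a zero count, and genes present in only one of the two outputs.

```python
def parse_table(text, skip_header=False):
    """Parse 'gene value' lines into a dict of gene ID -> float."""
    rows = [line.split() for line in text.strip().splitlines()]
    if skip_header:
        rows = rows[1:]
    return {gene: float(value) for gene, value in rows}


# Cufflinks FPKM excerpt from the post (first line is the header).
cufflinks_text = """\
gene_id S1
FBgn0000003 208.127
FBgn0000008 2.84909
FBgn0000014 8.2101
FBgn0000015 3.71491
FBgn0000017 4.53253
FBgn0000018 4.94072
FBgn0000024 5.39896
FBgn0000032 18.2969
FBgn0000036 0.615742
FBgn0000037 0.651662
FBgn0000038 1.6011
FBgn0000039 1.12615
"""

# htseq-count excerpt from the post.
htseq_text = """\
FBgn0000003 0
FBgn0000008 52
FBgn0000014 157
FBgn0000015 56
FBgn0000017 123
FBgn0000018 36
FBgn0000022 5
FBgn0000024 85
FBgn0000028 0
FBgn0000032 144
"""

fpkm = parse_table(cufflinks_text, skip_header=True)
counts = parse_table(htseq_text)

# Genes in both tables with FPKM > 0 but a raw count of zero.
zero_count_but_expressed = sorted(
    gene for gene in fpkm.keys() & counts.keys()
    if fpkm[gene] > 0 and counts[gene] == 0
)
only_in_htseq = sorted(counts.keys() - fpkm.keys())
# Both excerpts are truncated, so this list partly reflects where each
# excerpt stops rather than a real difference.
only_in_cufflinks = sorted(fpkm.keys() - counts.keys())

print("FPKM > 0 but count == 0:", zero_count_but_expressed)
print("only in htseq-count:", only_in_htseq)
print("only in Cufflinks:", only_in_cufflinks)
```

On the full output files, the first list would give the complete set of genes whose annotation and alignments are worth inspecting.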