I am trying to use htseq-count to count the occurrences of features (taken from my GFF file) in my BAM file.
I use the standard htseq-count command, viz.:
I get a series of error messages telling me that my reads are all skipped as they don't occur in my GFF files:
The following are the first 10 lines of the BAM file:
The first lines of the GFF file:
All ID attributes in my GFF file seem to match the BAM headers. Not sure if my untrained eye is missing something. Any suggestions? How should I modify the GFF file?
Code:
samtools view file.bam | htseq-count [options] - file.gff
Code:
Warning: Skipping read 'HWI-ST1085:118:C1ALWACXX:1:2213:19460:44494', because chromosome 'chromosome:AGPv2:1:1:301354135:1', to which it has been aligned, did not appear in the GFF file'.
Code:
HWI-ST1085:118:C1ALWACXX:1:1105:4877:50025 0 chromosome:AGPv2:1:1:301354135:1 3412 50 100M * 0 0 TGAAGTATAAGGCAACCCAAGTCTGCCATCATCTCTTTCTCGTTGAACTGAAGCCTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAG &&&(((((*****++++++++*++++++++++++++++++++++++++++++++++++++++++++++++++****((((((''&''&&&&&&&&&&$&$ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
HWI-ST1085:118:C1ALWACXX:1:1104:1183:70980 0 chromosome:AGPv2:1:1:301354135:1 3444 50 100M * 0 0 CTCTTTCTCGTTGAACTGAAGCCTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAGCTGAGGGCAGCCGCAGAGCCAGCGGTAGCTAG &&&((((&*)***)+++++++++++*+++))++&*+)(&$)++)+))++*++&*+)+*+++++++&***)*((((&%%%&&&&&&&&&&$%"$!""&&&& AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
HWI-ST1085:118:C1ALWACXX:1:1111:2149:77593 0 chromosome:AGPv2:1:1:301354135:1 3444 50 100M * 0 0 CTCTTTCTCGTTGAACTGAAGCCTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAGCTGAGGGCAGCCGCAGAGCCAGCGGTAGCTAG &&&(((((*****++++++++++++++++++++++++++++++++++++++++++++++++++++******((((&&&&&&&&&&&&&&&&&&$%&&&'& AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
HWI-ST1085:118:C1ALWACXX:1:2313:11306:75258 0 chromosome:AGPv2:1:1:301354135:1 3446 50 100M * 0 0 CTTTCTCGTTGAGCTGAAGCGTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAGCTGAGGGCAGCCGCAGAGCCAGCGGTAGCTAGTC $$$&&&&&&&&&!!"''+(+!!$&''++!$'&'++++'+++++'+++'+++++++'&++++&&&&&&&&&&%%$$$$$$$%$$$%$%#$$$###%%&%%# AS:i:-4 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:12A7C79 YT:Z:UU NH:i:1
HWI-ST1085:118:C1ALWACXX:1:1214:18295:59994 0 chromosome:AGPv2:1:1:301354135:1 3450 50 100M * 0 0 CTCGTTGAACTGAAGCCTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAGCTGAGGGCAGCCGCAGAGCCAGCGGTAGCTAGTCCGCT $$$&&&&%(%(((%"&"%"&"('"!"'&)(+!$&%(("&""((&!%))+'(%!$((""#!!#&!$!!"%&$""!!!"!""#!#$""""%"%#%%$!!!!! AS:i:-2 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:96G YT:Z:UU NH:i:1
HWI-ST1085:118:C1ALWACXX:1:1312:13622:34439 0 chromosome:AGPv2:1:1:301354135:1 3453 50 100M * 0 0 GTTGAACTGAAGCCTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAGCTGAGGGCAGCCGCAGAGCCAGCGGTAGCTAGTCGGCTAGT &&&(((((****)++++++++++++++++++++++++++++++++++++++++++++)++++++++***(((&&&&&&&&&&&&$%&&&'&&&&&&&&&$ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
HWI-ST1085:118:C1ALWACXX:1:2112:2965:72104 0 chromosome:AGPv2:1:1:301354135:1 3453 50 100M * 0 0 GTTGAACTGAAGCCTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAGCTGAGGGCAGCCGCAGAGCCAGCGGTAGCTAGTCGGCTAGT &%&(((((*****+++++++++++++++++++++++*+++++++++++++++++++++++++++++***(((&&&&&&&&&&&&#%%&&'&&&&&&$%&" AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
HWI-ST1085:118:C1ALWACXX:1:2314:17254:77725 0 chromosome:AGPv2:1:1:301354135:1 3453 50 100M * 0 0 GTTGAACTGAAGCCTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAGCTGAGGGCAGCCGCAGAGCCAGCGGTAGCTAGTCGGCTGGT &&&(((((*****+++++++++++++++++++++++)+++++++++++++++++++))++++*+++***(((&&&&&&&&&&&&%%&&&&&&&&&&%!!! AS:i:-2 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:97A YT:Z:UU NH:i:1
HWI-ST1085:118:C1ALWACXX:1:1212:1945:32241 0 chromosome:AGPv2:1:1:301354135:1 3465 50 100M * 0 0 CCTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAGCTGAGGGCAGCCGCAGAGCCGGCGGTAGCTAGTCGGCTAGTCCATTGACTGGC %&&((&&&*(*))++++*+++++++++++)+++'++')++)+)+%&(*'*+++)++))'***(%&''&"$#$!#%&&&&&&$&&&$%#&&"%&&&&#!"% AS:i:-2 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:67A32 YT:Z:UU NH:i:1
HWI-ST1085:118:C1ALWACXX:1:1109:10979:8471 16 chromosome:AGPv2:1:1:301354135:1 3465 50 100M * 0 0 CCTGTCCATGCCTCCATGGCCCAGTCCAGCATCATCGCCAATCAGAGCTGAGGGCAGCCGCAGAGCCAGCGGTAGCTAGTCGGCTAGTCCATTGACTGGC &&&&&&&&&&&&&&&&&&%&&&&&&&&&&&(&&&&&&''''''(((((*****++++++++++++++++++++++++++++++++++*****(((((&&& AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
Code:
1 ensembl chromosome 1 301354135 . . . ID=1;Name=chromosome:AGPv2:1:1:301354135:1
1 ensembl gene 4854 9652 . - . ID=GRMZM2G059865;Name=GRMZM2G059865;biotype=protein_coding
1 ensembl mRNA 4854 9652 . - . ID=GRMZM2G059865_T01;Parent=GRMZM2G059865;Name=GRMZM2G059865_T01;biotype=protein_coding
1 ensembl intron 7904 9192 . - . Parent=GRMZM2G059865_T01;Name=intron.71462
1 ensembl intron 7121 7593 . - . Parent=GRMZM2G059865_T01;Name=intron.71463
1 ensembl intron 6798 6917 . - . Parent=GRMZM2G059865_T01;Name=intron.71464
1 ensembl intron 6518 6638 . - . Parent=GRMZM2G059865_T01;Name=intron.71465
1 ensembl intron 6266 6361 . - . Parent=GRMZM2G059865_T01;Name=intron.71466
1 ensembl intron 5976 6107 . - . Parent=GRMZM2G059865_T01;Name=intron.71467
1 ensembl intron 5408 5856 . - . Parent=GRMZM2G059865_T01;Name=intron.71468
1 ensembl intron 5189 5341 . - . Parent=GRMZM2G059865_T01;Name=intron.71469
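One check that may help narrow this down: the warning refers to the chromosome the read was aligned to (column 3 of the SAM output) not appearing in the GFF, so the comparison that matters is probably SAM column 3 against GFF column 1, not the ID/Name attributes. Below is a minimal sketch of that comparison. The function names are hypothetical, and the sample lines are abbreviated copies of the records shown above; on real data the lines would come from `samtools view file.bam` and from `file.gff`.

```python
def sam_reference_names(sam_lines):
    """Collect the reference names (column 3) from SAM alignment lines."""
    names = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split()
        # '*' in column 3 marks an unaligned read.
        if len(fields) > 2 and fields[2] != "*":
            names.add(fields[2])
    return names


def gff_seqids(gff_lines):
    """Collect the sequence names (column 1) from GFF feature lines."""
    ids = set()
    for line in gff_lines:
        # Skip blank lines and comment/directive lines, which start with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split()[0])
    return ids


# Abbreviated sample records copied from the listings above.
sam_sample = [
    "HWI-ST1085:118:C1ALWACXX:1:1105:4877:50025 0 "
    "chromosome:AGPv2:1:1:301354135:1 3412 50 100M * 0 0",
]
gff_sample = [
    "1 ensembl chromosome 1 301354135 . . . "
    "ID=1;Name=chromosome:AGPv2:1:1:301354135:1",
    "1 ensembl gene 4854 9652 . - . "
    "ID=GRMZM2G059865;Name=GRMZM2G059865;biotype=protein_coding",
]

# Reference names used by the alignments but absent from GFF column 1.
missing = sam_reference_names(sam_sample) - gff_seqids(gff_sample)
print(sorted(missing))
```

On the sample lines, the set of missing names is non-empty: the reads use `chromosome:AGPv2:1:1:301354135:1`, while the first GFF column is `1`.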
Thanks
Siva