Hello,
I have spent many hours trying to count the reads in the TopHat output file, accepted_hits.bam. Before the TopHat mapping, bowtie2-build was used to build the genome index.
When I use htseq-count to count the BAM file, I have tried many things, including sorting the file and cutting off the last number (_0 or _1) of the paired-end read names, but the warning messages still appear.
First, I used samtools to check the status of the BAM file:
>samtools flagstat accepted_hits.bam
1114298 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1114298 + 0 mapped (100.00%:-nan%)
1114298 + 0 paired in sequencing
1049578 + 0 read1
64720 + 0 read2
30176 + 0 properly paired (2.71%:-nan%)
60650 + 0 with itself and mate mapped
1053648 + 0 singletons (94.56%:-nan%)
29714 + 0 with mate mapped to a different chr
23484 + 0 with mate mapped to a different chr (mapQ>=5)
I then used samtools to sort the BAM file and convert it to a SAM file:
htseq-count -s no accepted_sorted.sam zebrafish.gtf > htcount
Warning: Read HWI-ST507_74_2_68_21290_5843_0_0_2 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWI-ST507_74_2_68_21292_174160_0_0_2 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWI-ST507_74_2_68_21350_3889_0_1_1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
...
...
...
Warning: Read HWI-ST507_74_2_68_21353_180091_0_0_1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWI-ST507_74_2_68_21353_180091_0_0_2 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
1114298 sam line pairs processed.
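To narrow down which reads trigger the warning, one option is to scan the SAM file and list every read whose flag says its mate is mapped but whose mate is not on a neighbouring line. The sketch below is only an approximation of that idea and is not htseq-count's actual code. It assumes tab-separated SAM records, assumes both mates carry exactly the same read name, and the function name `find_unpaired` and the demo records are made up for illustration.

```python
def find_unpaired(sam_lines):
    """Return names of reads that claim a mapped mate not found on an adjacent line."""
    records = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        records.append((fields[0], int(fields[1])))

    orphans = []
    for i, (name, flag) in enumerate(records):
        paired = flag & 0x1          # read is part of a pair
        mate_unmapped = flag & 0x8   # mate is reported as unmapped
        if not paired or mate_unmapped:
            continue
        prev_name = records[i - 1][0] if i > 0 else None
        next_name = records[i + 1][0] if i + 1 < len(records) else None
        if name not in (prev_name, next_name):
            orphans.append(name)
    return orphans


# Hypothetical records: readA mates have different names, readB mates match.
demo = [
    "readA_1\t83\tchr1\t100\t50\t10M\t=\t90\t-20\t*\t*",
    "readA_2\t163\tchr1\t90\t50\t10M\t=\t100\t20\t*\t*",
    "readB\t83\tchr1\t200\t50\t10M\t=\t190\t-20\t*\t*",
    "readB\t163\tchr1\t190\t50\t10M\t=\t200\t20\t*\t*",
]
print(find_unpaired(demo))  # ['readA_1', 'readA_2']
```

With a real file, passing `open("accepted_sorted.sam")` as `sam_lines` should work, since the function only iterates over lines. Reads with several alignments under one name can hide a missing mate, so an empty result is a hint rather than proof that the file is correctly paired.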
I checked the records for read HWI-ST507_74_2_68_21353_180091:
HWI-ST507_74_2_68_21353_180091_0_0_1 83 GL831154.1 4572842 50 81M = 4572839 -84 TGATCAGGTGCTATTAAAGCATAGCTATTGACCGAGTATCTGCATGGTGGCAGCCTTTCCAAAGCTGGACTCGTCCCTTTT BB_VY_^^`R`Z`[WbU`M[^K]HUP^NSUKV[KZWPZG]OTZYSPOXVO^^]Ra]WS]Y]ZPKXFLMPXFXPMQF[^^aa AS:i:-11 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:53T21T5 YT:Z:UU NH:i:1
HWI-ST507_74_2_68_21353_180091_0_0_2 163 GL831154.1 4572839 50 47M = 4572842 84 GCTTGATCAGGTGCTATTAAAGGATAGCTATTGACCGAGTATCTGCT b^ab`Ma^]bb_aYb]bcbbJZKW\GWU\RbY^[]MaZ[_]\VZ`BB AS:i:-11 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:22C23A0 YT:Z:UU NH:i:1
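For reference, the FLAG values 83 and 163 in these two records can be decoded bit by bit. The following is a minimal sketch based on the bit meanings in the SAM specification; the helper name `decode_flag` and the short labels are just illustrative choices.

```python
# SAM FLAG bits and short labels (labels are informal, not official names).
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS if flag & bit]


print(decode_flag(83))   # ['paired', 'proper_pair', 'reverse', 'read1']
print(decode_flag(163))  # ['paired', 'proper_pair', 'mate_reverse', 'read2']
```

So both records are flagged as properly paired with a mapped mate, and each points at the other's position, even though the two read names differ in their final _1 / _2.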
I saw the same problem in a previous thread, so I used a Perl script to change the read IDs, cutting off the _0 or _1.
I checked the records for read HWI-ST507_74_2_68_21353_180091 again:
HWI-ST507_74_2_68_21353_180091_0_0 83 GL831154.1 4572842 50 81M = 4572839 -84 TGATCAGGTGCTATTAAAGCATAGCTATTGACCGAGTATCTGCATGGTGGCAGCCTTTCCAAAGCTGGACTCGTCCCTTTT BB_VY_^^`R`Z`[WbU`M[^K]HUP^NSUKV[KZWPZG]OTZYSPOXVO^^]Ra]WS]Y]ZPKXFLMPXFXPMQF[^^aa AS:i:-11 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:53T21T5 YT:Z:UU NH:i:1
HWI-ST507_74_2_68_21353_180091_0_0 163 GL831154.1 4572839 50 47M = 4572842 84 GCTTGATCAGGTGCTATTAAAGGATAGCTATTGACCGAGTATCTGCT b^ab`Ma^]bb_aYb]bcbbJZKW\GWU\RbY^[]MaZ[_]\VZ`BB AS:i:-11 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:22C23A0 YT:Z:UU NH:i:1
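The Perl script itself is not shown here. As a rough Python equivalent of the renaming step, the sketch below strips a trailing `_1` or `_2` from the read name, which is the suffix that differs between the two records before and after renaming. It assumes tab-separated SAM lines and that no read name legitimately ends in `_1` or `_2`; the function name `strip_mate_suffix` is hypothetical.

```python
import re

# Matches a final "_1" or "_2" at the end of the read name.
MATE_SUFFIX = re.compile(r"_[12]$")


def strip_mate_suffix(sam_line):
    """Remove a trailing _1/_2 mate suffix from the QNAME of one SAM line."""
    if sam_line.startswith("@"):
        return sam_line  # header lines are passed through unchanged
    qname, sep, rest = sam_line.partition("\t")
    return MATE_SUFFIX.sub("", qname) + sep + rest


line = "HWI-ST507_74_2_68_21353_180091_0_0_1\t83\tGL831154.1\t4572842"
print(strip_mate_suffix(line).split("\t")[0])
# HWI-ST507_74_2_68_21353_180091_0_0
```

Renaming alone does not change the order of the records, so a file that was sorted by name before the renaming may need to be sorted again afterwards.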
I then ran htseq-count again and got read counts, but the warning messages still appear, and the result differs from the previous run. This is so confusing.