While running an RNA-Seq pipeline that uses Hisat2 for alignment and HTSeq-count for counting reads in features, I noticed this warning at the bottom of the log file:
Here are the stats of the BAM file that gave the HTSeq-count warning, from "samtools flagstat":
In the previous RNA-Seq pipeline on the same data, where the only difference was Tophat2 for alignment, I do not see this warning in the HTSeq-count log files.
Here are the stats of the Tophat2-aligned BAM file from the same sample:
I know this HTSeq-count warning is characteristic of unsorted BAM files, as I have run into that problem in the past. However, I confirmed that I still get this warning with name-sorted files and with HTSeq-count set to expect name-sorted input. I can see that the Hisat2 alignment did not have 100% mapping, which may explain the warning. Why do the two aligners differ here? Both were run with default settings.
Moreover, I am wondering how and why this warning occurs, since as far as I know HTSeq-count needs either all paired-end or all single-end alignments and cannot handle both at the same time. Mixing them would otherwise produce this error message:
Yet I see singletons in both the Tophat2 and Hisat2 stats.
TL;DR: Why isn't there 100% mapping in the Hisat2 output alignments when there is 100% mapping in the Tophat2 output alignments, using default settings for each?
Code:
Warning: 284233 reads with missing mate encountered.
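To make concrete what "missing mate" could mean, here is a minimal sketch that counts read names for which only one mate has a primary record in a name-sorted stream. It is an illustration under stated assumptions, not HTSeq-count's actual pairing logic: the `Aln` record type and the `count_missing_mates` function are hypothetical names made up for this example, and it counts read names rather than individual reads, so its result need not match the number in the warning.

```python
from itertools import groupby
from typing import Iterable, NamedTuple


class Aln(NamedTuple):
    """Hypothetical minimal alignment record (not an HTSeq or pysam type)."""
    qname: str
    is_read1: bool
    is_secondary: bool = False


def count_missing_mates(alignments: Iterable[Aln]) -> int:
    """Count read names with only one mate present among primary records.

    Assumes the input is name-sorted, so all records sharing a read name
    are adjacent and can be grouped in a single pass.
    """
    missing = 0
    for _, group in groupby(alignments, key=lambda a: a.qname):
        primary = [a for a in group if not a.is_secondary]
        has_r1 = any(a.is_read1 for a in primary)
        has_r2 = any(not a.is_read1 for a in primary)
        # Exactly one of the two mates is present for this read name.
        if has_r1 != has_r2:
            missing += 1
    return missing


# Toy records, invented for illustration only.
records = [
    Aln("frag1", True), Aln("frag1", False),                      # complete pair
    Aln("frag2", True),                                           # read2 record absent
    Aln("frag3", False), Aln("frag3", False, is_secondary=True),  # read1 record absent
]
print(count_missing_mates(records))
```

On a real file the same loop could be fed from a BAM reader, provided the file is name-sorted; on position-sorted input the grouping would split pairs and overcount.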
Code:
76075665 + 0 in total (QC-passed reads + QC-failed reads)
1565341 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
71435955 + 0 mapped (93.90% : N/A)
74510324 + 0 paired in sequencing
37255162 + 0 read1
37255162 + 0 read2
64430312 + 0 properly paired (86.47% : N/A)
67187398 + 0 with itself and mate mapped
2683216 + 0 singletons (3.60% : N/A)
2452092 + 0 with mate mapped to a different chr
2095660 + 0 with mate mapped to a different chr (mapQ>=5)
Code:
85046681 + 0 in total (QC-passed reads + QC-failed reads)
18181171 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
85046681 + 0 mapped (100.00% : N/A)
66865510 + 0 paired in sequencing
34237825 + 0 read1
32627685 + 0 read2
16861294 + 0 properly paired (25.22% : N/A)
61704000 + 0 with itself and mate mapped
5161510 + 0 singletons (7.72% : N/A)
4055974 + 0 with mate mapped to a different chr
1899988 + 0 with mate mapped to a different chr (mapQ>=5)
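The arithmetic behind the two flagstat outputs can be laid out as a quick sketch. The counts are copied from the outputs above; the dictionary keys and the `summarize` helper are names chosen for this example only. It derives, for each file, the number of primary records, the number of unmapped records, and the difference between read1 and read2 counts.

```python
# Counts copied from the two "samtools flagstat" outputs in this post.
hisat2 = {"total": 76075665, "secondary": 1565341, "mapped": 71435955,
          "paired": 74510324, "read1": 37255162, "read2": 37255162}
tophat2 = {"total": 85046681, "secondary": 18181171, "mapped": 85046681,
           "paired": 66865510, "read1": 34237825, "read2": 32627685}


def summarize(stats):
    """Derive a few quantities that flagstat does not print directly."""
    return {
        # Records left after removing secondary alignments.
        "primary": stats["total"] - stats["secondary"],
        # Records in the file that are flagged as unmapped.
        "unmapped_records": stats["total"] - stats["mapped"],
        # Non-zero means some mates have no record in the file.
        "read1_minus_read2": stats["read1"] - stats["read2"],
    }


for name, stats in (("Hisat2", hisat2), ("Tophat2", tophat2)):
    print(name, summarize(stats))
```

In both files the primary count equals the "paired in sequencing" count. The Hisat2 file holds 4639710 unmapped records with equal read1 and read2 counts, while the Tophat2 file holds no unmapped records and has 1610140 more read1 than read2 records. One possible reading, which I have not verified, is that the Tophat2 file simply does not contain unmapped records, which would make its 100% figure a property of the file rather than of the alignment rate.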
Code:
'pair_alignments' needs a sequence of paired-end alignments