I have human RNA-seq reads that I aligned against the human reference genome (GRCh38) using BWA-MEM and TopHat2. I now want to count reads per gene with htseq-count. Do I need to filter out the reads that are not in proper pairs beforehand, so that only proper pairs are passed to htseq-count? If so, how can I do that?
samtools flagstat shows that all BAM files have ~100% mapped reads, and the percentage of proper pairs is between 75% and 80%.