Dear all,
I am currently working with RNAseq data obtained from different strains of Plasmodium parasite. We are mainly interested in differential expression between the strains. After quality trimming of the reads, I planned to align reads on Plasmodium genome with TopHat2 or HISAT2 (still not yet decided, I am comparing both), and to use HTseq-count for counting.
Alignment works quite correctly, but I have a lot of reads with multiple alignments (> 40% with MAPQ < 10) and I am not sure to really understand how to handle them for the downstream analysis. Based on what I read, I first understood that for differential expression analysis, it would be better to keep multi-mapped reads for counting, but I found that multireads are not counted with HTseq-count. Do I have to force HTseq-count to count them (remove NH optional flag in .bam file)? Is there any way to reduce the levels of multi-mapped reads?
Thank you very much for any help.
I am currently working with RNAseq data obtained from different strains of Plasmodium parasite. We are mainly interested in differential expression between the strains. After quality trimming of the reads, I planned to align reads on Plasmodium genome with TopHat2 or HISAT2 (still not yet decided, I am comparing both), and to use HTseq-count for counting.
Alignment works quite correctly, but I have a lot of reads with multiple alignments (> 40% with MAPQ < 10) and I am not sure to really understand how to handle them for the downstream analysis. Based on what I read, I first understood that for differential expression analysis, it would be better to keep multi-mapped reads for counting, but I found that multireads are not counted with HTseq-count. Do I have to force HTseq-count to count them (remove NH optional flag in .bam file)? Is there any way to reduce the levels of multi-mapped reads?
Thank you very much for any help.
Comment