Originally posted by tangx_2010
2. How does HTSeq treat multiply mapped reads? Are they just counted several times, once for each gene they map to?
In general, a multiply aligned read should be discarded. (Imagine that genes A and B share part of their sequence. If A is differentially expressed and B is not, then any read that originates from A but matches both A and B will make B appear differentially expressed as well, if it is counted for both. Hence, the prudent strategy is to count only reads that map uniquely to a gene.)
For now, HTSeq looks for the optional "NH" tag. If it indicates that more than one alignment is reported for the read, the read is not counted. With the "--minaqual" option, you can also have all reads with low alignment quality skipped, which helps because some aligners mark multiple alignments by giving them a low alignment quality. If neither of the two works for you, you should pre-filter the SAM file. It is easy to write such a filtering script with HTSeq.
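To give an idea of what such a pre-filter could look like, here is a minimal sketch. Note that it is written in plain Python and parses the SAM text directly rather than going through HTSeq's own reader classes, so it is only an illustration of the logic, not HTSeq's implementation. The function names (is_unique, filter_sam) and the min_mapq parameter are made up for this example, and keeping reads that carry no NH tag at all is an assumption you may want to change for your aligner.

```python
def is_unique(sam_line, min_mapq=0):
    """Return True if a SAM alignment line looks like a unique alignment."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False              # malformed line: fewer than 11 mandatory fields
    flag = int(fields[1])
    if flag & 0x4:
        return False              # read is unmapped
    if int(fields[4]) < min_mapq:
        return False              # alignment quality (MAPQ) below threshold
    for opt in fields[11:]:       # optional fields have the form TAG:TYPE:VALUE
        if opt.startswith("NH:i:"):
            return int(opt[5:]) == 1
    return True                   # assumption: no NH tag means "keep the read"


def filter_sam(lines, min_mapq=0):
    """Yield header lines and uniquely aligned records, dropping the rest."""
    for line in lines:
        if line.startswith("@") or is_unique(line, min_mapq):
            yield line
```

To use it on a stream, you could call sys.stdout.writelines(filter_sam(sys.stdin)) after importing sys, and then pass the filtered SAM file on for counting.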