Hi,
I've just found that a read starting with the letter "N" in my SAM file was marked as "no_feature" by htseq-count. I hacked that read by changing the "N" to the reference base and adjusting the affected fields, such as FLAG and MD, and after that htseq-count worked with the read. My question is: does htseq-count discard every read that has an "N" anywhere in it, or only reads that start with "N"? In that read, all the other bases appear to match the reference, so I would rather keep it.
I am asking because FastQC shows that many of my reads have "N"s in their tails (the 3' end?). Will htseq-count produce better counts if I trim those tails?
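To make clear what I mean by trimming the tails, here is a minimal Python sketch. The function name `trim_trailing_n` and the example read are made up for illustration; this is not part of htseq-count or FastQC, and I am assuming it would be applied to the FASTQ reads before alignment, since changing an already aligned read would also mean fixing fields like CIGAR and MD.

```python
def trim_trailing_n(seq, qual):
    """Strip trailing 'N' bases from a read and cut the quality string to match.

    Hypothetical helper for illustration only; it only touches Ns at the
    end of the read and leaves leading or internal Ns unchanged.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    trimmed = seq.rstrip("Nn")          # remove Ns from the right end only
    return trimmed, qual[:len(trimmed)]  # keep one quality value per base


if __name__ == "__main__":
    # Made-up read: one internal N and three trailing Ns
    seq, qual = trim_trailing_n("ACGTNACGTNNN", "IIIIIIIII###")
    print(seq)
    print(qual)
```

A read that consists only of "N"s comes back empty, so such reads would need to be dropped before alignment.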
Thanks,
Sylvia