I am using TopHat to analyze Illumina HiSeq 2000 paired-end read data. I have noticed that during the initial execution, TopHat 1 (and TopHat 2) "converts the reads" and then sorts the left reads into kept and discarded groups (e.g. 8,000,012 kept, 10,121 discarded) and does the same for the right reads (e.g. 7,804,000 kept, 206,133 discarded). Since the two files have different numbers of discarded reads, I'm assuming that "lone" mates (reads whose partner was discarded) are treated as single-end reads.
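As a quick sanity check on that assumption, the counts quoted above can be compared directly. This small sketch uses only the numbers from the TopHat log in this post and does not model how TopHat itself filters reads. Both files come to the same total, so the pairing is intact, and because pairs with both mates kept cannot outnumber the smaller kept count, at least 196,012 kept left reads must have lost their right mate.

```python
def summarize(counts):
    """Return (total reads, percent discarded) for each mate file."""
    summary = {}
    for side, (kept, discarded) in counts.items():
        total = kept + discarded
        summary[side] = (total, 100 * discarded / total)
    return summary


# (kept, discarded) counts as reported by TopHat in the question.
counts = {"left": (8_000_012, 10_121), "right": (7_804_000, 206_133)}
summary = summarize(counts)
for side, (total, pct) in summary.items():
    print(f"{side}: {total:,} reads, {pct:.2f}% discarded")

# Pairs with both mates kept cannot exceed the smaller kept count, so at least
# this many kept left reads have no kept right mate (i.e. are "lone" mates).
min_lone_left = counts["left"][0] - counts["right"][0]
print(f"at least {min_lone_left:,} kept left reads have no kept mate")
```

This prints a total of 8,010,133 reads for each side, with 0.13% of left reads and 2.57% of right reads discarded. The much higher discard rate on the right file is consistent with second reads often being of lower quality, though that is an interpretation rather than something the log states.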
My question is: how does TopHat decide which reads to keep and which to discard, and why? Are there some underlying QC filters?