I ran tophat2 with "-g 20" on some data. The resulting "align_summary.txt" says:
Reads:
Input: 92302015
Mapped: 79916185 (86.6% of input)
of these: 17726355 (22.2%) have multiple alignments (22103 have >20)
86.6% overall read alignment rate.
Then I checked the numbers manually by running samtools on the output files "accepted_hits.bam" and "unmapped.bam". First I tried
Code:
samtools view unmapped.bam | wc -l
The result is 12385846, which looks close enough, although not perfect (12385846 + 79916185 = 92302031, which is 16 more than the input count of 92302015).
Then I tried command (a)
Code:
samtools view -f 256 accepted_hits.bam | wc -l
The result is 15255480.
I also tried command (b)
Code:
samtools view -f 256 accepted_hits.bam | cut -f1 | sort | uniq | wc -l
The result is 13013262.
My understanding of tophat's behavior is that when a read has multiple equally good hits, it randomly picks one hit as the "best", and all other hits are assigned the 256 flag bit (secondary hits). So presumably, if the number 17726355 in "align_summary.txt" is a number of reads, it should be similar to the result of (b). Or, if 17726355 in "align_summary.txt" is a number of hits, it should be at least 2 times the result of (a). But neither of my two guesses is correct.
So my question is: what exactly does the number 17726355 mean?
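For anyone who wants to reproduce the comparison, here is a minimal sketch that produces the counts from commands (a) and (b) in a single pass over the SAM records. It also counts distinct read names whose NH:i tag is greater than 1, on the assumption that the alignments in "accepted_hits.bam" carry an NH tag; I have not confirmed that this is the quantity "align_summary.txt" reports. The helper name count_multi is made up for illustration, and counting by read name assumes single-end data (mates of a pair share a name and would be collapsed).

```shell
#!/bin/sh
# Tally, in one pass over SAM records read from stdin:
#   secondary_records - records with flag bit 256 set, as in command (a)
#   secondary_reads   - distinct read names among those records, as in command (b)
#   nh_multi_reads    - distinct read names with an NH:i tag greater than 1
#                       (assumes the aligner wrote NH tags)
# count_multi is a hypothetical helper, not part of samtools or tophat.
count_multi() {
    awk -F '\t' '
        /^@/ { next }                          # skip SAM header lines, if present
        {
            # Test bit 256 with integer arithmetic so this works in any awk.
            if (int($2 / 256) % 2 == 1) {
                sec_records++
                sec_names[$1] = 1
            }
            # Optional tags start at column 12; look for NH:i:<n> with n > 1.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^NH:i:/ && substr($i, 6) + 0 > 1) {
                    nh_names[$1] = 1
                }
            }
        }
        END {
            for (n in sec_names) sec_reads++
            for (n in nh_names) nh_reads++
            printf "secondary_records=%d\n", sec_records
            printf "secondary_reads=%d\n", sec_reads
            printf "nh_multi_reads=%d\n", nh_reads
        }
    '
}
```

It would be used as `samtools view accepted_hits.bam | count_multi`. The read names are held in memory, so on a file of this size the sort-based pipeline in (b) may be the more practical choice.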
Please let me know if I missed anything. Thanks all for any comments or suggestions!
jianrong