Hi there,
I've run TopHat (version 2.0.4, with the ENCODE GTF) on paired-end 50 bp RNA-Seq data. The FASTQ files show a total of 36,497,412 reads. After the TopHat run, I used samtools to check the BAM file and got:
samtools flagstat accepted_hits.bam
38,502,265 in total
0 QC failure
0 duplicates
38502265 mapped (100.00%)
38502265 paired in sequencing
19645891 read1
18856374 read2
17808682 properly paired (46.25%)
32180420 with itself and mate mapped
6321845 singletons (16.42%)
4384082 with mate mapped to a different chr
301920 with mate mapped to a different chr (mapQ>=5)
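A quick arithmetic check on the numbers above, in Python: the read1 and read2 counts add up exactly to the 38,502,265 total, and that total exceeds the FASTQ read count by 2,004,853. Because flagstat counts alignment records rather than distinct reads, this would be consistent with some reads contributing more than one record (for example, multi-mapped reads). This is a sketch of the arithmetic only, not a diagnosis.

```python
# Counts copied from the FASTQ total and the flagstat output above
fastq_reads = 36_497_412
total_records = 38_502_265
read1_records = 19_645_891
read2_records = 18_856_374

# Every record is flagged as read1 or read2, so the two counts add up to the total
assert read1_records + read2_records == total_records

# Number of BAM records beyond the FASTQ read count
excess = total_records - fastq_reads
print(f"{excess:,}")  # 2,004,853
```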
My questions are:
1. Why is the total number of reads from the BAM file (38,502,265) larger than my original library size (36,497,412)?
2. flagstat shows that there are no unmapped reads. In this case, what is the proper way to calculate the percentage of unmapped reads?
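One possible approach to question 2, sketched under assumptions: count each distinct read (query name plus mate) that has at least one mapped record in the BAM, then compare that with the number of input reads. Counting distinct reads rather than records avoids counting a multi-mapped read more than once. The `Alignment` record and both function names are hypothetical; in practice the fields would come from a BAM parser. The sketch also assumes the FASTQ count is the total across both mate files; if 36,497,412 is per file, the denominator would be twice that.

```python
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    # Hypothetical minimal record; real values would come from parsing the BAM
    query_name: str
    is_read1: bool
    is_secondary: bool
    is_unmapped: bool


def count_mapped_reads(alignments: Iterable[Alignment]) -> int:
    """Count distinct reads (name + mate) that have at least one mapped record."""
    seen = set()
    for aln in alignments:
        if aln.is_unmapped:
            continue
        # Secondary records share the key of their primary, so they add nothing new
        seen.add((aln.query_name, aln.is_read1))
    return len(seen)


def unmapped_percentage(total_input_reads: int, mapped_reads: int) -> float:
    """Percentage of input reads that have no mapped record."""
    if total_input_reads <= 0:
        raise ValueError("total_input_reads must be positive")
    return 100.0 * (total_input_reads - mapped_reads) / total_input_reads
```

For example, with read r1 aligned twice as mate 1, r1 mate 2 aligned once and r2 mate 1 aligned once, there are four records but only three distinct mapped reads; if four reads went in, one (25%) is unmapped.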
Thanks very much. Any suggestions are welcome.