I aligned paired-end RNA-seq reads using TopHat ver. 1.4.0.
I ran TopHat with and without the -z0 option, and the two runs produced different BAM results.
Input FASTQ file size: 4.1G x 2 (paired-end)
==================
Output 1. without -z0 option
Code:
$ tophat -r 94 -p 8 -G genes.gtf genome s_1_1.fastq s_1_2.fastq
1.3G : accepted_hits.bam
88K : deletions.bed
50K : insertions.bed
3.9M : junctions.bed
70 : left_kept_reads.info
4.0K : logs
70 : right_kept_reads.info
93M : unmapped_left.fq.z
113M : unmapped_right.fq.z
==================
Output 2. with -z0 option
Code:
$ tophat -z0 -r 94 -p 8 -G genes.gtf genome s_1_1.fastq s_1_2.fastq
523M : accepted_hits.bam
55K : deletions.bed
31K : insertions.bed
2.6M : junctions.bed
70 : left_kept_reads.info
4.0K : logs
70 : right_kept_reads.info
2.7G : unmapped_left.fq
2.5G : unmapped_right.fq
==================
Then I used samtools flagstat to look at the BAM file stats.
The first BAM file (without -z0 option) reported:
Code:
$ samtools flagstat accepted_hits.bam
57149297 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
57149297 + 0 mapped (100.00%:-nan%)
57149297 + 0 paired in sequencing
28845685 + 0 read1
28303612 + 0 read2
42295726 + 0 properly paired (74.01%:-nan%)
54359924 + 0 with itself and mate mapped
2789373 + 0 singletons (4.88%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
The second BAM file (with -z0 option) reported:
Code:
$ samtools flagstat accepted_hits.bam
20898015 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
20898015 + 0 mapped (100.00%:-nan%)
20898015 + 0 paired in sequencing
10912428 + 0 read1
9985587 + 0 read2
3175304 + 0 properly paired (15.19%:-nan%)
13747868 + 0 with itself and mate mapped
7150147 + 0 singletons (34.21%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Does TopHat use a different algorithm depending on the -z0 option?
How come TopHat produces different BAM files?
Thanks in advance!
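P.S. The unmapped files in the two listings can't be compared by size alone. The default run wrote unmapped_*.fq.z files, which are presumably compressed (TopHat's -z/--zpacker option picks the compressor for its temporary files, and -z0 turns compression off), while the -z0 run wrote plain unmapped_*.fq files. Counting reads gives a fairer comparison. Below is a minimal POSIX shell sketch of that, assuming the .fq.z files are gzip-format (gzip or a compatible packer such as pigz) and every FASTQ record is exactly 4 lines. The function name count_fastq_reads is hypothetical.

```shell
#!/bin/sh
# Sketch: count FASTQ records in a file that may be gzip-compressed
# (e.g. TopHat's unmapped_left.fq.z) or plain text (e.g. unmapped_left.fq
# from the -z0 run). Assumptions: compressed files are gzip-format
# (gzip or a compatible packer such as pigz), and each FASTQ record
# spans exactly 4 lines (no wrapped sequence or quality lines).
count_fastq_reads() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 1
    fi
    # Detect compression from the file content, not the file extension.
    if gzip -t < "$1" 2>/dev/null; then
        gzip -dc < "$1"    # valid gzip stream: decompress to stdout
    else
        cat "$1"           # not gzip: treat as plain-text FASTQ
    fi | awk 'END { print NR / 4 }'   # 4 lines per record
}

# Print "<file><TAB><reads>" for every file given on the command line.
for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(count_fastq_reads "$f")"
done
```

Run it in each output directory on unmapped_left.fq.z / unmapped_right.fq.z (first run) and unmapped_left.fq / unmapped_right.fq (-z0 run); the counts can then be set against the read1/read2 numbers from flagstat. Checking the content with gzip -t, rather than trusting the .z suffix, lets the same function handle both runs.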