To run faster and check the result, I split one paired-end RNA-seq FASTQ into eight pairs of files, then aligned the data to the unmasked genome reference with both HISAT and TopHat2. Strangely, the HISAT results for the full data set and for the smaller split files were very different.
####### Below are the HISAT results for one split paired-end FASTQ (the other seven results were similar):
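For reference, a split of this kind can be sketched as below. This is only an illustration, not the exact procedure used for the runs in this post: the function names, the output naming scheme, and the round-robin distribution of reads are all assumptions. Paired-end aligners match mates by their position in the two files, so the sketch also checks that R1 and R2 stay in step while splitting.

```python
from contextlib import ExitStack
from itertools import zip_longest


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (header, seq, plus, qual)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return
        yield record


def read_name(header):
    """Return the read name without an optional /1 or /2 mate suffix."""
    name = header.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_paired_fastq(r1_path, r2_path, n_chunks, out_prefix):
    """Distribute read pairs round-robin over n_chunks pairs of files.

    Output names (hypothetical scheme): <out_prefix>.<i>_1.fastq / _2.fastq.
    Returns the number of pairs written to each chunk.
    """
    counts = [0] * n_chunks
    with ExitStack() as stack:
        r1 = stack.enter_context(open(r1_path))
        r2 = stack.enter_context(open(r2_path))
        outs = [
            (
                stack.enter_context(open(f"{out_prefix}.{i}_1.fastq", "w")),
                stack.enter_context(open(f"{out_prefix}.{i}_2.fastq", "w")),
            )
            for i in range(n_chunks)
        ]
        pairs = zip_longest(read_fastq(r1), read_fastq(r2))
        for idx, (rec1, rec2) in enumerate(pairs):
            # One file ran out before the other: the mates are not in sync.
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            # Mates at the same position must carry the same read name.
            if read_name(rec1[0]) != read_name(rec2[0]):
                raise ValueError(f"mate names differ at record {idx}")
            chunk = idx % n_chunks
            outs[chunk][0].writelines(rec1)
            outs[chunk][1].writelines(rec2)
            counts[chunk] += 1
    return counts
```

Calling `split_paired_fastq("sample_1.fastq", "sample_2.fastq", 8, "chunk")` (file names are placeholders) would write eight pairs of files and raise an error if the two inputs were out of step.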
begin [Thu Aug 27 09:25:07 CST 2015]
6743045 reads; of these:
6743045 (100.00%) were paired; of these:
572503 (8.49%) aligned concordantly 0 times
5478092 (81.24%) aligned concordantly exactly 1 time
692450 (10.27%) aligned concordantly >1 times
----
572503 pairs aligned concordantly 0 times; of these:
35260 (6.16%) aligned discordantly 1 time
----
537243 pairs aligned 0 times concordantly or discordantly; of these:
1074486 mates make up the pairs; of these:
937868 (87.29%) aligned 0 times
98655 (9.18%) aligned exactly 1 time
37963 (3.53%) aligned >1 times
93.05% overall alignment rate
finish [Thu Aug 27 09:26:18 CST 2015]
######## Below are the HISAT results for the raw (unsplit) paired-end FASTQs:
begin [Thu Aug 27 15:32:23 CST 2015]
53944360 reads; of these:
53944360 (100.00%) were paired; of these:
53859198 (99.84%) aligned concordantly 0 times
173 (0.00%) aligned concordantly exactly 1 time
84989 (0.16%) aligned concordantly >1 times
----
53859198 pairs aligned concordantly 0 times; of these:
42847397 (79.55%) aligned discordantly 1 time
----
11011801 pairs aligned 0 times concordantly or discordantly; of these:
22023602 mates make up the pairs; of these:
7838984 (35.59%) aligned 0 times
14547 (0.07%) aligned exactly 1 time
14170071 (64.34%) aligned >1 times
92.73% overall alignment rate
finish [Thu Aug 27 15:57:01 CST 2015]
###############
The HISAT results for the split paired-end FASTQ looked fine, but I was shocked by the HISAT results for the raw paired-end FASTQs. Why was the discordant alignment rate so high? The input data were the same; the only difference was that one run used the raw FASTQs and the other used the split FASTQs.
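To put the two HISAT runs side by side, the pair-level rates can be recomputed from the counts in the summaries above. This is only a sketch: the dictionary keys and the function name are made up here, and the numbers are copied from the logs.

```python
# Pair counts copied from the two HISAT summaries in this post.
hisat = {
    "split": {"pairs": 6743045, "conc_1": 5478092,
              "conc_multi": 692450, "disc": 35260},
    "full": {"pairs": 53944360, "conc_1": 173,
             "conc_multi": 84989, "disc": 42847397},
}


def pair_rates(counts):
    """Return (concordant %, discordant %) relative to all input pairs."""
    concordant = 100.0 * (counts["conc_1"] + counts["conc_multi"]) / counts["pairs"]
    discordant = 100.0 * counts["disc"] / counts["pairs"]
    return concordant, discordant


for label, counts in hisat.items():
    concordant, discordant = pair_rates(counts)
    print(f"{label}: concordant {concordant:.2f}%, discordant {discordant:.2f}%")

# Eight chunks of the split size should add up to the full input.
print(hisat["split"]["pairs"] * 8 == hisat["full"]["pairs"])
```

Eight chunks of 6743045 pairs account for exactly the 53944360 pairs of the full run, so no reads were lost in splitting. The pair-level rates nevertheless swap between the two runs: almost all pairs are concordant in the split run, while almost all aligned pairs are discordant in the full run.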
PS:
########## The TopHat2 result for one of the split paired-end FASTQs:
Left reads:
Input : 6743045
Mapped : 6257213 (92.8% of input)
of these: 537537 ( 8.6%) have multiple alignments (4996 have >20)
Right reads:
Input : 6743045
Mapped : 6234936 (92.5% of input)
of these: 535149 ( 8.6%) have multiple alignments (4982 have >20)
92.6% overall read mapping rate.
Aligned pairs: 6153025
of these: 526916 ( 8.6%) have multiple alignments
60596 ( 1.0%) are discordant alignments
90.4% concordant pair alignment rate.
############ The TopHat2 result for the raw paired-end FASTQs:
Left reads:
Input : 53944360
Mapped : 49867482 (92.4% of input)
of these: 3252071 ( 6.5%) have multiple alignments (50305 have >20)
Right reads:
Input : 53944360
Mapped : 49672436 (92.1% of input)
of these: 3237335 ( 6.5%) have multiple alignments (50373 have >20)
92.3% overall read mapping rate.
Aligned pairs: 48962272
of these: 3172613 ( 6.5%) have multiple alignments
427349 ( 0.9%) are discordant alignments
90.0% concordant pair alignment rate.