Hello everyone. I have a single library of ~30 million paired-end reads. After trimming primer sequences and adaptors (it was a highly amplified library from RNA), about half of my reads become single-ended (I lose read 2, which is mostly primers). When I use tophat2 to map the PE and orphan SE reads simultaneously, I get the following mapping stats:
Left reads:
Input : 15693801
Mapped : 15021980 (95.7% of input)
of these : 4774963 (31.8%) have multiple alignments (1996947 have >20)
Right reads:
Input : 15693801
Mapped : 14981893 (95.5% of input)
of these : 4767368 (31.8%) have multiple alignments (1996950 have >20)
Unpaired reads:
Input : 12548124
Mapped : 11939794 (95.2% of input)
of these : 4816113 (40.3%) have multiple alignments (83 have >20)
95.5% overall read mapping rate.
Aligned pairs: 14492245
of these : 4742324 (32.7%) have multiple alignments
1998194 (13.8%) are discordant alignments
79.6% concordant pair alignment rate.
Then, if I map the paired-end reads and single-end reads in separate tophat2 runs, I get different mapping stats (the biggest difference is in the discordant alignments):
Left reads:
Input : 15693801
Mapped : 15021392 (95.7% of input)
of these : 4844003 (32.2%) have multiple alignments (2137068 have >20)
Right reads:
Input : 15693801
Mapped : 14981461 (95.5% of input)
of these : 4836357 (32.3%) have multiple alignments (2137069 have >20)
95.6% overall read mapping rate.
Aligned pairs: 14491403
of these : 4812629 (33.2%) have multiple alignments
2107050 (14.5%) are discordant alignments
78.9% concordant pair alignment rate.
And here are the separate SE stats (also slightly different):
Reads:
Input : 12548124
Mapped : 11937315 (95.1% of input)
of these: 4800413 (40.2%) have multiple alignments (35 have >20)
95.1% overall read mapping rate.
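To put numbers on the gap between the two setups, here is a minimal sketch using only the figures pasted above. It assumes the concordant pair alignment rate is (aligned pairs minus discordant pairs) divided by input pairs. That formula reproduces both reported percentages, but it is an inference from these summaries, not something confirmed from the tophat2 documentation. The names `runs` and `concordant_rate` are just illustrative.

```python
# Figures copied from the tophat2 align summaries above.
PAIRS_INPUT = 15693801  # read pairs given to tophat2 in both setups

runs = {
    "combined": {"aligned_pairs": 14492245, "discordant": 1998194},
    "separate": {"aligned_pairs": 14491403, "discordant": 2107050},
}

def concordant_rate(aligned_pairs, discordant, pairs_input=PAIRS_INPUT):
    """Assumed formula: concordant pairs as a percentage of input pairs."""
    return 100.0 * (aligned_pairs - discordant) / pairs_input

for name, stats in runs.items():
    print(name, round(concordant_rate(**stats), 1))  # 79.6, then 78.9

# Absolute differences between the two setups.
extra_discordant = runs["separate"]["discordant"] - runs["combined"]["discordant"]
fewer_aligned_pairs = runs["combined"]["aligned_pairs"] - runs["separate"]["aligned_pairs"]
print(extra_discordant, fewer_aligned_pairs)  # 108856 842
```

So the separate PE run reports about 109 thousand more discordant pairs, while the number of aligned pairs differs by only 842.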
Why should there be any difference between mapping them in the same run and mapping them separately? Both setups used the same reference and the same commands, and the combined run was done as per the tophat2 manual, by adding the SE reads after a comma to one of the two paired-end read files.
Evan