Hi all,
I am currently using HISAT2 to do read mapping. After having the mapped results, I am planning to only use reads aligned concordantly exactly 1 time (my data is paired-end RNA-seq data). I was wondering how does HISAT2 define reads aligned concordantly exactly 1 time?
Based on the website (https://ccb.jhu.edu/software/hisat2/manual.shtml), if I understand correctly, reads with YT:Z:CP and YT:Z:CP should be the reads aligned concordantly exactly 1 time. But the number I got doesn't match the number from HISAT2 log file. Below is the code I used and HISAT2 log file:
samtools view my_bam_file_after_mapping.bam | grep -e "\<NH:i:1\>" | grep -e "\<YT:Z:CP\>" > unique.mapped.sam
46673323 reads; of these:
46673323 (100.00%) were paired; of these:
17943238 (38.44%) aligned concordantly 0 times
22435343 (48.07%) aligned concordantly exactly 1 time
6294742 (13.49%) aligned concordantly >1 times
61.56% overall alignment rate
Here is the code I used to count the number of reads:
cat unique.mapped.sam | wc -l
The number I got is 53948500, which is a lot larger than 22435343.
Anyone had any idea on this one? Thank you very much!
I am currently using HISAT2 to do read mapping. After having the mapped results, I am planning to only use reads aligned concordantly exactly 1 time (my data is paired-end RNA-seq data). I was wondering how does HISAT2 define reads aligned concordantly exactly 1 time?
Based on the website (https://ccb.jhu.edu/software/hisat2/manual.shtml), if I understand correctly, reads with YT:Z:CP and YT:Z:CP should be the reads aligned concordantly exactly 1 time. But the number I got doesn't match the number from HISAT2 log file. Below is the code I used and HISAT2 log file:
samtools view my_bam_file_after_mapping.bam | grep -e "\<NH:i:1\>" | grep -e "\<YT:Z:CP\>" > unique.mapped.sam
46673323 reads; of these:
46673323 (100.00%) were paired; of these:
17943238 (38.44%) aligned concordantly 0 times
22435343 (48.07%) aligned concordantly exactly 1 time
6294742 (13.49%) aligned concordantly >1 times
61.56% overall alignment rate
Here is the code I used to count the number of reads:
cat unique.mapped.sam | wc -l
The number I got is 53948500, which is a lot larger than 22435343.
Anyone had any idea on this one? Thank you very much!
Comment