Dear all,
I'm new to RNA-seq analysis. I'm trying out htseq-count to get the raw counts for downstream analysis with edgeR or DESeq. I am getting some warning messages from HTSeq, and I am not sure whether I can ignore them or what I should do about them.
While running htseq-count:
Code:
python -m HTSeq.scripts.count names_sorted.sam genes.gtf
I got thousands of warning messages like these:
Warning: Read HWI-ST845:130320:D20JBACXX:8:1101:6938:66256 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWI-ST845:130320:D20JBACXX:8:1101:7019:18774 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
After looking into the reads that produce the warning messages, and reading other posts on this forum, I think the cause is that some reads are missing their mates.
For example:
Code:
grep "HWI-ST845:130320:D20JBACXX:8:1101:6938:66256" names_sorted.sam
HWI-ST845:130320:D20JBACXX:8:1101:6938:66256 129 chr1 24615683 2 50M chr2 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:2 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:130320:D20JBACXX:8:1101:6938:66256 129 chr2 22588080 2 50M = 32387601 0 AGGGGGTTCGATTCCTTCCTTTCTTATTTTACTTTTACATAGGTTGGTTC @@CFFFDFHHAFHIGJIJIGHJIJJIJGIJJGHIIJIHDHGHIFHIJEHG MD:Z:50 NH:i:3 HI:i:1 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
HWI-ST845:130320:D20JBACXX:8:1101:6938:66256 65 chr2 32387601 40 50M = 22588080 0 CTGCAGGGGGACAGTGAGCAGAGATGGGGCAGGGATCAAGTTCTGAGTTG CCCFFFFFHHHHHJGHGIIJJIIIIIJJJFJIJJGHJJJJCGIJJJJGIJ MD:Z:50 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0
HWI-ST845:130320:D20JBACXX:8:1101:6938:66256 145 chrM 6846 2 50M chr2 32387601 0 GAACCAACCTATGTAAAAGTAAAATAAGAAAGGAAGGAATCGAACCCCCT GHEJIHFIHGHDHIJIIHGJJIGJIJJIJHGIJIJGIHFAHHFDFFFC@@ MD:Z:50 NH:i:3 HI:i:3 NM:i:0 SM:i:2 XQ:i:40 X2:i:40
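For reference, the FLAG column of these records (129, 65, 145) can be decoded with a small sketch like the one below. It only assumes the standard FLAG bits from the SAM specification; the helper name `decode_flag` is made up for illustration and is not part of HTSeq or samtools.

```python
# Standard SAM FLAG bits, as defined in the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all FLAG bits set in `flag` (hypothetical helper)."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# The three distinct FLAG values seen in the example records.
for flag in (129, 65, 145):
    print(flag, decode_flag(flag))
```

Read this way, the records with flags 129 and 145 are all read 2 (three alignments, consistent with NH:i:3), while the record with flag 65 is read 1 with a single alignment (NH:i:1). None of the four records has the proper_pair bit set.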
My guess is that these are what HTSeq complained about. However, is this something I need to worry about? Why do I get these reads with missing mates? Can I simply disregard the warning messages?
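One way to test this guess on a single read is to check which records have a mate record that points back at them. The sketch below is a simplification and not htseq-count's actual pairing code: it assumes a record counts as paired only if some other record sits at its RNEXT/PNEXT position and that record's own RNEXT/PNEXT points back. The names `Rec`, `mate_locus` and `records_without_mate` are hypothetical, and the coordinates are copied from the four example records above.

```python
from collections import namedtuple

# Only the SAM columns needed for the mate check (hypothetical container).
Rec = namedtuple("Rec", "flag rname pos rnext pnext")


def mate_locus(rec):
    """Resolve RNEXT/PNEXT; '=' in RNEXT means 'same reference as RNAME'."""
    chrom = rec.rname if rec.rnext == "=" else rec.rnext
    return (chrom, rec.pnext)


def records_without_mate(records):
    """Given all records sharing one read name, return those for which no
    other record both sits at their mate position and points back at them."""
    orphans = []
    for rec in records:
        has_mate = any(
            other is not rec
            and (other.rname, other.pos) == mate_locus(rec)
            and mate_locus(other) == (rec.rname, rec.pos)
            for other in records
        )
        if not has_mate:
            orphans.append(rec)
    return orphans


# FLAG, RNAME, POS, RNEXT, PNEXT of the four example records.
example = [
    Rec(129, "chr1", 24615683, "chr2", 32387601),
    Rec(129, "chr2", 22588080, "=", 32387601),
    Rec(65, "chr2", 32387601, "=", 22588080),
    Rec(145, "chrM", 6846, "chr2", 32387601),
]

for rec in records_without_mate(example):
    print(rec.rname, rec.pos)
```

Under this assumption, the chr1 and chrM alignments of read 2 are the ones left without a mate: the single read 1 record points only to the chr2:22588080 alignment.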
--
Some additional information about the alignment I am looking at:
Code:
samtools flagstat mybam.bam
80760824 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
79363333 + 0 mapped (98.27%:nan%)
80760824 + 0 paired in sequencing
40378231 + 0 read1
40382593 + 0 read2
77654682 + 0 properly paired (96.15%:nan%)
78100006 + 0 with itself and mate mapped
1263327 + 0 singletons (1.56%:nan%)
354942 + 0 with mate mapped to a different chr
66138 + 0 with mate mapped to a different chr (mapQ>=5)
Last few lines of the htseq-count output:
a 0
l7Rn6 168
no_feature 15754100
ambiguous 109560
too_low_aQual 0
not_aligned 387345
alignment_not_unique 11975435