It's still unfixed (as of TopHat 2.1.1) and unlikely to be fixed at all, since TopHat is not really being developed anymore (the developers focus on HISAT2, its successor).
Did you use TopHat via bcbio-nextgen by any chance? That fixes the unmapped reads file for you automatically; other frameworks may do the same.
-
Yes, it is a bug in TopHat. Didn't they fix it in TopHat2? I recently used it and the flags were alright.
-
It's a bug in TopHat (all versions); it doesn't set the 0x8 bit ("next segment in the template unmapped") when both reads are unmapped.
This is one of the issues TopHat-Recondition fixes (https://bmcbioinformatics.biomedcent...859-016-1058-x , https://github.com/cbrueffer/tophat-recondition).
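The flag arithmetic behind that fix can be sketched in a few lines of Python. This is only an illustration of the bit logic described above, not how TopHat-Recondition itself is implemented; the function name `fix_unmapped_pair` and the constant names are hypothetical.

```python
# Bit values from the SAM FLAG field; the constant names are my own labels.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def fix_unmapped_pair(flag1, flag2):
    """Set the 0x8 bit on both mates when both reads of a pair are unmapped.

    Flags are returned unchanged in every other case.
    """
    both_paired = (flag1 & PAIRED) and (flag2 & PAIRED)
    both_unmapped = (flag1 & UNMAPPED) and (flag2 & UNMAPPED)
    if both_paired and both_unmapped:
        return flag1 | MATE_UNMAPPED, flag2 | MATE_UNMAPPED
    return flag1, flag2


# The 69/133 pair becomes 77/141.
print(fix_unmapped_pair(69, 133))
```

A pair such as 73/133, where one mate really is mapped, passes through untouched because 73 does not have the 0x4 bit set.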
-
Crazy TopHat unmapped reads
Did anyone have the same problem with these unmapped reads?
-
Your aligner seems to have a bug; the flags should be 77 and 141 if both mates are unmapped.
-
What's going on here?
I'm not sure I correctly understand what the read index (the read name) means in SAM files.
This information is present in the Flags description:
'Next, we have the cases when only one read in a pair is mapped.
69 - 0001000101 - First read in pair. This read is unmapped but its mate is mapped.
133 - 0010000101 - 2nd in pair. Read unmapped but mate is mapped.'
So, does that mean that if I have a read flagged 133 or 69, its mate can't also be present in the unmapped reads file?
I am assuming that reads with the same index (in this case "M03092:8:000000000-AG2GN:1:2117:2591:14346") are paired. Am I correct? If I am wrong, that would explain what happened, but I'd still like to know what these lines with the same index are.
Following this line of thought (same index, paired reads), why do so many of my unmapped paired reads look like this?
M03092:8:000000000-AG2GN:1:2117:2591:14346 69
M03092:8:000000000-AG2GN:1:2117:2591:14346 133
Can anyone explain what's going on with these reads?
-
That is true seq_lover. Which combinations you consider "all good" and which ones "weird" depends on how you constructed your library. Thank you for putting it so succinctly.
-
I assume kgulukota is giving an example for a mate-pair library (SOLiD) and swbarnes2 is giving an example for a paired-end library (Illumina). I think both of them are correct. Please correct me if I am wrong.
-
Personally, the ones and zeros aren't helpful to me. I don't think of 147 as "0010010011", but as "128+16+2+1", and I remember what all those numbers stand for. And in most contexts, having both reads map in the forward direction or both map in the reverse direction is not all good, it's weird.
The four good numbers to remember are 64+16+2+1=83, 64+32+2+1=99, 128+16+2+1=147 and 128+32+2+1=163. Something is very wrong if you ever see both 128 and 64 together, and with most current technologies, you should see 16 or 32, but not both. If you see both, or don't see either, your reads are paired strangely.
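For anyone who wants to practise this way of reading flags, here is a small Python sketch that splits a flag into its powers of two, as in "128+16+2+1". The short descriptions in `SAM_BITS` are my own paraphrases of the bit meanings, and the helper names `decompose` and `describe` are made up for this example.

```python
# The eight low bits of the SAM FLAG field, with informal descriptions.
SAM_BITS = {
    1: "paired",
    2: "proper pair",
    4: "unmapped",
    8: "mate unmapped",
    16: "reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
}


def decompose(flag):
    """Return the bit values set in flag, largest first."""
    return [bit for bit in sorted(SAM_BITS, reverse=True) if flag & bit]


def describe(flag):
    """Return the informal description of every bit set in flag."""
    return [SAM_BITS[bit] for bit in decompose(flag)]


for flag in (83, 99, 147, 163):
    print(flag, "=", "+".join(str(b) for b in decompose(flag)), describe(flag))
```

Bits above 128 (secondary, QC fail, duplicate, supplementary) are deliberately left out of the table, so the sketch ignores them.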
-
SAM flag idioms
"There are 10 types of people in this world: those who assimilated binary numbers and those who didn't."
I definitely belong to the 10'th type and hence SAM Flags are a chore. They may be a very compact way of communicating a lot of info about an alignment, but how do we humans learn them? I know it is kind of nerdy to actually look through SAM files but, what can I say? Mea culpa.
Anyway, this post is my attempt to understand them like a natural language i.e. recognize some idiomatic representations in flags. If you already know these, you are a "binar" and way ahead of us humans on this topic.
You can use this handy little web page for specific flags:
However, to "speak SAM", we must know these flags without having to refer to a web page for each line. So, here are some simple idioms.
Unpaired Reads
For unpaired reads, the flags are very easy to recognize because there are only 3 values:
- 4 - 0000000100 - means "this is an unpaired read and is not mapped".
- 16 - 0000010000 - "this unpaired read is mapped in the reverse orientation".
- 0 - 0000000000 - "this unpaired read is mapped in the forward orientation".
Paired Reads
For paired reads, the 0th bit HAS to be set. Hence all flags for paired reads HAVE to be odd. In other words, among the flags discussed here (ignoring the higher bits for secondary alignments, QC failures, duplicates and supplementary alignments), all even-numbered flags other than the above three (0, 4 and 16) are meaningless. (Good progress. We can recognize nonsense words. Writing a Jabberwocky poem with these flags is left as an exercise for the reader.)
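For paired reads, odd flags in the interval [65-127] have bit 64 set and relate to the first read of a pair. Odd flags in the interval [129-191] have bit 128 set and relate to the second read of a pair. Odd flags in [193-255] have both bits set, and odd flags below 64 have neither, so neither range identifies a read as first or second.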
"All Good"
Some values mean "all good" i.e. that both reads in the pair have aligned:
- 65 - 0001000001 - This is the first read in the pair and both reads aligned to the forward strand.
- 129 - 0010000001 - This is the second read in the pair and both reads aligned to the forward strand.
NOTE: 67 (0001000011) and 131 (0010000011) also mean the same as 65 and 129 with the added assurance that "the pair is properly aligned" meaning that they mapped within a proper distance from each other.
- 113 - 0001110001 - "this is the first read of a pair, both reads in pair were flipped and both mapped".
- 177 - 0010110001 - "this is the second read of a pair, both reads in pair were flipped and both mapped".
Other times only one of the reads in a pair is flipped though both of them map:
- 81 - 0001010001 - "this is the first read of pair, both reads mapped, we had to flip this read, but mate is in forward orientation".
- 161 - 0010100001 - "this is second read, this one is forward but we flipped its mate and both reads mapped".
NOTE: 163 (0010100011) and 83 (0001010011) are the same as 161 and 81 except "it is in a proper pair".
- 97 - 0001100001 - "this is first read, its mate is flipped but this is forward. Both mapped".
- 145 - 0010010001 - "this is second read. it is flipped but its mate is not. Both mapped".
NOTE: 99 (0001100011) and 147 (0010010011) are the same as 97 and 145 except with "proper mapping in pair".
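If you want to check the 10-digit binary strings used in this post yourself, Python's built-in `format` will produce them. The helper name `as_bits` is just something I made up for this sketch.

```python
def as_bits(flag, width=10):
    """Render a SAM flag as a zero-padded binary string."""
    return format(flag, "0{}b".format(width))


# Print the binary form of each flag discussed in this section.
for flag in (65, 129, 67, 131, 113, 177, 81, 161, 97, 145):
    print(flag, as_bits(flag))
```

Reading the string from the right, the digits stand for 1, 2, 4, 8, 16, 32, 64 and 128, so 113 and 177 differ only in which of the 64 and 128 digits is set.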
"All Bad"
At the other end of the spectrum we have "all bad" i.e. neither the read nor its mate mapped:
- 77 - 0001001101 - First in pair, both reads in pair unmapped. "All bad"
- 141 - 0010001101 - Second in pair and "all bad".
- Exercise: Just like with 20, AnnoyingAlign puts flags of 93 or 125 on all unmapped pairs. What other flags can AnnoyingAlign use to maximize user annoyance?
- Exercise: Why are 79 and 143 particularly good words for Jabberwocky?
Next, we have the cases when only one read in a pair is mapped.
- 69 - 0001000101 - First read in pair. This read is unmapped but its mate is mapped.
- 137 - 0010001001 - second in pair. Read is mapped but mate is unmapped.
- 73 - 0001001001 - First read in pair. This read is mapped but its mate is not.
- 133 - 0010000101 - 2nd in pair. Read unmapped but mate is mapped.
Can you again see why the number of reads with a flag of 69 must be the same as the number of reads with a flag of 137?
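One way to convince yourself is to compute the flag a read's mate should carry. The Python sketch below swaps each "this read" bit with its "mate" counterpart (4 with 8, 16 with 32, 64 with 128) and keeps the two shared bits (1 and 2). It assumes a well-behaved aligner and only handles paired flags below 256; the function name `mate_flag` is hypothetical.

```python
# Each pair holds a "this read" bit and the matching "mate" bit.
SWAPS = [(4, 8), (16, 32), (64, 128)]


def mate_flag(flag):
    """Return the flag expected on the mate of a read with this flag.

    Bits above 128 are dropped; this is a sketch for ordinary pairs only.
    """
    mate = flag & 0x3  # "paired" and "proper pair" are shared by both mates
    for this_bit, mate_bit in SWAPS:
        if flag & this_bit:
            mate |= mate_bit
        if flag & mate_bit:
            mate |= this_bit
    return mate


for flag in (69, 73, 77, 81, 97, 83, 99):
    print(flag, "<->", mate_flag(flag))
```

Every read flagged 69 has exactly one mate, and that mate must be flagged 137, so the two counts agree. Applying the function twice returns the original flag, which is a quick sanity check on the swap table.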
There are of course many other combinations. The purpose here is not to enumerate them but to simply have some fun with the structure of these flags.
What is your favorite flag? Do you have other ways of remembering what these things mean as you look through SAM files?