-
Hi choy, and thanks for your reply. Your observation seems to be correct: TopHat writes '=' even when the two mates map to different chromosomes. I'll try to correct these errors in my files. It would definitely be great to have TopHat write the right SAM fields.
-
Hi Boel,
I stumbled over this as well. I think Picard can handle such pairs correctly, and that the real problem is a bug in TopHat that causes them to be reported incorrectly.
What I have noticed is that TopHat always uses the '=' symbol for the mate's reference name. So even if the mate maps to a different chromosome, it is still marked as being on the same chromosome. A lot of these could potentially go unnoticed by Picard, as long as the mate position is smaller than the chromosome size. However, Picard complains when it (inevitably) encounters a mate position that lies beyond the end of the chromosome.
Am I correct in observing this?
Currently I just throw these reads away. Is there a better way to handle it? I suppose it would be possible to sort by read name and repair the mate chromosome for these alignments.
Overall, it would be great to see better SAM compatibility in TopHat.
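For what it's worth, the "group by read name and repair the mate chromosome" idea could look roughly like the sketch below. This is only an illustration, not something TopHat or Picard provides: the function name `repair_mate_reference` is made up, it works on plain SAM text lines held in memory, and it assumes a mate can be identified as the record with the same read name, the opposite first/second-in-pair flag, and a position equal to this record's mate position. Records whose mate cannot be identified unambiguously are left untouched.

```python
from collections import defaultdict

FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
PAIRED = 0x1

def repair_mate_reference(sam_lines):
    """Return SAM lines with the mate reference field (column 7) repaired.

    Hypothetical helper: assumes tab-separated SAM records and that mates
    share a read name and carry opposite first/second-in-pair flags.
    """
    headers, records = [], []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            headers.append(line)
        else:
            records.append(line.split("\t"))

    # Group alignments by read name (column 1).
    by_name = defaultdict(list)
    for rec in records:
        by_name[rec[0]].append(rec)

    for group in by_name.values():
        for rec in group:
            flag = int(rec[1])
            if not flag & PAIRED:
                continue
            if flag & FIRST_IN_PAIR:
                mate_bit = SECOND_IN_PAIR
            elif flag & SECOND_IN_PAIR:
                mate_bit = FIRST_IN_PAIR
            else:
                continue
            # Chromosomes of candidate mates: the other end of the pair,
            # aligned at the position this record names as its mate position.
            mate_chroms = {
                m[2] for m in group
                if int(m[1]) & mate_bit and m[3] == rec[7] and m[2] != "*"
            }
            if len(mate_chroms) != 1:
                continue  # ambiguous or missing mate: leave as-is
            mate_chrom = mate_chroms.pop()
            if mate_chrom == rec[2]:
                rec[6] = "="
            else:
                rec[6] = mate_chrom
                rec[8] = "0"  # assumption: no insert size across chromosomes

    return headers + ["\t".join(rec) for rec in records]
```

Something like `open("fixed.sam", "w").write("\n".join(repair_mate_reference(open("accepted_hits.sam"))) + "\n")` would apply it to a whole file, though for 20-25 million fragments a streaming version over a name-sorted file would be kinder on memory.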
-
Identical fragments, different chromosomes, picard MarkDuplicates
Hi All,
I have RNA-seq data from ~20 samples, 2x72, Solexa, about 20-25 million fragments per sample.
When trying to run Picard's MarkDuplicates I got this error back:
Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 2278214, Read name WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0, Mate Alignment start (195002931) must be <= reference sequence length (181748087) on reference chr2
Looking at the read pair that caused this error:
grep WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0 accepted_hits.sam
WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0 113 chr1 195002931 255 72M = 3420320 0 AGAAAAAAATCCACCACCACCACCACCACCAAAAGGAACTACCCCACTGTGATGTAGGGCTGTAGAGGGGGG ###?BBB??'>=/=>2>A/AA7BB9BBBDBEGFEDEDBEDBEEFFCFDEEEEFFEDGGFGGGGGGGGGGGGG NM:i:1
WICMT-SOLEXA_100409_61E8NAAXX:2:17:3572:14759#0 177 chr2 3420320 255 72M = 195002931 0 TTTTTTTTTTCTTTGAGACAGGGTTTCTCTGTGTAGCCTTGGCTGTCCTGGAACTCACTCTGTAGACCAAGC GDEEEEDEEDGFEFGGGEGGGGGEGFGGGGGGGGGG?GGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGG NM:i:2
The problem is that I have fragments whose two ends map to different chromosomes. Here the first end maps to position 195002931 on chromosome 1, while the second end maps to chromosome 2. Because the mate reference is given as '=', the mate position 195002931 is interpreted as lying on chromosome 2, which is only 181748087 bases long, and this triggers the error.
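To see how many records are affected before running Picard, the same check can be approximated with a few lines of Python. This is only a rough sketch of the condition named in the error message, not Picard's actual validation code; the function name `find_out_of_range_mates` is made up, and it assumes the SAM file carries @SQ header lines with SN and LN tags.

```python
def find_out_of_range_mates(sam_lines):
    """List records whose mate position exceeds the mate reference length.

    Hypothetical helper: reference lengths are read from @SQ header lines,
    and '=' in the mate reference column is resolved to the record's own
    reference, which is how the position ends up on the wrong chromosome.
    """
    ref_lengths = {}
    offenders = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
                ref_lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        qname, rname, rnext, pnext = fields[0], fields[2], fields[6], int(fields[7])
        mate_ref = rname if rnext == "=" else rnext
        length = ref_lengths.get(mate_ref)
        if length is not None and pnext > length:
            offenders.append((qname, mate_ref, pnext, length))
    return offenders
```

Note that this only catches the pairs that happen to overshoot the chromosome end. Pairs with a wrong '=' whose mate position still fits within the chromosome pass this check unnoticed.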
Is there a way to tell Picard to accept these alignments? It would be good if the SAM format included the chromosome of the mate as well. Picard does not disregard other non-proper pairs.
Or should I just not use fragments whose two ends map to different chromosomes? How do you usually treat these?
Thank you,
Boel