Hi Everyone:
I have a conceptual question regarding "strand-discordant loci." In the local realignment video tutorial posted by GATK (http://www.broadinstitute.org/videos...ed-realignment), there was a slide on this concept in which an artefactual SNP cluster appears biased towards the ends of either forward- or reverse-strand reads because of a nearby indel. It is not clear to me why this strand bias arises, but GATK uses the information to tag the region as suspicious.
Let's say we have two identical reads (same sequence) that map to the edge of an indel: one is read from the forward strand (sequencing towards the indel, say) and the other from the reverse strand (sequencing away from the indel, say). Are these two reads not equally likely to map to the same position at the indel? Or does it have something to do with the fact that bases at the ends of reads tend to have lower qualities (and thus a greater chance of being aligned across the indel region with artefactual mismatches)?
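To make the scenario concrete, here is a toy sketch of the read-end effect I have in mind. It assumes a simple match/mismatch/affine-gap scoring scheme with made-up values, a made-up reference, and a 3-base deletion in the sample; none of this is taken from GATK or any particular aligner, and it says nothing about strand by itself. It only compares the score of placing a read without gaps (which yields mismatches past the indel) against the score of the correct gapped alignment, as a function of how many read bases extend past the deletion.

```python
# Hypothetical toy scores, not those of any specific aligner.
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND = 1, 4, 6, 1

# Made-up reference; the sample carries a 3-base deletion at position 10.
REF = "GATTACAGGCTTCAGTCCATG"
DEL_START, DEL_LEN = 10, 3
SAMPLE = REF[:DEL_START] + REF[DEL_START + DEL_LEN:]


def ungapped_score(read, pos):
    """Score the read placed on REF at pos with no gaps allowed."""
    score = 0
    for i, base in enumerate(read):
        score += MATCH if REF[pos + i] == base else -MISMATCH
    return score


def gapped_score(read):
    """Score the true alignment: every base matches, one deletion is opened.

    Assumes the read actually spans the deletion.
    """
    return len(read) * MATCH - (GAP_OPEN + GAP_EXTEND * DEL_LEN)


def compare(start, overhang):
    """Return (ungapped, gapped) scores for a sample read that begins at
    `start` and extends `overhang` bases past the deletion breakpoint."""
    read = SAMPLE[start:DEL_START + overhang]
    return ungapped_score(read, start), gapped_score(read)


if __name__ == "__main__":
    for overhang in (1, 2, 6):
        print(overhang, compare(2, overhang))
```

With these toy values, a read that extends only one base past the deletion scores higher ungapped (one artefactual mismatch) than with the gap opened, whereas a read extending six bases past it scores higher with the gap. What I do not see is why this would differ between forward and reverse reads.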
Regardless of this strand discordance, I understand that a high SNP density is also a major flag for "suspicious" regions requiring realignment, but I'd like to understand where the strand bias comes from.
Thanks,
MC