

Ambiguous bases in Illumina sequence






    I have used a MiSeq for several years for bacterial whole-genome resequencing.

    In my most recent run, I found a section of my read mapping in which some fraction of the reads had a specific ambiguous base: not an 'N', but a 'K' in this case. (K corresponds to G or T.) FWIW, I typically trim my reads to remove the ends that have quality scores below Q30.
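
    For anyone who wants to check whether their own raw reads contain such codes, here is a minimal Python sketch, not a tested pipeline. It assumes plain, uncompressed 4-line FASTQ records; the function names `find_ambiguous_bases` and `scan_fastq_lines` are made up for illustration. The lookup table is the standard IUPAC ambiguity code set, with N deliberately left out.

    ```python
    # Standard IUPAC nucleotide ambiguity codes. N and the plain bases
    # (A, C, G, T) are left out on purpose, so only codes like K are reported.
    IUPAC_AMBIGUOUS = {
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    }


    def find_ambiguous_bases(seq):
        """Return (0-based position, code, possible bases) for each ambiguity code."""
        return [
            (i, base, IUPAC_AMBIGUOUS[base])
            for i, base in enumerate(seq.upper())
            if base in IUPAC_AMBIGUOUS
        ]


    def scan_fastq_lines(lines):
        """Yield (read_name, hits) for reads that contain ambiguity codes.

        Assumption: `lines` holds simple 4-line FASTQ records
        (header, sequence, '+', qualities) with no wrapped sequences.
        """
        lines = [ln.rstrip("\n") for ln in lines]
        for i in range(0, len(lines) - 3, 4):
            name, seq = lines[i], lines[i + 1]
            hits = find_ambiguous_bases(seq)
            if hits:
                yield name[1:], hits  # drop the leading '@' from the header
    ```

    To use it, pass in the lines of a FASTQ file, e.g. `list(scan_fastq_lines(open("reads.fastq")))`, where `reads.fastq` is a placeholder path.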

    I have never encountered this type of ambiguity before, so I think it represents something unusual.

    As I interpret this, it means that both G and T nucleotides were read at this position within individual clusters. Since each cluster is seeded by a single DNA molecule, a cluster should not be able to read as both G and T at this position...

    The region where this was located was a little unusual in that it was a short region of duplicated sequence (~100 nt) with an imperfect homology at the ends. The ambiguous K nucleotide is at the site where the homology is not perfect and one end should have a G and the other should have a T.

    So my question is whether something during cluster generation could produce this ambiguity at this position. Is it possible that template switching during cluster generation could account for it, particularly considering the long homology present in this molecule? Has anyone else seen something like this before?

    Thank you :-)