Hello! I'm having trouble understanding what k-mer abundance is. For example, I have several reads:
1) 1000 G 16 0.0538958973815 150
2) 1000 G 24 6.78469203859e-05 136
3) 1000 G 33 9.75299480547e-05 159
4) 1000 G 23 0.0538958973815 178
....
where 1000 is the position, G is the nucleotide observed at this position, 16 (24, 33, 23) is the read quality on the Phred scale, 0.0538958973815 is the k-mer abundance, and 150 is the mapping score generated by Novoalign.
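To make the field layout concrete, here is a minimal Python sketch that parses one of these lines and converts the Phred quality to an error probability using the standard relation Q = -10 * log10(P_error). The `BaseCall` and `parse_call` names are hypothetical, and the sketch assumes each line has exactly five whitespace-separated fields, optionally preceded by a list index such as `1)`.

```python
from dataclasses import dataclass


@dataclass
class BaseCall:
    """One observed base at a reference position (hypothetical container)."""
    position: int
    base: str
    phred: int
    kmer_abundance: float
    mapping_score: int

    @property
    def error_probability(self) -> float:
        # Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)
        return 10 ** (-self.phred / 10)


def parse_call(line: str) -> BaseCall:
    """Parse a line such as '1) 1000 G 16 0.0538958973815 150'."""
    fields = line.split()
    # Drop a leading list index like "1)" if present
    if fields and fields[0].endswith(")"):
        fields = fields[1:]
    position, base, phred, abundance, score = fields
    return BaseCall(int(position), base, int(phred), float(abundance), int(score))


for row in ["1000 G 16 0.0538958973815 150", "1000 G 33 9.75299480547e-05 159"]:
    call = parse_call(row)
    print(call.position, call.base, call.phred, f"{call.error_probability:.4f}")
```

On the Phred scale a quality of 16 means roughly a 2.5% chance that the base call is wrong, while 33 means about 0.05%.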
The k-mer abundance is a metric describing the uniqueness of the 13 bp region surrounding the read. How can I calculate the k-mer abundance? And how can I separate low-quality sequencing calls from real calls using read quality, k-mer abundance and mapping score?
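The post does not say exactly how these abundance values were computed, so the following is only one plausible definition, assumed here for illustration: the frequency of the 13-mer centred on the position among all 13-mers on the forward strand of the reference. A rare k-mer then means a unique region, and a frequent one means a repeat. The function names, the 0-based position, and every threshold in `passes_filters` are hypothetical placeholders, not recommended values. Whether a lower or higher mapping score is better depends on which score Novoalign reported, so that direction is left as a parameter. For a real genome you would precompute counts with a dedicated k-mer counter (for example Jellyfish) rather than a Python `Counter`.

```python
from collections import Counter

K = 13  # window length taken from the question ("13 bp region")


def build_kmer_counts(reference: str, k: int = K) -> Counter:
    """Count every k-mer on the forward strand of `reference`."""
    reference = reference.upper()
    return Counter(reference[i:i + k] for i in range(len(reference) - k + 1))


def kmer_abundance(counts: Counter, reference: str, position: int, k: int = K) -> float:
    """Fraction of all k-mers equal to the k-mer centred on `position` (0-based).

    Hypothetical definition: higher values mean the surrounding sequence is
    more repetitive; unique k-mers give 1 / (total number of k-mers).
    """
    start = position - k // 2
    if start < 0 or start + k > len(reference):
        raise ValueError("k-mer window runs past the end of the reference")
    target = reference[start:start + k].upper()
    return counts[target] / sum(counts.values())


def passes_filters(phred: int, abundance: float, mapping_score: float,
                   min_phred: int = 20, max_abundance: float = 1e-3,
                   score_cutoff: float = 150,
                   higher_score_is_better: bool = False) -> bool:
    """Keep a call only if it clears all three (placeholder) thresholds."""
    # Assumption: the direction of the mapping score depends on the aligner output
    if higher_score_is_better:
        score_ok = mapping_score >= score_cutoff
    else:
        score_ok = mapping_score <= score_cutoff
    return phred >= min_phred and abundance <= max_abundance and score_ok


if __name__ == "__main__":
    ref = "A" * 20 + "C" + "A" * 20
    counts = build_kmer_counts(ref)
    print(kmer_abundance(counts, ref, 20))  # 1/29: the 13-mer around the C occurs once
    print(kmer_abundance(counts, ref, 6))   # 16/29: the all-A 13-mer is a repeat
    print(passes_filters(33, 9.75299480547e-05, 120))
```

With this definition, a call in a unique region gets a small abundance and passes that check, while a call inside a repeat (like the all-A run above) is flagged. In practice the cutoffs would need to be tuned on your own data, for example by checking which settings retain known variants.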