I'm a computer scientist, so assume I have zero biology knowledge beyond what a nucleobase is. I'm currently studying KMC 2 (K-mer Counter 2). The algorithm takes as input a FASTA or FASTQ file and a parameter k, and counts the frequency of each k-mer (each substring of length k).
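For reference, the naive computation that KMC 2 performs at scale is just a frequency table over every length-k substring. Here is a minimal Python sketch of that baseline. `count_kmers` is a hypothetical name, and the sketch ignores reverse complements, which I believe KMC 2 merges by default:

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads (no reverse-complement merging)."""
    counts = Counter()
    for read in reads:
        # A read of length L contains L - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


counts = count_kmers(["ACGTACGTAC"], 4)
print(counts.most_common(3))
```

This keeps every distinct k-mer in memory at once, which is exactly what KMC 2's disk-based binning is designed to avoid.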
Can someone here explain how this algorithm works? Several parts of it are unclear to me:
- Minimizers and signatures seem to do the same thing; what is the actual difference?
- What are super k-mers?
- What is the x parameter in (k, x)-mers used for?
- What exactly is the criterion for splitting into the subarrays R_0, R_1 {R_A, R_C, R_G, R_T}?
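Here is my current reading of the first question, written as a Python sketch so it can be corrected. I am assuming, from the KMC 2 paper, that a signature is the smallest *canonical* p-mer of a k-mer among those that do not start with AAA or ACA and do not contain AA anywhere except at their start, while a plain minimizer is simply the smallest p-mer. The stated motivation, as I understand it, is that plain minimizers like AAA... send a disproportionate share of k-mers into a few bins. All function names and the fallback rule below are hypothetical:

```python
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(s):
    """Reverse complement of a DNA string."""
    return "".join(COMP[c] for c in reversed(s))


def canonical(s):
    """Lexicographically smaller of a string and its reverse complement."""
    return min(s, revcomp(s))


def allowed_signature(m):
    # Assumed KMC 2 rule: reject p-mers starting with AAA or ACA,
    # or containing AA anywhere other than position 0.
    if m.startswith("AAA") or m.startswith("ACA"):
        return False
    return "AA" not in m[1:]


def minimizer(kmer, p):
    """Plain minimizer: lexicographically smallest length-p substring."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))


def signature(kmer, p):
    """Smallest allowed canonical p-mer; falls back to the smallest
    canonical p-mer if none is allowed (hypothetical fallback)."""
    cands = [canonical(kmer[i:i + p]) for i in range(len(kmer) - p + 1)]
    ok = [c for c in cands if allowed_signature(c)]
    return min(ok) if ok else min(cands)


print(minimizer("AAAACGTT", 3), signature("AAAACGTT", 3))
```

On the k-mer AAAACGTT with p = 3, the plain minimizer is AAA, while the signature skips AAA and picks AAC.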
An explanation, ideally with a simple example (a read of length 20 and k ≈ 6), would be great.
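My understanding of super k-mers, sketched on such an example: walk over the consecutive k-mers of a read and cut whenever the minimizer (or signature) changes. Each maximal run of n k-mers sharing one minimizer is stored once, as a substring of length k + n - 1, and that minimizer decides which bin it goes to. The read, the minimizer length p = 3, and the function names are all hypothetical. The sketch uses the plain lexicographic minimizer, and a signature function with the same `(kmer, p)` arguments could be passed as `key` instead:

```python
def minimizer(kmer, p):
    """Plain minimizer: lexicographically smallest length-p substring."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))


def split_super_kmers(read, k, p, key):
    """Group consecutive k-mers sharing key(kmer, p) into super k-mers.

    Returns (key, substring) pairs; a run of n k-mers becomes one
    substring of length k + n - 1. Assumes len(read) >= k.
    """
    out = []
    start = 0
    cur = key(read[0:k], p)
    for i in range(1, len(read) - k + 1):
        s = key(read[i:i + k], p)
        if s != cur:
            # Run covers k-mers start..i-1; the last one ends at i - 1 + k.
            out.append((cur, read[start:i - 1 + k]))
            start, cur = i, s
    out.append((cur, read[start:]))
    return out


read = "ACGTTGCATGCAAGTCCATG"  # made-up 20-base read
for m, sk in split_super_kmers(read, k=6, p=3, key=minimizer):
    print(m, sk, len(sk) - 6 + 1, "k-mers")
```

Consecutive super k-mers overlap by k - 1 bases, so no k-mer is lost; the saving comes from long runs, which become more frequent as k grows relative to p.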