Hi forum,
I am reading the paper "Fast Computation and Applications of Genome Mappability". I'm on page 8 and a few things have started to puzzle me:
Question 1
Where do the k-mers come from?
I mean, I could just download the reference genome and cut out chr1:1-100, and then I have my 100-mer for position i (here i = 1), right?
But sometimes I get the impression that the authors are talking about k-mers that come from sequencing experiments, and that makes me wonder whether I am misunderstanding something.
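To make Question 1 concrete, here is a minimal Python sketch of how I picture it, assuming the k-mers are simply cut out of the reference sequence. The function name `kmer_at` and the toy sequence are made up for illustration; a real chromosome would be read from a FASTA file.

```python
def kmer_at(reference: str, i: int, k: int) -> str:
    """Return the k-mer that starts at 1-based position i of the reference."""
    start = i - 1  # convert the 1-based genomic position to a 0-based index
    if start < 0 or start + k > len(reference):
        raise ValueError("k-mer would run off the end of the sequence")
    return reference[start:start + k]


# Toy stand-in for chr1 (hypothetical data, not a real sequence).
toy_chr1 = "ACGTACGTTTGACCA"

# The analogue of "cut out chr1:1-100" with k = 4 instead of 100.
print(kmer_at(toy_chr1, 1, 4))
```

If this picture is right, no sequencing reads are involved at all: every position of the reference yields exactly one k-mer.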
Question 2
Mappability values are based on k-mers that start at "position = i". So if my mappability score for position i is 1 (= unique), it does not at all mean that this position cannot be part of a repeat, right?
Because: if, for example, the 100-mer that starts at position i-50 (and thus includes position i) has multiple matches, then position i IS part of a repeat.
But this contradicts my idea of a mappability score: I thought the whole point of a mappability score of "position = i" for k=100 was exactly this: to indicate whether (or how likely it is that) position i is part of a repeat!
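Here is a toy Python example of the situation I mean in Question 2. It is only a sketch under my own assumptions: exact matches only, forward strand only, and mappability taken as 1 divided by the number of occurrences of the k-mer. The per-base summary in `covering_mappability` (the mean over all k-mers covering a base) is my own guess at what a "per-position" score could look like, not necessarily the paper's definition. Indices in the code are 0-based.

```python
from collections import Counter


def start_mappability(reference, k):
    """Score each k-mer start: 1 / (number of exact occurrences in reference)."""
    kmers = [reference[s:s + k] for s in range(len(reference) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[km] for km in kmers]


def covering_mappability(reference, k):
    """For each base, average the start scores of all k-mers that cover it."""
    m = start_mappability(reference, k)
    result = []
    for pos in range(len(reference)):
        first = max(0, pos - k + 1)   # leftmost k-mer start that still covers pos
        last = min(pos, len(m) - 1)   # rightmost valid k-mer start covering pos
        window = m[first:last + 1]
        result.append(sum(window) / len(window))
    return result


# Toy reference in which the 4-mer "ACGT" occurs twice (starts 1 and 7).
ref = "GACGTTCACGTA"
m = start_mappability(ref, 4)
c = covering_mappability(ref, 4)

# Index 2 starts the unique 4-mer "CGTT", yet it lies inside the repeated "ACGT".
print(m[2], round(c[2], 3))
```

In this toy case the start-based score at index 2 is 1.0, while the averaged score over the covering k-mers is below 1, which is exactly the discrepancy I am asking about.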
Now, on page 8, pileup mappability comes into play. I have the impression that it addresses exactly the problem I have described, but I'm not sure, for the following reason.
I do not understand the sentence that introduces pileup mappability: "Before proceeding, we observe that our definition of mappability should be refined if we are to deal with pileups".
What does "if we are to deal with pileups" mean? I do not get what they are referring to.
The sentence reads as if a reason to define pileup mappability has suddenly occurred (what is this reason, and why wasn't it there from the beginning?). To me, the reason was there from the beginning, and there never was a reason to define mappability via k-mers that only start at position i.
It seems like "the reason" has something to do with heterozygosity, but I do not see the link.
I hope I've made clear where my problem of understanding lies.