How are 'N-blocks' in a genome normally handled in local and/or global alignment computations?
I have been reading about this, but I can't seem to find what the 'best' practice is. Some tools remove all N-blocks from a chromosome before doing the alignments, while others assign a penalty whenever a nucleotide is compared against an 'N'.
As far as I understand (and I may be wrong), 'N' is an unknown nucleotide that has not been determined, so in theory it could be any base. However, if I score an 'N' as a match, then within a large N-block every read is mapped incorrectly(?) into that region, since the aligner sees an exact match with the highest possible score there.
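To make the concern concrete, here is a minimal sketch of what I mean. It is a toy, ungapped scorer written only for this question: the function names, the sequences, and the scoring values (match = 1, mismatch = -1) are all made up for illustration and are not taken from any real aligner.

```python
def ungapped_score(read, ref_window, match=1, mismatch=-1, n_score=-1):
    """Score a read against an equally long reference window, no gaps.

    n_score is the value used whenever either base is 'N'
    (hypothetical parameter, chosen for this illustration).
    """
    score = 0
    for r, g in zip(read, ref_window):
        if r == "N" or g == "N":
            score += n_score
        elif r == g:
            score += match
        else:
            score += mismatch
    return score


def best_positions(read, reference, **scoring):
    """Return the top score and every start position that reaches it."""
    k = len(read)
    scores = [
        ungapped_score(read, reference[i:i + k], **scoring)
        for i in range(len(reference) - k + 1)
    ]
    top = max(scores)
    return top, [i for i, s in enumerate(scores) if s == top]


# Toy reference: flanking sequence, a 12-base N-block, then the true locus.
reference = "ACGTTGCA" + "N" * 12 + "GATTACAG"
read = "GATTACA"

# Treating N as a match versus penalizing it like a mismatch.
n_as_match = best_positions(read, reference, n_score=1)
n_penalized = best_positions(read, reference, n_score=-1)
print(n_as_match)
print(n_penalized)
```

With 'N' scored as a match, every window lying entirely inside the N-block ties with the true location, so the placement becomes ambiguous. With 'N' penalized, only the true location keeps the top score in this toy case.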
Should N-blocks simply be trimmed out of the chromosome? Is there any other methodology to follow when aligning reads against chromosomes (assembly)?
Sorry if this is common knowledge, but it turns out that 'N-blocks' is not a good search term, since it matches almost everything except what is relevant to this topic.
Thanks.