**Brian Bushnell** · 06-15-2015, 12:17 PM

Just store kmers and counts, and look for branches. When you have a kmer X, there are 4 possible next kmers. If Y is the last K-1 bases of X, then the 4 possible next kmers are:
YA
YC
YG
YT
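As a minimal sketch of that step (Python is my choice here, and the function name `next_kmers` is hypothetical, not from any particular tool), the 4 candidates can be generated by dropping the first base of X and appending each possible base:

```python
def next_kmers(kmer):
    """Return the 4 possible successors of `kmer`.

    Y is the last K-1 bases of the kmer; each successor is Y plus one base.
    """
    suffix = kmer[1:]  # Y: the last K-1 bases of X
    return [suffix + base for base in "ACGT"]


# Example with K=5: X = "ACGTA", so Y = "CGTA"
print(next_kmers("ACGTA"))
```

This ignores reverse complements; a real tool would usually also consider the canonical form of each kmer.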

So, starting from X, just look up the 4 possible next kmers and get their counts. You'll end up with something like this:

YA 0
YC 100
YG 0
YT 1

In that case, you have a branch - the next base should be C or T. But T only has a count of 1, so it's probably C; with 101x coverage of a region, it's not unlikely to have 100 correct copies and 1 error.
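A rough sketch of that lookup, assuming the counts live in an ordinary Python dict keyed by kmer string (the names `successor_counts` and `is_branch` are made up for illustration):

```python
def successor_counts(kmer, counts):
    """Look up the counts of the 4 possible next kmers of `kmer`.

    `counts` maps kmer -> number of occurrences; absent kmers count as 0.
    """
    suffix = kmer[1:]  # Y: the last K-1 bases of X
    return {suffix + base: counts.get(suffix + base, 0) for base in "ACGT"}


def is_branch(kmer, counts):
    """True if more than one possible next kmer was actually observed."""
    observed = [c for c in successor_counts(kmer, counts).values() if c > 0]
    return len(observed) > 1


# Toy data mirroring the example above: YC seen 100 times, YT seen once.
counts = {"CGTC": 100, "CGTT": 1}
print(successor_counts("ACGT", counts))
print(is_branch("ACGT", counts))
```

With these toy counts, the C and T extensions are both present, so X is reported as a branch point; deciding which extension to trust is a separate step.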

However, it also might not be an error (though it probably is). It could be that there are 100 times as many YC kmers in the genome as YT kmers, and both are correct. How you decide whether or not this is an error is up to your heuristics. But regardless, the optimal implementation is generally to store kmer counts in a way that lets them be looked up quickly (generally by hashing), and to store them only for kmers that actually occur.
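For illustration only, here is one simple way to do the hashed storage in Python, using `collections.Counter` (a dict subclass, so lookups are hash-based). The function name `count_kmers` is hypothetical, and the sketch skips things a real tool would handle, such as reverse complements and memory-efficient encodings:

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every kmer of length `k` that occurs in `reads`.

    Only kmers that are actually observed get an entry in the table.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


counts = count_kmers(["ACGTAC", "CGTACG"], 4)
print(counts["CGTA"])    # looked up by hash
print(counts["AAAA"])    # never observed: Counter returns 0 without storing it
print("AAAA" in counts)  # still not stored after the lookup
```

Because `Counter` returns 0 for a missing key without inserting it, looking up the 4 possible next kmers does not grow the table.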

If you decide YT is a single substitution error, then for reads containing YT (which in this case would be only 1 read), you would replace the T with a C.
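A very simplified sketch of that replacement, assuming reads are plain strings and that you have already decided which kmer is the error and which base is correct (the name `correct_read` is hypothetical):

```python
def correct_read(read, bad_kmer, good_base):
    """Replace the last base of each occurrence of `bad_kmer` in `read`.

    Reads that do not contain `bad_kmer` are returned unchanged.
    """
    fixed_kmer = bad_kmer[:-1] + good_base
    return read.replace(bad_kmer, fixed_kmer)


# YT = "CGTT" was judged to be an error; the T should have been a C.
print(correct_read("AACGTTGG", "CGTT", "C"))
print(correct_read("AACGTCGG", "CGTT", "C"))  # no YT present, unchanged
```

Note that fixing the base also changes the other kmers that overlap it, so in practice the counts would need to be updated or the read re-checked afterwards.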

**ArtificialBreeze** · 06-15-2015, 12:26 PM

Oh, I didn't think of that. It's true that this works, since we have the left and right (k-1)-mers, which I completely forgot.
Thanks very much!


de novo assembly with de Bruijn colored graphs
