Here is the answer to your question, which I found in a helpful mpileup tutorial (link): “We don’t want to trust SNPs at sites with super high coverage, because they might represent variation between variable copy number repeats, i.e., the reads that map to this location in the reference are actually from duplicated sites in your sample; you can, and should, change this parameter based on the kind of coverage you have in your dataset, e.g., -D500.”
What I'm wondering is how exactly you figure out what number to use? Is there any rule of thumb for what to do with your coverage information to know how many reads are too many?
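For what it's worth, here is a rough sketch of one way to turn coverage information into a cutoff. It is not from the tutorial quoted above. It assumes a heuristic I have seen used, where the cap is set at the mean depth plus a few times its square root (mean + k * sqrt(mean), with k around 3 or 4), and it assumes you already have per-site depths in three tab-separated columns (chromosome, position, depth), such as `samtools depth` prints for a single BAM file. The function names and the default `k` are made up for illustration, so treat the result as a starting point to sanity-check against your own depth distribution.

```python
import math


def parse_depth_lines(lines):
    """Yield the depth from tab-separated 'chrom, pos, depth' lines.

    Assumes the three-column layout described above; lines with fewer
    than three fields are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            yield int(fields[2])


def suggest_max_depth(depths, k=4.0):
    """Return a hypothetical depth cap: mean + k * sqrt(mean), rounded up.

    k is an assumed tuning knob, not a value taken from the tutorial.
    """
    depths = list(depths)
    if not depths:
        raise ValueError("no depth values given")
    mean_depth = sum(depths) / len(depths)
    return math.ceil(mean_depth + k * math.sqrt(mean_depth))


if __name__ == "__main__":
    # Toy input standing in for real per-site depth output.
    example = ["chr1\t1\t25\n", "chr1\t2\t25\n", "chr1\t3\t25\n"]
    print(suggest_max_depth(parse_depth_lines(example)))
```

You would then use the printed number in place of the 500 in the tutorial's example. With very uneven coverage (exome or amplicon data, for instance) a mean-based cap like this is probably too crude, and looking at a histogram of depths is a safer way to pick the number.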
Hope you figured your stuff out.
