I use mpileup files to call variants for one subsection of the genome at a time, using a set of custom Python/awk scripts. This works well for uniquely mapping reads, but if I have reads that map to the genome more than once, this information gets lost during mpileup generation.
I have written some scripts to count the 'mappability' of a given read, as my mapper of choice, bowtie, does not report this in the NH or IH tags, but I am struggling a bit to incorporate this 'correction factor' into mpileups.
From what I can make out in the manuals, samtools might not be able to do this. Am I missing something, or is there perhaps a different pileup-like format I could use to extract 'normalised' coverage and variants?
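To make the 'correction factor' concrete, here is a minimal sketch of roughly what I mean by 'normalised' coverage. It is not my actual scripts: the names `weighted_coverage`, `alignments` and `hit_counts` are made up for illustration, and the 1/N weighting (each alignment of a read that maps to N places counts as 1/N) is just one assumption about how the correction could be applied. Coordinates are assumed to be 0-based, half-open intervals on a single reference.

```python
from collections import defaultdict


def weighted_coverage(alignments, hit_counts):
    """Return {position: depth}, where each alignment adds 1/N to every
    position it covers and N is the number of loci its read maps to.

    alignments: iterable of (read_name, start, end), 0-based half-open.
    hit_counts: dict mapping read_name -> number of mapping locations
                (hypothetical output of a separate mappability script).
    """
    depth = defaultdict(float)
    for read_name, start, end in alignments:
        # Reads missing from hit_counts are treated as uniquely mapping.
        weight = 1.0 / hit_counts.get(read_name, 1)
        for pos in range(start, end):
            depth[pos] += weight
    return dict(depth)


# Toy data: r1 maps once, r2 maps to two places.
alignments = [("r1", 0, 4), ("r2", 2, 6), ("r2", 10, 14)]
hit_counts = {"r1": 1, "r2": 2}
coverage = weighted_coverage(alignments, hit_counts)
```

In this toy case the positions covered only by `r1` get a depth of 1.0, while positions covered only by one of the `r2` alignments get 0.5. What I cannot see is how to get this kind of per-read weight carried through into an mpileup-style output.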