We have sequenced clonal populations of 20 bacterial lines.
Sequencing was done on an Illumina platform in two batches.
For the first run, 10 lines were chosen and each was sequenced at 40x coverage.
In the second run, the other 10 lines were sequenced at 150x coverage.
I then performed SNV detection on all 20 samples using three different tools: breseq, MAQ and VarScan. Each line was compared against a near-isogenic E. coli reference.
The issue at hand is:
In the lines sequenced at 40x, SNVs are detected at 100% frequency. In other words, whenever a variant is detected in a particular line, it has 100% frequency. The result is therefore binary: either the line has an SNV at 100% frequency or it doesn't.
But this is not the case with the high-coverage (>150x) samples. Here I see a lot of marginal predictions, i.e. the variants called have frequencies ranging from 10% to 99%.
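To be explicit about what I mean by "frequency": it is the fraction of reads covering a position that support the variant base. Below is a minimal Python sketch of that definition. The function names and the read counts are hypothetical and only for illustration; they are not taken from my data or from the output of any of the three tools.

```python
def variant_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a position that support the variant base."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / depth


def classify_call(ref_reads: int, alt_reads: int) -> str:
    """Label a call 'fixed' if every read supports the variant, else 'marginal'."""
    # Assumption: 'fixed' means exactly 100% frequency, as in the 40x samples.
    return "fixed" if variant_frequency(ref_reads, alt_reads) == 1.0 else "marginal"


# Hypothetical read counts, chosen only to illustrate the two situations.
print(classify_call(ref_reads=0, alt_reads=40))     # a 40x-style call
print(classify_call(ref_reads=135, alt_reads=15))   # a 150x-style call at 10%
```

In these terms, every call in the 40x samples comes out as "fixed", while many calls in the high-coverage samples come out as "marginal".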
This is surprising because the bacteria in a single sample are clonal, so the chances of multiple variants coexisting in it are very low.
I have checked the mapping quality of the reads in the pileup and it looks good.
Any suggestions will be highly appreciated.