Hi all,
I have tried to use the formulas below to estimate genome size:
Coverage = (peak k-mer depth) × L / (L - K + 1)
Genome size = total_bases / Coverage
(L is the read length and K is the k-mer size.)
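
To make the calculation concrete, here is a minimal Python sketch of the two formulas. The function name `estimate_genome_size` and the example numbers are made up for illustration and are not from a real dataset.

```python
def estimate_genome_size(total_bases, peak_kmer_depth, read_length, k):
    """Estimate base coverage and genome size from a k-mer depth peak.

    All parameter names are illustrative. read_length is L and k is K
    in the formulas above.
    """
    if k > read_length:
        raise ValueError("k must not exceed the read length")
    # Each read of length L contains (L - K + 1) k-mers, so the k-mer
    # depth peak is scaled by L / (L - K + 1) to get base coverage.
    coverage = peak_kmer_depth * read_length / (read_length - k + 1)
    genome_size = total_bases / coverage
    return coverage, genome_size


# Hypothetical example: 100 bp reads, K = 21, k-mer peak at depth 80,
# 50 Gb of sequence in total.
coverage, genome_size = estimate_genome_size(
    total_bases=50_000_000_000, peak_kmer_depth=80, read_length=100, k=21
)
print(f"coverage = {coverage:.1f}X, genome size = {genome_size / 1e6:.0f} Mb")
```

With these made-up inputs, the k-mer peak of 80 corresponds to 100X base coverage and a genome size of about 500 Mb.
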
But for some datasets I get a number of duplicated reads. Should I exclude these duplicated reads from the dataset before k-mer analysis?
The genome contains more than 70% repeats. At 100X depth, how many duplicated reads would be expected?