Hi everyone, how can we estimate genome size when there is no reference genome, and no genome from a species closely related enough to the sample?
One genome size estimation method, used by BGI when assembling the giant panda genome, relies on 17-mer frequencies. I don't quite follow their reasoning; could anyone help explain it?
From their supplementary material: "Distribution of 17-mer frequency in the raw sequencing reads. We used all reads from the short insert-size libraries (<500bp). The peak depth is at 15X. The peak of 17-mer frequency (M) in reads is correlated with the real sequencing depth (N), read length (L), and kmer length (K), their relations can be expressed in a experienced formula: M = N * (L – K + 1) / L. Then, we divided the total sequence length by the real sequencing depth and obtained an estimated the genome size of 2.46 Gb."
For reference, the paper is titled "The sequence and de novo assembly of the giant panda genome".
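To make the question concrete, here is how I read the quoted calculation, as a minimal Python sketch: rearrange M = N * (L - K + 1) / L to get the real depth N, then divide the total sequence length by N. The function names and the input values in the usage line are hypothetical, chosen only for illustration; they are not the figures from the paper, which gives the peak depth (15X) and K (17) in the quoted passage but not the read length or total sequence length.

```python
def sequencing_depth(peak_kmer_depth, read_length, k):
    """Solve M = N * (L - K + 1) / L for the real sequencing depth N."""
    kmers_per_read = read_length - k + 1
    if kmers_per_read <= 0:
        raise ValueError("read_length must be at least k")
    return peak_kmer_depth * read_length / kmers_per_read


def estimate_genome_size(total_bases, peak_kmer_depth, read_length, k):
    """Genome size = total sequence length / real sequencing depth."""
    depth = sequencing_depth(peak_kmer_depth, read_length, k)
    return total_bases / depth


# Hypothetical inputs for illustration only (not taken from the paper).
size = estimate_genome_size(
    total_bases=1_000_000_000,  # assumed total length of all reads, in bases
    peak_kmer_depth=15,         # M: peak of the k-mer frequency histogram
    read_length=100,            # L: assumed uniform read length
    k=17,                       # K: k-mer length
)
print(f"Estimated genome size: {size:,.0f} bp")
```

If this reading is right, it is algebraically the same as dividing the total number of k-mers in the reads by the peak depth M, since every read of length L contributes L - K + 1 k-mers. It also assumes all reads have the same length and ignores sequencing errors and repeats.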