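I am working with Illumina paired-end data with a read length (L) of 101 to generate a de novo assembly. I am trying to estimate the genome size with the formula M = N * (L-K+1)/L, where M is the peak k-mer depth, N is the base coverage depth, L is the read length and K is the k-mer length.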

I obtained the M value with Jellyfish; it is around 3591. What confuses me is that the data are paired-end, with 80284125 reads in each file, i.e. 80284125*2 reads in total. When I calculated the genome size using only 80284125*101 bases, I got an expected genome size of around 1.78 Mb.

I am not sure whether the genome size should be calculated as above, or using (80284125*101)*2 bases because the data are paired-end.
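For reference, the two alternatives can be written out as a minimal Python sketch. It solves the formula for N and then takes genome size = total bases / N, which is an assumption about how the estimate is derived (it is not spelled out above). The k-mer length is not stated above, so `K = 21` is only a placeholder, and the function and variable names are hypothetical.

```python
def estimate_genome_size(n_reads, read_len, kmer_len, kmer_peak_depth):
    """Estimate genome size from M = N * (L - K + 1) / L.

    Assumes genome size = total sequenced bases / base coverage N.
    """
    total_bases = n_reads * read_len
    # Solve M = N * (L - K + 1) / L for the base coverage N.
    base_coverage = kmer_peak_depth * read_len / (read_len - kmer_len + 1)
    return total_bases / base_coverage


READS_PER_FILE = 80284125   # reads in each paired-end file
READ_LEN = 101              # L
M = 3591                    # k-mer peak depth reported by Jellyfish
K = 21                      # ASSUMPTION: placeholder, use the k given to Jellyfish

one_file = estimate_genome_size(READS_PER_FILE, READ_LEN, K, M)
both_files = estimate_genome_size(2 * READS_PER_FILE, READ_LEN, K, M)

print(f"one file only: {one_file / 1e6:.2f} Mb")
print(f"both files:    {both_files / 1e6:.2f} Mb")
```

With M held fixed, the two-file figure is exactly twice the one-file figure, so the choice presumably comes down to which reads were actually counted to obtain M: the total bases should come from the same set of reads that went into the k-mer counting.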

Also, it would be a great help if you could tell me how to obtain the error k-mer distribution value.
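To make clear what I mean by the error k-mer distribution, here is a rough sketch of one common heuristic, which may or may not be the right approach: treat every k-mer below the first valley of the depth histogram as error-derived. It assumes a two-column "depth count" histogram such as the one printed by `jellyfish histo`; the function names are hypothetical and the valley rule is a simplification, not an established method.

```python
def parse_histo(lines):
    """Parse 'depth count' pairs from a two-column k-mer histogram."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def split_error_kmers(pairs):
    """Return (valley_depth, peak_depth, error_fraction).

    ASSUMPTION: k-mers at depths below the first local minimum of the
    histogram are sequencing errors; the main peak is the highest bin
    at or after that minimum.
    """
    if not pairs:
        raise ValueError("empty histogram")
    # Walk down the initial error slope until the counts stop decreasing.
    valley_idx = 0
    while valley_idx + 1 < len(pairs) and pairs[valley_idx + 1][1] < pairs[valley_idx][1]:
        valley_idx += 1
    valley_depth = pairs[valley_idx][0]
    peak_depth = max(pairs[valley_idx:], key=lambda p: p[1])[0]
    # Weight each bin by its depth to count k-mer occurrences, not distinct k-mers.
    total = sum(depth * count for depth, count in pairs)
    errors = sum(depth * count for depth, count in pairs[:valley_idx])
    return valley_depth, peak_depth, errors / total
```

The histogram lines could be read from a saved histogram file and passed to `parse_histo`; the `peak_depth` returned here would play the role of M in the formula above.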

