Hi,
I am working with Illumina paired-end data (read length L = 101) for a de novo assembly. I am trying to estimate the genome size using the formula M = N * (L - K + 1) / L, where M is the k-mer depth, N is the base depth of coverage, L is the read length and K is the k-mer size. The genome size is then the total number of sequenced bases divided by N.
Using Jellyfish, I found M to be around 3591. I am confused, though, because my data are paired-end, with 80284125 reads in each file, i.e. 80284125 * 2 reads in total. When I calculated the genome size using only 80284125 * 101 bases, I got an expected genome size of around 1.78 Mb.
I am not sure whether the genome size should be calculated as I did above or using (80284125 * 101) * 2 bases, since the data are paired-end.
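The arithmetic in question can be sketched as below. This is only a minimal illustration of the formula, not a definitive method: the function names are made up for this post, and K = 21 is a placeholder assumption because the k-mer size used for the Jellyfish run is not stated here. It simply evaluates both candidate totals side by side.

```python
def base_coverage(kmer_peak_depth, read_length, k):
    """Invert M = N * (L - K + 1) / L to recover the base coverage N."""
    return kmer_peak_depth * read_length / (read_length - k + 1)


def genome_size(total_bases, kmer_peak_depth, read_length, k):
    """Estimate genome size as total sequenced bases divided by N."""
    return total_bases / base_coverage(kmer_peak_depth, read_length, k)


READS_PER_FILE = 80284125   # reads in each of the two paired-end files
READ_LENGTH = 101           # L
KMER_PEAK_DEPTH = 3591      # M, as read from the Jellyfish histogram
K = 21                      # ASSUMPTION: placeholder, the real k is not given

one_file_bases = READS_PER_FILE * READ_LENGTH
both_files_bases = 2 * one_file_bases

single = genome_size(one_file_bases, KMER_PEAK_DEPTH, READ_LENGTH, K)
paired = genome_size(both_files_bases, KMER_PEAK_DEPTH, READ_LENGTH, K)

print(f"one file only: {single / 1e6:.2f} Mb")
print(f"both files:    {paired / 1e6:.2f} Mb")
```

Since M is held fixed, doubling the total bases doubles the estimate. The total should cover the same reads that were counted to obtain M, so which figure applies depends on whether the Jellyfish count was run on one file or on both.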
Also, it would be a great help if you could tell me how to obtain the error k-mer distribution value.
Regards,
Mandar