Dear all,
I have run SOAPdenovo to assemble a nematode genome.
SOAPdenovo writes a statistics file (.scafStatistics) that I do not understand. I am especially confused by the N50 values: why are there two numbers on each line?
Here is the output:
<-- Information for assembly Scaffold 'SoapOutput-SB372.scafSeq'.(cut_off_length < 100bp) -->
Size_includeN 76698637
Size_withoutN 65796073
Scaffold_Num 15259
Mean_Size 5026
Median_Size 160
Longest_Seq 1079942
Shortest_Seq 100
Singleton_Num 11712
Average_length_of_break(N)_in_scaffold 714
Known_genome_size NaN
Total_scaffold_length_as_percentage_of_known_genome_size NaN
scaffolds>100 15047 98.61%
scaffolds>500 4059 26.60%
scaffolds>1K 3004 19.69%
scaffolds>10K 688 4.51%
scaffolds>100K 216 1.42%
scaffolds>1M 1 0.01%
Nucleotide_A 18790176 24.50%
Nucleotide_C 14159122 18.46%
Nucleotide_G 14204857 18.52%
Nucleotide_T 18641918 24.31%
GapContent_N 10902564 14.21%
Non_ACGTN 0 0.00%
GC_Content 43.11% (G+C)/(A+C+G+T)
N10 488263 12
N20 319382 32
N30 235181 60
N40 182496 96
N50 138908 144
N60 102282 210
N70 73846 298
N80 43901 429
N90 5795 899
NG50 NaN NaN
N50_scaffold-NG50_scaffold_length_difference NaN
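As a quick check that I am at least reading the scaffold section correctly, the size, gap and GC figures above look internally consistent: Size_includeN minus Size_withoutN equals GapContent_N, and A+C+G+T equals Size_withoutN. Here is a short Python sketch of that check. The numbers are copied from the output; `gc_percent` is a hypothetical helper name, and it assumes the (G+C)/(A+C+G+T) formula that the report itself prints.

```python
# Values copied from the scaffold section of the .scafStatistics output above.
size_include_n = 76698637
size_without_n = 65796073
gap_content_n = 10902564
counts = {"A": 18790176, "C": 14159122, "G": 14204857, "T": 18641918}


def gc_percent(base_counts):
    """GC content as (G+C)/(A+C+G+T), the formula printed in the report."""
    acgt = sum(base_counts[b] for b in "ACGT")
    return 100.0 * (base_counts["G"] + base_counts["C"]) / acgt


# The gap between the two size figures should be exactly the number of Ns.
n_from_sizes = size_include_n - size_without_n
print(n_from_sizes == gap_content_n)              # True
print(sum(counts.values()) == size_without_n)     # True
print(f"{gc_percent(counts):.2f}%")               # 43.11%
```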
<-- Information for assembly Contig 'SoapOutput-SB372.contig'.(cut_off_length < 100bp) -->
Size_includeN 66764916
Size_withoutN 66764916
Contig_Num 69780
Mean_Size 956
Median_Size 458
Longest_Seq 33978
Shortest_Seq 100
Contig>100 69392 99.44%
Contig>500 33098 47.43%
Contig>1K 20004 28.67%
Contig>10K 138 0.20%
Contig>100K 0 0.00%
Contig>1M 0 0.00%
Nucleotide_A 19146203 28.68%
Nucleotide_C 14420728 21.60%
Nucleotide_G 14387230 21.55%
Nucleotide_T 18810755 28.17%
GapContent_N 0 0.00%
Non_ACGTN 0 0.00%
GC_Content 43.15% (G+C)/(A+C+G+T)
N10 6141 779
N20 4338 2094
N30 3326 3858
N40 2586 6144
N50 2011 9076
N60 1536 12880
N70 1122 17959
N80 755 25179
N90 410 37034
NG50 NaN NaN
N50_contig-NG50_contig_length_difference NaN
Number_of_contigs_in_scaffolds 58068
Number_of_contigs_not_in_scaffolds(Singleton) 11712
Average_number_of_contigs_per_scaffold 16.4
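My guess, which I have not been able to confirm in the SOAPdenovo documentation, is that the first number on each Nx line is the Nx length and the second is the number of sequences needed to reach x% of the total length (what other tools call L50, L90, etc.). The figures seem to fit: 144 scaffolds of at least 138,908 bp is compatible with 216 scaffolds being longer than 100K, and 9,076 contigs of at least 2,011 bp is compatible with 20,004 contigs being longer than 1K. Below is a minimal Python sketch of that interpretation. The function name `nx_stats` and the toy lengths are hypothetical, and I am assuming the total used for the percentage is simply the sum of all sequence lengths (for scaffolds I do not know whether SOAPdenovo counts the Ns).

```python
from typing import List, Tuple


def nx_stats(lengths: List[int], x: int = 50) -> Tuple[int, int]:
    """Return (Nx length, number of sequences needed to reach it).

    Sequences are sorted longest first and their lengths summed until the
    running total reaches x% of the total length. Nx is the length of the
    sequence at which this happens; the count is how many sequences were
    needed (often called Lx, e.g. L50).
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for i, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, i
    raise AssertionError("unreachable: the full sum always meets the threshold")


# Toy example: total = 1500, so N50 needs a running sum of at least 750.
toy = [100, 200, 300, 400, 500]
print(nx_stats(toy, 50))  # (400, 2): 500 + 400 = 900 >= 750
print(nx_stats(toy, 90))  # (200, 4): 500 + 400 + 300 + 200 = 1400 >= 1350
```

If this reading is right, "N50 138908 144" would mean that the 144 longest scaffolds, each at least 138,908 bp long, together make up half of the assembly.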
I have looked all over for the answer but didn't manage to find it.
All the best,
Sophie