  • Weird stats output of SOAP de Novo

    Dear all,

    I have run SOAP de Novo to assemble a nematode genome.
    SOAP de Novo produced a stats file, .scafStatistics, which I do not understand. I am especially confused about the N50 values: why are there two of them?

    Here is the output:

    <-- Information for assembly Scaffold 'SoapOutput-SB372.scafSeq'.(cut_off_length < 100bp) -->

    Size_includeN 76698637
    Size_withoutN 65796073
    Scaffold_Num 15259
    Mean_Size 5026
    Median_Size 160
    Longest_Seq 1079942
    Shortest_Seq 100
    Singleton_Num 11712
    Average_length_of_break(N)_in_scaffold 714

    Known_genome_size NaN
    Total_scaffold_length_as_percentage_of_known_genome_size NaN

    scaffolds>100 15047 98.61%
    scaffolds>500 4059 26.60%
    scaffolds>1K 3004 19.69%
    scaffolds>10K 688 4.51%
    scaffolds>100K 216 1.42%
    scaffolds>1M 1 0.01%

    Nucleotide_A 18790176 24.50%
    Nucleotide_C 14159122 18.46%
    Nucleotide_G 14204857 18.52%
    Nucleotide_T 18641918 24.31%
    GapContent_N 10902564 14.21%
    Non_ACGTN 0 0.00%
    GC_Content 43.11% (G+C)/(A+C+G+T)

    N10 488263 12
    N20 319382 32
    N30 235181 60
    N40 182496 96
    N50 138908 144
    N60 102282 210
    N70 73846 298
    N80 43901 429
    N90 5795 899

    NG50 NaN NaN
    N50_scaffold-NG50_scaffold_length_difference NaN

    <-- Information for assembly Contig 'SoapOutput-SB372.contig'.(cut_off_length < 100bp) -->

    Size_includeN 66764916
    Size_withoutN 66764916
    Contig_Num 69780
    Mean_Size 956
    Median_Size 458
    Longest_Seq 33978
    Shortest_Seq 100

    Contig>100 69392 99.44%
    Contig>500 33098 47.43%
    Contig>1K 20004 28.67%
    Contig>10K 138 0.20%
    Contig>100K 0 0.00%
    Contig>1M 0 0.00%

    Nucleotide_A 19146203 28.68%
    Nucleotide_C 14420728 21.60%
    Nucleotide_G 14387230 21.55%
    Nucleotide_T 18810755 28.17%
    GapContent_N 0 0.00%
    Non_ACGTN 0 0.00%
    GC_Content 43.15% (G+C)/(A+C+G+T)

    N10 6141 779
    N20 4338 2094
    N30 3326 3858
    N40 2586 6144
    N50 2011 9076
    N60 1536 12880
    N70 1122 17959
    N80 755 25179
    N90 410 37034

    NG50 NaN NaN
    N50_contig-NG50_contig_length_difference NaN

    Number_of_contigs_in_scaffolds 58068
    Number_of_contigs_not_in_scaffolds(Singleton) 11712
    Average_number_of_contigs_per_scaffold 16.4

    I have looked all over for the answer but didn't manage to find it.

    All the best,
    Sophie

  • #2
    The first one is the scaffold N50, the second is the contig N50. Look for
    <-- Information for assembly Scaffold 'SoapOutput-SB372.scafSeq'.(cut_off_length < 100bp) -->
    and
    <-- Information for assembly Contig 'SoapOutput-SB372.contig'.(cut_off_length < 100bp) -->
    to see which type of sequence each block of statistics refers to.
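
    Each N10 to N90 line also carries two numbers. A minimal Python sketch of how such a pair can be computed is below. It assumes the first number is the Nx length and the second is the count of longest sequences needed to reach that fraction of the total size; that reading fits the output above but is an assumption, not something confirmed in this thread. The function name nx_stats and the toy lengths are hypothetical.

```python
def nx_stats(lengths, fraction=0.5):
    """Return (Nx length, number of sequences needed to reach it).

    Assumed definition: sort sequences from longest to shortest and walk
    down the list until the running total reaches `fraction` of the
    summed length. The length of the sequence that crosses the threshold
    is the Nx value; its 1-based rank is the count.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example with made-up lengths (total = 300).
toy_lengths = [100, 80, 60, 40, 20]
n50_length, n50_count = nx_stats(toy_lengths, 0.5)
n90_length, n90_count = nx_stats(toy_lengths, 0.9)
print("N50", n50_length, n50_count)
print("N90", n90_length, n90_count)
```

    Running the same function once on the scaffold lengths and once on the contig lengths would give two separate N50 lines, one per block, as in the stats file.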
