
  • N50 statistic & L50 statistic

    Can anyone tell me the difference between N50 and L50 as used in genome assembly?
    Thank you!

  • #2
    Originally posted by moolder
    Can anyone tell me the difference between N50 and L50 as used in genome assembly?
    Thank you!
    There are different definitions...

    Nature Reviews Genetics 13, 329-342 (May 2012) | doi:10.1038/nrg3174
    A beginner's guide to eukaryotic genome annotation
    Mark Yandell & Daniel Ence


    Genome assemblies are composed of scaffolds and contigs. Contigs are contiguous consensus sequences that are derived from collections of overlapping reads. Scaffolds are ordered and orientated sets of contigs that are linked to one another by mate pairs of sequencing reads.

    Scaffold and contig N50s
    By far the most widely used statistics for describing the quality of a genome assembly are its scaffold and contig N50s. A contig N50 is calculated by first ordering every contig by length from longest to shortest. Next, starting from the longest contig, the lengths of each contig are summed, until this running sum equals one-half of the total length of all contigs in the assembly. The contig N50 of the assembly is the length of the shortest contig in this list. The scaffold N50 is calculated in the same fashion but uses scaffolds rather than contigs. The longer the scaffold N50 is, the better the assembly is. However, it is important to keep in mind that a poor assembly that has forced unrelated reads and contigs into scaffolds can have an erroneously large N50. Note too that scaffolds and contigs that comprise only a single read or read pair — often termed 'singletons' — are frequently excluded from these calculations, as are contigs and scaffolds that are shorter than ~800 bp. The procedures used to calculate N50 may therefore vary between genome projects.
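To make the quoted procedure concrete, here is a minimal Python sketch of the length-style N50. The function name `n50_length` and the `min_length` parameter are my own, not from the paper, and I stop when the running sum reaches or exceeds half the total (the passage says "equals", which rarely happens exactly). It also shows how dropping short contigs can change the result.

```python
def n50_length(lengths, min_length=0):
    """Length-style N50: the shortest contig in the set that reaches half the total.

    min_length is a hypothetical cutoff; sequences shorter than it are ignored.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences left after filtering")
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half:  # assumption: "reaches or exceeds" half the total
            return length


# Three long contigs plus six 700 bp fragments (made-up numbers).
contigs = [5000, 3000, 2000] + [700] * 6
print(n50_length(contigs))                  # all contigs counted
print(n50_length(contigs, min_length=800))  # fragments under 800 bp dropped
```

With these made-up lengths the unfiltered value is 3000 and the filtered value is 5000, which is why the cutoff should be reported along with the N50.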


    Percent gaps
    Another important assembly statistic is its percent gaps. Unsequenced regions between mate pairs in contigs and between scaffolds are often represented as runs of 'N's in the final assembly. Thus two assemblies can have identical scaffold N50s but can still differ in their percent gaps: one has very few gaps, and the other is heavily peppered with them. Estimates of gap lengths are often made based on library insert sizes and read lengths; when these are available, the number of 'N's in these gaps usually, but not always, represents the most likely estimate of that gap's size; sometimes, all gaps are simply represented by a run of 50 'N's regardless of their size.
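Percent gaps is easy to check yourself. A rough sketch, assuming the scaffolds are already loaded as plain strings and that every 'N' (upper or lower case) is a gap base; the function name `percent_gaps` is mine:

```python
def percent_gaps(sequences):
    """Percentage of assembly bases that are 'N' (treated here as gap bases)."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        raise ValueError("assembly is empty")
    gap_bases = sum(seq.upper().count("N") for seq in sequences)
    return 100.0 * gap_bases / total


# Two toy 100 bp scaffolds of identical length but different gap content.
clean = ["ACGT" * 25]
gappy = ["ACGT" * 10 + "N" * 20 + "ACGT" * 10]
print(percent_gaps(clean))  # 0.0
print(percent_gaps(gappy))  # 20.0
```

Both toy scaffolds are 100 bp, so they give the same scaffold N50, yet one is 20% gaps.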
    Another (I guess more welcomed?) definition would be:
    N50 is the number of contigs (sorted by length from longest to shortest) whose summed length covers 50% or more of the genome assembly. Say your total assembly length is 15 Mb and you have 5 contigs/scaffolds of length 5 Mb, 4 Mb, 3 Mb, 2 Mb and 1 Mb. To get the N50, sort the contigs/scaffolds by length and sum them until you cover 50% or more of the assembly: 5 Mb + 4 Mb = 9 Mb, which is 60%. The N50 is then 2, because you need just 2 contigs/scaffolds to cover 50% or more of the assembly. The L50 is the length of the smallest contig/scaffold in the N50 set, i.e. the length of the last contig/scaffold added to cover 50% or more of the assembly. Therefore, the L50 in this example is 4 Mb.
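Here is the same walk-through as a small Python sketch, using the five example contigs (5, 4, 3, 2 and 1 Mb, which sum to 15 Mb). The function name `half_coverage_stats` is made up, and it deliberately returns a neutrally named (count, length) pair, so you can attach whichever of the N50/L50 labels your convention uses.

```python
def half_coverage_stats(lengths):
    """Return (count, length) for the smallest set of longest sequences
    that covers at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length


count, length = half_coverage_stats([5, 4, 3, 2, 1])  # lengths in Mb
print("sequences needed to reach 50%:", count)     # 2
print("length of the last one added (Mb):", length)  # 4
```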
    Both metrics can be applied to assemblies with just contigs or scaffolded contigs. If you have scaffolds and want contig metrics, you need to partition each scaffold into contigs again by breaking them up where the NNNN junctions occur.
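Breaking scaffolds back into contigs can be sketched like this. It is only an illustration: the function name `split_scaffold` and the `min_gap` parameter are mine, and how long a run of N's must be before it counts as a junction is a choice you have to make for your own assembly.

```python
import re


def split_scaffold(scaffold, min_gap=1):
    """Split a scaffold into contigs at runs of at least min_gap 'N' characters."""
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # Drop empty pieces produced by leading or trailing gaps.
    return [piece for piece in gap.split(scaffold) if piece]


scaffold = "ACGTNNNNGGCCNTT"
print(split_scaffold(scaffold))             # ['ACGT', 'GGCC', 'TT']
print(split_scaffold(scaffold, min_gap=4))  # ['ACGT', 'GGCCNTT']
```

The lengths of the resulting pieces can then go into whichever contig statistic you report.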
    Last edited by Guest; 03-28-2013, 03:35 AM.



    • #3
      WARNING!!!!

      The post above has L50 and N50 reversed. I'm very concerned that people will be misled.

      Please consult other sources for N50 and L50!



      • #4
        Originally posted by dgordon
        WARNING!!!!

        The post above has L50 and N50 reversed. I'm very concerned that people will be misled.

        Please consult other sources for N50 and L50!
        How is it reversed if I mentioned two different explanations - the first source I quoted explains N50 length and the other source explains N50 as count and L50 as length? Both cases are covered.

        There's an entire discussion about which explanation is right:
        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

        and people are clearly divided.

        Even if there were a consensus that N50 indicates length and L50 count (and there really isn't), go look at genome assembly reports and papers and you will still find it "the other way round" from what you think is right:
        Tools for exploring the Phytozome collection of green plant genomes

        Scaffold N50 (L50) = 51 (993 Kbp)
        Contig N50 (L50) = 1476 (37.6 Kbp)

        Advances in Botanical Research, volume 74 (reviews on forest trees, fruit trees, and grapevines):

        For the scaffolds the N50 is 5 and the L50 is 57.5Mb;

        Please consult literature before correcting someone with your subjective opinion.

        The best way to go about this, whichever of N50 or L50 you use for one or the other, is to be more descriptive, for example by writing "N50 scaffold length" and "L50 scaffold count". Whether you prefer N or L, in that case everybody will understand what you're referring to.
        Last edited by Guest; 09-25-2015, 07:43 AM.
