
  • What do N50 and N90 contig sizes refer to?

    What is the general definition of the N50 and N90 contig sizes?
    This is regarding the "Instructions for scaffolding MIRA 454 contigs & 25KB paired-end data with BAMBUS".
    In the MIRA assembly info and the Bambus scaffold info, what do the N50 and N90 contig sizes refer to?
    How can I obtain these values, and how are the N50 and N90 contig sizes calculated?
    Thanks a lot for any explanations and suggestions.

  • #2
    The N50 contig size is a weighted median value and defined as
    the length of the smallest contig S in the sorted list of all
    contigs where the cumulative length from the largest contig to
    contig S is at least 50% of the total length.

    cheers,
    Sven



    • #3
      Hi,

      Thanks for the info.
      Do you have any idea about N90?
      So to get the N50 contig size, do I just pick the smallest contig S in the sorted list of all contigs?
      Thanks again for your explanation.



      • #4
        Originally posted by edge View Post
        Do you have any idea about N90?
        I'd say,

        The N90 contig size is a weighted median value and defined as
        the length of the smallest contig S in the sorted list of all
        contigs where the cumulative length from the largest contig to
        contig S is at least 90% of the total length.

        :-)

        Sven
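
The N50 and N90 definitions are the same rule with a different threshold. As a sketch in symbols (the notation $\ell_i$, $L$ and $p$ is introduced here for illustration and does not come from the posts): sort the contig lengths from largest to smallest, let $L$ be their sum, and take the first contig at which the running total reaches the fraction $p$ of $L$.

```latex
\[
\ell_1 \ge \ell_2 \ge \dots \ge \ell_n, \qquad L = \sum_{i=1}^{n} \ell_i
\]
\[
N_{100p} = \ell_k, \qquad k = \min\Bigl\{\, j \;:\; \sum_{i=1}^{j} \ell_i \ge p\,L \Bigr\}
\]
```

Setting $p = 0.5$ gives N50 and $p = 0.9$ gives N90. Since $k$ can only grow as $p$ grows, N90 is never larger than N50.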



        • #5
          Thanks for your suggestion.

          I found that sometimes the maximum contig size is exactly the same as the N50 contig size.
          What is the reason for this?
          Thanks. I'm still new to bioinformatics and still learning, so I keep running into problems.
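
One likely reason, following directly from the definition: if the largest contig alone holds at least half of the total length, the running total already reaches 50% at the first contig, so the N50 equals the maximum contig size. Below is a minimal Perl sketch of that case. The helper name `nx` and the contig lengths are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the Nx contig size,
# where $fraction is 0.5 for N50 or 0.9 for N90.
sub nx {
    my ($fraction, @lengths) = @_;
    my $total = 0;
    $total += $_ for @lengths;
    my $running = 0;
    # Walk the lengths from largest to smallest, summing as we go.
    for my $len (sort { $b <=> $a } @lengths) {
        $running += $len;
        return $len if $running >= $total * $fraction;
    }
    return;    # empty input: no Nx value
}

# Made-up contig lengths: the largest contig is more than half of the total.
my @lengths = (1000, 200, 100, 50);
my ($max) = sort { $b <=> $a } @lengths;
printf "max: %d, N50: %d\n", $max, nx(0.5, @lengths);
```

With these lengths the total is 1350, so the 50% mark (675) is already passed by the 1000 bp contig, and both values come out as 1000.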



          • #6
            N50 = length-weighted median: the size of the smallest contig such that 50% of the total length of the assembly is contained in contigs of size N50 or greater.
            For N90, the threshold is 90%.
            If you have finished the assembly and have the contigs in FASTA format, it is easy to calculate the N50 and N90 contig sizes, for example:
            Code:
            perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j];}if($count>=$total*0.9){print "N90: $x[$j]\n";exit;}}' contigs.fa
            Last edited by BENM; 10-07-2009, 01:38 AM.



            • #7
              Hi BENM,

              I just tried the code that you gave me.
              It doesn't work.
              Am I missing something, or is there a problem with the code?
              When I run it, the output is empty.
              Thanks for your help ^^
              Originally posted by BENM View Post
              N50 = length-weighted median.: The size of the smallest contig such that 50% of the length of the genome is contained in contigs of size N50 or greater.
              N90 is 90%.
              If you have done the assembly work, and you have got the contigs in FASTA format, it is easy to calculate the N50 & N90 contig size, for example:
              Code:
              perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if($count>=$total/2){$half=$x[j];print "N50: $x[j]\n" if ($half==0);}elsif($count>=$total*0.9){print "N90: $x[j]\n";exit;}}'  contigs.fa



              • #8
                Hi edge,

                Sorry, there was a small mistake. You can put the code below into a Perl script:
                Code:
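                #!/usr/bin/perl -w
                use strict;
                # Collect the length of every sequence in the input FASTA file(s).
                my ($len,$total)=(0,0);
                my @x;
                while(<>){
                	if(/^[\>\@]/){
                		# Header line: store the length of the previous sequence, if any.
                		if($len>0){
                			$total+=$len;
                			push @x,$len;
                		}
                		$len=0;
                	}
                	else{
                		# Sequence line: strip whitespace and add to the current length.
                		s/\s//g;
                		$len+=length($_);
                	}
                }
                # Store the last sequence in the file.
                if ($len>0){
                	$total+=$len;
                	push @x,$len;
                }
                # Sort the lengths from largest to smallest.
                @x=sort{$b<=>$a} @x;
                my ($count,$half)=(0,0);
                for (my $j=0;$j<@x;$j++){
                	$count+=$x[$j];
                	# N50: first contig where the running total reaches 50% of the total.
                	if (($count>=$total/2)&&($half==0)){
                		print "N50: $x[$j]\n";
                		$half=$x[$j];
                	}
                	# Checked separately (not elsif) so one contig can be both N50 and N90.
                	if ($count>=$total*0.9){
                		print "N90: $x[$j]\n";
                		exit;
                	}
                }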
                or run this command as before:
                Code:
                perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j];}if($count>=$total*0.9){print "N90: $x[$j]\n";exit;}}' contigs.fa
                Last edited by BENM; 10-07-2009, 01:40 AM.



                • #9
                  Thanks BENM,
                  It works nicely now ^^
                  Thank you very much for your help.



                  • #10
                    Hi BENM,

                    Have you used the MIRA software before?
                    I'm having trouble understanding how it calculates the N50 and N90 in its assembly output.



                    • #11
                      I am using this software, but I'm not very familiar with it. There are *_out.padded.fasta and *_out.unpadded.fasta files in the output directory "projectname_d_result". It defines contigs with length >= 500 bp as large contigs. You can find the statistics in the *_info_assembly.txt file in the "projectname_d_info" directory.
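
The 500 bp cutoff is one possible reason why hand-computed figures differ from the reported ones: if a statistic covers only the large contigs, the total length changes and the N50 can change with it. The sketch below only illustrates that effect. Whether MIRA computes its reported N50 this way is an assumption, and the helper name `nx_min_len` and the lengths are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: Nx computed only over contigs of at least $min_len bases.
sub nx_min_len {
    my ($fraction, $min_len, @lengths) = @_;
    # Drop short contigs, then sort the rest from largest to smallest.
    my @kept = sort { $b <=> $a } grep { $_ >= $min_len } @lengths;
    my $total = 0;
    $total += $_ for @kept;
    my $running = 0;
    for my $len (@kept) {
        $running += $len;
        return $len if $running >= $total * $fraction;
    }
    return;    # nothing passed the length filter
}

# Made-up contig lengths; 500 is the large-contig cutoff mentioned above.
my @lengths = (1200, 500, 400, 400, 300, 300, 200, 100);
printf "all contigs:   N50 = %d\n", nx_min_len(0.5, 0,   @lengths);
printf "large contigs: N50 = %d\n", nx_min_len(0.5, 500, @lengths);
```

For these lengths the N50 over all contigs is 500, while the N50 over contigs of 500 bp or more is 1200, so it matters which set of contigs a reported figure refers to.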



                      • #12
                        *_out.unpadded.fasta should be your friend when calculating contig sizes.

                        As BENM mentioned there is a lot of info in the info_assembly.txt

                        Sven



                        • #13
                          Hi,

                          Do you know the difference in usage between *_out.padded.fasta and *_out.unpadded.fasta?
                          As far as I can see, *_out.padded.fasta is all lowercase and *_out.unpadded.fasta is all uppercase. Apart from that, both have exactly the same content.
                          I tried to reproduce the figures in *_info_assembly.txt (N50, N90, minimum contig size, maximum contig size, etc.) from the *.contig file in "projectname_d_result".
                          Unfortunately, the figures I get don't match those in *_info_assembly.txt.
                          So I'm quite confused about how N50, N90, etc. are calculated in *_info_assembly.txt.



                          • #14
                            Hi sklages,
                            Thanks for your suggestion.
                            I ran into problems when trying to work out the N50, N90, minimum contig size, maximum contig size, etc. from the *.contig file in "projectname_d_result".
                            The figures I get don't match those in *_info_assembly.txt.
                            Do you have any idea how the N50, N90, minimum contig size and maximum contig size in *_info_assembly.txt are calculated?



                            • #15
                              Hi BENM,
                              Suppose I have a long list like this:
                              scaff_123 20
                              scaff_223 60
                              scaff_122 1000
                              scaff_125 15
                              scaff_23 30
                              scaff_13 26
                              scaff_230 50
                              scaff_153 500
                              scaff_173 200

                              Based on column two, do you have any idea how to calculate the N50 and N90 from this list?
                              I need to sort the list in descending order before calculating the N50 and N90, right?
                              Thanks again for your help.
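
Sorting in descending order first is indeed what the definition earlier in the thread calls for. Here is a minimal Perl sketch for a two-column list like this one. It assumes whitespace-separated columns with the length in column two, and the variable names are chosen for illustration only.

```perl
use strict;
use warnings;

# The list from the post: name in column one, length in column two.
my $list = <<'END';
scaff_123 20
scaff_223 60
scaff_122 1000
scaff_125 15
scaff_23 30
scaff_13 26
scaff_230 50
scaff_153 500
scaff_173 200
END

# Pull the second column out of every non-empty line.
my @lengths;
for my $line (split /\n/, $list) {
    my ($name, $len) = split ' ', $line;
    push @lengths, $len if defined $len;
}

# Sort from largest to smallest, then walk the running total.
@lengths = sort { $b <=> $a } @lengths;
my $total = 0;
$total += $_ for @lengths;

my ($running, $n50, $n90) = (0);
for my $len (@lengths) {
    $running += $len;
    # Two independent checks, so a single entry can be both N50 and N90.
    $n50 = $len if !defined $n50 && $running >= $total * 0.5;
    $n90 = $len if !defined $n90 && $running >= $total * 0.9;
}
print "total: $total\nN50: $n50\nN90: $n90\n";
```

For this list the total is 1901. The running total reaches 50% (950.5) at the first entry, so N50 is 1000, and it reaches 90% (1710.9) at the fourth entry (1000 + 500 + 200 + 60 = 1760), so N90 is 60.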
