  • mmmm
    Senior Member
    • Jul 2013
    • 131

    n50 or nodes

    In de novo assembly, do we go for the highest N50 or the smallest number of generated nodes?

    Also, VelvetOptimiser might be good for optimizing the parameters.
    I have never used it. Can you advise me how to call the VelvetOptimiser Perl script, which is in the contrib directory? What is the command, please?
  • mmmm
    Senior Member
    • Jul 2013
    • 131

    #2
    Also, is there a limit for the k-mer length (for 250 bp reads)? Is it OK to try up to 200 or above?


    • Brian Bushnell
      Super Moderator
      • Jan 2014
      • 2709

      #3
      The L50 (length of the contig at which 50% of the assembly is in contigs at least that long) is important. The total number of contigs or nodes is not.

      As for kmer length, just use whatever length gives the best continuity; the longest you can use depends on read length, read quality, sequencing depth. I think Velvet needs to be compiled with an indicator of maximum kmer length, though, so if you try ~200 and it fails to run you may need to recompile. Also, it's generally good to avoid using even kmer lengths due to palindromes.
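A minimal Python sketch of the length statistic described above (most tools report it as N50), assuming you already have a list of contig lengths. The two toy assemblies are made up for illustration; they show that a lower contig count does not by itself mean better contiguity.

```python
def n50(lengths):
    """Contig length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Hypothetical assemblies with the same total size (300 bases).
assembly_a = [100, 80, 50, 30, 20, 10, 10]  # 7 contigs
assembly_b = [60, 60, 60, 60, 60]           # 5 contigs

print(len(assembly_a), n50(assembly_a))
print(len(assembly_b), n50(assembly_b))
```

Here the assembly with more contigs still has the larger value, because its bases sit in longer pieces.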


      • Geneie
        Junior Member
        • Apr 2014
        • 4

        #4
        A good paper on the subject of K-mer optimisation is:

        Zerbino, D. R. (2010). Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics, Chapter 11.

        This will inform you of the best practices when assessing your de novo assembly. Below is the information required for VelvetOptimiser.

        Home page:



        Manual:



        The manual will provide you with the command needed and parameters.
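As a rough sketch only, this is how a VelvetOptimiser call might be assembled in Python. The flags `-s` (start k), `-e` (end k), `-f` (the file section passed to velveth) and `-t` (threads) are as I recall them from the VelvetOptimiser README, so check them against the manual or the script's `--help` before relying on this. The file name `reads.fq`, the k range and the script path are placeholders, and `-shortPaired` assumes interleaved paired reads.

```python
import shlex


def velvetoptimiser_command(reads, k_start, k_end, threads=2,
                            script="VelvetOptimiser.pl"):
    """Build (not run) a VelvetOptimiser command line as a shell string."""
    if k_start % 2 == 0 or k_end % 2 == 0:
        raise ValueError("use odd k-mer bounds")
    # The whole velveth file section is passed as ONE quoted argument.
    velveth_files = f"-shortPaired -fastq {reads}"
    argv = [script, "-s", str(k_start), "-e", str(k_end),
            "-f", velveth_files, "-t", str(threads)]
    return shlex.join(argv)


print(velvetoptimiser_command("reads.fq", 95, 145))
```

If the script is not on your PATH, pass its full location (for example inside Velvet's contrib directory) as `script`, or prefix the printed command with `perl`.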


        • mmmm
          Senior Member
          • Jul 2013
          • 131

          #5
          For 250 bp reads, is there a limit for the k-mer value? The paper mentions that the best k-mer should be between 21 bp and the average read length minus 10, so for 250 bp reads should the k-mer be between 21 and 100?


          • Geneie
            Junior Member
            • Apr 2014
            • 4

            #6
            K-mer length is an important parameter: the higher your k-mer length, the less likely you are to see a given sequence by chance. However, as the k-mer length is increased towards the read length, coverage will drop and more gaps will appear. It is a trade-off.

            Maybe start with a k-mer of half your read length: 125.

            Then do an assembly using k-mer lengths of 95, 115, 135 and 145. Have a look and see what happens; this should help you to understand what is going on under the hood when you change these parameters (provided you have the computational resources).

            Number of nodes and N50 go hand in hand: if you have fewer contigs, you are likely to have a higher N50. So pick the k-mer length that assembles your genome into the smallest number of contigs with a good N50. Hope this helps.
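The sweep suggested above could be scripted roughly as follows. This sketch only prints velveth/velvetg command lines, it does not run them. It keeps odd k values (as advised in #3) and applies the bound quoted in #5 at face value, which for 250 bp reads gives 250 - 10 = 240 as the upper limit. The file name `reads.fq`, the output directory prefix and the choice of `-exp_cov auto -cov_cutoff auto` are assumptions for illustration, and `-shortPaired` assumes interleaved paired reads.

```python
import shlex


def candidate_kmers(read_length, requested, k_min=21, margin=10):
    """Keep odd k values within [k_min, read_length - margin]."""
    k_max = read_length - margin
    return [k for k in requested if k % 2 == 1 and k_min <= k <= k_max]


def velvet_commands(k, reads="reads.fq", prefix="asm_k"):
    """Return the velveth and velvetg command lines for one k value."""
    outdir = f"{prefix}{k}"
    velveth = ["velveth", outdir, str(k), "-fastq", "-shortPaired", reads]
    velvetg = ["velvetg", outdir, "-exp_cov", "auto", "-cov_cutoff", "auto"]
    return shlex.join(velveth), shlex.join(velvetg)


# 126 is included only to show that even values are dropped.
ks = candidate_kmers(250, [95, 115, 125, 126, 135, 145])
for k in ks:
    for command in velvet_commands(k):
        print(command)
```

Velvet has to be compiled with a maximum k-mer length at least as large as the largest k in the sweep, otherwise the long-k runs will fail as described in #3.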


            • mmmm
              Senior Member
              • Jul 2013
              • 131

              #7
              Yes, this is very useful. I will give it a go. Thank you.


              • Geneie
                Junior Member
                • Apr 2014
                • 4

                #8
                Good luck. I would definitely encourage reading as much literature as you can on the subject of genome assembly; there are many reviews out there. This advice is in no way comprehensive, and to trust your assembly you will have to look at the coverage across the contigs to ensure there are no spikes, and maybe also run a comparison (MUMmer) against a closely related genome.
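One coarse way to look for coverage spikes is to read the per-contig coverage that Velvet writes into its contig names (`NODE_<id>_length_<n>_cov_<x>`, both values in k-mer units) and flag contigs far above the median. This is only a per-contig proxy, not the per-base coverage you would get from mapping reads back. The example headers and the 2x-median threshold are made up for illustration.

```python
import re
from statistics import median

# Velvet-style contig name: NODE_<id>_length_<kmers>_cov_<kmer coverage>
HEADER = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def coverage_outliers(headers, factor=2.0):
    """Return names of contigs whose coverage exceeds factor * median coverage."""
    covs = {}
    for header in headers:
        match = HEADER.search(header)
        if match:
            covs[header.lstrip(">")] = float(match.group(3))
    mid = median(covs.values())
    return sorted(name for name, cov in covs.items() if cov > factor * mid)


# Hypothetical FASTA header lines from a contigs.fa file.
headers = [
    ">NODE_1_length_5000_cov_30.5",
    ">NODE_2_length_3000_cov_28.1",
    ">NODE_3_length_400_cov_95.0",
    ">NODE_4_length_2500_cov_31.2",
]
print(coverage_outliers(headers))
```

Contigs flagged this way are often collapsed repeats, plasmids or contamination, so they are worth a closer look rather than automatic removal.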


                • mmmm
                  Senior Member
                  • Jul 2013
                  • 131

                  #9
                  Thanks again. I would like to try the VelvetOptimiser script, but how do I install BioPerl? (I already have Perl.)


                  • Geneie
                    Junior Member
                    • Apr 2014
                    • 4

                    #10


                    • mmmm
                      Senior Member
                      • Jul 2013
                      • 131

                      #11
                      Now, how can I be sure that the assembly I got is the best? I tried different k-mers, picked the one with the highest N50, then adjusted exp_cov, then cov_cutoff, then ins_length until I got the highest N50.
                      How can we be sure that this is the best assembly? Should I map the raw reads to the contigs file?


                      • Brian Bushnell
                        Super Moderator
                        • Jan 2014
                        • 2709

                        #12
                        Originally posted by mmmm:
                        should I map the raw reads to the contigs file??
                        Yes. A better assembly will have a higher mapping rate, higher pairing rate, lower ambiguous mapping rate, and lower error rate.


                        • mmmm
                          Senior Member
                          • Jul 2013
                          • 131

                          #13
                          After mapping the raw reads to the de novo assembled contigs from 2 different k-mers (27 and 85), which assembly is better according to the statistics below? Is it the one with k-mer 85? Annotation of the two contig sets shows slightly different results, and I am not sure which one I should depend on.

                          K-mer 85, remapping statistics:
                          768224 + 0 in total (QC-passed reads + QC-failed reads)
                          0 + 0 duplicates
                          766555 + 0 mapped (99.78%:-nan%)
                          768224 + 0 paired in sequencing
                          384185 + 0 read1
                          384039 + 0 read2
                          760726 + 0 properly paired (99.02%:-nan%)
                          765163 + 0 with itself and mate mapped
                          1392 + 0 singletons (0.18%:-nan%)
                          3816 + 0 with mate mapped to a different chr
                          3576 + 0 with mate mapped to a different chr (mapQ>=5)

                          K-mer 27, remapping statistics:
                          773803 + 0 in total (QC-passed reads + QC-failed reads)
                          0 + 0 duplicates
                          772153 + 0 mapped (99.79%:-nan%)
                          773803 + 0 paired in sequencing
                          387022 + 0 read1
                          386781 + 0 read2
                          763731 + 0 properly paired (98.70%:-nan%)
                          770783 + 0 with itself and mate mapped
                          1370 + 0 singletons (0.18%:-nan%)
                          8196 + 0 with mate mapped to a different chr
                          7755 + 0 with mate mapped to a different chr (mapQ>=5)
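The two reports above look like `samtools flagstat` output, so the rates mentioned in #12 can be pulled out and compared with a short Python sketch. Only the mapping rate and the proper-pairing rate are available here, plus the share of reads whose mate landed on a different contig; the ambiguous-mapping and error rates are not part of these statistics. The parsing below is a simplification written for this excerpt, not a general flagstat parser.

```python
def parse_flagstat(text):
    """Pull the QC-passed counts needed for the comparison out of flagstat text."""
    wanted = {
        "in total": "total",
        "mapped": "mapped",
        "properly paired": "properly_paired",
        "with mate mapped to a different chr": "mate_elsewhere",
    }
    counts = {}
    for line in text.strip().splitlines():
        passed, _, rest = line.strip().partition(" + ")
        desc = rest.partition(" ")[2]  # drop the QC-failed count
        for prefix, key in wanted.items():
            # Keep the first match only, so the mapQ>=5 line is ignored.
            if desc.startswith(prefix) and key not in counts:
                counts[key] = int(passed)
    return counts


def rates(counts):
    """Express each count as a percentage of all reads."""
    total = counts["total"]
    return {k: 100.0 * counts[k] / total
            for k in ("mapped", "properly_paired", "mate_elsewhere")}


K85 = """
768224 + 0 in total (QC-passed reads + QC-failed reads)
766555 + 0 mapped (99.78%:-nan%)
760726 + 0 properly paired (99.02%:-nan%)
3816 + 0 with mate mapped to a different chr
3576 + 0 with mate mapped to a different chr (mapQ>=5)
"""

K27 = """
773803 + 0 in total (QC-passed reads + QC-failed reads)
772153 + 0 mapped (99.79%:-nan%)
763731 + 0 properly paired (98.70%:-nan%)
8196 + 0 with mate mapped to a different chr
7755 + 0 with mate mapped to a different chr (mapQ>=5)
"""

k85 = rates(parse_flagstat(K85))
k27 = rates(parse_flagstat(K27))
for name in k85:
    print(f"{name:16s} k85={k85[name]:.2f}%  k27={k27[name]:.2f}%")
```

On these numbers the mapping rates are practically identical (99.78% vs 99.79%), while k-mer 85 has the higher proper-pairing rate (99.02% vs 98.70%) and about half as many reads with the mate on a different contig (0.50% vs 1.06%).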


                          • mmmm
                            Senior Member
                            • Jul 2013
                            • 131

                            #14
                            Why are the total read counts different, although the same FASTQ files were used (only the k-mers differ)? I think it does not make sense.


                            • mmmm
                              Senior Member
                              • Jul 2013
                              • 131

                              #15
                              Any advice on this, please? Thanks.
