  • Regarding the validation of a Velvet assembly

    Hi, I did a Velvet de novo assembly for a species that has no previously sequenced genome. How can I validate my assembly?

  • #2
    That's the 64 million dollar question.

    You'll have to talk to the biologists in your collaboration to see what is known that you can check (e.g. any ESTs or other previously sequenced bits like important genes, experimentally characterized genome size, GC percentage), and what related organisms you might be able to compare it to.

    Specific to velvet, I'm sure there is plenty of good advice in the documentation and mailing list archive about common pitfalls etc.
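As a rough illustration of the genome size and GC percentage checks mentioned above, here is a minimal sketch that summarizes a contigs FASTA file so the numbers can be compared with whatever is known experimentally. The function names `parse_fasta` and `assembly_summary` are made up for this example and are not part of Velvet.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the contig name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def assembly_summary(lines):
    """Return contig count, total length, N50 and GC percentage."""
    lengths, gc, acgt = [], 0, 0
    for _, seq in parse_fasta(lines):
        lengths.append(len(seq))
        gc += seq.count("G") + seq.count("C")
        # Ignore N and other ambiguity codes when computing GC content.
        acgt += sum(seq.count(base) for base in "ACGT")
    total = sum(lengths)
    # N50: the contig length at which the running sum of lengths,
    # taken from longest to shortest, reaches half of the total.
    running, n50 = 0, 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50,
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }
```

In practice one would pass an open file handle, for example `assembly_summary(open("contigs.fa"))`, and compare `total_bp` and `gc_percent` with the experimentally estimated genome size and GC content.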


    • #3
      I agree with Peter, this is the question we'd all love to be able to answer with confidence!

      This is a question of having multiple pieces of evidence to give you a confidence level as to your assembly. With current technology it is still impossible to "prove" an assembly is correct, but you can get pretty damn close.

      Optical mapping is a complementary technology which might be helpful for independent verification of contig order (particularly large contigs > 100 kb).

      Sequencing with another technology, particularly 454, might give some clues as to the extent of misassemblies. Paired-end 454 data will be even more helpful.

      You could do de novo assembly with other assemblers and see if they agree, but this is probably weak, circumstantial evidence.

      Another method of verifying an assembly is to design primers to amplify the entire genome in overlapping segments, say 10 kb each, and check the products on a gel. This of course relies on you having a finished genome sequence to check against.

      You might find an easier question to answer is "what level of assembly accuracy will permit me to answer my scientific question?"
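For the primer-based check described above, the amplicon coordinates are simple to lay out once a sequence length is known. The sketch below tiles a sequence into overlapping windows. The 10 kb window comes from the post, while the 500 bp overlap and the name `tile_amplicons` are assumptions made only for illustration. Primer design itself is not covered.

```python
def tile_amplicons(sequence_length, amplicon_length=10_000, overlap=500):
    """Return 0-based, half-open (start, end) windows covering the sequence.

    Consecutive windows share `overlap` bases; the last window is
    truncated at the end of the sequence.
    """
    if amplicon_length <= overlap:
        raise ValueError("amplicon_length must be larger than overlap")
    if sequence_length <= 0:
        return []
    step = amplicon_length - overlap
    windows = []
    start = 0
    while True:
        end = min(start + amplicon_length, sequence_length)
        windows.append((start, end))
        if end == sequence_length:
            break
        start += step
    return windows
```

With a draft assembly rather than a finished genome, the same function could be applied to each large contig separately, which only tests the internal consistency of that contig.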


      • #4
        You [the OP] might be oversimplifying a tad.

        Any assembly will be composed of:
        • correct contigs
        • fragmented contigs
        • chimeric contigs
        • spurious contigs

        and may suffer from:
        • missing contigs


        I would consider chimeras and spurious contigs to be distinguished by length: spurious contigs are an artifact of the de Bruijn graph method and are very short. I don't think chimeras are very common in Velvet compared to other assemblers; any ambiguity normally results in fragments.

        Velvet assemblies performed under high stringency (high k-mer length, high coverage cutoff, cvCut) will minimize chimeric, fragmented and spurious contigs at the expense of more missing contigs.

        To validate a de novo short-read assembly, especially a transcriptome, which by its very nature will never form long contigs, you need to decide whether you are willing to accept some bad with the good or insist on just the good and get less of it. This is a classic signal-to-noise problem.

        One way to judge an assembly is to run Velvet under varying parameters and see if the results converge. If you get wildly different results you can examine which contigs are spliced or fragmented under different settings and make your own judgments from there.
        --
        Jeremy Leipzig
        Bioinformatics Programmer
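One crude way to compare runs made under different settings, as suggested above, is to ask whether each contig from one run reappears unchanged, or embedded in a longer contig, in another run. The sketch below uses exact substring matching on either strand as a stand-in for a real alignment, so it will miss contigs that differ by even one base. The name `compare_runs` and the three status labels are assumptions for this example, and sequences are assumed to be uppercase.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def compare_runs(contigs_a, contigs_b):
    """Classify each contig of run A relative to the contigs of run B.

    Both arguments map contig names to sequences. Each contig of A is
    labelled:
      "identical"  if the same sequence (either strand) exists in B,
      "contained"  if it is an exact substring of a longer B contig,
      "unmatched"  otherwise.
    """
    result = {}
    for name, seq in contigs_a.items():
        rc = reverse_complement(seq)
        status = "unmatched"
        for other in contigs_b.values():
            if seq == other or rc == other:
                status = "identical"
                break
            if seq in other or rc in other:
                status = "contained"
        result[name] = status
    return result
```

Contigs labelled "contained" are candidates for having been fragmented in run A or joined in run B, and "unmatched" contigs are the ones worth inspecting by hand or with a proper aligner.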


        • #5
          Good answer Zigster!

          I'd add the final possibility of "correct" contigs containing consensus errors due to transposed nucleotides in repeats which have been resolved using paired-end information, as discussed in my blog post at http://pathogenomics.bham.ac.uk/blog...nome-assembly/


          • #6
            Hi zingster and nicklomen, thanks for your replies; they were very useful. My idea for validation is this: if Sanger sequences are available for the species we are assembling, we can BLAST them against the assembled Solexa contigs and accept as valid the assembly in which the largest share of Sanger sequences is covered (for example, more than 90 percent). What do you think?
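The coverage idea above could be scored roughly as follows. This is a sketch under stated assumptions: the hits are taken to be already parsed into `(query_id, qstart, qend)` tuples with 1-based inclusive coordinates (as in the `qseqid`, `qstart` and `qend` columns of BLAST tabular output), the Sanger sequence lengths are supplied separately, and the names `query_coverage` and `fraction_recovered` are invented for this example. The 90 percent threshold is the one proposed in the post.

```python
from collections import defaultdict


def query_coverage(hits, query_lengths):
    """Return the fraction of each query covered by at least one hit.

    hits: iterable of (query_id, qstart, qend), 1-based inclusive.
    query_lengths: dict mapping every query_id to its positive length.
    Overlapping or adjacent hit intervals are merged before counting.
    """
    intervals = defaultdict(list)
    for qid, qstart, qend in hits:
        lo, hi = sorted((qstart, qend))
        intervals[qid].append((lo, hi))

    coverage = {}
    for qid, qlen in query_lengths.items():
        covered, cur_lo, cur_hi = 0, None, None
        for lo, hi in sorted(intervals.get(qid, [])):
            if cur_hi is None or lo > cur_hi + 1:
                # Close the previous merged interval and start a new one.
                if cur_hi is not None:
                    covered += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
            else:
                cur_hi = max(cur_hi, hi)
        if cur_hi is not None:
            covered += cur_hi - cur_lo + 1
        coverage[qid] = covered / qlen
    return coverage


def fraction_recovered(coverage, threshold=0.9):
    """Fraction of queries whose coverage meets the threshold."""
    if not coverage:
        return 0.0
    passed = sum(1 for value in coverage.values() if value >= threshold)
    return passed / len(coverage)
```

Running this once per candidate assembly gives a single number per assembly to compare. Note that it measures only how much of each Sanger sequence is hit somewhere in the assembly, not whether the hits are contiguous on one contig or in the right order.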


            • #7
              Sounds good. I just wish there was some kind of standardized report that would organize this for you.
              --
              Jeremy Leipzig
              Bioinformatics Programmer


              • #8
                CLC's white paper summarizes many of the above points concisely:

                Section 4 "Measuring quality"
                --
                Senthil Palanisami
