
  • denovo assembly evaluation

    Dear all,

    I made a de novo assembly of a paired-end ddRAD dataset without a reference genome or any previous genomic knowledge. As you know, it is good practice to build several assemblies using different parameters (such as -n, -m, and -M in Stacks) to find the settings that best fit your data and needs. Do you have any suggestions on how to evaluate the different assemblies without a reference genome or multiple generations (usually used to estimate the error rate)? Is there any software (the Quast tool?) you would recommend?

    Thank you in advance,

    Regards

    Francesca

  • #2
    I do recommend Quast; it's easy to use, but without a reference it is not really a complete solution, and is mainly useful for continuity statistics (N50/L50).
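For readers unfamiliar with the continuity statistics mentioned above, here is a minimal sketch of how N50 and L50 are commonly defined. It is not Quast's own code; the function name `n50_l50` is made up for illustration, and the input is assumed to be a plain list of contig lengths in bp.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total, taken from the
         longest contig downwards, first reaches half the assembly size.
    L50: number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count


# Four contigs, 1000 bp in total: 400 + 300 covers at least half.
print(n50_l50([100, 200, 300, 400]))
```

Because ddRAD loci all have roughly the same short length, these numbers are expected to be nearly identical across parameter settings, which is one reason they say little about this kind of data.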

    We typically use BBMap to evaluate the quality of assemblies that lack a reference, as it provides a nice summary of error statistics. If you map the reads to each assembly, it will tell you:

    % of reads mapped (higher is better)
    % of reads properly paired (higher is better)
    % of reads that mapped ambiguously (lower is better)
    % of reads that matched the reference perfectly (higher is better)

    ...and also the overall error rate, and rates of each individual error type (substitutions, insertions, and deletions) on a per-base and per-read level. In each case, of course, lower is better. You can also use it to directly output per-contig coverage stats (with the covstats=file flag), which is sometimes useful for spotting collapsed repeats or contaminant contigs.
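One way to compare several assemblies on the mapping statistics listed above is a simple rank sum. The sketch below is only an illustration under assumptions: the metric names, the assembly labels, and all numbers are hypothetical, and the values would have to be copied by hand from each mapping summary. It does not parse any BBMap output.

```python
# Metric names are hypothetical labels for the statistics described above.
HIGHER_IS_BETTER = ("pct_mapped", "pct_properly_paired", "pct_perfect")
LOWER_IS_BETTER = ("pct_ambiguous", "error_rate")


def rank_assemblies(stats):
    """Order assembly names from best to worst by summed per-metric rank.

    stats maps an assembly name to a dict of metric -> value.
    For each metric the best assembly gets rank 0; ranks are then summed.
    Ties within a metric are broken by input order, which is arbitrary.
    """
    names = list(stats)
    totals = {name: 0 for name in names}
    for metric in HIGHER_IS_BETTER + LOWER_IS_BETTER:
        best_first = sorted(
            names,
            key=lambda name: stats[name][metric],
            reverse=metric in HIGHER_IS_BETTER,
        )
        for rank, name in enumerate(best_first):
            totals[name] += rank
    return sorted(names, key=lambda name: totals[name])


# Made-up numbers for three hypothetical parameter settings.
example = {
    "M2": {"pct_mapped": 90, "pct_properly_paired": 85, "pct_perfect": 60,
           "pct_ambiguous": 5, "error_rate": 1.0},
    "M4": {"pct_mapped": 92, "pct_properly_paired": 88, "pct_perfect": 65,
           "pct_ambiguous": 4, "error_rate": 0.8},
    "M6": {"pct_mapped": 91, "pct_properly_paired": 80, "pct_perfect": 55,
           "pct_ambiguous": 9, "error_rate": 1.5},
}
print(rank_assemblies(example))
```

A rank sum weights every metric equally, which may not suit every project; it is meant as a starting point for discussion rather than a validated score.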



    • #3
      Thank you very much, Brian Bushnell. Do you know whether your package has already been used with ddRAD data?



      • #4
        Originally posted by FrancescaRaffini:
        "Thank you very much, Brian Bushnell. Do you know whether your package has already been used with ddRAD data?"
        That's possible, but I don't know. I have never worked with ddRAD data, and I am unaware of it being used at JGI.



        • #5
          I don't think Quast can be used for ddRAD data. What did you use to "assemble" your data - Stacks? I would evaluate the data based on the number of polymorphic "stacks"/loci (polymorphic meaning that >2 samples are homozygous for each allele) that have at least 10x coverage in 2/3 of your samples. With a few thousand such loci, you have a pretty nice dataset. If not, there is a long list of things that can go wrong (especially in the wet lab)...

          @ BB: Typical ddRAD protocols produce quite small fragments (100-250 bp) representing small islands with strictly defined borders (restriction enzyme digested), so you rarely get any contigs larger than possible read pair overlap when pushing the data through a de novo assembler.
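The locus criterion described in this post can be sketched as a small filter. This is an illustration under assumptions, not a Stacks feature: the data layout (one biallelic genotype and one depth per sample per locus) and the names `is_informative` and `count_informative` are hypothetical, and homozygotes are counted only among samples that meet the depth cutoff, which is one possible reading of the criterion.

```python
from collections import Counter
from fractions import Fraction


def is_informative(locus, n_samples, min_depth=10,
                   min_fraction=Fraction(2, 3), min_homozygotes=3):
    """Check one locus against the criterion sketched in the post.

    locus maps sample name -> ((allele1, allele2), depth).
    Samples missing from the dict count as not covered.
    min_homozygotes=3 encodes ">2 samples homozygous for each allele".
    """
    covered = [gt for gt, depth in locus.values() if depth >= min_depth]
    # Exact fraction arithmetic avoids float rounding at the 2/3 boundary.
    if len(covered) < min_fraction * n_samples:
        return False
    homozygotes = Counter(gt[0] for gt in covered if gt[0] == gt[1])
    supported = [a for a, n in homozygotes.items() if n >= min_homozygotes]
    return len(supported) >= 2


def count_informative(loci, n_samples, **kwargs):
    """Count loci in a dict (locus id -> locus) that pass is_informative."""
    return sum(is_informative(locus, n_samples, **kwargs)
               for locus in loci.values())


# Made-up toy data: six samples, two loci.
good = {f"s{i}": (("A", "A"), 12) for i in range(3)}
good.update({f"s{i}": (("G", "G"), 15) for i in range(3, 6)})
shallow = {name: (gt, 5) for name, (gt, _) in good.items()}
print(count_informative({"locus1": good, "locus2": shallow}, n_samples=6))
```

Running the same count on the output of each parameter setting gives one number per assembly to compare, keeping in mind that it measures usable loci rather than error rate.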



          • #6
            Thanks for the explanation. It does not sound like standard methods of assembly and assembly evaluation are relevant here.



            • #7
              Thank you both for your suggestions.

              Sarvidsson, yes, I am using Stacks. Since I will build several de novo assemblies with different parameters, I need some quality measures (e.g. error rate, which I can't estimate with the most common methods because I have no reference genome, replicates, or multiple generations) to judge how good the assembly is with each parameter set. Your method looks interesting, but I am afraid it is not enough on its own to evaluate assembly quality.
