  • de novo assembly evaluation

    Dear all,

    I made a de novo assembly of a paired-end ddRAD dataset without any reference genome or prior genomic knowledge. As you know, good practice is to run several assemblies with different parameters (such as -n, -m and -M in Stacks) to find the settings that best fit your data and needs. Do you have any suggestions on how to evaluate the different assemblies without a reference genome or multi-generation data (which are usually used to estimate the error rate)? Is there any software (the QUAST tool?) you would recommend?

    Thank you in advance,

    Regards

    Francesca

  • #2
    I do recommend Quast; it's easy to use and provides some useful statistics on assembly continuity, but without a reference it is not really a complete solution, and is mainly useful for continuity statistics (N50/L50).
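
    For reference, N50 and L50 are easy to compute yourself. The sketch below is a generic Python version of the usual definitions, not QUAST's own code: sort contig lengths from longest to shortest and walk down until half of the total assembly length is reached. Bear in mind that QUAST ignores contigs shorter than 500 bp by default (its --min-contig option), so with short ddRAD loci that threshold needs lowering before its report says anything useful.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # New record: store the previous one, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total, taken from
    longest to shortest, first reaches half the assembly size; L50 is the
    number of contigs needed to reach that point.
    """
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
```

    For example, contigs of 100, 200, 300 and 400 bp give a total of 1000 bp; 400 + 300 passes the 500 bp half-way mark, so N50 = 300 and L50 = 2.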

    We typically use BBMap to evaluate the quality of assemblies that lack a reference, as it provides a nice summary of error statistics. If you map the reads to each assembly, it will tell you:

    % of reads mapped (higher is better)
    % of reads properly paired (higher is better)
    % of reads that mapped ambiguously (lower is better)
    % of reads that matched the reference perfectly (higher is better)

    ...and also the overall error rate, and rates of each individual error type (substitutions, insertions, and deletions) on a per-base and per-read level. In each case, of course, lower is better. You can also use it to directly output per-contig coverage stats (with the covstats=file flag), which is sometimes useful for spotting collapsed repeats or contaminant contigs.
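
    As a rough illustration of the covstats idea, here is a minimal Python sketch that reads per-contig average coverage and flags contigs far from the median. It assumes the covstats file is tab-separated with a header row containing "#ID" and "Avg_fold" columns (check this against your own BBMap output), and the 2x / 0.1x cut-offs are hypothetical values chosen for illustration, not BBMap defaults.

```python
import statistics


def parse_covstats(lines):
    """Yield (contig_id, avg_fold) pairs from a covstats-style table.

    Assumes tab-separated columns with a header naming '#ID' and 'Avg_fold'.
    """
    header = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if header is None:
            header = fields
            id_col = header.index("#ID")
            fold_col = header.index("Avg_fold")
            continue
        if line.strip():
            yield fields[id_col], float(fields[fold_col])


def flag_coverage_outliers(coverages, high=2.0, low=0.1):
    """Flag contigs whose average coverage is far from the median.

    high/low are hypothetical multipliers of the median. Contigs above
    high * median may be collapsed repeats; contigs below low * median
    may be contaminants or artefacts. Both are only worth a closer look.
    """
    coverages = list(coverages)
    median = statistics.median(cov for _, cov in coverages)
    flagged = []
    for contig, cov in coverages:
        if cov > high * median:
            flagged.append((contig, cov, "high"))
        elif cov < low * median:
            flagged.append((contig, cov, "low"))
    return flagged
```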

    • #3
      Thank you very much, Brian Bushnell. Do you know whether your package has already been used with ddRAD data?

      • #4
        Originally posted by FrancescaRaffini
        Thank you very much, Brian Bushnell. Do you know whether your package has already been used with ddRAD data?
        That's possible, but I don't know. I have never worked with ddRAD data, and I am unaware of it being used at JGI.

        • #5
          I don't think Quast can be used for ddRAD data. What did you use to "assemble" your data - Stacks? I would evaluate the data based on the number of polymorphic "stacks"/loci (where >2 samples are homozygous for each allele) covered by at least 10x in 2/3 of your samples. With a few thousand such loci, you have a pretty nice dataset. If not, there is a long list of things that can go wrong (especially in the wet lab)...
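
          The filter described above can be written down as a short counting rule. This is a minimal Python sketch, assuming you have already exported a per-sample read depth and diploid genotype call for each locus (the dictionary layout here is hypothetical). It reads ">2 samples homozygous for each allele" literally as at least three homozygous samples for each of two alleles, and counts homozygotes only among samples that pass the depth cut-off, which is an assumption rather than part of the original suggestion.

```python
from collections import Counter
from fractions import Fraction


def passes_filter(calls, samples, min_depth=10, min_frac=Fraction(2, 3), min_hom=3):
    """Return True if a locus passes the coverage + polymorphism filter.

    calls: hypothetical layout, dict sample -> (depth, (allele1, allele2));
    samples missing from the dict count as not covered.
    """
    # Samples whose call at this locus has at least min_depth reads.
    covered = [calls[s] for s in samples
               if s in calls and calls[s][0] >= min_depth]
    # Coverage rule: at least min_frac of all samples must be covered
    # (Fraction keeps the 2/3 comparison exact).
    if len(covered) < min_frac * len(samples):
        return False
    # Count well-covered homozygous samples per allele.
    hom = Counter(a1 for _, (a1, a2) in covered if a1 == a2)
    # Polymorphism rule: at least two alleles, each homozygous in
    # min_hom or more samples (min_hom=3 reads ">2" literally).
    return sum(1 for n in hom.values() if n >= min_hom) >= 2


def count_informative(loci, samples, **kwargs):
    """Count loci passing the filter; loci maps locus id -> calls dict."""
    return sum(passes_filter(calls, samples, **kwargs) for calls in loci.values())
```

          Running count_informative on the output of each Stacks parameter set would give one number per assembly that can be compared directly.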

          @ BB: Typical ddRAD protocols produce quite small fragments (100-250 bp) representing small islands with strictly defined borders (the restriction-enzyme cut sites), so when you push the data through a de novo assembler you rarely get any contigs longer than what overlapping the two reads of a pair already gives you.
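
          The arithmetic behind this is simple. A minimal Python sketch, assuming both reads in a pair have the same length, start exactly at the two restriction sites of the fragment, and have had any adapter read-through trimmed; the 150 bp read length in the usage note is a hypothetical value, not something stated in the thread.

```python
def read_pair_span(fragment_len, read_len):
    """Return (overlap_bp, longest_contig_bp) for one ddRAD fragment.

    Assumes both reads have length read_len and start exactly at the two
    restriction-site ends of the fragment, as in paired-end ddRAD.
    """
    if fragment_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    # A read cannot usefully extend past the far end of the fragment
    # (anything beyond is adapter sequence, assumed trimmed).
    effective = min(read_len, fragment_len)
    overlap = 2 * effective - fragment_len
    if overlap >= 0:
        # Reads overlap: merging them reconstructs the whole fragment,
        # and nothing lies beyond the cut sites to extend it further.
        return overlap, fragment_len
    # Reads do not meet: all reads of a locus start at the same cut sites,
    # so the gap is never covered and each end stays a separate piece.
    return overlap, effective
```

          With hypothetical 150 bp reads, read_pair_span(100, 150) returns (100, 100) and read_pair_span(250, 150) returns (50, 250): across the 100-250 bp range the longest possible contig is the fragment itself.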

          • #6
            Thanks for the explanation. It does not sound like standard methods of assembly and assembly evaluation are relevant here.

            • #7
              Thank you both for your suggestions.

              Sarvidsson, yes, I am using Stacks. Since I will run several de novo assemblies with different parameters, I need some quality measures (e.g. the error rate, which I can't measure with the best-known methods because I have no reference genome, replicates or multi-generation data) to estimate how good the assembly is with each parameter set. Your method looks interesting, but I am afraid it is not enough on its own to evaluate assembly quality.
