
  • maasha
    replied
    I should say Biopieces is pretty nifty for this task:




    You simply do:

    Code:
    read_fasta -i contigs.fna |
    grab -e "SEQ_LEN>=200" |
    analyze_assembly -x
    and get:

    Code:
    N50: 9082
    MAX: 52038
    MIN: 200
    MEAN: 4170
    TOTAL: 3057214
    COUNT: 733
    ---
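For anyone without Biopieces installed, the same kind of summary can be approximated in a few lines of Python. This is only a minimal sketch, not how Biopieces computes its numbers: the function names are made up for illustration, N50 is taken here as the length of the contig at which the running sum of lengths (longest first) reaches half the total, and the mean is truncated to an integer.

```python
def read_fasta_lengths(lines):
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def assembly_stats(lengths, min_len=200):
    """Summarize contig lengths, ignoring contigs shorter than min_len."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    if not kept:
        return None
    total = sum(kept)
    running = 0
    n50 = 0
    for n in kept:
        running += n
        if running * 2 >= total:  # reached half of the total length
            n50 = n
            break
    return {
        "N50": n50,
        "MAX": kept[0],
        "MIN": kept[-1],
        "MEAN": total // len(kept),  # integer mean (an assumption)
        "TOTAL": total,
        "COUNT": len(kept),
    }


# Toy example with four contigs of length 2, 4, 6 and 8.
toy = [">a", "AC", ">b", "ACGT", ">c", "ACG", "TGA", ">d", "ACGTACGT"]
print(assembly_stats(read_fasta_lengths(toy), min_len=3))
```

For a real assembly, pass an open file handle instead of the toy list, e.g. `assembly_stats(read_fasta_lengths(open("contigs.fna")))`.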



  • bastianwur
    replied
    There are tools like CGAL and RSEM-EVAL, which calculate the likelihood of the reads belonging to the actual assembly. That might help when you have more than one assembly.

    Since the size of the assembly can sometimes vary too, I also like to have an estimate of the genome size beforehand; tools for this include kmerspectrumanalyzer and kmergenie.

    And depending on how fragmented you can/want to get with the data: a most likely correct genome (not necessarily contiguous) can be obtained by taking the consensus of all your assemblies and breaking the contigs where they disagree.

    <s>If you arrive at a chromosome, and you have a prokaryote, then you need to take a look at the GC skew of the chromosome to detect obvious misassemblies.</s> scratch that, didn't see the transcriptome part.
    EDIT: Eh, no strike through tags in this forum?



  • nepossiver
    replied
    Originally posted by Brian Bushnell
    For more advanced statistics, particularly if you have a reference and are evaluating different assembly methodologies, I recommend Quast because it also does alignment to the reference to calculate the number of misassemblies.
    Their group (excellent; I love SPAdes and QUAST) is developing rnaQUAST to evaluate transcriptome assemblies. Version 0.1.1 (the current version at the time of my message) has a bug, though: the reference transcriptome file name has to strictly follow this pattern:

    Code:
    name.extension
    I could not use a reference which had:

    Code:
    name.middle.extension
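A quick way to check a reference file name against the name.extension pattern before running the tool is sketched below. The helper names are hypothetical, and the idea of replacing inner dots with underscores is only an assumed workaround, not something confirmed for rnaQUAST; the script only suggests a name and does not rename anything.

```python
from pathlib import Path


def has_single_extension(filename):
    """True if the file name looks like name.extension (exactly one dot)."""
    name = Path(filename).name
    return (
        name.count(".") == 1
        and not name.startswith(".")
        and not name.endswith(".")
    )


def suggest_simple_name(filename):
    """Suggest a name with inner dots replaced by underscores (assumed workaround)."""
    path = Path(filename)
    if path.suffix == "":
        return path.name
    stem = path.name[: -len(path.suffix)]
    return stem.replace(".", "_") + path.suffix


for candidate in ["name.extension", "name.middle.extension"]:
    if has_single_extension(candidate):
        print(candidate, "matches name.extension")
    else:
        print(candidate, "could be copied to", suggest_simple_name(candidate))
```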
    Last edited by nepossiver; 05-13-2015, 08:06 AM. Reason: added rnaQUAST link.



  • Brian Bushnell
    replied
    Old thread, but BBMap has a stats.sh program that will summarize basic assembly stats (N50, L50, distribution of contig sizes, GC%, etc); it's very fast even on assemblies with millions of contigs, and extremely easy to use:

    stats.sh contigs.fasta

    For more advanced statistics, particularly if you have a reference and are evaluating different assembly methodologies, I recommend Quast because it also does alignment to the reference to calculate the number of misassemblies. Also, even if you don't have a reference, it does neat things like gene prediction. Not sure how that feature would work on a transcriptome, though.
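To make two of the basic statistics mentioned above concrete, here is a minimal Python sketch of L50 and GC%. It is not how stats.sh is implemented; the function names are illustrative, L50 is taken here as the number of contigs (longest first) needed to reach half the total length, and GC% is computed over A/C/G/T only, ignoring N and other symbols. Naming conventions for N50 and L50 are not uniform across tools, so check what a given program reports.

```python
def l50(lengths):
    """Number of longest contigs needed to cover half of the total length."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, n in enumerate(ordered, start=1):
        running += n
        if running * 2 >= total:
            return count
    return 0  # empty input


def gc_percent(sequences):
    """GC content in percent over all A/C/G/T bases (other symbols ignored)."""
    gc = acgt = 0
    for seq in sequences:
        s = seq.upper()
        g_and_c = s.count("G") + s.count("C")
        gc += g_and_c
        acgt += g_and_c + s.count("A") + s.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


print(l50([8, 6, 4]))
print(gc_percent(["GGCC", "AATT"]))
```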
    Last edited by Brian Bushnell; 05-12-2015, 06:26 PM.



  • student-t
    replied
    There are a few options for calculating assembly metrics:

    1. https://github.com/ajmazurie/velvet-stats
    2. Biopieces
    3. http://korflab.ucdavis.edu/datasets/...athon_stats.pl
    4. abyss-fac

    I don't recommend 1-3: the documentation is bad, and I didn't have the time to go through the source code. Biopieces requires a multi-stage workflow, which I think is a very stupid idea.

    Use abyss-fac; don't waste your time. On a Mac, install it via "brew install abyss".



  • nepossiver
    replied
    Originally posted by ssing
    *incidence of chimeric transcripts
    hi ssing,

    how do you calculate chimeric transcripts? Do you have a reference genome? My problem is, I don't, and I don't know of a good way to find chimeric contigs in my assemblies.

    thanks



  • ssing
    replied
    Hi LizBent,

    I have been working on the exact same problem and have come up with some metrics to estimate the quality of a transcriptome in the absence of a ref genome. Some stats that I have used are:
    *n50
    *percent annotated to my closest reference
    *percent of annotated proteins that have (what seem to be) premature stop codons
    *percent of reads used/percent of paired reads used
    *contiguity & completeness (see http://www.nature.com/nrg/journal/v1...l/nrg3068.html)
    *incidence of chimeric transcripts
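As an illustration of one item in the list above, the share of proteins with an apparent premature stop could be computed roughly as follows. This is a sketch under assumptions, not the method used for the numbers in this post: it assumes the predicted proteins are available as translated strings in which "*" marks a stop codon, and the function names are made up.

```python
def has_premature_stop(protein, stop="*"):
    """True if a stop symbol occurs anywhere before the end of the protein."""
    # Trailing stop symbols are the normal end of a translation, so drop them.
    return stop in protein.rstrip(stop)


def percent_with_premature_stop(proteins, stop="*"):
    """Percentage of proteins that contain an internal stop symbol."""
    proteins = list(proteins)
    if not proteins:
        return 0.0
    flagged = sum(1 for p in proteins if has_premature_stop(p, stop))
    return 100.0 * flagged / len(proteins)


# Two of these four toy translations contain an internal stop.
print(percent_with_premature_stop(["MKT*", "MK*T*", "MKT", "M*K"]))
```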

    As for calculating simple metrics like n50, max contig size, etc, I use the command line program abyss-fac, which is available as part of the general ABySS package.

    Good luck!



  • LizBent
    started a topic De novo transcriptome quality metrics?


    Hi everyone

    I'm going to be making several de novo transcriptome assemblies (using different software), and I wish to compare them. What metrics are best for this? I don't have a reference genome.

    Also, is there a software package for generating these metrics from output files? I've currently tried running Trinity, and I get a lot of output files, but none that seem to summarize the number of contigs, their length, etc. How can I calculate this from a fasta file of assembled contigs?
