  • prs321
    Member
    • Jun 2013
    • 96

    How do I go about evaluating my assembly?

    I have made an assembly.

    Here are my tasks. I do not know where to go from step 1, and I do not even know how to attempt steps 2 and 3.

    1. Align assembly to reference genome.

    Grab coordinates of the set of sequences that aligns to the reference genome and grab coordinates of the set of sequences that DO NOT align to the reference genome.

    I used MUMmer for this and got a .coords file.



    2. Take the sequences that did not align and map them against a given plasmid database, to differentiate between the nuclear genome and plasmids. Then take what's left over and map it against a virulence gene database to see what the virulence genes are.

    I was told to use BLAST for this but I have no idea what to do.

    If there are still unaligned sequences left over, then I have to use a new reference to align the remaining sequences.



    3. Gene annotation, obtain gene locations



    4. SNP calling


    Edit:

    I have Step 4 down.
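For step 1, here is a minimal sketch of pulling aligned and unaligned contigs out of a MUMmer .coords file. It assumes the file was written with `show-coords -T` (tab-delimited, with the reference and query IDs in the last two columns) and that contig IDs are the first word of each FASTA header. The function names are made up for illustration, not part of MUMmer.

```python
def fasta_ids(fasta_lines):
    """Return contig IDs (first word of each '>' header) in file order."""
    return [line[1:].split()[0] for line in fasta_lines if line.startswith(">")]


def aligned_regions(coords_lines):
    """Map each query (contig) ID to its aligned (start, end) intervals.

    Assumes tab-delimited `show-coords -T` output: S1, E1, S2, E2, LEN1,
    LEN2, %IDY, then reference ID and query ID as the last two fields.
    """
    regions = {}
    for line in coords_lines:
        fields = line.rstrip("\n").split("\t")
        # Alignment rows start with an integer coordinate; header lines do not.
        if len(fields) < 9 or not fields[0].strip().isdigit():
            continue
        # Reverse-strand hits list the query start after the end, so sort them.
        q_start, q_end = sorted((int(fields[2]), int(fields[3])))
        regions.setdefault(fields[-1], []).append((q_start, q_end))
    return regions


def split_contigs(contig_ids, coords_lines):
    """Split contig IDs into (aligned, unaligned) lists, keeping input order."""
    regions = aligned_regions(coords_lines)
    contig_ids = list(contig_ids)
    aligned = [c for c in contig_ids if c in regions]
    unaligned = [c for c in contig_ids if c not in regions]
    return aligned, unaligned
```

To use it, pass the open assembly FASTA to `fasta_ids` and the open .coords file to `split_contigs`. Note that a contig counts as "aligned" here if any part of it hits the reference, so partially aligned contigs may deserve a closer look at their intervals.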
  • ctseto
    Member
    • Oct 2013
    • 44

    #2
    BLAST can be run online (through NCBI) for single FASTA files, or you can download and install BLAST on your end along with the database and run a search of your contigs against it.
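For the local route, a rough sketch of what the plasmid search might look like, assuming BLAST+ is installed. The file names (`plasmids.fasta`, `unaligned.fasta`, `plasmid_hits.tsv`) and the identity/length cutoffs are placeholders I picked for illustration, not recommended values. The Python part only parses the tabular (`-outfmt 6`) output and keeps the best hit per contig.

```python
# Commands to run first (shown as comments; file names are hypothetical):
#   makeblastdb -in plasmids.fasta -dbtype nucl -out plasmid_db
#   blastn -query unaligned.fasta -db plasmid_db -outfmt 6 \
#          -evalue 1e-10 -out plasmid_hits.tsv
#
# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore


def best_hits(blast_lines, min_identity=90.0, min_length=500):
    """Return {contig ID: best-scoring subject ID} for hits passing the cutoffs.

    The cutoffs are arbitrary illustrative defaults; tune them for your data.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        pident, length, bitscore = float(f[2]), int(f[3]), float(f[11])
        if pident < min_identity or length < min_length:
            continue
        # Keep only the highest bit score seen for each contig.
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return {q: s for q, (s, _) in best.items()}


def partition_by_hits(contig_ids, hits):
    """Split contigs into (with a hit, without a hit), keeping input order."""
    contig_ids = list(contig_ids)
    with_hit = [c for c in contig_ids if c in hits]
    without_hit = [c for c in contig_ids if c not in hits]
    return with_hit, without_hit
```

Contigs without a plasmid hit would then go on to the next database search in the same way, with the virulence gene FASTA given to `makeblastdb` instead.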

    Curious whether something like MetaPhlAn, PhyloPhlAn, or Kraken run against your assemblies (and your raw reads, just to check) would tell you what you have. Of course, "clade-specific marker" and k-mer searches are prone to some degree of noise.


    • yueluo
      Member
      • Aug 2013
      • 82

      #3
      How did you make your assembly (de novo or reference-guided)?
      Is this a metagenomics project?


      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        If you have a reference (and it appears that you do), I recommend QUAST; it's quite effective!


        Even if you don't have a reference, it still tells you things like the number of predicted genes of size>=X; better assemblies tend to have more long genes and fewer short genes.

        Also, you could try ALE (Assembly Likelihood Evaluator), which does not need a reference and estimates the correctness of an assembly from a sam file, based on statistics of variations, coverage, and insert size:


        ALE is not designed to evaluate the quality of a single assembly, but rather, the relative quality of multiple assemblies from the same set of reads. But that's still quite useful when you have several assemblies and need to pick the best one.

        EST capture is also a good method when you have EST data.

        You can also capture metrics like the percent of source reads that align to the assembly, and the rate of substitutions/insertions/deletions in those reads. The higher the mapping rate and the lower the error rate, the better the assembly is. For this you should use a normal read aligner, not MUMmer.
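A minimal sketch of those two read-mapping metrics, assuming the reads were mapped back to the assembly and written as an uncompressed SAM file. The function name is made up. The error rate here is only an approximation: it sums the optional `NM` (edit distance) tag and divides by read length, so it depends on the aligner writing `NM` and it counts soft-clipped bases in the denominator.

```python
def mapping_stats(sam_lines):
    """Summarize mapping rate and approximate per-base edit rate from SAM text."""
    reads = mapped = edits = bases = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:  # not a complete alignment record
            continue
        flag = int(f[1])
        # Skip secondary (0x100) and supplementary (0x800) records so that
        # each read is counted once.
        if flag & 0x900:
            continue
        reads += 1
        if flag & 0x4:  # read is unmapped
            continue
        mapped += 1
        seq = f[9]
        for tag in f[11:]:
            if tag.startswith("NM:i:") and seq != "*":
                edits += int(tag[5:])
                bases += len(seq)
                break
    return {
        "reads": reads,
        "mapped": mapped,
        "mapping_rate": mapped / reads if reads else 0.0,
        "edits_per_base": edits / bases if bases else None,
    }
```

Running this on the SAM file for each candidate assembly (same reads, same aligner settings) gives numbers that can be compared side by side.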
        Last edited by Brian Bushnell; 01-23-2014, 10:08 PM.
