  • kga1978
    Senior Member
    • Nov 2010
    • 100

    Suggestions for aligners?

    Hi all,

    We have been trying out various aligners over the past several months, and I wanted to get some input from the community on which aligners to use. The genomes we work with are as follows:

    Genome:
    Viral (RNA-based)
    10kb genome
    High divergence (typically 5-10%, sometimes 15%)

    Considerations:
    We don't care about memory (small genome)
    Aligner must be able to deal with divergence and a few gaps
    We don't care about (super) speed
    Preferably scalable and 'pipeable' (samtools, picard, GATK, etc.).

    Goals:
    Consensus sequence for 'standard' phylo studies
    High-coverage (>150x) for intra-host SNP detection

    We mostly use Illumina 50bp SE sequencing, but also have 100bp PE and ~500bp 454 data. We have been testing several aligners and have focused on the following:

    BFAST
    BWA
    SSAHA2
    NovoAlign
    Mosaik (mostly v1, but v2 just came out)
    Stampy

    So far we have had the most success with Mosaik and NovoAlign in terms of specificity and sensitivity on the Illumina platform. For 454 we have only used Mosaik so far (which works well, but homopolymer and CAFIE errors have to be cleaned up manually). For these two tools (Mosaik and NovoAlign) we generally use a hash size of 6 and a divergence of 0.1. For Mosaik we also specify an 'act' threshold of 10. We have tweaked several other parameters, but found that they have very little effect on the alignments.
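As a rough back-of-the-envelope check on the numbers above (a sketch, not output from any of the aligners listed): the read count needed for a target depth follows from depth = reads x read length / genome length, and the chance that a single seed of a given hash size contains no mismatch is (1 - d)^k if mismatches are assumed to be independent at a per-base rate d. The function names are hypothetical, and the independence assumption ignores indels, sequencing error and clustered mutations.

```python
import math


def reads_for_depth(genome_len, read_len, depth):
    """Minimum number of reads so that reads * read_len / genome_len >= depth."""
    return math.ceil(depth * genome_len / read_len)


def error_free_seed_prob(hash_size, divergence):
    """Probability that one seed of hash_size bases contains no mismatch,
    assuming independent mismatches at the given per-base rate."""
    return (1.0 - divergence) ** hash_size


if __name__ == "__main__":
    # 10 kb genome, 50 bp single-end reads, 150x target depth
    print(reads_for_depth(10_000, 50, 150))
    # Compare a small hash size with larger ones at 10% divergence
    for k in (6, 11, 15):
        print(k, round(error_free_seed_prob(k, 0.10), 3))
```

Under these assumptions a 6-base seed is error-free a little more than half the time at 10% divergence, while a 15-base seed is error-free only about a fifth of the time, which fits the observation that a small hash size helps with divergent reads.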

    Does anybody have further insights or suggestions? We are currently scaling up production, so any suggestions and comments would be most welcome.
  • kga1978
    Senior Member
    • Nov 2010
    • 100

    #2
    Great question! I do wonder if anybody would have some input...?

    • mchaisso
      Member
      • Apr 2008
      • 84

      #3
      Try BWA-SW; it has been shown to work well for 454 reads and divergent genomes.

      • gsgs
        Senior Member
        • Oct 2009
        • 139

        #4
        I've been using MAFFT for influenza,
        but recently figured out that it is overkill for most
        practical problems when you have a common ancestor sequence,
        a reference that you can align to.

        I've built a list of ~200 index sequences against which I can quickly
        compare new flu lists (e.g. GenBank updates), without alignment,
        to see which group they belong to.
        Then I group the sequences accordingly and do an index-based alignment
        for each group separately.
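A minimal sketch of that grouping step, under my own assumptions: the post does not say how the alignment-free comparison is done, so shared k-mer counts are used here as a stand-in, and the function names and the value of k are hypothetical.

```python
def kmers(seq, k=8):
    """Return the set of all k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_groups(index_seqs, new_seqs, k=8):
    """Assign each new sequence to the index sequence sharing the most k-mers.

    index_seqs and new_seqs are dicts of name -> sequence string.
    Returns a dict of index name -> list of assigned new sequence names.
    """
    index_kmers = {name: kmers(s, k) for name, s in index_seqs.items()}
    groups = {}
    for name, seq in new_seqs.items():
        ks = kmers(seq, k)
        # Ties go to whichever index sequence comes first in the dict.
        best = max(index_kmers, key=lambda idx: len(ks & index_kmers[idx]))
        groups.setdefault(best, []).append(name)
    return groups
```

Each resulting group can then be aligned against its own index sequence separately.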

        Or I align first without gaps (easy and fast) and then filter out (discard)
        sequences that are distant and didn't align well.
        They may have insertions, deletions or sequencing errors.
        These ~1% of sequences are then omitted or aligned with
        MAFFT (slower) in a second step.
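The two-step idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact procedure used: the position-by-position comparison, the function names and the 20% cutoff are all hypothetical choices.

```python
def mismatches(ref, seq):
    """Count differing positions in a gapless, position-by-position comparison.

    Any difference in length is counted as additional mismatches.
    """
    diff = sum(1 for a, b in zip(ref, seq) if a != b)
    return diff + abs(len(ref) - len(seq))


def split_by_distance(ref, seqs, max_frac=0.2):
    """Split sequences into (kept, deferred) by mismatch fraction against ref.

    kept: names of sequences within max_frac of the reference.
    deferred: names to omit or hand to a slower gapped aligner.
    """
    kept, deferred = [], []
    for name, seq in seqs.items():
        frac = mismatches(ref, seq) / len(ref)
        (kept if frac <= max_frac else deferred).append(name)
    return kept, deferred
```

A sequence with an insertion or deletion falls out of register with the reference after the indel, so its mismatch count jumps and it lands in the deferred list, which is what makes the gapless pass usable as a filter.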

        I did this, for example, for all avian PB2 and the like: 10000 sequences
        of length 2280, max. ~400 mutations
        Last edited by gsgs; 12-15-2012, 10:05 PM.
