Using Mosaik to assemble bacterial genome 454 sequencing

  • Using Mosaik to assemble bacterial genome 454 sequencing

    Hi all,

    I decided to try the alignment package Mosaik to create an assembly of a bacterial genome that we are working on. Usually we just use Newbler to create de novo assemblies (and in fact we already have). We've sequenced 12 strains of the same species using 454 Titanium (not paired-end). After assembly, we closed two of the genomes on the bench with PCR. I'd like to reduce the number of contigs in the other strains by using the closed genomes as reference sequences. I'd also like to get the assemblies into SAM format, since Newbler doesn't support that as an output yet.

    Mosaik is the first package I've looked at, but I'm having an issue. I create the reference from one of the closed genomes (a FASTA file consisting of a single contig, no quality information) with this command:
    ./MosaikBuild -fr B475.fasta -oa B475.dat

    Then I create the input file for the sequence fragments from one of our runs (leading sequences, i.e. MIDs etc., stripped):
    ./MosaikBuild -fr B476.fasta -st 454 -out B476.dat -fq B476.qual

    Both of the above commands appear to work fine; however, using the command:
    ./MosaikAligner -in B476.dat -out B475_B476_aligned.dat -ia B475.dat

    Nets this problem (end of output):
    Alignment statistics (mates):
    # failed hash: 1774 ( 35.9 %)
    # filtered out: 3169 ( 64.1 %)
    total: 4943
    total aligned: 0 ( 0.0 %)

    MosaikAligner CPU time: 39.200 s, wall time: 40.548 s
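
    As a quick sanity check on the numbers above, here is a minimal Python sketch that parses the quoted statistics and recomputes the percentages. The `parse_stats` helper is hypothetical (it is not part of Mosaik), and the input is just the text pasted from the output above. It only confirms the arithmetic: the two rejection categories together account for every read, so nothing reached the aligned set.

```python
import re

# Tail of the MosaikAligner output quoted above, copied verbatim.
STATS = """\
# failed hash: 1774 ( 35.9 %)
# filtered out: 3169 ( 64.1 %)
total: 4943
total aligned: 0 ( 0.0 %)
"""


def parse_stats(text):
    """Return {label: count} for each 'label: count' line (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        # Optional leading '#', then a lowercase label, a colon, and an integer.
        match = re.match(r"#?\s*([a-z ]+):\s*(\d+)", line.strip())
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return counts


counts = parse_stats(STATS)
total = counts["total"]
rejected = counts["failed hash"] + counts["filtered out"]

# Recompute each percentage from the raw counts.
for label in ("failed hash", "filtered out", "total aligned"):
    print(f"{label}: {counts[label]} ({100 * counts[label] / total:.1f} %)")

# Reads that are neither rejected nor aligned (0 means all are accounted for).
print("reads unaccounted for:", total - rejected - counts["total aligned"])
```

    Since 1774 + 3169 = 4943, every read was either rejected at the hashing step or filtered out afterwards.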

    If I change some of the parameters to be more forgiving, i.e. add the flags:
    -hs 12 -mm 10

    None of the sequences "failed hash", but they are still all filtered out. Am I doing something obviously wrong? The "Alignment statistics (mates)" title worries me, since these aren't mate-pair reads, just single-end reads. Ideas?


  • #2
    I have the same problem here, even when using Illumina paired-end reads. Does anyone know how to solve this issue?


    • #3
      You might also look at various assemblers designed specifically for this, such as MIRA.