
Short read alignments between species


  • Short read alignments between species


    I have some Illumina paired-end genomic reads from a plant species without a genome sequence, so I wanted to align them to a related genome. I tried using bowtie (--seedmms 3 --maqerr 250) but I am getting very few alignments (<5% of pairs, and ~10% for each end separately). I tried to use the -v option to increase the mismatches, but the limit seems to be 3 (the same as for seed mismatches, according to the manual). I guess my genetic distance is too great...

    Do people have a preferred aligner when aligning to a reference from another species, or would I be better off assembling the reads de novo and aligning them afterwards?

    Thanks, SD

  • #2
    If you want to continue using bowtie you should increase the allowed error to something much higher.

    I have also used mosaik for this kind of thing, as there you can allow many more mismatches or specify an allowed percentage. One issue you will face is that, to get enough reads mapping, you will likely have to increase the allowed mismatches to such a degree that mapping becomes so ambiguous that the whole thing is of questionable value.
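To put a rough number on that trade-off, here is a minimal Python sketch. It assumes mismatches fall independently and uniformly along a read, which is a simplification (real divergence is clustered, and indels and sequencing errors are ignored). The function name, the 75 bp read length and the 10% divergence are illustrative assumptions only, not values from this thread or from any aligner.

```python
from math import comb


def fraction_within_cap(read_length, divergence, max_mismatches):
    """Estimate the fraction of reads with at most `max_mismatches` mismatches.

    Models the mismatch count as Binomial(read_length, divergence), i.e. each
    base differs from the reference independently with probability `divergence`.
    This is an assumption made for illustration, not how any aligner scores reads.
    """
    return sum(
        comb(read_length, k)
        * divergence ** k
        * (1 - divergence) ** (read_length - k)
        for k in range(max_mismatches + 1)
    )


if __name__ == "__main__":
    # Hypothetical values: 75 bp reads, 10% divergence between the two species.
    for cap in (3, 6, 10, 15):
        frac = fraction_within_cap(75, 0.10, cap)
        print(f"cap={cap:2d} mismatches -> ~{frac:.1%} of reads under the cap")
```

Under those assumed numbers, a cap of 3 mismatches leaves only about 5% of reads alignable, which is in the same range as the rates reported in the first post. Raising the cap recovers more reads, but every extra allowed mismatch also widens the set of reference positions a read can match, which is where the ambiguity comes from.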

    I think I would go with your second option: a de novo assembly followed by aligning the assembled contigs. However, that's a whole other world of pain, and your success will depend heavily on how much Illumina data you have, on what combination of library insert sizes you have, and very much on the polymorphism rate of your species. Here are some very brief comments on some of the available assemblers:

    MIRA: Probably not worth trying unless your genome is very small because it has such high memory requirements

    SOAPdenovo: Many people report OK results, but you will likely get a very large number of very short contigs. The documentation is terrible and the mailing list is far from the best because the developers don't seem to read it.

    ABySS: Great mailing list, and it gives about the best results. The developers are really helpful, and users on the mailing list will help with anything from newbie to advanced issues.

    Velvet: OK for small genomes but has really high RAM requirements otherwise (but not as bad as MIRA).

    Celera (CABOG): I can't say because I haven't tried it yet, but they recently added full support for Illumina data.

    clc: Commercial, so maybe not an option. It is also a rather mysterious black box, but it is incredibly fast and has amazingly low RAM requirements (though, being a black box, who knows how they manage this).


    • #3
      LASTZ has a specific module to perform exomapping, called FEAST.


      Otherwise, use BWA (which accepts more SNPs than bowtie and can handle indels) with relaxed settings.
      Francois Sabot, PhD

      Be realistic. Demand the Impossible.


      • #4
        Since you already tried mapping and got few alignments, I don't think assembly will produce a better result. However, as francois.sabot said, you can try relaxing the mapping criteria (more mismatches, larger gaps), and in my opinion you are better off using a hash-based mapping tool (maq, rmap, etc.) for that purpose.