  • thsuk1
    Junior Member
    • Jun 2010
    • 7

    Why do we use mapping programs instead of blast for mapping to a reference?

    Hi guys,

    I am wondering why people use mapping programs such as bwa and maq for mapping to a reference. I think blast also searches for mapping positions while allowing some mismatches and indels.
    Sorry for the foolish question, but what is the reason?
  • NicoBxl
    not just another member
    • Aug 2010
    • 264

    #2
    They're faster than blast for short sequences (optimized for CPU and memory).

    • ffinkernagel
      Senior Member
      • Oct 2009
      • 110

      #3
      Also, they're more sensitive.
      Blast typically needs a number of 'high scoring segment pairs' to even start considering an alignment.

      • Zigster
        Jeremy Leipzig
        • May 2009
        • 117

        #4
        Blast is just too slow - 100 million reads against a big genome would take days even on a large cluster.

        Blat is fine for 454 reads.
        --
        Jeremy Leipzig
        Bioinformatics Programmer

        • malachig
          Senior Member
          • Aug 2010
          • 117

          #5
          blastn for DNA alignments can be sensitive if the right parameters are chosen (a small word size in particular). It can find an alignment of a 42-mer with multiple mismatches AND gaps. For example, using blastn with a word size of 11 to align 42-mers to a database of all human transcripts finds alignments with up to 6 mismatches and 2 gaps. Some next-gen aligners have arbitrary limits on the number of mismatches in a single read. Furthermore, some next-gen aligners will fail to find an alignment if a mismatch or gap (or more than one of these) occurs within the beginning of the read, as this portion is used as a seed. Another advantage of blast is that all alignments are returned: if a read has 1000 alignments, 1000 alignments are reported. Another advantage is the ability to perform sub-string alignments. If the first or last base positions of the reads in an Illumina run have very high error rates (e.g. the first three bases of many reads in a run are garbage), you may need to trim the reads to get successful alignments with some next-gen aligners, which tend to focus on aligning the entire read length. blast will find an alignment and report the positions within the read where the alignment starts and ends. Another advantage of BLAST is a more sensible treatment of N's. Some of the next-gen aligners store bases in 2-bit format, meaning they can only represent A, T, C and G internally. Their solution is to randomly assign each N to one of the other bases, a solution that some may find imperfect.

          As the other posts have indicated, all of these apparent advantages are trumped by the computational issue: BLAST is simply too slow. Speed is the main driving force behind the recent proliferation of aligners, and many of the advantages of BLAST suggested above are gradually being addressed by next-gen aligners...
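
To illustrate the 2-bit point above, here is a minimal Python sketch. It is not taken from any particular aligner; `pack_2bit`, `unpack_2bit` and the base-to-code table are hypothetical names chosen for illustration. With only four codes available there is no slot left for N, so the sketch substitutes a random base, as described in the post.

```python
import random

# Hypothetical 2-bit code table: four codes, so no room for N.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str, rng: random.Random) -> int:
    """Pack a DNA string into an integer, 2 bits per base.

    Any symbol outside A/C/G/T (e.g. N) is replaced by a random base,
    which is the workaround described in the post (an assumption here,
    not a specific tool's behaviour).
    """
    packed = 0
    for base in seq:
        if base not in CODE:
            base = rng.choice(BASES)
        packed = (packed << 2) | CODE[base]
    return packed


def unpack_2bit(packed: int, length: int) -> str:
    """Decode `length` bases from an integer produced by pack_2bit."""
    return "".join(
        BASES[(packed >> (2 * (length - 1 - i))) & 3] for i in range(length)
    )


rng = random.Random(0)
print(unpack_2bit(pack_2bit("ACGT", rng), 4))  # round-trips unchanged
print(unpack_2bit(pack_2bit("ACNT", rng), 4))  # the N comes back as some A/C/G/T
```

The information that the third base was unknown is lost after packing, which is the imperfection mentioned above.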

          • KevinLam
            Senior Member
            • Nov 2009
            • 204

            #6
            Good summary!
            Might I add that some of the limitations of short-read mappers can also be addressed after mapping, for example by using GATK's local realigner.
            http://kevin-gattaca.blogspot.com/

            • lh3
              Senior Member
              • Feb 2008
              • 686

              #7
              Blast has other problems for short reads in addition to speed. Let's take 32bp reads as a somewhat extreme example (32bp reads are rarely produced nowadays). By default, blast uses exact 11-mer hits as seeds. If two mismatches happen to occur at the 11th and the 22nd positions, the longest exact match left is 10bp, so blast will not be able to find the hit. It cannot achieve the full sensitivity of eland/maq/bwa/soap2 (by default, bowtie does not guarantee full sensitivity). Although blast can find 3-, 4- and 5-mismatch hits by chance (again, not with full sensitivity), these hits are more likely to be artifacts, especially when 2-mismatch hits are not guaranteed to be found. A slightly modified eland can also find a fraction of 3-mismatch hits.
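
The seeding argument above can be checked with a small Python sketch. This is not blast's code; `mutate`, `shared_kmers` and the 32bp sequence are made up for illustration, and the read is compared only against its true position on the reference.

```python
def mutate(seq: str, positions) -> str:
    """Return seq with a substitution at each 1-based position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(seq)
    for p in positions:
        bases[p - 1] = swap[bases[p - 1]]
    return "".join(bases)


def shared_kmers(read: str, ref: str, k: int = 11) -> int:
    """Count offsets where read and ref (same length, same origin)
    share an exact k-mer, i.e. where an exact seed could be found."""
    return sum(read[i:i + k] == ref[i:i + k] for i in range(len(read) - k + 1))


ref = "ACGTTGCAAGCTTAGGCTAACGTTCAGGATCC"  # arbitrary 32bp example

# Mismatches at positions 11 and 22 leave exact stretches of 10, 10 and 10 bp.
print(shared_kmers(mutate(ref, [11, 22]), ref))  # 0: no 11-mer seed survives

# The same number of mismatches elsewhere leaves a 23bp exact stretch.
print(shared_kmers(mutate(ref, [4, 28]), ref))   # 13 usable 11-mer seeds
```

So whether a 2-mismatch hit is found depends on where the mismatches fall, which is why the sensitivity is not guaranteed.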

              Another problem with blast lies in its local alignment. Suppose a true mutation occurs at the 4th bp of a read. Blast will trim off the first 4bp of the alignment (by default, match=1 and mismatch=-3, so extending through three matches and one mismatch gains nothing). You will then see more reference bases mapped than alternate bases. This is reference bias. Although global-local alignment as in eland has other problems (e.g. unalignable indels), it is less affected by this bias.
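
A rough Python sketch of the clipping effect described above, using the match=1 / mismatch=-3 scores from the post. It only scores the read against its true position with no gaps, and it is not how blast extends hits internally; `best_local_segment` and the example sequence are hypothetical. One stated assumption: when extending across the mismatch gains exactly zero (3 matches minus one mismatch), the sketch keeps the shorter segment.

```python
def best_local_segment(read: str, ref: str, match: int = 1, mismatch: int = -3):
    """Best-scoring ungapped local segment of read vs. ref (same length).

    Returns (score, start, end) with 1-based inclusive read coordinates.
    A running prefix that contributes nothing (score <= 0) is dropped,
    so ties resolve to the shorter segment (an assumption of this sketch).
    """
    best = (0, 0, 0)
    cur, cur_start = 0, 0
    for i, (a, b) in enumerate(zip(read, ref)):
        if cur <= 0:  # prefix adds nothing: restart the segment here
            cur, cur_start = 0, i
        cur += match if a == b else mismatch
        if cur > best[0]:
            best = (cur, cur_start + 1, i + 1)
    return best


def mutate(seq: str, position: int) -> str:
    """Return seq with a substitution at a 1-based position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return seq[:position - 1] + swap[seq[position - 1]] + seq[position:]


ref = "ACGTTGCAAGCTTAGGCTAACGTTCAGGATCC"  # arbitrary 32bp example

# Variant at the 4th base: the reported segment starts at base 5,
# so the variant base is clipped out of the alignment.
print(best_local_segment(mutate(ref, 4), ref))   # (28, 5, 32)

# Variant at the 10th base: the 9 leading matches outweigh the penalty,
# so the whole read is aligned and the variant is kept.
print(best_local_segment(mutate(ref, 10), ref))  # (28, 1, 32)
```

Reads carrying the variant near their ends lose it to clipping, while reads carrying the reference base there align in full, which is the bias described above.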

              Both problems are greatly alleviated by longer reads. For 100bp reads, I would guess the problems above are minor, but for 32bp reads, short read aligners are better in almost every way (faster, more sensitive and less biased). As for N, capable aligners (e.g. novoalign) do not have any problem with it. They may even take advantage of ambiguous bases like R. I do not know whether blast does.

              If we build an index for the genome, the real inefficiency of blast comes from the fact that it loads only ONE read into memory, scans through the whole genome and then outputs the result. Most of the scan is purely a waste of time. A better way to use blast is to concatenate multiple short sequences into one. Speed can be dramatically improved this way, although it is still much slower than modern aligners. I think the blast group has already picked up this trick in blast+.
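
As a toy illustration of why indexing the genome once helps, here is a minimal Python sketch. It is an assumption-laden example, not the data structure used by blast or by any of the aligners named in this thread; `build_index`, `candidate_positions` and the short "genome" are hypothetical.

```python
from collections import defaultdict


def build_index(genome: str, k: int):
    """Map every k-mer in the genome to the list of its start positions.
    Built once, then shared by all reads."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def candidate_positions(read: str, index, k: int):
    """Genome start positions for the read implied by exact k-mer hits.
    Each lookup is a dictionary access rather than a scan of the genome."""
    hits = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            hits.add(pos - offset)
    return hits


genome = "TTGACCAGTACGGATCCATGCAAGTCGATTACGGCTA"  # arbitrary toy sequence
k = 11
index = build_index(genome, k)

reads = [genome[5:25], genome[12:32]]
for read in reads:
    # Candidate loci would then go to a proper alignment step.
    print(sorted(candidate_positions(read, index, k)))
```

The cost of walking the genome is paid once in `build_index`; each additional read only pays for its own seed lookups.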
              Last edited by lh3; 08-27-2010, 09:03 AM.
