  • Best aligner for 20% mismatched?

    I have 101-base reads and expect up to 20 mismatches to the reference. My reads are not paired. I have tried bwa bwasw -a 1 -b 1 -T 60, but it aligns only 1.5% of the reads, and those have only a couple of mismatches. I know from other tests that ~30% should be aligned with 20 mismatches. Is this just something bwa is not designed for? What would be a better aligner? Or am I not using the right settings?

  • #2
    For 20 mismatches per read I would prefer something that is not based on Burrows-Wheeler, especially if you are expecting indels. Even for 101 bp reads, this number of mismatches is pretty high.
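One way to see why exact-seed matching gets hard at this error level is a pigeonhole argument: k mismatches can touch at most k of k + 1 non-overlapping pieces of a read, so only one piece of length floor(L / (k + 1)) is guaranteed to match exactly. The sketch below only does that arithmetic. The function name is made up for illustration, and it says nothing about how any particular aligner actually picks its seeds.

```python
def guaranteed_exact_seed(read_length: int, mismatches: int) -> int:
    """Length of an exact-match piece guaranteed by the pigeonhole principle.

    Split the read into (mismatches + 1) non-overlapping pieces; at most
    `mismatches` of them can contain a mismatch, so at least one piece of
    length floor(read_length / (mismatches + 1)) matches exactly.
    """
    if read_length <= 0 or mismatches < 0:
        raise ValueError("read_length must be positive, mismatches non-negative")
    return read_length // (mismatches + 1)


print(guaranteed_exact_seed(101, 2))   # 33
print(guaranteed_exact_seed(101, 20))  # 4
```

With only 4 bases guaranteed, an exact seed is far too short to be specific, which fits the advice to look at tools that tolerate errors inside the seed.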



    • #3
      You could try bfast or ssaha. They are not very fast but should perform much better on your data. Bfast seems to be faster (from what I have heard), but I think ssaha is a good starting point for a first estimate of the alignment rate because it's very easy to use. Maybe start with just a subset (100-1000k reads), because it's really not that fast.
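For the subset idea, here is a minimal sketch of drawing a random sample of reads in one pass with reservoir sampling. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the function names and the file name in the usage comment are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+', qualities


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; a truncated trailing record is dropped."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip pulls four consecutive lines from the same iterator per record
    yield from zip(stripped, stripped, stripped, stripped)


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Return k records chosen uniformly at random (all of them if fewer than k)."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


# Usage (file name is an assumption):
# with open("reads.fastq") as handle:
#     subset = reservoir_sample(read_fastq(handle), 100_000)
```

Sampling at random rather than taking the first N reads avoids any bias from how reads are ordered in the file.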



      • #4
        Some time ago I used mosaik for a very similar task.



        • #5
          Originally posted by moritzhess
          You could try bfast or ssaha. They are not very fast but should perform much better on your data. Bfast seems to be faster (from what I have heard), but I think ssaha is a good starting point for a first estimate of the alignment rate because it's very easy to use. Maybe start with just a subset (100-1000k reads), because it's really not that fast.
          Thanks, I will give those a try. Absolutely, I will work with a small subset until I find the tool and settings I need.



          • #6
            Originally posted by szilva
            For 20 mismatches per read I would prefer something that is not based on Burrows-Wheeler, especially if you are expecting indels. Even for 101 bp reads, this number of mismatches is pretty high.
            Thanks. I have estimates that divergence between reads and reference averages ~15%, so I expect to find unique alignments out to 20% or so. My gut tells me BW- or suffix-array-based aligners wouldn't do well with this, but I wasn't able to confirm that from the BWA docs.
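As a rough sanity check on those numbers, the sketch below asks what fraction of 101-base reads would carry at most 20 mismatches if every site diverged independently at a uniform 15%. That binomial model is an assumption made here for illustration; real divergence varies along the genome, so treat the result as a ballpark only.

```python
from math import comb


def fraction_within(read_length: int, divergence: float, max_mismatches: int) -> float:
    """P(mismatches <= max_mismatches) under a binomial model.

    Assumes each base mismatches independently with probability `divergence`.
    """
    return sum(
        comb(read_length, k) * divergence**k * (1.0 - divergence) ** (read_length - k)
        for k in range(max_mismatches + 1)
    )


# Fraction of reads expected to fall within a 20-mismatch budget
print(fraction_within(101, 0.15, 20))
# Fraction expected to have only a couple of mismatches
print(fraction_within(101, 0.15, 2))
```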



            • #7
              Hi,

              you could give RazerS (part of SeqAn) or Mosaik a try. RazerS, for example, lets you specify an identity threshold for the whole read length.
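To relate an identity threshold to a mismatch budget, here is a small sketch of the arithmetic. It assumes identity is defined as matching bases divided by read length; how a given tool counts indels toward identity is not covered, and the function name is hypothetical.

```python
def max_errors(read_length: int, min_identity_percent: int) -> int:
    """Largest error count that still keeps identity >= the threshold.

    Identity is taken as (read_length - errors) / read_length, so the budget
    is floor(read_length * (100 - min_identity_percent) / 100). Integer
    arithmetic avoids floating-point rounding surprises.
    """
    if not 0 <= min_identity_percent <= 100:
        raise ValueError("min_identity_percent must be between 0 and 100")
    return read_length * (100 - min_identity_percent) // 100


print(max_errors(101, 80))  # 20
print(max_errors(101, 85))  # 15
```

So a budget of 20 mismatches on 101-base reads corresponds to an identity threshold of about 80% under this definition.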

              Andreas



              • #8
                Feederbing, you can try novoalign. It will allow up to 10 high-quality mismatches. Also have a look at some of the trimming options, which could improve your mapping rate.
                Do you have a good idea of the quality profile, so you can see where quality starts dropping off? FastQC is a good tool for examining this.



                • #9
                  zee, just to be clear, the reason I expect so many mismatches is because of evolution, not sequencing quality.



                  • #10
                    Originally posted by feederbing
                    zee, just to be clear, the reason I expect so many mismatches is because of evolution, not sequencing quality.
                    My 2p... For this type of alignment, i.e. sequence diversity due to evolution, good old BLAST may be the best option, provided it won't take forever to do the job (and anyway, with ~20 mismatches per read, I guess pretty much all the aligners out there will be quite slow)... You could try aligning a random sample of reads and see how it goes.

                    Good luck
                    Dario



                    • #11
                      The alignments may not be all that accurate, even if you could find them in a reasonable amount of time.



                      • #12
                        I think you should go with BFAST and a very small mask such as 11111111 to find candidate local alignments. Then do the match step.

                        The good thing about BFAST is that it aligns the read end to end, without any kind of clipping.
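To make the mask idea concrete, here is a toy sketch of masked seeding: a mask is a string of 1s and 0s, positions under a 1 must match and positions under a 0 are ignored, and an all-ones mask like 11111111 is simply a contiguous 8-mer. This only illustrates the general technique; it is not BFAST's implementation, and all names are made up.

```python
from collections import defaultdict
from typing import Dict, List


def masked_key(seq: str, start: int, mask: str) -> str:
    """Bases of seq under the '1' positions of mask, starting at `start`."""
    return "".join(seq[start + i] for i, bit in enumerate(mask) if bit == "1")


def build_index(reference: str, mask: str) -> Dict[str, List[int]]:
    """Map every masked key in the reference to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        index[masked_key(reference, pos, mask)].append(pos)
    return index


def candidate_starts(read: str, index: Dict[str, List[int]], mask: str) -> Dict[int, int]:
    """Count seed hits per implied reference start position of the read."""
    votes: Dict[int, int] = defaultdict(int)
    for offset in range(len(read) - len(mask) + 1):
        for pos in index.get(masked_key(read, offset, mask), ()):
            votes[pos - offset] += 1
    return dict(votes)
```

Starts with the most votes would then be passed to a full alignment step; a shorter mask finds more candidates for divergent reads at the cost of more spurious hits.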



                        • #13
                          101 bp with 20% mismatches: I believe you will get lots of misalignments if you are aligning against human (fine if against a small genome). If you want to do that anyway, I would vote for ssaha2.

                          BTW, to map reads with a high error rate using bwa-sw, you should decrease "-T" and increase "-z" to 10 or 100. Your settings may even make bwa-sw less sensitive than the default settings. Nonetheless, even with -z100, bwa-sw probably would not work well for 100 bp reads with 20% mismatches.
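A small sketch of how such a run could be scripted, using only the -T and -z options mentioned above. The reference prefix, the reads file name, and the specific -T value are placeholders chosen for illustration, not recommendations; the command is only printed, not executed.

```python
import shlex
from typing import List


def bwasw_command(reference: str, reads: str, min_score: int, z_best: int) -> List[str]:
    """Build a `bwa bwasw` argument list with a score threshold (-T) and -z value."""
    return ["bwa", "bwasw", "-T", str(min_score), "-z", str(z_best), reference, reads]


# Hypothetical inputs: lower -T than the original -T 60, and -z raised to 10
cmd = bwasw_command("ref.fa", "subset.fastq", min_score=20, z_best=10)
print(shlex.join(cmd))
# To actually run it, something like subprocess.run(cmd, stdout=out_file, check=True)
```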

                          For mammalian genomes, another option is BWT-SW. If you have a short reference genome, you may try cross_match, fasta, and SSE2-based Smith-Waterman.

                          If you have high coverage, you should assemble the reads first and then do the alignment. That will be much better.



                          • #14
                            It might be worth giving vmatch a try. In my opinion, vmatch is a very versatile tool, and it has an '-identity' option which allows you to specify a minimum identity between matches (though I've never used it for this type of application).



                            • #15
                              Maybe Heng can correct me, but isn't bwasw meant for longer reads, so that it should not be used for 100 bp reads, especially with that expected error rate? (At least that's what I remember from his paper.)

                              As for increasing the -z value, I barely see improvements for values above 10, and the extra run time for higher values is not really worth it. It sometimes helps to rerun the program on the remaining unaligned reads to get more of them aligned.
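For the rerun step, here is a minimal sketch that pulls the unaligned reads back out of a SAM file as FASTQ, using the standard SAM flag bit 0x4 (segment unmapped). It assumes the aligner writes unaligned reads to its SAM output; if it leaves them out, you would have to compare read names against the original input instead. The function name is hypothetical.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_as_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record string per unmapped read in a SAM stream."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment line
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & UNMAPPED:
            yield f"@{name}\n{seq}\n+\n{qual}\n"


# Usage (file names are assumptions):
# with open("first_pass.sam") as sam, open("leftover.fastq", "w") as out:
#     out.writelines(unmapped_as_fastq(sam))
```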

                              Does anybody have experience with SOAP2?
