  • BWA-SW / 454 / software options

    Hi,

    I need to map ~400,000 454 reads onto a reference genome. The mean read length is 310 bp. Both the reads and the genome contain repeats. The goal is to call variants (SNPs and indels).

    I would like to tune the bwasw algorithm using some of the options provided by the software:

    Usage: bwa bwasw [-a matchScore] [-b mmPen] [-q gapOpenPen] [-r gapExtPen] [-t nThreads] [-w bandWidth] [-T thres] [-s hspIntv] [-z zBest] [-N nHspRev] [-c thresCoef] <in.db.fasta> <in.fq>

    OPTIONS:
    -a INT Score of a match [1]
    -b INT Mismatch penalty [3]
    -q INT Gap open penalty [5]
    -r INT Gap extension penalty. The penalty for a contiguous gap of size k is q+k*r. [2]
    -t INT Number of threads in the multi-threading mode [1]
    -w INT Band width in the banded alignment [33]
    -T INT Minimum score threshold divided by a [37]
    -c FLOAT Coefficient for threshold adjustment according to query length. Given an l-long query, the threshold for a hit to be retained is a*max{T,c*log(l)}. [5.5]
    -z INT Z-best heuristics. Higher -z increases accuracy at the cost of speed. [1]
    -s INT Maximum SA interval size for initiating a seed. Higher -s increases accuracy at the cost of speed. [3]
    -N INT Minimum number of seeds supporting the resultant alignment to skip reverse alignment. [5]
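To get a feel for what the default scoring options mean, here is a minimal Python sketch (not code from BWA) of the gap penalty q+k*r from -q/-r, plus a simple alignment score built from -a and -b. The helper names `gap_penalty` and `linear_score` are hypothetical, and the sketch only applies the listed defaults to given counts of matches, mismatches and gaps; it says nothing about how BWA-SW actually finds the alignment.

```python
# Hypothetical helpers applying the default BWA-SW scoring options listed above
# (-a 1, -b 3, -q 5, -r 2). Illustration only, not taken from BWA.

MATCH = 1      # -a: score of a match
MISMATCH = 3   # -b: mismatch penalty
GAP_OPEN = 5   # -q: gap open penalty
GAP_EXT = 2    # -r: gap extension penalty


def gap_penalty(k: int, q: int = GAP_OPEN, r: int = GAP_EXT) -> int:
    """Penalty for one contiguous gap of length k, i.e. q + k*r."""
    if k < 1:
        raise ValueError("gap length must be positive")
    return q + k * r


def linear_score(matches: int, mismatches: int, gaps: list[int],
                 a: int = MATCH, b: int = MISMATCH) -> int:
    """Score from match/mismatch counts and the lengths of contiguous gaps."""
    return a * matches - b * mismatches - sum(gap_penalty(k) for k in gaps)


if __name__ == "__main__":
    for k in (1, 2, 4, 10):
        print(f"gap of length {k}: penalty {gap_penalty(k)}")
    # 100 matches, 2 mismatches, one 1 bp insertion and one 4 bp deletion
    print("score:", linear_score(100, 2, [1, 4]))
```

With these defaults a 1 bp gap costs 7 while each match earns only 1, so an indel only pays off when enough matching bases flank it.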

    However, I do not know which options to use or what values to set.

    Does anybody have experience with a similar project? If so, which parameters did you use?

    What minimum score threshold would you recommend?

    Thanks in advance.

    Best regards,

    Sabrina.

  • #2
    use the default

    • #3
      Originally posted by lh3
      use the default
      And with that you win the prize for the most concise and to-the-point answer of the month...

      • #4
        Actually, I should have said more (so I cannot claim that prize). BWA, and especially BWA-SW, is designed so that the default settings work well for the majority of typical inputs. BWA-SW automatically adjusts its mapping strategy based on the input. As you can see from its paper, for simulated reads ranging from 100 to 10,000 bp with error rates from 2% to 10%, only the default settings are used.
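One documented example of length-dependent behaviour is the -c option from the option list in the first post: the score threshold for keeping a hit is a*max{T, c*log(l)} for an l-long query. Below is a small Python sketch of that formula, assuming `log` is the natural logarithm (as C's `log()`); `min_score` is a hypothetical helper, not part of BWA.

```python
import math


def min_score(read_len: int, a: int = 1, T: int = 37, c: float = 5.5) -> float:
    """Threshold a*max{T, c*log(l)} for an l-long query (options -a, -T, -c).

    Assumes log is the natural logarithm; hypothetical helper, not BWA code.
    """
    if read_len < 1:
        raise ValueError("read length must be positive")
    return a * max(T, c * math.log(read_len))


if __name__ == "__main__":
    for length in (100, 310, 1000, 10000):
        print(f"{length:>6} bp -> minimum score {min_score(length):.1f}")
```

Under that assumption, with the defaults (a=1, T=37, c=5.5) the c*log(l) term only exceeds T for reads longer than about 835 bp, so for the 310 bp reads in the first post the threshold stays at 37.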

        • #5
          Originally posted by lh3
          Actually, I should have said more (so I cannot claim that prize). BWA, and especially BWA-SW, is designed so that the default settings work well for the majority of typical inputs. BWA-SW automatically adjusts its mapping strategy based on the input. As you can see from its paper, for simulated reads ranging from 100 to 10,000 bp with error rates from 2% to 10%, only the default settings are used.
          I can confirm that it works quite well with the default settings across various read lengths. Great job, Heng!

          • #6
            BWA-SW / 454 / multiple hits

            Does anybody know how to get BWA-SW to report multiple hits? This is not listed among the options offered by bwa bwasw. Thanks a lot.

            • #7
              Sorry. BWA-SW cannot output multiple hits. Partly this is why it is fast.

              • #8
                By multiple hits, do we mean multiple equally good mappings of a read, or the best, second-best and further hits of a read? I thought BWA can do the former with the XA tag!
                --
                bioinfosm
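For reference, the XA tag mentioned here is, to my understanding, written by BWA's short-read pipeline (`bwa aln` followed by `samse`/`sampe`), not by `bwasw`. The BWA manual describes its value as `(chr,pos,CIGAR,NM;)*`, with the strand given by the sign of pos. A minimal Python parser under that assumption follows; `parse_xa`, `xa_from_sam` and the example tag value are made up for illustration.

```python
from typing import NamedTuple


class AltHit(NamedTuple):
    chrom: str
    strand: str   # '+' or '-', taken from the sign of the position field
    pos: int      # position with the sign stripped
    cigar: str
    nm: int       # edit distance (NM) of the alternative hit


def parse_xa(value: str) -> list[AltHit]:
    """Parse the value of an XA:Z tag, e.g. 'chr2,+1500,36M,1;'."""
    hits = []
    for entry in value.split(";"):
        if not entry:  # the value ends with ';', leaving an empty last entry
            continue
        chrom, pos, cigar, nm = entry.split(",")
        hits.append(AltHit(chrom, pos[0], int(pos[1:]), cigar, int(nm)))
    return hits


def xa_from_sam(sam_line: str) -> list[AltHit]:
    """Alternative hits from one SAM line, or [] if it has no XA tag."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:  # optional fields
        if field.startswith("XA:Z:"):
            return parse_xa(field[5:])
    return []


if __name__ == "__main__":
    # Made-up tag value for illustration only.
    for hit in parse_xa("chr2,+1500,36M,1;chrX,-20480,36M,2;"):
        print(hit)
```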

                • #9
                  Originally posted by lh3
                  Sorry. BWA-SW cannot output multiple hits. Partly this is why it is fast.
                  It seems that bwasw does output multiple hits, or I have misunderstood what you said.
                  I'm mapping 454 reads with BWA-SW and find many multi-hit alignments in the SAM file. Here is an example:
                  Code:
                  F1GKWGA02HLEN2	16	chr6	170249370	159	230S31M1I5M4D115M20S	*	0	0	cgtacggaacgaacttactacgactacctaccacacacncaccacacacncncacacacacacacacactccacacgacacacacacncncacacacacacacacacncncacacacacacacacactcacacacgacacacacacncncacacacacacacacncgntcgacagncncacagnctcncanacacacanacgtctcactangcacacagctcncgacctagnaccacacagctcacgactgcaccacacagcctcacagnacacacagctcncaactgnaccACACAGCTCACGACAGCACCACACAGCTCACGACAGCACCACACAACTCACAACTGCACCACACAAGCTCACAACAGCGCCACACAGCTCGAGGATCCAGAATTCTCCAG	,,,,0,,,,,,,,,,00030,,,,0,,,3,,,0000,,!,0000059--!-!..96657777997---,-----1---15111993-!-!115------5555=8--!.!..<<<<<<<<<==.-------222---2222295--!-!//66<988899==3-!..!.-.-.7:!8!88=3.!-.-!3-!-28883-!.-.--.-2...!.2589:::87-!,.---733!2115857332:2--///47231559==::22433744!//666777676!666944!444==??44498555ABBBDACCFFFFFFFFFFFFFFFFF===FFFCCCFFFFF:::DFFFFHIIHIBBBIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFF	AS:i:107	XS:i:40	XF:i:3	XE:i:4	XN:i:0
                  F1GKWGA02HLEN2	16	chr1	202692627	1	77S50M275S	*	0	0	cgtacggaacgaacttactacgactacctaccacacacncaccacacacncncacacacacacacacactccacacgacacacacacncncacacacacacacacacncncacacacacacacacactcacacacgacacacacacncncacacacacacacacncgntcgacagncncacagnctcncanacacacanacgtctcactangcacacagctcncgacctagnaccacacagctcacgactgcaccacacagcctcacagnacacacagctcncaactgnaccACACAGCTCACGACAGCACCACACAGCTCACGACAGCACCACACAACTCACAACTGCACCACACAAGCTCACAACAGCGCCACACAGCTCGAGGATCCAGAATTCTCCAG	,,,,0,,,,,,,,,,00030,,,,0,,,3,,,0000,,!,0000059--!-!..96657777997---,-----1---15111993-!-!115------5555=8--!.!..<<<<<<<<<==.-------222---2222295--!-!//66<988899==3-!..!.-.-.7:!8!88=3.!-.-!3-!-28883-!.-.--.-2...!.2589:::87-!,.---733!2115857332:2--///47231559==::22433744!//666777676!666944!444==??44498555ABBBDACCFFFFFFFFFFFFFFFFF===FFFCCCFFFFF:::DFFFFHIIHIBBBIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFF	AS:i:42	XS:i:41	XF:i:1	XE:i:1	XN:i:0
                  My understanding is that if a read maps to multiple locations, its mapping quality should be set to 0. Is that right?

                  • #10
                    These are chimeric hits, each hit corresponding to a different part of the read.
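This can be checked from the CIGAR strings alone. The Python sketch below (hypothetical helper names, not BWA code) computes which part of the read each SAM record aligns, converting reverse-strand records (flag 16) back to original read coordinates.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar: str) -> tuple[int, int, int]:
    """Aligned part of the query as (start, end, read_length).

    0-based, half-open, in the orientation of the SEQ field
    (reverse-complemented for reverse-strand records).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    start = 0
    for n, op in ops:  # sum the leading soft/hard clips
        if op not in "SH":
            break
        start += n
    aligned = sum(n for n, op in ops if op in "MI=X")  # aligned query bases
    total = sum(n for n, op in ops if op in "MIS=XH")  # whole read incl. clips
    return start, start + aligned, total


def to_read_coords(cigar: str, flag: int) -> tuple[int, int]:
    """Interval on the original read, flipping reverse-strand hits (0x10)."""
    start, end, total = query_interval(cigar)
    if flag & 0x10:
        return total - end, total - start
    return start, end


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    return max(a[0], b[0]) < min(a[1], b[1])


if __name__ == "__main__":
    # CIGARs and flags of the two records posted in #9.
    hits = [("chr6", 16, "230S31M1I5M4D115M20S"),
            ("chr1", 16, "77S50M275S")]
    segs = [to_read_coords(cigar, flag) for _, flag, cigar in hits]
    for (chrom, _, _), seg in zip(hits, segs):
        print(chrom, seg)
    print("overlap:", overlaps(segs[0], segs[1]))
```

For the two records in #9 (both 402 bp reads once clips are counted) this gives (20, 172) and (275, 325) on the original read, so the two hits cover different, non-overlapping parts of it.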

                    • #11
                      What does the tag XF:i:N mean?

                      According to the BWA manual,
                      XF  Support from forward/reverse alignment
                      and in my mapping results I find values from XF:i:0 to XF:i:3.
                      Does XF:i:0 mean forward/reverse? And what about the others?
                      Thank you!

                      • #12
                        Originally posted by holywoool
                        According to the BWA manual,
                        XF  Support from forward/reverse alignment
                        and in my mapping results I find values from XF:i:0 to XF:i:3.
                        Does XF:i:0 mean forward/reverse? And what about the others?
                        Thank you!
                        The paper explains that the reverse-reverse alignment is not always performed:
                        "In implementation, we do not apply the reverse–reverse alignment if the best alignment contains, by default, 5 or more seeds."

                        Please read the paper carefully; it contains many gems.
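As an illustration, the Python sketch below pulls the optional tags out of a SAM line and applies the rule quoted from the paper to the XE value. It assumes, as the BWA manual's tag list indicates, that XE is the number of supporting seeds, and that this is the count compared against -N (default 5); it does not try to decode XF. The SAM line reuses the tags of the first record in #9 with SEQ, QUAL and CIGAR shortened; `parse_tags` and `reverse_alignment_skipped` are hypothetical helpers.

```python
def parse_tags(sam_line: str) -> dict[str, object]:
    """Optional TAG:TYPE:VALUE fields (column 12 onward) of one SAM line."""
    fields = sam_line.rstrip("\n").split("\t")
    tags: dict[str, object] = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)  # value may itself contain ':'
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:  # Z, A, H, B: keep the raw string
            tags[tag] = value
    return tags


def reverse_alignment_skipped(n_seeds: int, n_hsp_rev: int = 5) -> bool:
    """Paper's rule: skip the reverse-reverse alignment when the best
    alignment has at least n_hsp_rev seeds (assumed to be the -N value)."""
    return n_seeds >= n_hsp_rev


if __name__ == "__main__":
    # Tags from the first record in #9; SEQ/QUAL/CIGAR shortened to match.
    line = "\t".join(["F1GKWGA02HLEN2", "16", "chr6", "170249370", "159",
                      "10M", "*", "0", "0", "ACGTACGTAC", "IIIIIIIIII",
                      "AS:i:107", "XS:i:40", "XF:i:3", "XE:i:4", "XN:i:0"])
    tags = parse_tags(line)
    print(tags)
    print("reverse alignment skipped:",
          reverse_alignment_skipped(int(tags["XE"])))
```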
