Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • biterbilen
    Junior Member
    • Jun 2009
    • 6

    BWA concise format output -edit distance wrong

    Hi Heng,

    We ran BWA on simulated data of length 40/80nc which has max 2 errors in the prefix 20nc sequence and a total of 3 errors with the following parameters.

    $HOME/bin64/bwa-0.5.4/bwa aln -i 0 -n $(($n+1)) -o $(($n+1)) -e $(($n+1)) -N -t 8 -l 20 -k 2

    First we couldn't get any hits with $n=2 although the initial number of errors are supposed to <=3.
    After running with $n=3 we got hits with edit distance >=4 when the alignment is done for the BWA output region, these turned out to be 3 edit distance hits. (See 2 examples below; I can provide more if necessary)

    Is is due to a bug in the software?


    >chr10|101603578|101603617|-|seqTGA_7580|39ins-30ins-39ins 5 5
    chr10 -101603576 4
    chr10 -101603575 4
    chr20 +52093242 4
    chr10 -101603577 4
    chr10 -101603578 4
    >chr10|102009379|102009418|+|seqTGT_3501|39ins-6ins-14ins 2 2
    chr10 +102009377 4
    chr10 +102009376 4


    chr10|101603578|101603617|-|seqTGA_7580|39ins-30ins-39ins (43 nc) 43..1 chr10 101603578..101603617
    chr10
    errors: 3 orientation: -
    ANTNCATCGTCANTCATCATCTGCATCATCATCATCATCATCA
    | | |||||||| ||||||||||||||||||||||||||||||
    A-T-CATCGTCA-TCATCATCTGCATCATCATCATCATCATCA

    chr10|102009379|102009418|+|seqTGT_3501|39ins-6ins-14ins (43 nc) 1..43 chr10 102009379..102009418
    chr10
    errors: 3 orientation: +
    GCTAAANCAAGTGTNGTACCGGGTTGTGGGAACGCAACGATNA
    |||||| ||||||| |||||||||||||||||||||||||| |
    GCTAAA-CAAGTGT-GTACCGGGTTGTGGGAACGCAACGAT-A


    Biter Bilen
    PhD Student
    Zavolan Lab
    Biozentrum
    Klingelbergstrasse 50
    4056 Basel Switzerland
  • lh3
    Senior Member
    • Feb 2008
    • 686

    #2
    Do you map to the human genome? Is it possible to give a small test set to me?

    Also note that bwa disallows gaps within 5bp towards either end of a read and the gap in these regions is likely to lead to more mismatches. Partly this strategy is good for speed and partly for accuracy as well.

    Comment

    • biterbilen
      Junior Member
      • Jun 2009
      • 6

      #3
      Hi,

      Yes, I map to the human genome.

      I set -i to 0; why should we expect the problem is because of the gaps at the terminals of the tags? I also wonder why the tool gives >5 edit distance matches since the maximum edit distance should be 5 when -n is set to 4 which are 4+1 edit distance hits for non repetitive tags.

      In the attachment is an archive for a fastq file, a samse formatted concise output format file by BWA, and the min edit distance alignment file for the given fastq file. I used precisely this command

      $HOME/bin64/bwa-0.5.4/bwa aln -i 0 -n 4 -o 4 -e 4 -N -t 8 -l 20 -k 2

      Biter Bilen
      PhD Student
      Zavolan Lab
      Biozentrum
      Klingelbergstrasse 50
      4056 Basel Switzerland
      Attached Files

      Comment

      Latest Articles

      Collapse

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by SEQadmin2, 06-05-2026, 10:09 AM
      0 responses
      14 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-04-2026, 08:59 AM
      0 responses
      24 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-02-2026, 12:03 PM
      0 responses
      28 views
      0 reactions
      Last Post SEQadmin2  
      Started by SEQadmin2, 06-02-2026, 11:40 AM
      0 responses
      22 views
      0 reactions
      Last Post SEQadmin2  
      Working...