**jlmlj** · 02-05-2010, 12:55 PM

Hi all,

Although nobody has replied to my post yet, I'd like to share some test results from running BWA with different parameters. Maybe they will be helpful to somebody, or somebody can help me interpret them.

The purpose of my testing was to allow more mismatches and see whether I could get more alignments (particularly alignments in repeats) against the human reference genome. I tried 6 different parameter combinations in BWA and, to my surprise, got very similar results each time: 49% unique alignments, ~4% multiple alignments, and about 47% of reads failing to align.

The combinations I used for the tests are listed below:
-M 1
-k 6
-k6 -l32 -m1
-n6 -l32 -m1
-l32 -k20 -m1 (for this test I wanted to go to an extreme on -k to see what would happen; however, nothing changed)

I took a look at the unaligned reads. Some could be aligned by BLAT, although others could not. Some of the ones that BLAT could align carry repeat markers. It seems I am losing some true alignments, and I wonder why BWA does not find them. Any help would be appreciated if you have a clue!

**lh3** · 02-05-2010, 01:13 PM

try

bwa aln -n 7 -l 1000000

This will be very slow.

**jlmlj** · 02-05-2010, 01:26 PM

Originally posted by lh3

try

bwa aln -n 7 -l 1000000

This will be very slow.

Thank you so much, I am very excited to get feedback from the author of this beautiful software!

I am going to try it now.

I know -n is the maximum number of differences (mismatches + gaps) over the whole read, and -l takes the first INT bases as the seed. However, why did you set INT for -l so large, like "1000000"? Thanks in advance for the explanation!

Update:
I have been running your parameters for 20 minutes and progress seems very, very slow: it is still at the first step:
[bwa_aln_core] calculate SA coordinate... (I only have 1 line for the progress)
And it has used up all 30 nodes on our cluster. So I am wondering whether it is possible to decrease the value for -l a bit...
Thanks!

**lh3** · 02-05-2010, 07:36 PM

-l 10000 effectively disables seeding. You may try "aln -n 5". But for reads with low quality, bwa may be very slow. Its algorithm is not designed for this case.
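To make the seeding point concrete, here is a small toy model in Python. It is only a sketch of the option semantics as described in the bwa manual (-n limits differences over the whole read, -k limits differences inside the seed, and a seed length larger than the read disables seeding); the function name `active_limits` is made up for illustration and is not part of BWA.

```python
def active_limits(read_len, max_diff, seed_len, max_seed_diff=2):
    """Toy model: which edit-distance limits apply to one read.

    max_diff      ~ bwa aln -n (integer form)
    seed_len      ~ bwa aln -l
    max_seed_diff ~ bwa aln -k
    This mirrors the manual's description only; it is not BWA code.
    """
    limits = {"whole_read": max_diff}
    # A seed longer than the read cannot be placed, so only -n applies.
    if seed_len < read_len:
        # The seed can never hold more differences than the whole read.
        limits["seed"] = min(max_seed_diff, max_diff)
    return limits


if __name__ == "__main__":
    print(active_limits(75, 7, 1000000))  # seeding disabled
    print(active_limits(75, 6, 32))       # seed of 32 bases, default -k
```

Under this reading, any -l value above the read length (10000 or 1000000 for 75-base reads) behaves the same way.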

**jlmlj** · 02-08-2010, 08:52 AM

Originally posted by lh3

-l 10000 effectively disables seeding. You may try "aln -n 5". But for reads with low quality, bwa may be very slow. Its algorithm is not designed for this case.

Hi lh3,

Thank you very much for the reply! So in this test, with seeding disabled, BWA allows up to 7 mismatches over the full 75-base read, even at low-quality bases, am I correct?

The test has finished; it took ~49 hrs on a 30-node cluster. However, the results are still very similar to those of my previous tests: 48% of reads failed to align to anything in the human reference genome. (I counted "XT:A:U" as unique matches and "XT:A:R" as repeat matches in the output SAM files.)
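The counting described above could be done along these lines. This is a minimal sketch, assuming a standard tab-separated SAM file in which BWA writes its XT:A tag among the optional fields; the function name `count_xt` and the sample records are made up for illustration.

```python
from collections import Counter


def count_xt(sam_lines):
    """Tally reads by BWA's XT:A tag, plus unmapped reads (FLAG bit 0x4)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        if int(fields[1]) & 0x4:          # read is unmapped
            counts["unmapped"] += 1
            continue
        # Optional fields start at column 12; look for XT:A:<type>.
        xt = next((f[5:] for f in fields[11:] if f.startswith("XT:A:")), "no_XT")
        counts[xt] += 1
    return counts


if __name__ == "__main__":
    # Made-up records, only to show the expected input shape.
    demo = [
        "@SQ\tSN:chr1\tLN:1000",
        "\t".join(["r1", "0", "chr1", "10", "37", "5M", "*", "0", "0",
                   "ACGTA", "IIIII", "XT:A:U", "NM:i:0"]),
        "\t".join(["r2", "0", "chr1", "50", "0", "5M", "*", "0", "0",
                   "ACGTA", "IIIII", "XT:A:R", "NM:i:1"]),
        "\t".join(["r3", "4", "*", "0", "0", "*", "*", "0", "0",
                   "ACGTA", "IIIII"]),
    ]
    print(count_xt(demo))
```

Counting unmapped reads by the FLAG field rather than by a missing XT tag keeps the three categories from overlapping.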

The results confuse me a lot: we should have many more repeat matches in the human genome. I am trying to figure out what the unaligned reads are. Any suggestion would be very much appreciated!

**davetang** · 08-24-2010, 03:01 AM

Dear jlmlj,

I used the parameters (bwa aln -n 7 -l 1000000) and I was able to align a read that had 5 mismatches to the reference. Running bwa with the default settings didn't report this alignment. So perhaps you could take one or two individual unaligned reads and run your tests again? Just a suggestion, if you haven't already done this.
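A possible reason the default run missed that read: the bwa manual describes a fractional -n (default 0.04) as the fraction of missing alignments tolerated under a 2% uniform base error rate, so the allowed number of differences depends on read length. The sketch below models that description with a Poisson distribution. It is an assumption about how the rule works, not code taken from BWA, and the name `max_diff_for` is hypothetical.

```python
import math


def max_diff_for(read_len, err=0.02, missing_frac=0.04):
    """Smallest k such that P(more than k errors) < missing_frac.

    Errors are modelled as Poisson with mean read_len * err. This is a
    sketch of the manual's description of a fractional -n, not BWA code.
    """
    lam = read_len * err
    term = math.exp(-lam)      # P(exactly 0 errors)
    cumulative = term
    for k in range(1, 1000):
        term *= lam / k        # P(exactly k errors)
        cumulative += term
        if 1.0 - cumulative < missing_frac:
            return k
    raise ValueError("no threshold found for this read length")


if __name__ == "__main__":
    for length in (27, 75):
        print(length, max_diff_for(length))
```

Under this model a 27-base read is allowed 2 differences and a 75-base read 4, which would be consistent with a 5-mismatch alignment only showing up once -n is raised.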

As a more general note, I'm new to next-gen sequencing so I'd just like to point out something I found out. When I was looking at the SAM file for this alignment, the CIGAR string was 27M, and that looked like a mistake to me because I knew there were mismatches in the alignment. So I looked up the documentation and found out that "M" can be a sequence match or a mismatch. It wasn't intuitive to me, so I just thought I'd point it out.
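Since "M" hides mismatches, the number of mismatches has to come from elsewhere in the record, for example the optional NM (edit distance) or MD (mismatching positions) tags defined in the SAM specification. Below is a minimal sketch that counts mismatches from an MD string, assuming the record carries one; the function name and the example MD value are made up for illustration.

```python
import re


def mismatches_from_md(md):
    """Count mismatched bases in a SAM MD:Z value.

    In MD, digits are runs of matching bases, a bare letter is a
    mismatched reference base, and ^LETTERS marks deleted reference
    bases (which are not mismatches).
    """
    without_deletions = re.sub(r"\^[A-Za-z]+", "", md)
    return sum(ch.isalpha() for ch in without_deletions)


if __name__ == "__main__":
    # Hypothetical MD string for a 27M alignment with 5 mismatches.
    print(mismatches_from_md("3A5C4G2T6A2"))
```

A read with CIGAR 27M and no mismatches would simply have MD:Z:27.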

Cheers,

Dave

How BWA handles mismatches?
