**lh3** · 09-13-2011, 07:39 PM

I already said even with a sensitive setting, bwa-sw may not work well. But at least with a proper setting, it should map more than 1.5% of reads.

SOAP2 would not work well, and neither would bowtie or bwa-short. Hash-table-based implementations would also need tuning. We have 1 error per 5bp on average, while these mappers typically use 11-14bp seeds, so we may not find even one correct seed hit.
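To put rough numbers on the seed argument, here is a minimal sketch. It assumes errors are independent at a rate of 0.2 per base (1 per 5bp), which is a simplification, and the function names are made up for illustration; they do not come from any mapper.

```python
def p_exact_seed(k, error_rate):
    """Probability that k consecutive bases are all error-free,
    assuming independent errors at the given per-base rate."""
    return (1.0 - error_rate) ** k


def expected_exact_seeds(read_len, k, error_rate):
    """Expected number of error-free k-mers over all read_len - k + 1
    windows of a read. Overlapping windows are strongly correlated, so
    this is an upper bound on the chance of at least one clean window."""
    windows = max(read_len - k + 1, 0)
    return windows * p_exact_seed(k, error_rate)


if __name__ == "__main__":
    # Seed lengths quoted above (11-14bp), 101bp reads, 1 error per 5bp.
    for k in (11, 14):
        p = p_exact_seed(k, 0.2)
        e = expected_exact_seeds(101, k, 0.2)
        print(f"k={k}: P(clean seed)={p:.4f}, expected clean windows={e:.2f}")
```

Any single 11-14bp seed is clean only a few percent of the time under this model, so a mapper that samples just a handful of seed positions per read will often miss all of them.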

**gprakhar** · 09-13-2011, 10:40 PM

Originally posted by feederbing

I have 101 base reads and expect up to 20 mismatches to reference. My reads are not pairs. I have tried bwa bwasw -a 1 -b 1 -T 60 but it only aligns 1.5% of the reads. And those have only a couple mismatches. I know from other tests ~ 30% should be aligned with 20 mismatches. Is this just something bwa is not designed for? What would be a better aligner? Or am I not using the right settings?

You could try Stampy,
http://www.well.ox.ac.uk/project-stampy

From the Stampy webpage,

Stampy excels in the mapping of reads that contain sequence variation relative to the reference, in particular for those containing insertions or deletions. It can map reads from a highly divergent species to a reference genome for instance.

I have used it to align human resequencing data, and it is very accurate, though I have not tried it with divergent species.

**lh3** · 09-14-2011, 05:09 AM

As far as I know, stampy does glocal alignment, but for cross-species alignment we would rather use local alignment. In addition, stampy uses 1-mismatch 15-mer seeds with a 5bp skip. I doubt this will work well for 20% divergence. I guess "highly divergent" refers to ~5-10% divergence (human-chimp has 1%).
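A rough way to check that doubt is to ask how often a 15-mer carries at most one mismatch at different divergence levels. The sketch below assumes independent substitutions only (no indels) and uses a plain binomial model; the function name is hypothetical and nothing here reflects stampy's actual code.

```python
from math import comb


def p_seed_within_mismatches(k, max_mismatches, divergence):
    """Probability that a k-mer has at most max_mismatches substitutions,
    assuming independent substitutions at the given per-base divergence."""
    return sum(
        comb(k, m) * divergence ** m * (1.0 - divergence) ** (k - m)
        for m in range(max_mismatches + 1)
    )


if __name__ == "__main__":
    # 15-mer seeds tolerating 1 mismatch, at the divergence levels discussed.
    for d in (0.01, 0.05, 0.10, 0.20):
        p = p_seed_within_mismatches(15, 1, d)
        print(f"divergence={d:.2f}: P(seed usable)={p:.3f}")
```

Under this model most seeds remain usable at 5% divergence, but only a small fraction do at 20%, and the 5bp skip reduces the number of seeds tried per read further.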

**feederbing** · 09-15-2011, 11:53 AM

Originally posted by lh3

BTW, to map high error rate with bwa-sw, you should decrease "-T" and increase "-z" to 10 or 100. ...

Thanks, I'll try that (though, as you mention, it may still not work well).
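For reference, a small sketch of why `-T 60` is tight with `-a 1 -b 1` on 101bp reads. It scores only a full-length, ungapped alignment and assumes a hit passes when its score is at or above the threshold; the real bwa-sw local alignment may clip read ends and score differently, so treat this as back-of-the-envelope arithmetic. The function names are made up for illustration.

```python
def ungapped_score(read_len, mismatches, match_score=1, mismatch_penalty=1):
    """Score of a full-length ungapped alignment: each matching base adds
    match_score, each mismatching base subtracts mismatch_penalty."""
    return (read_len - mismatches) * match_score - mismatches * mismatch_penalty


def max_mismatches_passing(read_len, threshold, match_score=1, mismatch_penalty=1):
    """Largest mismatch count whose full-length ungapped score still reaches
    the threshold (assumed rule: score >= threshold). Returns -1 if none."""
    best = -1
    for m in range(read_len + 1):
        if ungapped_score(read_len, m, match_score, mismatch_penalty) >= threshold:
            best = m
    return best


if __name__ == "__main__":
    print(ungapped_score(101, 20))          # score for 20 mismatches
    print(max_mismatches_passing(101, 60))  # tolerance at threshold 60
    print(max_mismatches_passing(101, 30))  # tolerance at a lower, hypothetical threshold
```

With these settings a read with 20 mismatches sits right at the cutoff, which is consistent with the advice to decrease `-T`.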

**feederbing** · 09-15-2011, 12:08 PM

Originally posted by cdry7ue

I think you should go with BFAST with the super small mask like (11111111) to find candidate local alignments.

Thanks. I have been running some tests with BFAST. I had initially posted a question about generating masks, http://seqanswers.com/forums/showthr...0855#post50855 . I've made some progress after posting that.

I think the right approach with BFAST is not to make a short mask with no zeros, but to make long masks, following the advice in the BFAST guide about the number of 1s, and to use only masks with spaces. Following the guide, my masks should have 21 1s. A mask of that length with no zeros is not going to find much at 15 to 20% divergence; it might find a few very highly conserved regions, nothing more. The mask search procedure assumes it should include a no-zero mask as the starting point. For this problem I think that is a poor assumption: that mask will find little and just slow things down. I've made some progress by using just one of the longer masks with many zeros.
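To make the mask idea concrete, here is a toy sketch of how a spaced mask tolerates a substitution that a contiguous mask does not. It compares a read and a reference at the same coordinates, with no index and no gaps, so it is only an illustration of the matching rule, not how BFAST is implemented. The sequences, masks, and function names are invented for the example.

```python
def mask_hits(read, ref, mask, offset=0):
    """True if read and ref agree at every position where the mask has '1',
    with the mask placed at the given offset. Positions under '0' are ignored."""
    for i, bit in enumerate(mask):
        if bit == "1" and read[offset + i] != ref[offset + i]:
            return False
    return True


def hit_offsets(read, ref, mask):
    """All offsets at which the mask reports a seed hit."""
    n = min(len(read), len(ref)) - len(mask) + 1
    return [o for o in range(max(n, 0)) if mask_hits(read, ref, mask, o)]


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    read = "ACGAACGTAC"  # one substitution at index 3
    print(hit_offsets(read, ref, "1111"))  # contiguous mask
    print(hit_offsets(read, ref, "1101"))  # spaced mask, one ignored position
```

The spaced mask still hits at an offset where the substitution falls under its zero, which is the property that matters at high divergence.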

While BFAST is OK for speed on my test sample and gives me about the number of alignments I expect, the alignments are poor, with many gaps. I should be able to control this with the alignment scores, but I haven't tried that yet.

**feederbing** · 09-15-2011, 12:12 PM

Thanks for all the suggestions. I apologize for not replying sooner. I missed the email notification for any posts after my reply to zee. I will try all the tools suggested, on my sample.

**cdry7ue** · 09-16-2011, 06:51 AM

Yes, there is a way to set up a matrix of penalties for the Smith-Waterman step.
Also, using a large mask with several zeros would mean that you are probably only dealing with substitution-type changes and not anticipating gaps.
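As a generic illustration of how penalties steer a Smith-Waterman step toward or away from gaps, here is a minimal score-only sketch. It assumes a single match score, a single mismatch score, and a linear gap penalty, which is simpler than a full penalty matrix, and it is not BFAST's implementation; the parameter names and sequences are invented for the example.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between a and b using a linear gap
    penalty. Only the score is returned, not the alignment itself."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # match or mismatch
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    read = "ACGTTACGT"  # one extra T relative to the reference
    ref = "ACGTACGT"
    print(smith_waterman_score(read, ref, gap=-1))  # cheap gaps
    print(smith_waterman_score(read, ref, gap=-5))  # expensive gaps
```

With a cheap gap the best alignment spans the insertion, while with an expensive gap the best alignment is a shorter ungapped stretch, so raising the gap penalty is one way to discourage heavily gapped alignments.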
